Review incident metrics
Once an incident is resolved, stakeholders can review information and metrics about the incident through the incident console. This allows you to clearly see the incident's key details, such as time to detect, time to acknowledge, and time to resolve, to help perform a comprehensive analysis of an incident's life cycle. Metrics like status by duration and duration by severity are displayed in a graph to provide a visual representation of the information. You can export the data into a spreadsheet at any time during the incident's lifecycle for additional updating, reporting, or filing purposes.
To view the console of a resolved incident, click the name of the incident you want to view in the Incidents list. A resolved incident's console displays the incident metrics and other key details. Some details can be edited (such as the incident summary, description, and impact duration) to provide a more accurate postmortem report. The console for a resolved incident displays the following details:
The summary appears as the title of the incident on the console and in the Incidents list. This can also be used for categorizing an incident.
The Incident ID is used to reference the incident within xMatters. This is assigned to the incident either by xMatters, or by an automatic process during incident creation.
This indicates the incident's current status within the incident resolution process.
Time to Detect is used to identify how long it takes from when a problem is first noticed by a person or monitoring tool, to when the incident resolution process began. This metric is calculated as the time from impact start to when the incident was initiated in xMatters.
This metric is useful for understanding how quickly teams are able to detect an incident and start the resolution process. If this is longer than expected, it could indicate that your team's processes to identify and initiate an incident need to be reviewed.
Time to Acknowledge is used to determine how long it takes from when the incident resolution process began, to when a resolver became engaged. In xMatters, this metric is calculated as the time from incident initiation to when the incident status was updated to In Progress.
This metric helps determine resolver response time. If there is a lag between when a notification is sent and when a user becomes engaged, it may indicate that their devices need to be reconfigured to maximize efficiency. Learn more about managing devices.
The Impact Duration shows how long an incident impacted the service or business. In xMatters, this metric is calculated as the time between when the incident was initiated to when its status was changed to Mitigated (or Resolved, if you bypassed Mitigated). If the impact started before the incident was initiated, you can edit it to reflect this.
This important metric not only helps to determine the efficiency of your incident resolution process as a whole, but also highlights the total degradation your customers experience. It includes the time it takes for teams to detect, diagnose, and repair issues. Ideally, this metric should decrease over time.
Time to Resolve identifies how long it takes from when an incident was detected, to when it was resolved. In xMatters, this metric is calculated as the time from when an incident was initiated to when the incident status was updated to Resolved.
This metric indicates the amount of time teams are spending on repairing issues outside of the total Impact Duration experienced by your customers. Time to Resolve also helps to show how often, and for how long, your services are experiencing issues. If this is an increased amount of time, it may indicate further architecture work needs to be done on your systems.
Duration by Status shows how much time an incident spent in each status (for example, Open, In Progress, or Mitigated), giving you a more detailed picture of the resolution process, how teams progressed, and if there were any unexpected issues. For example, if an incident was Open for a long time before teams start working on it, it could indicate an issue with notifications, on-call scheduling, or your team's workload. Or, if it was Mitigated for an extended period before being fully resolved, it might mean there was additional follow up work that needed to be done to restore the services.
Duration by Severity shows how much time an incident spent in each severity, (for example, Medium, High, or Critical). Use this metric to gain a better idea of how an incident affected your services, and what the peak severity was.
Ideally, an incident's severity level should decrease as its lifecycle progresses. If it doesn't, you can use severity updates to find out why, and evaluate what can be done to avoid similar issues reoccurring during future incidents. Duration by Severity can also be used to monitor and compare different incidents. For example, if multiple higher severity incidents occur over a short period of time, it may suggest that your systems need further analysis to resolve any underlying issues or architectural problems.
The incident description field is used to provide more detailed information, such as the background about events leading up to the incident or the affected services.
The roles indicate who is overseeing or performing different tasks in the incident resolution process. As these can be edited during an incident's lifecycle, they can also be used to indicate when and why responsibilities were transferred.
The collaboration section includes details of any collaboration channels or applications that were active during an incident's lifetime. Use this section to reference any conversations or updates that happened outside of xMatters.
The Timeline is used to understand how an incident progressed in xMatters. This is displayed as a chronological list of when the incident status or severity was updated, when and how recipients were notified during the incident, and how they responded.
This metric allows you to pinpoint important moments during an incident's lifecycle, see updates on how it progressed, and how resolvers contributed to the incident resolution process.
You can export an incident's details as a spreadsheet to perform further data analysis or update stakeholders at any time during its lifecycle. If an incident is in the Resolved status, the metrics are also included in the export.
The spreadsheet includes three tabs:
- The Incident Summary provides an overview of the incident, with details like the peak severity level and overall impact duration.
- The Incident Details show information based on how resolvers responded to the incident, including the user who originally initiated the incident, how long it took users to acknowledge and respond to notifications, and any collaboration channels used.
- The Timeline displays all the status and severity updates, resolver responses, and comments made during the lifetime of the incident.
To export the incident:
- On the Incidents screen, click the name of the incident you want to export the metrics and details for.
- Make sure the basic details (such as impact duration and severity) are correct; if not, click the drop-down menu or Edit next to the relevant detail to update it.
- In the top right corner of the console, click Export.