Manage and resolve an incident

Whatever your role in the incident response process—incident commander, resolver, or stakeholder—the Incident Console lets you clearly see what's going on from the moment an incident is initiated. It displays the critical information you and your teams need to stay aligned, allowing you to prioritize tasks and focus on what's important: resolving the incident.

Having everything in one place means you can easily monitor an incident's progress, collaborate with resolvers, and make relevant updates, all from the same screen. Here, we'll walk you through the entire life cycle of an incident, from hitting the initiate incident button, right through to resolution.

Step 1: Initiate an incident

When an event occurs that impacts your business, there are multiple ways you can quickly initiate an incident in xMatters. How you initiate an incident defines its initial details, and, depending on the method you use, the details you enter shape the notification your resolvers receive.

For more details about how to initiate an incident, and the different ways you can do this, see Initiate an incident.

Step 2: Review an incident's information

Once you've initiated your incident, you can review and update its information in the Incident Console. The Incident Console displays the key details about the incident, and includes the overview information that was entered when the incident was initiated (like summary, description, incident properties, impacted services, status, and severity). Other details are created for you (like the incident ID, which is assigned either automatically, by xMatters or by another system as part of a flow, or manually by the initiator), and some of them can be edited. For example, by default the impact duration starts when the incident is initiated and ends when the incident status changes to Mitigated (or to Resolved, if you bypass Mitigated), but you can change this to match the actual impact duration.
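The default impact duration rule described above can be illustrated with a short sketch. This is not xMatters code or its API: the function name, its parameters, and the choice to return None while the incident is still unmitigated are assumptions made purely for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional


def default_impact_duration(
    initiated_at: datetime,
    mitigated_at: Optional[datetime] = None,
    resolved_at: Optional[datetime] = None,
) -> Optional[timedelta]:
    """Hypothetical helper mirroring the default rule described in the text.

    The impact window starts at initiation and ends at Mitigated, or at
    Resolved when the Mitigated status was bypassed.
    """
    # Prefer the Mitigated timestamp; fall back to Resolved if Mitigated was skipped.
    impact_end = mitigated_at if mitigated_at is not None else resolved_at
    if impact_end is None:
        # Assumption: no duration can be reported while the impact is ongoing.
        return None
    return impact_end - initiated_at


initiated = datetime(2024, 1, 1, 9, 0)
mitigated = datetime(2024, 1, 1, 9, 45)
resolved = datetime(2024, 1, 1, 11, 0)

# Mitigated was reached, so the window ends there.
print(default_impact_duration(initiated, mitigated, resolved))  # 0:45:00
# Mitigated was bypassed, so the window ends at Resolved.
print(default_impact_duration(initiated, resolved_at=resolved))  # 2:00:00
```

If the real impact began earlier or ended later than these status changes, you would edit the duration in the Incident Console rather than rely on the default.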

The Incident Console also includes links to any collaboration channels, who is responsible for specific roles, any resolvers who were notified or have engaged, and a timeline that shows any status updates, responses, details, comments, or additional notes received during the incident.

If you'd like to learn more about what specific details mean and how to edit them, see Update incident details.

Step 3: Explore potential root causes and recent changes

The Impacted Services section of the Incident Console displays services reported as impacted by the incident, and allows you to explore a map of the dependencies between these services and others in your architecture, and identify potential root causes. Since changes are typically the main cause of incidents, the map includes the ability to view recent changes to your services to help identify what actions you should take—such as determining which subject matter experts to engage as resolvers or whether to run an automation like a failover or a rollback.

To learn more about using the incident service dependency map, see Dependencies map for an incident.

Step 4: Link to related active and historical incidents

The Linked Incidents section displays links to other active or historical incidents with their corresponding relationship types. Related incidents can help you decide to merge resolution teams for similar incidents, reuse the steps that resolved a historical incident, or reduce the time to mitigate or resolve the incident by combining duplicates.

To learn more about how linking incidents can shorten your resolution time and make the incident management process more efficient, see Incident links.

Step 5: Add and manage resolvers

The Resolvers section of the Incident Console shows who the Incident Commander is, which resolvers have been notified, how they responded, and whether they're engaged in the incident resolution process. If groups were targeted as resolvers, the names of any users who have responded are also visible here.

For more information about managing resolvers, see Engage resolvers.

Step 6: Run automations

The Automation section provides a list of automations available for the incident. Automations are a quick way to run common tasks related to an incident—such as notifying resolvers, sending alerts, and creating Jira tickets—without interrupting your incident resolution process.

To learn about how to run automations, or add one to an incident using Flow Designer, see Run automations.

Step 7: Join collaboration channels

The Collaboration section provides links to the active chat channels and conference bridges resolvers are using to communicate.

To learn more about how to access collaboration channels, or add one to an incident using Flow Designer, see Collaborations.

Step 8: Add tasks

It's important that everyone is on the same page during an incident. The Tasks section allows teams to add and track incident-related tasks from within the Incident Console so everyone understands what they need to do and when they need to do it.

To learn more about adding and updating tasks in the Incident Console, see Manage incident tasks.

Step 9: Add attachments

Include additional resources for your resolvers by adding attachments to the incident. An attachment could be a video or screen capture of an error, or procedures that resolvers should use to help resolve an incident more quickly.

To learn more about working with attachments, see Manage incident attachments.

Step 10: Monitor and record progress

The best way to monitor an incident's progress is through the incident Timeline. This displays all the changes and comments made during the lifetime of the incident. You can filter the timeline by Notes, Resolvers, Attachments, or Updates to find specific results.

To learn more about working with the timeline, see Timeline.

Step 11: Resolve and view post-incident metrics

Once an incident is resolved, the incident metrics are added to the top of the Incident Console, so stakeholders can review how the incident progressed. This allows you to see a clear summary of the incident's data and perform a detailed analysis of the incident's lifetime. Once you've reviewed the data, you can export it for further reporting or filing purposes.

For more information about the incident metrics, or to understand how they're calculated, see Incident metrics.

Step 12: Create a Post-Incident Report

Finally, Advanced plan customers can create a Post-Incident Report. This allows you to share information about the response process with stakeholders, and helps your team understand what went wrong and what can be improved to ensure the incident doesn't recur. Fill out the Analysis, Timeline, and Actions sections to complete a detailed retrospective and assign any required post-incident activity.

To learn about how to complete the sections of the report, see Post-Incident Report.