Incident management

Process

The purpose of this document is to outline, at a high level, the process together with the roles and responsibilities required to execute, manage and govern the DOE ITD Incident Management Process. This document describes the overall positioning, breakdown and resources needed to complete the Incident Management Process.

The goal is to provide a level of standardisation and control that is critical for reasons related to customer experience, regulatory and standards issues, efficiency / effectiveness, monitoring and reporting of the process.

The intended audience is those stakeholders throughout DOE that have a role to play in this process, either directly or indirectly as dependent stakeholders, as well as those stakeholders who are interested in the process for execution and governance reasons.

This process document inherits the definitions contained in the ITD SMO Glossary for Managing Services.

This document is created to provide a framework that underpins the execution and governance of this process. It is the definitive reference material for this process.

1.1. Process overview

This process document articulates why the DOE Incident Management Process exists and what practices are required for it to operate. Detailed procedures and work instructions describing how to apply this process are available on the SMO Intranet page and in the Remedy knowledge base.

1.2. Process Purpose

The purpose of incident management is to restore normal service in the shortest possible time with the smallest possible impact on business activity of the Department and the customer.

In doing so, this process will:

  • Minimise impact to business operations
  • Align IT Incident Management activities to real-time business priorities
  • Provide capability in identifying business priorities
  • Ensure consistency and effectiveness of support and the coordination of Incident Management activities

1.3. Incident Definition

For further information please refer to the Incident Management Incident Definition page.

1.4. Guiding Principles

In designing this process, the following guiding principles were applied:

  • Purpose: restore normal service operation as quickly as possible and minimise adverse impact to the Department
  • Usage: all staff are to ensure incidents are logged, categorised, investigated and resolved consistently end-to-end
  • Management technology: all incidents will be logged and managed in the ITSM technology, namely Remedy Incident
  • Measuring: agreed response and resolution targets will apply and be measured across all teams
  • Incident Ownership:
    • each incident will have one assigned owner throughout its incident lifecycle
    • incident analyst assignee becomes the incident owner and is accountable for the customer experience
  • Good customer service: engagement and communications will drive incident management and support.

1.5. Process Objectives

The objectives of the Incident Management Process are to:

  1. Align Incident Management activities, priorities and outcomes with those of the business,
  2. Match incident management in an end-to-end manner to support ITD Service Catalogue delivery of service to the ITD customers,
  3. Implement an Incident Management measurement framework that ensures:
    • governance & management mechanisms are tracking incidents to agreed service targets,
    • process reports are produced to benchmark current performance, support continual process improvement and integrate with the ITD CSI role.

1.6. Scope

The scope of this process includes:

  • Any event which disrupts, or degrades, a service, including events communicated directly by customers through a Service Desk and events raised through an interface from Event Management to Incident Management tools;
  • The adequate classification and prioritising of incident records;
  • Integration with other ITD processes including Incident sub processes.

1.6.1. Incidents escalated to DOE suppliers

Requests directed to DOE suppliers must be managed in accordance with the DOE Incident Management Process. Where support is provided by a DOE supplier, and related to services underpinning the DOE production environment, then requests must be:

  • Updated and resolved by the supplier – if access is available to Remedy Incident;
  • Updated and resolved by an ITD or DOE support group – MUST have access to Remedy Incident.

This Incident Management process applies to the management of any incident that occurs in relation to a product or service delivered to customers by any of the Service Management organisations.

The Incident Management process is governed and owned by the Service Management Office and the Incident Management process owner.

DOE ITD Leadership group, Regional ICT Managers and Service Management leadership are accountable for ensuring that their respective managers, team leaders and support staff follow the guidelines within this standard statement and the procedures derived from it.

2.1. Single Incident Management process

A single Incident Management Process will be used to manage incidents to resolution and is based on best practice for the Service Management industry.

The reason for this standard is to:

  • Ensure consistent handling of incidents so that the customer experience is predictable;
  • Ensure all incidents reported are recorded, tracked and managed to completion;
  • Enable quality assurance and the continuous improvement of a single process;
  • Assist in providing quality data from a single source for reporting and review.

2.2. Single Incident Management toolset

All incidents reported by customers or detected via event monitoring will be logged within the Service Management toolset.

The reason for the standard is to:

  • Ensure all incidents are recorded, classified and managed to completion;
  • Provide opportunities for automation, continuous improvement, improved customer satisfaction, knowledge management and service enhancement.

2.3. Incident unique identifier standard

Each incident will be allocated a unique identifier and that identifier is to be communicated to the customer.

The reason for the standard is to:

  • Be able to uniquely identify every incident so that the customer obtains rapid response to inquiries.

2.4. Incident Logging standard

Each Incident logged will follow a standard logging process (refer to the training material) and be supported by a toolset with a defined workflow.

The reason for the standard is to:

  • Provide a ‘best practice’ workflow to allow Service Management staff to follow the same process across the entire organisation ensuring the customer context and the product/service are captured accurately;
  • Capture the context of the customer to drive positive customer outcomes;
  • Provide a product/service focused approach to Incident logging, aligning Incident Management with the Service Catalogue which facilitates Service Level management, trending and problem management activities within the Service Management organisation;
  • Support and underpin Knowledge Management activities within the Service Management Organisation;
  • Integrate with the Service Management Service Catalogue.

2.5. Incident classification standard

All incidents will be categorised and prioritised when recorded. Priority will be determined by assessing business impact and business urgency of the incident.

The reason for the standard is to:

  • Ensure that customer expectations are understood, agreed and delivered;
  • Enable establishment of trends based on incident types and their frequency.

2.6. Incident assignment standard

In the event that an incident cannot be resolved to the satisfaction of the customer by the Service Desk or front line Support group, it will be escalated to the most appropriate Support Team with the right skills to provide timely resolution.

The reason for the standard is to:

  • Ensure the necessary knowledge and skills are provided for timely resolution;
  • Ensure all incidents are dealt with in a timely manner.

2.7. Incident ownership

The analyst that is assigned the incident becomes the incident owner and is accountable for the customer experience. This includes updating the customer on Incident progress and setting an expectation for service restoration. The Support group manager or Service Desk manager is responsible for monitoring of all incidents against Service Level Objectives and for management escalations as required.

The reason for the standard is to:

  • Ensure a customer focus is maintained throughout the life of the incident;
  • Effectively manage the incident management process on behalf of the customer.

2.8. Support Teams (groups) standard

All members of nominated support groups within the Service Management toolset are responsible for detecting incidents assigned to the group, accepting such incidents, responding, updating and resolving them within service level targets. They are also responsible for the response and resolution within Service Level Objectives (SLOs).

The reason for the standard is to:

  • Ensure effective incident and queue management by requiring all support staff to take responsibility for incidents assigned to their group;
  • Effectively manage incidents to meet customer expectations;
  • Ensure a consistent customer experience;
  • Ensure that the knowledge community has timely access to high quality and valuable knowledge articles (KA).

2.9. Incident escalation standard

Incidents will be escalated when they are in danger of breaching the SLO for response and/or resolution, to ensure that every opportunity for resolving the incident within SLO has been taken.

2.10. Incident Resolution standard

Incidents are to be resolved when the customer is satisfied that all practicable actions have been undertaken.

The reason for the standard is to:

  • Manage customer expectations;
  • Ensure customer satisfaction with the service;
  • Enable identification of improvements to the incident management process.

2.11. Customer Identification standard

All Incidents will be recorded against a specific customer. The reason for this standard is to:

  • Deliver a unified process for identifying the customer, ensuring uniformity across the entire Service Management organisation;
  • Automate Identity integration and management, providing synchronised identity management between the toolset and identity directory (Active Directory);
  • Ensure customer contact information is accurate and current, facilitating Incident Management activities for incidents that are not resolved at first point of contact;
  • Provide every opportunity to meet agreed service levels;
  • Underpin communication channels through accurate customer information allowing communication via telephone, mobile phone, email and in person channels.

2.12. Incident Status standard

Incident status can be set against specific Incident types. Depending on the 'Status' selected, you will in turn be able to select a 'Status Reason':

  • 'New', 'Assigned' and 'In Progress' statuses do not require (nor provide) a 'Status Reason';
  • 'Pending', 'Resolved', 'Closed' and 'Cancelled' allow for a variety of 'Status Reason'.
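
The Status and Status Reason rule above can be illustrated with a minimal sketch. It is not the Remedy configuration: the function name `validate_status` is hypothetical, and the sketch assumes that a Status Reason is optional (not mandatory) for the statuses that allow one, since the permitted reason values are not listed in this document.

```python
# Status names are taken from the two bullets above.
STATUSES_WITHOUT_REASON = {"New", "Assigned", "In Progress"}
STATUSES_WITH_REASON = {"Pending", "Resolved", "Closed", "Cancelled"}


def validate_status(status, status_reason=None):
    """Return True if the Status / Status Reason pairing is permitted.

    Hypothetical helper for illustration only.
    """
    if status in STATUSES_WITHOUT_REASON:
        # These statuses neither require nor provide a Status Reason.
        return status_reason is None
    if status in STATUSES_WITH_REASON:
        # Assumption: the reason is optional and any value is accepted,
        # because the valid reasons are not defined in this document.
        return True
    raise ValueError(f"Unknown status: {status!r}")
```

For example, `validate_status("New")` is accepted, while `validate_status("New", "some reason")` is rejected because 'New' does not provide a Status Reason.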

The ITD Incident Management process MUST be used when:

  • Incidents are raised by customers and end-users for:
    • Service disruption and a degradation of service
    • Inquiry, suggestions or feedback on services
    • Compromise of the confidentiality, integrity or availability of an information resource or asset
  • And/or when Incidents are detected via event monitoring and/or alerts.

The process is NOT to be used to:

  • Track or resolve issues with testing of ITD Business Service Offerings under development.

If there is doubt as to whether an incident lies within the scope of Incident management process, the record should still be logged and advice should be sought from the Incident Process Owner.

3.1. Process Triggers

The process is triggered through the following means:

3.1.1. DOE customers and end-users

Customers and staff can log inquiries and incidents using a number of communication channels:

  • DOE Insight portal (Kinetic), Email, Phone, Fax and In-Person.

3.1.2. Event Management and Alerts

Event management will allow the automation of incidents being raised and escalated to the correct support team. Incidents may also be logged by support staff when system alerts are received that require action from supporting teams.


Procedure

The purpose of this document is to document the procedures required to execute, manage and govern the DoE ITD Incident Management Process.

The intended audience is those stakeholders throughout DoE that have a role to play in this process, either directly or indirectly as dependent stakeholders, as well as those stakeholders who are interested in the process for execution and governance reasons.

This procedure document inherits the definitions contained in the ITD SMO Glossary for Managing Services.

This document is created to provide a framework that underpins the execution and governance of this process. It is part of the definitive reference materials for this process.

1.1. Procedure overview

This procedure document articulates how the DoE Incident Management Process will operate. It describes where, who and when activities will be managed through the process lifecycle. Detailed work instructions describing how to execute this process are available within the SMO portal and Remedy knowledge base.

1.2. Process overview

The Incident Management process, together with the Remedy technology, underpins the identification, logging, categorisation, investigation and resolution of incidents for ITD customers in a common and consistent manner.

The Incident Management process supports the ITD adoption and standardisation on the industry good practice service management framework. This framework gives the necessary standard foundations to manage the delivery and support of services from a Service Provider to its customers.

The process consists of the following 6 lifecycle activities:

  1. Detection and Recording;
  2. Classification and Prioritisation;
  3. Initial Diagnosis;
  4. Investigation and Diagnosis;
  5. Resolution and Recovery;
  6. Closure.

Please refer to the Incident Management process overview diagram.  

1.2.1. Detection and recording

An event becomes an incident when it is detected and recorded in the DoE standard Service Management toolset. Incidents may be recorded by:

  • A service desk support analyst receiving a phone call or other notification such as email or fax;
  • A support analyst receiving a phone call, email or fax requesting support;
  • An affected user creating an incident ticket directly using a self-service electronic form;
  • A system monitoring tool automatically generating an alert that results in an incident being recorded in the DoE standard service management toolset;
  • A systems Engineer, or a support analyst at any level in the organisation detecting an actual or potential system fault.

Upon notification, all incidents will be recorded in the DoE standard service management toolset. An important activity at this stage is to validate the customer and confirm that their contact details are up to date.

1.2.2. Classification and prioritisation

Incidents will be initially categorised by:

  • Recording that the event was an incident;
  • Recording the product;
  • Recording a Query Type (Categorisation);
  • Recording an appropriate Summary of the incident then providing a full description in Notes including any error messages that the customer may have captured;
  • Recording the date and time of the incident, which the Service Management toolset provides when the incident is logged.

Incidents will be initially prioritised by a combination of business urgency and business impact.

1.2.3. Initial diagnosis

Initial diagnosis will involve searching the knowledge base for relevant articles and searching the incident database for duplicate and/or related incidents.

If a knowledge article exists it will be used to resolve or escalate the incident, as required. If a knowledge article does not exist, then a framed knowledge article will be created and initial diagnosis undertaken and recorded. If a solution is found then the customer is informed, the knowledge article updated and the incident closed. If at this stage a solution is not found or when the 30-minute target time for first point resolution has been exceeded the incident will be escalated to a support team.
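
The escalation rule in the paragraph above can be sketched as follows. This is an illustrative sketch only: the function and parameter names are hypothetical, and it assumes the 30-minute clock starts when the incident is logged, which this document does not state.

```python
from datetime import datetime, timedelta

# 30-minute first point resolution target, as stated in the text above.
FIRST_POINT_RESOLUTION_TARGET = timedelta(minutes=30)


def needs_escalation(logged_at, now, solution_found, diagnosis_exhausted=False):
    """Return True when the incident should be escalated to a support team.

    Assumption: the target is measured from the time the incident was logged.
    `diagnosis_exhausted` is a hypothetical flag meaning initial diagnosis
    has finished without finding a solution.
    """
    if solution_found:
        # The customer is informed, the knowledge article updated and the
        # incident closed; no escalation is needed.
        return False
    target_exceeded = (now - logged_at) > FIRST_POINT_RESOLUTION_TARGET
    return diagnosis_exhausted or target_exceeded
```

Under these assumptions, an unresolved incident logged 31 minutes ago would be escalated, while one logged 10 minutes ago would remain with the first point of contact unless initial diagnosis has already been exhausted.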

1.2.4. Investigation and diagnosis

Investigation is undertaken when a solution cannot be immediately found. Here the analyst has to decide whether to attempt resolution on their own or escalate to the most appropriately skilled support group.

The analyst to whom an incident is assigned is expected to review and accept the assignment. If the assignment is not accepted, the incident should be re-assigned to the most appropriate team. If it is accepted, the analyst will respond to the assignment within the response target in the OLA. Investigation is likely to include some or all of the following:

  • Establish what has gone wrong or what is being requested by the user;
  • Understand the chronological order of events;
  • Identify any events that could have triggered the incident;
  • Search for previous occurrences in incident/problem records, knowledge databases, or manufacturers’/suppliers’ error logs and knowledge databases.

If a solution cannot be found and the service level target has not been breached, then the incident should be escalated to the most appropriate support team or vendor.

If the incident is in danger of breaching its service level target, then the incident will be escalated to the support group team leader or manager for remedial action.

1.2.5. Resolution and recovery

An incident is resolved when the service is restored.

Once a solution is found and has been tested the incident analyst will update the resolution details and categorisation, update the relevant KA and resolve the incident record. It should be noted that the resolution details will be communicated to the customer and hence should be expressed in customer understandable language.

It is important that any recovery requirement be identified and actioned prior to resolving the incident. These should also be documented in the resolution details.

1.2.6. Closure

Following incident resolution, the affected customer receives an automated resolution notification.

  • If the customer is satisfied with the resolution they need do nothing more as the record will be automatically closed after a grace period of 14 calendar days.
  • If the customer is unsatisfied with the resolution they should contact the relevant Service Desk, Help Desk or Support Group with the original incident record number and have the incident updated with the reasons why the call was not rectified. The incident will then be re-assigned to the resolving group for further investigation and resolution.
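
The automatic closure rule above can be sketched as a simple date check. This is a hedged illustration, not the toolset's actual logic: the names are hypothetical, and the sketch assumes the 14 calendar days are counted from the resolution timestamp.

```python
from datetime import datetime, timedelta

# Grace period of 14 calendar days, as stated in the text above.
AUTO_CLOSE_GRACE_PERIOD = timedelta(days=14)


def auto_close_due(resolved_at, now, customer_disputed=False):
    """Return True once a resolved incident may be closed automatically.

    Assumption: the grace period runs from the resolution timestamp.
    `customer_disputed` is a hypothetical flag set when the customer reports
    that the resolution was unsatisfactory.
    """
    if customer_disputed:
        # The record is re-assigned to the resolving group instead of closing.
        return False
    return (now - resolved_at) >= AUTO_CLOSE_GRACE_PERIOD
```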

1.3. Key process interfaces

The main interfaces between Incident Management and other ITD SMO ITSM processes are:

1.3.1. Service request management (SRM)

SRM may trigger Incidents where a service request is incorrectly categorised, requiring conversion to an Incident.

1.3.2. Problem Management

Incident Management provides a triggering event for Problem Management to be initiated. Problem Management uses incident trending information to proactively identify potential problems. It also provides workarounds and eliminates causes of incidents to assist in rapid resolution of incidents, thus contributing to increased availability of services.

1.3.3. Knowledge Management

Assists Incident Management by iteratively providing knowledge on solutions and workarounds. New incidents may trigger new knowledge requests via the Knowledge Management process. Early Life Support and Knowledge Transfer aim to underpin changing or new services with known errors ready for the service initiation and potential incidents.

1.3.4. Change Management

Assists Incident Management by providing the Service Desk with information on current and future change activity, as well as change history. It provides authorised and controlled implementation of changes and up-to-date information on the progress of changes. Incident Management uses Change Management to implement changes to restore services and provide traceability for all activities.

1.3.5. Configuration Management

Note: Configuration Management is currently not a formal and operationalised process. It assists Incident Management by providing valuable information about the CIs contained within the CMDB. The relationships and dependencies that exist between the CIs (and related People, Services and Service Management) provide up-to-date status information on CIs to quickly and efficiently assist in the process of diagnosis and resolution.

2.1. Taxonomy

2.1.1. Taxonomy

Taxonomy (from Greek taxis meaning arrangement or division and nomos meaning law) is the science of classification and naming conventions according to a pre-determined system, with the resulting catalogue used to provide a conceptual framework for discussion, analysis, or information retrieval.

2.1.2. Record Type

The Record Type helps classify the nature of calls received from customers. ITIL has segmented calls as Incidents and Service requests. The record type is automatically populated based on the selection of the query type.

2.1.3. Query

A query is the conceptual framework defining what the customer is contacting the Service Management organisation for. In essence, the query is the nature of the call, from the customer's perspective.

Queries allow for global reporting, e.g. how many logon queries are received annually on all web applications? As all Applications and Systems require a logon, we can use the Query as the primary reference and then filter down by the call type.

Queries define, and will automate the selection of, the Record Type, which defines whether the call is a Service Request or an Incident.

  • Billing | charging
  • Functionality (how to)
  • Policy | process | procedures
  • Booking | Scheduling
  • Install | Uninstall | Upgrade
  • Procure | Loan | Evaluate
  • Breach | Complaint | Dispute
  • Content | Data
  • Logon | Access
  • Reporting
  • Documentation | Templates | Forms
  • Lost | Damaged | Stolen
  • Review | Audit | Test
  • Enquiry | Suggestion | Feedback
  • Move | Add | Change
  • System Message | Error Message
  • Fulfilment | Reconciliation
  • Performance

2.1.4. Product Categorisation Tiers

The Product Categorisation tiers are a broad tier based approach to grouping the types of records that are raised with the Service Management organisation.

The desired objective is to limit the Product Type tiers to only two tiers, with the option of a third tier if absolutely necessary.

The product types will be used to align to the service catalogue, segment the knowledgebase, empower trending and problem management and provide the ability to AutoRoute records.

Please refer to the product categorisation page for further information.

2.1.5. Product Name

The Products are organised and arranged within a three tiered approach as outlined above. The Product names are structured using the above tiers.

1.1. Incident Prioritisation

1.1.1. Impact Assessment

Business Impact is generally based on the number of sites and/or customers affected by the incident. The following table provides some guidance in determining the Business Impact.

Please refer to the Business Impact page for further information.

To assist with gaining a consistent understanding of business impact it might be helpful to ask the customer the following questions:

  • Are you the only person impacted?
  • If not, how many others at your location are impacted?

And from within the Service Desk, Help Desk or Support Group:

  • Am I able to access the service?
  • Are there many calls coming in reporting the same problem?

1.1.2. Urgency Assessment

Business urgency is based on the customer being able to complete a business activity and can be characterised using the following table:

For further information refer to the Business urgency page.

1.1.3. Priority Matrix

The incident priority will be set using a combination of impact and urgency described in the following table.

For further information refer to the incident priority page.

For any incident that has a critical priority, the support analyst must escalate the incident to their supervisor, who will confirm the priority and immediately notify the duty Major Incident Manager. The incident will be handled according to the provisions of the Major Incident Management Process.
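
The way priority is derived from impact and urgency, and the critical-priority escalation that follows, can be sketched as a table lookup. The matrix values below are HYPOTHETICAL placeholders: the real impact, urgency and priority levels are defined on the referenced pages, not in this document, and only the "Critical" priority name is taken from the text. The function name is also hypothetical.

```python
# HYPOTHETICAL placeholder matrix for illustration only. Substitute the
# levels from the Business Impact, Business urgency and incident priority pages.
EXAMPLE_MATRIX = {
    ("High", "High"): "Critical",
    ("High", "Low"): "Medium",
    ("Low", "High"): "Medium",
    ("Low", "Low"): "Low",
}


def assess_priority(impact, urgency, matrix=EXAMPLE_MATRIX):
    """Return (priority, escalate_to_supervisor) for an incident.

    Priority is looked up from the combination of impact and urgency.
    A critical priority must be escalated to the supervisor, who confirms
    it and notifies the duty Major Incident Manager.
    """
    priority = matrix[(impact, urgency)]
    return priority, priority == "Critical"
```

Passing the matrix as a parameter keeps the lookup logic independent of the actual priority definitions, which are maintained elsewhere.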

1.1.4. Priority Level Response Resolution

Incident management service level objectives are based on priority and cover response targets, customer updates and restoration targets.

For further information refer to the Priority Level Response Resolution page.

The Impact/Urgency matrix below is used to determine the priority of an Incident.

1.1.5. Incident Resolution Codes

Resolution codes record the reason the customer had the query and what action was performed to either resolve or fulfil the incident or request.

  • Additional resources
  • Advice | Info | Consult
  • Backup | Restore | Recovery
  • Cancel | Disable | Remove
  • Capacity Related
  • Change Related
  • Comms | Notification
  • Configuration
  • Connectivity
  • Duplicate Call Removal
  • Enhancement Request
  • Fault | bug | conflict
  • Functional Limitation
  • Human Error
  • Maintain | Repair | Replace
  • Malware
  • Migration | Relocation
  • Non IT Cause
  • Outage
  • Patch | Update | Development
  • Provisioning (Fulfilment)
  • Process Related Issue
  • Records Management
  • Release | deploy | install
  • Reset | Re-activate
  • Service Request Fulfilment
  • System Alerts & Monitors
  • Testing
  • Training | Assist
  • Transfer | Refer
  • Uninstall/Reinstall
  • User Provisioning

The incident management process requires a number of specific roles with responsibilities. They include Incident Analysts and Technical support analysts providing response and resolution services and support team leaders/managers and service delivery managers for escalations where appropriate.

3.1. Process owner

The Incident Management process owner has responsibility for:

  • Ensuring Incident Management is fit for use, fit for purpose and scalable to be adopted across the whole of DoE and TAFE;
  • Ensuring that the process consistently achieves the purpose and objectives;
  • Producing incident management information for operational support and performance, customer engagement, stakeholder management, process adherence and governance;
  • Monitoring the effectiveness of incident management;
  • Coordinating internal quality reviews to identify gaps in process compliance;
  • Developing a continuous improvement plan for the incident management process;
  • Ensuring alignment of the incident management process with all other processes.

3.2. Technical support manager/team leader

The Technical Support Manager/Team Leader role can be covered by a number of functional roles, including but not limited to ITD Team Managers/Team Leads, Regional ICT Team Managers/Team Leads, TAFE Institute ICT Support Team Managers/Team Leads, Shared Service Centres Team Managers/Team Leads and TAFE Learning and Business system support teams Team Managers/Team Leads. A technical support team leader or manager has responsibility for the following activities:

  • Overall queue management;
  • Responsible for calls escalated to their team;
  • Responsible for allocating incidents within their team and ensuring they are dealt with in an appropriate order;
  • Responsible for ensuring that all calls are responded to within OLA;
  • Responsible for ensuring all calls are resolved within Service Level Objective (SLO);
  • Responsible for escalations where any calls are likely to breach SLO;
  • Management of the customer experience and the resulting customer satisfaction.

3.3. Service desk manager

The Service Desk Manager role can be covered by a number of functional roles, including but not limited to the leadership of the Service Desk, Service Delivery team, a Shared Service Centre, TAFE Institute Customer Services manager or Institute Help Desk team leader. Their responsibilities include:

  • Queue management of incidents reported to their Service Desk or Support Group to comply with OLAs and customer SLOs;
  • Management of the customer experience and the resulting customer satisfaction;
  • Managing the work of incident support staff including second and third level support;
  • Managing compliance to knowledge management components of incident management.

3.4. Technical support analyst

The Technical Support Analyst role can be covered by a number of functional roles, including but not limited to ITD Technical Support teams, Regional ICT teams, TAFE Institute ICT Support, Shared Service Centres and TAFE Learning and Business system support teams.

The technical support analyst has responsibility for:

  • Responding to all calls assigned to them within the service level target in their OLA;
  • Resolving all calls assigned to them within service level targets in the SLO;
  • Updating and reviewing KA as and when appropriate;
  • Creating a framed KA where one does not exist, updating a KA where appropriate, rating a KA and/or marking a KA as requiring review;
  • Updating the work info section of the incident record with details of any investigations undertaken and their results;
  • Updating the resolution details of the incident record when remediation and recovery are complete;
  • Adding value to the customer experience and resulting customer satisfaction.

3.5. Incident analyst

The Incident Analyst role can be covered by a number of functional roles, including but not limited to the EDConnect Service Desk, Regional ICT Staff, TAFE Institute Help Desks, Shared Service Centres and TAFE Learning and Business system support teams. Their collective responsibilities include:

  • Recording all relevant incidents and/or service request details, contact details, allocating categorisation, products from the Service Catalogue and business impact and urgency to establish a priority;
  • Providing first level investigations and diagnosis including searching for relevant KA and similar incidents;
  • Creating a framed KA where one does not exist, updating a KA where appropriate, rating a KA and/or marking a KA as requiring review;
  • Resolving those incidents or service requests where they are able;
  • Escalating incidents or service requests that cannot be resolved within agreed timeframes;
  • Communicating with users, keeping them informed of progress and impending changes;
  • Adding value to the customer experience and resulting customer satisfaction;
  • Closing all resolved incidents or service requests;
  • Conducting customer satisfaction call-backs or surveys.

3.6. Customer or end-user

The Customer or end-user has a responsibility to:

  • Initiate Incidents via the approved channels;
  • Supply accurate and complete information;
  • Clarify and validate incident information when required;
  • Confirm successful resolution of incidents.

3.7. RACI (responsible, accountable, consulted & informed) matrix

Please refer to the Incident Management RACI for further information.  

4.1. Level 1 process flow

Each Process flow illustrates the responsibilities for each of the Levels within the Incident Management process. Each of these steps is explained in section 4.2.

Incident Management Level 1 workflow

4.2. Incident management activity descriptions

The Incident management activity descriptions page lists the specific activities referenced in the Incident Management workflow diagram and provides some detail on how these activities are performed.

4.3. Level 2 process flow

Each of the steps in this procedure is explained in section 4.4.

Incident Management Level 2 and 3 workflows

4.4. Incident management activity descriptions

The Incident management activity descriptions page lists the specific activities referenced in the Incident Management workflow diagram and provides some detail on how these activities are performed.

Please refer to the Incident Management Process critical success factors and KPIs page for further information.  
