Problem Management

Process

The purpose of this document is to outline, at a high level, the process, together with the roles and responsibilities, needed to execute, manage and govern the DoE ITD Problem Management Process. It describes the overall positioning, breakdown and resources needed to complete the Problem Management Process.

The goal is to provide the level of standardisation and control that is critical for customer experience, regulatory and standards requirements, efficiency and effectiveness, and the monitoring and reporting of the process.

The intended audience is those stakeholders throughout DoE who have a role in this process, either directly or indirectly as dependent stakeholders, as well as those stakeholders who are interested in the process for execution and governance reasons.

This process document inherits the definitions contained in the ITD Glossary for Managing Services.

This document is created to provide a framework that underpins the execution and governance of this process. It is the definitive reference material for this process.

1.1. Process overview

This process document articulates why the DoE Problem Management Process exists and what practices are required for it to operate. Detailed procedures and work instructions describing how to apply this process are available on the ITD SMO Intranet page and in the Remedy knowledge base.

1.2. Process purpose

The Problem Management framework is applied when managing the lifecycle of problem records. A Problem is the underlying (root) cause of one or more incidents, or a potential cause identified proactively to prevent incidents from occurring.

A problem that has been analysed and its root cause identified is termed a ‘Known Error’ (KE). A problem that can be resolved through remedial action may lead to one or more ‘Change Requests’ (CRs). Where a permanent corrective action cannot be applied, a suitable temporary or permanent ‘Workaround’ may be defined and applied.

The Problem Management process describes how problems are detected in a service or configuration item, and covers the activities to:

  • Investigate the people, processes or technologies that cause incidents, or that could cause incidents (proactive Problem Management);
  • Provide Problem Review Board (PRB) oversight of all critical and high priority problems, as well as governance of the Problem Management process;
  • Determine the best solution for the Problem/Known Error;
  • Document workarounds, propose a solution and, if required, raise a CR through ITD SMO Change Management;
  • Resolve and close both Problems and Known Errors.

1.3. DoE problem management process goal

The goal of the Problem Management Process is to eliminate problems and defects from the environment, to prevent problems and the resulting incidents from occurring or recurring, and to minimise the impact of incidents that cannot be prevented.

1.4. Guiding principles

In designing this process, the following guiding principles were applied:

  • Purpose: to eliminate problems and defects from the environment, to prevent problems and the resulting incidents from occurring or recurring, and to minimise the impact of incidents that cannot be prevented.
  • Usage: all staff are to ensure problems are logged, classified, investigated, placed under known error control and resolved consistently and end to end.
  • Management technology: all problems and known errors will be logged and managed in the SMO technology, namely Remedy Problem[1].
  • Problem Ownership:
    • Ownership of Critical and High priority problems will be determined by the Problem Review Board.
    • The Problem Coordinator is determined by the Problem Owner and is accountable for the completion of the problem and known error lifecycle.
  • Root Cause Analysis: investigate and diagnose the problem to determine the true cause.
  • Known Error Control: record and clearly state the cause and/or workaround so that it can be used by all supporting teams.

1.5. Process objectives

The objectives of the Problem Management Process are to:

  1. Prevent problems and their resulting incidents from occurring within the environment
  2. Eliminate recurring incidents
  3. Minimise the impact of incidents and problems that cannot be prevented.

1.6. Scope

The scope of the Problem Management process includes:

  • Root cause analysis using the Kepner and Fourie methodology
  • Solution (and workaround) definition and selection
  • Submission of Requests for Change (RFCs)
  • Understanding and managing business risk

1.7. Problem escalated to DoE suppliers

Problems directed to DoE suppliers must be managed in accordance with the DoE Problem Management Process. Where support is provided by a DoE supplier for services underpinning the DoE production environment, the problem records must be:

  • Updated and resolved by the supplier – if access is available to Remedy Incident;
  • Updated and resolved by an ITD or DoE support group – MUST have access to Remedy Problem.

1.8. Problem triggers

A Problem investigation is triggered in one of two distinct ways: reactively or proactively. Please refer to the process triggers table for descriptions and examples.

 

2.1. Single process

There will be a single Problem Management process, based on service management best practice and fit for purpose across the Department.

The reasons for this standard are to:

  • Improve customer experience by establishing a predictable, repeatable and consistent way of managing Problem records and their outcomes across DoE ensuring a consistent Problem Management framework;
  • Ensure all problems are reported and recorded, tracked and managed;
  • Enable quality assurance and the continuous improvement of a single process against a given baseline;
  • Assist in providing quality data from a single source for reporting, review and Service Level governance.

2.2. Problem logging

All problems, known errors, progress updates and resolution information shall be recorded in the DoE Service Management Tool and be linked to any relevant incident, configuration item (CI), known error/knowledge article and change management records.

The reasons for this standard are to:

  • Ensure all problems are recorded, documented sufficiently, monitored and managed to completion;
  • Ensure linkages between problems, incidents, Known Errors, knowledge articles and change requests;
  • Provide opportunity for continuous improvement, service enhancement and improved customer satisfaction through correct taxonomy, field and status compliance, and reliable source data for decision makers and analysts;
  • Ensure that audit trails are maintained.

2.3. Critical incident management

All Critical Incidents will have a related Problem record, which shall be created within the first 5 working days.
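As an illustration only, the 5 working day rule can be sketched as a due-date calculation. This is a minimal sketch that assumes the count starts on the day after the Critical Incident date, treats Saturdays and Sundays as non-working days, and relies on the caller to supply public holidays; the function name `problem_record_due` is hypothetical and is not part of Remedy.

```python
from datetime import date, timedelta


def problem_record_due(incident_date: date, working_days: int = 5,
                       holidays: frozenset = frozenset()) -> date:
    """Return the date by which the related Problem record should be created.

    Assumption: counting starts on the day after the incident date, and
    Saturdays, Sundays and any supplied holidays are not working days.
    """
    current = incident_date
    remaining = working_days
    while remaining > 0:
        current += timedelta(days=1)
        # weekday(): Monday is 0 ... Sunday is 6, so < 5 means Monday to Friday.
        if current.weekday() < 5 and current not in holidays:
            remaining -= 1
    return current
```

For a Critical Incident on Friday 1 March 2024 this returns Friday 8 March 2024; passing Monday 4 March 2024 as a holiday moves the date to Monday 11 March 2024.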

The reasons for this standard are to:

  • Decrease the likelihood of a critical incident repeating;
  • Provide a single source of data on critical impacts to DoE, the steps taken to mitigate those impacts and the workarounds applied, with consistent taxonomy for reporting and informed decision making;
  • Allow for increased visibility of ITD operational risks to the ITD leadership team.

2.4. Unique problem identifier

Each problem record will be allocated a unique identifier. The unique identifier will be associated with related incident records, Service request records, known error records, knowledge article records and change records.

The reasons for this standard are to:

  • Allow easy identification and management of problems;
  • Support efforts to ensure that problems are not duplicated;
  • Allow easy association between Problems and associated Incidents, KEs, KAs, CRs and workarounds.
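A simplified sketch of a problem record that carries a unique identifier and links to related records follows. It assumes a hypothetical `PROB` prefix and an in-memory counter; in practice Remedy allocates the identifier, and the class, field and link names here are illustrative only. The hypothetical `find_existing` helper shows how linking supports the check that a problem has not already been recorded.

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical in-memory sequence; Remedy allocates real identifiers.
_sequence = count(1)

# Illustrative link types matching the record types named in this standard.
LINK_TYPES = ("incident", "service_request", "known_error",
              "knowledge_article", "change")


@dataclass
class ProblemRecord:
    summary: str
    # Each new record receives the next identifier in the sequence.
    problem_id: str = field(default_factory=lambda: f"PROB{next(_sequence):09d}")
    # One set of related record identifiers per link type.
    links: dict = field(default_factory=lambda: {kind: set() for kind in LINK_TYPES})

    def link(self, kind: str, record_id: str) -> None:
        """Associate a related record (e.g. an incident) with this problem."""
        if kind not in self.links:
            raise ValueError(f"unsupported link type: {kind}")
        self.links[kind].add(record_id)


def find_existing(problems: list, incident_id: str) -> list:
    """Return identifiers of problems already linked to the given incident."""
    return [p.problem_id for p in problems if incident_id in p.links["incident"]]
```

A non-empty result from `find_existing` would suggest the incident is already covered by an existing problem, so a new record may not be needed.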

2.5. Problem prioritisation

All problems will be prioritised by assessing the business impact and business urgency of the problem. The priority is not automatically inherited from the classification of the related incident(s).

The reasons for this standard are to:

  • Ensure customer expectations are understood, agreed and delivered;
  • Provide opportunity for continuous improvement, service enhancement and improved customer satisfaction.

2.6. Problem assignment/reassignment

Problem ownership is to align with the Service Thinking model, under which the relevant Service Owner/Element Owner is assigned ownership of the problem. Any dispute about which team should ‘own’ the Problem record, co-ordinate the related tasks and permanent corrective actions, and confirm that customer service has been permanently restored should be referred to the Problem Review Board for a final decision.

The reasons for this standard are to:

  • Ensure that the necessary knowledge and skills are correctly allocated for a timely resolution;
  • Ensure the effective use of ITD resources to perform a comprehensive root cause investigation and the subsequent corrective actions;
  • Clarify the Problem Owner's responsibility to reassign the Problem record to the correct resolving team.

2.7. Problem ownership

The assigned Problem Owner owns the Problem record and is accountable for managing the Problem record from its inception or receipt in a support queue, through to resolution.  The Problem Owner manages the Problem record throughout its lifecycle and oversees all aspects of investigations, record maintenance with regards to taxonomy, Problem statements, technical cause, true cause statement and status compliance, as well as all corrective actions.

The reasons for this standard are to:

  • Ensure a customer focus is maintained throughout the life of the problem;
  • Effectively manage the problem management process on behalf of the customer;
  • Provide clarity of the correct resolving group;
  • Provide clarity of the role and responsibilities of Problem Owner;
  • Ensure that escalation channels are clearly understood and utilised when necessary.
  • For problems recorded against a service where no service owner exists, the following guiding principles will be used:
    • Where a logical owner cannot be identified, ownership should go to whoever can best help facilitate resolution of the problem.
    • The problem owner should have knowledge of and experience in the product/technology to take on ownership.

2.8. Support teams

All members of nominated support groups within the Service Management tool are responsible for contributing to the resolution and management of problems assigned to their group.

The reasons for this standard are to:

  • Ensure effective problem management by requiring all support staff to take responsibility for problems assigned to their group;
  • Promote group learning through collaborative problem management;
  • Ensure a consistent customer experience.

2.9. Investigation and diagnosis

All problem investigations will employ the Kepner and Fourie root cause analysis techniques and methodology.

The reasons for this standard are to:

  • Ensure a standard method to identify true root cause;
  • Eliminate trial-and-error practices.

2.10. Known errors and workarounds

Problem Management is responsible for ensuring that known errors are raised as soon as useful knowledge is available, even before a permanent resolution is found. Known Errors can be documented from a number of sources, e.g. unresolved application defects prior to the go-live of an additional or changed service, or known infrastructure instabilities arising from an implemented change or identified during BAU diagnosis.

The reasons for this standard are to:

  • Decrease restoration times for DoE customers;
  • Enable the reuse of repeatable workarounds;
  • Avoid unnecessary or unauthorised workarounds.

2.11. Problem closure

Problems will only be closed when either a change has been implemented successfully or, where a full resolution is not available, a permanent documented method (a workaround recorded against a KE) for reducing or eliminating the impact of the problem is in place. An unresolved problem (root cause not identified) may only be closed once the operational risk of doing so has been accepted: by the Problem Review Board for Critical problems, and by the Service Owner for all other priorities.
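The closure rules can be expressed as a simple check. This is a simplified sketch, not a Remedy rule: the function name `can_close`, its boolean inputs and the approver strings are assumptions made for illustration, and "unresolved" is taken to mean that the root cause has not been identified.

```python
from typing import Optional


def can_close(*, priority: str, root_cause_identified: bool,
              change_implemented: bool = False,
              documented_workaround: bool = False,
              risk_accepted_by: Optional[str] = None) -> bool:
    """Return True if a problem meets the closure standard (simplified)."""
    if not root_cause_identified:
        # Unresolved problems need the operational risk accepted:
        # by the PRB for Critical problems, by the Service Owner otherwise.
        approver = "Problem Review Board" if priority == "Critical" else "Service Owner"
        return risk_accepted_by == approver
    # Otherwise either a successful change or a permanent documented
    # workaround (recorded against a Known Error) is required.
    return change_implemented or documented_workaround
```

Keeping the inputs keyword-only makes each call read like the closure checklist itself, which helps when the same check is reviewed at a governance meeting.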

The reasons for this standard are to:

  • Effectively manage problems to meet customer expectations;
  • Reinforce service management commitment to problem resolution;
  • Ensure a consistent customer experience;
  • Manage risk.

2.12. Escalation management

An escalation management process will be invoked where a problem is being governed/monitored by the Problem Review Board and any of the following criteria have been met:

  • No update has been made to the Remedy Problem record for two reporting cycles;
  • The root cause is known, but solution implementation has missed 2 estimated implementation dates;
  • The Problem Review Board recommends escalation;
  • The Problem closure report has not been completed within 2 weeks and no satisfactory reason has been given;
  • The problem has been updated, but the updates do not demonstrate progress.
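As a sketch only, the escalation criteria can be evaluated as a checklist. The class and field names are hypothetical, "2 weeks" is assumed to mean 14 calendar days, and the thresholds are read as "two or more"; the length of a reporting cycle is set by the governance groups, not by this code.

```python
from dataclasses import dataclass


@dataclass
class EscalationInputs:
    governed_by_prb: bool
    reporting_cycles_without_update: int = 0
    root_cause_known: bool = False
    missed_implementation_dates: int = 0
    prb_recommends_escalation: bool = False
    days_awaiting_closure_report: int = 0
    satisfactory_reason_given: bool = False
    updated_without_progress: bool = False


def escalation_reasons(p: EscalationInputs) -> list:
    """List the escalation criteria that are met; an empty list means no escalation."""
    if not p.governed_by_prb:
        return []  # the escalation process applies only to PRB-governed problems
    reasons = []
    if p.reporting_cycles_without_update >= 2:
        reasons.append("no update for two reporting cycles")
    if p.root_cause_known and p.missed_implementation_dates >= 2:
        reasons.append("missed two estimated implementation dates")
    if p.prb_recommends_escalation:
        reasons.append("Problem Review Board recommends escalation")
    if p.days_awaiting_closure_report > 14 and not p.satisfactory_reason_given:
        reasons.append("closure report not completed within two weeks")
    if p.updated_without_progress:
        reasons.append("updates do not demonstrate progress")
    return reasons
```

Returning the list of reasons, rather than a single yes/no, gives the governance group the justification it needs to record with the escalation.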

The reasons for this standard are to:

  • Ensure appropriate action is performed when obstacles are faced.
  • Maintain ongoing process adherence and keep all process governance groups informed of any constraints during the problem management lifecycle.

The purpose of the governance group is to ensure that standards and strategies are implemented and that the agreed processes are correctly followed. Governance includes endorsing a DoE fit-for-purpose framework, defining roles and responsibilities, measuring, reporting and taking actions to resolve any identified issues or conflicts.

3.1. Governance group members

Members of the Governance Group are:

Problem Review Board:

  • Keeping informed of Critical or High severity problems and the risk to the Department;
  • Re-evaluating, amending and agreeing to the prioritisation (Critical and High) and resource allocation of Problem and/or Known Error records and their related tasks;
  • Approval of solutions which could potentially impact commercial relationships and/or DoE reputation, as well as resource utilisation and service efficiency;
  • Approval to resolve and/or close (when required) Problem and Known Error records;
  • Assessing and mitigating residual risk of closed problems.

Operation Problem Review

  • Problem Process Governance - Tracking actions, root causes and quality of Problem records
  • Support problem owners
  • Assess critical Business impact of Problems
  • Identify critical problems to escalate to PRB for allocation of resources, budget and attention.

Problem Management Process Owner

  • End to end process ownership, ensuring the process is initially designed and implemented to match the Department's needs and evolves as business requirements change
  • Ensuring the process matures after implementation and remains fit for purpose via the Continual Service Improvement Process
  • Liaise with Problem Coordinators to provide staff with Problem Management Process training and/or familiarisation

3.2. Governance group model

See a diagram of the Problem Management Governance group model

It is the responsibility of the following roles to ensure the Problem Management standard is applied as and where required.

4.1. Problem management process owner

  • End to end process ownership, ensuring the process is initially designed and implemented to match the Department's needs and evolves as business requirements change
  • Identifying and addressing areas of non-conformance to the process
  • Ensuring the process matures after implementation and remains fit for purpose via the Continual Service Improvement Process
  • Liaise with Problem Owners and Problem Coordinators to provide staff with Problem Management Process training and/or familiarisation
  • Review and track outstanding and unassigned Problem Investigation tickets to ensure raised problems are being managed within agreed targets
  • Collecting and assessing process improvement ideas from staff and users
  • Ensuring all associated Problem Management documentation (Standards, Procedure, work instructions, Remedy toolset) is maintained and circulated to all appropriate staff
  • The Process Owner is responsible for liaising with other Process Owners to maintain and improve process relationships.

4.2. Problem coordinator

  • Review of incidents to identify problems and raise problem investigations
  • Attendance at Problem Governance group meetings when required, i.e. the Problem Review Board and/or the Problem Operation review
  • Determine appropriate Specialists or vendors that are required in the problem investigation and diagnosis
  • Search existing Problem Investigations and Known Errors to ensure the new problem has not been previously recorded
  • Conduct regular problem analysis meetings and involve all relevant stakeholders
  • Validate and approve proposed permanent Solutions and/or Workarounds
  • Monitor and manage their Problem Queue within the Service Management tool
  • Manage the workarounds and/or solutions by utilising the ITD Change Management process
  • Notify stakeholders of planned solutions and set expectations for changes
  • Identify any potential improvements to the problem management process and notify the process owner
  • Champion the utilisation of the problem management process
  • Review and approval of the Problem Resolution
  • Follow the closure procedure prior to closing a problem record

4.3. Problem owner

  • Problem Owner is to ensure a Problem is managed from end-to-end of the Problem lifecycle according to the defined process.
  • Attendance at Problem Governance group meetings when required, i.e. the Problem Review Board and/or the Problem Operation review
  • Approval of Problem Closure Report for all Critical and High priority problems.
  • Notify stakeholders of planned solutions and set expectations for changes
  • Ensure all Critical priority problems are periodically updated.

4.4. Problem review board

  • Keeping informed of Critical or High severity problems and the risk to the Department;
  • Re-evaluating, amending and agreeing to the prioritisation (Critical and High) and resource allocation of Problem and/or Known Error records and their related tasks;
  • Approval of solutions which could potentially impact commercial relationships and/or DoE reputation, as well as resource utilisation and service efficiency;
  • Approval to resolve and/or close (when required) Problem and Known Error records;
  • Assessing and mitigating residual risk of closed problems.

4.5. Service desk

  • Identify potential problems and notify Problem Coordinator
  • Assess the potential problem to determine whether it is a problem
  • Participate in the problem analysis meeting when required
  • Identify any potential changes to the problem management process and notify the process owner.

4.6. Service delivery manager

  • Work with the Problem Coordinator, Specialists, Vendors and Service Desk to ensure customer expectations are managed
  • Participate in the problem analysis meeting to identify a proposed solution
  • Notify the Problem Coordinator of any issues impacting/potentially impacting resolution of a Known Error
  • Be a liaison point for Problem Owners and Problem Coordinators on business impact.
  • Determine reporting requirements and identify any changes to the reports
  • Identify any potential changes to the problem management process and notify the process owner.

4.7. Problem specialist

  • Investigate and diagnose the root cause of all problems escalated to the team, utilising their subject matter expertise
  • Ensure the problem has not previously been recorded as a known error
  • Participate in problem diagnosis meetings
  • Liaise with vendors if required to perform investigation or analysis work
  • Work with other specialists and/or vendors to assess the problem and identify the known error and/or solution
  • Create a known error record documenting the root cause
  • Develop and document a workaround, where a workaround exists
  • Notify the Problem Coordinator of any existing or potential issues impacting investigation or the eventual resolution of a known error
  • Develop a solution implementation plan
  • Adhere to the Change Management process when implementing workarounds or solutions
  • Identify any potential changes to the problem management process and notify the process owner.

4.8. Vendors/service partners

  • Notify relevant DET contact when a problem or potential problem is identified
  • Assess problems that have been assigned to them, including details of the problem record
  • Assist the assigned Specialist to analyse a problem or identify the root cause.
  • Develop a workaround, informing the Problem Coordinator of work performed by the vendor organisation
  • Inform the Problem Coordinator of any work performed by their organisation in terms of the known error
  • Identify a solution through corrective action or accepting the known error
  • Develop a solution implementation plan if warranted
  • Implement the authorised solution, verify the corrective action has worked, provide a back-out procedure if the corrective action fails and investigate and resolve the failure of the corrective action
  • Identify any potential changes to the problem management process and notify the process owner.

Download the print version of the Problem Management Process (PDF).

Procedure

The purpose of this document is to describe the procedures required to execute, manage and govern the DoE ITD Problem Management Process.

The intended audience is those stakeholders throughout DoE who have a role in this process, either directly or indirectly as dependent stakeholders, as well as those stakeholders who are interested in the process for execution and governance reasons.

This procedure document inherits the definitions contained in the SMO Glossary for Managing Services.

This document is created to provide a framework that underpins the execution and governance of this process. It is part of the definitive reference material for this process.

1.1. Procedure overview

This procedure document describes the means by which the DoE Problem Management Process will operate: where, by whom and when activities will be managed through the process lifecycle. Detailed work instructions describing how to execute this process are available on the ITD SMO portal and in the Remedy knowledge base.

1.2. Procedures overview

The Problem Management process is a significant component of the ITIL Service Management framework, within the Service Operation phase.

The Problem Management process supports ITD's adoption of, and standardisation on, the industry best-practice ITIL process framework. This framework provides the standard foundations needed to manage the delivery of services from a Service Provider to its customers.

The process consists of the following 5 main procedures, each of which is dependent on other external processes (through defined interfaces) to progress requests through to closure.

Problem management process overview workflow diagram

1.2.1. Identification and Classification

The Problem Management Process may be triggered from various sources. When a significant or recurring incident, or an anticipated problem, exists, the Problem Coordinator responsible for the particular service assesses the situation and determines whether a Problem exists, or performs further investigation. This involves classifying the affected service/product and assessing its priority.

1.2.2. Review

This procedure is used to control and reduce the duplication of problem investigation and to determine the resources required to investigate.

1.2.3. Investigation and Diagnosis

The investigation and diagnosis procedure involves the use of the Kepner and Fourie root cause analysis methodology (CauseWise). The aim is to standardise the way each team approaches problem investigation, which in turn increases the speed and accuracy of root cause identification.

1.2.4. Resolution and Recovery

This procedure is used to identify a preferred solution, which is then applied using the ITD Change Management process.

1.2.5. Closure

When any change has been completed (and successfully reviewed), and the resolution has been applied, the Problem Record should be formally closed – as should any related Incident Records that are still open.

1.3. Key process interfaces

1.3.1. Incident Management

Inputs

Information needed from the Incident Management process by Problem Management includes:

  • Incident Data
  • Critical Incidents
  • Trend and Statistical data
  • Problems Identified

Outputs

Information provided to the Incident Management process by Problem Management includes:

  • Resolution information
  • Workaround information
  • Known Error Database articles
  • Reports on the status of Problems and Known Errors in progress

1.3.2. Change Management

Inputs

Information needed from the Change Management process by Problem Management includes:

  • Results of Requests for Change (RFCs) submitted for approval
  • RFC Rejections

Outputs

Information provided to the Change Management process by Problem Management includes:

  • RFCs
  • Known Error Database data to assist with change decisions

1.3.3. Configuration Management

Inputs

Information needed from the Configuration Management process by Problem Management includes:

  • CI data
  • Relationships between CIs

Outputs

Information provided to the Configuration Management process by Problem Management includes:

  • Trend and statistical data on CIs (CIs that are Known Errors)
  • Known Error Database articles that relate to Configuration Management

For a graphical representation, refer to the interfaces between Problem Management and other ITD Service Management processes diagram.

 

 

2.1. Problem prioritisation

A problem’s priority is defined via the same prioritisation methods used by ITD Incident Management; i.e. based on two key factors – Impact and Urgency. The priority is assigned to determine the order in which the support organisation will respond to a reported Problem (highest priority calls are responded to first).

Note: A problem that affects the SALM School service and impacts any of the business functions Student Wellbeing, Financial Reporting or Student Attendance requires a priority of High at a minimum.

Impact is defined as the effect of the Problem on the productivity of the service client’s function.  The clients of a service estimate business impact in relation to non-delivery of the affected business process.

For further information please refer to the Problem Management impact and urgency matrix.
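The SALM School note can be read as a floor applied after the normal Impact and Urgency assessment. In the sketch below, the matrix in `base_priority` is a placeholder invented for illustration (levels 1 to 4, with 1 the highest, combined by adding them); it is not the DoE Problem Management impact and urgency matrix, which should be used in practice. Only the SALM School minimum-priority rule comes from this document, and the function names are hypothetical.

```python
from typing import Iterable

PRIORITY_ORDER = ["Critical", "High", "Medium", "Low"]  # highest first
SALM_CRITICAL_FUNCTIONS = {"Student Wellbeing", "Financial Reporting", "Student Attendance"}


def base_priority(impact: int, urgency: int) -> str:
    """Placeholder matrix for illustration only, NOT the DoE matrix."""
    if not (1 <= impact <= 4 and 1 <= urgency <= 4):
        raise ValueError("impact and urgency must be between 1 (highest) and 4 (lowest)")
    score = impact + urgency  # ranges from 2 (most severe) to 8 (least severe)
    if score == 2:
        return "Critical"
    if score <= 4:
        return "High"
    if score <= 6:
        return "Medium"
    return "Low"


def problem_priority(impact: int, urgency: int, service: str,
                     functions: Iterable[str]) -> str:
    priority = base_priority(impact, urgency)
    # SALM School problems touching the listed functions are at least High.
    if service == "SALM School" and SALM_CRITICAL_FUNCTIONS & set(functions):
        if PRIORITY_ORDER.index(priority) > PRIORITY_ORDER.index("High"):
            priority = "High"
    return priority
```

The floor only ever raises a priority, so a SALM School problem already assessed as Critical stays Critical.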

2.2. Problem investigation lifecycle

The following table and lifecycle flowchart describe the life cycle of a Problem record in the Remedy Problem Management system.

For further information please refer to the Problem investigation lifecycle page.

2.2.1. Known error record lifecycle

For further information, please refer to the Known Error lifecycle page.

2.3. Identification and classification

The Problem Management triggers are outlined below. The Problem Management Process may be triggered from various sources, for example after the restoration of a Critical/Major Incident, via a Post Incident Report. When a significant or recurring incident, or an anticipated problem, exists, the Problem Coordinator responsible for the particular service assesses the situation and determines whether a Problem exists, or performs further investigation.

A potential problem is identified by analysing:

  • High Impact Incident: isolated incidents which adversely affect one or more critical business services, typically managed as a Major Incident
  • Recurring Incidents: incidents that recur or have affected multiple people over time, identified through analysis of previous Incidents
  • Non-Routine incidents: potential problems identified pre-emptively before being reported as incidents, such as through service testing prior to releasing a new or changed service
  • Other: Any identified or potential problems that do not meet other categories or require further assessment before categorisation.

The Problem Coordinator creates the Problem record with as much detail as is currently known, including the affected product and a summary problem statement.

Any actions the Problem Coordinator takes to improve the understanding of the problem should be entered as a work detail entry to ensure all relevant information is retained with the ticket. The Problem Coordinator may also create links to other ticket types, such as Incidents or Change records.

The Problem Coordinator must select a target date for resolution of the Problem ticket. This is the anticipated date by which the ticket will be considered complete, with the problem sufficiently addressed, whether or not the problem or its root cause has been removed.

The Problem Coordinator then assigns the Problem ticket to the Specialist (Assignee) Group for further investigation. Unless previously agreed, the Problem Coordinator should assign the ticket to a group rather than an individual.

For further information, please refer to the Problem Management process triggers page.

2.4. Review

When assigned to a Specialist Group, the Problem ticket moves to the Review workflow stage with a status of “Assigned”. The nominated person/people within that team review the Problem Ticket and assign a Specialist as the Assignee to that Problem. The ticket then moves to a status of “Under Investigation”.

Any actions the Specialist takes to improve the understanding of the problem should be entered as a work detail entry to ensure all relevant information is retained with the ticket. The Problem Coordinator may also create links to other ticket types, such as Incidents or Change records.

During the investigation, a number of common activities may be performed. These include:

  • Relate Incidents to the Problem as the problem is better understood and Incidents are assessed and analysed for commonalities and relationships
  • Assign the problem ticket to an initial Specialist or reassign to a more appropriate Specialist.
  • Generate Tasks to assign work to individuals to assist in the investigation
  • Relate CI to the Problem as affected services are identified and confirmed.

Alternatively, under certain circumstances, the person/people nominated to review the queue may move the ticket directly to the Resolution stage of the workflow. In doing so, that person must still specify the Specialist and the reason the Problem ticket has been closed. Additional information should be recorded as a work detail entry.
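The stage transitions described in this procedure can be summarised as a small state model. This is a sketch under stated assumptions: the stage names come from the procedures overview, but the `ALLOWED` transition table and the `advance` function are hypothetical, cover only the moves described in this document, and do not reflect every transition the Remedy workflow may permit (for example, reassignment or the “Pending” status).

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    IDENTIFICATION = "Identification and Classification"
    REVIEW = "Review"
    INVESTIGATION = "Investigation and Diagnosis"
    RESOLUTION = "Resolution and Recovery"
    CLOSURE = "Closure"


# Hypothetical transition table limited to the moves described in this procedure.
ALLOWED = {
    Stage.IDENTIFICATION: {Stage.REVIEW},
    Stage.REVIEW: {Stage.INVESTIGATION, Stage.RESOLUTION},
    Stage.INVESTIGATION: {Stage.RESOLUTION},
    Stage.RESOLUTION: {Stage.CLOSURE},
    Stage.CLOSURE: set(),
}


def advance(current: Stage, target: Stage, specialist: Optional[str] = None,
            reason: Optional[str] = None) -> Stage:
    """Return the new stage, or raise ValueError if the move is not permitted."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    # Skipping investigation still requires a Specialist and a reason.
    if current is Stage.REVIEW and target is Stage.RESOLUTION and not (specialist and reason):
        raise ValueError("moving directly from Review to Resolution requires a Specialist and a reason")
    return target
```

Encoding the shortcut as an explicit rule keeps the exception visible, rather than allowing any stage to jump straight to Resolution.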

2.5. Investigation and diagnosis

The Specialist then works independently (for isolated issues) or with other Specialists to investigate the root cause, define a Known Error and Workaround, and develop a Solution. If the Solution requires a system change, the Specialist initiates the Change process.

For complex Problem investigations, the Specialist may establish a project or group to formally investigate the problem with other Specialists from multiple support groups. The assigned Specialist is responsible for managing the investigation project as the "lead" Specialist (and remains the Assignee on the Problem ticket). During the investigation, a more appropriate "lead" Specialist from the same or a different support group may be identified and agreed, and the ticket updated to reflect this.

If a vendor's support is required for investigation or resolution, the Specialist contacts the Vendor to log a call and records the Vendor's reference as the Ticket Number.

The Problem Coordinator is responsible for:

  • Investigation of the problem (root cause analysis)
  • Engagement of ICT support teams
  • End to end management and resolution of the Problem Investigation
  • Creating a knowledge article with the workaround (when using KCS)

ICT support groups are responsible for:

  • Undertake detailed investigation of the problem
  • Register the cause as a known error
  • Raise changes to resolve problems
  • Update Problem Investigations with details of findings

2.5.1. Known Error Control

When a Known Error is identified, the Specialist is required to raise a Known Error record that clearly states the defect, symptoms and workarounds (a Problem may have more than one related Known Error).

During the development of the Known Error Solution, the Specialist is required to document and prepare the solution and have it reviewed and endorsed by the Problem Coordinator. If a proposed solution requires a modification to a configuration item, the ITD Change Management process must be adhered to.
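To illustrate the record relationship described above, the following is a minimal Python sketch of a Problem carrying one or more Known Error records, each stating the defect, symptoms and workarounds. The class names, field names and the example values are hypothetical and are not taken from the Remedy data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class KnownError:
    """Illustrative Known Error record (hypothetical fields)."""
    defect: str                                            # the identified root-cause defect
    symptoms: List[str]                                    # how the defect presents to users and support
    workarounds: List[str] = field(default_factory=list)  # may be empty if no workaround exists


@dataclass
class Problem:
    """Illustrative Problem record; one Problem may have several Known Errors."""
    problem_id: str
    known_errors: List[KnownError] = field(default_factory=list)

    def raise_known_error(self, ke: KnownError) -> None:
        # The Specialist records each identified Known Error against the Problem.
        self.known_errors.append(ke)

    def has_workaround(self) -> bool:
        # True if any related Known Error documents at least one workaround.
        return any(ke.workarounds for ke in self.known_errors)


if __name__ == "__main__":
    p = Problem("PBI-0001")  # hypothetical ticket number
    p.raise_known_error(KnownError(
        defect="Print spooler memory leak",
        symptoms=["Printing stalls after several hours"],
        workarounds=["Restart the spooler service"],
    ))
    print(p.has_workaround())
```

Keeping the Known Errors as a list on the Problem reflects the rule that a single Problem can have more than one Known Error.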

2.6. Resolution and recovery

When the preferred solution has been identified and agreed, the Specialist moves the Problem ticket through the workflow to Resolution and Recovery.

The Problem is resolved as one of the following:

  • A defined and approved Solution that prevents the Incident from recurring.
  • An Enhancement Request, when a modification to a system is required to resolve the root cause. The Specialist creates a Service Request record to capture the required enhancement details.
  • Unresolvable, where a solution cannot be implemented due to technical or financial constraints.

Note: these are the default ‘reasons’ in the tool and may be reassessed.

Any actions the Specialist(s) and/or Vendor take to improve the understanding of the problem should be entered as a work detail entry to ensure all relevant information is retained with the ticket. The Problem Coordinator may also create links to other ticket types, such as Incident or Change records.

The Specialist ensures that a workaround and solution are recorded when they exist. Support Analysts use these details to provide ongoing support to customers until the final resolution is implemented.

If the root cause or a practical resolution cannot be identified, the Specialist may move the Problem Investigation to a Status of "Pending". The Specialist then monitors the Problem Investigation and periodically reviews it for a root cause or resolution that may become available over time.

The Specialist then notifies the Problem Coordinator of the updated status of the Problem ticket.

For all Critical and High priority problems, the Problem Review Board has oversight of proposed solutions before they proceed.

2.7. Closed

When the Problem record has been completed and verified, the workflow moves to Closed. The Problem Coordinator reviews the Problem record and verifies its status and any associated records (Known Error, Change, etc.).

The Problem Coordinator may reject the proposed solution, or the associated Change Request may be rejected by the CAB. When the Problem Investigation is not accepted, the Problem Coordinator may move the workflow back to Assigned and assign it to the same or an alternative Specialist group to continue the investigation.

For all Critical priority problems, the Problem Review Board has oversight of problem closure, and the Problem Coordinator must complete the Problem Closure Report.

2.8. Cancelled

A Problem Investigation may also be cancelled at any stage, completing the workflow without implementing a solution.

The reasons for cancelling a Problem Investigation are:

  • Duplicate Investigation, when another Problem Investigation is underway (or Pending) for the same Incident(s).
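The stage transitions described in sections 2.5 to 2.8 can be sketched as a small state machine. This is an interpretation of the prose only, not the Remedy workflow configuration: the stage names follow the section headings, and the Pending to Investigation move is an assumption based on the periodic review described in section 2.6.

```python
from enum import Enum


class Stage(Enum):
    ASSIGNED = "Assigned"
    INVESTIGATION = "Investigation and diagnosis"
    PENDING = "Pending"
    RESOLUTION = "Resolution and recovery"
    CLOSED = "Closed"
    CANCELLED = "Cancelled"


# Permitted moves as read from the text (assumed, not the tool's configuration).
ALLOWED = {
    Stage.ASSIGNED: {Stage.INVESTIGATION, Stage.RESOLUTION},  # queue review may go straight to Resolution
    Stage.INVESTIGATION: {Stage.RESOLUTION, Stage.PENDING},   # Pending when no root cause or resolution yet
    Stage.PENDING: {Stage.INVESTIGATION},                     # assumed: periodic review resumes investigation
    Stage.RESOLUTION: {Stage.CLOSED, Stage.ASSIGNED},         # Assigned when the solution is not accepted
    Stage.CLOSED: set(),
    Stage.CANCELLED: set(),
}

TERMINAL = {Stage.CLOSED, Stage.CANCELLED}


def move(current: Stage, target: Stage) -> Stage:
    """Return the target stage if the move is permitted, otherwise raise ValueError."""
    if current in TERMINAL:
        raise ValueError(f"{current.value} is a terminal stage")
    # A Problem Investigation may be cancelled at any (non-terminal) stage.
    if target is Stage.CANCELLED or target in ALLOWED[current]:
        return target
    raise ValueError(f"Cannot move from {current.value} to {target.value}")
```

Treating Closed and Cancelled as terminal reflects that both complete the workflow; a rejected solution is sent back to Assigned before closure rather than reopened afterwards.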


The following section provides an overview of the roles involved in the Problem Management process. The roles are grouped into:

  • day-to-day operational management of the process
  • end-to-end governance oversight across the problem lifecycle

3.1. Problem management process owner

The Process Owner is accountable for:

  • End-to-end process ownership, ensuring the process is initially designed and implemented to match the Department's needs and evolves as business requirements change
  • Ensuring the problem management process health is maintained and matured by reporting, monitoring, analysing and governing the process against metrics.
  • Ensuring the process matures after implementation and remains fit for purpose via the Continual Service Improvement Process
  • Liaise with Problem Owner and Coordinators to provide staff with Problem Management Process training and/or familiarisation
  • Review and track outstanding and unassigned Problem Investigation tickets to ensure raised problems are being managed within agreed targets
  • Collecting and assessing process improvement ideas from staff and users
  • Ensuring all associated Problem Management documentation (Standards, Procedure, work instructions, Remedy toolset) is maintained and circulated to all appropriate staff
  • Liaising with other Process Owners to maintain and improve process relationships.
  • Chair the Community of Practice, including the Problem operation review meetings.

3.2. Problem coordinator

The Problem Coordinator is accountable for:

  • Review of incidents to identify problems and raise problem investigations
  • Attending Problem Governance group meetings when required, i.e. the Problem Review Board and/or the Problem Operation Review.
  • Determine appropriate Specialists or vendors that are required in the problem investigation and diagnosis
  • Search existing Problem Investigations and Known Errors to ensure the new problem has not been previously recorded
  • Conduct regular problem analysis meetings and involve all relevant stakeholders
  • Validate and approve proposed permanent Solutions and/or Workarounds
  • Ensure updates are made throughout the problem management lifecycle, adhering to the update frequency if a problem is being tracked on the IT Executive dashboard
  • Monitor and manage their Problem Queue within the Service Management tool
  • Manage the workarounds and/or solutions by utilising the ITD Change Management process
  • Notify stakeholders of planned solutions and set expectations for changes
  • Identify any potential improvements to the problem management process and notify the process owner
  • Champion the utilisation of the problem management process
  • Review and approval of the Problem Resolution
  • Follow the closure procedure prior to closing a problem record

3.3. Problem owner

The Problem Owner is accountable for:

  • Ensuring a Problem is managed end to end across the Problem lifecycle according to the defined process.
  • Attending Problem Governance group meetings when required, i.e. the Problem Review Board and/or the Problem Operation Review.
  • Ensure updates are made throughout the problem management lifecycle, adhering to the update frequency if a problem is being tracked on the IT Executive dashboard
  • Approval of Problem Closure Report for all Critical and High priority problems.
  • Notify stakeholders of planned solutions and set expectations for changes
  • Ensure all Critical priority problems are periodically updated.

3.4. Specialist

The Specialist is accountable for:

  • Investigate and diagnose the root cause of all problems escalated to the team, utilising their subject matter expertise
  • Ensure the problem has not previously been recorded as a known error
  • Participate in problem diagnosis meetings
  • Liaise with vendors if required to perform investigation or analysis work
  • Work with other specialists and/or vendors to assess the problem and identify the known error and/or solution
  • Create a known error record documenting the root cause
  • Develop and document a workaround, where a workaround exists
  • Notify the Problem Coordinator of any existing or potential issues impacting investigation or the eventual resolution of a known error
  • Develop a solution implementation plan
  • Adhere to the Change Management process when implementing workarounds or solutions
  • Identify any potential changes to the problem management process and notify the process owner.

3.5. Service desk

The Service Desk is accountable for:

  • Identify potential problems and notify Problem Coordinator
  • Assess the potential problem to determine whether it is a problem
  • Participate in the problem analysis meeting when required
  • Identify any potential change to the problem management process and notify the process owner.
  • Ensure the suitable use of the Incident Management Parent and Child incidents.

3.6. Vendor/service partners

The Vendor/Service Partner is accountable for:

  • Notify the relevant DoE contact when a problem or potential problem is identified
  • Assess problems that have been assigned to them, including details of the problem record
  • Assist the assigned Specialist to analyse a problem or identify the root cause.
  • Develop a workaround, informing the Problem Coordinator of work performed by the vendor organisation
  • Inform the Problem Coordinator of any work performed by their organisation in terms of the known error
  • Identify a solution through corrective action or accepting the known error
  • Develop a solution implementation plan if warranted
  • Implement the authorised solution, verify the corrective action has worked, provide a back-out procedure if the corrective action fails and investigate and resolve the failure of the corrective action
  • Identify any potential change to the problem management process and notify the process owner.

3.7. Service delivery manager

The Service Delivery Manager is accountable for:

  • Acting as Problem Coordinator for Problem Investigations that result from a Critical (Major Incident) Incident
  • Work with the Problem Coordinator, Specialists, Vendor and Service Desk to ensure customer expectations are managed
  • Participate in the problem analysis meeting to identify a proposed solution
  • Notify the Problem Coordinator of any issues impacting/potentially impacting resolution of a Known Error
  • Determine reporting requirements and identify any changes to the reports
  • Identify any potential change to the problem management process and notify the process owner.

3.8. Problem review board

The Problem Review Board is accountable for:

  • Keeping informed of Critical or High priority problems and the risk to the Department;
  • Re-evaluating, amending and agreeing to the prioritisation (Critical and High), as well as resource allocation of Problem and/or Known Error records and their related tasks;
  • Approval of solutions which could potentially impact commercial relationships and/or DoE reputation, as well as resource utilisation and service efficiency;
  • Approval to close and/or resolve (when required) Problem and Known Error records;
  • Assessing and mitigating residual risk of closed problems.

3.9. RACI

To view the RACI chart please refer to the Problem Management process RACI.

4.1. Problem closure review (meets problem review board criteria)

For further information please refer to the Problem Closure Review document.

4.2. Problem management activity descriptions

This section of the document lists the specific activities as referenced in the Problem Management workflow diagram and provides some level of detail as to how these activities are performed.

For further information please refer to the Problem Management process activities page.


Critical Success Factors 

Minimise the impact of problems

KPIs

  • % of incidents correlated to Known Errors
  • The # of Problems Closed this month with a permanent resolution (by category)

Avoiding repeated incidents

KPIs

  • % of incidents correlated to Known Errors
  • The # of Problems Closed this month with a permanent resolution (by category)

Improved service quality

KPIs

  • % of incidents correlated to Known Errors
  • Ratio of problems to incidents
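The KPIs above are simple ratios and counts. The following Python sketch shows one plausible way to calculate them, assuming the natural reading of each KPI: percentage = incidents linked to a Known Error ÷ all incidents × 100, and ratio = problems ÷ incidents. The record types and field names are hypothetical; the actual report definitions may differ.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass
class Incident:
    incident_id: str
    known_error_id: Optional[str]  # None when the incident is not correlated to a Known Error


@dataclass
class ClosedProblem:
    problem_id: str
    category: str
    closed_on: date
    permanent_resolution: bool     # True if closed with a permanent resolution


def pct_incidents_with_known_error(incidents: list) -> float:
    """% of incidents correlated to Known Errors (0.0 when there are no incidents)."""
    if not incidents:
        return 0.0
    linked = sum(1 for i in incidents if i.known_error_id is not None)
    return 100.0 * linked / len(incidents)


def closed_permanently_by_category(problems: Iterable[ClosedProblem], year: int, month: int) -> Counter:
    """# of Problems closed in the given month with a permanent resolution, by category."""
    return Counter(
        p.category
        for p in problems
        if p.permanent_resolution and p.closed_on.year == year and p.closed_on.month == month
    )


def problem_to_incident_ratio(problem_count: int, incident_count: int) -> float:
    """Ratio of problems to incidents (0.0 when there are no incidents)."""
    return problem_count / incident_count if incident_count else 0.0
```

Returning 0.0 for an empty period avoids a division-by-zero error in months with no incidents; a report may prefer to show "n/a" instead.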

5.1. Reporting

A number of reports will be used to identify problems and known errors before incidents occur, thus minimising the impact on the service. Incident and problem analysis reports provide information for proactive measures to improve service quality.

The most critical report is the Problem Review Board dashboard. It is crucial for the effective delivery of the weekly Problem Review Board meeting, where members refer to it to carry out their responsibilities as a board.

5.1.1. Governance reports

For further information please refer to the Problem Management process Governance reports page.


An escalation management process will be invoked where a problem is being governed or monitored by the Problem Review Board and one or more of the following criteria have been met:

  • No update within the Remedy Problem Record for two reporting cycles,
  • Root Cause is known but solution implementation has missed 2 estimated implementation dates,
  • Problem Review board recommend to have it escalated,
  • Problem closure report has not been completed within 2 weeks without a satisfactory reason and/or
  • Problem has been updated but does not demonstrate progress.
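The escalation criteria can be read as a checklist in which any single criterion triggers escalation. The following Python sketch evaluates them for one problem; all field names are hypothetical, the values would in practice come from the Remedy record and the PRB's judgement, and "within 2 weeks" is assumed to mean more than 14 days outstanding.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GovernedProblem:
    # Hypothetical fields describing a PRB-governed problem.
    cycles_without_update: int            # reporting cycles since the last record update
    root_cause_known: bool
    missed_implementation_dates: int      # estimated implementation dates that have passed
    prb_recommends_escalation: bool
    closure_report_days_outstanding: int  # 0 if the closure report is complete
    closure_delay_reason_accepted: bool   # a satisfactory reason for the delay was given
    updated_but_no_progress: bool         # judged by the PRB on review


def escalation_reasons(p: GovernedProblem) -> List[str]:
    """Return the escalation criteria met; an empty list means no escalation."""
    reasons = []
    if p.cycles_without_update >= 2:
        reasons.append("No update for two reporting cycles")
    if p.root_cause_known and p.missed_implementation_dates >= 2:
        reasons.append("Missed 2 estimated implementation dates")
    if p.prb_recommends_escalation:
        reasons.append("PRB recommends escalation")
    if p.closure_report_days_outstanding > 14 and not p.closure_delay_reason_accepted:
        reasons.append("Closure report not completed within 2 weeks")
    if p.updated_but_no_progress:
        reasons.append("Updates do not demonstrate progress")
    return reasons
```

Returning the list of matched criteria, rather than a single yes/no, gives the PRB the reason for each escalation for its minutes.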

The Problem Review Board (PRB) Charter describes the purpose and values of the PRB and how it will operate within the Department's ICT environment. The PRB quorum will consist of DoE ITD Directors, who will meet on a regular basis or in an emergency situation that requires their approval.

The PRB is to endorse the Problem Management Policy and is responsible for:

  • Keeping informed of Critical or High priority problems and the risk to the Department;
  • Re-evaluating, amending and agreeing to the prioritisation (Critical and High), as well as resource allocation of Problem and/or Known Error records and their related tasks;
  • Approval of solutions which could potentially impact commercial relationships and/or DoE reputation, as well as resource utilisation and service efficiency;
  • Approval to close and/or resolve (when required) Problem and Known Error records;
  • Assessing and mitigating residual risk of closed problems.

7.1. Problem review board roles


The PRB will consist of ITD Directors who have the authority and sponsorship to execute the PRB charter, and may also require the presence of a PRB Guest. The following roles are required to participate in the PRB meetings.

PRB Chairperson

  • Chairs the PRB meetings;
  • Presents problems that require risk assessment and final prioritisation, resource allocation, approval or rejection of proposed solutions, and acceptance of problem closures.

PRB Member

  • Provide input into the risk assessment and final prioritisation on critical and high problems;
  • Approving the suitable resource allocation to perform further investigation of problems;
  • Approving/rejecting proposed solutions based on commercial or resourcing considerations, and reviewing critical and high problems and problem closures;
  • Review all new Critical and/or High priority problems raised and approve additions/changes to the CIO Dashboard
  • Acceptance of risks where mitigation activities have taken place

Problem Process Owner

  • Produce a report of critical and high priority problems that require risk assessment and final prioritisation, resource allocation, approving/rejecting proposed solution and acceptance of problem closures;
  • Record minutes and coordinate action items arising from the PRB meetings.

PRB Guests (Service Delivery Manager, Operations Manager, Service Desk Manager)

  • A PRB Guest attends when further explanation or clarification of a problem and its risks, workarounds and/or solutions, or a detailed status update, is required in person
  • Invitations to guests are extended by core PRB members on a case-by-case basis

Appendix A: root cause analysis techniques

7.1.1. RCA techniques
For further information please refer to the root cause analysis techniques page.

Appendix B: problem closure report

Problem Closure Report template is available from the Service Management Office.

Appendix C: Kepner and Fourie Root Cause Analysis Templates

Kepner and Fourie Root Cause Analysis Templates can be obtained from the Service Management Office.

