The focus of safety management for many years has been on the prevention and mitigation of incidents. In this article we aim to highlight the importance of incident investigation techniques as a powerful management system tool, and to show how, through adaptation, similar techniques can be used for both occupational and process safety incident investigation, regardless of size or complexity.
Background
The investigation of process safety incidents requires a systematic approach to ensure that all the underlying failures in the management system are identified and addressed. In their book ‘Investigating Chemical Process Incidents’ the CCPS (Ref 1) identify that ‘Process Safety incidents are the result of management system failures’. Whilst most organisations have good systems for investigating occupational safety incidents, process safety incidents are more complex and as such are typically more difficult to investigate. Experience has shown that the current widely used approaches for investigating process safety incidents support teams in the development of incident timelines, but often result in poor root cause analyses because of the complexity of their methodologies. As a result, underlying failures of the management system may not be clearly identified, unless suitably competent practitioners apply them.
Development of Process Safety Management systems
Industry has progressed in its understanding of the role of management systems in prevention of accidents. Bird et al (Ref 2) noted that in 1931 an electrical manufacturer analyzed 1490 lost time or medical attention accidents and found that less than 1% were seen to be the fault of the employer. It was conventional wisdom that employee faults were the cause of 90% of incidents and the remaining 10% were unpreventable acts of God.
The groundbreaking work that pushed safety science forward was the energy transfer model developed by Gibson and Haddon, with its earliest success in the 1960s, when it was applied to highway safety and to the requirements for many safety devices (seat belts, dual braking systems, windscreen manufacture), combined with a major program to reduce drink driving.
Structured safety management systems for process facilities which place safety into a total site management context are a more recent development. Frank Bird was one of the pioneers in modern safety management and developed a number of prototype management systems, culminating in the International Safety Rating System (ISRS) in 1978. This was based on detailed study of accident causation, initially at steel works in the USA. ISRS has been through many editions since then, reaching the ISRS 5th edition milestone in 1993 and the current ISRS 8th edition in 2009.
The chemical industry in Canada developed Responsible Care in 1985 and this slowly progressed globally, now covering 53 countries. The CCPS group developed its Guideline for Technical Management of Process Safety in 1989 and API issued a management system protocol RP750 Recommended Practice for the Management of Process Hazards in 1990. These were key documents under-pinning the OSHA 1910.119 regulation on Process Safety Management. The OSHA standard was created following the major vapour cloud explosion accident at Phillips Pasadena and also to address learnings from the Bhopal incident. API 750 was subsequently updated and reissued as an integrated standard also covering safety and environment as API 9100A.
A feature of almost all of these systems is an element based structure with Leadership being a prominent feature, derived as they were from the original ISRS structure.
Incident investigation: Different approaches
The Incident Ratio Pyramid was developed by Heinrich and Bird, based on data from a wide range of industrial accidents and described in the book ‘Practical Loss Control Leadership’ (Ref 3). This suggested that for every serious major injury there were an increasing number of minor injuries, property damage events and incidents with no visible injury or damage. These incidents could be seen to display a fixed relationship. This relationship has been subsequently validated by other work, and although the ratios have varied to a small extent, this concept has formed the basis of safety management systems development for over 20 years. However, more recent work by a number of groups including DNV indicates that there is a different ratio pyramid where process safety incidents are concerned (Figure 1). Process safety incidents are typically less frequent, have greater potential for harm, and ‘near misses’ are not as obvious. The barriers that need to be defeated to result in a process safety incident are also different from those which are relevant for an occupational safety incident.
In terms of incident investigation, this means that approaches to occupational incidents need to be adapted so that they can also address process safety incidents. This adaptation is necessary to allow the consideration of the complex people, plant and management system barriers that prevent, detect, control and mitigate process hazards.
Improved processes
Industry’s focus is to try and prevent incidents through the application of predictive techniques for determining the barriers necessary to manage risks. However this assumes that there is sufficient knowledge and experience within the organisation of the nature of incidents that have occurred to similar facilities. As a result, predictive techniques are not generally useful for investigation although they may help in identifying barriers that were expected to operate.
Techniques for investigating incidents need to take a systematic, comprehensive and structured approach. Without this, too much reliance may be placed on the knowledge, experience and personal viewpoint of the person completing the investigation, which potentially means that different investigators could identify different causes. It is also often too easy to blame either the equipment or the person without consideration of the underlying root causes. Only by identifying the underlying weaknesses in the management system can more effective solutions be developed that will strengthen barriers and prevent recurrences.
Barriers and beyond
Safety barriers are physical and/or non-physical means planned to prevent, control, or mitigate undesired events or accidents.
A key stage in any incident investigation is establishing the barriers that failed as well as those that worked. Concentrating on the barriers ultimately enables the direct and root causes to be established, as barriers ultimately rely on management system controls being in place. Only by addressing the root causes for barrier failure can the underlying deficiencies in the management system be addressed. Unlike occupational safety incidents, process safety incidents frequently involve many barriers. For example, in the Bhopal accident 8 barriers were defeated, any one of which would have prevented that accident, and similarly 12 barriers were identified for Texas City with similar potential to prevent the accident.
Barriers can take a number of different forms: normally technical (physical), administrative (procedures), or people-based (training, competence, etc). There are also ‘fortunate mitigating circumstances’. Time of day or night and weather (including wind direction) have played a part in reducing the effects of some major incidents, but they should never be relied upon as a normal barrier as they cannot be controlled.
Once evidence has been secured and data collection and interviews are completed, creating a time-line is normally the next stage in any Process Safety Incident Investigative Technique (Ref 1). From the time-line a probable sequence of events can be established and discrepancies, omissions and areas to explore further can be identified. Whilst this process does not identify the root cause of an incident directly, it tracks the sequence of events and barriers present and therefore allows all the relevant barriers to be identified. It then identifies which barriers worked effectively, which worked partially (so that we have reduced confidence in their integrity), and which failed completely. In a typical process safety incident there will be barriers in all these categories. Further techniques can then be used to identify root causes for barrier failures and hence management system failures. Barrier identification also has additional benefits: it enables consideration to be given to barriers that could or should have been present, it enables the effectiveness of barriers to be assessed, it can aid with a Layer of Protection Analysis (LOPA), and finally, results can also be used to update predictive methods such as Hazard Identification (HAZID) and Hazard & Operability (HAZOP) Studies.
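The barrier classification described above can be illustrated with a minimal sketch. It is not part of any published technique: the class names, the barrier names and the decision to carry partially effective barriers forward for root cause analysis alongside failed ones are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class BarrierStatus(Enum):
    WORKED = "worked effectively"
    PARTIAL = "worked partially (reduced confidence in integrity)"
    FAILED = "failed completely"


@dataclass
class Barrier:
    name: str
    kind: str  # "technical", "administrative" or "people-based"
    status: BarrierStatus


def group_by_status(barriers):
    """Return a dict mapping each status to the names of barriers in it."""
    groups = {status: [] for status in BarrierStatus}
    for barrier in barriers:
        groups[barrier.status].append(barrier.name)
    return groups


def needs_root_cause_analysis(barriers):
    """Assumption: every barrier that did not fully work is analysed further."""
    return [b.name for b in barriers if b.status is not BarrierStatus.WORKED]


# Hypothetical barriers taken from an imagined incident time-line.
barriers = [
    Barrier("high-level alarm", "technical", BarrierStatus.FAILED),
    Barrier("operator rounds", "people-based", BarrierStatus.PARTIAL),
    Barrier("relief valve", "technical", BarrierStatus.WORKED),
]

groups = group_by_status(barriers)
to_analyse = needs_root_cause_analysis(barriers)
print(to_analyse)
```

Recording the barriers that worked as well as those that failed keeps the full picture available for later use, for example when reviewing a LOPA or updating a HAZOP.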
Developing the root causes
Having identified a time-line and sequence of barrier failures, it is now possible to apply your technique of choice to determine the root causes contributing to each barrier failure. DNV’s experience has been that for occupational safety incidents the small number of barriers means it is possible to analyse the whole incident in a single step. For process safety there are multiple barriers of disparate types and therefore a more sequential approach is needed. The most powerful approach has been to apply a formal root cause technique to each failed barrier in turn. This may seem time-consuming, but experience has shown that each barrier failure is typically due to a small sub-set of management system failures. Therefore it is relatively quick to analyse each barrier, with the benefit of more rigorous analysis.

Once all the individual barrier root causes have been determined, the results can be collated to provide the overall assessment. At this point an approach with a pre-worked checklist of specific, defined, root causes ideally linked to management system elements becomes a powerful tool, since common root causes (those which under-pinned several barrier failures) become immediately apparent. This is an important management insight because it highlights those areas which should be addressed to strengthen the management system. If this process is repeated over time for each incident that occurs, a repeating picture of the common management system failings can be identified. With sufficient data this can be applied retrospectively.
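The collation step can be sketched in a few lines. This is an illustrative assumption of how the results might be tallied, not DNV's implementation: the function name, the barrier names and the root cause labels are hypothetical, and a real checklist would use its own defined root cause codes.

```python
from collections import Counter


def common_root_causes(findings, min_barriers=2):
    """Return root causes that underpinned at least `min_barriers` barrier failures.

    `findings` maps each failed barrier to the root causes identified for it.
    The result is sorted by count (highest first), then alphabetically.
    """
    counts = Counter()
    for causes in findings.values():
        # Count each root cause once per barrier, even if it was listed twice.
        counts.update(set(causes))
    shared = [(cause, n) for cause, n in counts.items() if n >= min_barriers]
    return sorted(shared, key=lambda item: (-item[1], item[0]))


# Hypothetical results of analysing three failed barriers one at a time.
findings = {
    "high-level alarm": ["inadequate maintenance", "inadequate training"],
    "operator response": ["inadequate training", "inadequate procedures"],
    "relief system": ["inadequate maintenance", "inadequate training"],
}

shared = common_root_causes(findings)
print(shared)
# [('inadequate training', 3), ('inadequate maintenance', 2)]
```

Accumulating the same tally across successive incidents would give the repeating picture of common management system failings described above.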
Applying the approach
Systematic Cause Analysis Technique (SCAT) has been used for many years by people at all levels in organisations, from Supervisors through to Safety Managers, to investigate incidents with good success due to its ease of use and thoroughness of approach. SCAT is based on the Loss Causation Model (See Fig 3), which is a means of linking actual loss to root causes and underlying management system failings in a domino type model. The Loss Causation Model was identified by Bird (Ref 3) in the 1970s and has been used as the basis for a number of incident investigation techniques.
What DNV has discovered is that the Loss Causation Model should be applied to each barrier in turn, effectively treating each barrier failure as a loss. This is in contrast to the original approach of applying the model to the entire incident.
The approach outlined in Figure 2 shows how it is possible to take an existing root cause analysis technique such as SCAT and apply it to more complex incidents in a simple structured manner. DNV have called this approach, supplemented by a series of specific process safety checklists, SCAT PSM.
In summary, incident investigation is a powerful management system process for controlling hazards and for proactively learning from experience. In order to learn from incidents and prevent their recurrence it is necessary to identify the root causes and act on them, by modifying management systems where necessary. Incidents are due to failures of barriers, be they people, plant or management systems. Process safety incidents are generally more hazardous and complex than occupational safety incidents, have more barriers to consider, and are therefore more difficult to predict. However, application of a systematic, structured approach by knowledgeable people makes the analysis possible and beneficial for ongoing risk reduction.
References:
1: Guidelines for Investigating Chemical Process Incidents, 2nd Ed, CCPS, 2003
2: Practical Loss Control Leader. FE Bird, GL Germain, MD Clarke 3rd Ed. DNV Atlanta, US
3: ‘Practical Loss Control Leadership’ F E Bird, Jr & G L Germaine, 3rd Edition, DNV (USA) Inc. August 1996
4: ‘Linking OII and RMP*Info Data: Does Everyday Safety Prevent Catastrophic Loss?’ M Elliott, P Kleindorfer, J DuBois, Yanlin Wang, October 2007
Authors:
Tony Potts, Stuart Greenfield and Mark Fisher, DNV
Article appeared in the September 2009 Edition of the Chemical Engineering Magazine (TCE).
Date: 13 November 2009
