May 15, 2025

Incident Investigation and Root Cause Analysis

By Safety Team

Investigate incidents and near-misses to uncover the system failures behind them, using structured techniques like the 5 Whys and fishbone diagrams to drive corrective actions that prevent recurrence.

administrative-management

What is Incident Investigation and Root Cause Analysis?

A worker's hand was caught in a conveyor nip point, crushing two fingers. The initial incident report said, "Employee failed to follow lockout/tagout procedure." The investigation could have stopped there. But when the team applied the 5 Whys, they discovered that the lockout procedure required a lock stored in a cabinet 200 feet away, that production pressure discouraged the 8-minute walk, that the supervisor regularly bypassed lockout for "quick adjustments," and that three similar near-misses had been reported in the prior year with no corrective action. The root cause was not one worker's shortcut. It was a system that made the shortcut predictable.

Incident investigation and root cause analysis is the disciplined process of examining what happened, why it happened, and what systemic changes will prevent it from happening again. It goes far beyond assigning blame to an individual, seeking instead to identify the organizational, procedural, and engineering failures that allowed the incident to occur.

Key Components

1. Immediate Response and Evidence Preservation

  • Once all injured persons have received medical care, secure the scene immediately; do not move equipment, tools, or materials until they have been documented
  • Photograph the scene from multiple angles, sketch the layout including positions of equipment and people, and note environmental conditions such as lighting, temperature, noise, and floor surface
  • Interview witnesses and involved parties as soon as practical while details are fresh, using open-ended questions like "Walk me through what happened" rather than leading questions that suggest a cause
  • Collect physical evidence including damaged equipment, failed parts, PPE that was or was not in use, and relevant documents like work permits, inspection logs, and training records

2. Root Cause Analysis Techniques

  • Apply the 5 Whys by asking "Why?" repeatedly for each contributing factor until you reach a systemic cause that the organization can control, rather than stopping at an individual behavior, which is only a symptom
  • Use a fishbone (Ishikawa) diagram to organize potential causes into categories: people, procedures, equipment, environment, materials, and management, ensuring no category is overlooked
  • Distinguish between the direct cause (the conveyor was running), the contributing causes (the lock was stored far away, the supervisor modeled bypassing the procedure), and the root cause (the management system tolerated routine deviation from lockout)
  • Challenge the investigation team to keep asking "What allowed this condition to exist?" because the first answer is almost never the deepest one
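
Writing the 5 Whys chain down as data makes it easier to see whether an analysis stopped too early. The following is a minimal Python sketch built on the conveyor example above. The `Why` class, the `kind` labels, and the `stopped_early` helper are illustrative names, and the classification of each answer is one reading of the example, not an official method.

```python
from dataclasses import dataclass


@dataclass
class Why:
    """One answer in a 5 Whys chain."""
    answer: str
    kind: str  # "direct", "contributing", or "root" (labels assumed for this sketch)


def stopped_early(chain):
    """Return True if the chain is empty or its last answer is not a root cause."""
    return not chain or chain[-1].kind != "root"


# What the initial incident report captured.
initial_report = [
    Why("The conveyor was running", "direct"),
    Why("Employee failed to follow lockout/tagout procedure", "contributing"),
]

# What the team found by continuing to ask "Why?".
full_chain = initial_report + [
    Why("The lock was stored in a cabinet 200 feet away", "contributing"),
    Why("The supervisor modeled bypassing the procedure", "contributing"),
    Why("The management system tolerated routine deviation from lockout", "root"),
]

print(stopped_early(initial_report))  # True: the report ends at a behavior
print(stopped_early(full_chain))      # False: the chain reaches a root cause
```

The check is deliberately crude. It only confirms that the last recorded answer was classified as a root cause; deciding whether that classification is right still takes the investigation team's judgment.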

3. Corrective Actions and Verification

  • Match corrective actions to the hierarchy of controls: engineer out the hazard where possible (install a nip-point guard that prevents access), rather than relying solely on retrained behavior (remind workers to lock out)
  • Assign each corrective action to a specific person with a specific deadline, and track completion in a system that is visible to leadership
  • Verify effectiveness by auditing whether the corrective action actually changed the condition it was intended to fix, not just whether the paperwork was completed
  • Share findings and corrective actions across the organization so that the same root cause is not waiting to produce the same incident at a different location
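
A simple tracking record can capture the first three points above: the control level, the named owner and deadline, and verification as a step separate from implementation. The sketch below is one possible Python layout, not a prescribed system. The class and function names, the owners, and the dates are all hypothetical. The `Control` ordering follows the commonly cited hierarchy of controls (elimination, substitution, engineering, administrative, PPE).

```python
from dataclasses import dataclass
from datetime import date
from enum import IntEnum


class Control(IntEnum):
    """Hierarchy of controls; a lower value means a more effective control."""
    ELIMINATION = 1
    SUBSTITUTION = 2
    ENGINEERING = 3
    ADMINISTRATIVE = 4
    PPE = 5


@dataclass
class CorrectiveAction:
    description: str
    owner: str                        # a named person, not a department
    deadline: date
    control: Control
    implemented: bool = False
    verified_effective: bool = False  # set only after auditing the real condition


def is_closed(action):
    """An action is closed only when it is implemented AND verified effective."""
    return action.implemented and action.verified_effective


def overdue(actions, today):
    """Return actions that are past their deadline and still not implemented."""
    return [a for a in actions if not a.implemented and a.deadline < today]


def strongest_first(actions):
    """Sort actions so the most effective control type comes first."""
    return sorted(actions, key=lambda a: a.control)


# Hypothetical owners and dates, for illustration only.
actions = [
    CorrectiveAction("Remind workers to lock out", "Shift supervisor (hypothetical)",
                     date(2025, 6, 1), Control.ADMINISTRATIVE, implemented=True),
    CorrectiveAction("Install a nip-point guard that prevents access",
                     "Maintenance lead (hypothetical)",
                     date(2025, 6, 15), Control.ENGINEERING),
]

for action in overdue(actions, today=date(2025, 7, 1)):
    print("Escalate:", action.description, "-", action.owner)
```

Note that the reminder is implemented but still not closed, because nobody has verified that it changed the condition on the floor.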

Building Your Safety Mindset

  1. Investigate to Learn, Not to Blame

    • When the investigation starts with "Who made the mistake?" it will end with a retraining memo that changes nothing. Start instead with "What conditions made this outcome possible?"
    • Create an environment where people involved in incidents are treated as sources of critical information, not as defendants, because honest testimony is the only path to honest root causes
    • Recognize that human error is usually the last link in a chain of system failures; the goal is to fix the earlier links so that even when someone makes a mistake, the system catches it before harm occurs
  2. Be Thorough, Not Fast

    • Resist pressure to close an investigation quickly with a surface-level finding. A rushed conclusion that blames the individual leaves the underlying conditions in place, so the same incident is likely to recur
    • Include diverse perspectives on the investigation team: the operator, the supervisor, maintenance, engineering, and someone from a different department who can ask naive but revealing questions
    • Review at least 12 months of near-miss and incident data for patterns, because a single event rarely has a single cause; it usually has a history of warnings that were normalized
  3. Close the Loop and Sustain the Fix

    • A corrective action that is assigned but never verified is not a corrective action; it is a documented intention. Follow up at 30, 60, and 90 days to confirm the change is in place and working
    • If a corrective action is not implemented by its deadline, escalate it rather than extending the date, because the hazard that caused the incident is still present
    • Use stop-work authority if you observe a condition that was identified as a root cause in a previous investigation but has not yet been corrected
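
The 30, 60, and 90 day follow-ups and the escalate-instead-of-extend rule from item 3 are easy to automate. Here is a minimal Python sketch under stated assumptions: the function names and status strings are invented for illustration, and the sample date is arbitrary.

```python
from datetime import date, timedelta

FOLLOW_UP_DAYS = (30, 60, 90)


def follow_up_dates(implemented_on):
    """Return the 30, 60, and 90 day verification dates after implementation."""
    return [implemented_on + timedelta(days=d) for d in FOLLOW_UP_DAYS]


def next_step(deadline, implemented, today):
    """Decide what to do with a corrective action today.

    A missed deadline is escalated rather than extended, because the hazard
    that caused the incident is still present.
    """
    if implemented:
        return "verify"
    if today > deadline:
        return "escalate"
    return "in progress"


# Arbitrary sample date, for illustration only.
for check in follow_up_dates(date(2025, 5, 15)):
    print(check.isoformat())
# 2025-06-14
# 2025-07-14
# 2025-08-13
```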

Discussion Points

  1. Think about the last incident or near-miss on our site. Did the investigation reach a root cause that changed a system, or did it stop at an individual's behavior? If it stopped early, what further questions could we have asked?
  2. When a corrective action is assigned after an investigation, how do we verify that it was actually implemented and that it actually works? What happens on our site when a corrective action deadline passes without completion?
  3. If you were involved in an incident tomorrow, would you feel comfortable giving a fully honest account of what happened, including any shortcuts you took and any pressures you felt? What would make that easier?

Action Steps

  • Review the most recent incident or near-miss investigation report for your area and check whether the corrective actions have been implemented and verified as effective
  • Practice the 5 Whys technique on a recent near-miss by writing each "Why" on paper until you reach a systemic cause that the organization controls, and share your analysis with your supervisor
  • Check whether any corrective actions assigned to your team are past their deadline, and escalate any open items today
  • During your next pre-task briefing, ask the crew: "What could go wrong with this task, and what would we do differently than the last time something similar happened?"
