Failure Investigation: The CSI of Machines
We have all seen TV shows about crime scene investigations, where sharp-eyed investigators catch the “insignificant” details that solve the mystery. A failure investigation works in much the same way. No detail is too small to be considered.
A failure is a trauma for an organisation, even if there are no personal injuries. The stress it puts on the organisation is high, and it affects every single employee. The best way to handle a failure is not to have one, but that is impossible to guarantee. When the inevitable failure occurs, these questions tend to come up:
- What happened?
- Who is responsible?
- Did a single person’s decision lead to this?
- Who is to blame?
The answer to the last question is easy: do not blame anyone. Blaming often leads to a suspicious organisation with a paranoid and restricted outlook. Instead, consider the possible reasons for the failure. Human error is always a possibility, but it is only one of many until proven otherwise. And a common reason for human error is a flaw in organisation or leadership.
The reason for a failure can seem quite obvious at first glance, but the root cause analysis (RCA) very often turns up something else. The best way to proceed after discovering a failure is to follow this short list:
1. Freeze the situation. Ensure that everything is secure. Make sure that everybody and everything is safe. Lock the space/room to seal off the situation, and eliminate as many risk factors as possible.
2. Nothing is too small to note. Make sure to save all information related to the failure as soon as possible. Save event lists. Secure all data from operating systems and condition monitoring systems. Save pictures from measuring systems. Talk to everybody who was present at the event and, if possible, record these conversations.
3. Take photos/video. Document everything! No detail is too small. And don’t forget to record secondary marks such as oil smears, holes in the wall, pieces on the floor, etc.
4. Contact the insurance company, the supplier and, if needed, an independent investigator.
5. Ensure that all steps of the failure procedure are completed. Keep your eyes on the failed equipment at all times. Document all the steps and situations with video and/or photos. Be present all the time, both on-site and if the failed equipment is moved off-site. Be present when the equipment is opened, as the smell can tell you a lot.
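For the data-saving step in the list above, one possible approach is to copy the exported files (event lists, condition monitoring exports, screenshots) into a separate evidence folder and record a checksum for each copy, so that it can later be shown that nothing was altered. The following is only a minimal sketch under stated assumptions: the function names `sha256_of` and `preserve_evidence`, the folder layout and the `manifest.json` file are hypothetical and not part of any procedure described here.

```python
import hashlib
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def preserve_evidence(sources, evidence_dir):
    """Copy each source file into evidence_dir and write a manifest.

    Hypothetical helper: the originals are left untouched, and each copy
    gets a numeric prefix so files with the same name do not collide.
    """
    evidence_dir = Path(evidence_dir)
    evidence_dir.mkdir(parents=True, exist_ok=True)
    manifest = []
    for index, source in enumerate(map(Path, sources)):
        target = evidence_dir / f"{index:03d}_{source.name}"
        shutil.copy2(source, target)  # copy2 also keeps file timestamps
        manifest.append({
            "original_path": str(source),
            "copy": target.name,
            "sha256": sha256_of(target),
            "archived_at_utc": datetime.now(timezone.utc).isoformat(),
        })
    manifest_path = evidence_dir / "manifest.json"
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return manifest
```

Calling `preserve_evidence(["event_list.csv"], "evidence")` with a file of that (assumed) name would create the copy `evidence/000_event_list.csv` and a `manifest.json` beside it. A script like this complements, and does not replace, the photos, interviews and on-site presence described in the list.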
With rotating machinery, a lot can go wrong. Dynamic sources, such as the torsional vibration behind the fan failure in Figure 1, are often the cause, though there can also be other reasons.
To help with the first critical moments of the investigation, we have created a leaflet with the list of failure procedures. The leaflet is in PDF form and easily printable.
Figure 1. Torsional vibration fatigue caused the failure of a flue gas fan.