Tuesday, October 1, 2024

On automation and redundancies

Three events made me think about the topic of automation and redundancies. The first was the collision of a container ship with a vital bridge in Baltimore a few months ago. The second was the crash of an Air France flight (AF447) over the Atlantic a few years ago, and the third was the massive global impact of a glitch in security software running on Microsoft Windows that happened just a few days ago.

 

The container ship apparently suffered several power outages as it left the port. Without power, it was impossible to steer the ship, and the result was catastrophic. My question is: why was there no redundancy built into the system, so that if one thing fails another takes over? That is what makes airplanes so safe.
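A rough back-of-the-envelope calculation shows why redundancy matters so much. The numbers below are purely illustrative, and they assume the backup fails independently of the main unit, which is exactly the assumption that a common cause (bad fuel, a shared switchboard) can destroy:

# Illustrative only: how redundancy shrinks failure probability,
# assuming the redundant units fail independently.
p_single = 0.01                  # hypothetical chance one power system fails during departure

p_no_backup  = p_single          # no redundancy
p_one_backup = p_single ** 2     # both independent units must fail
p_two_backups = p_single ** 3    # triple redundancy

print(f"no backup:   {p_no_backup:.4%}")     # 1.0000%
print(f"one backup:  {p_one_backup:.4%}")    # 0.0100%
print(f"two backups: {p_two_backups:.4%}")   # 0.0001%

With independent units, every added backup cuts the failure probability by a factor of a hundred. If the failures are correlated, the benefit is far smaller, which is presumably part of what went wrong on the ship.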

 

Then there was the curious case of the crash of Air France flight AF447. This modern Airbus A330 was flying from Rio de Janeiro to Paris. As the airplane flew through a storm, its airspeed sensors iced over, the autopilot disconnected, and the fly-by-wire protections that normally correct pilot errors degraded. When the automation failed, the pilots made mistakes and the airplane entered a stall. One major reason was that these pilots had been trained to rely on fail-safe systems all the time, and their experience in manually flying the aircraft was limited, particularly at high altitude.

 

In an excellent podcast episode about the event ("Cautionary Tales"), Tim Harford mentions experiments in which training under full automation led to poor responses when the automation failed. It is almost as if it is better to train people on automation that is not guaranteed to always work; the humans then know what to do in case of failure.

 

Finally, something that happened a few days ago: it seems that an error in widely used cybersecurity software deployed on Microsoft Windows machines resulted in a massive meltdown across all types of sectors: aviation, banking, and hospitals. This brings forth another issue: how can we give so much power to one company or one piece of software? Here too, a human workaround was possible, but it did not go smoothly. Perhaps people were no longer used to doing manually what had been automated.

 

Automation is inevitable, but these examples suggest three lessons: (1) make sure redundancies are built in, (2) make sure humans are trained to take over if the automated system fails (i.e., train them on systems that are not foolproof), and (3) do not let one company, system, or person become the ruler of the automation game across sectors.

 

Early in my professional career I used to do quantitative risk analysis. There were tools available even then, like Fault Tree Analysis, that would have caught some of these issues before they turned into tragedies. I am sure there are more sophisticated tools available now than there were then. So why were they not used?
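For readers unfamiliar with it, Fault Tree Analysis works top-down: you pick an unwanted top event, decompose it through AND/OR gates into basic failures, and combine their probabilities. The sketch below is a minimal illustration with made-up numbers, loosely inspired by the ship example and assuming all basic events are independent:

# Minimal Fault Tree Analysis sketch with hypothetical probabilities.
# Assumes all basic events are independent of one another.

def and_gate(probs):
    # Gate event occurs only if ALL inputs occur: multiply probabilities.
    result = 1.0
    for p in probs:
        result *= p
    return result

def or_gate(probs):
    # Gate event occurs if ANY input occurs: 1 - P(none occur).
    none_occur = 1.0
    for p in probs:
        none_occur *= (1.0 - p)
    return 1.0 - none_occur

# Hypothetical per-voyage probabilities of the basic events.
p_main_power_fails    = 1e-3   # main generators trip
p_backup_power_fails  = 1e-2   # emergency generator fails to take the load
p_steering_gear_fails = 1e-4   # steering gear itself fails

# Top event: loss of steering near the bridge.
p_total_blackout   = and_gate([p_main_power_fails, p_backup_power_fails])
p_loss_of_steering = or_gate([p_total_blackout, p_steering_gear_fails])

print(f"P(total blackout)   = {p_total_blackout:.1e}")
print(f"P(loss of steering) = {p_loss_of_steering:.1e}")

Even a toy tree like this makes the single points of failure and the weak backups visible long before anything is built, which is the whole point of doing the analysis up front.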

 

Finally, as Tim Harford notes, in the coming wave of AI, humans are going to become even more reliant on AI-based automation.

Will that further reduce the chances of humans learning the basics of how to manually take over when automation fails?