A catastrophic bug: mistakes developers make that cost lives

ispmanager.com - Sep 24 - Dev Community

Programmers don't perform heart surgery, fight fires, or monitor the stability of nuclear plants. Nevertheless, developers' mistakes can cost human lives. Here are five cases in which a software error led to tragedy.

Chaos in the ambulance service

In October 1992, the London Ambulance Service rolled out a new dispatch system. It was supposed to handle about 2,000–2,500 calls a day and dispatch the medics. But within the first few hours, the system failed. The result was chaos, and many callers could not get medical help.

Some patients waited 11 hours for medics to arrive, and the service was flooded with repeated calls from panicking people. The failure reportedly contributed to the deaths of about 30 people.

This is an example of a total lapse in management. It turned out that the new software had been implemented without adequate testing or effective training. The program contained 81 bugs and ran on cheap, weak hardware. Although staff had received training in the 10 months before the rollout, they still found the interface hard to use.

Radiation sickness instead of a cure

In August 2000, an event that set off several slow deaths occurred at the National Cancer Institute in Panama. A radiologist requested that an additional shielding block be added to the radiation therapy chamber to protect patients’ healthy tissue from the radiation.

However, the software was designed to work with at most four blocks, so the specialists found a way to bypass this limitation. It turned out that they could enter the data manually in a form the program had not been designed for. Instead of entering the blocks separately, they drew them as a single large block with a gap in it. The device was put into use with the new settings without prior testing.

Soon thereafter, some patients began to notice symptoms of radiation sickness. Doctors dismissed the complaints for a while. It wasn't until almost a year later, in March 2001, that the technical problem was brought to their attention.

It turned out that the program was calculating therapy times based on the location and size of the non-existent holes in the protective blocks. As a result, people were being irradiated longer than necessary. By 2003, 17 people had died from the complications of radiation sickness.

How much a miscalculation of 0.34 seconds can cost

In 1991, during the Persian Gulf War, U.S. troops discovered that enemy strikes were getting through their Patriot air defense system. The cause was the internal clock in the system's software, which drifted by about 0.34 seconds after roughly 100 hours of continuous operation. The drift arose because the clock counted time in tenths of a second, and the decimal value 0.1 has no exact representation in binary, so every tick carried a tiny conversion error. This might seem to be an insignificant detail.

However, the system was designed as a mobile unit, not for long periods of continuous operation, and in practice it was used quite differently.

Patriot batteries were kept running for 8 or more hours at a time, and the longer a battery ran, the more error accumulated. Eventually, the targeting error reached about 20%, and the missiles began to miss their targets, leading to dozens of casualties.
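
To see how a tiny conversion error can grow with uptime, here is a minimal Python sketch. It assumes, as the incident is commonly described, that the clock counted tenths of a second and that the value 1/10 was stored chopped to 23 binary fractional bits. The constant and function names are illustrative and are not taken from the Patriot software.

```python
from fractions import Fraction

# Assumption: 1/10 is stored with 23 bits after the binary point, chopped (not rounded).
FRACTION_BITS = 23
TICKS_PER_SECOND = 10  # the clock counts tenths of a second


def stored_tenth(bits=FRACTION_BITS):
    """Return the value actually stored when 1/10 is chopped to `bits` binary digits."""
    scale = 2 ** bits
    return Fraction(scale // 10, scale)


def clock_drift_seconds(hours, bits=FRACTION_BITS):
    """Estimate how far the software clock lags real time after `hours` of uptime."""
    error_per_tick = Fraction(1, 10) - stored_tenth(bits)
    ticks = hours * 3600 * TICKS_PER_SECOND
    return float(error_per_tick * ticks)


if __name__ == "__main__":
    for hours in (1, 8, 20, 100):
        print(f"{hours:>3} h uptime -> drift of {clock_drift_seconds(hours):.4f} s")
```

The error per tick is below one ten-millionth of a second, but it is added on every tick, so the drift grows linearly with uptime. A target moving at well over a kilometer per second covers a long distance in even a fraction of a second, which is why such a small drift matters.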

Boeing crash

In the fall of 2018, a Boeing 737 Max crashed in Indonesia. Less than five months later, a plane of the same model crashed in Ethiopia. Both crashes were traced to a software error.

Relying on faulty sensor data, the system incorrectly determined the position of the nose of the aircraft relative to the horizon and activated the automatic stabilization system, which pushed the nose down even in manual control mode.

346 people died in the two crashes combined, the company lost $4.9 billion, and many countries grounded the 737 Max.

Toyota swept problems in its code under the rug

From 2000 to 2010, Toyota received about 6,200 complaints about the sudden acceleration of its cars. These complaints included descriptions of crashes that resulted in 89 deaths.

At first, Toyota blamed poorly designed floor mats, claiming that they caused the gas pedal to stick. But later, it turned out that the problem was with the electronics.

Reviewers found 81,514 violations of the MISRA C coding standard in the code. To identify them, two engineers, Michael Barr and Philip Koopman, reviewed 280,000 lines of code over 20 months. They worked under special conditions: in a secure location with no internet or phone service and no right to take any documents off-site. The result was a classified 800-page report.

These stories show how important it is to follow quality standards, test code thoroughly, avoid makeshift workarounds, and consider user convenience during implementation. Then, software can be used to save lives rather than end them.
