Machines Make Mistakes—How Can We Trust Artificial Intelligence to Fly and Drive? | Opinion

Current data from the tragic Lion Air and Ethiopian Airlines crashes may not be sufficient to establish what caused the accidents, but initial evidence points to an automated anti-stall feature, triggered at high angles of attack, that isn't sufficiently familiar even to experienced pilots.

More generally, the incidents bring to the forefront some fundamental issues about the human-machine interface, and about the increasingly automated computer systems in our lives.

The accidents raise two critical questions: when should we trust fully automated computer systems with autonomous decision making, and how do we allow humans to regain control if things go wrong?

The questions are of broad interest, particularly in the context of driverless cars, which are expected to be safer, on average, than human drivers. In domains like navigation, however, where the consequences of error are human lives, we care about the worst case, not the average. It isn't sufficient to ask whether society will benefit from a reduction in accident-related deaths; we must also ask whether the worst-case consequences are acceptable.

This is a vexing problem because it involves envisioning these worst-case Rumsfeldian "unknown unknowns." In machine learning, these are referred to as "edge cases": situations that were not encountered in the training data.

When should we trust machines that make mistakes? My research suggests that we are willing to trust machines when they make few mistakes, and when the consequences of the worst mistakes are not severe. As machines make more mistakes, we require that the consequences be less severe.

In many domains, we have reasonable predictions of how often to expect errors and what they will cost. But in automated navigation, we require zero errors because the consequences are so severe. As long as we are aware of any errors that can endanger human lives, complete automation is unacceptable.

This doesn't mean that humans will do better at autonomous navigation in the aggregate, but that we shouldn't cede complete control of decision making to the machine. But in those rare cases when the machine has a reasonable chance of being wrong, how can we let a human take over?

This is a tricky question to answer, particularly when mistakes are rare. If a machine is rarely wrong, humans are very unlikely to detect an error that could arise from an edge case or sensor failure. We expect the machine to be right. After all, if all experiences until now have been error-free, that is the current expectation.

This was evident in the Arizona Uber accident, where the human driver didn't act in time because she didn't expect the failure that occurred: the vehicle classified the pedestrian as an object not worth reacting to, like a bird or a piece of paper wafting in the wind, for which braking is unnecessary and potentially dangerous. The lesson from the Arizona incident, which applies more broadly to systems where failure is rare but error consequences are severe, is that the interface should alert the human when the probability of sensor failure exceeds a threshold.

But such a calibration should not produce too many false alarms, which would effectively neutralize the benefits of automation and suggest that the task is not suitable for automation at all. Finding a threshold that catches real failures while keeping false alarms rare is the $64 million question.
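That calibration trade-off can be sketched numerically. The sketch below is a minimal illustration with made-up score distributions: the Beta parameters, the 1 percent alarm budget, and all variable names are assumptions for illustration, not drawn from any real vehicle or aircraft system. It sweeps candidate alert thresholds, keeps the lowest one whose false-alarm rate stays within budget, and then reports the miss rate that choice implies.

```python
import random

random.seed(0)

# Hypothetical calibration data: the system's estimated probability of
# sensor failure on ordinary frames vs. on frames with a genuine failure.
normal_scores = [random.betavariate(1, 20) for _ in range(10_000)]  # mostly low
failure_scores = [random.betavariate(8, 3) for _ in range(100)]     # mostly high

def rates(threshold):
    """Return (false-alarm rate on normal frames, miss rate on failures)."""
    false_alarms = sum(s >= threshold for s in normal_scores) / len(normal_scores)
    misses = sum(s < threshold for s in failure_scores) / len(failure_scores)
    return false_alarms, misses

# Pick the lowest threshold whose false-alarm rate stays under a budget,
# so alerts remain rare enough that humans keep taking them seriously.
BUDGET = 0.01  # at most 1% of normal frames may trigger an alert
candidates = [t / 100 for t in range(1, 100)]
threshold = min(t for t in candidates if rates(t)[0] <= BUDGET)

fa, miss = rates(threshold)
print(f"threshold={threshold:.2f}  false-alarm rate={fa:.3f}  miss rate={miss:.3f}")
```

The point of the sketch is the shape of the problem, not the numbers: lowering the budget pushes the threshold up and the miss rate with it, which is exactly the tension between useful alerts and alarm fatigue described above.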

To borrow from Dirty Harry, just like a "man's gotta know his limitations," we need machines that know their limitations and provide humans with sufficient time and information to take over.

In order for this to happen, machines will need to have a better understanding of human behavior. We would be wise to require makers of autonomous vehicles to collect and share simulated and real-world data with regulators and the public about the consequences of machine failure, especially on the behavior of humans under such circumstances. This data could be used to design the human-machine interface in a way that provides humans sufficient time to take over if necessary. Indeed, this should be a precondition before we let driverless cars "into the wild."

Debris of the crashed Ethiopian Airlines airplane, taken on March 11, 2019. MICHAEL TEWELDE/AFP/Getty Images

The immediate lesson for airplane manufacturers is similar. While there were zero commercial aviation deaths in 2017, which bolstered our faith in automation, recent events should alert us to the risks of automation.

Manufacturers and airlines need to collect and share data from simulations of failure of any and all components associated with automated systems in aircraft, and design an interface that enables operators to take over with sufficient notice when required. This would also ensure that pilots are trained on new features and technology, and are confident about taking over when necessary. Such data would be a major step toward realizing the best of both machines and humans, and would help bolster our trust in systems based on artificial intelligence.

Vasant Dhar is a professor at New York University's Stern School of Business and the Center for Data Science.

Views expressed in this article are the author's own.