Robust human organizations: Lessons for artificial intelligence research and application
In the 1980s and 1990s, several researchers (most notably Todd LaPorte, Gene Rochlin, and Karlene Roberts) undertook the study of human organizations that operate high-risk systems while achieving very low error rates. They called these organizations “High Reliability Organizations” (or HROs), and they tried to characterize what aspects of these organizations accounted for their high reliability.
My attention was drawn to HROs by Paul Scharre’s excellent book Army of None (2018), which reviews the current state and future prospects of autonomous weapon systems. Among the many existing autonomous weapon systems that Scharre analyzes, I found his discussion of the Aegis cruisers to be the most interesting from an AI standpoint. Scharre discusses the tragic Vincennes incident, in which the Vincennes, a cruiser equipped with the Aegis autonomous ship defense system, accidentally shot down an Iranian civilian airliner, killing all passengers and crew. Scharre describes how, in response, the US Navy completely overhauled its procedures for training Aegis cruiser personnel and adopted many of the practices of HROs. He suggests that the safe deployment of an autonomous weapon system requires that the organization using that system be a high-reliability organization.
What are the properties of HROs? Weick, Sutcliffe, and Obstfeld (1999) give an excellent summary. They identify five practices that they believe are responsible for the low error rates of HROs:
1. Preoccupation with failure. HROs believe that there exist new failure modes that they have not yet observed. These failure modes are rare, so it is impossible to learn from experience how to handle them. Consequently, HROs study all known failures carefully, they study anomalies and near misses, and they treat the absence of anomalies and near misses as a sign that they are not being sufficiently vigilant in looking for problems. HROs encourage the reporting of all mistakes and anomalies.
2. Reluctance to simplify interpretations. HROs cultivate a diverse collection of expertise so that multiple interpretations can be generated for any observed event. They adopt many forms of checks and balances and perform frequent adversarial reviews. They hire people with non-traditional training, perform job rotations, and engage in repeated retraining. To deal with the conflicts that arise from multiple interpretations, they hire and value people for their interpersonal skills as much as for their technical knowledge.
3. Sensitivity to operations. HROs maintain at all times a small group of people who have deep situational awareness. This group constantly checks whether the observed behavior of the system is the result of its known inputs or whether there might be other factors at work.
4. Commitment to resilience. Teams practice managing surprise. They practice recombining existing actions and procedures in novel ways in order to attain high skill at improvisation. They practice the rapid formation of ad hoc teams to improvise solutions to novel problems.
5. Under-specification of organizational structures. HROs empower every team member to make decisions related to their expertise. Any person can raise an alarm and halt operations. When anomalies or near misses arise, their descriptions are propagated throughout the organization, rather than following a fixed reporting path, in the hope that a person with the right expertise will see them. Power is delegated to operations personnel, but management is available at all times.
HROs provide at least three lessons for the development and application of AI technology. First, our goal should be to create combined human-machine systems that function as high-reliability organizations. We should consider how AI systems can incorporate the five principles listed above. Our AI systems should continuously monitor their own behavior, the behavior of the human team, and the behavior of the environment to check for anomalies, near misses, and unanticipated side effects of actions. Our AI systems should be built of ensembles of diverse models to reduce the risk that any one model contains critical errors. They should incorporate techniques, such as minimizing down-side risk, that confer robustness to model error (Chow et al., 2015). Our AI systems must support combined human-machine situational awareness, which will require not only excellent user interface design but also the creation of AI systems whose structure can be understood and whose behavior can be predicted by the human members of the team. Our AI systems must support combined human-machine improvisational planning. Methods that plan in real time (e.g., receding horizon control, also known as model-predictive control) are likely to be better suited to improvisational planning than methods that execute fixed policies. Researchers in reinforcement learning should learn from the experience of human-machine mixed-initiative planning systems (Bresina and Morris, 2007). Finally, our AI systems should have models of their own expertise and models of the expertise of the human operators so that the systems can route problems to the right humans when needed.
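As one illustration of what minimizing down-side risk can mean, the sketch below compares choosing an action by expected loss with choosing it by conditional value at risk (CVaR), the risk measure named in the title of Chow et al. (2015). It is a minimal empirical estimate and not the optimization method of that paper. The action names, the loss values, and the functions `cvar` and `choose_action` are hypothetical and exist only for this example; each list stands for the losses predicted for one action by ten different plausible models.

```python
import math


def mean(losses):
    """Average loss across the sampled models."""
    return sum(losses) / len(losses)


def cvar(losses, alpha):
    """Empirical CVaR: the mean of the worst (1 - alpha) fraction of losses."""
    if not losses:
        raise ValueError("losses must be non-empty")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    ordered = sorted(losses, reverse=True)  # largest (worst) losses first
    # Round before taking the ceiling to avoid floating-point overshoot.
    k = max(1, math.ceil(round((1.0 - alpha) * len(ordered), 9)))
    tail = ordered[:k]
    return sum(tail) / len(tail)


def choose_action(loss_samples, risk_measure):
    """Return the action whose sampled losses score lowest under risk_measure."""
    return min(loss_samples, key=lambda action: risk_measure(loss_samples[action]))


# Hypothetical losses for two actions under ten sampled models (assumed data).
loss_samples = {
    "cautious": [2.0] * 10,             # always a modest loss
    "aggressive": [0.0] * 9 + [15.0],   # usually free, occasionally very costly
}

by_mean = choose_action(loss_samples, mean)
by_cvar = choose_action(loss_samples, lambda losses: cvar(losses, alpha=0.9))

print("Chosen by expected loss:", by_mean)
print("Chosen by CVaR at 0.9:  ", by_cvar)
```

With these assumed numbers, the expected-loss criterion selects the aggressive action (mean loss 1.5 versus 2.0), while the CVaR criterion selects the cautious one, because the single catastrophic outcome dominates the worst 10% of cases. The point is only that a tail-sensitive criterion guards against the possibility that the model predicting the catastrophe is the correct one.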
A second lesson from HRO studies is that we should not deploy AI technology in situations where it is impossible for the surrounding human organization to achieve high reliability. Consider, for example, the deployment of face recognition tools by law enforcement. Quite aside from questions of privacy and civil liberties, I believe the HRO perspective provides important guidance for deciding under what conditions it is safe to deploy face recognition. The South Wales Police have made public the results of 15 deployments of face recognition technology at public events. Across these 15 deployments, they caught 234 people with outstanding arrest warrants. They also experienced 2,451 false alarms, which means that 91.3% of all alarms were false (South Wales Police, 2018). This is typical of many applications of face recognition and fraud detection. To ensure that we achieve 100% detection of criminals, we must set the detection threshold quite low, which leads to high false alarm rates. While I do not know the details of the South Wales Police procedures, it is easy to imagine that this organization could achieve high reliability through a combination of careful procedures (e.g., human checks of all alarms; fall-back methods to compensate for known weaknesses of face recognition software, such as reduced accuracy on dark skin; looking for patterns and anomalies in the alarms; continuous vetting of the list of outstanding arrest warrants and the provenance of the library face images; etc.). But now consider the proposal to incorporate face recognition into the “body cams” worn by police. A single officer engaged in a confrontation with a person believed to be armed would not have the ability to carefully handle false alarms. It is difficult to imagine any organizational design that would enable an officer engaged in a firefight to properly handle this technology.
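The arithmetic behind the 91.3% figure can be checked directly from the two counts reported above. The short calculation below uses only those counts; the variable names are my own. Note that the denominator is the number of alarms, not the number of faces scanned, so the figure describes the share of alarms that were false rather than the error rate per person observed.

```python
# Counts reported in the text (South Wales Police, 2018).
true_matches = 234    # alarms that identified a person with an outstanding warrant
false_alarms = 2451   # alarms that did not

total_alarms = true_matches + false_alarms
false_alarm_share = false_alarms / total_alarms
true_match_share = true_matches / total_alarms
alarms_per_true_match = total_alarms / true_matches

print(f"Total alarms:                    {total_alarms}")
print(f"Share of alarms that were false: {false_alarm_share:.1%}")
print(f"Share of alarms that were true:  {true_match_share:.1%}")
print(f"Alarms reviewed per true match:  {alarms_per_true_match:.1f}")
```

The totals come to 2,685 alarms, of which 91.3% were false and 8.7% were true, or roughly 11.5 alarms to review for every true match. That review workload is what the surrounding organization must be able to absorb reliably.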
A third lesson is that our AI systems should continuously monitor the functioning of the human organization to check for threats to high reliability. Just as any human member of an HRO can halt operations if they spot a problem, the AI system should also be empowered to halt operations. As AI technology continues to improve, it should become possible for AI systems to detect problems in the team such as over-confidence, reduced attention, complacency, inertia, homogeneity, bullheadedness, hubris, headstrong acts, and self-importance.
In summary, as with previous technological advances, AI technology increases the risk that failures in human organizations and actions will be magnified by the technology with devastating consequences. To avoid such catastrophic failures, the combined human and AI organization must achieve high reliability. Work on high-reliability organizations suggests important directions for both technological development and policy making. It is critical that we fund and pursue these research directions immediately and that we only deploy AI technology in organizations that maintain high reliability.
Bresina, J. L., and Morris, P. H. (2007). Mixed-initiative planning in space mission operations. AI Magazine, 28, 75–88.
Chow, Y., Tamar, A., Mannor, S., and Pavone, M. (2015). Risk-Sensitive and Robust Decision-Making: a CVaR Optimization Approach. Advances in Neural Information Processing Systems (NIPS) 2015.
Scharre, P. (2018). Army of None: Autonomous weapons and the future of war. W. W. Norton.
South Wales Police (2018). https://www.south-wales.police.uk/en/advice/facial-recognition-technology/ Accessed November 12, 2018.
Weick, K. E., Sutcliffe, K. M., and Obstfeld, D. (1999). Organizing for High Reliability: Processes of Collective Mindfulness. In R. I. Sutton and B. M. Staw (Eds.), Research in Organizational Behavior, Volume 21 (Stamford, CT: JAI Press), Chapter 44, pp. 81–123.