Nudge Users to Catch Generative AI Errors
Using large language models to generate text can save time but often results in unpredictable errors. Prompting users to review outputs can improve their quality.
OpenAI’s ChatGPT has generated excitement since its release in November 2022, but it has also created new challenges for managers. On the one hand, business leaders understand that they cannot afford to overlook the potential of generative AI large language models (LLMs). On the other hand, apprehensions surrounding issues such as bias, inaccuracy, and security breaches loom large, limiting trust in these models.
In such an environment, responsible approaches to using LLMs are critical to the safe adoption of generative AI. Consensus is building that humans must remain in the loop (a scenario in which human oversight and intervention keeps the algorithm in the role of a learning apprentice) and that responsible AI principles must be codified. Without a proper understanding of AI models and their limitations, users may place too much trust in AI-generated content. Accessible, user-friendly interfaces like ChatGPT, in particular, can present errors with confidence while offering little transparency, few warnings, and no communication of their own limitations to users. A more effective approach would help users identify the parts of AI-generated content that require affirmative human choice, fact-checking, and scrutiny.
In a recent field experiment, we explored a way to assist users in this endeavor. We provided global business research professionals at Accenture with a tool, developed at the company’s Dock innovation center, that was designed to highlight potential errors and omissions in LLM-generated content. We then measured the extent to which adding this layer of friction had the intended effect of reducing uncritical adoption of LLM content and bolstering the benefits of having humans in the loop.
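To make the general idea concrete, the sketch below shows one simple way such a review step could work in software. It is a minimal illustration only, not the Accenture tool: the function names and the digit-based heuristic for flagging claims are assumptions made for this example, and a real system might instead use a second model pass, retrieval checks, or subject matter rules to decide which spans need scrutiny.

```python
# Minimal sketch of a "beneficial friction" review step for LLM output.
# All names and heuristics here are hypothetical, not the tool described
# in the article.

def flag_claims_for_review(draft: str) -> list[str]:
    """Return sentences a human reviewer should verify before accepting.

    Placeholder heuristic: flag any sentence containing a number, since
    figures are a common site of LLM errors. A production system would
    use stronger checks (retrieval, a verifier model, domain rules).
    """
    sentences = [s.strip() for s in draft.split(".") if s.strip()]
    return [s for s in sentences if any(ch.isdigit() for ch in s)]


def review_with_friction(draft: str) -> str:
    """Require an explicit human decision on each flagged sentence
    before the draft can be adopted."""
    for claim in flag_claims_for_review(draft):
        answer = input(f"Verify before accepting: '{claim}' (y/n) ")
        if answer.lower() != "y":
            raise ValueError(f"Draft rejected pending fact-check: {claim}")
    return draft


if __name__ == "__main__":
    llm_draft = "Revenue grew 40% in 2023. The market is expanding."
    print(review_with_friction(llm_draft))
```

The point of the sketch is the interaction pattern, not the flagging logic: the user cannot adopt the draft without making an affirmative choice about each highlighted claim, which is the kind of deliberate friction the experiment tested.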
The findings revealed that consciously adding some friction to the process of reviewing LLM-generated content can lead to increased accuracy — without significantly increasing the time required to complete the task. This has implications for how companies can deploy generative AI applications more responsibly.
Experiment With Friction
Friction has a bad name in the realm of digital customer experience, where companies strive to eliminate any roadblocks to satisfying user needs. But recent research suggests that organizations should embrace beneficial friction in AI systems to improve human decision-making.