Nudge Users to Catch Generative AI Errors
Using large language models to generate text can save time but often results in unpredictable errors. Prompting users to review outputs can improve their quality.
OpenAI’s ChatGPT has generated excitement since its release in November 2022, but it has also created new challenges for managers. On the one hand, business leaders understand that they cannot afford to overlook the potential of generative AI large language models (LLMs). On the other hand, apprehensions surrounding issues such as bias, inaccuracy, and security breaches loom large, limiting trust in these models.
In such an environment, responsible approaches to using LLMs are critical to the safe adoption of generative AI. Consensus is building that humans must remain in the loop (a scenario in which human oversight and intervention keeps the algorithm in the role of a learning apprentice) and that responsible AI principles must be codified. Without a proper understanding of AI models and their limitations, users may place too much trust in AI-generated content. Accessible, user-friendly interfaces like ChatGPT, in particular, can present errors with confidence while offering little transparency, few warnings, and no communication of their own limitations to users. A more effective approach would help users identify the parts of AI-generated content that require affirmative human choice, fact-checking, and scrutiny.
In a recent field experiment, we explored a way to assist users in this endeavor. We provided global business research professionals at Accenture with a tool, developed at the company’s Dock innovation center, that was designed to highlight potential errors and omissions in LLM-generated content. We then measured the extent to which adding this layer of friction had the intended effect of reducing uncritical adoption of LLM content and bolstering the benefits of having humans in the loop.
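To make the general idea concrete, the sketch below shows one simple way such a review step could work in software. It is a minimal illustration only, not the Accenture tool: the function names and the digit-based heuristic for flagging claims are assumptions made for this example, and a real system might instead use a second model pass, retrieval checks, or subject matter rules to decide which spans need scrutiny.

```python
# Minimal sketch of a "beneficial friction" review step for LLM output.
# All names and heuristics here are hypothetical, not the tool described
# in the article.

def flag_claims_for_review(draft: str) -> list[str]:
    """Return sentences a human reviewer should verify before accepting.

    Placeholder heuristic: flag any sentence containing a number, since
    figures are a common site of LLM errors. A production system would
    use stronger checks (retrieval, a verifier model, domain rules).
    """
    sentences = [s.strip() for s in draft.split(".") if s.strip()]
    return [s for s in sentences if any(ch.isdigit() for ch in s)]


def review_with_friction(draft: str) -> str:
    """Require an explicit human decision on each flagged sentence
    before the draft can be adopted."""
    for claim in flag_claims_for_review(draft):
        answer = input(f"Verify before accepting: '{claim}' (y/n) ")
        if answer.lower() != "y":
            raise ValueError(f"Draft rejected pending fact-check: {claim}")
    return draft


if __name__ == "__main__":
    llm_draft = "Revenue grew 40% in 2023. The market is expanding."
    print(review_with_friction(llm_draft))
```

The point of the sketch is the interaction pattern, not the flagging logic: the user cannot adopt the draft without making an affirmative choice about each highlighted claim, which is the kind of deliberate friction the experiment tested.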
The findings revealed that consciously adding some friction to the process of reviewing LLM-generated content can lead to increased accuracy — without significantly increasing the time required to complete the task. This has implications for how companies can deploy generative AI applications more responsibly.
Experiment With Friction
Friction has a bad name in the realm of digital customer experience, where companies strive to eliminate any roadblocks to satisfying user needs. But recent research suggests that organizations should embrace beneficial friction in AI systems to improve human decision-making.