Artificial intelligence appears to be quickly reaching a point where it can’t get any smarter
Researchers warn that artificial intelligence companies like OpenAI and Google are quickly depleting the human-written training data necessary for their AI models to continue improving.
Without new training data, these AI models may hit a plateau, posing a significant challenge for the rapidly growing AI industry.
“There is a serious bottleneck here. If you start hitting those constraints about how much data you have, then you can’t really scale up your models efficiently anymore,” AI researcher Tamay Besiroglu, lead author of a new paper, told The Associated Press.
This threat is particularly dire for AI tools that rely on vast amounts of data, often sourced from online public archives.
That ongoing need has already sparked lawsuits from publishers, including the New York Times, against OpenAI for copyright infringement.
The situation could worsen as companies continue to lay off workers while increasing investments in AI, potentially reducing the flow of new content.
A new paper from San Francisco-based think tank Epoch suggests that the volume of text data used to train AI models is growing by about 2.5 times per year, while computing capacity is growing by about four times per year.
If this trend continues, large language models like Meta’s Llama 3 or OpenAI’s GPT-4 could run out of fresh human-written data as early as 2026.
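The projection above is simple compound-growth arithmetic: if the amount of text consumed per training run grows by a fixed factor each year while the stock of usable human-written text stays roughly flat, the two curves cross within a few years. The sketch below illustrates that calculation. The 2.5x growth factor comes from the article; the token counts are hypothetical placeholders, not Epoch’s figures.

```python
import math

GROWTH_PER_YEAR = 2.5    # reported yearly growth in training text volume
tokens_used_now = 1e13   # hypothetical: tokens consumed by a frontier model today
tokens_available = 3e14  # hypothetical: total stock of usable human-written tokens

def years_until_exhaustion(used: float, available: float, growth: float) -> float:
    """Solve used * growth**t = available for t (compound growth)."""
    return math.log(available / used) / math.log(growth)

t = years_until_exhaustion(tokens_used_now, tokens_available, GROWTH_PER_YEAR)
print(f"Stock exhausted in roughly {t:.1f} years at {GROWTH_PER_YEAR}x/year growth")
```

With these placeholder numbers, a thirtyfold reserve of text lasts under four years at 2.5x annual growth, which is why even a large head start in available data closes quickly.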
In response, AI companies might turn to training their models on AI-generated data. Companies like OpenAI, Google, and Anthropic are already exploring “synthetic data” for this purpose.
However, experts remain skeptical. A study by scientists at Rice and Stanford found that using AI-generated content for training causes the quality of AI output to deteriorate, likening it to a snake eating its own tail.