Why ‘artificial intelligence’ keeps getting dumber
Did you know that cats have been to the moon? That it’s safe to stare at the sun for 15 minutes, or even longer, as long as you have dark skin? Or that to stay healthy, you should eat one small rock per day?
These are some of the latest pearls of wisdom that Google has been serving to its American users (we in the UK aren’t so lucky yet). ‘Let Google do the searching for you’, the search giant promised when it introduced a feature called AI Overviews earlier this month. This integrates Google’s Gemini generative-AI model into its search engine. The answers it generates appear above the traditional list of ranked results. And you can’t get rid of them.
AI Overviews hasn’t had the effect that Google hoped for, to say the least. It certainly went viral at once, with people sharing their favourite answers, not because these are helpful but because they are so laughable. For instance, when you ask AI Overviews for a list of fruits ending with ‘um’ it returns: ‘Applum, Strawberrum and Coconut.’ This is what, in AI parlance, is called a ‘hallucination’.
Despite having a market capitalisation of $2 trillion and the ability to hire the biggest brains on the planet, Google keeps stumbling over AI. Its first attempt to join the generative-AI goldrush, in February last year, was the ill-fated Bard chatbot, which was similarly prone to spouting factual inaccuracies. In its first public demo, Bard mistakenly declared that the James Webb Space Telescope, launched only in 2021, had taken ‘the first pictures’ ever of a planet outside our solar system. The mistake wiped $100 billion off Google’s market value.
This February, Google had another go at AI, this time with Gemini, an image and text generator. The problem was that it had very heavy-handed diversity guardrails. When asked to produce historically accurate images, it would instead generate black Nazi soldiers, Native American Founding Fathers and a South Asian female pope.
This was ‘a well-meaning mistake’, pleaded The Economist. But Google wasn’t caught unawares by the problems inherent to generative AI. It will have known about the technology’s capabilities and pitfalls.
Before the current AI mania truly kicked off, analysts had already worked out that generative AI would be unlikely to improve user experience, and may well degrade it. That caution was abandoned once investors started piling in.
So why is Google’s AI putting out such rotten results? In fact, it’s working exactly as you would expect. Don’t be fooled by the ‘artificial intelligence’ branding. Fundamentally, AI Overviews is simply trying to guess the next word it should use, according to statistical probability, but without having any mooring to reality. The algorithm cannot say ‘I don’t know’ when asked a difficult question, because it doesn’t ‘know’ anything. It cannot even perform simple maths, as users have demonstrated, because it has no underlying concept of numbers or of valid arithmetic operations. Hence the hallucinations and omissions.
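To see what ‘guessing the next word’ means in practice, here is a deliberately crude sketch in Python. It is not how Gemini or any real large language model is built (those use neural networks trained on vast quantities of text), and the tiny corpus and the function names train_bigrams and predict_next are invented for illustration. It simply counts which word most often follows another, and then repeats the most frequent pattern with no way of checking whether it is true.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    follows = defaultdict(Counter)
    words = text.lower().split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    candidates = follows.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


# Invented training text: the joke advice appears twice, the sane advice once.
corpus = (
    "you should eat one small rock per day . "
    "you should eat one small rock per day . "
    "you should eat one apple per day ."
)

model = train_bigrams(corpus)
print(predict_next(model, "one"))    # the more frequent continuation wins
print(predict_next(model, "small"))
```

The toy model picks ‘small’ after ‘one’ and ‘rock’ after ‘small’ purely because that sequence is more common in its training text. Frequency stands in for truth.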
This is less of a problem when the output doesn’t matter as much, such as when AI is processing an image and creates a minor glitch. Our phones use machine learning every day to process our photos, and we don’t notice or care much about most of the glitches. But for Google to advise us all to start eating rocks is no minor glitch.
Such errors are more or less inevitable because of the way the AI is trained. Rather than learning from a curated dataset of accurate information, AI models are trained on a huge, practically open-ended dataset. Google’s AI and ChatGPT have already scraped as much of the web as they can and, needless to say, lots of what’s on the web isn’t true. Forums like Reddit teem with sarcasm and jokes, but the AI treats these as sincere and correct answers to problems. Programmers have long used the phrase ‘GIGO’ to describe what is going on here: garbage in, garbage out.
AI’s hallucination problem is consistent across all fields. It pretty much precludes generative AI being practically useful in commercial and business applications, where you might expect it to save a great deal of time. A new study of generative AI in legal work finds the additional verification steps now required to ensure the AI isn’t hallucinating cancel out the time saved from deploying it in the first place.
‘[Programmers] are still making the same bone-headed mistakes as before. Nobody has actually solved hallucinations with large-language models and I don’t think we can’, the cognitive scientist and veteran AI sceptic, Professor Gary Marcus, observed last week.
Another problem is now coming into view. The AI is making an already bad job worse, by generating bogus information, which then pollutes the rest of the web. ‘Google learns whatever junk it sees on the internet and nothing generates junk better than AI’, as one X user put it.
Last year, the leading AI companies acknowledged that, having run out of content to scrape from the web, they were beginning to use synthetic training data – that is, data generated by generative AI itself. A year ago, OpenAI’s Sam Altman said he was ‘pretty confident that soon all data will be synthetic data’, made up by other AIs.
This is a huge problem. It essentially causes the models to ‘collapse’ and to stop giving useful results. ‘Model collapse is when generative AI becomes unstable, unreliable or stops functioning. It can occur when generative AI models are trained on content generated by AI rather than humans’, Professor Nigel Shadbolt of the Open Data Institute warned last December. One researcher, Jathan Sadowski, has called this phenomenon ‘Habsburg AI’, after the Spanish Habsburg dynasty, which died out in 1700 as a result of illnesses caused by in-breeding.
You can argue that something like this already happens without the assistance of AI, such as when a bogus fact is inserted into Wikipedia and cited in the media, and the media citations then become the justification for its continued inclusion on Wikipedia.
AI simply automates and speeds up this process of generating falsehoods. This week, the Telegraph gave the following example: ‘When Google claimed there was no African country beginning with the letter K, its answer appeared to have been based on a web discussion of ChatGPT getting the same question wrong. In other words, AI is now using other AI fabrications as gospel.’
The most apt description of this phenomenon comes from some American researchers, who last year coined the phrase ‘Model Autophagy Disorder’, or MAD. They wanted to evoke the practice of introducing bovine prions into the cattle food supply, a practice which caused bovine spongiform encephalopathy, or mad cow disease. ‘Our primary conclusion across all scenarios is that without enough fresh real data in each generation of an autophagous loop, future generative models are doomed to have their quality (precision) or diversity (recall) progressively decrease’, they wrote.
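The loss of diversity can be illustrated with a toy simulation in Python. This is not the researchers’ experiment, only a sketch under stark assumptions: the ‘model’ is nothing more than a machine that re-emits items it has already seen, each generation is trained solely on the previous generation’s output, and the function name autophagous_loop and the ‘fact_N’ labels are invented for illustration.

```python
import random


def autophagous_loop(real_data, generations, seed=0):
    """Track how many distinct items survive when each generation
    learns only from the previous generation's output."""
    rng = random.Random(seed)
    data = list(real_data)
    diversity = [len(set(data))]
    for _ in range(generations):
        # The stand-in "model" re-emits items it has seen, drawn with
        # replacement. Anything it fails to draw is lost for good,
        # because no fresh real data is ever added back in.
        data = [rng.choice(data) for _ in range(len(data))]
        diversity.append(len(set(data)))
    return diversity


# 100 distinct pieces of "real" data to start from.
real_data = [f"fact_{i}" for i in range(100)]
history = autophagous_loop(real_data, generations=20)
print(history)  # distinct items remaining after each generation
```

Because nothing new ever enters the loop, the count of distinct items can only stay the same or fall from one generation to the next. Mixing fresh real data back in at each step is what the researchers say is needed to stop the slide.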
Very few people warned of the downsides of generative AI when OpenAI opened its ChatGPT tool in November 2022. Now, ChatGPT has polluted the web and has poisoned itself and other AI tools. Cleaning this up will be a huge challenge. While the promised gains of AI remain elusive, the costs are clearly starting to mount.
Andrew Orlowski is a weekly columnist at the Telegraph. Follow him on X: @AndrewOrlowski.