As Google embraces generative AI, news publishers chart courses into an uncertain future
Generative AI could revolutionize online search. But for news publishers, it could be a disaster.
News publishers have long been disproportionately dependent upon Google, the undisputed leader of online search.
Roughly 40% of web traffic across media sites originates from Google, according to a December report from The Wall Street Journal that cited data from the web analytics company SimilarWeb. In return, news publishers have implicitly agreed to provide their data to the tech giant so that the company can continue to improve the efficacy and reliability of its search engine.
Google’s experiments with generative AI could cause a significant shift in this dynamic – and not in news publishers’ favor.
In May of last year, the tech giant debuted a new AI-powered tool called Search Generative Experience (SGE), an experimental feature that summarizes responses to search queries in natural language text analogous to that used by ChatGPT.
Through SGE, Google hopes to provide a more streamlined and personalized search experience: answers to queries can now be found in a single, convenient chatbox, and users can ask follow-up questions. The model even generates suggestions for such questions; if someone were to search for the best vineyard tours in Napa Valley, for example, it might then suggest asking about Airbnb prices in the area.
Whereas Google search users would traditionally have been led to news publishers’ websites in order to find more information, soon they may be able to confine their search entirely to Google.
“At the scale Google operates, even a small percentage of [lost] traffic could mean millions – if not billions – of fewer visitors to publisher sites,” says Jim Lecinski, associate professor of marketing at Northwestern University. “This is the publishers’ biggest challenge.”
Google says it’s too soon to tell how publishers will be affected by its latest efforts in generative AI-powered search. “It’s premature to estimate the traffic impact of our SGE experiment as we continue to rapidly evolve the user experience and design, including how links are displayed,” a company spokesperson told The Drum. “We’ll continue to prioritize approaches that send valuable traffic to publishers, and in fact, we are showing more links to sites with SGE in search than before, creating new opportunities for content to be discovered.”
Ironically, the functionality of SGE depends partly on data from news publishers’ sites.
“Google wants to be a one-stop-shop for information,” says Chris Rodgers, founder and CEO of CSP, a search engine optimization (SEO) agency. “The problem is that they don’t have that information – they have to go get it from other places. In their perfect world, they’re just gonna go take it and disseminate it directly to users.”
The problem with that approach, Rodgers says, is that the relationship has always been a two-way street; if publishers find that they’re no longer receiving their end of the bargain – namely, traffic to their websites – they may decide it’s in their best interest to find alternative means of engaging with their audiences.
This is already starting to happen. Some publishers have responded by adding snippets of code to their websites that block Google’s crawlers from accessing their content – preventing it from being used to train the company’s large language models (LLMs).
In September, Google announced in a blog post that it was introducing a new feature, called Google-Extended, that enables publishers to opt out of having their content used for the training of some of the company’s AI offerings. (According to Google, the company’s foundational LLMs are trained mainly on publicly available content from the internet – blog posts and chat forums, for example.)
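In practice, the Google-Extended opt-out works through the standard robots.txt protocol: publishers add a rule targeting the Google-Extended user-agent token while leaving ordinary Search crawling untouched. A minimal sketch of such a file might look like this (the paths and layout are illustrative, not taken from any specific publisher):

```text
# robots.txt – illustrative example
# Block Google-Extended, the token Google checks before using
# a site's content for its AI offerings (e.g. Gemini training)
User-agent: Google-Extended
Disallow: /

# Googlebot, which indexes pages for Google Search, is listed
# separately and remains free to crawl the site
User-agent: Googlebot
Allow: /
```

Because the two tokens are controlled independently, a site can in principle opt out of AI training without giving up its visibility in Search results – though, as described below, many publishers remain wary of how these signals will be honored in practice.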
But like a person caught in quicksand, this struggle threatens to only worsen publishers’ predicament: by making their data uncrawlable, they also decrease their visibility on Google Search.
The growing tension between publishers and Google, Rodgers argues, cannot hold forever; something will have to give.
“What happens if Google just starts taking information and not giving anything back?” he says. “The natural progression is what we’re seeing: pushback to Google that’s saying, ‘If you do not give us some kind of credit – and, furthermore, threaten our industry and viability – then we’re not going to give you the content.’”
“That pushback is really important, in my eyes,” he says.
Different paths forward for publishers
Many news publishers have begun to adopt one of two strategies to protect themselves in the dawning era of generative AI: entering into licensing agreements with the big tech companies, or suing them.
Publishers that have licensed their content have, according to Northwestern’s Lecinski, effectively made a wager “that the fee for training AI will offset the lost traffic and ad revenue if people just get their answers from AI and don’t visit sites.”
Reddit recently signed a $60m-per-year deal with Google that allows the search giant to use the platform’s data to train its AI models. A similar deal was struck in December between OpenAI and German media firm Axel Springer. (Axel Springer is also suing Google over what it believes are anti-competitive adtech practices).
Meanwhile, The New York Times in December sued OpenAI and Microsoft, claiming that its proprietary content had been illegally used to train the LLM powering ChatGPT.
More recently, in March, French regulators fined Google 250m euros (about $270m) for failing to enter into fair licensing deals with news publishers, and for not disclosing to them that their articles were being used to train Gemini, its AI chatbot, according to a report from The New York Times.
There is, of course, an enormous difference in the levels of power between Google – one of the most valuable companies in the world – and individual news publishers, many of which have been facing mounting economic pressures since before most of the world had ever heard of generative AI.
But what if there were a coordinated, concerted effort across the media landscape to stand up to Google by blocking its AI crawlers – for publishers to collectively declare, in effect, that they wouldn’t be idly pushed into a position of heightened dependency and vulnerability?
Such an unignorable signal could, according to Rodgers, conceivably force Google to reassess its current approach and work towards a new kind of relationship with news publishers which leverages generative AI and benefits both parties. “If everyone does that across the media industry, Google’s gonna get a message that [they’ve] got to do something to even out the relationship and make it right.”
What that something might be is not yet clear. Generative AI, despite its rapid proliferation over the past year and a half, remains a very new technology. Courts will be grappling with the technology’s legal implications for some time, and companies of all sizes are still trying to figure out how to use it productively and responsibly.
But like nature, business abhors a vacuum.
“Google is the behemoth,” Rodgers says. “They’ve had the lion’s share of [the online search industry] for ages and ages. I don’t know if it’s going to change. But if you’ve got other players that are going to better serve the media, or businesses, someone’s going to come in and fill that void.”