
GPT-4o and Gemini 1.5 Pro: How the New AI Models Compare


It was a battle of the bots last week as AI startup OpenAI hosted its Spring Update a day before Google’s annual I/O developer conference.

Both announced updates to their generative AI models, tossing around terms like tokens and parameters while showcasing new interfaces and functionality.

The terminology gets wonky, and it’s not always easy to understand the distinctions between these models — not just between OpenAI’s ChatGPT and Google’s Gemini, but also all their competitors.


Don’t get me wrong; there are differences. Check out the AI chatbot reviews from CNET’s Imad Khan for his hands-on experiences and what he has to say about the pros and cons of each. 

But as I thought about how to compare the latest models, OpenAI’s GPT-4o and Gemini 1.5 Pro, I called a longtime contact to get his perspective. He’s a tech executive with 30 years of experience and has often helped to break down complex topics for me. (He asked not to be named here because he’s not authorized to speak on the record.) 

“In my head, it’s like Coke and Pepsi. You know what I mean?” he said.

Here’s what he means:

Coke and Pepsi are both colas, but they're made with different formulas, and, as any soda drinker will tell you, they don’t taste the same. GPT-4o and Gemini 1.5 Pro are both advanced language models, designed according to their makers’ specifications to understand the text prompts you give them and to generate text responses that seem like they were written by a human. But ChatGPT’s responses won’t be exactly like Gemini’s.

The same, but different.

GPT-4o is built to integrate with Microsoft products, and it also functions on its own. Gemini 1.5 Pro is designed for Google's products.

Both models offer free and subscription versions. ChatGPT Plus and Gemini Advanced are each $20 per month and give you access to the latest models and more capabilities.

Welcome to the gen AI arms race that kicked off with the arrival of ChatGPT in late 2022. Startups like Anthropic, as well as tech giants including Google and Microsoft, are regularly updating their chatbots, while also in some cases teasing advancements in video, audio and gaming as they vie for market share. (See our reviews of those products, as well as advice and news, at our new AI Atlas hub.) 

And just as you may prefer the taste of one cola over the other, it’s really up to you and your needs and preferences as to which generative AI model you like best. (And, of course, the branding and marketing efforts of each platform will also play a role.)

Here’s a closer look at how GPT-4o and Gemini 1.5 Pro stack up.

Context windows

Last week, Google announced that Gemini 1.5 Pro is expanding to a 1 million token context window, with promises to double to 2 million tokens later this year. (It launched with a 128,000-token context window in February.)

GPT-4o and the earlier GPT-4, on the other hand, have context windows of 128,000 tokens.

What does that mean?

The context window is the span of text a language model can consider when generating a response, sort of like its memory. The larger the context window, the more it can remember from prior conversations, or the more words, video, audio or lines of code it can ingest on your behalf. (It’s under the hood of the model, as opposed to the user interface windows in which you type and receive responses.)

So Gemini has a much larger capacity at this point.
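
To make the gap concrete, here is a minimal Python sketch that checks whether a document would fit in each model's context window. The window sizes are the figures cited in this article. The four-characters-per-token ratio is only a rough rule of thumb for English text and is an assumption here, as are the helper names `estimate_tokens` and `fits`; real token counts depend on each model's own tokenizer.

```python
# Context window sizes, in tokens, as reported in this article.
CONTEXT_WINDOWS = {
    "GPT-4o": 128_000,
    "Gemini 1.5 Pro": 1_000_000,
}

# Assumption: roughly 4 characters per token for English text.
# Actual counts vary by model and tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate based on character count."""
    return max(1, len(text) // CHARS_PER_TOKEN)


def fits(text: str, model: str) -> bool:
    """Report whether the estimated token count fits the model's window."""
    return estimate_tokens(text) <= CONTEXT_WINDOWS[model]


# A stand-in document of 1,000,000 characters (about 250,000 estimated tokens).
document = "word " * 200_000

for model in CONTEXT_WINDOWS:
    print(model, "fits" if fits(document, model) else "does not fit")
```

Under these assumptions, the stand-in document is too large for a 128,000-token window but fits comfortably within 1 million tokens.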

But when it comes to parameters…

Neither OpenAI nor Google has much to say about parameters.

What are they?

First a quick refresher on tokens: Large language models break up queries into tokens in order to process them and provide answers. Tokens can be as short as one character and as long as a word. So in the example, “Hello, reader,” one token might be “hello,” and the other, “reader.” (Remember the model is looking for patterns to predict what will come next.)
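
As a toy illustration of that splitting step, the Python sketch below breaks a sentence into word and punctuation pieces. This is not how GPT-4o or Gemini actually tokenize text; production models use learned subword vocabularies, and the function name `toy_tokenize` is made up for this example.

```python
import re


def toy_tokenize(text: str) -> list[str]:
    """Split text into lowercase words and single punctuation marks.

    A deliberately simple stand-in for a real tokenizer, which would
    use a learned subword vocabulary instead of a regular expression.
    """
    return re.findall(r"\w+|[^\w\s]", text.lower())


print(toy_tokenize("Hello, reader"))  # ['hello', ',', 'reader']
```

Note that even this simple version treats the comma as its own token, which is one reason token counts rarely match word counts.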


Parameters determine the model’s ability to process these tokens and to generate text accurately.

You can also think of parameters like neurons in your brain. The more neurons you have, the more complex your thoughts can be. The same is true of parameters.

A spokesperson said Google hasn’t publicly disclosed the number of parameters in its models. Estimates range from 1.6 trillion to 175 trillion parameters.

It wasn’t immediately clear how many parameters GPT-4o uses, but in her announcement, OpenAI CTO Mira Murati said the model “brings GPT-4-level intelligence to everything.” GPT-4, which came out in March 2023, reportedly uses 1.8 trillion parameters to process queries.

Therefore, we can’t make an apples-to-apples comparison here, but it’s fair to say both models have a lot of neurons for complex thoughts.

Information access

In Khan’s review of Gemini, he noted that its connection to the internet should give it an advantage over GPT-3.5 — the language model in ChatGPT’s free version at the time — since it can pull up more up-to-date information.

That’s important, because language models have knowledge cutoffs. That is, their training data only includes information up to a certain point in time. For GPT-4o, the knowledge cutoff is October 2023. For Gemini, it’s “early 2023.”

However, in addition to its tool being trained on more-recent data, OpenAI has signed deals with social platform Reddit and media company News Corp to pull in more up-to-date content. And so any advantage may be moot now.

Languages

GPT-4o will be available in 50 languages. Gemini 1.5 Pro is available in 35.

But given Google’s 18-year history with Google Translate, it potentially has a lot more data to train its models in multilingual capabilities.

Interfaces

One last similarity: Both models recently introduced functionality to become more conversational.

For GPT-4o, that includes a new interface that allows you to talk to the chatbot or to share live video footage. (It even uses the familiar phrase “Hey, ChatGPT.”)

You can interrupt the model, and the model can even pick up on your emotions.

For its part, Google has now introduced Gemini Live, which allows you to converse with Gemini. You can interrupt Gemini Live, too.

Editors’ note: CNET used an AI engine to help create several dozen stories, which are labeled accordingly. The note you’re reading is attached to articles that deal substantively with the topic of AI but are created entirely by our expert editors and writers. For more, see our AI policy.




