OpenAI Shows Off New GPT-4o Generative AI Model and More ChatGPT Upgrades
OpenAI has introduced its latest generative AI model, GPT-4o, which the company describes as a multimodal upgrade to GPT-4’s abilities as a large language model (LLM). GPT-4o integrates voice, text, and visual data, with the “o” standing for “omni” in reference to its multimodal functionality.
GPT-4o
OpenAI CTO Mira Murati shared the details of GPT-4o in a virtual presentation at what looked like the basement from The Brady Bunch. She explained that, while GPT-4 was trained on images and text, GPT-4o added auditory data to its training regimen, allowing it to understand users more completely and interact with them across multiple media and formats.
Based on tests shared by OpenAI, GPT-4o represents a notable evolution in how AI models interact with people. OpenAI said GPT-4o matches GPT-4 Turbo’s performance on English text and code and does much better in non-English languages. It’s also faster, and its API costs half as much as the GPT-4 Turbo API.
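For developers, trying the new model is largely a matter of swapping the model identifier in an existing request. Below is a minimal sketch using OpenAI’s Python SDK; the prompt is illustrative, and it assumes an API key is already set in the environment.

```python
# Minimal sketch of a GPT-4o text request via OpenAI's Python SDK.
# Assumes the OPENAI_API_KEY environment variable is set; the prompt
# below is purely illustrative.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",  # same request shape as gpt-4-turbo, at half the API cost
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the GPT-4o announcement in one sentence."},
    ],
)

print(response.choices[0].message.content)
```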
“GPT-4o provides GPT-4 intelligence, but it is much faster, and it improves on its capabilities across text, vision, and audio. For the past couple of years, we have been very focused on improving the intelligence of these models. And they have gotten pretty good. But this is the first time that we are really making a huge step forward when it comes to the ease of use,” Murati said during the presentation. “This is incredibly important because we are looking at the future of interaction between ourselves and the machines. We think that GPT-4o is really shifting the paradigm into the future of collaboration.”
ChatGPT-4o
The company plans to incorporate the new model into ChatGPT, upgrading the chatbot’s abilities and responsiveness through both text and voice. The new model can respond to audio input in an average of 320 milliseconds, essentially the same speed as a human. ChatGPT will converse in a more dynamic way with GPT-4o, allowing users to interrupt the AI and enabling the chatbot to detect emotional nuances in the user’s voice and respond in a tone appropriate to what it hears.
Improved visual data processing thanks to GPT-4o will also make ChatGPT, and other applications running the LLM, faster and more accurate at processing images. Users can ask context-specific questions about the content of images, including getting the AI to read code on a screen or identify a brand based on a product in a photo. These advancements aim to facilitate a more natural and intuitive user experience akin to conversing with a human assistant.
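As a rough sketch of what those context-specific image questions can look like in practice, OpenAI’s chat completions API lets applications mix text and image URLs in a single message; the URL and question below are placeholders for illustration.

```python
# Sketch: asking GPT-4o a context-specific question about an image.
# The image URL and question are illustrative placeholders; assumes
# OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What brand is the product in this photo?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/product.jpg"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```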
In addition to these advancements, OpenAI announced the launch of a desktop version of ChatGPT. The desktop app is only for macOS for now, but it resembles the way Microsoft has been incorporating its Copilot into Windows 11. A Windows ChatGPT desktop app is in development, though there is no mention of a dedicated button like the one Microsoft gave Copilot. Users can converse with ChatGPT on the computer and even share screenshots. Voice Mode, the audio conversational option available in the ChatGPT web client, is also included in the desktop app. In other words, Apple users may never bother with whatever generative AI assistant comes native with future versions of Apple computers. The online version of ChatGPT isn’t being left out, either, as it’s getting a facelift in appearance and user interface.