The Latest Generative AI Model From OpenAI That We Are Still Waiting For

May 7, 2024


JAKARTA – GPT-4 is widely regarded as the best generative AI tool currently on the market, but that doesn’t mean we aren’t looking to the future. OpenAI CEO Sam Altman regularly drops hints about GPT-5, and it looks like we will see a new and improved AI model soon.

While there is no specific launch date for GPT-5, many think the public may see it soon. Whenever it arrives, there are some key features we expect it to include.

What Is GPT-5 From OpenAI?

GPT-5 is the highly anticipated successor to OpenAI’s GPT-4 model and is expected to be the most powerful generative model on the market. While there is no official launch date yet, there are indications that the model may be released in the summer of 2024. Very few details about the model are known today, but a few things can be said with some certainty:

OpenAI has filed a trademark application for the name with the United States Patent and Trademark Office.

Several OpenAI executives have discussed or hinted at the possible capabilities of this model.

OpenAI CEO Sam Altman repeatedly mentioned this model during a YouTube interview with Lex Fridman in March 2024.

All of this points to an exciting reality: GPT-5 is coming soon. Much about it remains speculation for now, but there are some things we hope for, and are fairly confident will be present, in this model. Here are some of them:

More Multimodal

One of the most interesting improvements to the GPT family of AI models is multimodality. Multimodality is the ability of an AI model to process not only text but also other types of input, such as images, audio, and video.

Multimodality will be an important milestone for the GPT model family going forward. GPT-4 is already proficient at handling image input and output. Extending that to audio and video processing is the next milestone for OpenAI, and GPT-5 is the right place to start.

Google has made serious progress on this kind of multimodality with its Gemini AI model, and it would be strange if OpenAI did not respond. On his Unconfuse Me podcast [PDF transcript], Bill Gates asked OpenAI CEO Sam Altman which milestones he saw for the GPT series in the next two years. The answer? Video processing.

So, for GPT-5, the hope is that it can work with video: accepting uploaded videos as prompts, creating videos directly, editing videos from text prompts, extracting segments from videos, and finding particular scenes in large video files. We hope it can do the same with audio files. It is a big ask, yes. However, given how fast AI development is moving, it is a very reasonable hope.

Bigger And More Efficient Context Window

Despite being among the most sophisticated AI models on the market, the GPT family has a fairly small context window. For example, Anthropic’s Claude 3 has a context window of 200,000 tokens, while Google’s Gemini can process up to 1 million tokens (128,000 for standard use).

By comparison, GPT-4 has a relatively small context window of about 128,000 tokens, of which only about 32,000 tokens or fewer are realistically available in interfaces like ChatGPT.

With advanced multimodality in the picture, a larger context window is almost inevitable. A two-fold or four-fold increase might be enough, but we hope to see an increase of about ten times. This would allow GPT-5 to process much more information at once. However, a larger context window is not necessarily a better one. So, rather than just a bigger window, we would also like to see more efficient context processing.
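To put those multiples in concrete terms, here is a minimal Python sketch of the arithmetic. It starts from the roughly 128,000-token figure quoted above for GPT-4. The helper names and the 4,000-token reply budget are assumptions made for illustration, not anything OpenAI has announced.

```python
# Context window figure quoted in the article for GPT-4 (approximate).
GPT4_CONTEXT_TOKENS = 128_000


def scaled_window(base_tokens: int, factor: int) -> int:
    """Return the context window size after a hypothetical N-fold increase."""
    return base_tokens * factor


def fits(document_tokens: int, window_tokens: int, reply_budget: int = 4_000) -> bool:
    """Check whether a document plus a reply budget fits in the window.

    The reply budget is an illustrative assumption, not a documented limit.
    """
    return document_tokens + reply_budget <= window_tokens


# The three scenarios mentioned in the text: two-, four-, and ten-fold.
for factor in (2, 4, 10):
    size = scaled_window(GPT4_CONTEXT_TOKENS, factor)
    print(f"{factor}x increase: {size:,} tokens")
```

Under these assumptions, a 200,000-token document would not fit in the current window but would fit after a two-fold increase. A ten-fold increase would put the window above the 1 million tokens quoted for Gemini.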

GPT Agent

One of the most interesting possibilities of the GPT-5 release is the arrival of GPT agents. Although the term “game changer” may be overused in the context of AI, GPT agents really would change the game in every practical sense. But how, exactly?

Currently, AI models like GPT-4 can help you complete tasks. They can write emails, make jokes, solve mathematical problems, or write blog posts for you. However, they can only perform individual tasks and cannot complete the whole set of related tasks that make up your work.

Suppose you are a web developer. Your work involves a lot: design, writing code, problem solving, and more. Right now, you can only hand some of these tasks to an AI model, one step at a time. You might ask GPT-4 to write code for the home page, then for the contact page, then for the About page, and so on. You have to repeat this request over and over for each task. And there are tasks that the model cannot complete at all.

This step-by-step process of prompting an AI model for individual sub-tasks is time-consuming and inefficient. In this scenario, you, the web developer, are the human agent responsible for coordinating and prompting a single AI model, one task at a time, until the whole set of related tasks is complete.

GPT agents promise specialized expert bots, coordinated by (hopefully) GPT-5, that can think for themselves and handle every part of a complex task autonomously. The emphasis is on “self-thinking” and “autonomous.”

So, if GPT-5 comes with GPT agents, you could ask it to “Build a portfolio site for Maxwell Timothy” rather than just “Write code for the home page.” In theory, GPT-5 would then prompt itself, calling on expert AI agents to handle the various sub-tasks needed to build the website.

It might call on one GPT to collect information about Maxwell Timothy from the web, another agent to write code for the different pages, another to generate and optimize images, and yet another to deploy the site, all without repeated instructions from a human.
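The coordination pattern described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how OpenAI builds or plans to build agents. Every name here (`research_agent`, `plan`, `run_goal`, and so on) is hypothetical, and each "agent" is a plain function standing in for what would be a call to a specialized model.

```python
from typing import Callable, Dict, List


# Hypothetical specialist agents. Each is a stub that returns a description
# of its work; a real agent would call a model or an external tool here.
def research_agent(goal: str) -> str:
    return f"collected web information for: {goal}"


def code_agent(goal: str) -> str:
    return f"wrote page code for: {goal}"


def image_agent(goal: str) -> str:
    return f"generated and optimized images for: {goal}"


def deploy_agent(goal: str) -> str:
    return f"deployed site for: {goal}"


AGENTS: Dict[str, Callable[[str], str]] = {
    "research": research_agent,
    "code": code_agent,
    "images": image_agent,
    "deploy": deploy_agent,
}


def plan(goal: str) -> List[str]:
    """Hypothetical planner. A fixed list stands in for the step where the
    coordinating model would break the goal into sub-tasks itself."""
    return ["research", "code", "images", "deploy"]


def run_goal(goal: str) -> Dict[str, str]:
    """Run every planned sub-task in order, with no human prompting between steps."""
    results: Dict[str, str] = {}
    for step in plan(goal):
        results[step] = AGENTS[step](goal)
    return results


if __name__ == "__main__":
    for step, outcome in run_goal("Build a portfolio site for Maxwell Timothy").items():
        print(f"{step}: {outcome}")
```

The point of the sketch is the shape of the loop: the human supplies one high-level goal, and the coordinator, rather than the human, decides which specialist handles each sub-task.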

Less Hallucination

Although OpenAI has made progress in dealing with hallucinations in its AI models, the true test for GPT-5 will be its ability to address persistent hallucination problems, which have severely hampered AI adoption in critical domains such as health care, aviation, and cybersecurity.

These are all areas that would benefit greatly from deeper AI involvement but have so far avoided significant adoption.

To be clear, hallucinations in this context are situations where an AI model produces information that sounds reasonable but is completely made up, and presents it with a high level of confidence.

Imagine a scenario in which GPT-4 is integrated into a diagnostic system to analyze patient symptoms and medical reports. A hallucination could lead the AI to confidently give an incorrect diagnosis or recommend a potentially harmful course of treatment based on invented facts and faulty logic. In the medical field, the consequences of such errors can be fatal.

Similar reservations apply to other critical areas, such as aviation, nuclear energy, maritime operations, and cybersecurity. We don’t expect GPT-5 to fully solve the hallucination problem, but we do expect it to significantly reduce the likelihood of such incidents.

As we look forward to the official launch of this highly anticipated AI model, one thing is for sure: GPT-5 has the potential to redefine the limits of what is possible with artificial intelligence, bringing a new era of collaboration and innovation between humans and machines.
