What is Generative AI?
Generative AI begins with a foundation model—a deep learning model that serves as the basis for many different types of generative AI applications. The most common foundation models today are large language models (LLMs), created for text generation applications, but there are also foundation models for image generation, video generation, and sound and music generation—as well as multimodal foundation models that can support several kinds of content generation.
To create a foundation model, practitioners train a deep learning algorithm on huge volumes of raw, unstructured, unlabeled data—e.g., terabytes of data culled from the internet or some other huge data source. During training, the algorithm performs and evaluates millions of ‘fill in the blank’ exercises, trying to predict the next element in a sequence—e.g., the next word in a sentence, the next element in an image, the next command in a line of code—and continually adjusting itself to minimize the difference between its predictions and the actual data (or ‘correct’ result).
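To make the 'fill in the blank' objective concrete, here is a minimal, illustrative sketch in Python (PyTorch). A tiny character-level model and a toy corpus stand in for a real LLM and web-scale training data; the model size, learning rate, and step count are arbitrary choices for illustration, not what any production foundation model uses.

```python
# Minimal sketch of the self-supervised "fill in the blank" objective behind
# foundation-model pretraining: predict the next token, measure the gap between
# the prediction and the actual data, and adjust the parameters to shrink it.
import torch
import torch.nn as nn

corpus = "generative ai begins with a foundation model "
vocab = sorted(set(corpus))
stoi = {ch: i for i, ch in enumerate(vocab)}
data = torch.tensor([stoi[ch] for ch in corpus])

class TinyLM(nn.Module):
    def __init__(self, vocab_size, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.proj = nn.Linear(dim, vocab_size)

    def forward(self, tokens):
        # Map each token to a vector, then score every candidate next token.
        return self.proj(self.embed(tokens))

model = TinyLM(len(vocab))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

inputs, targets = data[:-1], data[1:]   # each position must predict the next character
for step in range(200):
    logits = model(inputs)
    loss = loss_fn(logits, targets)     # difference between predictions and the 'correct' result
    optimizer.zero_grad()
    loss.backward()                     # work out how each parameter should change
    optimizer.step()                    # adjust parameters to reduce the error
```

A real foundation model follows the same loop, just with a transformer architecture, billions of parameters, and trillions of training tokens in place of this toy setup.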
The result of this training is a neural network whose parameters—encoded representations of the entities, patterns and relationships in the data—enable it to generate content autonomously in response to inputs, or prompts.
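At inference time, those trained parameters are used to extend a prompt one predicted token at a time. The sketch below illustrates this with the Hugging Face transformers library and GPT-2, a small, publicly available pretrained model used here only as a convenient stand-in; the prompt and generation settings are arbitrary.

```python
# Illustrative only: prompting a small pretrained model (GPT-2) and letting its
# trained parameters predict the tokens that follow, turning a prompt into new text.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator(
    "A foundation model is",   # the prompt
    max_new_tokens=30,         # how much new text to generate
    do_sample=True,            # sample instead of always taking the single top prediction
)
print(result[0]["generated_text"])
```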
This training process is compute-intensive, time-consuming and expensive: it requires thousands of clustered graphics processing units (GPUs) and weeks of processing, all of which costs millions of dollars. Open-source foundation model projects, such as Meta’s Llama-2, enable gen AI developers to avoid this step and its costs by building on weights that have already been trained.
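As a rough sketch of what "building on already-trained weights" looks like in practice, the snippet below loads a released Llama-2 checkpoint through the Hugging Face transformers library rather than training anything. This assumes Meta's Llama-2 license has been accepted on Hugging Face and an access token is configured; the repository name shown is one commonly used hosting of the weights and may differ in other setups, and loading a 7B-parameter model also requires substantial memory.

```python
# Sketch of reusing released foundation-model weights instead of pretraining
# from scratch. Assumes gated access to the Llama-2 checkpoint has been granted
# and a Hugging Face token is configured in the environment.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "meta-llama/Llama-2-7b-hf"   # gated repository: requires accepting Meta's license
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

prompt = "Open-source foundation models let developers"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

From this starting point, developers typically fine-tune or prompt the pretrained model for their own application rather than repeating the multimillion-dollar pretraining step.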