Generative AI

Will AI Outgrow Its Resource Shoes?


The Gist

  • Industry bottleneck. LLM expansion challenges intensify with increased data and compute requirements.
  • Hardware evolution. Custom AI chips are crucial for next-generation model training and efficiency.
  • Synthetic shift. Utilizing synthetic data is becoming a key strategy in training more capable AI models.

Each new generation of large language models consumes a staggering amount of resources.

Meta, for instance, trained its new Llama 3 models with about 10 times more data and 100 times more compute than Llama 2. Amid a chip shortage, it used two 24,000-GPU clusters, with each chip running around the price of a luxury car. It employed so much data in its AI work that it considered buying the publishing house Simon & Schuster to find more.

Afterward, even its executives wondered aloud if the pace was sustainable.

“It is unclear whether we need to continue scaling or whether we need more innovation on post-training,” Ahmad Al-Dahle, Meta’s VP of generative AI, told me in an interview. “Is the infrastructure investment unsustainable over the long run? I don’t think we know.”


Meta Faces Limits in LLM Expansion

For Meta and its counterparts running large language models, the question of whether throwing more data, compute and energy at the problem will keep delivering gains looms large. Since LLMs entered the popular imagination, the surest path to exponential improvement has seemed to be combining those ingredients and letting the magic happen. But with the ceiling on all three potentially in sight, the industry will need newer techniques, more efficient training and custom-built hardware to progress. Without advances in these areas, LLMs may indeed hit a wall.

[Image: A high-speed train in motion with a blurred railway station visible in the background. Credit: den-belitsky on Adobe Stock Photos]


New Architectures to Revolutionize LLM Scaling

The path of continued scale probably starts with better methods to train and run LLMs, some of which are already in motion. “We are starting to see new kinds of architectures that are going to change how these models scale in the future,” Swami Sivasubramanian, VP of AI and data at Amazon Web Services, told me in an interview.

Sivasubramanian said researchers at Stanford and elsewhere are getting models to learn faster with the same amount of data, and to run inference 10 times more cheaply. “I’m actually very optimistic about the future when it comes to novel model architectures, which has the potential to disrupt the space,” he said.
