Intel Announces Gaudi 3 Accelerator For Generative AI


AI and generative AI dominate virtually every enterprise IT strategy discussion these days. Long a solution in search of a problem, AI has started to find real use cases that will likely unlock countless opportunities: businesses driving efficiencies, healthcare providers delivering better care and governments serving their constituencies more effectively. While this may sound like typical hyperbole from AI industry pundits, it's also true.

Because of this promise, organizations are eager to jumpstart their AI journeys. In response, every IT solutions vendor has oriented every product toward AI, whether it's a good fit or not. Further, as the market capitalizations of companies, led by Nvidia, have risen with apparently no ceiling, the message to other silicon vendors such as AMD and Intel is clear: deliver value in the AI space or get left behind.

In December 2023, AMD jumped on the bandwagon with the launch of its MI300-series AI accelerators. Because of insatiable market demand, the company can't seem to produce its MI300X GPU fast enough. Next up was Intel, with the recent launch of its Gaudi 3 AI accelerator. There are a lot of questions to answer about Gaudi 3, including its performance and fit for enterprise IT. However, perhaps the biggest question is whether it can compete with the H100/H200 from Nvidia and the MI300X from AMD.

AI Is Complex And Costly

While virtually every enterprise IT organization is looking to accelerate the adoption of AI, very few have found success. Even though generative AI has dominated the conversation for a year and a half, only about 10% of organizations deployed it in 2023. Why? Because it's hard. In a study conducted by cnvrg.io (an Intel company), nearly half (46%) of the AI professionals surveyed cited infrastructure as a barrier to putting large language models into production.

Part of the infrastructure challenge is tied directly to what we've experienced with Nvidia's meteoric rise: product availability. When I speak to customers, it's not unusual to hear about 12-month wait times for Nvidia H100s, for enterprise IT and cloud providers alike. By the time the product finally arrives, some of these organizations may find that much of its value has eroded as next-generation parts reach the market.

Another part of the infrastructure challenge is complexity. Designing, deploying and managing an AI environment is unlike other infrastructure tasks. If the hardware and software stack is not tightly coupled, it can (and will) lead to sunk costs from failed projects.

The last big challenge we see with AI environments is around data. Preparing and training organization-specific and -relevant data while adhering to local data sovereignty and privacy requirements is difficult.

Because of these challenges, Moor Insights & Strategy sees large training projects happening in the cloud today. The infrastructure required for very large-scale generative AI workloads (hundreds of billions of parameters) is too expensive, and has too long a lead time to secure and deploy, for most companies. And if even the largest cloud providers are waiting months for GPUs, enterprise IT organizations can expect to wait longer still. This challenge has fueled a new class of cloud providers such as CoreWeave, Lambda Labs, Gemini and others. It has also led established cloud players such as Vultr to aggressively expand their offerings.

These challenges are important to lay out as a backdrop because they frame Intel's value proposition for Gaudi 3.

Gaudi 3 Overview And Claims

As mentioned, Gaudi 3 is an AI accelerator, not a GPU. It is designed from the ground up for one thing—to make AI tasks of training, tuning and inference run faster. From my perspective, Gaudi 3 is an accelerator that seems to have been built with the enterprise datacenter in mind.

When looking at Gaudi 3 architecturally, its 64 fifth-generation tensor processor cores (TPCs) combine with eight matrix math engines (MMEs) to deliver efficient performance. These cores are fed by 128GB of high-bandwidth memory with 3.7 TB/s of throughput and 96MB of SRAM with 12 TB/s of throughput. Finally, on the networking side, Gaudi 3 ships with 24 ports of 200Gb Ethernet on board.
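To put those figures in context, here is a minimal back-of-the-envelope sketch that converts the per-card numbers quoted above into aggregate bandwidth and compares the networking to the memory subsystem. The ratios are simple arithmetic on Intel's published specs, not measured results.

```python
# Back-of-the-envelope math using the per-card Gaudi 3 figures quoted above.
eth_ports = 24          # on-board Ethernet ports per accelerator
eth_gbps = 200          # Gb/s per port
hbm_tbps = 3.7          # TB/s of HBM throughput
sram_tbps = 12.0        # TB/s of on-die SRAM throughput

net_terabits = eth_ports * eth_gbps / 1000   # 4.8 Tb/s of on-card networking
net_terabytes = net_terabits / 8             # ~0.6 TB/s, converting bits to bytes

print(f"Aggregate Ethernet: {net_terabits:.1f} Tb/s (~{net_terabytes:.2f} TB/s)")
print(f"HBM bandwidth is ~{hbm_tbps / net_terabytes:.0f}x the network bandwidth")
print(f"SRAM bandwidth is ~{sram_tbps / hbm_tbps:.1f}x the HBM bandwidth")
```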

Gaudi 3 will ship in three form factors: an OCP Accelerator Module (OAM)-compliant card, a PCIe card and a universal baseboard that carries eight accelerators.

For those unfamiliar with the universal baseboard, it is a board roughly the size of a motherboard that sits in a server alongside the host motherboard, connected via the backplane, and carries the eight Gaudi accelerators. That adds up to 512 TPCs, 64 matrix math engines and 192 ports of 200Gb Ethernet to deliver the absolute best raw performance for training and inference.
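A quick sketch of where those baseboard totals come from, simply multiplying the per-card specs quoted earlier by eight; no interconnect overhead or scaling loss is modeled.

```python
# Deriving the universal-baseboard totals from the per-card Gaudi 3 specs above.
cards = 8

totals = {
    "TPCs": 64 * cards,            # 512 tensor processor cores
    "MMEs": 8 * cards,             # 64 matrix math engines
    "HBM (GB)": 128 * cards,       # 1,024 GB of HBM across the baseboard
    "200GbE ports": 24 * cards,    # 192 Ethernet ports
}

for name, value in totals.items():
    print(f"{name}: {value}")
```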

What does all of this mean for AI workloads? Regardless of how it's deployed, Gaudi 3 can ingest a lot of data and keep those tensor cores fed, so training jobs, and inferencing too, can run very fast. Based on its internal testing, Intel claims that training a Llama 2 model with 13 billion parameters happens up to 1.7x faster than Nvidia's best projections for the H100. Intel says it can train GPT-3 (175 billion parameters) up to 1.4x faster.

The company's claims for inference are equally impressive, showing Llama 2 7B inferencing slightly faster (1.1x) than on Nvidia, while Llama 2 70B infers up to 1.7x faster. Most impressively, the Falcon 180B model inferences up to 4x faster.

I never take company-produced performance benchmarks as gospel. Tweaks and tunes are made in the lab to produce the most favorable results, and those results are usually compared against the worst possible numbers that can be found or created for the competition. Keep in mind, this is not a statement about Intel or Nvidia specifically, but about how every company runs benchmarks.

So take the exact numbers in these claims with a grain of salt. However, also take these claims in the spirit in which they are meant: Gaudi 3 is competitive. It’s a huge step up in performance relative to Gaudi 2, and it will be attractive to enterprise IT organizations that want to set up their AI environments within budget, with broad OEM support and without requiring a nuclear power plant.

Can Gaudi 3 Compete With Established Giants?

In short, yes. But this requires a little context. Gaudi 3 is not the H100, H200 or MI300. It is an accelerator designed for enterprise AI. As generative AI finds more and more utility in the enterprise, Gaudi 3 is a platform that can perform training and inferencing very quickly at an affordable price point and within a lower power budget. Those are the key metrics to consider: performance per dollar and performance per watt.

While performance-per-dollar metrics are not yet available because pricing hasn't been published, I think we can safely say that Gaudi 3 will not be priced at the $30,000 or so that Nvidia is charging for its GPUs. I will bet on this one.

What is available is power efficiency: how much work can be done per card, per watt. And the numbers are compelling. Gaudi 3 can achieve up to 2.3x the power efficiency of the H100. Looking through the numbers in more detail, it seems that the longer the output sequence, the better the power efficiency. Consider the Falcon numbers Intel shared: the 2.3x efficiency advantage is based on input and output lengths of 2,048 tokens each, but with an input length of 2,048 tokens and an output length of only 128 tokens, that advantage drops to 1.1x.

These are critical factors when considering where Intel looks for Gaudi 3 to land—the enterprise datacenter. Absolute best performance is always desirable. However, at what cost? TCO matters to CIOs and IT ops executives. And with AI, it matters perhaps more than ever.

So, when considering all of these factors, I do believe that Gaudi 3 is going to be competitive. Not only that, but I think Intel will experience the same gold rush that Nvidia and AMD have experienced: they won’t be able to build these parts fast enough to satisfy the demand in the market.

Final Thoughts

The promise of AI is enormous—but AI is complex and costly. Enterprise IT craves help in solving these challenges so it can start realizing the benefits of generative AI.

With Gaudi 3, Intel has delivered a platform that can help with both of these challenges. Further, its vast hardware, software and channel partner ecosystem will help simplify the complexity challenges.

While this is a win for Intel, the big potential winner is enterprise IT. I can hardly wait to see how it all plays out when Gaudi 3 ships in the second half of the year.


