The Great GPU Shortage (again) - H+ Weekly - Issue #427

This week - AI generates dangerous recipes; 25 years of RoboCup and embryonic stem cells; gene therapies for eternal youth; and more!

Aug 11, 2023

We are now in the middle of a generative AI boom. Everyone is asking not if, but how, they can incorporate generative AI into their services. This surge in demand for computing power brings an equal surge in demand for the hardware required to run all this AI. Just as everyone is scrambling to add AI to their products, AI providers are scrambling to get the latest and most powerful GPUs.

Training and running large language models requires an enormous amount of computing power. We're talking about hundreds, if not thousands, of GPUs working together to train and deliver the latest LLMs. To illustrate the numbers we're dealing with, AI experts estimate that GPT-4 was trained on somewhere between 10,000 and 25,000 Nvidia A100 GPUs. Meta has 21,000 A100s, Tesla has 7,000, and Stability AI has 5,000.

These are the figures for Nvidia A100 GPUs, based on the Ampere architecture. Nvidia has now introduced a new model, the H100, based on the new Hopper architecture. The H100 has become the new leader in GPU performance, boasting a speed increase of 50-400% compared to the A100 (depending on the task), while costing 1.5 times as much. This performance-to-price ratio makes the H100 very desirable to AI companies.
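To put the performance-to-price claim in concrete terms, here is a minimal sketch that turns the quoted figures into performance per dollar relative to the A100. It assumes that a "speed increase of 50-400%" means 1.5x to 5x the A100's throughput and that the H100 costs 1.5 times as much; the function name is purely illustrative.

```python
def relative_perf_per_dollar(speed_increase_pct: float, price_ratio: float) -> float:
    """Performance per dollar relative to the baseline GPU (baseline = 1.0)."""
    speedup = 1.0 + speed_increase_pct / 100.0  # e.g. +50% means 1.5x
    return speedup / price_ratio


H100_PRICE_RATIO = 1.5  # H100 price relative to the A100, as quoted above

low = relative_perf_per_dollar(50, H100_PRICE_RATIO)
high = relative_perf_per_dollar(400, H100_PRICE_RATIO)
print(f"H100 performance per dollar vs A100: {low:.2f}x to {high:.2f}x")
```

Under these assumptions, the H100 roughly breaks even with the A100 on the least favourable tasks and delivers more than three times the performance per dollar on the most favourable ones.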

Inflection is in the process of building the world's largest AI cluster, which will feature 22,000 NVIDIA H100 Tensor Core GPUs. Inflection predicts that if this supercomputer were to be listed on the Top 500 list of the most powerful supercomputers, it could potentially secure the second, if not the first, position, even though it's optimized for AI calculations.

OpenAI might need 25,000-30,000 H100s to train their next-generation large language models and they will be provided through their partnership with Azure. Meta is exploring the possibility of acquiring 25,000 H100s. Major cloud providers like Azure, Google Cloud, AWS, and Oracle might each seek 30,000 units. Lambda, CoreWeave, and other private cloud providers may collectively require 100,000 units. Smaller companies such as Anthropic, Helsing, Mistral, Character, and Elon Musk's new AI lab, x.ai, are aiming to acquire 10,000 H100s each.

This adds up to a minimum of 320,000 H100s on order and awaiting delivery. At an approximate price of $35,000 per unit, this equates to roughly $11 billion worth of GPUs. Nvidia is riding this wave high, reaching a valuation of $1 trillion and joining the prestigious $1T club, alongside Apple, Microsoft, Saudi Aramco, Alphabet, and Amazon.
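For readers who want to check the arithmetic, here is a rough tally of the figures quoted above. It is a back-of-the-envelope sketch: where the text gives a range, the lower bound is used, and whether Inflection's 22,000-GPU cluster belongs in the total is an assumption, so the sketch reports the cost both with and without it.

```python
# Estimated H100 demand, using the figures quoted in the text.
# Where the text gives a range, the lower bound is used.
orders = {
    "OpenAI (via Azure)": 25_000,
    "Meta": 25_000,
    "Azure, Google Cloud, AWS, Oracle": 4 * 30_000,
    "Lambda, CoreWeave, other private clouds": 100_000,
    "Anthropic, Helsing, Mistral, Character, x.ai": 5 * 10_000,
}
INFLECTION_CLUSTER = 22_000   # listed separately; inclusion is an assumption
PRICE_PER_UNIT_USD = 35_000   # approximate unit price quoted in the text

total_units = sum(orders.values())
total_cost = total_units * PRICE_PER_UNIT_USD
cost_with_inflection = (total_units + INFLECTION_CLUSTER) * PRICE_PER_UNIT_USD

print(f"Units on order: {total_units:,}")
print(f"Estimated cost: ${total_cost / 1e9:.1f} billion")
print(f"Including Inflection: ${cost_with_inflection / 1e9:.2f} billion")
```

The tally comes to 320,000 units and about $11.2 billion, or just under $12 billion if Inflection's cluster is counted as well.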

However, meeting such high demand is a challenge. As Andrej Karpathy tweeted, “Who’s getting how many H100s and when is top gossip of the valley rn”. Getting hold of new, more powerful, or even just additional GPUs is becoming a bottleneck for many AI companies.

OpenAI CEO Sam Altman admitted that the company is heavily limited by GPU availability and that this affects its short-term plans, such as fine-tuning, 32k context windows, and multimodality. Altman even joked that he would love it if people used ChatGPT less because the company doesn’t have enough GPUs. Elon Musk joked that getting GPUs is harder than getting drugs. The situation has escalated to the point where CoreWeave, an AI cloud service provider collaborating with Inflection on the supercomputer mentioned earlier, secured a $2.3 billion debt facility using Nvidia's H100 GPUs as collateral.

Nvidia is taking steps to address the demand. Moore's Law is Dead reports that Nvidia has paused production of RTX 4000 cards and shifted its focus to producing as many H100s as possible. When asked about shortages, Nvidia's executives stated that the bottleneck is not in chip production itself but in the packaging process, according to VentureBeat. Nvidia's CFO mentioned during an earnings call that the supply situation will improve in the second half of the year.

Given the shortage of Nvidia GPUs, what about alternatives? While AI companies could consider turning to AMD for high-end GPUs, Nvidia's GPUs remain the preferred choice in the AI industry due to the combination of hardware performance and software optimization. Most popular machine learning frameworks, such as PyTorch and TensorFlow, are optimized to work with Nvidia's CUDA platform. MosaicML showed that it is possible to run ML models on AMD's hardware and software stack with no modifications, but this combination still lags behind Nvidia's offering in terms of performance.

This is just a high-level overview of the current situation in the ultra-high-end GPU market for AI. For more detailed information, I recommend checking out the post from GPU Utils, which delves much deeper into this topic than I can cover here.

Becoming a paid subscriber now is the best way to support the newsletter.

If you enjoy and find value in what I write about, feel free to hit the like button and share your thoughts in the comments. Share the newsletter with someone who will enjoy it, too. That will help the newsletter grow and reach more people.

You can also buy me a coffee.

🦾 More than a human

Gene Therapies for Eternal Youth
Imagine a future in which you just stop ageing. You take a pill or receive an injection in your arm, and you remain at your current age. This is what some companies and scientists are working on right now. By using partial reprogramming with Yamanaka factors to rejuvenate cells, they promise mass-produced and affordable gene therapies that will stop ageing and make us live longer lives.

Scientists May Have Found Mechanism Behind Cognitive Decline in Aging
Scientists at the University of Colorado Anschutz Medical Campus have discovered what they believe to be the central mechanism behind cognitive decline associated with normal ageing. Using mouse models, the researchers found that altering the CaMKII brain protein caused cognitive effects similar to those that occur through normal ageing. This research paves the way for developing drugs and other therapeutic interventions that could influence the expression of the protein and target cognitive decline, including in conditions such as Alzheimer’s.

🧠 Artificial Intelligence

The Economic Case for Generative AI and Foundation Models
Two partners from Andreessen Horowitz explore the economics of traditional AI and why it has typically been difficult for startups using AI as a core differentiator to reach escape velocity. They argue that the new wave of AI startups using generative models can address more markets that do not require absolute correctness (and if correctness is required, like in coding, the system can be guided by the user) while offering services far better than humans at high-value tasks. They also note that we are seeing all sorts of new user behaviours, something we haven’t seen much of before. “The result is more jobs, more economic expansion, and better goods for consumers. This was the case with the microchip and the Internet, and it’ll happen with generative AI, too”, concludes the article.

What Self-Driving Cars Tell Us About AI Risks
Mary L. “Missy” Cummings, an expert in systems automation and now a safety consultant for the US National Highway Traffic Safety Administration, shares 5 lessons the wider AI community should take from the efforts to make cars drive themselves in order to make their own AI systems safe. These lessons come from years of experience in an industry in which a mistake, whether caused by a human, an error in the code, or an unexpected situation, can kill someone.

Meta disbands protein-folding team in shift towards commercial AI
Did you know Meta had a team of researchers working on an AI to predict the shape of proteins, a rival to DeepMind’s AlphaFold? It did, but as the Financial Times reports, the team has been disbanded. According to the report, this move is part of a wider shift in Meta’s strategy (“year of efficiency” as Mark Zuckerberg described it) to focus on more commercially viable products.

AI is ruining the internet
From data scraping forcing online services to become more closed, to the proliferation of misinformation and the increase in scams, the internet has become a more hostile place for humans. As this article argues, all of this is thanks to AI, which can now generate very convincing content and act like a human would online, sometimes making it almost impossible to tell whether you are interacting with a human or with a bot.

Supermarket AI meal planner app suggests recipe that would create chlorine gas
Here is why you should check what a large language model generates. A New Zealand supermarket used an AI to create recipes that creatively use up leftovers during the cost of living crisis. The AI did generate creative recipes, offering customers instructions for deadly chlorine gas, “poison bread sandwiches” and mosquito-repellent roast potatoes. It remains uncertain whether these recipes included garnishes such as fish-shaped crackers, fish-shaped candies, fish-shaped solid waste, fish-shaped dirt or fish-shaped ethylbenzene.

🤖 Robotics

Will Robots Triumph over World Cup Winners by 2050?
Nothing drives innovation like good competition. With this idea in mind, in 1997 Dr Hiroaki Kitano created the first RoboCup - a tournament similar to the football World Cup in which robots, not humans, run after a ball and score goals. This article tells the rich story of the tournament, how it changed over the last 25 years and how it impacted the careers of many roboticists and the robotics industry.

▶️ Towards Legged Locomotion on Steep Planetary Terrain (1:23)
Climbing steep slopes can pose a significant challenge, even for the most advanced four-legged robots. In an effort to address this issue, researchers from ETH Zurich have taught their robot to walk on its knees. This adaptation enables the robot to effectively climb sandy slopes with angles exceeding 15 degrees. The paper describing the robot is available here.

🧬 Biotechnology

After 25 years of hype, embryonic stem cells are still waiting for their moment
In 1998, researchers in Wisconsin successfully isolated stem cells from human embryos. This marked a significant breakthrough, holding the promise of curing a wide array of diseases. However, more than two decades have passed since then, and to date, there have been no treatments on the market utilizing these cells. This article delves into the tumultuous 25 years of stem cell research, a journey fraught with not only technical hurdles but also substantial political and ethical resistance from the public. Nevertheless, it appears that we have moved beyond that stage, and we are now getting closer to a point of broader accessibility to stem cell-based therapies.

Tangents

▶️ The Plan to Build an Island Using Only Electricity (36:17)

Atlas Pro tells a fascinating story of one man’s dream to grow artificial coral reefs, artificial islands, or even entire cities in the middle of the ocean, straight from the sea using only electricity.

H+ Weekly sheds light on the bleeding edge of technology and how advancements in AI, robotics, and biotech can usher in abundance, expand humanity's horizons, and redefine what it means to be human.

A big thank you to my paid subscribers, to my Patrons: whmr, Florian, dux, Eric, Preppikoma and Andrew, and to everyone who bought me a coffee on Ko-Fi. Thank you for the support!

You can follow H+ Weekly on Twitter and on LinkedIn.

Thank you for reading and see you next Friday!
