In-Memory Computing and Analog Chips for AI

As modern AI models push digital computers to their limits, engineers look at ways to improve AI computations in terms of speed and efficiency. One of the paths is to go back to the roots of computing

Conrad Gray
Jul 12, 2023

Modern AI models are huge, with parameter counts measured in the billions. All those parameters need to be stored somewhere, and that takes a lot of memory.

For instance, the open-source LLaMA model with 65 billion parameters demands 120GB of RAM. Even after quantizing the original 16-bit weights down to 4 bits, it still requires 38.5GB of memory. Keep in mind that these figures are for a model with 65 billion parameters; GPT-3, with its 175 billion parameters, requires even more. For comparison, Nvidia's top consumer GPU, the RTX 4090, has 24GB of memory, while their data centre-focused product, the H100, offers 80GB. It's also worth noting that the memory requirements for training AI models are typically several times larger than the size of the model itself. This is because training involves storing intermediate activations, which typically add 3-4 times more memory than the parameters themselves (excluding embeddings).
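To see where numbers like these come from, here is a minimal back-of-the-envelope sketch of the memory needed just to hold the weights. The quoted figures above include runtime overhead (activations, buffers, and so on), so they differ somewhat from these raw lower bounds; the calculation itself is just parameters times bits per parameter.

```python
# Back-of-the-envelope memory needed just to hold a model's weights:
# bytes = number_of_parameters * bits_per_parameter / 8

def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Gigabytes (1e9 bytes) needed to store the weights alone."""
    return num_params * bits_per_param / 8 / 1e9

for name, params in [("LLaMA-65B", 65e9), ("GPT-3", 175e9)]:
    for bits in (16, 4):
        print(f"{name} at {bits}-bit: {weight_memory_gb(params, bits):.1f} GB")

# Output:
# LLaMA-65B at 16-bit: 130.0 GB
# LLaMA-65B at 4-bit: 32.5 GB
# GPT-3 at 16-bit: 350.0 GB
# GPT-3 at 4-bit: 87.5 GB
```

Even at 4-bit precision, the weights alone dwarf the memory of a consumer GPU, and training multiplies these requirements several times over.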

Due to their size, large neural networks cannot fit into the local memory of CPUs or GPUs and have to be fetched from external memory such as RAM. However, moving such vast amounts of data between memory and processors pushes current computer architectures to their limits.

The Memory Wall

The digital computers we use today, from the smartphones in our pockets to powerful supercomputers, follow the von Neumann architecture. In this design, memory (which holds both the data and the program that processes it), I/O, and processing units (like CPUs or GPUs) are separate components, connected with a bus that shuttles information between them.
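The cost of all that shuttling can be made concrete with a quick roofline-style estimate. In a matrix-vector multiply, the dominant operation when a neural network generates one token at a time, each weight fetched from memory is used for only two arithmetic operations, so the processor's speed is capped by memory bandwidth rather than by its compute units. A minimal sketch, assuming rough, approximate figures for an H100-class GPU:

```python
# Roofline-style estimate: why matrix-vector multiplies (the core of
# token-by-token inference) are limited by memory, not arithmetic.
# Hardware numbers below are rough, assumed figures for an H100-class GPU.

def matvec_arithmetic_intensity(n: int, bytes_per_weight: int = 2) -> float:
    """FLOPs performed per byte of weights moved for an n x n matvec."""
    flops = 2 * n * n                        # one multiply + one add per weight
    bytes_moved = n * n * bytes_per_weight   # each weight read once from memory
    return flops / bytes_moved

PEAK_COMPUTE = 1e15       # ~1000 TFLOP/s of 16-bit compute (assumed)
PEAK_BANDWIDTH = 3.35e12  # ~3.35 TB/s of memory bandwidth (assumed)

intensity = matvec_arithmetic_intensity(8192)   # = 1.0 FLOP/byte at 16-bit
attainable = min(PEAK_COMPUTE, PEAK_BANDWIDTH * intensity)
print(f"attainable: {attainable / 1e12:.1f} TFLOP/s of a 1000 TFLOP/s peak")
# -> attainable: 3.4 TFLOP/s: the processor idles, waiting on memory traffic
```

Under these assumptions, the chip can use well under 1% of its arithmetic peak. This gap between how fast processors can compute and how fast memory can feed them is what's known as the memory wall.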
