In-Memory Computing and Analog Chips for AI
As modern AI models push digital computers to their limits, engineers are looking for ways to make AI computations faster and more efficient. One of those paths leads back to the roots of computing.
Modern AI models are huge, with parameter counts measured in the billions. All those parameters need to be stored somewhere, and that takes a lot of memory.
For instance, the open-source LLaMA model with 65 billion parameters demands 120GB of RAM. Even after converting the original 16-bit weights to 4-bit numbers, it still requires 38.5GB of memory. And keep in mind these figures are for a 65-billion-parameter model; GPT-3, with its 175 billion parameters, requires even more. For comparison, Nvidia's top consumer GPU, the RTX 4090, has 24GB of memory, while their data centre-focused H100 offers 80GB. Training demands even more memory still, because intermediate activations must also be stored, which typically adds 3-4 times more memory than the parameters themselves (excluding embeddings).
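The back-of-the-envelope arithmetic behind these figures is simple: multiply the parameter count by the bits per parameter. A minimal sketch (note that real-world figures differ somewhat from this idealized estimate, since quantization formats store extra metadata and runtimes add their own overhead):

```python
# Rough memory footprint of model weights alone, ignoring
# activations, optimizer state, and framework overhead.
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Return the size of the weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

# A 65-billion-parameter model at 16-bit vs 4-bit weights
print(weight_memory_gb(65e9, 16))  # 130.0 GB
print(weight_memory_gb(65e9, 4))   # 32.5 GB
```

Either way, the result dwarfs the 24GB of an RTX 4090 and exceeds or strains the 80GB of an H100.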
Due to their size, large neural networks cannot fit into the local memory of CPUs or GPUs and must be streamed in from external memory such as RAM. Moving such vast amounts of data between memory and processors pushes current computer architectures to their limits.
The Memory Wall
The digital computers we use today, from the smartphones in our pockets to powerful supercomputers, follow the von Neumann architecture. In this architecture, memory (which holds both the data and the program that processes it), I/O, and processing units (like CPUs or GPUs) are separate components connected by a bus that exchanges information between them.