Merging Memory and Compute
New Approaches Promise Power Savings But Bring Confusion
Near-memory and in-memory compute are techniques for reducing the power that computing consumes, especially for AI. But the terms mean different things to different companies, and understanding the differences is important to understanding how some AI chips work.
In high-performance computing (HPC) and artificial intelligence (AI), data movement has surpassed computation as the main energy consumer. Moving a 64-bit word from off-chip DRAM to a CPU can consume a thousand times the energy of performing a compute operation on that word. For AI workloads, traditional von Neumann architectures can dedicate as much as 90% of their energy to fetching data and delivering it to the compute engine. Reducing energy consumption matters both in data centers and in battery-powered edge equipment, so developers are seeking creative new approaches to reducing the number of memory fetches.
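The arithmetic behind those figures can be sketched in a few lines. The per-operation energies below are assumed, order-of-magnitude illustrations (not measurements from any particular chip), chosen only to show how quickly data movement dominates the budget:

```python
# Illustrative energy-budget sketch. The per-op figures are assumptions
# for illustration, not measured values: a compute op costs ~1 pJ, and a
# 64-bit DRAM fetch costs ~1000x that, per the ratio cited in the text.
PJ_PER_OP = 1.0           # assumed energy per compute operation (picojoules)
PJ_PER_DRAM_WORD = 1000.0 # assumed energy per 64-bit word fetched from DRAM

def energy_breakdown(ops: int, dram_words: int) -> dict:
    """Total energy in picojoules, and the fraction spent moving data."""
    compute = ops * PJ_PER_OP
    movement = dram_words * PJ_PER_DRAM_WORD
    total = compute + movement
    return {"compute_pj": compute,
            "movement_pj": movement,
            "movement_fraction": movement / total}

# A workload that fetches just one DRAM word per 100 compute operations
# already spends about 90% of its energy on data movement.
result = energy_breakdown(ops=100, dram_words=1)
```

Under these assumptions, `movement_fraction` comes out near 0.91, consistent with the 90% figure above, and every avoided fetch pays back a thousand compute operations' worth of energy.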
Among these approaches are ideas variously called near-memory and in-memory compute, although neither term has a formal definition. Near-memory compute refers to moving memory closer to the compute function, or vice versa. In-memory compute has many embodiments: adding compute resources alongside a memory array, adding logic or other computation circuits to a memory bit cell, or having the memory element itself perform the computation.
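The last embodiment, where the memory element itself computes, is easiest to see in an analog crossbar such as the resistive arrays discussed below. The sketch here is a simplified model, not any vendor's design: weights are stored as cell conductances, input activations are applied as row voltages, and Ohm's law plus Kirchhoff's current law make each column current a ready-made multiply-accumulate, with no data leaving the array:

```python
# Minimal model (assumed, idealized) of analog in-memory compute in a
# resistive crossbar. Each cell stores a weight as a conductance G_ij;
# applying voltage V_i to row i makes column j sum the currents
# I_j = sum_i V_i * G_ij, i.e., a dot product computed in the array.
def crossbar_mac(conductances, voltages):
    """Column currents of the crossbar: one weighted sum per column."""
    num_cols = len(conductances[0])
    return [sum(v * row[j] for v, row in zip(voltages, conductances))
            for j in range(num_cols)]

# 3 inputs x 2 outputs: conductances play the role of a weight matrix.
G = [[0.5, 1.0],
     [1.0, 0.0],
     [0.25, 2.0]]
V = [1.0, 2.0, 4.0]
currents = crossbar_mac(G, V)  # [3.5, 9.0]
```

Real devices add the complications noted later in this article: conductances drift with process variability, and the analog column currents must be digitized, which is where digital/analog conversion overhead enters.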
Many of these techniques started in universities or other research labs, but they are beginning to show up in commercial products. Ambient Scientific and Upmem are approaching production of near-memory chips. Samsung is pursuing a near-memory proof-of-concept, and Untether has a near-memory chip in production. Mythic is sampling a flash-based in-memory chip, and TetraMem has announced an in-memory chip that employs resistive RAM (RRAM). Practical challenges blocking commercial production include differences between DRAM and logic processes, analog-process variability, and digital/analog conversion.