Visualizing the process, price, and physical limitations of AI compute

Tracking AI Compute Bottlenecks

Memory bandwidth, interconnect, packaging, and power — where physics gates AI compute.

An NVIDIA H100 peaks at 1,979 TFLOPS of dense FP8 compute. During decode at batch = 1, a 70B model uses roughly 3% of that; the other 97% of the time, the tensor cores sit idle, waiting on memory. This site is about that gap.
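The gap falls out of a back-of-envelope roofline calculation: batch-1 decode must stream every weight from HBM once per token, so bandwidth, not compute, sets the token rate. The sketch below uses H100-class figures (1,979 dense FP8 TFLOPS, ~3.35 TB/s HBM3) and FP8 weights; these are illustrative assumptions, and the exact utilization number shifts with precision, bandwidth, and overlap assumptions.

```python
# Back-of-envelope batch-1 decode utilization.
# All hardware figures are assumed H100-class numbers, not measurements.

PEAK_TFLOPS = 1979        # assumed dense FP8 peak, TFLOPS
HBM_TBPS = 3.35           # assumed HBM3 bandwidth, TB/s
PARAMS = 70e9             # 70B-parameter model
BYTES_PER_PARAM = 1       # FP8 weights

# At batch = 1, every weight is read once per generated token,
# so memory bandwidth caps tokens per second.
bytes_per_token = PARAMS * BYTES_PER_PARAM
tokens_per_s = HBM_TBPS * 1e12 / bytes_per_token

# Each token costs roughly 2 FLOPs per parameter (multiply + add).
flops_per_token = 2 * PARAMS
achieved_tflops = tokens_per_s * flops_per_token / 1e12

utilization = achieved_tflops / PEAK_TFLOPS
print(f"{tokens_per_s:.0f} tok/s, {achieved_tflops:.1f} TFLOPS achieved, "
      f"{utilization:.2%} of peak")
```

Under these assumptions, achieved throughput lands at a few TFLOPS out of nearly two thousand: the arithmetic intensity of streaming 1 byte per 2 FLOPs is far below what the tensor cores need to stay busy.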

Choose your path