Nvidia's 1000x Performance Boost Claim Verified

Maxim Saplin - Jun 4 - Dev Community

Nvidia's keynote at the recent Computex was full of bold marketing and messaging, bordering on complete BS.

CEO Math Lesson

The "CEO Math" lesson with the "The more you buy, the more you save" conclusion has reminded me of another bold claim (and play with the numbers) from earlier this year.

At Blackwell's intro, one of the slides stated there was a 1000x boost in the compute power of Nvidia GPUs. Yet many noticed the comparison was not apples-to-apples: FP16 performance of the older generations was compared against the smaller FP8 and FP4 data types introduced in the newer hardware. Apparently, lower-precision computation is faster. The graph would be much nicer if the FP16 line continued, like this:

Blackwell FP16 performance

It is great that the new hardware has acceleration for smaller data types. It follows the trend of quantized language models: trading a slight degradation in LLM quality for smaller size and faster inference. Still, presenting the figures the way they were presented:

  • not explaining the difference in data types,
  • hiding the baseline and breaking consistency,
  • not highlighting the downside of decreased precision (see the sketch after this list)...

... that seems like a sketchy move worthy of the "How to Lie with Statistics" book.
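
To put a rough number on that precision downside, here is a minimal Python sketch comparing the worst-case relative rounding error of the formats involved. It assumes the commonly used bit layouts (FP16 = E5M10, FP8 = E4M3, FP4 = E2M1); exact maximum values differ per vendor, so only the rounding step is shown:

```python
# Back-of-the-envelope look at what dropping mantissa bits costs in precision.
# Bit layouts assumed: FP16 = E5M10, FP8 = E4M3, FP4 = E2M1.
formats = {
    "FP16 (E5M10)": 10,  # 10 explicit mantissa bits
    "FP8  (E4M3) ": 3,   # 3 explicit mantissa bits
    "FP4  (E2M1) ": 1,   # 1 explicit mantissa bit
}

for name, mantissa_bits in formats.items():
    # Worst-case relative rounding error for normalized values is ~2^-(m+1)
    rel_err = 2 ** -(mantissa_bits + 1)
    print(f"{name}: ~{rel_err:.3%} max relative rounding error")
```

Roughly 0.05% for FP16 versus ~6% for FP8 and ~25% for FP4 - that is the trade quantization makes for the extra speed.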

How to Lie with Statistics

Anyways... To come up with the above FP16 figures for Hopper and Blackwell, I found the specs for the products that list 4,000 teraFLOPS of FP8 and 20,000 teraFLOPS of FP4.

They are:

  • H100 SXM: 3,958 teraFLOPS FP8 and 1,979 teraFLOPS FP16

H100 SXM

  • GB200 NVL2: a dual-GPU system with 40 petaFLOPS FP4 and 10 petaFLOPS FP16 (i.e. 5,000 teraFLOPS of FP16 per GPU)

GB200 NVL2

The improvement in performance is still impressive, yet 1000x is way nicer than a mere 263x ;)
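
For completeness, here is the arithmetic behind that 263x as a small Python snippet. The ~19 FP16 teraFLOPS Pascal baseline is my assumption, taken from the figure used on the keynote slide; the Hopper/Blackwell numbers are from the specs quoted above.

```python
# Recomputing the generational speedup with a consistent data type.
pascal_fp16 = 19        # TFLOPS, assumed FP16 baseline from the keynote slide
blackwell_fp4 = 20_000  # TFLOPS, the number behind the "1000x" claim
blackwell_fp16 = 5_000  # TFLOPS per GPU (GB200 NVL2: 10 PFLOPS / 2)

print(f"FP4  vs FP16 baseline: {blackwell_fp4 / pascal_fp16:.0f}x")   # ~1053x
print(f"FP16 vs FP16 baseline: {blackwell_fp16 / pascal_fp16:.0f}x")  # ~263x
```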
