Q3 FY24 Earnings Summary
NVIDIA®
New TensorRT-LLM Software More Than Doubles Inference Performance
TensorRT-LLM Supercharges Hopper Performance
Software optimizations double leading performance
8X Increase in GPT-J 6B Inference Performance
4.6X Higher Llama2 Inference Performance
NVIDIA developed TensorRT-LLM, an open-source software library that enables customers to more than double the inference performance of their GPUs.
TensorRT-LLM on H100 GPUs provides up to an 8X performance speedup compared to prior-generation A100 GPUs running GPT-J 6B without the software.
5.3X reduction in TCO and 5.6X reduction in energy costs
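A minimal sketch of the arithmetic behind claims like the TCO figure above, assuming cost per inference is throughput divided into total cost; the `cost_ratio` value used in the example is hypothetical and not from the presentation:

```python
def cost_reduction(speedup: float, cost_ratio: float = 1.0) -> float:
    """Factor by which cost per inference falls.

    speedup: relative inference throughput vs. the baseline system.
    cost_ratio: relative system cost (price or energy) vs. the baseline;
                1.0 assumes cost parity with the baseline.
    """
    return speedup / cost_ratio

# Illustrative only: an 8X throughput gain on a system costing ~1.5X the
# baseline still cuts cost per inference by roughly 5.3X.
print(round(cost_reduction(8.0, 1.5), 1))  # → 5.3
```

The same function applies to energy: substituting a relative power draw for `cost_ratio` gives the reduction in energy cost per inference.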
With TensorRT-LLM for Windows, LLMs and generative AI applications can run up to 4X faster locally on PCs and workstations powered by NVIDIA GeForce RTX and NVIDIA RTX GPUs.
TensorRT-LLM for data centers now publicly available; TensorRT-LLM for Windows in beta
[Bar charts: GPT-J 6B inference performance for A100 (1X), H100 August, and H100 + TensorRT-LLM (8X); Llama 2 inference performance for A100 (1X), H100 August (2.6X), and H100 + TensorRT-LLM (4.6X)]
Text summarization, variable input/output length, CNN / DailyMail dataset | A100 FP16 PyTorch eager mode / H100 FP8 | H100 FP8, TensorRT-LLM, in-flight batching