NVIDIA Financial and Market Overview
New TensorRT-LLM Software
More Than Doubles Inference
Performance
TensorRT-LLM Supercharges Hopper Performance
Software optimizations double leading performance
8X Increase in GPT-J 6B Inference Performance
4.6X Higher Llama2 Inference Performance
NVIDIA developed TensorRT-LLM, an open-source software
library that enables customers to more than double the
inference performance of their GPUs
TensorRT-LLM on H100 GPUs delivers up to an 8X performance
speedup on GPT-J 6B compared to prior-generation A100 GPUs
without the software
5.3X reduction in TCO and 5.6X reduction in energy costs
With TensorRT-LLM for Windows, LLMs and generative AI
applications can run up to 4X faster locally on PCs and
workstations powered by NVIDIA GeForce RTX and NVIDIA
RTX GPUs
TensorRT-LLM for data centers now publicly available;
TensorRT-LLM for Windows in beta
[Charts: relative inference performance, normalized to A100 = 1X
GPT-J 6B: A100 1X | H100 (August) 4X | H100 + TensorRT-LLM 8X
Llama 2: A100 1X | H100 (August) 2.6X | H100 + TensorRT-LLM 4.6X]
Text summarization, variable input/output length, CNN/DailyMail dataset | A100 FP16 PyTorch eager mode / H100 FP8 | H100 FP8, TensorRT-LLM, in-flight batching
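The chart figures above separate the hardware gain (A100 to H100) from the software gain (H100 with vs. without TensorRT-LLM). A minimal back-of-envelope check, using only the normalized numbers quoted on this slide, shows the software alone contributes roughly a 2X speedup on GPT-J 6B and about 1.8X on Llama 2:

```python
# Relative inference throughput from the slide's charts,
# normalized to the A100 baseline = 1X.
gptj = {"A100": 1.0, "H100 (Aug)": 4.0, "H100 + TensorRT-LLM": 8.0}
llama2 = {"A100": 1.0, "H100 (Aug)": 2.6, "H100 + TensorRT-LLM": 4.6}

def software_gain(results):
    """Speedup attributable to TensorRT-LLM alone:
    H100 with the library vs. the same H100 without it."""
    return results["H100 + TensorRT-LLM"] / results["H100 (Aug)"]

print(f"GPT-J 6B: {software_gain(gptj):.2f}X from software")   # 2.00X
print(f"Llama 2:  {software_gain(llama2):.2f}X from software")  # 1.77X
```

This is just arithmetic over the slide's published ratios, not a benchmark; actual speedups depend on model, batch size, and sequence lengths, as the footnote's test conditions indicate.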