Q3 FY24 Earnings Summary
NVIDIA®
New TensorRT-LLM Software More Than Doubles Inference Performance
TensorRT-LLM Supercharges Hopper Performance
Software optimizations double leading performance
8X Increase in GPT-J 6B Inference Performance
4.6X Higher Llama2 Inference Performance
NVIDIA developed TensorRT-LLM, an open-source software library that enables customers to more than double the inference performance of their GPUs.
TensorRT-LLM on H100 GPUs provides up to an 8X performance speedup compared to prior-generation A100 GPUs running GPT-J 6B without the software.
5.3X reduction in TCO and 5.6X reduction in energy costs
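A minimal sketch of the arithmetic behind claims like the TCO figure above, assuming cost per inference is throughput divided into total cost; the `cost_ratio` value used in the example is hypothetical and not from the presentation:

```python
def cost_reduction(speedup: float, cost_ratio: float = 1.0) -> float:
    """Factor by which cost per inference falls.

    speedup: relative inference throughput vs. the baseline system.
    cost_ratio: relative system cost (price or energy) vs. the baseline;
                1.0 assumes cost parity with the baseline.
    """
    return speedup / cost_ratio

# Illustrative only: an 8X throughput gain on a system costing ~1.5X the
# baseline still cuts cost per inference by roughly 5.3X.
print(round(cost_reduction(8.0, 1.5), 1))  # → 5.3
```

The same function applies to energy: substituting a relative power draw for `cost_ratio` gives the reduction in energy cost per inference.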
With TensorRT-LLM for Windows, LLMs and generative AI applications can run up to 4X faster locally on PCs and workstations powered by NVIDIA GeForce RTX and NVIDIA RTX GPUs.
TensorRT-LLM for data centers now publicly available; TensorRT-LLM for Windows in beta
[Bar charts: GPT-J 6B inference performance for A100 (1X), H100 August, and H100 + TensorRT-LLM (8X); Llama 2 inference performance for A100 (1X), H100 August (2.6X), and H100 + TensorRT-LLM (4.6X)]
Text summarization, variable input/output length, CNN / DailyMail dataset | A100 FP16 PyTorch eager mode / H100 FP8 | H100 FP8, TensorRT-LLM, in-flight batching