NVIDIA Financial and Market Overview

NVIDIA New TensorRT-LLM Software More Than Doubles Inference Performance

TensorRT-LLM Supercharges Hopper Performance — software optimizations more than double leading inference performance.

- NVIDIA developed TensorRT-LLM, an open-source software library that enables customers to more than double the inference performance of their GPUs.
- TensorRT-LLM on H100 GPUs provides up to an 8X performance speedup on GPT-J 6B inference compared to prior-generation A100 GPUs running without the software, along with a 5.3X reduction in TCO and a 5.6X reduction in energy costs.
- TensorRT-LLM delivers 4.6X higher Llama 2 inference performance on H100 versus A100.
- With TensorRT-LLM for Windows, LLMs and generative AI applications can run up to 4X faster locally on PCs and workstations powered by NVIDIA GeForce RTX and NVIDIA RTX GPUs.
- TensorRT-LLM for data centers is now publicly available; TensorRT-LLM for Windows is in beta.

[Bar charts: relative inference performance. GPT-J 6B — A100 = 1X, H100 (August) = 4X, H100 + TensorRT-LLM = 8X. Llama 2 — A100 = 1X, H100 (August) = 2.6X, H100 + TensorRT-LLM = 4.6X.]

Footnote: Text summarization, variable input/output length, CNN/DailyMail dataset | A100 FP16 PyTorch eager mode | H100 FP8 | H100 FP8 with TensorRT-LLM and in-flight batching.