NVIDIA Investor Presentation Deck

Released by

NVIDIA

Creator

NVIDIA

Category

Technology

Published

May 2020

Transcriptions

#1 NVIDIA Investor Call, May 28, 2020

#2 CHALLENGES: ACCELERATING BIG AND SMALL
Computing For Training AI: 3000X Higher Compute Required to Train Largest Models Since Volta
[Chart: Petaflop/s-Days (log scale, 1E-03 to 1E+03) by year, 2012 to 2020. Models plotted: AlexNet (2012), ResNet, BERT, GPT-2, Megatron-GPT2, Megatron-BERT, Turing NLG. Source: OpenAI and NVIDIA Analysis]
AI Interactions Per Day: Every AI Powered Interaction Needs Varying Amount of Compute
Millions of Interactions | Billions of photos tagged | Billions of Searches | 100s of Billions Events For Cyber Threat | 10s Billions of Ecom Recommendations | Thousands Ads / Person | Millions of Medical Scans | 100s of Millions Fin Txn For Fraud

#3 MODERN CLOUD DATA CENTER
Diverse Applications | Scale-Up & Scale-Out Workloads | Insatiable Demand

#4 REIMAGINING THE GPU
Three Breakthroughs to Fuel the Next Era of Modern Accelerated Data Centers
20X: A GIANT PERFORMANCE LEAP
UNIFIED AI TRAINING AND INFERENCE ACCELERATION
1-50: SCALABILITY FOR THE ELASTIC DATACENTER

#5 ANNOUNCING NVIDIA A100
GREATEST GENERATIONAL LEAP - 20X VOLTA
FP32 TRAINING: Peak 312 TFLOPS, 20X vs Volta
INT8 INFERENCE: Peak 1,248 TOPS, 20X vs Volta
FP64 HPC: Peak 19.5 TFLOPS, 2.5X vs Volta
MULTI INSTANCE GPU: 7X GPUs

#6 ANNOUNCING NVIDIA A100
GREATEST GENERATIONAL LEAP - 20X VOLTA
54 BILLION XTORS | 3RD GEN TENSOR CORES | SPARSITY ACCELERATION | MULTI INSTANCE GPU | 3RD GEN NVLINK & NVSWITCH
54B xtors | 826mm² | TSMC 7N | 40GB Samsung HBM2 | 3rd gen Tensor Core GPU | 600 GB/s NVLink

#7 NEW TF32 TENSOR CORES
Range of FP32 and Precision of FP16 | Input in FP32 and Accumulation in FP32 | No Code Change Speed-up for Training
FP32: 8 bits (range), 23 bits (precision)
TENSOR FLOAT 32: 8 bits (range), 10 bits (precision)
FP16: 5 bits (range), 10 bits (precision)
BFLOAT16: 8 bits (range), 7 bits (precision)

#8 NEW TENSOR CORE ACCELERATION FOR SPARSITY
Optimized For Sparse AI Tensor Ops | 2X Faster Execution | Supported on TF32, FP16, BFLOAT16, INT8 and INT4
[Diagram labels: Dense Matrix, Sparse Matrix, 2X Effective, A100 Sparsity Optimized Tensor Core]

#9 NEW MULTI-INSTANCE GPU FOR ELASTIC GPU COMPUTING
7x Higher Throughput of V100 with Simultaneous Instances per GPU

#10 UNIFIED AI TRAINING AND INFERENCE ACCELERATION
BERT Training: V100 1x | A100 6x
BERT Inference: T4 0.6x | V100 1x | A100 (7 MIGs) 7x
BERT Pre-Training Throughput using PyTorch including (2/3) Phase 1 and (1/3) Phase 2 | Phase 1 Seq Len = 128, Phase 2 Seq Len = 512 | V100: DGX-1 Server with 8xV100 using FP32 precision | A100: DGX A100 Server with 8xA100 using TF32 precision
BERT Large Inference | T4, V100: TRT 7.1, Precision = FP16, Batch Size = 256 | A100 MIG: Pre-production TRT, Batch Size = 94, Precision = INT8 with Sparsity

#11 ANNOUNCING NVIDIA DGX A100
3RD GENERATION INTEGRATED AI SYSTEM
5 PetaFLOPS of Performance in a Single Node
Unified System for End-to-End Data Science and AI
Fully Accelerated Stacks - Spark 3.0, RAPIDS, TensorFlow, PyTorch, Triton
Elastic Scale-Up or Scale-Out Computing
High Scalability with Mellanox Networking
Peak: INT8 10 PetaOPS | FP16 5 PFLOPS | TF32 2.5 PFLOPS | FP64 156 TFLOPS
9x Mellanox ConnectX-6 VPI 200Gb/s Network Interface
Dual 64-core AMD Rome CPU
1TB RAM
8x NVIDIA A100 GPUs
6x NVIDIA NVSwitches
4.8 TB/s Bi-Directional Bandwidth
600 GB/s GPU-to-GPU Bandwidth
15TB Gen4 NVME SSD

#12 ANNOUNCING NVIDIA A100 LIGHTHOUSE CUSTOMERS
Elastic Data Center Accelerator Choice of Industry Leaders
[Logos, grouped on the slide under CLOUD and SYSTEMS: Alibaba Cloud, Google Cloud, Atos, Cray (a Hewlett Packard Enterprise company), Hewlett Packard Enterprise, Microsoft Azure, AWS, Inspur, Dell Technologies, Lenovo, Baidu AI Cloud, Oracle Cloud Infrastructure, Fujitsu, QCT, Tencent Cloud, GIGABYTE, Supermicro]

#13 TODAY'S AI DATA CENTER
50 DGX-1 Systems for AI training
600 CPU Systems for AI Inference
$11M | 25 Racks | 630 kW

#14 DGX A100 AI
5 DGX A100 Systems for AI Training and Inference
$1M | 1 Rack | 28 kW
1/10th COST | 1/20th POWER

#15 ANNOUNCING NVIDIA DGX A100 SUPERPOD
140 DGX A100 Systems (1,120 A100)
170 Mellanox Quantum 200G InfiniBand Switches
280 TB/s Network Fabric - 15km of Optical Cable
4 PB of All-Flash Networked Storage
700 PFLOPS of AI Performance
Built in under 3 Weeks

#16 NVIDIA EXPANDS SATURNV
Before Expansion: 1,800 DGX Systems, 1.8 ExaFLOPS
Adding 4 DGX SuperPODs: 560 DGX A100 = 2.8 ExaFLOPS
4.6 ExaFLOPS Total Capacity

#17 SMART EVERYTHING REVOLUTION
ALWAYS-ON | INSTANT SENSE-INFER-ACT | DISTANT | TRILLIONS

#18 ANNOUNCING NVIDIA EGX A100 WITH MELLANOX CX6 DX
NVIDIA Mellanox ConnectX-6 DX: Dual 100 Gb/s Ethernet or InfiniBand | Line-speed TLS/IPSec Crypto Engine | Time Triggered Transmission Tech for Telco (5T for 5G) | ASAP² SR-IOV and VirtualIO Offload
NVIDIA Ampere GPU: 3rd generation Tensor Core | New Security Engine for Confidential AI | Secure, Authenticated Boot

#19 NVIDIA EGX ECOSYSTEM
Platforms: METROPOLIS, CLARA, AERIAL, JARVIS, ISAAC
Categories: 5G & CloudRAN, SECURITY & NETWORKING, INFRASTRUCTURE, SYSTEMS, CLOUD, CONVERSATIONAL AI, INTELLIGENT VIDEO ANALYTICS, ROBOTICS, MEDICAL, INDUSTRIAL, INDUSTRY LEADERS, SERVICE
[Logos: Altran, Capgemini, Cumulus, IBM, Atos, Ericsson, Fortinet, Red Hat, Mavenir, Dell Technologies, Baidu, Juniper, Adve, VMware, Hewlett Packard Enterprise, Microsoft, Viasat, Nuage Networks, WIND, Lenovo, gnani.ai, AiFi, Micron, Intelligent Voice, DENSO, DeepVision, Chooch, DawnLight, Musashi, Kensho, S&P Global, FANUC, Malong, SoftBank, GE Healthcare, Samsung, Foxconn, IRON, UCSF, Seagate, voca.ai, SAFR, Komatsu, Whiteboard, TSMC, Walmart]

#20 GTC 2020 ANNOUNCEMENTS
Mellanox Technologies | Data Center-Scale Computing | Omniverse | RTX Server
Merlin Recommender System | Spark, RAPIDS, Magnum IO | JARVIS Conversational AI | TensorFlow, cuDNN, Magnum IO
NVIDIA AI | ONNX | TensorRT | Triton
DGX A100 | Powered by A100 | EGX A100 | ISAAC, and BMW

#21 [No text on slide]

#22 ANNOUNCING NVIDIA MERLIN
DEEP RECOMMENDER APPLICATION FRAMEWORK
1TB Ads Dataset
CPUs: ETL 2 HR | TRAINING 12 Days
GPUs: ETL 3 Min | TRAINING 16 Min
NVIDIA Merlin: NVTabular (RAPIDS, Magnum IO) | HugeCTR (cuDNN, Magnum IO) | TensorRT, Triton Inference Server
[Diagram labels: Data Lake 100's PB, USERS EMBEDDING, ITEMS EMBEDDING, CANDIDATES GENERATION, RANKING, O(Billions), O(1000), O(10), Recommendation, User Query]

#23 ANNOUNCING NVIDIA JARVIS - MULTIMODAL CONVERSATIONAL AI SERVICES FRAMEWORK
[Diagram labels: PRE-TRAINED MODEL, NVIDIA GPU CLOUD, RE-TRAIN, Transfer Learning, NeMo, Service Maker, NVIDIA AI TOOLKIT, Multi-Speaker Transcription, NLU, Vision, Speech, Language Model, NVIDIA JARVIS, Dialog Manager, Chatbot, Decoder, Gesture Recognition, Acoustic Model, NLU & Recommenders, Speech Synthesis, Look to Talk, Feature Extraction, Voice Encoder, TRITON INFERENCE SERVER]
JESSICA: What will you have ready for Wednesday?
DOUGLAS: I expect to have early designs of the packaging.
JESSICA: Great.
Join Early Access Program: developer.nvidia.com/nvidia-jarvis

#24 ANNOUNCING NVIDIA ACCELERATES SPARK 3.0
Spark 3.0: DataFrame, Spark SQL | RAPIDS Accelerator for Apache Spark | cuML, cuGRAPH | RAPIDS | CUDA-X | APACHE ARROW | MAGNUM IO
ETL, O(Peta): Apache Spark, RAPIDS, Magnum IO
TRAINING, O(Tera): TensorFlow, PyTorch, cuDNN, Magnum IO
INFERENCE: TensorRT, Triton Inference Server
[Diagram labels: Data Lake 100's PB, USERS EMBEDDING, ITEMS EMBEDDING, CANDIDATES GENERATION, RANKING, O(Billions), O(1000), O(10), Recommendation, User Query]
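
The system-level figures on slides 5, 11 and 13 to 16 can be cross-checked with simple arithmetic. The Python sketch below is not part of the deck; it only re-derives the quoted totals from the per-unit numbers on the slides. Pairing the 312 TFLOPS figure on slide 5 (labeled FP32 training) with the TF32 row on slide 11 is an assumption of this sketch, and all function and variable names are hypothetical.

```python
"""Cross-check of figures quoted in the deck (all inputs copied from the slides)."""

GPUS_PER_DGX = 8  # slide 11: "8x NVIDIA A100 GPUs"

# Slide 5, per-A100 peak. Units: TFLOPS (TF32, FP64) or TOPS (INT8).
# Assumption: the "FP32 training" column corresponds to the TF32 row on slide 11.
PER_GPU_PEAK = {"TF32": 312.0, "INT8": 1248.0, "FP64": 19.5}

# Slide 11, per-DGX A100 peak, converted to the same units as above.
DGX_QUOTED = {"TF32": 2500.0, "INT8": 10000.0, "FP64": 156.0}


def dgx_peak(precision: str) -> float:
    """Peak of one DGX A100, taken as 8 x the per-GPU peak."""
    return GPUS_PER_DGX * PER_GPU_PEAK[precision]


def relative_gap(precision: str) -> float:
    """Relative difference between the derived and the quoted DGX figure."""
    quoted = DGX_QUOTED[precision]
    return abs(dgx_peak(precision) - quoted) / quoted


# Slides 11, 15 and 16, in PFLOPS (integers avoid floating-point noise).
DGX_AI_PFLOPS = 5        # slide 11: 5 PetaFLOPS in a single node
SUPERPOD_SYSTEMS = 140   # slide 15: 140 DGX A100 systems per SuperPOD
SATURNV_BEFORE = 1800    # slide 16: 1.8 ExaFLOPS before expansion

superpod_pflops = SUPERPOD_SYSTEMS * DGX_AI_PFLOPS
added_systems = 4 * SUPERPOD_SYSTEMS
saturnv_total_pflops = SATURNV_BEFORE + added_systems * DGX_AI_PFLOPS

# Slides 13 and 14: 50 DGX-1 + 600 CPU systems vs. 5 DGX A100 systems.
cost_ratio = 11 / 1      # $11M vs. $1M
power_ratio = 630 / 28   # 630 kW vs. 28 kW

if __name__ == "__main__":
    for p in PER_GPU_PEAK:
        print(f"{p}: derived {dgx_peak(p):g}, quoted {DGX_QUOTED[p]:g}, "
              f"gap {relative_gap(p):.2%}")
    print(f"SuperPOD: {superpod_pflops} PFLOPS")
    print(f"SaturnV: {added_systems} systems added, "
          f"{saturnv_total_pflops} PFLOPS total")
    print(f"Cost ratio {cost_ratio:g}x, power ratio {power_ratio:g}x")
```

Under these assumptions the derived DGX A100 totals land within 0.2% of the quoted ones, the SuperPOD and SaturnV totals match exactly (700 PFLOPS and 4.6 ExaFLOPS), and the cost and power ratios come out at 11x and 22.5x, which the deck rounds to 1/10th and 1/20th.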
