NVIDIA Q2 FY2021 Financial Summary
NVIDIA A100 SETS ALL 8 PER CHIP AI PERFORMANCE RECORDS
Speedup Over V100
[Figure: Bar chart of relative per-chip speedup over V100 (y-axis, 0x to 3x) for commercially available solutions (Huawei Ascend, TPUv3, V100, A100) on eight MLPerf benchmarks: Image Classification (ResNet-50 v1.5), NLP (BERT), Object Detection, Heavy Weight (Mask R-CNN), Reinforcement Learning (MiniGo), Object Detection, Light Weight (SSD), Translation, Recurrent (GNMT), Translation, Non-recurrent (Transformer), and Recommendation (DLRM). V100 is the 1.0X baseline on every benchmark; the remaining bar labels range from 0.7X to 2.5X. X = no result submitted.]
Per-chip performance was arrived at by comparing performance at the same scale when possible and normalizing it to a single chip. 8-chip scale: V100 and A100 for Mask R-CNN, MiniGo, SSD, GNMT, and Transformer. 16-chip scale: V100, A100, and TPUv3 for ResNet-50 v1.5 and BERT. 512-chip scale: Huawei Ascend 910 for ResNet-50. DLRM compares 8 A100 chips with 16 V100 chips.
Submission IDs: ResNet-50 v1.5: 0.7-3, 0.7-1, 0.7-44, 0.7-18, 0.7-21, 0.7-15; BERT: 0.7-1, 0.7-45, 0.7-22; Mask R-CNN: 0.7-40, 0.7-19; MiniGo: 0.7-41, 0.7-20; SSD: 0.7-40, 0.7-19; GNMT: 0.7-40, 0.7-19; Transformer: 0.7-40, 0.7-19; DLRM: 0.7-43, 0.7-17. MLPerf name and logo are trademarks. See www.mlperf.org for more information.
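The footnote does not spell out the normalization formula. A minimal sketch, assuming each result is a time-to-train and that training time scales inversely with chip count (linear scaling, an assumption not stated on the slide), compares chip-time products. At equal scale this reduces to a plain ratio of training times; for DLRM (8 A100 vs. 16 V100) the chip counts enter the ratio. The function name `per_chip_speedup` and the sample times are hypothetical and are not MLPerf results.

```python
def per_chip_speedup(t_ref: float, n_ref: int, t_sys: float, n_sys: int) -> float:
    """Per-chip speedup of a system over a reference system.

    Hypothetical helper. Assumes time-to-train scales inversely with chip
    count, so chip-time (time * chips) is used as the per-chip cost.
    """
    if min(t_ref, t_sys) <= 0 or min(n_ref, n_sys) <= 0:
        raise ValueError("times and chip counts must be positive")
    # Lower chip-time means more work done per chip, hence the ratio ref/sys.
    return (t_ref * n_ref) / (t_sys * n_sys)


# Hypothetical: both systems at 8-chip scale, the new one trains in half the time.
print(per_chip_speedup(t_ref=60.0, n_ref=8, t_sys=30.0, n_sys=8))  # 2.0
# Hypothetical: 16 reference chips vs. 8 new chips with equal training time.
print(per_chip_speedup(t_ref=40.0, n_ref=16, t_sys=40.0, n_sys=8))  # 2.0
```

If real scaling is sub-linear, this normalization favors the system run on fewer chips, which is one reason the slide compares at the same scale whenever possible.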