AMD Investor Day Presentation Deck
ENDNOTES
MI200-01 - World's fastest data center GPU is the AMD Instinct™ MI250X. Calculations conducted by AMD Performance Labs as of Sep 15, 2021, for the AMD Instinct™ MI250X (128GB HBM2e
OAM module) accelerator at 1,700 MHz peak boost engine clock resulted in 95.7 TFLOPS peak theoretical double precision (FP64 Matrix), 47.9 TFLOPS peak theoretical double precision (FP64),
95.7 TFLOPS peak theoretical single precision matrix (FP32 Matrix), 47.9 TFLOPS peak theoretical single precision (FP32), 383.0 TFLOPS peak theoretical half precision (FP16), and 383.0
TFLOPS peak theoretical Bfloat16 format precision (BF16) floating-point performance.
Calculations conducted by AMD Performance Labs as of Sep 18, 2020, for the AMD Instinct™ MI100 (32GB HBM2 PCIe® card) accelerator at 1,502 MHz peak boost engine clock resulted in 11.54
TFLOPS peak theoretical double precision (FP64), 46.1 TFLOPS peak theoretical single precision matrix (FP32 Matrix), 23.1 TFLOPS peak theoretical single precision (FP32), and 184.6 TFLOPS peak
theoretical half precision (FP16) floating-point performance.
Published results on the NVIDIA Ampere A100 (80GB) GPU accelerator, boost engine clock of 1410 MHz, resulted in 19.5 TFLOPS peak double precision tensor cores (FP64 Tensor Core), 9.7
TFLOPS peak double precision (FP64), 19.5 TFLOPS peak single precision (FP32), 78 TFLOPS peak half precision (FP16), 312 TFLOPS peak half precision (FP16 Tensor Core), 39 TFLOPS peak
Bfloat16 (BF16), and 312 TFLOPS peak Bfloat16 format precision (BF16 Tensor Core) theoretical floating-point performance. The TF32 data format is not IEEE compliant and is not included in this
comparison.
https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/nvidia-ampere-architecture-whitepaper.pdf, page 15, Table 1.
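For readers who want to see where these peak theoretical figures come from, the sketch below reproduces them as clock frequency × number of execution units × FLOPs per unit per clock. The unit counts (14,080 stream processors for MI250X, 7,680 for MI100, 6,912 FP32 CUDA cores for A100) and the per-precision rate multipliers are assumptions drawn from publicly listed specifications rather than from this endnote, so treat the sketch as illustrative arithmetic only.

```python
# Illustrative sketch: reproduce the peak theoretical TFLOPS figures quoted above
# as clock frequency x execution units x FLOPs per unit per clock. Unit counts and
# per-precision multipliers are assumptions based on public spec sheets, not taken
# from the endnote itself.

def peak_tflops(clock_ghz, units, flops_per_unit_per_clock):
    """Peak theoretical TFLOPS = clock (GHz) * units * FLOPs/unit/clock / 1000."""
    return clock_ghz * units * flops_per_unit_per_clock / 1000.0

# AMD Instinct MI250X: assumed 14,080 stream processors at 1.70 GHz,
# 2 FLOPs/clock/SP for FP64 and FP32 (FMA), 2x that for Matrix, 8x for FP16/BF16.
print(round(peak_tflops(1.700, 14080, 2), 1))   # ~47.9  (FP64 / FP32)
print(round(peak_tflops(1.700, 14080, 4), 1))   # ~95.7  (FP64 Matrix / FP32 Matrix)
print(round(peak_tflops(1.700, 14080, 16), 1))  # ~383.0 (FP16 / BF16)

# AMD Instinct MI100: assumed 7,680 stream processors at 1.502 GHz,
# FP64 at half the FP32 vector rate, FP32 Matrix at 2x, FP16 at 8x.
print(round(peak_tflops(1.502, 7680, 1), 2))    # ~11.54 (FP64)
print(round(peak_tflops(1.502, 7680, 2), 1))    # ~23.1  (FP32)
print(round(peak_tflops(1.502, 7680, 4), 1))    # ~46.1  (FP32 Matrix)
print(round(peak_tflops(1.502, 7680, 16), 1))   # ~184.6 (FP16)

# NVIDIA A100 80GB: assumed 6,912 FP32 CUDA cores at 1.41 GHz, FP64 at half rate,
# BF16 at 2x, FP16 at 4x, and Tensor Core FP16/BF16 at 16x the FP32 vector rate.
print(round(peak_tflops(1.410, 6912, 1), 1))    # ~9.7   (FP64)
print(round(peak_tflops(1.410, 6912, 2), 1))    # ~19.5  (FP32, FP64 Tensor Core)
print(round(peak_tflops(1.410, 6912, 4), 1))    # ~39    (BF16)
print(round(peak_tflops(1.410, 6912, 8), 1))    # ~78    (FP16)
print(round(peak_tflops(1.410, 6912, 32), 1))   # ~311.9 (FP16 / BF16 Tensor Core, published as 312)
```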
MI200-26B - Testing conducted by AMD Performance Labs as of 10/14/2021, on a single socket optimized 3rd Gen AMD EPYC™ CPU (64C) server with 1x AMD Instinct™ MI250X OAM (128 GB
HBM2e, 560W) GPU with AMD Infinity Fabric™ technology, using benchmark HPL v2.3 plus AMD optimizations to HPL that are not yet upstream, vs. an Nvidia DGX dual socket AMD EPYC 7742
(64C) @ 2.25GHz CPU server with 1x NVIDIA A100 SXM 80GB (400W), using benchmark HPL, Nvidia container image 21.4-HPL.
Information on HPL: https://www.netlib.org/benchmark/hpl/
Nvidia HPL Container Detail: https://ngc.nvidia.com/catalog/containers/nvidia:hpc-benchmarks.
Server manufacturers may vary configurations, yielding different results. Performance may vary based on use of latest drivers and optimizations.
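As background on how an HPL score is expressed, the sketch below computes the benchmark's figure of merit, assuming the conventional 2/3·N³ + 3/2·N² operation count used by HPL; the problem size and run time shown are placeholders, not results from the testing described above.

```python
# Minimal sketch of the HPL figure of merit: the reported GFLOPS is the
# conventional LU-factorization-plus-solve operation count divided by wall-clock
# time. The problem size N and time below are placeholders, not measured results.

def hpl_gflops(n, seconds):
    """HPL score = (2/3*N^3 + 3/2*N^2) flops / time, reported in GFLOPS."""
    flops = (2.0 / 3.0) * n**3 + 1.5 * n**2
    return flops / seconds / 1e9

# Example with placeholder inputs (N = 100,000 unknowns solved in 600 seconds):
print(f"{hpl_gflops(100_000, 600.0):.1f} GFLOPS")
```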
MI200-58 - Testing conducted by AMD Performance Labs as of 5/25/2022, on a dual socket AMD EPYC™ 7200 Series CPU (64C) server with 8x AMD Instinct™ MI250X OAM (128 GB HBM2e,
500W) GPUs with AMD Infinity Fabric™ technology, using benchmark HPL-AI compiled with HIP version 5.1.20531.cacfa990, AMD clang version 14.0.0, and OpenMPI 4.1.2, vs. a dual socket AMD EPYC™
7002 Series CPU (64C) server with 8x NVIDIA A100 SXM 80GB (400W), using benchmark HPL-AI with CUDA 11.6, HPL-AI container 21.4-hpl.
Information on HPL-AI: https://hpl-ai.org/
AMD HPL-AI container detail: https://github.com/ROCmSoftwarePlatform/hpl-ai.git rev bae3342
Nvidia HPL-AI Container Detail: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/hpc-benchmarks (Nvidia container image 21.4-HPL).
Server manufacturers may vary configurations, yielding different results. Performance may vary based on use of latest drivers and optimizations.
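HPL-AI differs from classic HPL in that the factorization runs in reduced precision and iterative refinement restores FP64-level accuracy, while the score is still based on the FP64 operation count. The NumPy sketch below illustrates that mixed-precision pattern, using float32 as a stand-in for the low-precision factorization; it is a conceptual illustration, not the benchmark's actual implementation.

```python
# Conceptual illustration of the mixed-precision idea behind HPL-AI: solve the
# system in low precision, then use iterative refinement in FP64 to recover
# double-precision-level accuracy. float32 stands in for the low precision, and
# a full re-solve replaces the reused LU factors of the real benchmark.
import numpy as np

rng = np.random.default_rng(0)
n = 500
A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned test matrix
b = rng.standard_normal(n)

# Low-precision solve (stand-in for the reduced-precision factorization).
x = np.linalg.solve(A.astype(np.float32), b.astype(np.float32)).astype(np.float64)

# Iterative refinement: residual computed in FP64, correction in low precision.
for _ in range(5):
    r = b - A @ x                                  # FP64 residual
    dx = np.linalg.solve(A.astype(np.float32), r.astype(np.float32))
    x = x + dx.astype(np.float64)

print("relative residual:", np.linalg.norm(b - A @ x) / np.linalg.norm(b))
```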
MI300-004 - Measurements by AMD Performance Labs, June 4, 2022. MI250X (560W) FP16 (306.4 estimated delivered TFLOPS based on 80% of peak theoretical floating-point performance).
MI300 FP8 performance based on preliminary estimates and expectations. MI300 TDP power based on preliminary projections. Actual results based on production silicon may vary.
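The 306.4 TFLOPS figure is 80% of the 383.0 TFLOPS peak theoretical FP16 number quoted in MI200-01; a one-line check of that arithmetic:

```python
# Estimated delivered FP16 TFLOPS = 80% of the 383.0 TFLOPS peak quoted in MI200-01.
print(round(0.80 * 383.0, 1))  # 306.4
```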