Unprecedented Acceleration for World’s Highest-Performing Elastic Data Centers
The NVIDIA A2 Tensor Core GPU provides entry-level inference with low power, a small footprint, and high performance for intelligent video analytics (IVA) or NVIDIA AI at the edge. Featuring a low-profile PCIe Gen4 card and a configurable thermal design power (TDP) of 40–60 watts (W), the A2 brings versatile inference acceleration to any server.
A2’s versatility, compact size, and low power meet the demands of edge deployments at scale, instantly upgrading existing entry-level CPU servers to handle inference. Servers accelerated with A2 GPUs deliver up to 20X higher inference performance versus CPUs and 1.3X more efficient IVA deployments than previous GPU generations — all at an entry-level price point.
NVIDIA-Certified Systems with the NVIDIA A2, A30, and A100 Tensor Core GPUs and NVIDIA AI — including NVIDIA Triton Inference Server, open-source inference serving software — deliver breakthrough inference performance across edge, data center, and cloud. They ensure that AI-enabled applications deploy with fewer servers and less power, resulting in easier deployments, faster insights, and dramatically lower costs.
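Triton serves models from a repository in which each model carries a short configuration file. A minimal, hypothetical `config.pbtxt` for an ONNX image classifier might look like the sketch below — the model name, backend, and tensor shapes are illustrative assumptions, not part of this datasheet:

```
name: "resnet50"
platform: "onnxruntime_onnx"
max_batch_size: 8
input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
```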
| Specification | NVIDIA A2 |
|---|---|
| CUDA Cores | 1280 |
| Tensor Cores | Third generation |
| RT Cores | Second generation |
| GPU Memory | 16 GB GDDR6 with ECC |
| Memory Bandwidth | 200 GB/s |
| Thermal Solution | Passive |
| Peak FP32 | 4.5 TFLOPS |
| Peak TF32 Tensor Core | 9 TFLOPS (18 TFLOPS with sparsity) |
| Peak FP16 Tensor Core | 18 TFLOPS (36 TFLOPS with sparsity) |
| Peak INT8 | 36 TOPS (72 TOPS with sparsity) |
| Peak INT4 | 72 TOPS (144 TOPS with sparsity) |
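The INT8 and INT4 rows matter because inference runtimes typically quantize FP32 weights and activations into small integers before running them on the integer Tensor Core paths. The sketch below shows minimal symmetric per-tensor INT8 quantization; it is an illustrative simplification, not NVIDIA's quantization pipeline:

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization sketch: map floats into
    [-127, 127] with a single scale derived from the largest magnitude."""
    scale = max(abs(v) for v in values) / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    dequant = [x * scale for x in q]  # approximate reconstruction
    return q, scale, dequant

q, scale, deq = quantize_int8([-1.0, -0.25, 0.0, 0.5, 2.0])
print(q)  # small integers, largest-magnitude input maps to 127
```

Real deployments refine this with per-channel scales and calibration data, but the core idea — trade a little precision for much higher integer throughput — is the same.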
Third-Generation NVIDIA Tensor Cores
The third-generation Tensor Cores in NVIDIA A2 support integer math down to INT4 and floating-point math up to FP32 to deliver high AI training and inference performance. A2’s NVIDIA Ampere architecture also supports TF32 and NVIDIA’s automatic mixed precision (AMP) capabilities.
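TF32 keeps FP32's 8-bit exponent (so the numeric range is unchanged) but stores only 10 mantissa bits instead of 23. The pure-Python sketch below simulates that storage format by truncating the low mantissa bits of an FP32 value — a simplification, since the hardware rounds to nearest rather than truncating:

```python
import struct

def tf32_truncate(x: float) -> float:
    """Simulate TF32 storage: keep FP32's 8-bit exponent but only the
    top 10 of its 23 mantissa bits (by truncation, for simplicity)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= 0xFFFFE000  # zero the low 13 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

print(tf32_truncate(1.5))  # exactly representable, unchanged
print(tf32_truncate(0.1))  # slightly below 0.1: precision is lost
```

This is why TF32 can accelerate FP32 workloads with no code changes: values stay in FP32 range, and most deep learning workloads tolerate the reduced mantissa precision.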
Second-Generation RT Cores
The NVIDIA A2 GPU includes dedicated RT Cores for ray tracing and Tensor Cores for AI to power groundbreaking results at breakthrough speed. It delivers up to 2X the throughput of the previous generation and the ability to concurrently run ray tracing with either shading or denoising capabilities.
Structural Sparsity
Modern AI networks are big and getting bigger, with millions to billions of parameters. Not all of these parameters are needed for accurate predictions and inference. A2 provides up to 2X higher compute performance for sparse models compared to previous-generation GPUs. This feature readily benefits AI inference and can also be used to improve the performance of model training.
Compare GPUs For Virtualization
NVIDIA virtual GPU (vGPU) software runs on NVIDIA GPUs.
Match your needs with the right GPU below.
| GPU | Architecture | Memory Size | Virtualization Workload | vGPU Software Support |
|---|---|---|---|---|
| A100 | NVIDIA Ampere | 80 GB / 40 GB HBM2 | Highest-performance virtualized compute, including AI, HPC, and data processing; supports up to 7 MIG instances. Upgrade path for V100/V100S Tensor Core GPUs. | NVIDIA AI Enterprise |
| A30 | NVIDIA Ampere | 24 GB HBM2 | Mainstream virtualized compute and AI inference; supports up to 4 MIG instances. Upgrade path for T4. | NVIDIA AI Enterprise |
| L40 | NVIDIA Ada Lovelace | 48 GB GDDR6 with ECC | High-end 3D visualization applications and AI training and inference workloads. Upgrade path for Quadro® RTX 8000, Quadro RTX 6000, or T4. | NVIDIA RTX vWS, NVIDIA Virtual PC (vPC), NVIDIA Virtual Apps (vApps), NVIDIA AI Enterprise* |
| A40 | NVIDIA Ampere | 48 GB GDDR6 | Mid-range to high-end 3D design and creative workflows with NVIDIA RTX® Virtual Workstation (vWS). Upgrade path for Quadro RTX™ 8000, RTX 6000, or T4. | NVIDIA RTX vWS, vPC, vApps, NVIDIA AI Enterprise |
| RTX 6000 | NVIDIA Ada Lovelace | 48 GB GDDR6 | High-end design, real-time rendering, AI, and high-performance compute workflows. Upgrade path for Quadro RTX 6000. | NVIDIA RTX vWS, vPC, vApps, NVIDIA AI Enterprise* |
| A16 | NVIDIA Ampere | 64 GB GDDR6 (16 GB per GPU) | Office productivity applications, streaming video, and teleconferencing tools for graphics-rich virtual desktops accessible from anywhere. Upgrade path for M10 or T4. | NVIDIA RTX vWS, vPC, vApps, NVIDIA AI Enterprise |
| A2 | NVIDIA Ampere | 16 GB GDDR6 | Space-constrained environments or edge deployments of graphics or compute workloads. | NVIDIA RTX vWS, vPC, vApps |
| Architecture | NVIDIA Ampere |
|---|---|
| GPU solutions and applications | NVIDIA Virtual PC (vPC), NVIDIA Virtual Applications (vApps), NVIDIA RTX Virtual Workstation (vWS), NVIDIA Virtual Compute Server (vCS), visualization and AI, NVIDIA AI Enterprise for VMware |
| VR support | Yes |
| Cooling | Passive (server) |
| Memory type | GDDR6 |
| Card memory | 16 GB |
| Mounting and card profile | Low-profile |
| Warranty | 3 years |