DGX A100 P3687 System, 8x 80 GB GPUs
- 8x NVIDIA A100 GPUs with up to 640 GB total GPU memory
  - 12 NVLinks/GPU; 600 GB/s GPU-to-GPU bidirectional bandwidth
- 6x NVIDIA NVSwitches
  - 4.8 TB/s bidirectional bandwidth, 2X more than previous-generation NVSwitch
- Up to 10x NVIDIA ConnectX-7 200 Gb/s network interfaces
  - 500 GB/s peak bidirectional bandwidth
- Dual 64-core AMD CPUs and 2 TB system memory
  - 3.2X more cores to power the most intensive AI jobs
- 30 TB Gen4 NVMe SSD
  - 50 GB/s peak bandwidth, 2X faster than Gen3 NVMe SSDs
| Component | NVIDIA DGX A100 640GB System | NVIDIA DGX A100 320GB System |
|---|---|---|
| GPU | 8x NVIDIA A100 GPUs, third-generation NVLink | 8x NVIDIA A100 GPUs, third-generation NVLink |
| Total GPU Memory | 640 GB | 320 GB |
| NVIDIA NVSwitch | 6x second generation (2X faster than first generation) | 6x second generation (2X faster than first generation) |
| Networking | Up to 10 (factory ship config) NVIDIA ConnectX-6 or ConnectX-7 InfiniBand/200 Gb/s Ethernet | Up to 9 (factory ship config) NVIDIA ConnectX-6 or ConnectX-7 InfiniBand/200 Gb/s Ethernet; optional add-on: second dual-port 200 Gb/s Ethernet |
| CPU | 2x AMD Rome, 128 cores total | 2x AMD Rome, 128 cores total |
| System Memory | 2 TB (factory ship config) | 1 TB (factory ship config); optional add-on: 1 TB for 2 TB max. |
| Storage | 30 TB (factory ship config) U.2 NVMe drives; optional drive upgrade to 60 TB | 15 TB (factory ship config) U.2 NVMe drives; optional add-on: 15 TB for 30 TB max.; optional drive upgrade to 60 TB |
Component Description

| Component | Description |
|---|---|
| GPU | NVIDIA A100 GPU |
| CPU | 2x AMD EPYC 7742, 64 cores each |
| NVSwitch | 600 GB/s GPU-to-GPU bandwidth |
| Storage (OS) | 1.92 TB NVMe M.2 SSD (each) in RAID 1 array |
| Storage (Data Cache) | 3.84 TB NVMe U.2 SED (each) in RAID 0 array; optional 7.68 TB NVMe U.2 SEDs |
| Network (Cluster) card | NVIDIA ConnectX-6 or ConnectX-7, single port; InfiniBand (default): up to 200 Gb/s; Ethernet: 200GbE, 100GbE, 50GbE, 40GbE, 25GbE, and 10GbE |
| Network (Storage) card | NVIDIA ConnectX-6 or ConnectX-7, dual port; Ethernet (default): 200GbE, 100GbE, 50GbE, 40GbE, 25GbE, and 10GbE; InfiniBand: up to 200 Gb/s |
| System Memory (DIMM) | 1 TB per 16 DIMMs |
| BMC (out-of-band system management) | 1 GbE RJ45 interface; supports IPMI, SNMP, KVM, and Web UI |
| In-band system management | 1 GbE RJ45 interface |
| Power Supply | 3 kW |
| Form Factor | 6U rackmount |
| Height x Width x Depth | 10.4" (264 mm) x 19" (482.3 mm) max x 35.3" (897.1 mm) max |
| System Weight | 271.5 lbs (123.16 kg) max |
| Specification for Each Power Supply | 200-240 V AC; 3,000 W @ 200-240 V, 16 A, 50-60 Hz; 6.5 kW max system power usage |
NVIDIA DGX™ A100 is the universal system for all AI workloads, offering unprecedented compute density, performance, and flexibility in the world’s first 5 petaFLOPS AI system. Featuring the NVIDIA A100 Tensor Core GPU, DGX A100 enables enterprises to consolidate training, inference, and analytics into a unified, easy-to-deploy AI infrastructure that includes direct access to NVIDIA AI experts.
https://images.nvidia.com/aem-dam/Solutions/Data-Center/nvidia-dgx-a100-80gb-datasheet.pdf
The NVIDIA A100 Tensor Core GPU delivers unprecedented acceleration for AI, data analytics, and high-performance computing (HPC) to tackle the world’s toughest computing challenges. With third-generation NVIDIA Tensor Cores providing a huge performance boost, the A100 GPU can efficiently scale up to thousands of GPUs or, with Multi-Instance GPU (MIG), be partitioned into seven smaller, dedicated instances to accelerate workloads of all sizes.
With MIG, the eight A100 GPUs in DGX A100 can be configured into as many as 56 GPU instances, each fully isolated with its own high-bandwidth memory, cache, and compute cores. This allows administrators to right-size GPUs with guaranteed quality of service (QoS) for multiple workloads.
A GPU can be partitioned into different-sized MIG instances. For example, on an NVIDIA A100 40GB, an administrator could create two instances with 20 gigabytes (GB) of memory each, three instances with 10 GB each, seven instances with 5 GB each, or a mix of sizes, as sketched below.
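As a back-of-the-envelope illustration, the splits quoted above can be written out in a few lines of Python. The profile names follow NVIDIA's `<compute>g.<memory>gb` naming convention for A100 40GB MIG profiles; treat them as illustrative rather than as part of this datasheet.

```python
# Illustrative MIG splits for one A100 40GB, as described in the text above.
splits = {
    "3g.20gb": (2, 20),  # two instances, 20 GB each
    "2g.10gb": (3, 10),  # three instances, 10 GB each
    "1g.5gb":  (7, 5),   # seven instances, 5 GB each
}

for profile, (count, mem_gb) in splits.items():
    print(f"{count}x {profile}: {count * mem_gb} GB of GPU memory assigned")

# Note: the totals need not reach the full 40 GB; MIG carves memory into
# fixed-size slices, so some capacity can stay unassigned for a given mix.
```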
MIG instances can also be dynamically reconfigured, enabling administrators to shift GPU resources in response to changing user and business demands. For example, seven MIG instances can be used during the day for low-throughput inference and reconfigured to one large MIG instance at night for deep learning training.
With a dedicated set of hardware resources for compute, memory, and cache, each MIG instance delivers guaranteed QoS and fault isolation. That means that failure in an application running on one instance doesn’t impact applications running on other instances.
It also means that different instances can run different types of workloads—interactive model development, deep learning training, AI inference, or HPC applications. Since the instances run in parallel, the workloads also run in parallel—but separate and isolated—on the same physical GPU.
Without MIG, different jobs running on the same GPU, such as different AI inference requests, compete for the same resources: a job that consumes more memory bandwidth starves the others, and several jobs end up missing their latency targets. With MIG, jobs run simultaneously on different instances, each with dedicated resources for compute, memory, and memory bandwidth, resulting in predictable performance with QoS and maximum GPU utilization.
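To make this concrete, here is a minimal sketch, assuming the `nvidia-ml-py` package (imported as `pynvml`) is installed on the system, that enumerates the MIG instances visible on each GPU and reports the dedicated memory of each. It is an illustration of how such instances appear to management software, not part of the DGX software stack itself.

```python
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        gpu = pynvml.nvmlDeviceGetHandleByIndex(i)
        current, _pending = pynvml.nvmlDeviceGetMigMode(gpu)
        if current != pynvml.NVML_DEVICE_MIG_ENABLE:
            print(f"GPU {i}: MIG disabled")
            continue
        # Walk every possible MIG slot; unpopulated slots raise NVMLError.
        for j in range(pynvml.nvmlDeviceGetMaxMigDeviceCount(gpu)):
            try:
                mig = pynvml.nvmlDeviceGetMigDeviceHandleByIndex(gpu, j)
            except pynvml.NVMLError:
                continue  # no MIG device in this slot
            mem = pynvml.nvmlDeviceGetMemoryInfo(mig)
            print(f"GPU {i} / MIG {j}: {mem.total / 1024**3:.1f} GiB dedicated memory")
finally:
    pynvml.nvmlShutdown()
```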
The third generation of NVIDIA® NVLink™ in DGX A100 doubles the GPU-to-GPU direct bandwidth to 600 gigabytes per second (GB/s), almost 10X higher than PCIe Gen4. DGX A100 also features next-generation NVIDIA NVSwitch™, which is 2X faster than the previous generation.
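The "almost 10X" figure follows from simple arithmetic; the PCIe number below is an assumed nominal value for a Gen4 x16 slot, not a figure from this datasheet:

```python
# NVLink 3 per-GPU bandwidth on DGX A100: 12 links x 50 GB/s = 600 GB/s bidirectional.
nvlink3 = 12 * 50        # GB/s, matches the 600 GB/s quoted above
pcie_gen4_x16 = 64       # GB/s bidirectional (~32 GB/s per direction); assumed nominal figure

print(f"NVLink 3: {nvlink3} GB/s -> {nvlink3 / pcie_gen4_x16:.1f}x PCIe Gen4 x16")
# -> 9.4x, i.e. "almost 10X higher than PCIe Gen4"
```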
The fourth generation of NVIDIA® NVLink® technology provides 1.5X higher bandwidth and improved scalability for multi-GPU system configurations. A single NVIDIA H100 Tensor Core GPU supports up to 18 NVLink connections for a total bandwidth of 900 gigabytes per second (GB/s)—over 7X the bandwidth of PCIe Gen5.
Servers like the NVIDIA DGX™ H100 take advantage of this technology to deliver greater scalability for ultrafast deep learning training.
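The H100 numbers line up the same way; the per-link rate and the PCIe Gen5 figure below are assumed nominal values used only to reproduce the quoted ratios:

```python
# Fourth-generation NVLink on H100: 18 links x 50 GB/s (assumed per-link rate)
# gives the 900 GB/s total quoted above.
nvlink4 = 18 * 50        # GB/s
pcie_gen5_x16 = 128      # GB/s bidirectional; assumed nominal figure

print(f"NVLink 4: {nvlink4} GB/s -> {nvlink4 / pcie_gen5_x16:.2f}x PCIe Gen5 x16")
# -> ~7.03x, i.e. "over 7X the bandwidth of PCIe Gen5"
```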
DGX A100 features NVIDIA ConnectX-7 VPI (InfiniBand or Ethernet) adapters, each running at 200 gigabits per second (Gb/s), to create a high-speed fabric for large-scale AI workloads. DGX A100 systems are also available with ConnectX-6 adapters.
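As a cross-check, the adapter count and per-adapter rate reproduce the networking headline in the feature list at the top of this page:

```python
# Up to 10 adapters (factory ship config, 640GB system) at 200 Gb/s each per direction.
adapters = 10
gbps_per_adapter = 200

per_direction_gbs = adapters * gbps_per_adapter / 8   # Gb/s -> GB/s
print(f"Per direction: {per_direction_gbs:.0f} GB/s; "
      f"bidirectional: {2 * per_direction_gbs:.0f} GB/s")
# -> 250 GB/s per direction, 500 GB/s bidirectional, matching the
#    "500 GB/s peak bidirectional bandwidth" figure above.
```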
DGX A100 integrates a tested and optimized DGX software stack, including an AI-tuned base operating system, all necessary system software, and GPU-accelerated applications, pre-trained models, and more, available from NGC™.
Enterprise Cloud Services
NGC offers a collection of cloud services, including NVIDIA NeMo LLM, BioNeMo, and Riva Studio for natural language understanding (NLU), drug discovery, and speech AI solutions, as well as the NGC Private Registry for securely sharing proprietary AI software.
DGX A100 delivers the most robust security posture for AI deployments, with a multi-layered approach stretching across the baseboard management controller (BMC), CPU board, GPU board, self-encrypting drives, and secure boot.
| Specification | Value |
|---|---|
| Chassis size | 6U rack |
| CPU | AMD EPYC™ |
| CPU manufacturer | AMD |
| CPU socket | SP3 |
| Chipset | SoC |
| Number of CPUs | 1x CPU |
| Memory slots | 32 DIMM slots |
| Memory type | DDR4 DIMM |
| Memory standard | DDR4-3200 MHz |
| Number of GPU/HPC cards | 8 |
| SSD/HDD interface | PCIe 4.0 x4, PCIe 3.0 x4/x8, SATA |
| HDD/SSD bay size | 2.5" 15 mm |
| 2.5" bays | 8 |
| 3.5" bays | n/a |
| M.2 connectors | 2 |
| PSU power | 2200 W |
| PSU certification | 80 PLUS Platinum |
| PSU redundancy | yes |
| Warranty | 1 year, 3 years |

Configuration