
How Supermicro AMD Servers Deliver High Throughput and Low Latency for AI Solutions

AI Requires Low Latency Storage: Get It Now with Supermicro Servers Based on AMD EPYC™ CPUs

A complete makeover is underway in modern enterprises today, centered on what might be called the “AI revolution.” Organizations gain competitive advantages and key insights when they put advanced AI- and ML-based applications to work. Leading examples of such workloads include AI-based large language models (LLMs) such as ChatGPT and Llama, along with ML models trained on massive data sets, complex 3D models, animation and virtual reality, simulations, and other data- and compute-intensive applications.

Behind the flashy rack-mounted hardware that houses the GPU-driven brains of any AI cluster, you must also find high-throughput, low-latency storage systems that keep the cluster productive. These feed the massive amounts of data needed to train models and to perform the complex simulations and analyses that AI, ML, and similar workloads demand. Indeed, one of the biggest challenges facing businesses looking to capitalize on the growth of AI is finding a storage solution that won’t bottleneck their high-performance CPUs, GPUs, or database clusters.

The Holy Grail: High Throughput, Low Latency

Everyone’s jumping on the AI bandwagon and looking for infrastructure that can support the corresponding workloads. To make this not-so-crazy dream come true, a server architecture optimized for demanding workloads is absolutely essential. AMD has built its EPYC server CPUs—currently in their fourth generation with the 9004 product family—to get the best performance out of server hardware and software with a single CPU. In fact, the 4th Gen AMD EPYC™ family offers the following advantages:

  • Leadership in socket-level and per-core performance, with up to 96 Zen 4 cores in 5nm Core Compute Dies (CCDs)
  • Leadership in memory bandwidth and capacity, with 12 channels for up to 6TB of DDR5 memory per socket
  • Leadership in I/O, with up to 128 lanes of PCIe 5.0 access for CXL memory devices, SSDs, NICs, GPUs, and more

Designed from the ground up for maximum performance, efficiency, and sustainability, AMD EPYC-based servers can manage the balancing act needed to get the most out of CPUs, memory, GPUs, storage, and network interfaces. Indeed, the AMD EPYC architecture can prioritize threads so that L3 cache is effectively reserved for intensive workloads to use exclusively, and its abundant dedicated PCIe lanes mean I/O isn’t subject to the usual scheduling and contention delays.
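One practical, if simplified, way to approximate that kind of isolation from software is to pin a compute-heavy worker to the cores that share a single L3 cache (one EPYC CCD), so its working set stays cache-resident and it doesn’t contend with neighboring threads. The sketch below is a minimal illustration of that idea on a hypothetical Linux system where cores 0 through 7 share an L3; it is not AMD’s hardware QoS mechanism, and the core numbering is an assumption that depends on the actual topology.

    # Minimal sketch: pin this process to a set of cores assumed to share one L3
    # cache (one CCD), keeping its working set local and limiting contention.
    # Linux-only; the core IDs are hypothetical.
    import os

    CCD_CORES = {0, 1, 2, 3, 4, 5, 6, 7}   # assumption: cores 0-7 share an L3

    def run_pinned_worker():
        os.sched_setaffinity(0, CCD_CORES)  # 0 = the current process
        # ...compute-intensive inference or preprocessing loop would run here...

    if __name__ == "__main__":
        run_pinned_worker()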

Filesystem Support and Bottleneck Avoidance

In distributed and parallel file systems, data arrives from multiple sources and must be processed at scale, across multiple protocols and for multiple applications. In a typical storage system, metadata quickly becomes the bottleneck: you can only push as much data through the system as the metadata service can support, so as data volumes grow, metadata handling must scale proportionally. Supermicro AMD servers support WEKA distributed storage, which is architected to provide exactly that proportional scaling. That’s why I/O performance continues unabated even as more capacity and services are added to a Supermicro system or cluster. Performance scales linearly from eight nodes (the minimum for a WEKA cluster) to hundreds of nodes, eliminating bottlenecks and supporting even the heaviest, most demanding AI/ML (and similar) workloads.
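A toy model makes the scaling argument concrete. The figures below (per-node throughput and a centralized metadata ceiling) are assumptions picked purely for illustration, not WEKA or Supermicro benchmarks; the point is how aggregate throughput behaves when metadata handling scales with node count versus when it is a fixed central service.

    # Toy scaling model with assumed, illustrative numbers (not benchmarks).
    PER_NODE_GBPS = 20        # assumed per-node data throughput
    METADATA_CAP_GBPS = 160   # assumed ceiling of a centralized metadata service

    def aggregate_throughput(nodes, metadata_scales):
        raw = nodes * PER_NODE_GBPS
        return raw if metadata_scales else min(raw, METADATA_CAP_GBPS)

    for n in (8, 16, 64, 256):
        print(f"{n:>3} nodes: distributed metadata {aggregate_throughput(n, True):>5} GB/s, "
              f"central metadata {aggregate_throughput(n, False):>5} GB/s")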

But there’s more to optimizing servers and clusters than providing scalable, high-performance, low-latency storage. When designing an entire system, the focus cannot rest on any single feature or function; the entire architecture must work in concert to support the targeted workloads. Designing a system for AI applications thus means creating a runtime environment built from the ground up to handle data-intensive applications quickly and well, which depends on all-around server performance for inference and analytics as well as overall I/O capability. What the server does with the data while handling an AI (or similar) workload is as important as the data traffic into and out of any given node. Support for highly parallel execution is essential, so a high core count, enough to handle all of a program’s parallelized sub-tasks, is critical.

Another critical feature is the number of PCIe 5.0 lanes in AMD EPYC-based servers (up to 128 on a single socket). This lets servers accommodate larger collections of SSDs, NICs, GPUs, and even CXL memory-expansion devices. All of these play essential roles in handling demanding AI and ML (or similar) workloads (a rough lane-budget sketch follows the list below), including:

  • Up to 32 PCIe Gen5 SSDs for high-speed local storage
  • Large numbers of high-speed network interfaces to connect servers to other nodes, such as storage or other specialized servers, to extend data scope and reach
  • Large numbers of GPUs for handling specialized, targeted tasks or workloads
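To see how quickly 128 lanes get used, here is a rough lane-budget sketch. The device mix and per-device lane widths are assumptions for illustration only, not a specific Supermicro configuration.

    # Back-of-the-envelope PCIe 5.0 lane budget for a single-socket EPYC server.
    # The device mix and lane widths below are illustrative assumptions.
    TOTAL_LANES = 128

    devices = {
        "NVMe SSD (x4 each)": {"lanes": 4, "count": 24},
        "400G NIC (x16 each)": {"lanes": 16, "count": 2},
    }

    used = sum(d["lanes"] * d["count"] for d in devices.values())
    print(f"Lanes used: {used} of {TOTAL_LANES}; remaining: {TOTAL_LANES - used}")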

In general, each server node needs plenty of local storage plus high network bandwidth, so data can enter and leave the node at appropriate rates even when the storage involved doesn’t reside on that host. This is essentially what stands behind most of the statements here regarding high throughput and low latency for Supermicro AMD EPYC servers.

More Cores Mean More “Oomph!”

Another critical factor for optimized AI capability is that a high core count per CPU provides hardware-level support for what’s called a UP (uniprocessor, or single-socket) design. AMD’s leadership in core count (the AMD EPYC 9004 family spans 24 to 96 cores, for example) confers numerous necessary capabilities and advantages. Most importantly, such CPUs provide uniform memory access for all of their cores, which helps with determinism, reduces blocking, and makes server motherboards easier to design and build for high performance. By design, the AMD EPYC architecture boosts AI workload performance, offering optimized network, storage, and GPU access.

Case in Point: Supermicro H13 1U Petascale Storage System

The Supermicro H13 Petascale Storage System provides an excellent illustration of what the EPYC architecture can do. It offers high densities for software-defined storage, in-memory computing, data-intensive HPC, private and public cloud, and—especially—AI/ML applications. Its specifications include the following details (a quick sanity check of the headline capacities follows the list):

  • 16 hot-swap EDSFF E3.S NVMe slots for up to 480TB of storage in a 1U chassis
  • Optional 4 CXL E3.S 2T form factor memory expansion modules plus 8 E3.S NVMe storage devices
  • One 4th Gen AMD EPYC™ processor—up to 96 cores
  • 24 DIMMs for up to 6TB of DDR5 memory
  • 2 PCIe 5.0 Open Compute Project (OCP) 3.0 SFF-compliant AIOM slots
  • 2 full-height half-length PCIe 5.0 slots with auxiliary power
  • Titanium-Level efficiency power supplies
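A quick back-of-the-envelope check shows how the headline figures add up, assuming ~30TB-class E3.S NVMe drives and 256GB DDR5 DIMMs (both sizes are assumptions used only for this illustration).

    # Sanity-check the headline H13 capacities under assumed drive/DIMM sizes.
    E3S_SLOTS, DRIVE_TB = 16, 30      # 16 hot-swap E3.S bays, ~30TB-class drives
    DIMM_SLOTS, DIMM_GB = 24, 256     # 24 DIMMs, assumed 256GB each

    print("NVMe capacity:", E3S_SLOTS * DRIVE_TB, "TB")          # 480 TB in 1U
    print("DDR5 capacity:", DIMM_SLOTS * DIMM_GB // 1024, "TB")  # 6 TB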

The Supermicro H13 system can be an invaluable addition to any data center where AI, ML, or other compute- and data-intensive workloads need high-performance, low-latency storage access (and lots of it).

Why AMD and Supermicro Server Architecture Is Optimal for AI

NVMe has totally changed the server and cluster game. With NVMe at its base, a completely reworked architecture becomes possible: storage can work at scale and at speed alongside high-performance CPUs, GPUs, and NICs, especially in the EDSFF form factor. A single-socket design enables best-of-breed CPUs to fully saturate network cards and storage while exploiting the highest possible levels of parallelism and clustering for HPC, AI, and other next-generation solutions. Memory bandwidth doubles from 3rd Gen to 4th Gen AMD EPYC, further supporting AI workloads while balancing performance and power for sustainability. And with a single-chip architecture, CPU resources such as L3 cache and memory bandwidth can be allocated preferentially to high-demand threads to improve performance and reduce latency; threads can be tuned to support such workloads all the way down to the hardware level. There’s no better, faster, or more efficient way to put AI and ML to work than on such servers.
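For a sense of where that bandwidth claim comes from, the sketch below compares theoretical peak memory bandwidth using the published channel counts and data rates (8 channels of DDR4-3200 for 3rd Gen, 12 channels of DDR5-4800 for 4th Gen, 8 bytes per channel). These are peak theoretical figures, not measured results.

    # Theoretical peak memory bandwidth per socket: channels x MT/s x 8 bytes.
    def peak_gbs(channels, mega_transfers_per_sec):
        return channels * mega_transfers_per_sec * 8 / 1000   # GB/s

    gen3 = peak_gbs(8, 3200)     # 3rd Gen EPYC: ~204.8 GB/s
    gen4 = peak_gbs(12, 4800)    # 4th Gen EPYC: ~460.8 GB/s
    print(f"3rd Gen: {gen3:.1f} GB/s, 4th Gen: {gen4:.1f} GB/s, ratio: {gen4/gen3:.2f}x")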