
Why Move to Supermicro Servers with 4th Gen Intel Xeon Scalable Processors

The latest servers from Supermicro contain the 4th Gen Intel Xeon Scalable processors. These new CPUs offer a significant performance boost over the previous two generations of Intel CPUs. Many benchmarks can be performed, so let’s look at a few.

The table below compares the base capabilities of the last three generations of Intel Xeon Scalable CPUs.

| Capability | 2nd Gen (Cascade Lake) (92xx series excluded) | 3rd Gen (Ice Lake) | 4th Gen (Sapphire Rapids) | Increase 2nd to 4th |
| --- | --- | --- | --- | --- |
| Maximum Cores | 28 | 40 | 60 | 114% |
| Max GHz at Max Cores | 2.7 | 2.3 | 1.9 | |
| Max Core*GHz | 28*2.7 = 75.6 | | 60*1.9 = 114 | 51% |
| Memory Speed | 2400 MHz | 3200 MHz | 4800 MHz | 100% |
| Max Memory Per Socket | 3TB | 8TB (DRAM Only) | 8TB (DRAM Only) | 166% |
| High Bandwidth Memory | X | X | Up to 64 GB | N/A |
| UPI Links*Performance | 2 @ 9.6 GT/s = 19.2 GT/s | 3 @ 11.2 GT/s = 33.6 GT/s | 4 @ 16 GT/s = 64 GT/s | 233% |
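
The "Increase 2nd to 4th" column can be reproduced with a simple percent-increase calculation. The sketch below uses only the 2nd Gen and 4th Gen values from the comparison above; the dictionary keys and the function name are illustrative choices, not part of any Supermicro or Intel tool. The published figures agree with these results to within rounding.

```python
# 2nd Gen and 4th Gen values copied from the comparison above.
# Each entry is (2nd Gen value, 4th Gen value).
specs = {
    "Maximum Cores": (28, 60),
    "Max Core*GHz": (28 * 2.7, 60 * 1.9),
    "Memory Speed (MHz)": (2400, 4800),
    "Max Memory Per Socket (TB)": (3, 8),
    "UPI Links*Performance (GT/s)": (2 * 9.6, 4 * 16),
}


def percent_increase(old, new):
    """Return the increase from old to new as a percentage of old."""
    return (new - old) / old * 100


increases = {name: percent_increase(old, new) for name, (old, new) in specs.items()}

for name, pct in increases.items():
    print(f"{name}: {pct:.1f}%")
```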

Range of Benchmarks

Although specific benchmarks may exist that are generally accepted, many workloads that a modern enterprise may run cannot be simply forced into a general benchmark report. Benchmarks can be categorized in the following hierarchy, from low level to full applications.

Lowest – absolute maximum performance based on capabilities of the CPU. This number is the theoretical performance of a single CPU and can generally be calculated by multiplying the clock rate by the number of cores by the instructions per clock.

Math Kernel levels – a small application highly tuned to the CPU architecture. Maximum performance is usually about 85% of the theoretical performance. The most common Math benchmark is LINPACK, which solves linear equations.

Small Applications – The most popular suites for testing the performance of enterprise-class servers come from SPEC (Standard Performance Evaluation Corporation). SPEC has been the provider and collector of various test suites for over 30 years.

Complete Applications – Entire applications are run, and the time to completion is recorded.
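
The two lowest levels of this hierarchy can be sketched as a short calculation: theoretical peak is clock rate times cores times operations per clock, and a tuned math kernel is estimated at about 85% of that. The clock and core count below come from the 4th Gen column of the comparison table. The `OPS_PER_CLOCK` value is a hypothetical placeholder chosen only for illustration and is not a vendor specification.

```python
def theoretical_peak(clock_ghz, cores, ops_per_clock):
    """Theoretical peak in giga-operations per second for one CPU."""
    return clock_ghz * cores * ops_per_clock


# Hypothetical operations-per-clock figure, assumed for illustration only.
OPS_PER_CLOCK = 32

# 60 cores at 1.9 GHz, taken from the 4th Gen column of the comparison table.
peak = theoretical_peak(clock_ghz=1.9, cores=60, ops_per_clock=OPS_PER_CLOCK)

# A highly tuned math kernel usually reaches about 85% of theoretical peak.
kernel_estimate = 0.85 * peak

print(f"Theoretical peak: {peak:.0f} Gop/s")
print(f"Estimated tuned-kernel performance: {kernel_estimate:.0f} Gop/s")
```

Substituting the actual operations per clock for a given instruction set and the sustained clock under that load would give a more realistic ceiling.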
 

Supermicro servers with 4th Gen Intel Xeon Scalable Processors post leading results on several SPEC benchmarks.

The SPECcpu2017 suite measures the performance of a system in the following ways:

Floating Point: (applications are heavily floating-point focused)

  1. Speed – A single copy of each application from the suite is run. The “score” is then calculated by dividing the time to completion on a reference machine by the measured time to completion.
  2. Rate – The system is loaded with many copies of the test suite (typically equal to the number of threads), and the resulting throughput is then normalized against the time from a reference machine.

Integer: (applications use only integer calculations)

  1. Speed – A single copy of each application from the suite is run. The “score” is then calculated by dividing the time to completion on a reference machine by the measured time to completion.
  2. Rate – The system is loaded with many copies of the test suite (typically equal to the number of threads), and the resulting throughput is then normalized against the time from a reference machine.

Peak – Each application source code can be recompiled with specific flags.

Base – The same compiler flags are used for compiling all applications.
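
The speed and rate calculations described above can be sketched as follows. This assumes the scoring approach SPEC documents for SPEC CPU 2017: a per-application ratio of reference time to measured time (multiplied by the number of copies for rate runs), combined into an overall score with a geometric mean. The timings and copy count below are made-up values for illustration, not measurements from any system in this article.

```python
from math import prod


def speed_ratio(ref_seconds, run_seconds):
    """Per-application speed ratio: reference time over measured time."""
    return ref_seconds / run_seconds


def rate_ratio(copies, ref_seconds, run_seconds):
    """Per-application rate ratio: copies times reference time over measured time."""
    return copies * ref_seconds / run_seconds


def overall_score(ratios):
    """Geometric mean of the per-application ratios."""
    return prod(ratios) ** (1 / len(ratios))


# Hypothetical (reference time, measured time) pairs in seconds.
timings = [(1600.0, 200.0), (2400.0, 150.0), (900.0, 225.0)]

speed_score = overall_score([speed_ratio(ref, run) for ref, run in timings])

# Hypothetical rate run: 4 concurrent copies, same elapsed times.
rate_score = overall_score([rate_ratio(4, ref, run) for ref, run in timings])

print(f"speed score: {speed_score:.2f}")
print(f"rate score:  {rate_score:.2f}")
```

A higher score in either mode means the system finished faster than the reference machine, or completed more work in the same time.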

SPEC Results

Supermicro 8-socket SPEC CPU Benchmarks:

SPECcpu2017 Integer

| System | Intel Xeon | Workload | Significance | Score |
| --- | --- | --- | --- | --- |
| SuperServer SYS-681E-TR | 8490H | SPECcpu2017_int_speed_base | Best 8 socket system | 13.8 |
| SuperServer SYS-681E-TR | 8490H | SPECcpu2017_int_speed_peak | Best 8 socket system | 14.0 |
| SuperServer SYS-681E-TR | 8490H | SPECcpu2017_int_rate_base | Top 3 Best 8 socket system | 3510 |
| SuperServer SYS-681E-TR | 8490H | SPECcpu2017_int_rate_peak | Top 2 Best 8 socket system | 3560 |

SPECcpu2017 Floating Point

| System | Intel Xeon | Workload | Significance | Score |
| --- | --- | --- | --- | --- |
| SuperServer SYS-681E-TR | 8490H | SPECcpu2017_fp_rate_base | Top 2 Best 8 socket system | 3540 |
| SuperServer SYS-681E-TR | 8490H | SPECcpu2017_fp_rate_peak | Top 2 Best 8 socket system | 3560 |
| SuperServer SYS-681E-TR | 8490H | SPECcpu2017_fp_speed_base | Best 8 socket system | 343 |
| SuperServer SYS-681E-TR | 8490H | SPECcpu2017_fp_speed_peak | Best 8 socket system | 334 |

Supermicro 4-socket SPEC CPU Benchmarks:

SPECcpu2017 Integer

| System | Intel Xeon | Workload | Significance | Score |
| --- | --- | --- | --- | --- |
| SuperServer SYS-241H-TNRTTP | 8490H | SPECcpu2017_int_rate_base | Top 4 Best 4 socket system | 1930 |
| SuperServer SYS-241H-TNRTTP | 8490H | SPECcpu2017_int_rate_peak | Top 4 Best 4 socket system | 1970 |
| SuperServer SYS-241H-TNRTTP | 8490H | SPECcpu2017_int_speed_base | Top 3 Best 4 socket system | 16 |
| SuperServer SYS-241H-TNRTTP | 8490H | SPECcpu2017_int_speed_peak | Top 3 Best 4 socket system | 16.2 |

SPECcpu2017 Floating Point

| System | Intel Xeon | Workload | Significance | Score |
| --- | --- | --- | --- | --- |
| SuperServer SYS-241H-TNRTTP | 8490H | SPECcpu2017_fp_rate_base | Top 2 Best 4 socket system | 1900 |
| SuperServer SYS-241H-TNRTTP | 8490H | SPECcpu2017_fp_rate_peak | Top 2 Best 4 socket system | 2010 |
| SuperServer SYS-241H-TNRTTP | 8490H | SPECcpu2017_fp_speed_base | Top 2 Best 4 socket system | 387 |
| SuperServer SYS-241H-TNRTTP | 8490H | SPECcpu2017_fp_speed_peak | Top 2 Best 4 socket system | 387 |

SPECStorage

The SPECstorage Solution 2020 benchmark measures the performance of an entire storage configuration as it interacts with application-based workloads. The latest version includes new workloads for artificial intelligence (AI) and genomics, expanded custom workload capabilities, massively better scaling, and a statistical visualization mechanism for displaying benchmark results.
(https://www.spec.org/storage2020/press/release.html)

| System | Intel Xeon | Workload | Significance | Score |
| --- | --- | --- | --- | --- |
| SYS-221H-TN24R Hyper Storage Server | 8468V, 8450H | SPECstorage Solution 2020 | Best SpecStorage_2020 result on AI Image | 0.57 |
| SYS-221H-TN24R Hyper Storage Server | 8468V, 8450H | SPECstorage Solution 2020 | Best SpecStorage_2020 result on SWBUILD/Jobs: 720.47 | |
| SYS-221H-TN24R Hyper Storage Server | 8468V, 8450H | SPECstorage Solution 2020 | #1 SpecStorage_2020 leadership on Genomics per top 5 IDC vendors. | 0.19 |
| SYS-221H-TN24R Hyper Storage Server | 8468V, 8450H | SPECstorage Solution 2020 | #1 SpecStorage_2020 leadership on VDA/Jobs: 720 per top 5 IDC vendors. | 5.56 |
| SYS-220U-TNR with 22 NVMe Storage Node | 8380, 8360Y | SPECstorage Solution 2020 | #1 SpecStorage_2020 leadership on EDA/Jobs: 240 per top 5 IDC vendors. | 0.28 |
| SuperServer SYS-741GE-TNRT | 8490H | SPEChpc2021_Tiny | Best single node base result on MPI model | 8.20 |
| SuperServer SYS-741GE-TNRT | 8490H | SPEChpc2021_Tiny | #1 single node base result leadership on OPM model among top 5 vendors per IDC | 9.24 |
| SuperServer SYS-741GE-TNRT | 8490H | SPECpower_ssj2008 | #1 leadership 4U server among top 5 vendors per IDC | 13546 |

Full Application Benchmarks Using Intel Accelerator Engines

Supermicro has run several benchmarks that compare the 4th Gen Intel Xeon Scalable processors with different Intel Accelerator Engines turned on. The chart below shows real-world benchmarks and compares an Intel Xeon 8380 to an Intel Xeon 8490H for both performance and performance per watt. The Intel Accelerator Engine that was used for the particular benchmark is listed as well. The specifics of the servers that the benchmarks were run on are described at the end of this document.

Significant Performance and Performance/Watt Gains – Benefits of Intel® Accelerator Engines

Supermicro’s X13 CloudDC server was used for testing the ResNet 50 v1.5 Inference benchmark, and the Intel Xeon 8480+ was compared to the Intel Xeon 8380 CPU. In the chart below, the performance gain was from 2.38X to 3.24X, depending on the data set. Intel’s AMX acceleration features were used for this benchmark.

Supermicro’s Performance Gains in AI – ResNet 50 v1.5 inference on CloudDC SuperServer – Intel® Advanced Matrix Extensions (Intel® AMX). Up to 3.24x higher performance.

Supermicro’s X13 GrandTwin® system was used to compare 3rd Gen Intel Xeon Gold processors with 4th Gen Intel Xeon Gold processors using the Intel® AMX features. The results show a speedup of up to 2.85x when running the ResNet 50 v1.5 inference test.

Supermicro’s Performance Gains in AI – ResNet 50 v1.5 inference on GrandTwin SuperServer – Intel® Advanced Matrix Extensions (Intel® AMX). Up to 2.85x better performance.

There is a significant improvement for database and analytics applications when moving from a Supermicro X12 generation system with the 3rd Gen Intel Xeon Scalable processor (using 80 cores) to the 4th Gen Intel Xeon Scalable processor (using 48 cores). Using an X12 Ultra platform compared to an X13 Hyper platform, a 25% performance gain is observed using the ClickHouse database with 40% fewer cores.

Supermicro’s Performance Gains in Analytics – ClickHouse Improvement Gen over Gen. 25% higher performance with 40% fewer cores
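
The ClickHouse figures can be restated per core with a little arithmetic. The sketch below uses only the core counts and the 25% gain quoted above; dividing total performance evenly by core count is a simplifying assumption made here for illustration, not a claim about how the benchmark was measured.

```python
# Figures quoted above: 80 cores on X12 versus 48 cores on X13, 25% higher performance.
x12_cores = 80
x13_cores = 48
perf_gain = 1.25  # X13 performance relative to X12

# Fraction of cores removed when moving from X12 to X13.
core_reduction = (x12_cores - x13_cores) / x12_cores

# Relative performance per core, assuming performance divides evenly across cores.
per_core_gain = perf_gain / (x13_cores / x12_cores)

print(f"Core reduction: {core_reduction:.0%}")
print(f"Per-core performance: {per_core_gain:.2f}x")
```

Under that assumption, each 4th Gen core delivers roughly twice the ClickHouse performance of a 3rd Gen core in this comparison.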

Summary

The 4th Gen Intel Xeon Scalable processors show significant performance gains running applications on the Supermicro 8 and 4 socket systems. The Supermicro SYS-681E-TR eight socket system shows the fastest performance on a single system ever recorded for:

  • SPECcpu2017_int_rate_base
  • SPECcpu2017_int_rate_peak
  • SPECcpu2017_fp_rate_base
  • SPECcpu2017_fp_rate_peak

The SMP architecture of the eight-socket and four-socket Supermicro servers is ideal for large-scale enterprise applications that require many cores and large amounts of memory.

Intel consistently improves performance and security from generation to generation. Below is a comparison, courtesy of Intel, that shows how Intel is improving its performance. Supermicro servers incorporate the latest 4th Gen Intel Xeon Scalable processors across the product line, from the edge to multi-processor systems that reside in the data center.

Intel Accelerator Engines by Processor Generation (Comparison Chart)
Resources and Configurations