It’s surprisingly hard to tell exactly how Apple’s M1 compares with Intel’s x86 processors. While the chip family has been extensively reviewed in a number of popular consumer applications, the inevitable differences between macOS and Windows, the impact of emulation, and the varying degrees of optimization between x86 and M1 make accurate comparisons difficult.
An interesting new benchmark result, and an accompanying review from engineer and application developer Craig Hunter, shows the M1 Ultra absolutely destroying every Intel x86 CPU in the field. It isn’t even a fair fight. According to Hunter’s findings, the M1 Ultra matches the performance of a 2019-era 28-core Xeon workstation while using just six threads.
Any lingering hope that the M1 Ultra suffers a sudden, unexplained scaling collapse above six cores evaporates once we extend the graph’s y-axis high enough to accommodate the data.
This is a massive win for M1. Apple’s new CPU is twice as fast as the 28-core Mac Pro’s best score. But what do we know about the test itself?
Hunter benchmarks USM3D, described by NASA as “a tetrahedral unstructured flow solver [that] has become widely used in industry, government, and academia to solve aerodynamic problems. Since its first introduction in 1989, USM3D has steadily evolved from an inviscid Euler solver to a fully viscous Navier-Stokes code.”
As noted earlier, this is a computational fluid dynamics test, and CFD tests are known for their sensitivity to memory bandwidth. We’ve never tested USM3D at ExtremeTech, and it isn’t an application I know well, so we reached out to Hunter for additional clarification on the test itself and on how he compiled it for each platform. There has been some speculation online that the M1 Ultra achieved these performance levels thanks to Apple’s matrix extensions or some other unspecified optimization that wasn’t enabled on the Intel platforms.
According to Hunter, this is not true.
“I did not link to any of the Apple frameworks when compiling USM3D on the M1, nor did I try to tune or optimize the code for Accelerate or AMX,” said the engineer and app developer. “I used the USM3D repository source with gfortran and did a fairly standard compilation with -O3 optimization.”
“To be honest, I think this puts the M1 USM3D executable at a slight disadvantage to the Intel USM3D executable,” he continued. “I’ve used the Intel Fortran compiler for over 30 years (it was DEC Fortran and then Compaq Fortran before it became Intel Fortran) and know how to get the most out of it. The Intel compiler does some aggressive optimization when compiling USM3D, and has historically delivered better performance on x86-64 than gfortran. So I expect I left some performance on the table by using gfortran on the M1.”
We asked Hunter what he felt explains the M1 Ultra’s performance advantage over the various Intel platforms. He has decades of experience evaluating CFD performance on a wide range of hardware, from desktop systems like the Mac Pro and Mac Studio to actual supercomputers.
“Based on all my past and current testing, I feel that it’s the SoC architecture that makes the biggest difference with the Apple Silicon hardware, and as we bring more cores into the computation, system bandwidth becomes a major driver of performance scaling. The M1 Ultra in the Studio has an insane amount of system bandwidth.”
“The benchmark is based on the NASA USM3D CFD code, which is available to US citizens upon request at software.nasa.gov. It comes as source code and has to be compiled with a Fortran compiler (you’ll also need to build OpenMPI with matching compiler support). The makefiles for macOS and Linux are set up for the Intel Fortran compiler, which creates an executable that is highly optimized for x86-64. You can also use gfortran (which is what I used on the arm64 Apple M1 systems), but I would expect performance to be lower than what ifort can deliver on x86-64.”
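Hunter doesn’t publish his exact build files, but the process he describes (gfortran at -O3, linked against an OpenMPI built with the same compiler) might look something like the hypothetical makefile fragment below. Target and source names here are placeholders, not USM3D’s actual build layout:

```make
# Hypothetical makefile sketch of a gfortran/OpenMPI build like the one
# Hunter describes. USM3D's real makefile targets and source list differ.

FC      = mpifort          # OpenMPI's Fortran wrapper, invoking gfortran
FFLAGS  = -O3              # the "fairly standard" optimization level used
SRCS    = $(wildcard *.f90)
OBJS    = $(SRCS:.f90=.o)

usm3d: $(OBJS)
	$(FC) $(FFLAGS) -o $@ $(OBJS)

%.o: %.f90
	$(FC) $(FFLAGS) -c $<
```

The one detail that matters for a fair comparison is the pairing: OpenMPI must be configured against the same Fortran compiler used for the solver itself, which is why Hunter mentions building it with matching compiler support.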
What do these results say about the x86/M1 comparison?
It’s not entirely surprising that an SoC with more memory bandwidth than any previous CPU would perform well in a bandwidth-limited workload. What’s interesting about these results is that they don’t necessarily depend on any particular aspect of ARM versus x86. Give an AMD or Intel CPU the same memory bandwidth Apple provides here, and its performance might improve in much the same way.
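To see why bandwidth dominates here, consider a quick roofline-style estimate. The sketch below is my own back-of-envelope model, not Hunter’s methodology; the 800 GB/s figure is Apple’s quoted memory bandwidth for the M1 Ultra, and 51.2 GB/s approximates a dual-channel DDR4-3200 desktop.

```python
# Roofline-style estimate: a kernel's sustained throughput is capped by
# min(peak compute, memory bandwidth x arithmetic intensity).
def attainable_gflops(peak_gflops: float, bandwidth_gbs: float,
                      flops_per_byte: float) -> float:
    """Upper bound on sustained GFLOP/s for a kernel of given intensity."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)

# A STREAM-triad-style update a[i] = b[i] + s*c[i] performs 2 flops while
# moving 24 bytes (two 8-byte reads, one 8-byte write): ~0.083 flop/byte.
TRIAD_INTENSITY = 2 / 24

# Peak compute set artificially high so bandwidth is the binding limit.
m1_ultra = attainable_gflops(10_000, 800.0, TRIAD_INTENSITY)  # ~66.7
ddr4_x86 = attainable_gflops(10_000, 51.2, TRIAD_INTENSITY)   # ~4.3

print(f"800 GB/s caps the triad at {m1_ultra:.1f} GFLOP/s")
print(f"51.2 GB/s caps the triad at {ddr4_x86:.1f} GFLOP/s")
```

On a kernel this memory-hungry, a roughly 16x bandwidth advantage becomes a roughly 16x ceiling advantage, no matter how many cores or how much peak compute either chip brings to bear.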
In my article RISC vs. CISC Is the Wrong Lens for Comparing Modern x86 and ARM CPUs, I spent some time discussing how Intel won the ISA wars decades ago not because x86 was intrinsically the best instruction set architecture, but because Intel could pair continual manufacturing improvements with iterative, generation-over-generation optimization of x86. Here, we can say Apple is doing something similar. The M1 Ultra smashes every Intel x86 CPU not because it’s magic, but because incorporating DRAM on the package is how Apple unlocked massive performance improvements. There’s no reason x86 CPUs can’t take advantage of the same gains. The fact that this benchmark is limited by memory bandwidth suggests a high-end Alder Lake system might match or surpass older Xeons like the Mac Pro’s 28-core chip, but it would still be no match for the M1 Ultra, thanks to the enormous bandwidth between the SoC and main memory.
In fact, we’re seeing x86 CPUs take small steps toward incorporating more high-speed memory right on the package, but Intel is keeping this technology focused on servers for now, with Sapphire Rapids offering on-package HBM2 memory in some future SKUs. However, neither Intel nor AMD has built anything like the M1 Ultra, at least not yet. So far, AMD has focused on incorporating larger L3 caches rather than moving toward on-package DRAM. Any such move would require buy-in from OEMs and many other players in the PC industry.
I wouldn’t expect any x86 manufacturer to rush to adopt this technology just because Apple uses it, but the M1 delivers exceptional performance in some tests, with excellent performance per watt. You can bet that every aspect of Cupertino’s approach to manufacturing and design has been put under a microscope (possibly literally) at AMD and Intel. That’s especially true for gains that aren’t tied to any particular ISA or manufacturing technology.