AMD Ryzen Reportedly Crashes With Specific Chain of FMA3 Instructions
Samuel Wan / 4 years ago
Every time a new processor series launches, some hardware bugs inevitably surface. AMD previously had the TLB bug with Phenom, and Intel had to disable TSX instructions on Haswell. More recently, Intel had issues with certain Atom chips, and Skylake would crash under a specific demanding FMA3 and AVX workload. It looks like Ryzen is no exception, as testing has revealed that certain FMA3 workloads can cause a system crash.
The crash occurs in Flops version 2, a simple open-source CPU benchmark, when Ryzen runs the Haswell-optimised binaries, which use a very particular sequence of FMA3 instructions. It’s important to note that while the binaries are optimised for a specific architecture, they should still run on any modern x86 CPU without issue. It just so happens that when FMA3 instructions are fed in an order optimised for Haswell, Ryzen causes the entire system to crash.
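For readers unfamiliar with FMA3: a fused multiply-add computes a*b + c as a single operation with one rounding step, instead of rounding after the multiply and again after the add. The short Python sketch below only illustrates that arithmetic difference. It does not use real FMA3 instructions, it is not taken from the Flops benchmark, and it will not reproduce the crash. The function names are hypothetical, and the fused version is an emulation using exact fractions, assumed here purely for illustration.

```python
from fractions import Fraction


def unfused_multiply_add(a: float, b: float, c: float) -> float:
    """Separate multiply and add: the product is rounded, then the sum is rounded."""
    return a * b + c


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulate a fused multiply-add: compute a*b + c exactly, then round once.

    This is an illustrative emulation with exact rationals, not a hardware FMA3 call.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Inputs chosen so that the exact product, 1 - 2**-60, does not fit in a double.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

print(unfused_multiply_add(a, b, c))  # product rounds to 1.0 first, so the result is 0.0
print(fused_multiply_add(a, b, c))    # single rounding keeps the small term: -2**-60
```

The two results differ because the unfused version discards the low-order bits of the product before the addition. Hardware FMA3 performs the fused form in one instruction, which is why numeric code such as benchmarks tends to rely on it heavily.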
Given how widely FMA3 is used, this is a problematic situation. Usually, an erratum would hopefully crash only the application and not the entire system. Because the whole system goes down here, the bug could pose a security risk as a way to force Ryzen systems to shut down. This is especially problematic for Ryzen-based servers, since it might be possible to submit a workload known to use FMA3 in order to trigger a crash.
In most cases, errata are fixed fairly easily with a workaround in a microcode update. This usually means slightly slower performance in the affected scenario. Since this is only occurring under a Haswell-optimised workload, I am hopeful AMD will be able to resolve this without any noticeable performance impact. If a microcode update can’t fix the issue, then given the importance of FMA3, AMD has the last resort of recalling the chips and fixing the problem with a new batch.