A Ryzen-based server

AMD has introduced EPYC for servers that need many cores and/or lots of RAM or RAM bandwidth, but currently (July 2017) offers nothing that officially competes with the Xeon E3 (i.e., something similar to the desktop CPUs, but with ECC enabled). However, Ryzen CPUs contain ECC logic that is not disabled, although it is not officially supported (and if you are unlucky, you might get one where it does not work). So if you don't need official certification etc., you can build a server based on Ryzen. That's what we did, and here I report on it.

Note that the currently available motherboards are not designed for servers, so they may lack features you are interested in. In our case, what we miss is slightly better ECC support and on-board graphics; we decided to live with the limited ECC support and to use a discrete graphics card.


The components we use for our servers are as follows (we built two similar ones; the components of the smaller one are given in parentheses where they differ):

CPU: Ryzen 7 1800X (Ryzen 5 1600X)
Motherboard: Asrock A320M Pro4
RAM: 4 Kingston ValueRAM Server Premier DIMM 16GB, DDR4-2400 (2 of these DIMMs)
Cooler: Thermalright AXP-200R ROG
Case and PSU: LC-Power 2002MB, 300W ATX 2.2
Graphics: Sapphire Radeon R5 230
Ethernet (2nd port): Intel Gigabit CT Desktop Adapter
Mass Storage: Intel SSD DC S3520 480GB, Seagate Nytro XF1230 480GB
              (Western Digital WD Purple 2TB, Seagate SkyHawk 2TB)
Some comments on the components:

Motherboard: You may wonder about the low-end A320-based board, but it has all that we need (apart from a second Ethernet port) and is therefore sufficient for our needs. If you want to overclock, you need a B350-based board, though; but who wants to overclock a server? We chose an Asrock board, because Asrock and ASUS are reported to have the best ECC support for AM4 boards.

Cooler: We decided on a relatively small case, which eliminated powerful tower coolers from the selection. This cooler fits in this configuration only with the ends of the heat pipes oriented towards the back of the case (I/O panel). The part that holds the cooler to the CPU is designed to lock with the stabilizing wires on the cooler, but that does not fit (the heat pipes collide with the mounting frame), so we went without this locking (which should not hurt, given that we don't move our servers a lot). Also, at first it looked as if the supplied back plate would collide with some components on the board, but once we put the plastic washers in the right place between the board and the back plate, this proved not to be a problem.

Mass Storage: For SSDs we decided on SATA models with power loss protection, and for both SSDs and hard disks, we chose two models per server from two different manufacturers (to be used in a RAID1; we have experienced drives from the same manufacturer failing at the same time). While PCIe M.2 SSDs are all the rage, and the board has space for two M.2 SSDs (but apparently only one of them PCIe), we chose SATA so that we can also access the drives on our legacy machines if necessary.
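For reference, creating such a mirror is a one-liner with mdadm. The following is only a sketch with hypothetical device names (check lsblk for the real ones), wrapped in a dry-run helper that just prints the commands:

```shell
# Dry-run sketch of creating the RAID1 across the two drives.
# Device names sda/sdb are hypothetical; check lsblk first.
# run() only echoes each command; change its body to "$@" to
# execute for real (as root).
run() { echo "$@"; }
run mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb
run mkfs.ext4 /dev/md0
```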

The components cost a little shy of EUR3000 (including 20% VAT) in July 2017, with the components for the big box being about EUR1900, and the components for the smaller box a little over EUR1000.

ECC testing

Given that neither AMD nor Asrock give any guarantee wrt ECC functionality, we wanted to check ourselves whether it works, and followed the example set by Hardware Canucks in testing it.

For the smaller machine (2 DIMMs), we did all that they did, but under Linux, and got pretty much the same results, with a few differences: we changed the timings to DDR4-2400 13-13-13-13-21 in order to see correctable errors, and with those timings the machine soon crashed.

For the bigger machine (4 DIMMs), we saw the EDAC entries reporting ECC, but I had a hard time finding timings that would run, but produce errors reported by EDAC. Eventually I found that changing the first two parameters can easily cross the border into Crashland (in one case we needed to take out the CMOS battery to get to sane BIOS settings again), while varying the third and/or fourth parameters (Trcdwr, Trp) resulted in a setting that was stable enough to run, yet also produced (correctable) ECC errors; the setting I used was 14-14-11-10-21. I first tested with "stress -m50" and (on a RAM-disk) with "stress -m 50 -d 50 --hdd-bytes 100M"; this produced reports of correctable ECC errors. In order to test whether the correction actually works correctly, we then ran "memtester 60G" (as root); this produced correctable error reports at a slower rate than stress (often with 5 minutes between reports), but in >1h of memtesting (with over 10 errors corrected), no error was reported by memtester, so it looks like the correction is working.
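As a sketch of how the EDAC counters can be read out: the driver exposes per-memory-controller counts in sysfs (the paths follow the standard EDAC sysfs ABI; the function name below is ours):

```shell
# Print the EDAC error counters, if the kernel has registered a
# memory controller (standard EDAC sysfs ABI paths).
edac_report() {
  found=0
  for mc in /sys/devices/system/edac/mc/mc*; do
    [ -d "$mc" ] || continue    # glob did not match: no controller
    found=1
    echo "${mc##*/}: $(cat "$mc/ce_count") corrected, $(cat "$mc/ue_count") uncorrected errors"
  done
  [ "$found" = 1 ] || echo "no EDAC memory controller registered"
}
edac_report
```

The kernel also logs each corrected error to the kernel log as it happens, which is how the error rates above can be observed.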

Power usage

The 1600X box (including PSU, i.e., we measured the power coming through the leads) consumes 43W idle and up to 155W with an integer load: 1 instance of memtester (where the different phases seem to have a measurably different power consumption) and 11 instances of "yes >/dev/null".

Proper measurements for the 1800X box have not been done yet, but the first impression is 38W idle, i.e., a little less when idle (thanks to the SSDs, and obviously AMD now implements power gating of idle cores well), and quite a bit more when loaded (I have seen 180W, which makes me wonder whether the CPU stays within its TDP).


These machines worked fine for half a year, then started hanging about once a week. We tried a number of measures to make them stable (e.g., disabling deep sleep and switching power supplies), but none helped permanently. Eventually we (temporarily) disabled SMT, and we have not seen crashes since then (for 133 days as of this writing). However, after a power outage on one of the boxes we did not disable SMT again, and yet it has not hung since; so maybe the measure that worked was not disabling SMT, but something else we did at the same time.

Anyway, if you have this problem and want to disable SMT, you can do so at the BIOS level; but alternatively, you can ask the Linux kernel to use only one logical thread per core. For our Ryzen 1xxx CPUs, you can do it like this (as root with bash):

  for i in /sys/devices/system/cpu/cpu[0-9]*; do
    if test $(( ${i##*cpu} % 2 )) = 1; then
      echo 0 >$i/online
    fi
  done
(Note that the logical cores are numbered differently on Intel CPUs, so you need to change this for Intel CPUs.)
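A numbering-independent variant can be derived from the standard sysfs topology files: keep the lowest-numbered logical thread of each core and offline the rest. This is only a sketch (the function name is ours), but it should work on both AMD and Intel:

```shell
# Take all but the lowest-numbered logical thread of each core
# offline, regardless of how the threads are numbered.
# Relies on the standard sysfs topology files; run smt_off as root.
smt_off() {
  for cpu in /sys/devices/system/cpu/cpu[0-9]*; do
    siblings=$(cat "$cpu/topology/thread_siblings_list")
    first=${siblings%%[,-]*}     # lowest-numbered thread of this core
    if [ "${cpu##*cpu}" != "$first" ]; then
      echo 0 > "$cpu/online"
    fi
  done
}
```

The sibling list has the form "0,6" or "0-1", so stripping everything from the first comma or dash yields the thread to keep.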


Here is how these CPUs compare to fast Intel CPUs I have measured:

Single-thread performance

On our LaTeX benchmark (the numbers are the user time in seconds):

- Ryzen 5 1600X, 4000MHz, 8MB L2, Debian 9 (64-bit)               0.287
- Core i7-4790K, 4400MHz (Turbo), 8MB L3, Debian Jessie (64-bit)  0.204
- Core i7-6700K, 4200MHz (Turbo), 8MB L3, Debian Jessie (64-bit)  0.200
On the Gforth benchmarks (again, user time in seconds):
sieve bubble matrix fib   fft  release; CPU; gcc
0.093 0.099  0.042 0.104 0.030 2017-07-05; AMD Ryzen 1600X 4GHz; gcc-6.3
0.076 0.104  0.040 0.076 0.032 2016-05-03; Intel Core i7-4790K 4.4GHz; gcc-4.9
0.076 0.112  0.040 0.080 0.028 2015-12-26; Intel Core i7-6700K 4.0GHz; gcc-4.9

Multi-thread performance

To be measured.
Anton Ertl