# Ryzen 3900X at 65W PPT

The power consumption of CPUs rises linearly with the clock frequency
and quadratically with the voltage; this is what dynamic voltage and
frequency scaling (DVFS) exploits. A higher frequency usually needs a
higher voltage, so the power consumption rises superlinearly with
frequency (and performance). So, if we don't need the result ASAP, we
can limit the power consumption and use less energy for the same
computation.

The Ryzen 3900X is advertised as having a 105W TDP (thermal design
power, not really meaningful these days), which in practice means a
power limit (PPT) of 142W. Indeed, when fully loaded, one of our
machines took roughly 142W more than it takes when idle (48W at idle;
~190W loaded, all measured at the mains).
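
The scaling argument above (power proportional to frequency times voltage squared) can be made concrete with a toy model. This is only a sketch: the linear voltage/frequency curve and the constants in it are invented for illustration and are not measurements of the Ryzen 3900X, and static power and the uncore are ignored.

```python
# Toy DVFS model: dynamic power P = C * V^2 * f.
# ASSUMPTION: the voltage curve and all constants are hypothetical,
# chosen only to illustrate the shape of the relationship.

def voltage(freq_ghz: float) -> float:
    """Hypothetical linear voltage/frequency curve, in volts."""
    return 0.7 + 0.15 * freq_ghz

def dynamic_power(freq_ghz: float, capacitance: float = 10.0) -> float:
    """Dynamic power in arbitrary units: C * V^2 * f."""
    return capacitance * voltage(freq_ghz) ** 2 * freq_ghz

def energy_for_work(freq_ghz: float, gigacycles: float = 100.0) -> float:
    """Energy for a fixed number of cycles: power * (cycles / frequency)."""
    seconds = gigacycles / freq_ghz
    return dynamic_power(freq_ghz) * seconds

if __name__ == "__main__":
    for f in (2.4, 3.9):
        print(f"{f:.1f} GHz: power {dynamic_power(f):6.1f}, "
              f"energy for fixed work {energy_for_work(f):7.1f}")
```

In this model the frequency cancels out of the energy for a fixed amount of work, so the energy depends only on the voltage squared. That is why running slower at a lower voltage saves energy, not just power.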

In the BIOS of the ASUS TUF Gaming B550M-Plus mainboard, the PPT can
be set under AI Tweaker after setting Precision Boost to "manual".
We first tried a value of 80, but it had no effect. Then we tried a
value of 65, and indeed, the power consumption under load was then
about 62W above idle (i.e., 110W total).

## Results

Running a 6000x6000 matrix multiplication (a pretty power-hungry
workload; the difference may be smaller for other workloads) with
libopenblas on all 24 threads of a Ryzen 3900X gives the
following results:
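
| PPT  | Total power | Clock   | Time  | Energy (PPT) | Energy (total) |
|------|-------------|---------|-------|--------------|----------------|
| 65W  | ~110W       | 2390MHz | 2.06s | 134J         | 227J           |
| 142W | ~190W       | 3890MHz | 1.54s | 219J         | 293J           |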
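
The energy columns are simply power multiplied by run time. A short sketch that recomputes them from the power and time values in the table, together with the ratios between the two settings (the variable and function names are made up for this example):

```python
# Power and time values are taken from the results table.
# Names such as `runs` and `energy_j` are illustrative only.
runs = {
    "65W":  {"ppt_w": 65,  "total_w": 110, "time_s": 2.06},
    "142W": {"ppt_w": 142, "total_w": 190, "time_s": 1.54},
}

def energy_j(power_w: float, time_s: float) -> float:
    """Energy in joules = power in watts * time in seconds."""
    return power_w * time_s

def ratios(low: dict, high: dict) -> dict:
    """Slowdown of the low setting and energy savings relative to the high one."""
    return {
        "slowdown": low["time_s"] / high["time_s"],
        "ppt_energy_saving": energy_j(high["ppt_w"], high["time_s"])
                             / energy_j(low["ppt_w"], low["time_s"]),
        "total_energy_saving": energy_j(high["total_w"], high["time_s"])
                               / energy_j(low["total_w"], low["time_s"]),
    }

if __name__ == "__main__":
    for name, r in runs.items():
        print(name,
              f"PPT energy {energy_j(r['ppt_w'], r['time_s']):.0f}J,",
              f"total energy {energy_j(r['total_w'], r['time_s']):.0f}J")
    for key, value in ratios(runs["65W"], runs["142W"]).items():
        print(f"{key}: {value:.2f}")
```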

So the lower-power setting is a factor of 1.34 slower for this workload
than the default setting, but saves a factor of 1.63 in energy if you
consider PPT. If you consider total power (relevant if you don't let
the computer run idle for the rest of the time, i.e., you turn off the
computer once the computations are done), the energy savings are a
factor of 1.29.

You may wonder why a clock rate difference by a factor of 1.63
results in only a factor of 1.34 difference in run-time. One part of
the explanation is that apparently the synchronization overheads
between the application threads don't scale with CPU speed,
resulting in a lower utilization of the threads at the higher CPU
frequency: 1638% (at 142W) vs. 1779% (at 65W) CPU utilization out of
2400% on this CPU. Another factor is that the uncore (L3, memory
controllers) doesn't scale with the core frequency, so with a slower
core frequency, accesses to the uncore consume fewer cycles. This
results in a factor of about 4.15Gcycles/3.93Gcycles=1.06. That still
leaves a gap for which I don't have an explanation.
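
To get a feeling for the size of the remaining gap, here is a rough sketch that combines the factors mentioned above. It assumes that the utilization and cycle-count effects combine multiplicatively with the run-time ratio; that is a simplification made for this sketch, not something the measurements establish. The assignment of the utilization and cycle figures to the two settings follows the description above.

```python
# Figures from the text and the results table.
clock_ratio = 3890 / 2390        # 142W clock / 65W clock
runtime_ratio = 2.06 / 1.54      # 65W run time / 142W run time
utilization_ratio = 1779 / 1638  # 65W utilization / 142W utilization
cycle_ratio = 4.15 / 3.93        # cycles at 142W / cycles at 65W

def unexplained_gap() -> float:
    """Part of the clock ratio not accounted for by the listed factors.

    ASSUMPTION: the factors are independent and multiply; this is a
    simplification for illustration only.
    """
    explained = runtime_ratio * utilization_ratio * cycle_ratio
    return clock_ratio / explained

if __name__ == "__main__":
    print(f"clock ratio:       {clock_ratio:.2f}")
    print(f"run-time ratio:    {runtime_ratio:.2f}")
    print(f"utilization ratio: {utilization_ratio:.2f}")
    print(f"cycle ratio:       {cycle_ratio:.2f}")
    print(f"unexplained gap:   {unexplained_gap():.2f}")
```

Under that assumption the unexplained remainder is a factor of roughly 1.06, i.e., about the same size as each of the two effects that are accounted for.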

Anton Ertl