From Hacker News: discussion of "%CPU utilization is a lie", a post on Brendan Long's blog.
The observation that "a server at 60% utilization had zero room left" can be explained with queuing theory:
- Up to a hair over 60% utilization, the queuing delays on any work queue remain essentially negligible. At 70% they become noticeable, and at 80% they've doubled. And then it just turns into a shitshow from there on.
- Rule of thumb: 60% is effectively zero delay, and 80% is the knee of the curve where delays start to blow up.
- CPU utilization metrics are frequently averaged over a long period of time (like a minute), but if your SLO is 100 ms, what you care about is whether there's any ~100 ms period where CPU utilization is at 100%. Measuring p99 (or even p100) CPU utilization can make this a lot more visible.
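
To make the queuing argument concrete, here is a minimal sketch using the textbook single-server M/M/1 model (random arrivals, exponentially distributed service times). That model is an assumption chosen for simplicity, not something the thread specifies, and the function name is made up for illustration. In M/M/1 the mean time a job waits in the queue, measured in multiples of the mean service time, is `rho / (1 - rho)`, where `rho` is the utilization.

```python
def mm1_wait_factor(utilization: float) -> float:
    """Mean queue wait in units of mean service time for an M/M/1 queue.

    Assumes a single server, Poisson arrivals, and exponential service
    times. Real systems (many cores, bursty load) will differ in the
    exact numbers, but the curve has the same general shape.
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return utilization / (1.0 - utilization)


# Print the wait factor at a few utilization levels.
for pct in (50, 60, 70, 80, 90, 95, 99):
    factor = mm1_wait_factor(pct / 100)
    print(f"{pct:3d}% busy -> mean wait ~ {factor:6.2f} x service time")
```

In this model the growth is hyperbolic (proportional to `1 / (1 - rho)`) rather than literally exponential, and the wait at 60% is not zero. The exact thresholds in the rule of thumb depend on how many servers share the queue and how variable the work is, so treat the 60%/80% figures as a heuristic rather than something this sketch derives.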

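The point about averaging windows can be illustrated with synthetic numbers. The sketch below is hypothetical throughout: the data is invented (one minute of 10 ms utilization samples, idling at 30% with ten 100 ms bursts at 100%), and the helper names are made up. It compares the one-minute average with percentiles taken over 100 ms windows.

```python
import math
from statistics import mean


def window_means(samples, window):
    """Average consecutive, non-overlapping groups of `window` samples."""
    return [mean(samples[i:i + window])
            for i in range(0, len(samples) - window + 1, window)]


def percentile(values, pct):
    """Nearest-rank percentile for an integer pct in 1..100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Invented data: 6000 samples of 10 ms each (one minute), 30% busy...
samples = [0.3] * 6000
# ...except for a fully saturated 100 ms burst every 6 seconds.
for start in range(0, 6000, 600):
    samples[start:start + 10] = [1.0] * 10

windows = window_means(samples, 10)  # 600 windows of 100 ms each

print(f"1-minute average : {mean(samples):.1%}")
print(f"p50 over 100 ms  : {percentile(windows, 50):.1%}")
print(f"p99 over 100 ms  : {percentile(windows, 99):.1%}")
print(f"max over 100 ms  : {percentile(windows, 100):.1%}")
```

With this data the minute average lands near 31% while p99 and the maximum over 100 ms windows are both 100%. If fewer than 1% of the windows were saturated, p99 would miss them too and only the maximum (p100) would show the problem.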
