cpufreq: powernv: Ramp-down global pstate slower than local-pstate
The frequency transition latency from pmin to pmax is observed to be in the
few-millisecond range, which tends to incur a performance penalty during
sudden frequency ramp-up requests.
This patch set addresses the problem by using an entity called the "global
pstate". The global pstate is a chip-level entity, so the global quantity
(voltage) is managed across the cores of a chip. The local pstate is a
core-level entity, so the local quantity (frequency) is managed across the
threads of a core.
This patch brings the global pstate down at a slower rate than the local
pstate. Holding the global pstate higher than the local pstate makes
subsequent ramp-ups faster.
A per-policy structure is maintained to keep track of the global and local
pstate changes. The global pstate is brought down following a parabolic
equation, with the ramp-down time to pmin set to ~5 seconds. To make sure
that the global pstate is dropped at regular intervals, a timer is queued
every 2 seconds during the ramp-down phase, which eventually brings the
global pstate down to the local pstate.
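The following is a minimal sketch of this scheme, not the driver's actual
code: the structure layout, helper names and the exact ramp-down constant
are illustrative assumptions. It shows how a quadratic (parabolic) ramp
keeps the global pstate close to its recent peak early on and only drops
it steeply as the ~5 second window runs out, never letting it fall below
the current local pstate.

  /* Illustrative sketch only; names and constants are assumptions. */
  #define MAX_RAMP_DOWN_TIME_MS  5120   /* assumed ~5 s ramp-down window */

  /* Assumed per-policy bookkeeping for the ramp-down phase */
  struct global_pstate_info {
          int highest_lpstate;          /* local pstate when ramp-down began */
          int last_gpstate;             /* global pstate requested last tick */
          unsigned long elapsed_ms;     /* time since the ramp-down began    */
  };

  /* Fraction (in percent) of the total drop completed after elapsed_ms */
  static inline int ramp_down_percent(unsigned long elapsed_ms)
  {
          return (elapsed_ms * elapsed_ms * 100) /
                 ((unsigned long)MAX_RAMP_DOWN_TIME_MS * MAX_RAMP_DOWN_TIME_MS);
  }

  /*
   * Parabolic ramp-down: drop a quadratic fraction of the distance from
   * the highest recent local pstate down to pmin, but never go below the
   * current local pstate, so the chip voltage stays ready for a quick
   * ramp-up.
   */
  static inline int next_global_pstate(struct global_pstate_info *gpi,
                                       int local_pstate, int pmin)
  {
          int drop = (ramp_down_percent(gpi->elapsed_ms) *
                      (gpi->highest_lpstate - pmin)) / 100;
          int gpstate = gpi->highest_lpstate - drop;

          if (gpstate < local_pstate)
                  gpstate = local_pstate;

          gpi->last_gpstate = gpstate;
          return gpstate;
  }

Because the drop grows with the square of the elapsed time, most of the
voltage headroom is retained during the first couple of seconds, which is
what makes a sudden ramp-up cheap.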
Iozone results show a fairly consistent performance boost.
YCSB on redis shows improved max latencies in most cases.
Iozone write/rewrite tests were run with file sizes of 200704 kB and
401408 kB and with different record sizes. The following table shows
IO operations/sec with and without the patch.
Iozone results (in op/sec) (mean over 3 iterations)
----------------------------------------------------------------
file size-                  with        without        %
recordsize-IOtype           patch       patch          change
----------------------------------------------------------------
200704-1-SeqWrite           1616532     1615425         0.06
200704-1-Rewrite            2423195     2303130         5.21
200704-2-SeqWrite           1628577     1602620         1.61
200704-2-Rewrite            2428264     2312154         5.02
200704-4-SeqWrite           1617605     1617182         0.02
200704-4-Rewrite            2430524     2351238         3.37
200704-8-SeqWrite           1629478     1600436         1.81
200704-8-Rewrite            2415308     2298136         5.09
200704-16-SeqWrite          1619632     1618250         0.08
200704-16-Rewrite           2396650     2352591         1.87
200704-32-SeqWrite          1632544     1598083         2.15
200704-32-Rewrite           2425119     2329743         4.09
200704-64-SeqWrite          1617812     1617235         0.03
200704-64-Rewrite           2402021     2321080         3.48
200704-128-SeqWrite         1631998     1600256         1.98
200704-128-Rewrite          2422389     2304954         5.09
200704-256-SeqWrite         1617065     1616962         0.00
200704-256-Rewrite          2432539     2301980         5.67
200704-512-SeqWrite         1632599     1598656         2.12
200704-512-Rewrite          2429270     2323676         4.54
200704-1024-SeqWrite        1618758     1616156         0.16
200704-1024-Rewrite         2431631     2315889         4.99
401408-1-SeqWrite           1631479     1608132         1.45
401408-1-Rewrite            2501550     2459409         1.71
401408-2-SeqWrite           1617095     1626069        -0.55
401408-2-Rewrite            2507557     2443621         2.61
401408-4-SeqWrite           1629601     1611869         1.10
401408-4-Rewrite            2505909     2462098         1.77
401408-8-SeqWrite           1617110     1626968        -0.60
401408-8-Rewrite            2512244     2456827         2.25
401408-16-SeqWrite          1632609     1609603         1.42
401408-16-Rewrite           2500792     2451405         2.01
401408-32-SeqWrite          1619294     1628167        -0.54
401408-32-Rewrite           2510115     2451292         2.39
401408-64-SeqWrite          1632709     1603746         1.80
401408-64-Rewrite           2506692     2433186         3.02
401408-128-SeqWrite         1619284     1627461        -0.50
401408-128-Rewrite          2518698     2453361         2.66
401408-256-SeqWrite         1634022     1610681         1.44
401408-256-Rewrite          2509987     2446328         2.60
401408-512-SeqWrite         1617524     1628016        -0.64
401408-512-Rewrite          2504409     2442899         2.51
401408-1024-SeqWrite        1629812     1611566         1.13
401408-1024-Rewrite         2507620     2442968         2.64
Tested with a YCSB workload (50% update + 50% read) over redis for 1 million
records and 1 million operations. Each test was carried out with a target
operations-per-second rate and with persistence disabled.
Max latency (in us) (mean over 5 iterations)
---------------------------------------------------------------
op/s      Operation     with patch    without patch    %change
---------------------------------------------------------------
15000     Read             61480.6          50261.4      22.32
15000     cleanup            215.2            293.6     -26.70
15000     update           25666.2          25163.8       2.00
25000     Read             32626.2          89525.4     -63.56
25000     cleanup            292.2            263.0      11.10
25000     update           32293.4          90255.0     -64.22
35000     Read             34783.0          33119.0       5.02
35000     cleanup            321.2            395.8      -18.8
35000     update           36047.0          38747.8      -6.97
40000     Read             38562.2          42357.4      -8.96
40000     cleanup            371.8            384.6      -3.33
40000     update           27861.4          41547.8     -32.94
45000     Read             42271.0          88120.6     -52.03
45000     cleanup            263.6            383.0     -31.17
45000     update           29755.8          81359.0     -63.43
(test without target op/s)
47659     Read             83061.4         136440.6     -39.12
47659     cleanup            195.8            193.8       1.03
47659     update           73429.4         124971.8     -41.24
Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>