Nikolay Aleksandrov says:
====================
bridge: improve cache utilization
This is the first set which begins to deal with the bad bridge cache
access patterns. The first patch rearranges the bridge and port structs
a little so the frequently (and closely) accessed members are in the same
cache line. The second patch then moves the garbage collection to a
workqueue trying to improve system responsiveness under load (many fdbs)
and more importantly removes the need to check if the matched entry is
expired in __br_fdb_get which was a major source of false-sharing.
The third patch is a preparation for the final one which
If properly configured, i.e. ports bound to CPUs (thus updating "updated"
locally) then the bridge's HitM goes from 100% to 0%, but even without
binding we get a win because previously every lookup that iterated over
the hash chain caused false-sharing due to the first cache line being
used for both mac/vid and used/updated fields.
Some results from tests I've run:
(note that these were run in good conditions for the baseline, everything
ran on a single NUMA node and there were only 3 fdbs)
1. baseline
100% Load HitM on the fdbs (between everyone who has done lookups and hit
one of the 3 hash chains of the communicating
src/dst fdbs)
Overall 5.06% Load HitM for the bridge, first place in the list
2. patched & ports bound to CPUs
0% Local load HitM, bridge is not even in the c2c report list
Also there's 3% consistent improvement in netperf tests.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>