Merge branch 'fib_trie-next'
Alexander Duyck says:
====================
fib_trie: Reduce time spent in fib_table_lookup by 35 to 75%
These patches are meant to address several performance issues I have seen
in the fib_trie implementation, and fib_table_lookup specifically. With
these changes in place I have seen a reduction of up to 35 to 75% for the
total time spent in fib_table_lookup depending on the type of search being
performed.
On a VM running in my Corei7-4930K system with a trie of maximum depth of 7
this resulted in a reduction of over 370ns per packet in the total time to
process packets received from an ixgbe interface and route them to a dummy
interface. This represents a failed lookup in the local trie followed by
a successful search in the main trie.
Baseline Refactor
ixgbe->dummy routing 1.20Mpps 2.21Mpps
------------------------------------------------------------
processing time per packet 835ns 453ns
fib_table_lookup 50.1% 418ns 25.0% 113ns
check_leaf.isra.9 7.9% 66ns -- --
ixgbe_clean_rx_irq 5.3% 44ns 9.8% 44ns
ip_route_input_noref 2.9% 25ns 4.6% 21ns
pvclock_clocksource_read 2.6% 21ns 4.6% 21ns
ip_rcv 2.6% 22ns 4.0% 18ns
In the simple case of receiving a frame and dropping it before it can reach
the socket layer I saw a reduction of 40ns per packet. This represents a
trip through the local trie with the correct leaf found with no need for
any backtracing.
Baseline Refactor
ixgbe->local receive 2.65Mpps 2.96Mpps
------------------------------------------------------------
processing time per packet 377ns 337ns
fib_table_lookup 25.1% 95ns 25.8% 87ns
ixgbe_clean_rx_irq 8.7% 33ns 9.0% 30ns
check_leaf.isra.9 7.2% 27ns -- --
ip_rcv 5.7% 21ns 6.5% 22ns
These changes have resulted in several functions being inlined such as
check_leaf and fib_find_node, but due to the code simplification the
overall size of the code has been reduced.
text data bss dec hex filename
16932 376 16 17324 43ac net/ipv4/fib_trie.o - before
15259 376 8 15643 3d1b net/ipv4/fib_trie.o - after
Changes since RFC:
Replaced this_cpu_ptr with correct call to this_cpu_inc in patch 1
Changed test for leaf_info mismatch to (key ^ n->key) & li->mask_plen in patch 10
====================
Signed-off-by: David S. Miller <davem@davemloft.net>