Based almost entirely upon a patch by Eric Dumazet.
The common case is to have num_tx_queues <= num_rx_queues
and even if num_tx_queues is larger it will not be significantly
larger.
Therefore, the subtraction loop runs at most a few iterations and
will typically be faster than a modulus.
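
For illustration only, here is a minimal standalone sketch of the
reduction. The helper name reduce_by_subtraction() and the plain
uint32_t arguments are hypothetical stand-ins and are not part of
the patch. It assumes num_tx is non-zero. The result equals
rx_queue % num_tx, and the loop body does not execute at all when
rx_queue is already below num_tx.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not in the kernel): map a recorded RX queue
 * index into the range [0, num_tx) by repeated subtraction.
 * Assumes num_tx > 0; with num_tx == 0 the loop would never terminate.
 */
static uint32_t reduce_by_subtraction(uint32_t rx_queue, uint32_t num_tx)
{
	uint32_t hash = rx_queue;

	/* Zero iterations when rx_queue < num_tx, otherwise
	 * rx_queue / num_tx iterations. */
	while (hash >= num_tx)
		hash -= num_tx;

	return hash;
}
```

The cost grows linearly with rx_queue / num_tx, so this only pays
off when the RX queue index is not much larger than the TX queue
count.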
Signed-off-by: David S. Miller <davem@davemloft.net>
{
u32 hash;
- if (skb_rx_queue_recorded(skb))
- return skb_get_rx_queue(skb) % dev->real_num_tx_queues;
+ if (skb_rx_queue_recorded(skb)) {
+ hash = skb_get_rx_queue(skb);
+ while (unlikely (hash >= dev->real_num_tx_queues))
+ hash -= dev->real_num_tx_queues;
+ return hash;
+ }
if (skb->sk && skb->sk->sk_hash)
hash = skb->sk->sk_hash;