net/mlx5e: Use dma_rmb rather than rmb in CQE fetch routine
author Saeed Mahameed <saeedm@mellanox.com>
Fri, 24 Mar 2017 21:52:03 +0000 (00:52 +0300)
committer David S. Miller <davem@davemloft.net>
Sat, 25 Mar 2017 02:11:44 +0000 (19:11 -0700)
Use dma_rmb() in mlx5e_get_cqe() instead of rmb(), which is a more
aggressive barrier than needed on at least some architectures. This
should improve performance on CPU archs where dma_rmb() is optimized.

Performance improvement:
System: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz

Test case                   Baseline      Now         Improvement
---------------------------------------------------------------
TX packets (24 threads)     45Mpps        50Mpps      11%
TC stack Drop (1 core)      3.45Mpps      3.6Mpps     5%
XDP Drop      (1 core)      14Mpps        16.9Mpps    20%
XDP TX        (1 core)      10.4Mpps      12Mpps      15%

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c

index e5c12a732aa1212274943183ed83696ce2606639..d8cda2f6239bd500832113063fb01cfada349254 100644
@@ -44,7 +44,7 @@ struct mlx5_cqe64 *mlx5e_get_cqe(struct mlx5e_cq *cq)
                return NULL;
 
        /* ensure cqe content is read after cqe ownership bit */
-       rmb();
+       dma_rmb();
 
        return cqe;
 }
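
The ordering requirement behind this one-line change can be illustrated outside the kernel. The following is a minimal userspace sketch, not the driver code: every `demo_*` name, the ring size, and the struct layout are hypothetical, and a C11 acquire fence stands in for dma_rmb(), which is only an approximation of the kernel primitive. It shows the pattern the comment in the diff describes: check the ownership bit first, then place a read barrier before any other CQE field is read.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CQ_SIZE    8u   /* hypothetical ring size, must be a power of two */
#define DEMO_OWNER_MASK 1u   /* hypothetical ownership bit */

static_assert((DEMO_CQ_SIZE & (DEMO_CQ_SIZE - 1)) == 0,
              "ring size must be a power of two");

struct demo_cqe {
    uint32_t byte_cnt;       /* payload, written by the producer first */
    _Atomic uint8_t op_own;  /* ownership bit, written by the producer last */
};

struct demo_cq {
    struct demo_cqe ring[DEMO_CQ_SIZE];
    uint32_t cc;             /* consumer counter */
};

/* Mark every entry as not yet owned by software for the first pass. */
static void demo_cq_init(struct demo_cq *cq)
{
    for (size_t i = 0; i < DEMO_CQ_SIZE; i++) {
        cq->ring[i].byte_cnt = 0;
        atomic_init(&cq->ring[i].op_own, DEMO_OWNER_MASK);
    }
    cq->cc = 0;
}

/* Stand-in for the device: fill the payload, then publish the ownership bit. */
static void demo_device_write(struct demo_cq *cq, uint32_t pc, uint32_t byte_cnt)
{
    struct demo_cqe *cqe = &cq->ring[pc & (DEMO_CQ_SIZE - 1)];
    uint8_t sw_own = (pc / DEMO_CQ_SIZE) & DEMO_OWNER_MASK;

    cqe->byte_cnt = byte_cnt;
    atomic_store_explicit(&cqe->op_own, sw_own, memory_order_release);
}

/* Consumer side: return the next CQE if software owns it, else NULL. */
static struct demo_cqe *demo_get_cqe(struct demo_cq *cq)
{
    struct demo_cqe *cqe = &cq->ring[cq->cc & (DEMO_CQ_SIZE - 1)];
    uint8_t hw_own = atomic_load_explicit(&cqe->op_own,
                                          memory_order_relaxed) & DEMO_OWNER_MASK;
    uint8_t sw_own = (cq->cc / DEMO_CQ_SIZE) & DEMO_OWNER_MASK;

    if (hw_own != sw_own)
        return NULL;

    /* Ensure CQE content is read after the ownership bit.
     * Userspace approximation of dma_rmb(); an assumption of this sketch. */
    atomic_thread_fence(memory_order_acquire);

    return cqe;
}
```

A caller would poll `demo_get_cqe()`, read `byte_cnt` from the returned entry, and then advance `cq->cc`. Only read-after-read ordering against memory written by the producer is needed here, which is why a barrier narrower than a full rmb() is sufficient for this pattern.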