net/mlx4: use one page fragment per incoming frame
mlx4 driver has a suboptimal memory allocation strategy for regular
MTU=1500 frames, as it uses two page fragments :
One of 512 bytes and one of 1024 bytes.
This makes GRO less effective, as each GSO packet contains 8 MSS instead
of 16 MSS.
Performance of a single TCP flow gains 25 % increase with the following
patch.
Before patch :
A:~# netperf -H 192.168.0.2 -Cc
MIGRATED TCP STREAM TEST ...
Recv Send Send Utilization Service Demand
Socket Socket Message Elapsed Send Recv Send Recv
Size Size Size Time Throughput local remote local remote
bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB
87380 16384 16384 10.00 13798.47 3.06 4.20 0.436 0.598
After patch :
A:~# netperf -H 192.68.0.2 -Cc
MIGRATED TCP STREAM TEST ...
Recv Send Send Utilization Service Demand
Socket Socket Message Elapsed Send Recv Send Recv
Size Size Size Time Throughput local remote local remote
bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB
87380 16384 16384 10.00 17273.80 3.44 4.19 0.391 0.477
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Amir Vadai <amirv@mellanox.com>
Acked-By: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>