Arnd Bergmann [Wed, 16 Nov 2016 14:10:49 +0000 (15:10 +0100)]
netronome: don't access real_num_rx_queues directly
The netdev->real_num_rx_queues setting is only available if CONFIG_SYSFS
is enabled, so we now get a build failure when that is turned off:
netronome/nfp/nfp_net_common.c: In function 'nfp_net_ring_swap_enable':
netronome/nfp/nfp_net_common.c:2489:18: error: 'struct net_device' has no member named 'real_num_rx_queues'; did you mean 'real_num_tx_queues'?
As far as I can tell, the check here is only used as an optimization that
we can skip in order to fix the compilation. If sysfs is disabled,
the following netif_set_real_num_rx_queues() has no effect.
Fixes:
164d1e9e5d52 ("nfp: add support for ethtool .set_channels")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 16 Nov 2016 14:01:47 +0000 (06:01 -0800)]
sfc: remove napi_hash_del() call
Calling napi_hash_del() after netif_napi_del() is pointless.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Edward Cree <ecree@solarflare.com>
Cc: Bert Kenward <bkenward@solarflare.com>
Acked-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Lebrun [Wed, 16 Nov 2016 09:05:46 +0000 (10:05 +0100)]
lwtunnel: subtract tunnel headroom from mtu on output redirect
This patch changes the lwtunnel_headroom() function which is called
in ipv4_mtu() and ip6_mtu(), to also return the correct headroom
value when the lwtunnel state is OUTPUT_REDIRECT.
This patch enables e.g. SR-IPv6 encapsulations to work without
manually setting the route mtu.
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 16 Nov 2016 08:51:58 +0000 (09:51 +0100)]
mlxsw: spectrum_router: Adjust placement of FIB abort warning
The recent merge commit
bb598c1b8c9b ("Merge
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net") would cause
the FIB abort warning to fire whenever we flush the FIB tables - either
during module removal or actual abort.
Move it back to its rightful location in the FIB abort function.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Wed, 16 Nov 2016 03:26:48 +0000 (04:26 +0100)]
net: dsa: mv88e6xxx: Respect SPEED_UNFORCED, don't set force bit
The SPEED_UNFORCED indicates the MAC & PHY should perform
auto-negotiation to determine a speed which works. If this is called
for, don't set the force bit. If it is set, the MAC actually does
10Gbps, why the internal PHYs don't support.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 16 Nov 2016 18:57:45 +0000 (13:57 -0500)]
Merge branch 'amd-xgbe-next'
Tom Lendacky says:
====================
amd-xgbe: AMD XGBE driver updates 2016-11-15
This patch series addresses some minor issues found in the recently
accepted patch series for the AMD XGBE driver.
The following fixes are included in this driver update series:
- Fix a possibly uninitialized variable in the debugfs support
- Fix the GPIO pin number constraint check
This patch series is based on net-next.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Tue, 15 Nov 2016 22:11:15 +0000 (16:11 -0600)]
amd-xgbe: Fix maximum GPIO value check
The GPIO support in the hardware allows for up to 16 GPIO pins, enumerated
from 0 to 15. The driver uses the wrong value (16) to validate the GPIO
pin range in the routines to set and clear the GPIO output pins. Update
the code to use the correct value (15).
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Tue, 15 Nov 2016 22:11:05 +0000 (16:11 -0600)]
amd-xgbe: Fix possible uninitialized variable
The debugfs support in the driver uses a common routine to write the
debugfs values. In this routine, if the input file position is non-zero
then the write routine will not return an error and an output parameter
will not have been set. Because an error isn't returned an uninitialized
value will be written into a register.
Fix the common write routine to return an error if the input file position
is non-zero, which will propagate back to the caller.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 16 Nov 2016 18:44:02 +0000 (13:44 -0500)]
Merge branch 'nway-reset'
Florian Fainelli says:
====================
net: Implenent ethtool::nway_reset for a few drivers
This patch series depends on "net: phy: Centralize auto-negotation restart"
since it provides phy_ethtool_nway_reset as a helper function.
The drivers here already support PHYLIB, so there really is no reason why
restarting auto-negotiation would not be possible with these.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 15 Nov 2016 19:19:49 +0000 (11:19 -0800)]
net: ethernet: marvell: pxa168_eth: Implement ethtool::nway_reset
Implement ethtool::nway_reset using phy_ethtool_nway_reset. We are
already using dev->phydev all over the place so this comes for free.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 15 Nov 2016 19:19:48 +0000 (11:19 -0800)]
net: ethernet: mvpp2: Implement ethtool::nway_reset
Implement ethtool::nway_reset using phy_ethtool_nway_reset. We are
already using dev->phydev all over the place so this comes for free.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 15 Nov 2016 19:19:47 +0000 (11:19 -0800)]
net: ethernet: mvneta: Implement ethtool::nway_reset
Implement ethtool::nway_reset using phy_ethtool_nway_reset. We are
already using dev->phydev all over the place so this comes for free.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 15 Nov 2016 19:19:46 +0000 (11:19 -0800)]
net: ethoc: Implement ethtool::nway_reset
Implement ethtool::nway_reset using phy_ethtool_nway_reset. We are
already using dev->phydev all over the place so this comes for free.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 15 Nov 2016 19:19:45 +0000 (11:19 -0800)]
net: stmmac: Implement ethtool::nway_reset
Utilize the generic phy_ethtool_nway_reset() helper function to
implement an autonegotiation restart.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 16 Nov 2016 18:40:59 +0000 (13:40 -0500)]
Merge branch 'busypoll-preemption-and-other-optimizations'
Eric Dumazet says:
====================
net: busy-poll: allow preemption and other optimizations
It is time to have preemption points in sk_busy_loop() and improve
its scalability.
Also napi_complete() and friends can tell drivers when it is safe to
not re-enable device interrupts, saving some overhead under
high busy polling.
mlx4 and bnx2x are changed accordingly, to show how this busy polling
status can be exploited by drivers.
Next steps will implement Zach Brown suggestion, where NAPI polling
would be enabled all the time for some chosen queues.
This is needed for efficient epoll() support anyway.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 15 Nov 2016 18:15:15 +0000 (10:15 -0800)]
bnx2x: switch to napi_complete_done()
Switch from napi_complete() to napi_complete_done()
for better GRO support (gro_flush_timeout) and core NAPI
features.
Do not rearm interrupts if we are busy polling,
to reduce bus and interrupts overhead.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Adam Belay <abelay@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Yuval Mintz <Yuval.Mintz@cavium.com>
Cc: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 15 Nov 2016 18:15:14 +0000 (10:15 -0800)]
net/mlx4_en: use napi_complete_done() return value
Do not rearm interrupts if we are busy polling.
mlx4 uses separate CQ for TX and RX, so number of TX interrupts
does not change, unfortunately.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Adam Belay <abelay@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Yuval Mintz <Yuval.Mintz@cavium.com>
Cc: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 15 Nov 2016 18:15:13 +0000 (10:15 -0800)]
net: busy-poll: return busypolling status to drivers
NAPI drivers use napi_complete_done() or napi_complete() when
they drained RX ring and right before re-enabling device interrupts.
In busy polling, we can avoid interrupts being delivered since
we are polling RX ring in a controlled loop.
Drivers can chose to use napi_complete_done() return value
to reduce interrupts overhead while busy polling is active.
This is optional, legacy drivers should work fine even
if not updated.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Adam Belay <abelay@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Yuval Mintz <Yuval.Mintz@cavium.com>
Cc: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 15 Nov 2016 18:15:12 +0000 (10:15 -0800)]
net: busy-poll: remove need_resched() from sk_can_busy_loop()
Now sk_busy_loop() can schedule by itself, we can remove
need_resched() check from sk_can_busy_loop()
Also add a const to its struct sock parameter.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Adam Belay <abelay@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Yuval Mintz <Yuval.Mintz@cavium.com>
Cc: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 15 Nov 2016 18:15:11 +0000 (10:15 -0800)]
net: busy-poll: allow preemption in sk_busy_loop()
After commit
4cd13c21b207 ("softirq: Let ksoftirqd do its job"),
sk_busy_loop() needs a bit of care :
softirqs might be delayed since we do not allow preemption yet.
This patch adds preemptiom points in sk_busy_loop(),
and makes sure no unnecessary cache line dirtying
or atomic operations are done while looping.
A new flag is added into napi->state : NAPI_STATE_IN_BUSY_POLL
This prevents napi_complete_done() from clearing NAPIF_STATE_SCHED,
so that sk_busy_loop() does not have to grab it again.
Similarly, netpoll_poll_lock() is done one time.
This gives about 10 to 20 % improvement in various busy polling
tests, especially when many threads are busy polling in
configurations with large number of NIC queues.
This should allow experimenting with bigger delays without
hurting overall latencies.
Tested:
On a 40Gb mlx4 NIC, 32 RX/TX queues.
echo 70 >/proc/sys/net/core/busy_read
for i in `seq 1 40`; do echo -n $i: ; ./super_netperf $i -H lpaa24 -t UDP_RR -- -N -n; done
Before: After:
1: 90072 92819
2: 157289 184007
3: 235772 213504
4: 344074 357513
5: 394755 458267
6: 461151 487819
7: 549116 625963
8: 544423 716219
9: 720460 738446
10: 794686 837612
11: 915998 923960
12: 937507 925107
13:
1019677 971506
14:
1046831 1113650
15:
1114154 1148902
16:
1105221 1179263
17:
1266552 1299585
18:
1258454 1383817
19:
1341453 1312194
20:
1363557 1488487
21:
1387979 1501004
22:
1417552 1601683
23:
1550049 1642002
24:
1568876 1601915
25:
1560239 1683607
26:
1640207 1745211
27:
1706540 1723574
28:
1638518 1722036
29:
1734309 1757447
30:
1782007 1855436
31:
1724806 1888539
32:
1717716 1944297
33:
1778716 1869118
34:
1805738 1983466
35:
1815694 2020758
36:
1893059 2035632
37:
1843406 2034653
38:
1888830 2086580
39:
1972827 2143567
40:
1877729 2181851
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Adam Belay <abelay@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Yuval Mintz <Yuval.Mintz@cavium.com>
Cc: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Tue, 15 Nov 2016 19:00:04 +0000 (11:00 -0800)]
bpf: Fix compilation warning in __bpf_lru_list_rotate_inactive
gcc-6.2.1 gives the following warning:
kernel/bpf/bpf_lru_list.c: In function ‘__bpf_lru_list_rotate_inactive.isra.3’:
kernel/bpf/bpf_lru_list.c:201:28: warning: ‘next’ may be used uninitialized in this function [-Wmaybe-uninitialized]
The "next" is currently initialized in the while() loop which must have >=1
iterations.
This patch initializes next to get rid of the compiler warning.
Fixes:
3a08c2fd7634 ("bpf: LRU List")
Reported-by: David Miller <davem@davemloft.net>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Lebrun [Tue, 15 Nov 2016 15:14:04 +0000 (16:14 +0100)]
ipv6: sr: add option to control lwtunnel support
This patch adds a new option CONFIG_IPV6_SEG6_LWTUNNEL to enable/disable
support of encapsulation with the lightweight tunnels. When this option
is enabled, CONFIG_LWTUNNEL is automatically selected.
Fix commit
6c8702c60b88 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
Without a proper option to control lwtunnel support for SR-IPv6, if
CONFIG_LWTUNNEL=n then the IPv6 initialization fails as a consequence
of seg6_iptunnel_init() failure with EOPNOTSUPP:
NET: Registered protocol family 10
IPv6: Attempt to unregister permanent protocol 6
IPv6: Attempt to unregister permanent protocol 136
IPv6: Attempt to unregister permanent protocol 17
NET: Unregistered protocol family 10
Tested (compiling, booting, and loading ipv6 module when relevant)
with possible combinations of CONFIG_IPV6={y,m,n},
CONFIG_IPV6_SEG6_LWTUNNEL={y,n} and CONFIG_LWTUNNEL={y,n}.
Reported-by: Lorenzo Colitti <lorenzo@google.com>
Suggested-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 16 Nov 2016 03:46:31 +0000 (22:46 -0500)]
Merge branch 'alx-multiqueue-support'
Tobias Regnery says:
====================
alx: add multi queue support
This patchset lays the groundwork for multi queue support in the alx driver
and enables multi queue support for the tx path by default. The hardware
supports up to 4 tx queues.
Benefits are better utilization of multi core cpus and the usage of the
msi-x support by default which splits the handling of rx / tx and misc
other interrupts.
The rx path is a little bit harder because apparently (based on the limited
information from the downstream driver) the hardware supports up to 8 rss
queues but only has one hardware descriptor ring on the rx side. So the rx
path will be part of another patchset.
Tested on my AR8161 ethernet adapter with different tests:
- there are no regressions observed during my daily usage
- iperf tcp and udp tests shows no performance regressions
- netperf TCP_RR and UDP_RR shows a slight performance increase of about
1-2% with this patchset applied
This work is based on the downstream driver at github.com/qca/alx
Changes in V2:
- drop unneeded casts in alx_alloc_rx_ring (Patch 1)
- add additional information about testing and benefit to the
changelog
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tobias Regnery [Tue, 15 Nov 2016 11:43:16 +0000 (12:43 +0100)]
alx: enable multiple tx queues
Enable multiple tx queues by default based on the number of online cpus. The
hardware supports up to four tx queues.
Based on the downstream driver at github.com/qca/alx
Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tobias Regnery [Tue, 15 Nov 2016 11:43:15 +0000 (12:43 +0100)]
alx: enable msi-x interrupts by default
Remove the module parameter to enable msi-x support and enable msi-x
interrupts unconditionally by default. This is a preparatory step to enable
multi queue support by default, because this is only working with msi-x
interrupts.
Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tobias Regnery [Tue, 15 Nov 2016 11:43:14 +0000 (12:43 +0100)]
alx: prepare tx path for multi queue support
This patch prepares the tx path to send data on multiple tx queues. It
introduces per queue register adresses and uses them in the alx_tx_queue
structs.
There are new helper functions for the queue mapping in the tx path.
Based on the downstream driver at github.com/qca/alx
Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tobias Regnery [Tue, 15 Nov 2016 11:43:13 +0000 (12:43 +0100)]
alx: prepare resource allocation for multi queue support
Allocate, initialise and free alx_tx_queue structs based on the number of
alx_napi structures. Also increase the size of the descriptor memory based
on the number of tx queues in use.
Based on the downstream driver at github.com/qca/alx
Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tobias Regnery [Tue, 15 Nov 2016 11:43:12 +0000 (12:43 +0100)]
alx: prepare interrupt functions for multiple queues
Extend the interrupt bringup code and the interrupt handler for msi-x
interrupts in order to handle multiple queues.
We must change the poll function because with multiple queues it is possible
that an alx_napi structure has only a tx or only a rx queue pointer.
Based on the downstream driver at github.com/qca/alx
Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tobias Regnery [Tue, 15 Nov 2016 11:43:11 +0000 (12:43 +0100)]
alx: switch to per queue data structures
Remove the tx and rx queue structures from the alx_priv structure and switch
everything over to the queue pointers in the alx_napi structure.
Based on the downstream driver at github.com/qca/alx
Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tobias Regnery [Tue, 15 Nov 2016 11:43:10 +0000 (12:43 +0100)]
alx: add ability to allocate and free alx_napi structures
Add new functions to allocate and free the alx_napi structures and use them
in __alx_open and __alx_stop. We only allocate one of these structures for
now, as the rest of the driver is not yet ready for multiple queues.
We switch over the setup of the interrupt mask and the call to netif_napi_add
to the new function because we must adjust these later on a per queue basis.
Based on the downstream driver at github.com/qca/alx
Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tobias Regnery [Tue, 15 Nov 2016 11:43:09 +0000 (12:43 +0100)]
alx: extend data structures for multi queue support
Extend the driver data structures to be able to handle multiple queues.
Based on the downstream driver at github.com/qca/alx
Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tobias Regnery [Tue, 15 Nov 2016 11:43:08 +0000 (12:43 +0100)]
alx: refactor descriptor allocation
Split the allocation of descriptor memory and the buffer allocation into a
tx and rx function. This is in preparation for multiple queues where we
need to iterate over the new functions.
While at it drop the unneeded casting on the rx side.
Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 16 Nov 2016 03:34:27 +0000 (22:34 -0500)]
Merge branch 'dpaa_eth-next'
Madalin Bucur says:
====================
dpaa_eth: Add the QorIQ DPAA Ethernet driver
This patch series adds the Ethernet driver for the Freescale
QorIQ Data Path Acceleration Architecture (DPAA).
This version includes changes following the feedback received
on previous versions from Eric Dumazet, Bob Cochran, Joe Perches,
Paul Bolle, Joakim Tjernlund, Scott Wood, David Miller - thank you.
Together with the driver a managed version of alloc_percpu
is provided that simplifies the release of per-CPU memory.
The Freescale DPAA architecture consists in a series of hardware
blocks that support the Ethernet connectivity. The Ethernet driver
depends upon the following drivers that are currently in the Linux
kernel:
- Peripheral Access Memory Unit (PAMU)
drivers/iommu/fsl_*
- Frame Manager (FMan) added in v4.4
drivers/net/ethernet/freescale/fman
- Queue Manager (QMan), Buffer Manager (BMan) added in v4.9-rc1
drivers/soc/fsl/qbman
dpaa_eth interfaces mapping to FMan MACs:
dpaa_eth /eth0\ ... /ethN\
driver | | | |
------------- ---- ----------- ---- -------------
-Ports / Tx Rx \ ... / Tx Rx \
FMan | | | |
-MACs | MAC0 | | MACN |
/ dtsec0 \ ... / dtsecN \ (or tgec)
/ \ / \(or memac)
--------- -------------- --- -------------- ---------
FMan, FMan Port, FMan SP, FMan MURAM drivers
---------------------------------------------------------
FMan HW blocks: MURAM, MACs, Ports, SP
---------------------------------------------------------
dpaa_eth relation to QMan, FMan:
________________________________
dpaa_eth / eth0 \
driver / \
--------- -^- -^- -^- --- ---------
QMan driver / \ / \ / \ \ / | BMan |
|Rx | |Rx | |Tx | |Tx | | driver |
--------- |Dfl| |Err| |Cnf| |FQs| | |
QMan HW |FQ | |FQ | |FQ | | | | |
/ \ / \ / \ \ / | |
--------- --- --- --- -v- ---------
| FMan QMI | |
| FMan HW FMan BMI | BMan HW |
----------------------- --------
where the acronyms used above (and in the code) are:
DPAA = Data Path Acceleration Architecture
FMan = DPAA Frame Manager
QMan = DPAA Queue Manager
BMan = DPAA Buffers Manager
QMI = QMan interface in FMan
BMI = BMan interface in FMan
FMan SP = FMan Storage Profiles
MURAM = Multi-user RAM in FMan
FQ = QMan Frame Queue
Rx Dfl FQ = default reception FQ
Rx Err FQ = Rx error frames FQ
Tx Cnf FQ = Tx confirmation FQ
Tx FQs = transmission frame queues
dtsec = datapath three speed Ethernet controller (10/100/1000 Mbps)
tgec = ten gigabit Ethernet controller (10 Gbps)
memac = multirate Ethernet MAC (10/100/1000/10000)
Changes from v7:
- remove the debug option to use a common buffer pool for all the
interfaces
Changed from v6:
- fixed an issue on an error path in dpaa_set_mac_address()
- removed NDO operation definitions that were not needed
- sorted the local variable declarations
- cleaned up a few checkpatch checks
- removed friendly network interface naming code
Changes from v5:
- adapt to the latest Q/BMan drivers API
- use build_skb() on Rx path instead of buffer pool refill path
- proper support for multiple buffer pools
- align function, variable names, code cleanup
- driver file structure cleanup
Changes from v4:
- addressed feedback from Scott Wood and Joe Perches
- fixed spelling
- fixed leak of uninitialized stack to userspace
- fix prints
- replace raw_cpu_ptr() with this_cpu_ptr()
- remove _s from the end of structure names
- remove underscores at start of functions, goto labels
- remove likely in error paths
- use container_of() instead of open casts
- remove priv from the driver name
- move return type on same line with function name
- drop DPA_READ_SKB_PTR/DPA_WRITE_SKB_PTR
Changes from v3:
- removed bogus delay and comment in .ndo_stop implementation
- addressed minor issues reported by David Miller
Changes from v2:
- removed debugfs, moved exports to ethtool statistics
- removed congestion groups Kconfig params
Changes from v1:
- bpool level Kconfig options removed
- print format using pr_fmt, cleaned up prints
- __hot/__cold removed
- gratuitous unlikely() removed
- code style aligned, consistent spacing for declarations
- comment formatting
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Madalin Bucur [Tue, 15 Nov 2016 08:41:09 +0000 (10:41 +0200)]
arch/powerpc: Enable dpaa_eth
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Madalin Bucur [Tue, 15 Nov 2016 08:41:08 +0000 (10:41 +0200)]
arch/powerpc: Enable FSL_FMAN
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Madalin Bucur [Tue, 15 Nov 2016 08:41:07 +0000 (10:41 +0200)]
arch/powerpc: Enable FSL_PAMU
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Madalin Bucur [Tue, 15 Nov 2016 08:41:06 +0000 (10:41 +0200)]
dpaa_eth: add trace points
Add trace points on the hot processing path.
Signed-off-by: Ruxandra Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Madalin Bucur [Tue, 15 Nov 2016 08:41:05 +0000 (10:41 +0200)]
dpaa_eth: add sysfs exports
Export Frame Queue and Buffer Pool IDs through sysfs.
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Madalin Bucur [Tue, 15 Nov 2016 08:41:04 +0000 (10:41 +0200)]
dpaa_eth: add ethtool statistics
Add a series of counters to be exported through ethtool:
- add detailed counters for reception errors;
- add detailed counters for QMan enqueue reject events;
- count the number of fragmented skbs received from the stack;
- count all frames received on the Tx confirmation path;
- add congestion group statistics;
- count the number of interrupts for each CPU.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Madalin Bucur [Tue, 15 Nov 2016 08:41:03 +0000 (10:41 +0200)]
dpaa_eth: add ethtool functionality
Add support for basic ethtool operations.
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Madalin Bucur [Tue, 15 Nov 2016 08:41:02 +0000 (10:41 +0200)]
dpaa_eth: add support for DPAA Ethernet
This introduces the Freescale Data Path Acceleration Architecture
(DPAA) Ethernet driver (dpaa_eth) that builds upon the DPAA QMan,
BMan, PAMU and FMan drivers to deliver Ethernet connectivity on
the Freescale DPAA QorIQ platforms.
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Madalin Bucur [Tue, 15 Nov 2016 08:41:01 +0000 (10:41 +0200)]
devres: add devm_alloc_percpu()
Introduce managed counterparts for alloc_percpu() and free_percpu().
Add devm_alloc_percpu() and devm_free_percpu() into the managed
interfaces list.
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrey Vagin [Tue, 15 Nov 2016 02:15:14 +0000 (18:15 -0800)]
tcp: allow to enable the repair mode for non-listening sockets
The repair mode is used to get and restore sequence numbers and
data from queues. It used to checkpoint/restore connections.
Currently the repair mode can be enabled for sockets in the established
and closed states, but for other states we have to dump the same socket
properties, so lets allow to enable repair mode for these sockets.
The repair mode reveals nothing more for sockets in other states.
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 16 Nov 2016 03:24:42 +0000 (22:24 -0500)]
Merge branch 'liquidio-CN23XX-VF-support'
Raghu Vatsavayi says:
====================
liquidio CN23XX VF support
Following is the V6 patch series for adding VF support on
CN23XX devices. This version addressed:
1) Your concern for ordering of local variable declarations
from longest to shortest line.
2) Removed module parameters max_vfs, num_queues_per_{p,v}f.
3) Minor changes for fixing new checkpatch script related
errors on pre-existing driver.
4) Fixed compilation issues when CONFIG_PCI_IOV/CONFIG_PCI_ATS
options are disabled.
5) Modified qualifiers for printing mac addresses with pM format.
I will post remaining VF patches soon after this patchseries is
applied. Please apply patches in the following order as some of
the patches depend on earlier patches.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Raghu Vatsavayi [Mon, 14 Nov 2016 23:54:47 +0000 (15:54 -0800)]
liquidio CN23XX: fix for new check patch errors
New checkpatch script shows some errors with pre-existing
driver. This patch provides fix for those errors.
Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com>
Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com>
Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com>
Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Raghu Vatsavayi [Mon, 14 Nov 2016 23:54:46 +0000 (15:54 -0800)]
liquidio CN23XX: copyrights changes and alignment
Updated copyrights comments and also changed some other comments
alignments.
Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com>
Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com>
Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com>
Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Raghu Vatsavayi [Mon, 14 Nov 2016 23:54:45 +0000 (15:54 -0800)]
liquidio CN23XX: code cleanup
Cleaned up unnecessary comments and added some minor macros.
Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com>
Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com>
Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com>
Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Raghu Vatsavayi [Mon, 14 Nov 2016 23:54:44 +0000 (15:54 -0800)]
liquidio CN23XX: device states
Cleaned up resource leaks during destroy resources by
introducing more device states.
Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com>
Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com>
Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com>
Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Raghu Vatsavayi [Mon, 14 Nov 2016 23:54:43 +0000 (15:54 -0800)]
liquidio CN23XX: VF related operations
Adds support for VF related operations like mac address vlan
and link changes.
Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com>
Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com>
Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com>
Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Raghu Vatsavayi [Mon, 14 Nov 2016 23:54:42 +0000 (15:54 -0800)]
liquidio CN23XX: mailbox interrupt processing
Adds support for mailbox interrupt processing of various
commands.
Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com>
Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com>
Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com>
Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Raghu Vatsavayi [Mon, 14 Nov 2016 23:54:41 +0000 (15:54 -0800)]
liquidio CN23XX: Mailbox support
Adds support for mailbox communication between PF and VF.
Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com>
Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com>
Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com>
Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Raghu Vatsavayi [Mon, 14 Nov 2016 23:54:40 +0000 (15:54 -0800)]
liquidio CN23XX: sysfs VF config support
Adds sysfs based support for enabling or disabling VFs.
Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com>
Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com>
Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com>
Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Raghu Vatsavayi [Mon, 14 Nov 2016 23:54:39 +0000 (15:54 -0800)]
liquidio CN23XX: HW config for VF support
Adds support for configuring HW for creating VFs.
Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com>
Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com>
Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com>
Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 16 Nov 2016 03:12:58 +0000 (22:12 -0500)]
Merge branch 'amd-xgbe-next'
Tom Lendacky says:
====================
amd-xgbe: AMD XGBE driver updates 2016-11-14
This patch series addresses some minor issues found in the recently
accepted patch series for the AMD XGBE driver.
The following fixes are included in this driver update series:
- Fix how a mask is applied to a Clause 37 register value
- Fix some coccinelle identified warnings
This patch series is based on net-next.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Mon, 14 Nov 2016 22:39:16 +0000 (16:39 -0600)]
amd-xgbe: Fix up some coccinelle identified warnings
Fix up some warnings that were identified by coccinelle:
Clean up an if/else block that can look confusing since the same statement
is executed in an "else if" check and the final "else" statement.
Change a variable from unsigned int to int since it is used in an if
statement checking the value to be less than 0.
Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Mon, 14 Nov 2016 22:39:07 +0000 (16:39 -0600)]
amd-xgbe: Fix mask appliciation for Clause 37 register
The application of a mask to clear an area of a clause 37 register value
was not properly applied. Update the code to do the proper application
of the mask.
Reported-by: Marion & Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 16 Nov 2016 03:05:55 +0000 (22:05 -0500)]
Merge branch 'sun4i-emac-big-endian'
Michael Weiser says:
====================
sun4i-emac: Fixes for running a big-endian kernel on Cubieboard2
the following patches are what remains to be fixed in order to allow running a
big-endian kernel on the Cubieboard2.
The first patch fixes up endianness problems with DMA descriptors in
the stmmac driver preventing it from working correctly when runnning a
big-endian kernel.
The second patch adds the ability to enable diagnostic messages in the
sun4i-emac driver which were instrumental in finding the problem fixed
by patch number three: Endianness confusion caused by dual-purpose I/O
register usage in sun4i-emac.
All of these have been tested successfully on a Cubieboard2 DualCard.
Changes since v4:
- Rebased to current master
- Removed already applied patches to sunxi-mmc and sunxi-Kconfig
Changes since v3:
- Rebased sunxi-mmc patch against Ulf's mmc.git/next
- Changed Kconfig change to enable big-endian support only for sun7i
devices
Changes since v2:
- Fixed typo in stmmac patch causing a build failure
- Added sun4i-emac patches
Changes since v1:
- Fixed checkpatch niggles
- Added respective Cc:s
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Weiser [Mon, 14 Nov 2016 17:58:07 +0000 (18:58 +0100)]
net: ethernet: sun4i-emac: Read rxhdr in CPU byte-order
The EMAC EMAC_RX_IO_DATA_REG data register is dual-purpose: On one hand
it is used to move actual packet data off the wire. This will be in
wire-format and accepted as such by higher layers such as IP. Therefore
it is correctly read as-is (i.e. raw) using readsl.
On the other hand it provides metadata about incoming transfers to the
driver such as length and checksum validation status. This data is
little-endian, always and it is interpreted by the driver. Therefore it
needs to be swapped to CPU endianness to make sense to the driver. This
is already done for the "receive header" but not rxhdr.
Read rxhdr using readl in order for sun4i-emac to work correctly when
running a big-endian kernel.
Signed-off-by: Michael Weiser <michael.weiser@gmx.de>
Cc: Maxime Ripard <maxime.ripard@free-electrons.com>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Weiser [Mon, 14 Nov 2016 17:58:06 +0000 (18:58 +0100)]
net: ethernet: sun4i-emac: Allow to enable netif messages
sun4i-emac has the ability to print a number of diagnostic messages using
dev_dbg depending on message level settings implemented using netif_msg_*
macros. But there's no way to actually enable them.
Add the ability to switch diagnostic messages on using either a module
parameter debug or ethtool -s <netif> msglvl <flags>.
Signed-off-by: Michael Weiser <michael.weiser@gmx.de>
Cc: Maxime Ripard <maxime.ripard@free-electrons.com>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Weiser [Mon, 14 Nov 2016 17:58:05 +0000 (18:58 +0100)]
net: ethernet: stmmac: change dma descriptors to __le32
The stmmac driver does not take into account the processor may be big
endian when writing the DMA descriptors. This causes the ethernet
interface not to be initialised correctly when running a big-endian
kernel. Change the descriptors for DMA to use __le32 and ensure they are
suitably swapped before writing. Tested successfully on the
Cubieboard2.
Signed-off-by: Michael Weiser <michael.weiser@gmx.de>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Mon, 14 Nov 2016 15:42:01 +0000 (16:42 +0100)]
dctcp: update cwnd on congestion event
draft-ietf-tcpm-dctcp-02 says:
... when the sender receives an indication of congestion
(ECE), the sender SHOULD update cwnd as follows:
cwnd = cwnd * (1 - DCTCP.Alpha / 2)
So, lets do this and reduce cwnd more smoothly (and faster), as per
current congestion estimate.
Cc: Lawrence Brakmo <brakmo@fb.com>
Cc: Andrew Shewmaker <agshew@gmail.com>
Cc: Glenn Judd <glenn.judd@morganstanley.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Wed, 16 Nov 2016 02:21:09 +0000 (18:21 -0800)]
net: bcm63xx_enet: Fix build failure with phy_ethtool_nway_reset
Introduced a typo making the driver no longer build, *sigh*.
Fixes:
42469bf5d9bb ("net: bcm63xx_enet: Utilize phy_ethtool_nway_reset")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Philippe Reynes [Sun, 13 Nov 2016 22:33:46 +0000 (23:33 +0100)]
net: bnx2: use new api ethtool_{get|set}_link_ksettings
The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 15 Nov 2016 21:33:38 +0000 (16:33 -0500)]
Merge branch 'phy_ethtool_nway_reset'
Florian Fainelli says:
====================
net: phy: Centralize auto-negotation restart
This patch series centralizes how ethtool::nway_reset is implemented
by providing a PHYLIB function which calls into genphy_restart_aneg().
All drivers below are converted to use this new helper function. Some
other have specific requirements that make them not quite suitable for
a straight forward conversion.
There is another patch series which implements ethtool::nway_reset
using the helper function introduced that depends on this patch
series.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 15 Nov 2016 18:06:44 +0000 (10:06 -0800)]
net: usb: lan78xx: Utilize phy_ethtool_nway_reset
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 15 Nov 2016 18:06:43 +0000 (10:06 -0800)]
net: usb: ax88172x: Utilize phy_ethtool_nway_reset
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 15 Nov 2016 18:06:42 +0000 (10:06 -0800)]
net: ethernet: lantiq_etop: Utilize phy_ethtool_nway_reset
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Thomas Langer <Thomas.langer@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 15 Nov 2016 18:06:41 +0000 (10:06 -0800)]
net: ethernet: ucc: Utilize phy_ethtool_nway_reset
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 15 Nov 2016 18:06:40 +0000 (10:06 -0800)]
net: fec: Utilize phy_ethtool_nway_reset
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 15 Nov 2016 18:06:39 +0000 (10:06 -0800)]
net: fs_enet: Utilize phy_ethtool_nway_reset
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 15 Nov 2016 18:06:38 +0000 (10:06 -0800)]
net: bcmgenet: Utilize phy_ethtool_nway_reset
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 15 Nov 2016 18:06:37 +0000 (10:06 -0800)]
net: ethernet: ixp4xx_eth: Utilize phy_ethtool_nway_reset
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 15 Nov 2016 18:06:36 +0000 (10:06 -0800)]
net: ethernet: ll_temac: Utilize phy_ethtool_nway_reset
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 15 Nov 2016 18:06:35 +0000 (10:06 -0800)]
net: ethernet: smsc9420: Utilize phy_ethtool_nway_reset
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 15 Nov 2016 18:06:34 +0000 (10:06 -0800)]
net: smsc911x: Utilize phy_ethtool_nway_reset
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 15 Nov 2016 18:06:33 +0000 (10:06 -0800)]
net: mv643xx_eth: Utilize phy_ethtool_nway_reset
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 15 Nov 2016 18:06:32 +0000 (10:06 -0800)]
net: bcm63xx_enet: Utilize phy_ethtool_nway_reset
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 15 Nov 2016 18:06:31 +0000 (10:06 -0800)]
net: nb8800: Utilize phy_ethtool_nway_reset
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 15 Nov 2016 18:06:30 +0000 (10:06 -0800)]
net: phy: Add phy_ethtool_nway_reset
This function just calls into genphy_restart_aneg() to perform an
autonegotation restart.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 15 Nov 2016 21:32:11 +0000 (16:32 -0500)]
vxlan: Fix uninitialized variable warnings.
drivers/net/vxlan.c: In function ‘vxlan_xmit_one’:
drivers/net/vxlan.c:2141:10: warning: ‘err’ may be used uninitialized in this function [-Wmaybe-uninitialized]
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 15 Nov 2016 17:16:14 +0000 (12:16 -0500)]
Merge branch 'vxlan-xmit-improvements'
Pravin B Shelar says:
====================
vxlan: xmit improvements.
Following patch series improves vxlan fast path, removes
duplicate code and simplifies vxlan xmit code path.
v2-v3:
Removed unrelated warning fix from patch 2.
rearranged error handling from patch 3
Fixed stats updates in vxlan route lookup in patch 4
v1-v2:
Fix compilation error when IPv6 support is not enabled.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
pravin shelar [Mon, 14 Nov 2016 04:43:58 +0000 (20:43 -0800)]
vxlan: remove unsed vxlan_dev_dst_port()
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pravin shelar [Mon, 14 Nov 2016 04:43:57 +0000 (20:43 -0800)]
vxlan: simplify vxlan xmit
Existing vxlan xmit function handles two distinct cases.
1. vxlan net device
2. vxlan lwt device.
By seperating initialization these two cases the egress path
looks better.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pravin shelar [Mon, 14 Nov 2016 04:43:56 +0000 (20:43 -0800)]
vxlan: simplify RTF_LOCAL handling.
Avoid code duplicate code for handling RTF_LOCAL routes.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pravin shelar [Mon, 14 Nov 2016 04:43:55 +0000 (20:43 -0800)]
vxlan: improve vxlan route lookup checks.
Move route sanity check to respective vxlan[4/6]_get_route functions.
This allows us to perform all sanity checks before caching the dst so
that we can avoid these checks on subsequent packets.
This give move accurate metadata information for packet from
fill_metadata_dst().
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pravin shelar [Mon, 14 Nov 2016 04:43:54 +0000 (20:43 -0800)]
vxlan: simplify exception handling
vxlan egress path error handling has became complicated, it
need to handle IPv4 and IPv6 tunnel cases.
Earlier patch removes vlan handling from vxlan_build_skb(), so
vxlan_build_skb does not need to free skb and we can simplify
the xmit path by having single error handling for both type of
tunnels.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pravin shelar [Mon, 14 Nov 2016 04:43:53 +0000 (20:43 -0800)]
vxlan: avoid checking socket multiple times.
Check the vxlan socket in vxlan6_getroute().
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pravin shelar [Mon, 14 Nov 2016 04:43:52 +0000 (20:43 -0800)]
vxlan: avoid vlan processing in vxlan device.
VxLan device does not have special handling for vlan taging on egress.
Therefore it does not make sense to expose vlan offloading feature.
This patch does not change vxlan functinality.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Tue, 15 Nov 2016 15:37:53 +0000 (16:37 +0100)]
udplite: fix NULL pointer dereference
The commit
850cbaddb52d ("udp: use it's own memory accounting schema")
assumes that the socket proto has memory accounting enabled,
but this is not the case for UDPLITE.
Fix it enabling memory accounting for UDPLITE and performing
fwd allocated memory reclaiming on socket shutdown.
UDP and UDPLITE share now the same memory accounting limits.
Also drop the backlog receive operation, since is no more needed.
Fixes:
850cbaddb52d ("udp: use it's own memory accounting schema")
Reported-by: Andrei Vagin <avagin@gmail.com>
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 15 Nov 2016 16:50:51 +0000 (11:50 -0500)]
Merge branch 'bpf-lru'
Martin KaFai Lau says:
====================
bpf: LRU map
This patch set adds LRU map implementation to the existing BPF map
family.
The first few patches introduce the basic BPF LRU list
implementation.
The later patches introduce the LRU versions of the
existing BPF_MAP_TYPE_LRU_[PERCPU_]HASH maps by leveraging
the BPF LRU list.
v2:
- Added a percpu LRU list option which can be specified as
a map attribute.
[Note: percpu LRU list has nothing to do with the map's value]
- Removed the cpu variable from the struct bpf_lru_locallist
since it is not needed.
- Changed the __bpf_lru_node_move_out to __bpf_lru_node_move_to_free in
patch 1 to prepare the percpu LRU list in patch 2.
- Moved the test_lru_map under selftests
- Refactored a few things in the test codes
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Fri, 11 Nov 2016 18:55:11 +0000 (10:55 -0800)]
bpf: Add tests for the LRU bpf_htab
This patch has some unit tests and a test_lru_dist.
The test_lru_dist reads in the numeric keys from a file.
The files used here are generated by a modified fio-genzipf tool
originated from the fio test suit. The sample data file can be
found here: https://github.com/iamkafai/bpf-lru
The zipf.* data files have 100k numeric keys and the key is also
ranged from 1 to 100k.
The test_lru_dist outputs the number of unique keys (nr_unique).
F.e. The following means, 61239 of them is unique out of 100k keys.
nr_misses means it cannot be found in the LRU map, so nr_misses
must be >= nr_unique. test_lru_dist also simulates a perfect LRU
map as a comparison:
[root@arch-fb-vm1 ~]# ~/devshare/fb-kernel/linux/samples/bpf/test_lru_dist \
/root/zipf.100k.a1_01.out 4000 1
...
test_parallel_lru_dist (map_type:9 map_flags:0x0):
task:0 BPF LRU: nr_unique:23093(/100000) nr_misses:31603(/100000)
task:0 Perfect LRU: nr_unique:23093(/100000 nr_misses:34328(/100000)
....
test_parallel_lru_dist (map_type:9 map_flags:0x2):
task:0 BPF LRU: nr_unique:23093(/100000) nr_misses:31710(/100000)
task:0 Perfect LRU: nr_unique:23093(/100000 nr_misses:34328(/100000)
[root@arch-fb-vm1 ~]# ~/devshare/fb-kernel/linux/samples/bpf/test_lru_dist \
/root/zipf.100k.a0_01.out 40000 1
...
test_parallel_lru_dist (map_type:9 map_flags:0x0):
task:0 BPF LRU: nr_unique:61239(/100000) nr_misses:67054(/100000)
task:0 Perfect LRU: nr_unique:61239(/100000 nr_misses:66993(/100000)
...
test_parallel_lru_dist (map_type:9 map_flags:0x2):
task:0 BPF LRU: nr_unique:61239(/100000) nr_misses:67068(/100000)
task:0 Perfect LRU: nr_unique:61239(/100000 nr_misses:66993(/100000)
LRU map has also been added to map_perf_test:
/* Global LRU */
[root@kerneltest003.31.prn1 ~]# for i in 1 4 8; do echo -n "$i cpus: "; \
./map_perf_test 16 $i | awk '{r += $3}END{print r " updates"}'; done
1 cpus:
2934082 updates
4 cpus:
7391434 updates
8 cpus:
6500576 updates
/* Percpu LRU */
[root@kerneltest003.31.prn1 ~]# for i in 1 4 8; do echo -n "$i cpus: "; \
./map_perf_test 32 $i | awk '{r += $3}END{print r " updates"}'; done
1 cpus:
2896553 updates
4 cpus:
9766395 updates
8 cpus:
17460553 updates
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Fri, 11 Nov 2016 18:55:10 +0000 (10:55 -0800)]
bpf: Add BPF_MAP_TYPE_LRU_PERCPU_HASH
Provide a LRU version of the existing BPF_MAP_TYPE_PERCPU_HASH
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Fri, 11 Nov 2016 18:55:09 +0000 (10:55 -0800)]
bpf: Add BPF_MAP_TYPE_LRU_HASH
Provide a LRU version of the existing BPF_MAP_TYPE_HASH.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Fri, 11 Nov 2016 18:55:08 +0000 (10:55 -0800)]
bpf: Refactor codes handling percpu map
Refactor the codes that populate the value
of a htab_elem in a BPF_MAP_TYPE_PERCPU_HASH
typed bpf_map.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Fri, 11 Nov 2016 18:55:07 +0000 (10:55 -0800)]
bpf: Add percpu LRU list
Instead of having a common LRU list, this patch allows a
percpu LRU list which can be selected by specifying a map
attribute. The map attribute will be added in the later
patch.
While the common use case for LRU is #reads >> #updates,
percpu LRU list allows bpf prog to absorb unusual #updates
under pathological case (e.g. external traffic facing machine which
could be under attack).
Each percpu LRU is isolated from each other. The LRU nodes (including
free nodes) cannot be moved across different LRU Lists.
Here are the update performance comparison between
common LRU list and percpu LRU list (the test code is
at the last patch):
[root@kerneltest003.31.prn1 ~]# for i in 1 4 8; do echo -n "$i cpus: "; \
./map_perf_test 16 $i | awk '{r += $3}END{print r " updates"}'; done
1 cpus:
2934082 updates
4 cpus:
7391434 updates
8 cpus:
6500576 updates
[root@kerneltest003.31.prn1 ~]# for i in 1 4 8; do echo -n "$i cpus: "; \
./map_perf_test 32 $i | awk '{r += $3}END{printr " updates"}'; done
1 cpus:
2896553 updates
4 cpus:
9766395 updates
8 cpus:
17460553 updates
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Fri, 11 Nov 2016 18:55:06 +0000 (10:55 -0800)]
bpf: LRU List
Introduce bpf_lru_list which will provide LRU capability to
the bpf_htab in the later patch.
* General Thoughts:
1. Target use case. Read is more often than update.
(i.e. bpf_lookup_elem() is more often than bpf_update_elem()).
If bpf_prog does a bpf_lookup_elem() first and then an in-place
update, it still counts as a read operation to the LRU list concern.
2. It may be useful to think of it as a LRU cache
3. Optimize the read case
3.1 No lock in read case
3.2 The LRU maintenance is only done during bpf_update_elem()
4. If there is a percpu LRU list, it will lose the system-wise LRU
property. A completely isolated percpu LRU list has the best
performance but the memory utilization is not ideal considering
the work load may be imbalance.
5. Hence, this patch starts the LRU implementation with a global LRU
list with batched operations before accessing the global LRU list.
As a LRU cache, #read >> #update/#insert operations, it will work well.
6. There is a local list (for each cpu) which is named
'struct bpf_lru_locallist'. This local list is not used to sort
the LRU property. Instead, the local list is to batch enough
operations before acquiring the lock of the global LRU list. More
details on this later.
7. In the later patch, it allows a percpu LRU list by specifying a
map-attribute for scalability reason and for use cases that need to
prepare for the worst (and pathological) case like DoS attack.
The percpu LRU list is completely isolated from each other and the
LRU nodes (including free nodes) cannot be moved across the list. The
following description is for the global LRU list but mostly applicable
to the percpu LRU list also.
* Global LRU List:
1. It has three sub-lists: active-list, inactive-list and free-list.
2. The two list idea, active and inactive, is borrowed from the
page cache.
3. All nodes are pre-allocated and all sit at the free-list (of the
global LRU list) at the beginning. The pre-allocation reasoning
is similar to the existing BPF_MAP_TYPE_HASH. However,
opting-out prealloc (BPF_F_NO_PREALLOC) is not supported in
the LRU map.
* Active/Inactive List (of the global LRU list):
1. The active list, as its name says it, maintains the active set of
the nodes. We can think of it as the working set or more frequently
accessed nodes. The access frequency is approximated by a ref-bit.
The ref-bit is set during the bpf_lookup_elem().
2. The inactive list, as its name also says it, maintains a less
active set of nodes. They are the candidates to be removed
from the bpf_htab when we are running out of free nodes.
3. The ordering of these two lists is acting as a rough clock.
The tail of the inactive list is the older nodes and
should be released first if the bpf_htab needs free element.
* Rotating the Active/Inactive List (of the global LRU list):
1. It is the basic operation to maintain the LRU property of
the global list.
2. The active list is only rotated when the inactive list is running
low. This idea is similar to the current page cache.
Inactive running low is currently defined as
"# of inactive < # of active".
3. The active list rotation always starts from the tail. It moves
node without ref-bit set to the head of the inactive list.
It moves node with ref-bit set back to the head of the active
list and then clears its ref-bit.
4. The inactive rotation is pretty simply.
It walks the inactive list and moves the nodes back to the head of
active list if its ref-bit is set. The ref-bit is cleared after moving
to the active list.
If the node does not have ref-bit set, it just leave it as it is
because it is already in the inactive list.
* Shrinking the Inactive List (of the global LRU list):
1. Shrinking is the operation to get free nodes when the bpf_htab is
full.
2. It usually only shrinks the inactive list to get free nodes.
3. During shrinking, it will walk the inactive list from the tail,
delete the nodes without ref-bit set from bpf_htab.
4. If no free node found after step (3), it will forcefully get
one node from the tail of inactive or active list. Forcefully is
in the sense that it ignores the ref-bit.
* Local List:
1. Each CPU has a 'struct bpf_lru_locallist'. The purpose is to
batch enough operations before acquiring the lock of the
global LRU.
2. A local list has two sub-lists, free-list and pending-list.
3. During bpf_update_elem(), it will try to get from the free-list
of (the current CPU local list).
4. If the local free-list is empty, it will acquire from the
global LRU list. The global LRU list can either satisfy it
by its global free-list or by shrinking the global inactive
list. Since we have acquired the global LRU list lock,
it will try to get at most LOCAL_FREE_TARGET elements
to the local free list.
5. When a new element is added to the bpf_htab, it will
first sit at the pending-list (of the local list) first.
The pending-list will be flushed to the global LRU list
when it needs to acquire free nodes from the global list
next time.
* Lock Consideration:
The LRU list has a lock (lru_lock). Each bucket of htab has a
lock (buck_lock). If both locks need to be acquired together,
the lock order is always lru_lock -> buck_lock and this only
happens in the bpf_lru_list.c logic.
In hashtab.c, both locks are not acquired together (i.e. one
lock is always released first before acquiring another lock).
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 15 Nov 2016 15:54:36 +0000 (10:54 -0500)]
Merge git://git./linux/kernel/git/davem/net
Several cases of bug fixes in 'net' overlapping other changes in
'net-next-.
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 14 Nov 2016 22:15:53 +0000 (14:15 -0800)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Fix off by one wrt. indexing when dumping /proc/net/route entries,
from Alexander Duyck.
2) Fix lockdep splats in iwlwifi, from Johannes Berg.
3) Cure panic when inserting certain netfilter rules when NFT_SET_HASH
is disabled, from Liping Zhang.
4) Memory leak when nft_expr_clone() fails, also from Liping Zhang.
5) Disable UFO when path will apply IPSEC tranformations, from Jakub
Sitnicki.
6) Don't bogusly double cwnd in dctcp module, from Florian Westphal.
7) skb_checksum_help() should never actually use the value "0" for the
resulting checksum, that has a special meaning, use CSUM_MANGLED_0
instead. From Eric Dumazet.
8) Per-tx/rx queue statistic strings are wrong in qed driver, fix from
Yuval MIntz.
9) Fix SCTP reference counting of associations and transports in
sctp_diag. From Xin Long.
10) When we hit ip6tunnel_xmit() we could have come from an ipv4 path in
a previous layer or similar, so explicitly clear the ipv6 control
block in the skb. From Eli Cooper.
11) Fix bogus sleeping inside of inet_wait_for_connect(), from WANG
Cong.
12) Correct deivce ID of T6 adapter in cxgb4 driver, from Hariprasad
Shenai.
13) Fix potential access past the end of the skb page frag array in
tcp_sendmsg(). From Eric Dumazet.
14) 'skb' can legitimately be NULL in inet{,6}_exact_dif_match(). Fix
from David Ahern.
15) Don't return an error in tcp_sendmsg() if we wronte any bytes
successfully, from Eric Dumazet.
16) Extraneous unlocks in netlink_diag_dump(), we removed the locking
but forgot to purge these unlock calls. From Eric Dumazet.
17) Fix memory leak in error path of __genl_register_family(). We leak
the attrbuf, from WANG Cong.
18) cgroupstats netlink policy table is mis-sized, from WANG Cong.
19) Several XDP bug fixes in mlx5, from Saeed Mahameed.
20) Fix several device refcount leaks in network drivers, from Johan
Hovold.
21) icmp6_send() should use skb dst device not skb->dev to determine L3
routing domain. From David Ahern.
22) ip_vs_genl_family sets maxattr incorrectly, from WANG Cong.
23) We leak new macvlan port in some cases of maclan_common_netlink()
errors. Fix from Gao Feng.
24) Similar to the icmp6_send() fix, icmp_route_lookup() should
determine L3 routing domain using skb_dst(skb)->dev not skb->dev.
Also from David Ahern.
25) Several fixes for route offloading and FIB notification handling in
mlxsw driver, from Jiri Pirko.
26) Properly cap __skb_flow_dissect()'s return value, from Eric Dumazet.
27) Fix long standing regression in ipv4 redirect handling, wrt.
validating the new neighbour's reachability. From Stephen Suryaputra
Lin.
28) If sk_filter() trims the packet excessively, handle it reasonably in
tcp input instead of exploding. From Eric Dumazet.
29) Fix handling of napi hash state when copying channels in sfc driver,
from Bert Kenward.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (121 commits)
mlxsw: spectrum_router: Flush FIB tables during fini
net: stmmac: Fix lack of link transition for fixed PHYs
sctp: change sk state only when it has assocs in sctp_shutdown
bnx2: Wait for in-flight DMA to complete at probe stage
Revert "bnx2: Reset device during driver initialization"
ps3_gelic: fix spelling mistake in debug message
net: ethernet: ixp4xx_eth: fix spelling mistake in debug message
ibmvnic: Fix size of debugfs name buffer
ibmvnic: Unmap ibmvnic_statistics structure
sfc: clear napi_hash state when copying channels
mlxsw: spectrum_router: Correctly dump neighbour activity
mlxsw: spectrum: Fix refcount bug on span entries
bnxt_en: Fix VF virtual link state.
bnxt_en: Fix ring arithmetic in bnxt_setup_tc().
Revert "include/uapi/linux/atm_zatm.h: include linux/time.h"
tcp: take care of truncations done by sk_filter()
ipv4: use new_gw for redirect neigh lookup
r8152: Fix error path in open function
net: bpqether.h: remove if_ether.h guard
net: __skb_flow_dissect() must cap its return value
...
Linus Torvalds [Mon, 14 Nov 2016 22:07:13 +0000 (14:07 -0800)]
Merge branch 'stable' of git://git./linux/kernel/git/cmetcalf/linux-tile
Pull arch/tile bugfix from Chris Metcalf:
"This just fixes an incompatibility with tile __ro_after_init"
* 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
tile: handle __ro_after_init like parisc does
Linus Torvalds [Mon, 14 Nov 2016 22:00:29 +0000 (14:00 -0800)]
Merge tag 'rtc-4.9-2' of git://git./linux/kernel/git/abelloni/linux
Pull RTC fixes from Alexandre Belloni:
"Here are a few driver fixes for 4.9. It has been calm for a while so I
don't expect more for this cycle.
Drivers:
- asm9260: fix module autoload
- cmos: fix crashes
- omap: fix clock handling"
* tag 'rtc-4.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
rtc: omap: prevent disabling of clock/module during suspend
rtc: omap: Fix selecting external osc
rtc: cmos: Don't enable interrupts in the middle of the interrupt handler
rtc: cmos: remove all __exit_p annotations
rtc: asm9260: fix module autoload