Ido Schimmel [Wed, 8 Feb 2017 10:16:38 +0000 (11:16 +0100)]
mlxsw: spectrum_router: Use trap action only for some route types
The device can have one of three actions associated with a route:
1) Remote - packets continue to the adjacency table
2) Local - packets continue to the neighbour table
3) Trap - packets continue to the CPU
The first two actions can also trap packets to the CPU, but they do so
using a different trap ID, which has a lower traffic class and less
allotted bandwidth.
We currently use the third action for both RTN_{LOCAL,BROADCAST} routes
and RTN_UNICAST routes not pointing to the switch ports.
However, packets that merely need to be forwarded by the switch are
likely not control packets and can be therefore scheduled towards the
CPU using a lower traffic class.
Achieve the above by assigning the third action only to local and
broadcast routes and have any other route use either of the first two
actions, based on whether the route is gatewayed or not.
This will also allow us to refresh routes using the local action and
have them trap packets when their RIF is no longer valid following a
NH_DEL event.
One side effect of this patch is that we no longer give special
treatment to multipath routes using both switch and non-switch ports
towards their nexthops. If at least one of the nexthops can be resolved,
then the device will forward the packets instead of trapping them.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 8 Feb 2017 10:16:37 +0000 (11:16 +0100)]
mlxsw: spectrum_router: Determine offload status using generic function
The previous patch introduced a generic function to determine whether a
route should be offloaded or not. Make use of it here.
In the future we're going to add more conditions to this test (e.g.,
whether TOS is non-zero), so it makes sense to centralize it instead of
open coding it in a few places.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 8 Feb 2017 10:16:36 +0000 (11:16 +0100)]
mlxsw: spectrum_router: More accurately set offload flag
We currently set the RTNH_F_OFFLOAD flag for all routes using remote
action, but this isn't always correct. If none of the nexthops
associated with a gatewayed route can be offloaded into the device, then
any packet hitting it would be trapped to the CPU and forwarded by the
kernel.
Solve this by pushing the setting of the offload flag to after the route
was programmed into the device, thereby allowing us to take all the
parameters into account.
This change will also help us further in the patchset, when we refresh
routes following the reception of NH_{ADD,DEL} events.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 8 Feb 2017 10:16:35 +0000 (11:16 +0100)]
mlxsw: spectrum_router: Refactor nexthop init routine
The nexthop init and de-init functions both have symmetric parts
concerned with the reflection of the neighbour entry into the device's
adjacency table, in case it's used by a gatewayed route.
These sections of code also need to be called when a nexthop is marked
as valid / invalid following NH_{ADD,DEL} events. Break these out into
appropriate functions, so that they could be invoked following the
reception of above events.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 8 Feb 2017 10:16:34 +0000 (11:16 +0100)]
mlxsw: spectrum_router: Remove FIB info from FIB entry struct
After the previous changes, the FIB info is embedded in every nexthop
group struct, which in turn is embedded in every FIB entry struct.
We can therefore safely remove the FIB info from the entry struct. This
has the added advantage of making the router-related structs more
generic and suitable for use with IPv6 offloads.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 8 Feb 2017 10:16:33 +0000 (11:16 +0100)]
mlxsw: spectrum_router: Store routes in a more generic way
Up until now, the only FIB entries that were associated with a nexthop
group were routes to remote networks where all the nexthop devices had a
valid router interface (RIF). This is in contrast to the FIB code,
where all the routes are associated with a FIB info. The same design
choice needs to be applied to the driver's cache.
Based on the NH_{ADD,DEL} events which will be added later in the
patchset, we need to be able to change the action (forward / trap)
associated with all the routes using the nexthop group. However, if we
can't link between the nexthop and the routes using it, then the above
is impossible.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 8 Feb 2017 10:16:32 +0000 (11:16 +0100)]
mlxsw: spectrum_router: Add gateway indication to nexthop group
The next patch is going to generalize the way in which we store routes.
Instead of attaching a nexthop group only to gatewayed routes, one will
be attached to each route, in a similar way to the way the FIB code
stores its routes.
The above means that any function operating on a nexthop group cannot
assume the group represents only gatewayed nexthops. One such function
is the one that refreshes a nexthop group and updates the adjacency
table following nexthop changes.
For a nexthop group that doesn't represent any gateways this function
would essentially be a NOP, but it would be useful if it did update the
action associated with any route using it. This will allow us to later
consolidate code paths when a nexthop changes following NH_{ADD,DEL}
events.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 8 Feb 2017 10:16:31 +0000 (11:16 +0100)]
mlxsw: spectrum_router: Use nexthop's scope to set action type
We currently use the scope of the FIB info to distinguish between a
direct unicast route and a gatewayed one. However, the kernel is
perfectly happy to configure a route with scope UNIVERSE to a directly
connected network.
Instead, we can rely on the first nexthop's scope to check if the route
is gatewayed or not.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 8 Feb 2017 10:16:30 +0000 (11:16 +0100)]
mlxsw: spectrum_router: Store nexthops in a hash table
Later in the patchset we'll add the NH_{ADD,DEL} events which will let
us know when a nexthop is considered to be dead. Based on these events
we need to be able to add or remove the nexthop from the device's
tables.
Therefore, store the private nexthop structs in a hash table and use the
kernel's fib_nh struct as the key, so that we'll be able to easily find
them when the events are received.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 8 Feb 2017 10:16:29 +0000 (11:16 +0100)]
mlxsw: spectrum_router: Store nexthop groups in a hash table
Currently, when we're notified about a new RTN_UNICAST route we perform
a lookup on the nexthop group list looking for a group with a matching
configuration to that found in the FIB info. This is quite inefficient.
Instead, we can simply rely on the kernel to consolidate several FIB
configurations into the same FIB info and use the FIB info as the key
for our private nexthop group struct.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 8 Feb 2017 10:16:28 +0000 (11:16 +0100)]
mlxsw: spectrum_router: Nullify nexthop's neigh pointer
When we invalidate a nexthop we should also invalidate its neighbour
entry pointer as it might be destroyed later on. This makes the nexthop
de-init function symmetric with its init and also ensures nobody will
try to access the neighbour entry.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 8 Feb 2017 09:39:16 +0000 (10:39 +0100)]
mlxsw: acl: Fix mlxsw_afa_block_commit error path
No rollback is needed since the chain is in consistent state and
mlxsw_afa_block_destroy() will take care of putting it away. So remove
the one we have now which is wrong. Also move the set of 'finished' flag
to the beginning of the function, because the block is certainly unusable
for future action addition no matter if the function succeeds or not.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes:
4cda7d8d7098 ("mlxsw: core: Introduce flexible actions support")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 8 Feb 2017 20:11:28 +0000 (15:11 -0500)]
Merge branch 'stmmac-cleanups'
Corentin Labbe says:
====================
net: stmmac: misc fix
I am currently working on dwmac-sun8i glue driver for Allwinner H3/A83T/A64.
This series is the result of all minor problem found in the stmmac driver.
All patch are tested on cubieboard2 via dwmac-sunxi and on pine64/orangepis via dwmac-sun8i.
Changes since v1:
- Removed netdev_dbg() in "net: stmmac: print phy information"
- Removed patch "net: stmmac: Implement NAPI for TX", it will be reworked
- Changed error message in "Correct the error message about invalid speed"
- Added some acked-by
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
LABBE Corentin [Wed, 8 Feb 2017 08:31:21 +0000 (09:31 +0100)]
net: stmmac: replace unsigned by u32
checkpatch complains about two unsigned without type after.
Since the value return is u32, it is simpler to replace it by u32 instead
of "unsigned int"
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
LABBE Corentin [Wed, 8 Feb 2017 08:31:20 +0000 (09:31 +0100)]
net: stmmac: remove unused variable in sysfs_display_ring
The u64 x variable in sysfs_display_ring is unused.
This patch remove it.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
LABBE Corentin [Wed, 8 Feb 2017 08:31:19 +0000 (09:31 +0100)]
net: stmmac: remove dead code in stmmac_tx_clean
Since commit
cf32deec16e4 ("stmmac: add tx_skbuff_dma to save descriptors used by PTP"),
the struct dma_desc *p in stmmac_tx_clean was not used at all.
This patch remove this dead code.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
LABBE Corentin [Wed, 8 Feb 2017 08:31:18 +0000 (09:31 +0100)]
net: stmmac: print phy information
When a PHY is found, printing which one was found (and which type/model) is
a good information to know.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
LABBE Corentin [Wed, 8 Feb 2017 08:31:17 +0000 (09:31 +0100)]
net: stmmac: rename rx_crc to rx_crc_errors
The ethtool stat counter rx_crc from stmmac is mis-named, the name
seems to speak about the number of RX CRC done, but in fact it is about
errors.
This patch rename it to rx_crc_errors, just like the same ifconfig
counter.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
LABBE Corentin [Wed, 8 Feb 2017 08:31:16 +0000 (09:31 +0100)]
net: stmmac: Rewrite two test against NULL value
This patch rewrite two test against NULL value with correct style.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
LABBE Corentin [Wed, 8 Feb 2017 08:31:15 +0000 (09:31 +0100)]
net: stmmac: Correct the error message about invalid speed
The message about invalid speed does not state 1000 as a valid speed.
It is much simpler to said that the speed is invalid.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
LABBE Corentin [Wed, 8 Feb 2017 08:31:14 +0000 (09:31 +0100)]
net: stmmac: replace ENOSYS by EINVAL
As said by checkpatch ENOSYS means 'invalid syscall nr' and nothing
else.
This patch replace ENOSYS by the more appropriate value EINVAL.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
LABBE Corentin [Wed, 8 Feb 2017 08:31:13 +0000 (09:31 +0100)]
net: stmmac: Use readl_poll_timeout
The dwmac_dma_reset function use an open coded of readl_poll_timeout().
Replace the open coded handling with the proper function.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
LABBE Corentin [Wed, 8 Feb 2017 08:31:12 +0000 (09:31 +0100)]
net: stmmac: replace stmmac_mdio_busy_wait by readl_poll_timeout
The stmmac_mdio_busy_wait() function do the same job than
readl_poll_timeout().
So is is better to replace it.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
LABBE Corentin [Wed, 8 Feb 2017 08:31:11 +0000 (09:31 +0100)]
net: stmmac: fix some code style problem
Checkpatch complains about some code style problem on stmmac_mdio.c.
This patch fix them.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
LABBE Corentin [Wed, 8 Feb 2017 08:31:10 +0000 (09:31 +0100)]
net: stmmac: remplace asm/io.h by linux/io.h
This patch fix the checkpatch warning about asm/io.h.
Sorting all includes in the process.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
LABBE Corentin [Wed, 8 Feb 2017 08:31:09 +0000 (09:31 +0100)]
net: stmmac: remove freesoftware address
This patch fix the checkpatch warning about free software address.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
LABBE Corentin [Wed, 8 Feb 2017 08:31:08 +0000 (09:31 +0100)]
net: stmmac: fix some typos in comments
This patch fix some typos in comments.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
LABBE Corentin [Wed, 8 Feb 2017 08:31:07 +0000 (09:31 +0100)]
net: stmmac: Remove the bus_setup function pointer
The bus_setup function pointer is not used at all, this patch remove it.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
LABBE Corentin [Wed, 8 Feb 2017 08:31:06 +0000 (09:31 +0100)]
net: stmmac: fix the typo on MAC_RNABLE_RX
the define MAC_RNABLE_RX have a typo, rename it to MAC_ENABLE_RX
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Wed, 8 Feb 2017 00:19:43 +0000 (01:19 +0100)]
bpf, lpm: fix overflows in trie_alloc checks
Cap the maximum (total) value size and bail out if larger than KMALLOC_MAX_SIZE
as otherwise it doesn't make any sense to proceed further, since we're
guaranteed to fail to allocate elements anyway in lpm_trie_node_alloc();
likleyhood of failure is still high for large values, though, similarly
as with htab case in non-prealloc.
Next, make sure that cost vars are really u64 instead of size_t, so that we
don't overflow on 32 bit and charge only tiny map.pages against memlock while
allowing huge max_entries; cap also the max cost like we do with other map
types.
Fixes:
b95a5c4db09b ("bpf: add a longest prefix match trie map implementation")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roopa Prabhu [Wed, 8 Feb 2017 00:12:00 +0000 (16:12 -0800)]
bridge: vlan tunnel id info range fill size calc cleanups
This fixes a bug and cleans up tunnelid range size
calculation code by using consistent variable names
and checks in size calculation and fill functions.
tested for a few cases of vlan-vni range mappings:
(output from patched iproute2):
$bridge vlan showtunnel
port vid tunid
vxlan0 100-105 1000-1005
200 2000
210 2100
211-213 2100-2102
214 2104
216-217 2108-2109
219 2119
Fixes:
efa5356b0d97 ("bridge: per vlan dst_metadata netlink support")
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 7 Feb 2017 23:37:15 +0000 (15:37 -0800)]
gro_cells: move to net/core/gro_cells.c
We have many gro cells users, so lets move the code to avoid
duplication.
This creates a CONFIG_GRO_CELLS option.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Philippe Reynes [Tue, 7 Feb 2017 23:07:33 +0000 (00:07 +0100)]
net: mellanox: switchx2: use new api ethtool_{get|set}_link_ksettings
The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.
As I don't have the hardware, I'd be very pleased if
someone may test this patch.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wu Fengguang [Tue, 7 Feb 2017 19:42:05 +0000 (03:42 +0800)]
net: qcom/emac: fix semicolon.cocci warnings
drivers/net/ethernet/qualcomm/emac/emac-ethtool.c:155:49-50: Unneeded semicolon
Remove unneeded semicolon.
Generated by: scripts/coccinelle/misc/semicolon.cocci
CC: Timur Tabi <timur@codeaurora.org>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Raju Lakkaraju [Tue, 7 Feb 2017 13:40:26 +0000 (19:10 +0530)]
net: phy: Add LED mode driver for Microsemi PHYs.
LED Mode:
Microsemi PHY support 2 LEDs (LED[0] and LED[1]) to display different
status information that can be selected by setting LED mode.
LED Mode parameter (vsc8531, led-0-mode) and (vsc8531, led-1-mode) get
from Device Tree.
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Tue, 7 Feb 2017 13:15:27 +0000 (16:15 +0300)]
net: dsa: bcm_sf2: cleanup bcm_sf2_cfp_rule_get() a little
This patch doesn't affect how the code works.
My static checker complains that the mask and shift doesn't make sense
because 0xffffff << 16 goes beyond the end of 32 bits. It should be
0xffff instead but the existing code won't cause runtime bugs.
Also the casting here is not needed and not consistent with the rest of
the code.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 7 Feb 2017 21:29:30 +0000 (16:29 -0500)]
Merge git://git./linux/kernel/git/davem/net
The conflict was an interaction between a bug fix in the
netvsc driver in 'net' and an optimization of the RX path
in 'net-next'.
Signed-off-by: David S. Miller <davem@davemloft.net>
Felix Manlunas [Tue, 7 Feb 2017 20:10:58 +0000 (12:10 -0800)]
liquidio: do not dereference pointer if it's NULL
Fix smatch errors by not dereferencing iq pointer if it's NULL.
See http://marc.info/?l=kernel-janitors&m=
148637299004834&w=2
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Tue, 7 Feb 2017 20:10:57 +0000 (12:10 -0800)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Load correct firmware in rtl8192ce wireless driver, from Jurij
Smakov.
2) Fix leak of tx_ring and tx_cq due to overwriting in mlx4 driver,
from Martin KaFai Lau.
3) Need to reference count PHY driver module when it is attached, from
Mao Wenan.
4) Don't do zero length vzalloc() in ethtool register dump, from
Stanislaw Gruszka.
5) Defer net_disable_timestamp() to a workqueue to get out of locking
issues, from Eric Dumazet.
6) We cannot drop the SKB dst when IP options refer to them, fix also
from Eric Dumazet.
7) Incorrect packet header offset calculations in ip6_gre, again from
Eric Dumazet.
8) Missing tcp_v6_restore_cb() causes use-after-free, from Eric too.
9) tcp_splice_read() can get into an infinite loop with URG, and hey
it's from Eric once more.
10) vnet_hdr_sz can change asynchronously, so read it once during
decision making in macvtap and tun, from Willem de Bruijn.
11) Can't use kernel stack for DMA transfers in USB networking drivers,
from Ben Hutchings.
12) Handle csum errors properly in UDP by calling the proper destructor,
from Eric Dumazet.
13) For non-deterministic softirq run when scheduling NAPI from a
workqueue in mlx4, from Benjamin Poirier.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (28 commits)
sctp: check af before verify address in sctp_addr_id2transport
sctp: avoid BUG_ON on sctp_wait_for_sndbuf
mlx4: Invoke softirqs after napi_reschedule
udp: properly cope with csum errors
catc: Use heap buffer for memory size test
catc: Combine failure cleanup code in catc_probe()
rtl8150: Use heap buffers for all register access
pegasus: Use heap buffers for all register access
macvtap: read vnet_hdr_size once
tun: read vnet_hdr_sz once
tcp: avoid infinite loop in tcp_splice_read()
hns: avoid stack overflow with CONFIG_KASAN
ipv6: Fix IPv6 packet loss in scenarios involving roaming + snooping switches
ipv6: tcp: add a missing tcp_v6_restore_cb()
nl80211: Fix mesh HT operation check
mac80211: Fix adding of mesh vendor IEs
mac80211: Allocate a sync skcipher explicitly for FILS AEAD
mac80211: Fix FILS AEAD protection in Association Request frame
ip6_gre: fix ip6gre_err() invalid reads
netlabel: out of bound access in cipso_v4_validate()
...
Hugh Dickins [Tue, 7 Feb 2017 19:11:16 +0000 (11:11 -0800)]
mm: fix KPF_SWAPCACHE in /proc/kpageflags
Commit
6326fec1122c ("mm: Use owner_priv bit for PageSwapCache, valid
when PageSwapBacked") aliased PG_swapcache to PG_owner_priv_1 (and
depending on PageSwapBacked being true).
As a result, the KPF_SWAPCACHE bit in '/proc/kpageflags' should now be
synthesized, instead of being shown on unrelated pages which just happen
to have PG_owner_priv_1 set.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
stephen hemminger [Tue, 7 Feb 2017 16:46:46 +0000 (08:46 -0800)]
bridge: avoid unnecessary read of jiffies
Jiffies is volatile so read it once.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Tue, 7 Feb 2017 16:27:47 +0000 (17:27 +0100)]
spectrum: acl_tcam: Fix catchall prio value
This fixes an issue reported by smatch:
mlxsw_sp_acl_tcam_chunk_create() warn: impossible condition '(priority == (-1)) => (0-u32max == u64max)'
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes:
22a677661f56 ("mlxsw: spectrum: Introduce ACL core with simple TCAM implementation")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roopa Prabhu [Tue, 7 Feb 2017 14:43:23 +0000 (06:43 -0800)]
bridge: remove unnecessary check for vtbegin in br_fill_vlan_tinfo_range
vtbegin should not be NULL in this function, Its already checked by the
caller.
this should silence the below smatch complaint:
net/bridge/br_netlink_tunnel.c:144 br_fill_vlan_tinfo_range()
error: we previously assumed 'vtbegin' could be null (see line 130)
net/bridge/br_netlink_tunnel.c
129
130 if (vtbegin && vtend && (vtend->vid - vtbegin->vid) > 0) {
^^^^^^^
Check for NULL.
Fixes:
efa5356b0d97 ("bridge: per vlan dst_metadata netlink support")
Reported-By: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcelo Ricardo Leitner [Tue, 7 Feb 2017 13:37:56 +0000 (11:37 -0200)]
sctp: drop __packed from almost all SCTP structures
__packed is considered harmful as it potentially generates code that
doesn't perform well and its usage should be avoided as much as
possible.
This patch drops __packed from all SCTP structures except one, which is
sctp_signed_cookie. In there it's required, as per changelog on
commit
9834a2bb4970 ("[SCTP]: Fix sctp_cookie alignment in the packet.").
After this patch, no alignment changes neither in x86 or x86_64 and
no exceptions were noticed during testing on both archs.
Code size for SCTP module also didn't change with this patch.
Cc: David Miller <davem@davemloft.net>
Cc: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Tue, 7 Feb 2017 12:56:08 +0000 (20:56 +0800)]
sctp: check af before verify address in sctp_addr_id2transport
Commit
6f29a1306131 ("sctp: sctp_addr_id2transport should verify the
addr before looking up assoc") invoked sctp_verify_addr to verify the
addr.
But it didn't check af variable beforehand, once users pass an address
with family = 0 through sockopt, sctp_get_af_specific will return NULL
and NULL pointer dereference will be caused by af->sockaddr_len.
This patch is to fix it by returning NULL if af variable is NULL.
Fixes:
6f29a1306131 ("sctp: sctp_addr_id2transport should verify the addr before looking up assoc")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov [Tue, 7 Feb 2017 11:46:46 +0000 (12:46 +0100)]
bridge: tunnel: fix attribute checks in br_parse_vlan_tunnel_info
These checks should go after the attributes have been parsed otherwise
we're using tb uninitialized.
Fixes:
efa5356b0d97 ("bridge: per vlan dst_metadata netlink support")
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Tue, 7 Feb 2017 10:56:38 +0000 (10:56 +0000)]
net: bridge: remove redundant check to see if err is set
The error check on err is redundant as it is being checked
previously each time it has been updated. Remove this redundant
check.
Detected with CoverityScan, CID#140030("Logically dead code")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Tue, 7 Feb 2017 07:44:31 +0000 (10:44 +0300)]
sfc: fix an off by one bug
This bug is harmless because it's just a sanity check and we always
pass valid values for "encap_type" but the test is off by one.
Fixes:
9b4108012517 ("sfc: insert catch-all filters for encapsulated traffic")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lukasz Majewski [Tue, 7 Feb 2017 05:20:24 +0000 (06:20 +0100)]
net: phy: dp83867: Recover from "port mirroring" N/A MODE4
The DP83867 when not properly bootstrapped - especially with LED_0 pin -
can enter N/A MODE4 for "port mirroring" feature.
To provide normal operation of the PHY, one needs not only to explicitly
disable the port mirroring feature, but as well stop some IC internal
testing (which disables RGMII communication).
To do that the STRAP_STS1 (0x006E) register must be read and RESERVED bit
11 examined. When it is set, the another RESERVED bit (11) at PHYCR
(0x0010) register must be clear to disable testing mode and enable RGMII
communication.
Thorough explanation of the problem can be found at following e2e thread:
"DP83867IR: Problem with RESERVED bits in PHY Control Register (PHYCR) -
Linux driver"
https://e2e.ti.com/support/interface/ethernet/f/903/p/571313/
2096954#
2096954
Signed-off-by: Lukasz Majewski <lukma@denx.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lukasz Majewski [Tue, 7 Feb 2017 05:20:23 +0000 (06:20 +0100)]
net: phy: dp83867: Add lane swapping support in the DP83867 TI's PHY driver
This patch adds support for enabling or disabling the lane swapping (called
"port mirroring" in PHY's CFG4 register) feature of the DP83867 TI's PHY
device.
One use case is when bootstrap configuration enables this feature (because
of e.g. LED_0 wrong wiring) so then one needs to disable it in software
(at u-boot/Linux).
Signed-off-by: Lukasz Majewski <lukma@denx.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lukasz Majewski [Tue, 7 Feb 2017 05:21:34 +0000 (06:21 +0100)]
Documentation: devicetree: Add PHY no lane swap binding
Add the documentation to avoid PHY lane swapping. This is a boolean
entry to notify the phy device drivers that the TX/RX lanes NO need
to be swapped.
The use case for this binding mostly happens after wrong HW
configuration of PHY IC during bootstrap.
Signed-off-by: Lukasz Majewski <lukma@denx.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 7 Feb 2017 18:48:23 +0000 (13:48 -0500)]
Merge branch 'Incorrect-use-of-phy_read_status'
Florian Fainelli says:
====================
net: Incorrect use of phy_read_status()
This patch series removes incorrect uses of phy_read_status() which can clobber
the PHY device link while we are executing with the state machine running.
greth was potentially another candidate, but it does funky stuff with
auto-negotation that I am still trying to understand.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Mon, 6 Feb 2017 23:55:23 +0000 (15:55 -0800)]
net: dsa: Do not clobber PHY link outside of state machine
Calling phy_read_status() means that we may call into
genphy_read_status() which in turn will use genphy_update_link() which
can make changes to phydev->link outside of the state machine's state
transitions. This is an invalid behavior that is now caught as of
811a919135b9 ("phy state machine: failsafe leave invalid RUNNING state")
Reported-by: Zefir Kurtisi <zefir.kurtisi@neratec.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Mon, 6 Feb 2017 23:55:22 +0000 (15:55 -0800)]
net: netcp: Do not clobber PHY link outside of state machine
Calling phy_read_status() means that we may call into
genphy_read_status() which in turn will use genphy_update_link() which
can make changes to phydev->link outside of the state machine's state
transitions. This is an invalid behavior that is now caught as off
811a919135b9 ("phy state machine: failsafe leave invalid RUNNING state")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Mon, 6 Feb 2017 23:55:21 +0000 (15:55 -0800)]
net: pxa168_eth: Do not clobber PHY link outside of state machine
Calling phy_read_status() means that we may call into
genphy_read_status() which in turn will use genphy_update_link() which
can make changes to phydev->link outside of the state machine's state
transitions. This is an invalid behavior that is now caught as of
811a919135b9 ("phy state machine: failsafe leave invalid RUNNING state")
Since we don't have anything special, switch to the generic
phy_ethtool_get_link_ksettings() function now.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Mon, 6 Feb 2017 23:55:20 +0000 (15:55 -0800)]
net: mv643xx_eth: Do not clobber PHY link outside of state machine
Calling phy_read_status() means that we may call into
genphy_read_status() which in turn will use genphy_update_link() which
can make changes to phydev->link outside of the state machine's state
transitions. This is an invalid behavior that is now caught as of
811a919135b9 ("phy state machine: failsafe leave invalid RUNNING state")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 7 Feb 2017 18:44:08 +0000 (13:44 -0500)]
Merge tag 'mlx5-updates-2017-01-31' of git://git./linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2017-01-31
This series includes some updates to mlx5 core and ethernet driver.
We got one patch from Or to fix some static checker warnings.
2nd patche from Dan came to add the support for 128B cache line
in the HCA, which will configures the hardware to use 128B alignment only
on systems with 128B cache lines, otherwise it will be kept as the current
default of 64B.
From me three patches to support no inline copy on TX on ConnectX-5 and
later HCAs. Starting with two small infrastructure changes and
refactoring patches followed by two patches to add the actual support for
both xmit ndo and XDP xmit routines.
Last patch is a simple fix to return a mistakenly removed pointer from the
SQ structure, which was remove in previous submission of mlx5 4K UAR.
Saeed.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Khoronzhuk [Mon, 6 Feb 2017 22:53:45 +0000 (00:53 +0200)]
net: ethernet: ti: cpsw: remove netif_trans_update
No need to update jiffies in txq->trans_start twice, it's supposed to be
done in netdev_start_xmit() and anyway is re-written. Also, no reason to
update trans time in case of an error.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 7 Feb 2017 18:31:01 +0000 (13:31 -0500)]
Merge branch 'bnxt_en-Add-XDP-support'
Michael Chan says:
====================
bnxt_en: Add XDP support.
The first 10 patches refactor the code (rx/tx code paths and ring logic)
and add the basic infrastructure to support XDP. The 11th patch adds
basic ndo_xdp to support XDP_DROP and XDP_PASS only. The 12th patch
completes the series with XDP_TX.
Thanks to Andy Gospodarek for testing and uncovering some bugs.
v3: Removed Kconfig option.
Pass modified offset and length to stack for XDP_PASS.
Improved buffer recycling scheme for XDP_TX.
Other minor fixes.
v2: Addressed review comments from Alexei Starovoitov, Jakub Kicinski,
and David Miller:
- Added missing dma syncs.
- Added XDP headroom support.
- Added tracing in exception path.
- Clarified a parameter change.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Mon, 6 Feb 2017 21:55:43 +0000 (16:55 -0500)]
bnxt_en: Add support for XDP_TX action.
Add dedicated transmit function and transmit completion handler for
XDP. The XDP transmit logic and completion logic are different than
regular TX ring. The TX buffer is recycled back to the RX ring when
it completes.
v3: Improved the buffer recyling scheme for XDP_TX.
v2: Add trace_xdp_exception().
Add dma_sync.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Tested-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Mon, 6 Feb 2017 21:55:42 +0000 (16:55 -0500)]
bnxt_en: Add basic XDP support.
Add basic ndo_xdp support to setup and query program, configure the NIC
to run in rx page mode, and support XDP_PASS, XDP_DROP, XDP_ABORTED
actions only.
v3: Pass modified offset and length to stack for XDP_PASS.
Remove Kconfig option.
v2: Added trace_xdp_exception()
Added dma_syncs.
Added XDP headroom support.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Tested-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Mon, 6 Feb 2017 21:55:41 +0000 (16:55 -0500)]
bnxt_en: Refactor tx completion path.
XDP_TX requires a different function to handle completion. Add a
function pointer to handle tx completion logic. Regular TX rings
will be assigned the current bnxt_tx_int() for the ->tx_int()
function pointer.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Mon, 6 Feb 2017 21:55:40 +0000 (16:55 -0500)]
bnxt_en: Add a set of TX rings to support XDP.
Add logic for an extra set of TX rings for XDP. If enabled, this
set of TX rings equals the number of RX rings and shares the same
IRQ as the RX ring set. A new field bp->tx_nr_rings_xdp is added
to keep track of these TX XDP rings. Adjust all other relevant functions
to handle bp->tx_nr_rings_xdp.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Mon, 6 Feb 2017 21:55:39 +0000 (16:55 -0500)]
bnxt_en: Add tx ring mapping logic.
To support XDP_TX, we need to add a set of dedicated TX rings, each
associated with the NAPI of an RX ring. To assign XDP rings and regular
rings in a flexible way, we add a bp->tx_ring_map[] array to do the
remapping. The netdev txq index is stored in the new field txq_index
so that we can retrieve the netdev txq when handling TX completions.
In this patch, before we introduce XDP_TX, the mapping is 1:1.
v2: Fixed a bug in bnxt_tx_int().
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Mon, 6 Feb 2017 21:55:38 +0000 (16:55 -0500)]
bnxt_en: Centralize logic to reserve rings.
Currently, bnxt_setup_tc() and bnxt_set_channels() have similar and
duplicated code to check and reserve rx and tx rings. Add a new
function bnxt_reserve_rings() to centralize the logic. This will
make it easier to add XDP_TX support which requires allocating a
new set of TX rings.
Also, the tx ring checking logic in bnxt_setup_msix() can be removed.
The rings have been reserved before hand.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Mon, 6 Feb 2017 21:55:37 +0000 (16:55 -0500)]
bnxt_en: Use event bit map in RX path.
In the current code, we have separate rx_event and agg_event parameters
to keep track of rx and aggregation events. Combine these events into
an u8 event mask with different bits defined for different events. This
way, it is easier to expand the logic to include XDP tx events.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Mon, 6 Feb 2017 21:55:36 +0000 (16:55 -0500)]
bnxt_en: Add RX page mode support.
This mode is to support XDP. In this mode, each rx ring is configured
with page sized buffers for linear placement of each packet. MTU will be
restricted to what the page sized buffers can support.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Mon, 6 Feb 2017 21:55:35 +0000 (16:55 -0500)]
bnxt_en: Parameterize RX buffer offsets.
Convert the global constants BNXT_RX_OFFSET and BNXT_RX_DMA_OFFSET to
device parameters. This will make it easier to support XDP with
headroom support which requires different RX buffer offsets.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Mon, 6 Feb 2017 21:55:34 +0000 (16:55 -0500)]
bnxt_en: Add bp->rx_dir field for rx buffer DMA direction.
When driver is running in XDP mode, rx buffers are DMA mapped as
DMA_BIDIRECTIONAL. Add a field so the code will map/unmap rx buffers
according to this field.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Mon, 6 Feb 2017 21:55:33 +0000 (16:55 -0500)]
bnxt_en: Don't use DEFINE_DMA_UNMAP_ADDR to store DMA address in RX path.
To support XDP_TX, we need the RX buffer's DMA address to transmit the
packet. Convert the DMA address field to a permanent field in
bnxt_sw_rx_bd.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Mon, 6 Feb 2017 21:55:32 +0000 (16:55 -0500)]
bnxt_en: Refactor rx SKB function.
Minor refactoring of bnxt_rx_skb() so that it can easily be replaced by
a new function that handles packets in a single page. Also, use a
function pointer bp->rx_skb_func() to switch to a new function when
we add the new mode in the next patch.
Add a new field data_ptr that points to the packet data in the
bnxt_sw_rx_bd structure. The original data field is changed to void
pointer so that it can either hold the kmalloc'ed data or a page
pointer.
The last parameter of bnxt_rx_skb() which was the length parameter is
changed to include the payload offset of the packet in the upper 16 bit.
The offset is needed to support the rx page mode and is not used in
this existing function.
v3: Added a new data_ptr parameter to bp->rx_skb_func(). The caller
has the option to modify the starting address of the packet. This
will be needed when XDP with headroom support is added.
v2: Changed the name of the last parameter to offset_and_len to make the
code more clear.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Timur Tabi [Mon, 6 Feb 2017 21:34:52 +0000 (15:34 -0600)]
net: qcom/emac: add ethool support for setting pause parameters
To support setting the pause parameters, the driver can no longer just
mirror the PHY. The set_pauseparam feature allows the driver to
force the setting in the MAC, regardless of how the PHY is configured.
This means that we now need to maintain an internal state for pause
frame support, and so get_pauseparam also needs to be updated.
If the interface is already running when the setting is changed, then
the interface is reset.
Note that if the MAC is configured to enable RX pause frame support
(i.e. it transmits pause frames to throttle the other end), but the
PHY is configured to block those frames, then the feature will not work.
Also some buffer size initialization code into emac_init_adapter(),
so that it lives with similar code, including the initializtion of
pause frame support.
Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 7 Feb 2017 18:07:56 +0000 (13:07 -0500)]
Merge branch 'replace-dst_confirm'
Julian Anastasov says:
====================
net: dst_confirm replacement
This patchset addresses the problem of neighbour
confirmation where received replies from one nexthop
can cause confirmation of different nexthop when using
the same dst. Thanks to YueHaibing <yuehaibing@huawei.com>
for tracking the dst->pending_confirm problem.
Sockets can obtain cached output route. Such
routes can be to known nexthop (rt_gateway=IP) or to be
used simultaneously for different nexthop IPs by different
subnet prefixes (nh->nh_scope = RT_SCOPE_HOST, rt_gateway=0).
At first look, there are more problems:
- dst_confirm() sets flag on dst and not on dst->path,
as result, indication is lost when XFRM is used
- DNAT can change the nexthop, so the really used nexthop is
not confirmed
So, the following solution is to avoid using
dst->pending_confirm.
The current dst_confirm() usage is as follows:
Protocols confirming dst on received packets:
- TCP (1 dst per socket)
- SCTP (1 dst per transport)
- CXGB*
Protocols supporting sendmsg with MSG_CONFIRM [ | MSG_PROBE ] to
confirm neighbour:
- UDP IPv4/IPv6
- ICMPv4 PING
- RAW IPv4/IPv6
- L2TP/IPv6
MSG_CONFIRM for other purposes (fix not needed):
- CAN
Sending without locking the socket:
- UDP (when no cork)
- RAW (when hdrincl=1)
Redirects from old to new GW:
- rt6_do_redirect
The patchset includes the following changes:
1. sock: add sk_dst_pending_confirm flag
- used only by TCP with patch 4 to remember the received
indication in sk->sk_dst_pending_confirm
2. net: add dst_pending_confirm flag to skbuff
- skb->dst_pending_confirm will be used by all protocols
in following patches, via skb_{set,get}_dst_pending_confirm
3. sctp: add dst_pending_confirm flag
- SCTP uses per-transport dsts and can not use
sk->sk_dst_pending_confirm like TCP
4. tcp: replace dst_confirm with sk_dst_confirm
5. net: add confirm_neigh method to dst_ops
- IPv4 and IPv6 provision for slow neigh lookups for MSG_PROBE users.
I decided to use neigh lookup only for this case because on
MSG_PROBE the skb may pass MTU checks but it does not reach
the neigh confirmation code. This patch will be used from patch 6.
- xfrm_confirm_neigh: we use the last tunnel address, if present.
When there are only transports, the original dest address is used.
6. net: use dst_confirm_neigh for UDP, RAW, ICMP, L2TP
- dst_confirm conversion for UDP, RAW, ICMP and L2TP/IPv6
- these protocols use MSG_CONFIRM propagated by ip*_append_data
to skb->dst_pending_confirm. sk->sk_dst_pending_confirm is not
used because some sending paths do not lock the socket. For
MSG_PROBE we use the slow lookup (dst_confirm_neigh).
- there are also 2 cases that need the slow lookup:
__ip6_rt_update_pmtu and rt6_do_redirect. I hope
&ipv6_hdr(skb)->saddr is the correct nexthop address to use here.
7. net: pending_confirm is not used anymore
- I failed to understand the CXGB* code, I see dst_confirm()
calls but I'm not sure dst_neigh_output() was called. For now
I just removed the dst->pending_confirm flag and left all
dst_confirm() calls there. Any better idea?
- Now may be old function neigh_output() should be restored
instead of dst_neigh_output?
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Anastasov [Mon, 6 Feb 2017 21:14:17 +0000 (23:14 +0200)]
net: pending_confirm is not used anymore
When same struct dst_entry can be used for many different
neighbours we can not use it for pending confirmations.
As last step, we can remove the pending_confirm flag.
Reported-by: YueHaibing <yuehaibing@huawei.com>
Fixes:
5110effee8fd ("net: Do delayed neigh confirmation.")
Fixes:
f2bb4bedf35d ("ipv4: Cache output routes in fib_info nexthops.")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Anastasov [Mon, 6 Feb 2017 21:14:16 +0000 (23:14 +0200)]
net: use dst_confirm_neigh for UDP, RAW, ICMP, L2TP
When same struct dst_entry can be used for many different
neighbours we can not use it for pending confirmations.
The datagram protocols can use MSG_CONFIRM to confirm the
neighbour. When used with MSG_PROBE we do not reach the
code where neighbour is confirmed, so we have to do the
same slow lookup by using the dst_confirm_neigh() helper.
When MSG_PROBE is not used, ip_append_data/ip6_append_data
will set the skb flag dst_pending_confirm.
Reported-by: YueHaibing <yuehaibing@huawei.com>
Fixes:
5110effee8fd ("net: Do delayed neigh confirmation.")
Fixes:
f2bb4bedf35d ("ipv4: Cache output routes in fib_info nexthops.")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Anastasov [Mon, 6 Feb 2017 21:14:15 +0000 (23:14 +0200)]
net: add confirm_neigh method to dst_ops
Add confirm_neigh method to dst_ops and use it from IPv4 and IPv6
to lookup and confirm the neighbour. Its usage via the new helper
dst_confirm_neigh() should be restricted to MSG_PROBE users for
performance reasons.
For XFRM prefer the last tunnel address, if present. With help
from Steffen Klassert.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Anastasov [Mon, 6 Feb 2017 21:14:14 +0000 (23:14 +0200)]
tcp: replace dst_confirm with sk_dst_confirm
When same struct dst_entry can be used for many different
neighbours we can not use it for pending confirmations.
Use the new sk_dst_confirm() helper to propagate the
indication from received packets to sock_confirm_neigh().
Reported-by: YueHaibing <yuehaibing@huawei.com>
Fixes:
5110effee8fd ("net: Do delayed neigh confirmation.")
Fixes:
f2bb4bedf35d ("ipv4: Cache output routes in fib_info nexthops.")
Tested-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Anastasov [Mon, 6 Feb 2017 21:14:13 +0000 (23:14 +0200)]
sctp: add dst_pending_confirm flag
Add new transport flag to allow sockets to confirm neighbour.
When same struct dst_entry can be used for many different
neighbours we can not use it for pending confirmations.
The flag is propagated from transport to every packet.
It is reset when cached dst is reset.
Reported-by: YueHaibing <yuehaibing@huawei.com>
Fixes:
5110effee8fd ("net: Do delayed neigh confirmation.")
Fixes:
f2bb4bedf35d ("ipv4: Cache output routes in fib_info nexthops.")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Anastasov [Mon, 6 Feb 2017 21:14:12 +0000 (23:14 +0200)]
net: add dst_pending_confirm flag to skbuff
Add new skbuff flag to allow protocols to confirm neighbour.
When same struct dst_entry can be used for many different
neighbours we can not use it for pending confirmations.
Add sock_confirm_neigh() helper to confirm the neighbour and
use it for IPv4, IPv6 and VRF before dst_neigh_output.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Anastasov [Mon, 6 Feb 2017 21:14:11 +0000 (23:14 +0200)]
sock: add sk_dst_pending_confirm flag
Add new sock flag to allow sockets to confirm neighbour.
When same struct dst_entry can be used for many different
neighbours we can not use it for pending confirmations.
As not all call paths lock the socket use full word for
the flag.
Add sk_dst_confirm as replacement for dst_confirm when
called for received packets.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Mon, 6 Feb 2017 21:01:16 +0000 (13:01 -0800)]
net: phy: bcm7xxx: Add BCM74371 PHY ID
Add the BCM74371 PHY ID to the list of supported chips. This is a 28nm
technology Gigabit PHY SoC.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vineet Gupta [Tue, 7 Feb 2017 17:44:58 +0000 (09:44 -0800)]
ARC: [arcompact] brown paper bag bug in unaligned access delay slot fixup
Reported-by: Jo-Philipp Wich <jo@mein.io>
Fixes:
9aed02feae57bf7 ("ARC: [arcompact] handle unaligned access delay slot")
Cc: linux-kernel@vger.kernel.org
Cc: linux-snps-arc@lists.infradead.org
Cc: stable@vger.kernel.org
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Marcelo Ricardo Leitner [Mon, 6 Feb 2017 20:10:31 +0000 (18:10 -0200)]
sctp: avoid BUG_ON on sctp_wait_for_sndbuf
Alexander Popov reported that an application may trigger a BUG_ON in
sctp_wait_for_sndbuf if the socket tx buffer is full, a thread is
waiting on it to queue more data and meanwhile another thread peels off
the association being used by the first thread.
This patch replaces the BUG_ON call with a proper error handling. It
will return -EPIPE to the original sendmsg call, similarly to what would
have been done if the association wasn't found in the first place.
Acked-by: Alexander Popov <alex.popov@linux.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Benjamin Poirier [Mon, 6 Feb 2017 18:14:31 +0000 (10:14 -0800)]
mlx4: Invoke softirqs after napi_reschedule
mlx4 may schedule napi from a workqueue. Afterwards, softirqs are not run
in a deterministic time frame and the following message may be logged:
NOHZ: local_softirq_pending 08
The problem is the same as what was described in commit
ec13ee80145c
("virtio_net: invoke softirqs after __napi_schedule") and this patch
applies the same fix to mlx4.
Fixes:
07841f9d94c1 ("net/mlx4_en: Schedule napi when RX buffers allocation fails")
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Mon, 6 Feb 2017 16:26:30 +0000 (17:26 +0100)]
mlxsw: add psample dependency for spectrum
When PSAMPLE is a loadable module, spectrum must not be built-in:
drivers/net/built-in.o: In function `mlxsw_sp_rx_listener_sample_func':
spectrum.c:(.text+0xe357e): undefined reference to `psample_sample_packet'
This adds a Kconfig dependency to enforce usable configurations.
Fixes:
98d0f7b9acda ("mlxsw: spectrum: Add packet sample offloading support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Yotam Gigi <yotamg@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Mon, 6 Feb 2017 16:15:05 +0000 (16:15 +0000)]
ipv6: sr: fix non static symbol warnings
Fixes the following sparse warnings:
net/ipv6/seg6_iptunnel.c:58:5: warning:
symbol 'nla_put_srh' was not declared. Should it be static?
net/ipv6/seg6_iptunnel.c:238:5: warning:
symbol 'seg6_input' was not declared. Should it be static?
net/ipv6/seg6_iptunnel.c:254:5: warning:
symbol 'seg6_output' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Mon, 6 Feb 2017 16:07:13 +0000 (16:07 +0000)]
net/sched: act_mirred: remove duplicated include from act_mirred.c
Remove duplicated include.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Mon, 6 Feb 2017 15:26:17 +0000 (15:26 +0000)]
net: wan: slic_ds26522: Remove .owner field for driver
Remove .owner field if calls are used which set it automatically.
Generated by: scripts/coccinelle/api/platform_no_drv_owner.cocci
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Mon, 6 Feb 2017 15:25:52 +0000 (15:25 +0000)]
net: wan: slic_ds26522: Use module_spi_driver to simplify the code
module_spi_driver() makes the code simpler by eliminating
boilerplate code.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sun, 5 Feb 2017 17:25:24 +0000 (09:25 -0800)]
udp: properly cope with csum errors
Dmitry reported that UDP sockets being destroyed would trigger the
WARN_ON(atomic_read(&sk->sk_rmem_alloc)); in inet_sock_destruct()
It turns out we do not properly destroy skb(s) that have wrong UDP
checksum.
Thanks again to syzkaller team.
Fixes :
7c13f97ffde6 ("udp: do fwd memory scheduling on dequeue")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 7 Feb 2017 15:51:48 +0000 (10:51 -0500)]
Merge branch 'dsa2-pdata'
Florian Fainelli says:
====================
net: dsa: Support for pdata in dsa2
This is not exactly new, and was sent before, although back then, I did not
have an user of the pre-declared MDIO board information, but now we do. Note
that I have additional changes queued up to have b53 register platform data for
MIPS bcm47xx and bcm63xx.
Yes I know that we should have the Orion platforms eventually be converted to
Device Tree, but until that happens, I don't want any remaining users of the
old "dsa" platform device (hence the previous DTS submissions for ARM/mvebu)
and, there will be platforms out there that most likely won't never see DT
coming their way (BCM47xx is almost 100% sure, BCM63xx maybe not in a distant
future).
We would probably want the whole series to be merged via David Miller's tree
to simplify things.
Thanks!
Changes in v5:
- dropped changes to drivers/base/ because after more than a month, we cannot
get any answer from Greg KH
Changes in v4:
- Changed device_find_class() to device_find_in_class_name()
- Added kerneldoc above device_find_in_class_name() to explain what it does
and the calling convention regarding device reference counts
- Changed dev_to_net_device to device_to_net_device() added comments
about what it does and the caller conventions regarding reference counts
Changes in v3:
- Tested EPROBE_DEFER from a mockup MDIO/DSA switch driver and everything
is fine, once the driver finally probes we have access to platform data
as expected
- added comment above dsa_port_is_valid() that port->name is mandatory
for platform data cases
- added an extra check in dsa_parse_member() for a NULL pdata pointer
- fixed a bunch of checkpatch errors and warnings
Changes in v2:
- Rebased against latest net-next/master
- Moved dev_find_class() to device_find_class() into drivers/base/core.c
- Moved dev_to_net_device into net/core/dev.c
- Utilize dsa_chip_data directly instead of dsa_platform_data
- Augmented dsa_chip_data to be multi-CPU port ready
Changes from last submission (few months back):
- rebased against latest net-next
- do not introduce dsa2_platform_data which was overkill and was meant to
allow us to do exaclty the same things with platform data and Device Tree
we use the existing dsa_platform_data instead
- properly register MDIO devices when the MDIO bus is registered and associate
platform_data with them
- add a change to the Orion platform code to demonstrate how this can be used
Thank you
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Sat, 4 Feb 2017 21:02:45 +0000 (13:02 -0800)]
ARM: orion: Register DSA switch as a MDIO device
Utilize the ability to pass board specific MDIO bus information towards a
particular MDIO device thus allowing us to provide the per-port switch layout
to the Marvell 88E6XXX switch driver.
Since we would end-up with conflicting registration paths, do not register the
"dsa" platform device anymore.
Note that the MDIO devices registered by code in net/dsa/dsa2.c does not
parse a dsa_platform_data, but directly take a dsa_chip_data (specific
to a single switch chip), so we update the different call sites to pass
this structure down to orion_ge00_switch_init().
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Sat, 4 Feb 2017 21:02:44 +0000 (13:02 -0800)]
net: phy: Allow pre-declaration of MDIO devices
Allow board support code to collect pre-declarations for MDIO devices by
registering them with mdiobus_register_board_info(). SPI and I2C buses
have a similar feature, we were missing this for MDIO devices, but this
is particularly useful for e.g: MDIO-connected switches which need to
provide their port layout (often board-specific) to a MDIO Ethernet
switch driver.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Sat, 4 Feb 2017 21:02:43 +0000 (13:02 -0800)]
net: dsa: Add support for platform data
Allow drivers to use the new DSA API with platform data. Most of the
code in net/dsa/dsa2.c does not rely so much on device_nodes and can get
the same information from platform_data instead.
We purposely do not support distributed configurations with platform
data, so drivers should be providing a pointer to a 'struct
dsa_chip_data' structure if they wish to communicate per-port layout.
Multiple CPUs port could potentially be supported and dsa_chip_data is
extended to receive up to one reference to an upstream network device
per port described by a dsa_chip_data structure.
dsa_dev_to_net_device() increments the network device's reference count,
so we intentionally call dev_put() to be consistent with the DT-enabled
path, until we have a generic notifier based solution.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Sat, 4 Feb 2017 21:02:42 +0000 (13:02 -0800)]
net: dsa: Rename and export dev_to_net_device()
In preparation for using this function in net/dsa/dsa2.c, rename the function
to make its scope DSA specific, and export it.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sat, 4 Feb 2017 19:15:28 +0000 (20:15 +0100)]
net: dsa: mv88e6xxx: Refactor remaining port setup
Move the remaining port configuration code which varies per device
into port.c, using ops were necessary. This makes
mv88e6xxx_6185_family() and mv88e6xxx_6095_family() unused, so remove
them.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sat, 4 Feb 2017 19:12:24 +0000 (20:12 +0100)]
net: dsa: mv88e6xxx: Implement Clause 45 access to SMI devices
The mv88e6390 MDIO bus controllers can support for clause 45 accesses.
The internal SERDES interfaces need this, and it is likely external
10GHz PHYs will be clause 45.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 7 Feb 2017 15:34:43 +0000 (10:34 -0500)]
Merge branch 'mv88e6390-CMODE'
Andrew Lunn says:
====================
Set the CMODE for mv88e6390 ports
The mv88e6390 ports 9 & 10 allow there CMODE to be set. CMODE is part
of what linux defines as phy-mode. Add the needed phy-modes to linux,
and add code which will act upon the phy-mode property to configure
the switch port.
These patches have been posted before as part of a bigger patchset
which has now been broken up. I've added the received reviewed by
tags, and added device tree documentation.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sat, 4 Feb 2017 19:02:50 +0000 (20:02 +0100)]
net: dsa: mv88e6xxx: Set the CMODE for mv88e6390 ports 9 & 10
Unlike most ports, ports 9 and 10 of the 6390X family have configurable
PHY modes. Set the mode as part of adjust_link().
Ordering is important, because the SERDES interfaces connected to
ports 9 and 10 can be split and assigned to other ports. The CMODE has
to be correctly set before the SERDES interface on another port can be
configured. Such configuration is likely to be performed in
port_enable() and port_disabled(), called on slave_open() and
slave_close().
The simple case is port 9 and 10 are used for 'CPU' or 'DSA'. In this
case, the CMODE is set via a phy-mode in dsa_cpu_dsa_setup(), which is
called early in the switch setup.
When ports 9 or 10 are used as user ports, and have a fixed-phy, when
the fixed fixed-phy is attached, dsa_slave_adjust_link() is called,
which results in the adjust_link function being called, setting the
cmode. The port_enable() will for other ports will be called much
later.
When ports 9 or 10 are used as user ports and have a real phy attached
which does not use all the available SERDES interface, e.g. a 1Gbps
SGMII, there is currently no mechanism in place to set the CMODE of
the port from software. It must be hoped the stripping resistors are
correct.
At the same time, add a function to get the cmode. This will be needed
when configuring the SERDES interfaces.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sat, 4 Feb 2017 19:02:49 +0000 (20:02 +0100)]
net: phy: Add 2000base-x, 2500base-x and rxaui modes
The mv88e6390 ports 9 and 10 supports some additional PHY modes. Add
these modes to the PHY core so they can be used in the binding.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>