John Fastabend [Sat, 5 Aug 2017 05:02:19 +0000 (22:02 -0700)]
bpf: devmap fix mutex in rcu critical section
Originally we used a mutex to protect concurrent devmap update
and delete operations from racing with netdev unregister notifier
callbacks.
The notifier hook is needed because we increment the netdev ref
count when a dev is added to the devmap. This ensures the netdev
reference is valid in the datapath. However, we don't want to block
unregister events, hence the initial mutex and notifier handler.
The concern was in the notifier hook we search the map for dev
entries that hold a refcnt on the net device being torn down. But,
in order to do this we require two steps,
(i) dereference the netdev: dev = rcu_dereference(map[i])
(ii) test ifindex: dev->ifindex == removing_ifindex
and then finally we can swap in the NULL dev in the map via an
xchg operation,
xchg(map[i], NULL)
The danger here is a concurrent update could run a different
xchg op concurrently leading us to replace the new dev with a
NULL dev incorrectly.
CPU 1 CPU 2
notifier hook bpf devmap update
dev = rcu_dereference(map[i])
dev = rcu_dereference(map[i])
xchg(map[i]), new_dev);
rcu_call(dev,...)
xchg(map[i], NULL)
The above flow would create the incorrect state with the dev
reference in the update path being lost. To resolve this the
original code used a mutex around the above block. However,
updates, deletes, and lookups occur inside rcu critical sections
so we can't use a mutex in this context safely.
Fortunately, by writing slightly better code we can avoid the
mutex altogether. If CPU 1 in the above example uses a cmpxchg
and _only_ replaces the dev reference in the map when it is in
fact the expected dev the race is removed completely. The two
cases being illustrated here, first the race condition,
CPU 1 CPU 2
notifier hook bpf devmap update
dev = rcu_dereference(map[i])
dev = rcu_dereference(map[i])
xchg(map[i]), new_dev);
rcu_call(dev,...)
odev = cmpxchg(map[i], dev, NULL)
Now we can test the cmpxchg return value, detect odev != dev and
abort. Or in the good case,
CPU 1 CPU 2
notifier hook bpf devmap update
dev = rcu_dereference(map[i])
odev = cmpxchg(map[i], dev, NULL)
[...]
Now 'odev == dev' and we can do proper cleanup.
And viola the original race we tried to solve with a mutex is
corrected and the trace noted by Sasha below is resolved due
to removal of the mutex.
Note: When walking the devmap and removing dev references as needed
we depend on the core to fail any calls to dev_get_by_index() using
the ifindex of the device being removed. This way we do not race with
the user while searching the devmap.
Additionally, the mutex was also protecting list add/del/read on
the list of maps in-use. This patch converts this to an RCU list
and spinlock implementation. This protects the list from concurrent
alloc/free operations. The notifier hook walks this list so it uses
RCU read semantics.
BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
in_atomic(): 1, irqs_disabled(): 0, pid: 16315, name: syz-executor1
1 lock held by syz-executor1/16315:
#0: (rcu_read_lock){......}, at: [<
ffffffff8c363bc2>] map_delete_elem kernel/bpf/syscall.c:577 [inline]
#0: (rcu_read_lock){......}, at: [<
ffffffff8c363bc2>] SYSC_bpf kernel/bpf/syscall.c:1427 [inline]
#0: (rcu_read_lock){......}, at: [<
ffffffff8c363bc2>] SyS_bpf+0x1d32/0x4ba0 kernel/bpf/syscall.c:1388
Fixes:
2ddf71e23cc2 ("net: add notifier hooks for devmap bpf map")
Reported-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 7 Aug 2017 21:12:18 +0000 (14:12 -0700)]
Merge branch 'net_sched-clean-up-filter-handle'
Cong Wang says:
====================
net_sched: clean up filter handle
This patchset sits in my local branch for a long time, it is time to
send it out. It cleans up the ambiguous use of 'unsigned long fh',
please see each of them for details.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Sat, 5 Aug 2017 04:31:43 +0000 (21:31 -0700)]
net_sched: use void pointer for filter handle
Now we use 'unsigned long fh' as a pointer in every place,
it is safe to convert it to a void pointer now. This gets
rid of many casts to pointer.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Sat, 5 Aug 2017 04:31:42 +0000 (21:31 -0700)]
net_sched: refactor notification code for RTM_DELTFILTER
It is confusing to use 'unsigned long fh' as both a handle
and a pointer, especially commit
9ee7837449b3
("net sched filters: fix notification of filter delete with proper handle").
This patch introduces tfilter_del_notify() so that we can
pass it as a pointer as before, and we don't need to check
RTM_DELTFILTER in tcf_fill_node() any more.
This prepares for the next patch.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roopa Prabhu [Sat, 5 Aug 2017 01:19:18 +0000 (18:19 -0700)]
lwtunnel: replace EXPORT_SYMBOL with EXPORT_SYMBOL_GPL
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 7 Aug 2017 21:09:48 +0000 (14:09 -0700)]
Merge branch 'bpf-add-support-for-sys-enter-exit-tracepoints'
Yonghong Song says:
====================
bpf: add support for sys_{enter|exit}_* tracepoints
Currently, bpf programs cannot be attached to sys_enter_* and sys_exit_*
style tracepoints. The main reason is that syscalls/sys_enter_* and syscalls/sys_exit_*
tracepoints are treated differently from other tracepoints and there
is no bpf hook to it.
This patch set adds bpf support for these syscalls tracepoints and also
adds a test case for it.
Changelogs:
v3 -> v4:
- Check the legality of ctx offset access for syscall tracepoint as well.
trace_event_get_offsets will return correct max offset for each
specific syscall tracepoint.
- Use variable length array to avoid hardcode 6 as the maximum
arguments beyond syscall_nr.
v2 -> v3:
- Fix a build issue
v1 -> v2:
- Do not use TRACE_EVENT_FL_CAP_ANY to identify syscall tracepoint.
Instead use trace_event_call->class.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Yonghong Song [Fri, 4 Aug 2017 23:00:10 +0000 (16:00 -0700)]
bpf: add a test case for syscalls/sys_{enter|exit}_* tracepoints
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yonghong Song [Fri, 4 Aug 2017 23:00:09 +0000 (16:00 -0700)]
bpf: add support for sys_enter_* and sys_exit_* tracepoints
Currently, bpf programs cannot be attached to sys_enter_* and sys_exit_*
style tracepoints. The iovisor/bcc issue #748
(https://github.com/iovisor/bcc/issues/748) documents this issue.
For example, if you try to attach a bpf program to tracepoints
syscalls/sys_enter_newfstat, you will get the following error:
# ./tools/trace.py t:syscalls:sys_enter_newfstat
Ioctl(PERF_EVENT_IOC_SET_BPF): Invalid argument
Failed to attach BPF to tracepoint
The main reason is that syscalls/sys_enter_* and syscalls/sys_exit_*
tracepoints are treated differently from other tracepoints and there
is no bpf hook to it.
This patch adds bpf support for these syscalls tracepoints by
. permitting bpf attachment in ioctl PERF_EVENT_IOC_SET_BPF
. calling bpf programs in perf_syscall_enter and perf_syscall_exit
The legality of bpf program ctx access is also checked.
Function trace_event_get_offsets returns correct max offset for each
specific syscall tracepoint, which is compared against the maximum offset
access in bpf program.
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Fri, 4 Aug 2017 21:43:43 +0000 (00:43 +0300)]
of_mdio: use of_property_read_u32_array()
The "fixed-link" prop support predated of_property_read_u32_array(), so
basically had to open-code it. Using the modern API saves 24 bytes of the
object code (ARM gcc 4.8.5); the only behavior change would be that the
prop length check is now less strict (however the strict pre-check done
in of_phy_is_fixed_link() is left intact anyway)...
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
John Allen [Mon, 7 Aug 2017 20:42:30 +0000 (15:42 -0500)]
ibmvnic: Report rx buffer return codes as netdev_dbg
Reporting any return code for a receive buffer as an "rx error" only
produces alarming noise and the only values that have been observed to be
used in this field are not error conditions. Change this to a netdev_dbg
with a more descriptive message.
Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 7 Aug 2017 18:39:22 +0000 (11:39 -0700)]
Merge branch 'net-l3mdev-Support-for-sockets-bound-to-enslaved-device'
David Ahern says:
====================
net: l3mdev: Support for sockets bound to enslaved device
A missing piece to the VRF puzzle is the ability to bind sockets to
devices enslaved to a VRF. This patch set adds the enslaved device
index, sdif, to IPv4 and IPv6 socket lookups. The end result for users
is the following scope options for services:
1. "global" services - sockets not bound to any device
Allows 1 service to work across all network interfaces with
connected sockets bound to the VRF the connection originates
(Requires net.ipv4.tcp_l3mdev_accept=1 for TCP and
net.ipv4.udp_l3mdev_accept=1 for UDP)
2. "VRF" local services - sockets bound to a VRF
Sockets work across all network interfaces enslaved to a VRF but
are limited to just the one VRF.
3. "device" services - sockets bound to a specific network interface
Service works only through the one specific interface.
v3
- convert __inet_lookup_established in dccp_v4_err; missed in v2
v2
- remove sk_lookup struct and add sdif as an argument to existing
functions
Changes since RFC:
- no significant logic changes; mainly whitespace cleanups
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Mon, 7 Aug 2017 15:44:22 +0000 (08:44 -0700)]
net: ipv6: add second dif to raw socket lookups
Add a second device index, sdif, to raw socket lookups. sdif is the
index for ingress devices enslaved to an l3mdev. It allows the lookups
to consider the enslaved device as well as the L3 domain when searching
for a socket.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Mon, 7 Aug 2017 15:44:21 +0000 (08:44 -0700)]
net: ipv6: add second dif to inet6 socket lookups
Add a second device index, sdif, to inet6 socket lookups. sdif is the
index for ingress devices enslaved to an l3mdev. It allows the lookups
to consider the enslaved device as well as the L3 domain when searching
for a socket.
TCP moves the data in the cb. Prior to tcp_v4_rcv (e.g., early demux) the
ingress index is obtained from IPCB using inet_sdif and after tcp_v4_rcv
tcp_v4_sdif is used.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Mon, 7 Aug 2017 15:44:20 +0000 (08:44 -0700)]
net: ipv6: add second dif to udp socket lookups
Add a second device index, sdif, to udp socket lookups. sdif is the
index for ingress devices enslaved to an l3mdev. It allows the lookups
to consider the enslaved device as well as the L3 domain when searching
for a socket.
Early demux lookups are handled in the next patch as part of INET_MATCH
changes.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Mon, 7 Aug 2017 15:44:19 +0000 (08:44 -0700)]
net: ipv4: add second dif to multicast source filter
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Mon, 7 Aug 2017 15:44:18 +0000 (08:44 -0700)]
net: ipv4: add second dif to raw socket lookups
Add a second device index, sdif, to raw socket lookups. sdif is the
index for ingress devices enslaved to an l3mdev. It allows the lookups
to consider the enslaved device as well as the L3 domain when searching
for a socket.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Mon, 7 Aug 2017 15:44:17 +0000 (08:44 -0700)]
net: ipv4: add second dif to inet socket lookups
Add a second device index, sdif, to inet socket lookups. sdif is the
index for ingress devices enslaved to an l3mdev. It allows the lookups
to consider the enslaved device as well as the L3 domain when searching
for a socket.
TCP moves the data in the cb. Prior to tcp_v4_rcv (e.g., early demux) the
ingress index is obtained from IPCB using inet_sdif and after the cb move
in tcp_v4_rcv the tcp_v4_sdif helper is used.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Mon, 7 Aug 2017 15:44:16 +0000 (08:44 -0700)]
net: ipv4: add second dif to udp socket lookups
Add a second device index, sdif, to udp socket lookups. sdif is the
index for ingress devices enslaved to an l3mdev. It allows the lookups
to consider the enslaved device as well as the L3 domain when searching
for a socket.
Early demux lookups are handled in the next patch as part of INET_MATCH
changes.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 7 Aug 2017 18:34:41 +0000 (11:34 -0700)]
Merge tag 'wireless-drivers-next-for-davem-2017-08-07' of git://git./linux/kernel/git/kvalo/wireless-drivers-next
Kalle Valo says:
====================
wireless-drivers-next patches for 4.14
The first wireless-drivers-next pull request for 4.14. I'm submitting
this unusally late in the cycle as my vacation postponed this. But
even if this is late there's not still that much new features, mostly
cleanup or fixes.
Major changes:
ath10k
* preparation for wcn3990 support
iwlwifi
* Reorganization of the code into separate directories continues
qtnfmac
* regulatory support updates
* add get_channel, dump_survey and channel_switch cfg80211 handlers
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Mon, 7 Aug 2017 10:41:53 +0000 (12:41 +0200)]
hns3: fix unused function warning
Without CONFIG_PCI_IOV, we get a harmless warning about an
unused function:
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c:2273:13: error: 'hclge_disable_sriov' defined but not used [-Werror=unused-function]
The #ifdefs in this driver are obviously wrong, so this just
removes them and uses an IS_ENABLED() check that does the same
thing correctly in a more readable way.
Fixes:
46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 7 Aug 2017 17:42:09 +0000 (10:42 -0700)]
Merge tag 'mlx5-shared-2017-08-07' of git://git./linux/kernel/git/mellanox/linux
Saeed Mahameed says:
====================
mlx5-shared-2017-08-07
This series includes some mlx5 updates for both net-next and rdma trees.
From Saeed,
Core driver updates to allow selectively building the driver with
or without some large driver components, such as
- E-Switch (Ethernet SRIOV support).
- Multi-Physical Function Switch (MPFs) support.
For that we split E-Switch and MPFs functionalities into separate files.
From Erez,
Delay mlx5_core events when mlx5 interfaces, namely mlx5_ib, registration
is taking place and until it completes.
From Rabie,
Increase the maximum supported flow counters.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 7 Aug 2017 16:42:37 +0000 (09:42 -0700)]
Merge branch 'net-sched-summer-cleanup-part-2-ndo_setup_tc'
Jiri Pirko says:
====================
net: sched: summer cleanup part 2, ndo_setup_tc
This patchset focuses on ndo_setup_tc and its args.
Currently there are couple of things that do not make much sense.
The type is passed in struct tc_to_netdev, but as it is always
required, should be arg of the ndo. Other things are passed as args
but they are only relevant for cls offloads and not mqprio. Therefore,
they should be pushed to struct. As the tc_to_netdev struct in the end
is just a container of single pointer, we get rid of it and pass the
struct according to type. So in the end, we have:
ndo_setup_tc(dev, type, type_data_struct)
There are couple of cosmetics done on the way to make things smooth.
Also, reported error is consolidated to eopnotsupp in case the
asked offload is not supported.
v1->v2:
- added forgotten hns3pf bits
====================
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Mon, 7 Aug 2017 08:15:32 +0000 (10:15 +0200)]
net: sched: get rid of struct tc_to_netdev
Get rid of struct tc_to_netdev which is now just unnecessary container
and rather pass per-type structures down to drivers directly.
Along with that, consolidate the naming of per-type structure variables
in cls_*.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Mon, 7 Aug 2017 08:15:31 +0000 (10:15 +0200)]
net: sched: change return value of ndo_setup_tc for driver supporting mqprio only
Change the return value from -EINVAL to -EOPNOTSUPP. The rest of the
drivers have it like that, so be aligned.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Mon, 7 Aug 2017 08:15:30 +0000 (10:15 +0200)]
net: sched: move prio into cls_common
prio is not cls_flower specific, but it is meaningful for all
classifiers. Seems that only mlxsw cares about the value. Obviously,
cls offload in other drivers is broken.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Mon, 7 Aug 2017 08:15:29 +0000 (10:15 +0200)]
net: sched: push cls related args into cls_common structure
As ndo_setup_tc is generic offload op for whole tc subsystem, does not
really make sense to have cls-specific args. So move them under
cls_common structurure which is embedded in all cls structs.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Mon, 7 Aug 2017 08:15:28 +0000 (10:15 +0200)]
hns3pf: don't check handle during mqprio offload
Similar to the rest offloaders of mqprio, no need to check handle.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Mon, 7 Aug 2017 08:15:27 +0000 (10:15 +0200)]
nfp: change flows in apps that offload ndo_setup_tc
Change the flows a bit in preparation of follow-up changes in
ndo_setup_tc args. Also, change the error code to align with the rest of
the drivers.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Mon, 7 Aug 2017 08:15:26 +0000 (10:15 +0200)]
dsa: push cls_matchall setup_tc processing into a separate function
Let dsa_slave_setup_tc be a splitter for specific setup_tc types and
push out cls_matchall specific code into a separate function.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Mon, 7 Aug 2017 08:15:25 +0000 (10:15 +0200)]
mlxsw: spectrum: rename cls arg in matchall processing
To sync-up with the naming in the rest of the driver, rename the cls arg.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Mon, 7 Aug 2017 08:15:24 +0000 (10:15 +0200)]
mlxsw: spectrum: push cls_flower and cls_matchall setup_tc processing into separate functions
Let mlxsw_sp_setup_tc be a splitter for specific setup_tc types and push
out cls_flower and cls_matchall specific codes into separate functions.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Mon, 7 Aug 2017 08:15:23 +0000 (10:15 +0200)]
mlx5e_rep: push cls_flower setup_tc processing into a separate function
Let mlx5e_rep_setup_tc (former mlx5e_rep_ndo_setup_tc) be a splitter for
specific setup_tc types and push out cls_flower specific code into
a separate function.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Mon, 7 Aug 2017 08:15:22 +0000 (10:15 +0200)]
mlx5e: push cls_flower and mqprio setup_tc processing into separate functions
Let mlx5e_setup_tc (former mlx5e_ndo_setup_tc) be a splitter for specific
setup_tc types and push out cls_flower and mqprio specific codes into
separate functions. Also change the return values so they are the same
as in the rest of the drivers.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Mon, 7 Aug 2017 08:15:21 +0000 (10:15 +0200)]
ixgbe: push cls_u32 and mqprio setup_tc processing into separate functions
Let __ixgbe_setup_tc be a splitter for specific setup_tc types and push out
cls_u32 and mqprio specific codes into separate functions. Also change
the return values so they are the same as in the rest of the drivers.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Mon, 7 Aug 2017 08:15:20 +0000 (10:15 +0200)]
cxgb4: push cls_u32 setup_tc processing into a separate function
Let cxgb_setup_tc be a splitter for specific setup_tc types and push out
cls_u32 specific code into a separate function.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Mon, 7 Aug 2017 08:15:19 +0000 (10:15 +0200)]
net: sched: make egress_dev flag part of flower offload struct
Since this is specific to flower now, make it part of the flower offload
struct.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Mon, 7 Aug 2017 08:15:18 +0000 (10:15 +0200)]
net: sched: rename TC_SETUP_MATCHALL to TC_SETUP_CLSMATCHALL
In order to be aligned with the rest of the types, rename
TC_SETUP_MATCHALL to TC_SETUP_CLSMATCHALL.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Mon, 7 Aug 2017 08:15:17 +0000 (10:15 +0200)]
net: sched: make type an argument for ndo_setup_tc
Since the type is always present, push it to be a separate argument to
ndo_setup_tc. On the way, name the type enum and use it for arg type.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rabie Loulou [Sun, 9 Jul 2017 10:39:30 +0000 (13:39 +0300)]
net/mlx5: Increase the maximum flow counters supported
Read new NIC capability field which represnts 16 MSBs of the max flow
counters number supported (max_flow_counter_31_16).
Backward compatibility with older firmware is preserved, the modified
driver reads max_flow_counter_31_16 as 0 from the older firmware and
uses up to 64K counters.
Changed flow counter id from 16 bits to 32 bits. Backward compatibility
with older firmware is preserved as we kept the 16 LSBs of the counter
id in place and added 16 MSBs from reserved field.
Changed the background bulk reading of flow counters to work in chunks
of at most 32K counters, to make sure we don't attempt to allocate very
large buffers.
Signed-off-by: Rabie Loulou <rabiel@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Rabie Loulou [Mon, 10 Jul 2017 11:35:10 +0000 (14:35 +0300)]
net/mlx5: Fix counter list hardware structure
The counter list hardware structure doesn't contain a clear and
num_of_counters fields, remove them.
These wrong fields were never used by the driver hence no other driver
changes.
Fixes:
a351a1b03bf1 ("net/mlx5: Introduce bulk reading of flow counters")
Signed-off-by: Rabie Loulou <rabiel@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Erez Shitrit [Wed, 7 Jun 2017 09:14:24 +0000 (12:14 +0300)]
net/mlx5: Delay events till ib registration ends
When mlx5_ib registers itself to mlx5_core as an interface, it will
call mlx5_add_device which will call mlx5_ib interface add callback,
in case the latter successfully returns, only then mlx5_core will add
it to the interface list and async events will be forwarded to mlx5_ib.
Between mlx5_ib interface add callback and mlx5_core adding the mlx5_ib
interface to its devices list, arriving mlx5_core events can be missed
by the new mlx5_ib registering interface.
In other words:
thread 1: mlx5_ib: mlx5_register_interface(dev)
thread 1: mlx5_core: mlx5_add_device(dev)
thread 1: mlx5_core: ctx = dev->add => (mlx5_ib)->mlx5_ib_add
thread 2: mlx5_core_event: **new event arrives, forward to dev_list
thread 1: mlx5_core: add_ctx_to_dev_list(ctx)
/* previous event was missed by the new interface.*/
It is ok to miss events before dev->add (mlx5_ib)->mlx5_ib_add_device
but not after.
We fix this race by accumulating the events that come between the
ib_register_device (inside mlx5_add_device->(dev->add)) till the adding
to the list completes and fire them to the new registering interface
after that.
Fixes:
f1ee87fe55c8 ("net/mlx5: Organize device list API in one place")
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Saeed Mahameed [Mon, 5 Jun 2017 12:17:12 +0000 (15:17 +0300)]
net/mlx5: Add CONFIG_MLX5_ESWITCH Kconfig
Allow to selectively build the driver with or without sriov eswitch, VF
representors and TC offloads.
Also remove the need of two ndo ops structures (sriov & basic)
and keep only one unified ndo ops, compile out VF SRIOV ndos when not
needed (MLX5_ESWITCH=n), and for VF netdev calling those ndos will result
in returning -EPERM.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Jes Sorensen <jsorensen@fb.com>
Cc: kernel-team@fb.com
Saeed Mahameed [Sun, 4 Jun 2017 20:11:55 +0000 (23:11 +0300)]
net/mlx5: Separate between E-Switch and MPFS
Multi-Physical Function Switch (MPFs) is required for when multi-PF
configuration is enabled to allow passing user configured unicast MAC
addresses to the requesting PF.
Before this patch eswitch.c used to manage the HW MPFS l2 table,
E-Switch always (regardless of sriov) enabled vport(0) (NIC PF) vport's
contexts update on unicast mac address list changes, to populate the PF's
MPFS L2 table accordingly.
In downstream patch we would like to allow compiling the driver without
E-Switch functionalities, for that we move MPFS l2 table logic out
of eswitch.c into its own file, and provide Kconfig flag (MLX5_MPFS) to
allow compiling out MPFS for those who don't want Multi-PF support.
NIC PF netdevice will now directly update MPFS l2 table via the new MPFS
API. VF netdevice has no access to MPFS L2 table, so E-Switch will remain
responsible of updating its MPFS l2 table on behalf of its VFs.
Due to this change we also don't require enabling vport(0) (PF vport)
unicast mac changes events anymore, for when SRIOV is not enabled.
Which means E-Switch is now activated only on SRIOV activation, and not
required otherwise.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: Jes Sorensen <jsorensen@fb.com>
Cc: kernel-team@fb.com
Saeed Mahameed [Sun, 11 Jun 2017 16:05:10 +0000 (19:05 +0300)]
net/mlx5: Unify vport manager capability check
Expose MLX5_VPORT_MANAGER macro to check for strict vport manager
E-switch and MPFS (Multi Physical Function Switch) abilities.
VPORT manager must be a PF with an ethernet link and with FW advertised
vport group manager capability
Replace older checks with the new macro and use it where needed in
eswitch.c and mlx5e netdev eswitch related flows.
The same macro will be reused in MPFS separation downstream patch.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Saeed Mahameed [Tue, 6 Jun 2017 06:12:04 +0000 (09:12 +0300)]
net/mlx5e: NIC netdev init flow cleanup
Remove redundant call to unregister vport representor in mlx5e_add error
flow.
Hide the representor priv and eswitch internal structures from en_main.c
as preparation step for downstream patches which would allow building
the driver without support for representors and eswitch.
Fixes:
6f08a22c5fb2 ("net/mlx5e: Register/unregister vport representors on interface attach/detach")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Saeed Mahameed [Tue, 6 Jun 2017 14:46:49 +0000 (17:46 +0300)]
net/mlx5e: Rearrange netdevice ops structures
Since we are going to allow building the driver without eswitch support,
it would be possible to compile out the sriov netdevice ops struct such
that the basic ops instance will be used for non VF devices too.
Add missing udp tunnel ndos into mlx5e_netdev_ops_basic.
While here, rearrange some ndos in the sriov ops struct and put
vf/eswitch related ndos towards the end of it.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
David S. Miller [Mon, 7 Aug 2017 04:33:43 +0000 (21:33 -0700)]
Merge branch 'sctp-remove-typedefs-from-structures-part-5'
Xin Long says:
====================
sctp: remove typedefs from structures part 5
As we know, typedef is suggested not to use in kernel, even checkpatch.pl
also gives warnings about it. Now sctp is using it for many structures.
All this kind of typedef's using should be removed. This patchset is the
part 5 to remove all typedefs in include/net/sctp/constants.h.
Just as the part 1-4, No any code's logic would be changed in these patches,
only cleaning up.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Sat, 5 Aug 2017 12:00:04 +0000 (20:00 +0800)]
sctp: remove the typedef sctp_subtype_t
This patch is to remove the typedef sctp_subtype_t, and
replace with union sctp_subtype in the places where it's
using this typedef.
Note that it doesn't fix many indents although it should,
as sctp_disposition_t's removal would mess them up again.
So better to fix them when removing sctp_disposition_t in
later patch.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Sat, 5 Aug 2017 12:00:03 +0000 (20:00 +0800)]
sctp: remove the typedef sctp_event_t
This patch is to remove the typedef sctp_event_t, and
replace with enum sctp_event in the places where it's
using this typedef.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Sat, 5 Aug 2017 12:00:02 +0000 (20:00 +0800)]
sctp: remove the typedef sctp_event_timeout_t
This patch is to remove the typedef sctp_event_timeout_t, and
replace with enum sctp_event_timeout in the places where it's
using this typedef.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Sat, 5 Aug 2017 12:00:01 +0000 (20:00 +0800)]
sctp: remove the typedef sctp_event_other_t
This patch is to remove the typedef sctp_event_other_t, and
replace with enum sctp_event_other in the places where it's
using this typedef.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Sat, 5 Aug 2017 12:00:00 +0000 (20:00 +0800)]
sctp: remove the typedef sctp_event_primitive_t
This patch is to remove the typedef sctp_event_primitive_t, and
replace with enum sctp_event_primitive in the places where it's
using this typedef.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Sat, 5 Aug 2017 11:59:59 +0000 (19:59 +0800)]
sctp: remove the typedef sctp_state_t
This patch is to remove the typedef sctp_state_t, and
replace with enum sctp_state in the places where it's
using this typedef.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Sat, 5 Aug 2017 11:59:58 +0000 (19:59 +0800)]
sctp: remove the typedef sctp_ierror_t
This patch is to remove the typedef sctp_ierror_t, and
replace with enum sctp_ierror in the places where it's
using this typedef.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Sat, 5 Aug 2017 11:59:57 +0000 (19:59 +0800)]
sctp: remove the typedef sctp_xmit_t
This patch is to remove the typedef sctp_xmit_t, and
replace with enum sctp_xmit in the places where it's
using this typedef.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Sat, 5 Aug 2017 11:59:56 +0000 (19:59 +0800)]
sctp: remove the typedef sctp_sock_state_t
This patch is to remove the typedef sctp_sock_state_t, and
replace with enum sctp_sock_state in the places where it's
using this typedef.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Sat, 5 Aug 2017 11:59:55 +0000 (19:59 +0800)]
sctp: remove the typedef sctp_transport_cmd_t
This patch is to remove the typedef sctp_transport_cmd_t, and
replace with enum sctp_transport_cmd in the places where it's
using this typedef.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Sat, 5 Aug 2017 11:59:54 +0000 (19:59 +0800)]
sctp: remove the typedef sctp_scope_t
This patch is to remove the typedef sctp_scope_t, and
replace with enum sctp_scope in the places where it's
using this typedef.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Sat, 5 Aug 2017 11:59:53 +0000 (19:59 +0800)]
sctp: remove the typedef sctp_scope_policy_t
This patch is to remove the typedef sctp_scope_policy_t and keep
it's members as an anonymous enum.
It is also to define SCTP_SCOPE_POLICY_MAX to replace the num 3
in sysctl.c to make codes clear.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Sat, 5 Aug 2017 11:59:52 +0000 (19:59 +0800)]
sctp: remove the typedef sctp_retransmit_reason_t
This patch is to remove the typedef sctp_retransmit_reason_t, and
replace with enum sctp_retransmit_reason in the places where it's
using this typedef.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Sat, 5 Aug 2017 11:59:51 +0000 (19:59 +0800)]
sctp: remove the typedef sctp_lower_cwnd_t
This patch is to remove the typedef sctp_lower_cwnd_t, and
replace with enum sctp_lower_cwnd in the places where it's
using this typedef.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexandru Gagniuc [Fri, 4 Aug 2017 20:08:52 +0000 (13:08 -0700)]
dt-bindings: net: Document bindings for anarion-gmac
Signed-off-by: Alexandru Gagniuc <alex.g@adaptrum.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexandru Gagniuc [Fri, 4 Aug 2017 20:08:51 +0000 (13:08 -0700)]
net: stmmac: Add Adaptrum Anarion GMAC glue layer
Before the GMAC on the Anarion chip can be used, the PHY interface
selection must be configured with the DWMAC block in reset.
This layer covers a block containing only two registers. Although it
is possible to model this as a reset controller and use the "resets"
property of stmmac, it's much more intuitive to include this in the
glue layer instead.
At this time only RGMII is supported, because it is the only mode
which has been validated hardware-wise.
Signed-off-by: Alexandru Gagniuc <alex.g@adaptrum.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Fri, 4 Aug 2017 19:14:00 +0000 (12:14 -0700)]
netvsc: fix rtnl deadlock on unregister of vf
With new transparent VF support, it is possible to get a deadlock
when some of the deferred work is running and the unregister_vf
is trying to cancel the work element. The solution is to use
trylock and reschedule (similar to bonding and team device).
Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Fixes:
0c195567a8f6 ("netvsc: transparent VF management")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Fri, 4 Aug 2017 04:33:27 +0000 (21:33 -0700)]
net: dsa: User per-cpu 64-bit statistics
During testing with a background iperf pushing 1Gbit/sec worth of
traffic and having both ifconfig and ethtool collect statistics, we
could see quite frequent deadlocks. Convert the often accessed DSA slave
network devices statistics to per-cpu 64-bit statistics to remove these
deadlocks and provide fast efficient statistics updates.
Fixes:
f613ed665bb3 ("net: dsa: Add support for 64-bit statistics")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 7 Aug 2017 04:25:10 +0000 (21:25 -0700)]
Merge branch 'tcp-cwnd-undo-refactor'
Yuchung Cheng says:
====================
tcp cwnd undo refactor
This patch series consolidate similar cwnd undo functions
implemented by various congestion control by using existing
tcp socket state variable. The first patch fixes a corner
case in of cwnd undo in Reno and HTCP. Since the bug has
existed for many years and is very minor, we consider this
patch set more suitable for net-next as the major change
is the refactor itself.
- v1->v2
Fix trivial compile errors
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuchung Cheng [Fri, 4 Aug 2017 03:38:52 +0000 (20:38 -0700)]
tcp: consolidate congestion control undo functions
Most TCP congestion controls are using identical logic to undo
cwnd except BBR. This patch consolidates these similar functions
to the one used currently by Reno and others.
Suggested-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuchung Cheng [Fri, 4 Aug 2017 03:38:51 +0000 (20:38 -0700)]
tcp: fix cwnd undo in Reno and HTCP congestion controls
Using ssthresh to revert cwnd is less reliable when ssthresh is
bounded to 2 packets. This patch uses an existing variable in TCP
"prior_cwnd" that snapshots the cwnd right before entering fast
recovery and RTO recovery in Reno. This fixes the issue discussed
in netdev thread: "A buggy behavior for Linux TCP Reno and HTCP"
https://www.spinics.net/lists/netdev/msg444955.html
Suggested-by: Neal Cardwell <ncardwell@google.com>
Reported-by: Wei Sun <unlcsewsun@gmail.com>
Signed-off-by: Yuchung Cheng <ncardwell@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
kiki good [Thu, 3 Aug 2017 23:07:45 +0000 (00:07 +0100)]
net: systemport: Support 64bit statistics
When using Broadcom Systemport device in 32bit Platform, ifconfig can
only report up to 4G tx,rx status, which will be wrapped to 0 when the
number of incoming or outgoing packets exceeds 4G, only taking
around 2 hours in busy network environment (such as streaming).
Therefore, it makes hard for network diagnostic tool to get reliable
statistical result, so the patch is used to add 64bit support for
Broadcom Systemport device in 32bit Platform.
This patch provides 64bit statistics capability on both ethtool and ifconfig.
Signed-off-by: Jianming.qiao <kiki-good@hotmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Intiyaz Basha [Thu, 3 Aug 2017 22:10:17 +0000 (15:10 -0700)]
liquidio: moved console_bitmask module param to lio_main.c
Moving PF module param console_bitmask to lio_main.c for consistency.
Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Intiyaz Basha [Thu, 3 Aug 2017 20:08:24 +0000 (13:08 -0700)]
liquidio: add missing strings in oct_dev_state_str array
There's supposed to be a one-to-one correspondence between the 18 macros
that #define the OCT_DEV states (in octeon_device.h) and the strings in the
oct_dev_state_str array, but there are only 14 strings in the array.
Add the missing strings (so they become 18 in total), and also revise some
incorrect/outdated text of existing strings.
Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 7 Aug 2017 03:55:29 +0000 (20:55 -0700)]
Merge branch 'phylink-and-sfp-support'
Russell King says:
====================
phylink and sfp support
This patch series introduces generic support for SFP sockets found on
various Marvell based platforms. The idea here is to provide common
SFP socket support which can be re-used by network drivers as
appropriate, rather than each network driver having to re-implement
SFP socket support.
SFP sockets typically use other system resources, eg, I2C buses to read
identifying information, and GPIOs to monitor socket state and control
the socket. Meanwhile, some network drivers drive multiple ethernet
ports from one instantiation of the driver.
It is not desirable to block the initialisation of a network driver
(thus denying other ports from being operational) if the resources
for the SFP socket are not yet available. This means that an element
of independence between the SFP support code and the driver is
required.
More than that, SFP modules effectively bring hotplug PHYs to
networking - SFP copper modules normally contain a standard PHY
accessed over the I2C bus, and it is desirable to read their state
so network drivers can be appropriately configured.
To add to the complexity, SFP modules can be connected in at least
two places:
1. Directly to the serdes output of a MAC with no intervening PHY.
For example:
mvneta ----> SFP socket
2. To a PHY, for example:
mvpp2 ---> PHY ---> copper
|
`-----> SFP socket
This code supports both setups, although it's not fully implemented
with scenario (2).
Moreover, the link presented by the SFP module can be one of the
10Gbase-R family (for SFP+ sockets), SGMII or 1000base-X (for SFP
sockets) depending on the module, and network drivers need to
reconfigure themselves accordingly for the link to come up.
For example, if the MAC is configured for SGMII and a fibre module
is plugged in, the link won't come up until the MAC is reconfigured
for 1000base-X mode.
The SFP code manages the SFP socket - detecting the module, reading
the identifying information, and managing the control and status
signals. Importantly, it disables the SFP module transmitter when
the MAC is down, so that the laser is turned off (but that is not
a guarantee.)
phylink provides the mechanisms necessary to manage the link modes,
based on the SFP module type, and supports hot-plugging of the PHY
without needing the MAC driver to be brought up and down on
transitions. phylink also supports the classical static PHY and
fixed-link modes.
I currently (but not included in this series) have code to convert
mvneta to use phylink, and the out of tree mvpp2x driver. I have
nothing for the mvpp2 driver at present as that driver is only
recently becoming functional on 10G hardware, and is missing a lot
of features that are necessary to make things work correctly.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 25 Jul 2017 14:03:39 +0000 (15:03 +0100)]
sfp: add SFP module support
Add support for SFP hotpluggable modules via sfp-bus and phylink.
This supports both copper and optical SFP modules, which require
different Serdes modes in order to properly negotiate the link.
Optical SFP modules typically require the Serdes link to be talking
1000BaseX mode - this is the gigabit ethernet mode defined by the
802.3 standard.
Copper SFP modules typically integrate a PHY in the module to convert
from Serdes to copper, and the PHY will be configured by the vendor
to either present a 1000BaseX Serdes link (for fixed 1000BaseT) or a
SGMII Serdes link. However, this is vendor defined, so we instead
detect the PHY, switch the link to SGMII mode, and use traditional
PHY based negotiation.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 25 Jul 2017 14:03:34 +0000 (15:03 +0100)]
phylink: add in-band autonegotiation support for 10GBase-KR mode.
Add in-band autonegotation support for 10GBase-KR mode.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 25 Jul 2017 14:03:28 +0000 (15:03 +0100)]
phylink: add support for MII ioctl access to Clause 45 PHYs
Add support for reading and writing the clause 45 MII registers.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 25 Jul 2017 14:03:23 +0000 (15:03 +0100)]
phylink: add module EEPROM support
Add support for reading module EEPROMs through phylink.
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 25 Jul 2017 14:03:18 +0000 (15:03 +0100)]
sfp: add sfp-bus to bridge between network devices and sfp cages
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 25 Jul 2017 14:03:13 +0000 (15:03 +0100)]
phylink: add phylink infrastructure
The link between the ethernet MAC and its PHY has become more complex
as the interface evolves. This is especially true with serdes links,
where the part of the PHY is effectively integrated into the MAC.
Serdes links can be connected to a variety of devices, including SFF
modules soldered down onto the board with the MAC, a SFP cage with
a hotpluggable SFP module which may contain a PHY or directly modulate
the serdes signals onto optical media with or without a PHY, or even
a classical PHY connection.
Moreover, the negotiation information on serdes links comes in two
varieties - SGMII mode, where the PHY provides its speed/duplex/flow
control information to the MAC, and 1000base-X mode where both ends
exchange their abilities and each resolve the link capabilities.
This means we need a more flexible means to support these arrangements,
particularly with the hotpluggable nature of SFP, where the PHY can
be attached or detached after the network device has been brought up.
Ethtool information can come from multiple sources:
- we may have a PHY operating in either SGMII or 1000base-X mode, in
which case we take ethtool/mii data directly from the PHY.
- we may have a optical SFP module without a PHY, with the MAC
operating in 1000base-X mode - the ethtool/mii data needs to come
from the MAC.
- we may have a copper SFP module with a PHY whic can't be accessed,
which means we need to take ethtool/mii data from the MAC.
Phylink aims to solve this by providing an intermediary between the
MAC and PHY, providing a safe way for PHYs to be hotplugged, and
allowing a SFP driver to reconfigure the serdes connection.
Phylink also takes over support of fixed link connections, where the
speed/duplex/flow control are fixed, but link status may be controlled
by a GPIO signal. By avoiding the fixed-phy implementation, phylink
can provide a faster response to link events: fixed-phy has to wait for
phylib to operate its state machine, which can take several seconds.
In comparison, phylink takes milliseconds.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
- remove sync status
- rework supported and advertisment handling
- add 1000base-x speed for fixed links
- use functionality exported from phy-core, reworking
__phylink_ethtool_ksettings_set for it
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 25 Jul 2017 14:03:08 +0000 (15:03 +0100)]
net: phy: add I2C mdio bus
Add an I2C MDIO bus bridge library, to allow phylib to access PHYs which
are connected to an I2C bus instead of the more conventional MDIO bus.
Such PHYs can be found in SFP adapters and SFF modules.
Since PHYs appear at I2C bus address 0x40..0x5f, and 0x50/0x51 are
reserved for SFP EEPROMs/diagnostics, we must not allow the MDIO bus
to access these I2C addresses.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 25 Jul 2017 14:03:03 +0000 (15:03 +0100)]
net: phy: export phy_start_machine() for phylink
phylink will need phy_start_machine exported, so lets export it as a
GPL symbol. Documentation/networking/phy.txt indicates that this
should be a PHY API function.
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 25 Jul 2017 14:02:58 +0000 (15:02 +0100)]
net: phy: provide a hook for link up/link down events
Sometimes, we need to do additional work between the PHY coming up and
marking the carrier present - for example, we may need to wait for the
PHY to MAC link to finish negotiation. This changes phylib to provide
a notification function pointer which avoids the built-in
netif_carrier_on() and netif_carrier_off() functions.
Standard ->adjust_link functionality is provided by hooking a helper
into the new ->phy_link_change method.
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 25 Jul 2017 14:02:52 +0000 (15:02 +0100)]
net: phy: add 1000Base-X to phy settings table
Add the missing 1000Base-X entry to the phy settings table. This was
not included because the original code could not cope with more than
32 bits of link mode mask.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 25 Jul 2017 14:02:47 +0000 (15:02 +0100)]
net: phy: move phy_lookup_setting() and guts of phy_supported_speeds() to phy-core
phy_lookup_setting() provides useful functionality in ethtool code
outside phylib. Move it to phy-core and allow it to be re-used (eg,
in phylink) rather than duplicated elsewhere. Note that this supports
the larger linkmode space.
As we move the phy settings table, we also need to move the guts of
phy_supported_speeds() as well.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 25 Jul 2017 14:02:42 +0000 (15:02 +0100)]
net: phy: split out PHY speed and duplex string generation
Other code would like to make use of this, so make the speed and duplex
string generation visible, and place it in a separate file.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 25 Jul 2017 14:02:37 +0000 (15:02 +0100)]
net: phy: allow settings table to support more than 32 link modes
Allow the phy settings table to support more than 32 link modes by
switching to the ethtool link mode bit number representation, rather
than storing the mask. This will allow phylink and other ethtool
code to share the settings table to look up settings.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 7 Aug 2017 03:51:12 +0000 (20:51 -0700)]
Merge branch 'IP-cleanup-LSRR-option-processing'
Paolo Abeni says:
====================
IP: cleanup LSRR option processing
The __ip_options_echo() function expect a valid dst entry in skb->dst;
as result we sometimes need to preserve the dst entry for the whole IP
RX path.
The current usage of skb->dst looks more a relic from ancient past that
a real functional constraint. This patchset tries to remove such usage,
and than drops some hacks currently in place in the IP code to keep
skb->dst around.
__ip_options_echo() uses of skb->dst for two different purposes: retrieving
the netns assicated with the skb, and modify the ingress packet LSRR address
list.
The first patch removes the code modifying the ingress packet, and the second
one provides an explicit netns argument to __ip_options_echo(). The following
patches cleanup the current code keeping arund skb->dst for __ip_options_echo's
sake.
Updating the __ip_options_echo() function has been previously discussed here:
http://marc.info/?l=linux-netdev&m=
150064533516348&w=2
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Thu, 3 Aug 2017 16:07:08 +0000 (18:07 +0200)]
udp: no need to preserve skb->dst
__ip_options_echo() does not need anymore skb->dst, so we can
avoid explicitly preserving it for its own sake.
This is almost a revert of commit
0ddf3fb2c43d ("udp: preserve
skb->dst if required for IP options processing") plus some
lifting to fit later changes.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Thu, 3 Aug 2017 16:07:07 +0000 (18:07 +0200)]
Revert "ipv4: keep skb->dst around in presence of IP options"
ip_options_echo() does not use anymore the skb->dst and don't
need to keep the dst around for options's sake only.
This reverts commit
34b2cef20f19c87999fff3da4071e66937db9644.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Thu, 3 Aug 2017 16:07:06 +0000 (18:07 +0200)]
ip/options: explicitly provide net ns to __ip_options_echo()
__ip_options_echo() uses the current network namespace, and
currently retrives it via skb->dst->dev.
This commit adds an explicit 'net' argument to __ip_options_echo()
and update all the call sites to provide it, usually via a simpler
sock_net().
After this change, __ip_options_echo() no more needs to access
skb->dst and we can drop a couple of hack to preserve such
info in the rx path.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Thu, 3 Aug 2017 16:07:05 +0000 (18:07 +0200)]
IP: do not modify ingress packet IP option in ip_options_echo()
While computing the response option set for LSRR, ip_options_echo()
also changes the ingress packet LSRR addresses list, setting
the last one to the dst specific address for the ingress packet
- via memset(start[ ...
The only visible effect of such change - beyond possibly damaging
shared/cloned skbs - is modifying the data carried by ICMP replies
changing the header information for reported the ingress packet,
which violates RFC1122 3.2.2.6.
All the others call sites just ignore the ingress packet IP options
after calling ip_options_echo()
Note that the last element in the LSRR option address list for the
reply packet will be properly set later in the ip output path
via ip_options_build().
This buggy memset() predates git history and apparently was present
into the initial ip_options_echo() implementation in linux 1.3.30 but
still looks wrong.
The removal of the fib_compute_spec_dst() call will help
completely dropping the skb->dst usage by __ip_options_echo() with a
later patch.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 7 Aug 2017 01:44:49 +0000 (18:44 -0700)]
Linux 4.13-rc4
Linus Torvalds [Sun, 6 Aug 2017 23:11:34 +0000 (16:11 -0700)]
Merge tag 'platform-drivers-x86-v4.13-4' of git://git.infradead.org/linux-platform-drivers-x86
Pull x86 platform driver fix from Darren Hart:
"Fix loop preventing some platforms from waking up via the power button
in s2idle:
- intel-vbtn: match power button on press rather than release"
* tag 'platform-drivers-x86-v4.13-4' of git://git.infradead.org/linux-platform-drivers-x86:
platform/x86: intel-vbtn: match power button on press rather than release
Linus Torvalds [Sun, 6 Aug 2017 19:31:17 +0000 (12:31 -0700)]
Merge tag 'ext4_for_linus_stable' of git://git./linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"A large number of ext4 bug fixes and cleanups for v4.13"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix copy paste error in ext4_swap_extents()
ext4: fix overflow caused by missing cast in ext4_resize_fs()
ext4, project: expand inode extra size if possible
ext4: cleanup ext4_expand_extra_isize_ea()
ext4: restructure ext4_expand_extra_isize
ext4: fix forgetten xattr lock protection in ext4_expand_extra_isize
ext4: make xattr inode reads faster
ext4: inplace xattr block update fails to deduplicate blocks
ext4: remove unused mode parameter
ext4: fix warning about stack corruption
ext4: fix dir_nlink behaviour
ext4: silence array overflow warning
ext4: fix SEEK_HOLE/SEEK_DATA for blocksize < pagesize
ext4: release discard bio after sending discard commands
ext4: convert swap_inode_data() over to use swap() on most of the fields
ext4: error should be cleared if ea_inode isn't added to the cache
ext4: Don't clear SGID when inheriting ACLs
ext4: preserve i_mode if __ext4_set_acl() fails
ext4: remove unused metadata accounting variables
ext4: correct comment references to ext4_ext_direct_IO()
Linus Torvalds [Sun, 6 Aug 2017 18:52:01 +0000 (11:52 -0700)]
Merge branch 'upstream' of git://git.linux-mips.org/ralf/upstream-linus
Pull MIPS fixes from Ralf Baechle:
"This fixes two build issues for ralink platforms, both due to missing
#includes which used to be included indirectly via other headers"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
MIPS: ralink: mt7620: Add missing header
MIPS: ralink: Fix build error due to missing header
Dmitry V. Levin [Sat, 5 Aug 2017 20:00:50 +0000 (23:00 +0300)]
Fix compat_sys_sigpending breakage
The latest change of compat_sys_sigpending in commit
8f13621abced
("sigpending(): move compat to native") has broken it in two ways.
First, it tries to write 4 bytes more than userspace expects:
sizeof(old_sigset_t) == sizeof(long) == 8 instead of
sizeof(compat_old_sigset_t) == sizeof(u32) == 4.
Second, on big endian architectures these bytes are being written in the
wrong order.
This bug was found by strace test suite.
Reported-by: Anatoly Pugachev <matorola@gmail.com>
Inspired-by: Eugene Syromyatnikov <evgsyr@gmail.com>
Fixes:
8f13621abced ("sigpending(): move compat to native")
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Maninder Singh [Sun, 6 Aug 2017 05:33:07 +0000 (01:33 -0400)]
ext4: fix copy paste error in ext4_swap_extents()
This bug was found by a static code checker tool for copy paste
problems.
Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
Signed-off-by: Vaneet Narang <v.narang@samsung.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Jerry Lee [Sun, 6 Aug 2017 05:18:31 +0000 (01:18 -0400)]
ext4: fix overflow caused by missing cast in ext4_resize_fs()
On a 32-bit platform, the value of n_blcoks_count may be wrong during
the file system is resized to size larger than 2^32 blocks. This may
caused the superblock being corrupted with zero blocks count.
Fixes:
1c6bd7173d66
Signed-off-by: Jerry Lee <jerrylee@qnap.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org # 3.7+
Miao Xie [Sun, 6 Aug 2017 05:00:49 +0000 (01:00 -0400)]
ext4, project: expand inode extra size if possible
When upgrading from old format, try to set project id
to old file first time, it will return EOVERFLOW, but if
that file is dirtied(touch etc), changing project id will
be allowed, this might be confusing for users, we could
try to expand @i_extra_isize here too.
Reported-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Miao Xie [Sun, 6 Aug 2017 04:55:48 +0000 (00:55 -0400)]
ext4: cleanup ext4_expand_extra_isize_ea()
Clean up some goto statement, make ext4_expand_extra_isize_ea() clearer.
Signed-off-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Miao Xie [Sun, 6 Aug 2017 04:40:01 +0000 (00:40 -0400)]
ext4: restructure ext4_expand_extra_isize
Current ext4_expand_extra_isize just tries to expand extra isize, if
someone is holding xattr lock or some check fails, it will give up.
So rename its name to ext4_try_to_expand_extra_isize.
Besides that, we clean up unnecessary check and move some relative checks
into it.
Signed-off-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Wang Shilong <wshilong@ddn.com>