Hante Meuleman [Thu, 8 Oct 2015 18:33:15 +0000 (20:33 +0200)]
brcmfmac: Rework p2p attach, use single method for p2p dev creation.
When module param p2pon is used a p2p device is created at init.
This patch reworks how this is done by using the same method as
for a dynamically (by user space) created p2p device.
Reviewed-by: Arend Van Spriel <arend@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieterpg@broadcom.com>
Reviewed-by: Franky (Zhenhui) Lin <frankyl@broadcom.com>
Signed-off-by: Hante Meuleman <meuleman@broadcom.com>
Signed-off-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Arend van Spriel [Thu, 8 Oct 2015 18:33:14 +0000 (20:33 +0200)]
brcmfmac: remove conversational comment
Removing a comment that was only useful during the review of
the change that introduced it and which should never have been
submitted.
Reviewed-by: Hante Meuleman <meuleman@broadcom.com>
Signed-off-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Franky Lin [Thu, 8 Oct 2015 18:33:13 +0000 (20:33 +0200)]
brcmfmac: rename firmware_path to alternative_fw_path
In brcmfmac the module parameter "firmware_path" is used as an
alternative relative path under the search path used by firmware_class
or ueventhelper. Rename the parameter to alternative_fw_path to avoid
confusion.
Reviewed-by: Pieter-Paul Giesberts <pieterpg@broadcom.com>
Reviewed-by: Arend Van Spriel <arend@broadcom.com>
Reviewed-by: Hante Meuleman <meuleman@broadcom.com>
Signed-off-by: Franky Lin <frankyl@broadcom.com>
Signed-off-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Hante Meuleman [Thu, 8 Oct 2015 18:33:12 +0000 (20:33 +0200)]
brcmfmac: Fix race condition between USB probe/load and disconnect.
When a USB device gets disconnected due to for example removal
then it is possible that it is still in the loading phase due to
the asynchronous load routines. These routines can then possible
access memory which has been freed. Fix this by mutex locking the
device init phase.
Reviewed-by: Arend Van Spriel <arend@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieterpg@broadcom.com>
Signed-off-by: Hante Meuleman <meuleman@broadcom.com>
Signed-off-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Arend van Spriel [Thu, 8 Oct 2015 18:33:11 +0000 (20:33 +0200)]
brcmfmac: expose device memory to devcoredump subsystem
Upon PSM watchdog event received from firmware the driver will obtain
a memory snapshot of the device and expose it to user-space through
the devcoredump framework. This will trigger a uevent.
Reviewed-by: Hante Meuleman <meuleman@broadcom.com>
Reviewed-by: Franky (Zhenhui) Lin <frankyl@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieterpg@broadcom.com>
Signed-off-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Jes Sorensen [Thu, 15 Oct 2015 00:44:51 +0000 (20:44 -0400)]
New driver: rtl8xxxu (mac80211)
This is an alternate driver for a number of Realtek WiFi USB devices,
including RTL8723AU, RTL8188CU, RTL8188RU, RTL8191CU, and RTL8192CU.
It was written from scratch utilizing the Linux mac80211 stack.
After spending months cleaning up the vendor provided rtl8723au
driver, which comes with it's own 802.11 stack included, I decided to
rewrite this driver from the bottom up.
Many thanks to Johannes Berg for 802.11 insights and help and Larry
Finger for help with the vendor driver.
The full git log for the development of this driver can be found here:
git git://git.kernel.org/pub/scm/linux/kernel/git/jes/linux.git
branch rtl8723au-mac80211
This driver is still under development, but has proven to be very
stable for me. It currently supports station mode only. It has support
for OFDM and CCK rates. It does lack certain features found in the
staging driver, such as power management, AMPDU, and 40MHz channel
support. In addition it does not support AD-HOC, AP, and monitor mode
support at this point.
The driver is known to work with the following devices:
Lenovo Yoga (rtl8723au)
TP-Link TL-WN823N (rtl8192cu)
Etekcity 6R (rtl8188cu)
Daffodil LAN03 (rtl8188cu)
Alfa AWUS036NHR (rtl8188ru)
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Xinming Hu [Fri, 9 Oct 2015 11:26:35 +0000 (04:26 -0700)]
mwifiex: remove unnecessary NULL check
ra_list cannot be NULL here, so remove the unnecessary NULL check.
Signed-off-by: Xinming Hu <huxm@marvell.com>
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Amitkumar Karwar [Fri, 9 Oct 2015 11:26:34 +0000 (04:26 -0700)]
mwifiex: add ndo_validate_addr netdev ops
ndo_validate_addr is set to generic eth_validate_addr() function
used for MAC address validation.
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Ganapathi Bhat [Fri, 9 Oct 2015 11:26:33 +0000 (04:26 -0700)]
mwifiex: fix AP VHT behaviour
Even if hostapd configuration file contains VHT parameters,
they were not getting reflected in beacons. The reason is
we are resetting them before starting AP. This patch removes
redundant BSS_STOP and SYS_RESET firmware commands before
starting AP to fix the problem.
Signed-off-by: Ganapathi Bhat <gbhat@marvell.com>
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Amitkumar Karwar [Fri, 9 Oct 2015 11:26:23 +0000 (04:26 -0700)]
mwifiex: control WLAN and bluetooth coexistence modes
By default our chip will be in spatial coexistence mode.
This patch adds a provision to change it to timeshare mode
via debugfs command.
Enable timeshare coexistence mode
echo 1 > /sys/kernel/debug/mwifiex/mlan0/timeshare_coex
Go back to spacial coexistence mode
echo 0 > /sys/kernel/debug/mwifiex/mlan0/timeshare_coex
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Geliang Tang [Sun, 4 Oct 2015 08:46:55 +0000 (16:46 +0800)]
mwifiex: fix a comment typo
Just fix a typo in the code comment.
Signed-off-by: Geliang Tang <geliangtang@163.com>
Acked-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Miaoqing Pan [Tue, 29 Sep 2015 05:24:37 +0000 (13:24 +0800)]
ath9k: fix QCA9561 XLNA rxgain initial
A small bugfix for commit
ede6a5e7b859 ("ath9k: Add QCA956x HW support").
I guess I would have skipped renaming (that initial QCA956x commit has
been there already for almost a year with the "5g" in the name) and move
the call outside AR_SREV_9462_20_OR_LATER() to make it reachable.
Signed-off-by: Miaoqing Pan <miaoqing@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Miaoqing Pan [Tue, 29 Sep 2015 05:24:36 +0000 (13:24 +0800)]
ath9k: rename ini_modes_rxgain_5g_xlna to ini_modes_rxgain_xlna
rename the variable as preparation for using the array with 2.4 GHz
band, etc.
Signed-off-by: Miaoqing Pan <miaoqing@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Priit Laes [Tue, 15 Sep 2015 06:01:56 +0000 (09:01 +0300)]
rtlwifi: rtl8192cu: Add missing case in rtl92cu_get_hw_reg
Driver was reporting 'switch case not processed' after association,
so HW_VAR_KEEP_ALIVE was added and filled similarily to other drivers.
Positive side effect to this seems to be a bit more stable connection.
Signed-off-by: Priit Laes <plaes@plaes.org>
Acked-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Amitkumar Karwar [Thu, 10 Sep 2015 14:27:34 +0000 (07:27 -0700)]
mwifiex: correction in USB8997 chipset's product ID
For 8897 chipset, mwifiex_usb unnecessarily used to
come into picture when wlan is supposed to be used
via PCIe interface and USB interface is for
bluetooth.
This problem has been resolved for newer chipset by
having separate USB product ids for USB-USB8997 and
PCIe-USB8997 chipset variants. This patch ensures to
use wlan specific product id.
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Frank Huang <frankh@marvell.com>
Signed-off-by: Nishant Sarmukadam <nishants@marvell.com>
Signed-off-by: Cathy Luo <cluo@marvell.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Amitkumar Karwar [Thu, 10 Sep 2015 14:27:33 +0000 (07:27 -0700)]
mwifiex: remove USB8897 chipset support
We don't have any customer using this chipset via USB
interface. if both mwifiex_pcie and mwifiex_usb modules
are enabled by user, sometimes mwifiex_usb wins the race
even if user wants wlan interface to be on PCIe and
USB for bluetooth. This patch solves the problem.
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Frank Huang <frankh@marvell.com>
Signed-off-by: Nishant Sarmukadam <nishants@marvell.com>
Signed-off-by: Cathy Luo <cluo@marvell.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Martin Blumenstingl [Wed, 22 Jul 2015 08:42:43 +0000 (10:42 +0200)]
ath9k: Fix NF CCA limits for AR9287 and AR9227
The FreeBSD driver [0] uses the same 2G values as for the AR9280 chips.
Using the same values in ath9k results in much better throughput for me.
Before this patch I had a huge amount of packet loss (sometimes up to
40%) and the max transfer speed was somewhere around 5Mbit/s. With this
patch applied I have zero packet loss and ten times the throughput.
My device uses a AR9227 which is the PCI variant of the AR9287.
[0] http://bxr.su/FreeBSD/sys/dev/ath/ath_hal/ar9002/ar9287.h
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Larry Finger [Mon, 7 Sep 2015 20:59:16 +0000 (15:59 -0500)]
rtlwifi: rtl818x: Move drivers into new realtek directory
Now that a new mac80211-based driver for Realtek devices has been submitted,
it is time to reorganize the directories. Rather than having directories
rtlwifi and rtl818x be in drivers/net/wireless/, they will now be in
drivers/net/wireless/realtek/. This change simplifies the directory
structure, but does not result in any configuration changes that are
visable to the user.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Nikolay Aleksandrov [Sun, 11 Oct 2015 10:49:56 +0000 (12:49 +0200)]
bridge: vlan: enforce no pvid flag in vlan ranges
Currently it's possible for someone to send a vlan range to the kernel
with the pvid flag set which will result in the pvid bouncing from a
vlan to vlan and isn't correct, it also introduces problems for hardware
where it doesn't make sense having more than 1 pvid. iproute2 already
enforces this, so let's enforce it on kernel-side as well.
Reported-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tillmann Heidsieck [Sat, 10 Oct 2015 19:47:19 +0000 (21:47 +0200)]
atm: iphase: fix misleading indention
Fix a smatch warning:
drivers/atm/iphase.c:1178 rx_pkt() warn: curly braces intended?
The code is correct, the indention is misleading. In case the allocation
of skb fails, we want to skip to the end.
Signed-off-by: Tillmann Heidsieck <theidsieck@leenox.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tillmann Heidsieck [Sat, 10 Oct 2015 19:47:18 +0000 (21:47 +0200)]
atm: iphase: return -ENOMEM instead of -1 in case of failed kmalloc()
Smatch complains about returning hard coded error codes, silence this
warning.
drivers/atm/iphase.c:115 ia_enque_rtn_q() warn: returning -1 instead of -ENOMEM is sloppy
Signed-off-by: Tillmann Heidsieck <theidsieck@leenox.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roopa Prabhu [Sat, 10 Oct 2015 15:26:36 +0000 (08:26 -0700)]
ipv6 route: use err pointers instead of returning pointer by reference
This patch makes ip6_route_info_create return err pointer instead of
returning the rt pointer by reference as suggested by Dave
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
huangdaode [Sat, 10 Oct 2015 09:20:38 +0000 (17:20 +0800)]
net: hns: fix the unknown phy_nterface_t type error
This patch fix the building error reported by Jiri Pirko <jiri@resnulli.us>
drivers/net/ethernet/hisilicon/hns/hnae.h:465:2: error: unknown type
name 'phy_interface_t'
phy_interface_t phy_if;
^
the full build log is on https://lists.01.org/pipermail/kbuild-all.
Signed-off-by: huangdaode <huangdaode@hisilicon.com>
Signed-off-by: yankejian <yankejian@huawei.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 9 Oct 2015 22:42:21 +0000 (15:42 -0700)]
tun: use sk_fullsock() before reading sk->sk_tsflags
timewait or request sockets are small and do not contain sk->sk_tsflags
Without this fix, we might read garbage, and crash later in
__skb_complete_tx_timestamp()
-> sock_queue_err_skb()
(These pseudo sockets do not have an error queue either)
Fixes:
ca6fb0651883 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 13 Oct 2015 02:44:22 +0000 (19:44 -0700)]
Merge branch 'netns-defrag'
Eric W. Biederman says:
====================
net: Pass net into defragmentation
This is the next installment of my work to pass struct net through the
output path so the code does not need to guess how to figure out which
network namespace it is in, and ultimately routes can have output
devices in another network namespace.
In netfilter and af_packet we defragment packets in the output path,
and there is the usual amount of confusion about how to compute which
net we are processing the packets in. This patchset clears that
confusion up by explicitly passing in struct net in ip_defrag,
ip_check_defrag, and nf_ct_frag6_gather.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Fri, 9 Oct 2015 18:44:55 +0000 (13:44 -0500)]
ipv6: Pass struct net into nf_ct_frag6_gather
The function nf_ct_frag6_gather is called on both the input and the
output paths of the networking stack. In particular ipv6_defrag which
calls nf_ct_frag6_gather is called from both the the PRE_ROUTING chain
on input and the LOCAL_OUT chain on output.
The addition of a net parameter makes it explicit which network
namespace the packets are being reassembled in, and removes the need
for nf_ct_frag6_gather to guess.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Fri, 9 Oct 2015 18:44:54 +0000 (13:44 -0500)]
ipv4: Pass struct net into ip_defrag and ip_check_defrag
The function ip_defrag is called on both the input and the output
paths of the networking stack. In particular conntrack when it is
tracking outbound packets from the local machine calls ip_defrag.
So add a struct net parameter and stop making ip_defrag guess which
network namespace it needs to defragment packets in.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Fri, 9 Oct 2015 18:44:53 +0000 (13:44 -0500)]
ipv4: Only compute net once in ip_call_ra_chain
ip_call_ra_chain is called early in the forwarding chain from
ip_forward and ip_mr_input, which makes skb->dev the correct
expression to get the input network device and dev_net(skb->dev) a
correct expression for the network namespace the packet is being
processed in.
Compute the network namespace and store it in a variable to make the
code clearer.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 9 Oct 2015 18:29:32 +0000 (11:29 -0700)]
packet: fix match_fanout_group()
Recent TCP listener patches exposed a prior af_packet bug :
match_fanout_group() blindly assumes it is always safe
to cast sk to a packet socket to compare fanout with af_packet_priv
But SYNACK packets can be sent while attached to request_sock, which
are smaller than a "struct sock".
We can read non existent memory and crash.
Fixes:
c0de08d04215 ("af_packet: don't emit packet on orig fanout group")
Fixes:
ca6fb0651883 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Eric Leblond <eric@regit.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 13 Oct 2015 02:39:18 +0000 (19:39 -0700)]
Merge tag 'wireless-drivers-next-for-davem-2015-10-09' of git://git./linux/kernel/git/kvalo/wireless-drivers-next
Kalle Valo says:
====================
Major changes:
iwlwifi
* some debugfs improvements
* fix signedness in beacon statistics
* deinline some functions to reduce size when device tracing is enabled
* filter beacons out in AP mode when no stations are associated
* deprecate firmwares version -12
* fix a runtime PM vs. legacy suspend race
* one-liner fix for a ToF bug
* clean-ups in the rx code
* small debugging improvement
* fix WoWLAN with new firmware versions
* more clean-ups towards multiple RX queues;
* some rate scaling fixes and improvements;
* some time-of-flight fixes;
* other generic improvements and clean-ups;
brcmfmac
* rework code dealing with multiple interfaces
* allow logging firmware console using debug level
* support for BCM4350, BCM4365, and BCM4366 PCIE devices
* fixed for legacy P2P and P2P device handling
* correct set and get tx-power
ath9k
* add support for Outside Context of a BSS (OCB) mode
mwifiex
* add USB multichannel feature
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 9 Oct 2015 12:34:31 +0000 (14:34 +0200)]
ipv4/icmp: redirect messages can use the ingress daddr as source
This patch allows configuring how the source address of ICMP
redirect messages is selected; by default the old behaviour is
retained, while setting icmp_redirects_use_orig_daddr force the
usage of the destination address of the packet that caused the
redirect.
The new behaviour fits closely the RFC 5798 section 8.1.1, and fix the
following scenario:
Two machines are set up with VRRP to act as routers out of a subnet,
they have IPs x.x.x.1/24 and x.x.x.2/24, with VRRP holding on to
x.x.x.254/24.
If a host in said subnet needs to get an ICMP redirect from the VRRP
router, i.e. to reach a destination behind a different gateway, the
source IP in the ICMP redirect is chosen as the primary IP on the
interface that the packet arrived at, i.e. x.x.x.1 or x.x.x.2.
The host will then ignore said redirect, due to RFC 1122 section 3.2.2.2,
and will continue to use the wrong next-op.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 9 Oct 2015 11:54:11 +0000 (13:54 +0200)]
bridge: try switchdev op first in __vlan_vid_add/del
Some drivers need to implement both switchdev vlan ops and
vid_add/kill ndos. For that to work in bridge code, we need to try
switchdev op first when adding/deleting vlan id.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
wangweidong [Tue, 13 Oct 2015 02:05:19 +0000 (10:05 +0800)]
BNX2: free temp_stats_blk on error path
In bnx2_init_board, missing free temp_stats_blk on error path when
some operations do failed. Just add the 'kfree' operation.
Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 13 Oct 2015 02:28:32 +0000 (19:28 -0700)]
Merge branch 'setsockopt_incoming_cpu'
Eric Dumazet says:
====================
tcp: better smp listener behavior
As promised in last patch series, we implement a better SO_REUSEPORT
strategy, based on cpu hints if given by the application.
We also moved sk_refcnt out of the cache line containing the lookup
keys, as it was considerably slowing down smp operations because
of false sharing. This was simpler than converting listen sockets
to conventional RCU (to avoid sk_refcnt dirtying)
Could process 6.0 Mpps SYN instead of 4.2 Mpps on my test server.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 9 Oct 2015 02:33:24 +0000 (19:33 -0700)]
tcp: shrink tcp_timewait_sock by 8 bytes
Reducing tcp_timewait_sock from 280 bytes to 272 bytes
allows SLAB to pack 15 objects per page instead of 14 (on x86)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 9 Oct 2015 02:33:23 +0000 (19:33 -0700)]
net: shrink struct sock and request_sock by 8 bytes
One 32bit hole is following skc_refcnt, use it.
skc_incoming_cpu can also be an union for request_sock rcv_wnd.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 9 Oct 2015 02:33:22 +0000 (19:33 -0700)]
net: align sk_refcnt on 128 bytes boundary
sk->sk_refcnt is dirtied for every TCP/UDP incoming packet.
This is a performance issue if multiple cpus hit a common socket,
or multiple sockets are chained due to SO_REUSEPORT.
By moving sk_refcnt 8 bytes further, first 128 bytes of sockets
are mostly read. As they contain the lookup keys, this has
a considerable performance impact, as cpus can cache them.
These 8 bytes are not wasted, we use them as a place holder
for various fields, depending on the socket type.
Tested:
SYN flood hitting a 16 RX queues NIC.
TCP listener using 16 sockets and SO_REUSEPORT
and SO_INCOMING_CPU for proper siloing.
Could process 6.0 Mpps SYN instead of 4.2 Mpps
Kernel profile looked like :
11.68% [kernel] [k] sha_transform
6.51% [kernel] [k] __inet_lookup_listener
5.07% [kernel] [k] __inet_lookup_established
4.15% [kernel] [k] memcpy_erms
3.46% [kernel] [k] ipt_do_table
2.74% [kernel] [k] fib_table_lookup
2.54% [kernel] [k] tcp_make_synack
2.34% [kernel] [k] tcp_conn_request
2.05% [kernel] [k] __netif_receive_skb_core
2.03% [kernel] [k] kmem_cache_alloc
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 9 Oct 2015 02:33:21 +0000 (19:33 -0700)]
net: SO_INCOMING_CPU setsockopt() support
SO_INCOMING_CPU as added in commit
2c8c56e15df3 was a getsockopt() command
to fetch incoming cpu handling a particular TCP flow after accept()
This commits adds setsockopt() support and extends SO_REUSEPORT selection
logic : If a TCP listener or UDP socket has this option set, a packet is
delivered to this socket only if CPU handling the packet matches the specified
one.
This allows to build very efficient TCP servers, using one listener per
RX queue, as the associated TCP listener should only accept flows handled
in softirq by the same cpu.
This provides optimal NUMA behavior and keep cpu caches hot.
Note that __inet_lookup_listener() still has to iterate over the list of
all listeners. Following patch puts sk_refcnt in a different cache line
to let this iteration hit only shared and read mostly cache lines.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Jee [Thu, 8 Oct 2015 21:56:49 +0000 (14:56 -0700)]
packet: support per-packet fwmark for af_packet sendmsg
Signed-off-by: Edward Hyunkoo Jee <edjee@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Jee [Thu, 8 Oct 2015 21:56:48 +0000 (14:56 -0700)]
sock: support per-packet fwmark
It's useful to allow users to set fwmark for an individual packet,
without changing the socket state. The function this patch adds in
sock layer can be used by the protocols that need such a feature.
Signed-off-by: Edward Hyunkoo Jee <edjee@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 13 Oct 2015 02:13:41 +0000 (19:13 -0700)]
Merge branch 'bpf-unprivileged'
Alexei Starovoitov says:
====================
bpf: unprivileged
v1-v2:
- this set logically depends on cb patch
"bpf: fix cb access in socket filter programs":
http://patchwork.ozlabs.org/patch/527391/
which is must have to allow unprivileged programs.
Thanks Daniel for finding that issue.
- refactored sysctl to be similar to 'modules_disabled'
- dropped bpf_trace_printk
- split tests into separate patch and added more tests
based on discussion
v1 cover letter:
I think it is time to liberate eBPF from CAP_SYS_ADMIN.
As was discussed when eBPF was first introduced two years ago
the only piece missing in eBPF verifier is 'pointer leak detection'
to make it available to non-root users.
Patch 1 adds this pointer analysis.
The eBPF programs, obviously, need to see and operate on kernel addresses,
but with these extra checks they won't be able to pass these addresses
to user space.
Patch 2 adds accounting of kernel memory used by programs and maps.
It changes behavoir for existing root users, but I think it needs
to be done consistently for both root and non-root, since today
programs and maps are only limited by number of open FDs (RLIMIT_NOFILE).
Patch 2 accounts program's and map's kernel memory as RLIMIT_MEMLOCK.
Unprivileged eBPF is only meaningful for 'socket filter'-like programs.
eBPF programs for tracing and TC classifiers/actions will stay root only.
In parallel the bpf fuzzing effort is ongoing and so far
we've found only one verifier bug and that was already fixed.
The 'constant blinding' pass also being worked on.
It will obfuscate constant-like values that are part of eBPF ISA
to make jit spraying attacks even harder.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov [Thu, 8 Oct 2015 05:23:23 +0000 (22:23 -0700)]
bpf: add unprivileged bpf tests
Add new tests samples/bpf/test_verifier:
unpriv: return pointer
checks that pointer cannot be returned from the eBPF program
unpriv: add const to pointer
unpriv: add pointer to pointer
unpriv: neg pointer
checks that pointer arithmetic is disallowed
unpriv: cmp pointer with const
unpriv: cmp pointer with pointer
checks that comparison of pointers is disallowed
Only one case allowed 'void *value = bpf_map_lookup_elem(..); if (value == 0) ...'
unpriv: check that printk is disallowed
since bpf_trace_printk is not available to unprivileged
unpriv: pass pointer to helper function
checks that pointers cannot be passed to functions that expect integers
If function expects a pointer the verifier allows only that type of pointer.
Like 1st argument of bpf_map_lookup_elem() must be pointer to map.
(applies to non-root as well)
unpriv: indirectly pass pointer on stack to helper function
checks that pointer stored into stack cannot be used as part of key
passed into bpf_map_lookup_elem()
unpriv: mangle pointer on stack 1
unpriv: mangle pointer on stack 2
checks that writing into stack slot that already contains a pointer
is disallowed
unpriv: read pointer from stack in small chunks
checks that < 8 byte read from stack slot that contains a pointer is
disallowed
unpriv: write pointer into ctx
checks that storing pointers into skb->fields is disallowed
unpriv: write pointer into map elem value
checks that storing pointers into element values is disallowed
For example:
int bpf_prog(struct __sk_buff *skb)
{
u32 key = 0;
u64 *value = bpf_map_lookup_elem(&map, &key);
if (value)
*value = (u64) skb;
}
will be rejected.
unpriv: partial copy of pointer
checks that doing 32-bit register mov from register containing
a pointer is disallowed
unpriv: pass pointer to tail_call
checks that passing pointer as an index into bpf_tail_call
is disallowed
unpriv: cmp map pointer with zero
checks that comparing map pointer with constant is disallowed
unpriv: write into frame pointer
checks that frame pointer is read-only (applies to root too)
unpriv: cmp of frame pointer
checks that R10 cannot be using in comparison
unpriv: cmp of stack pointer
checks that Rx = R10 - imm is ok, but comparing Rx is not
unpriv: obfuscate stack pointer
checks that Rx = R10 - imm is ok, but Rx -= imm is not
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov [Thu, 8 Oct 2015 05:23:22 +0000 (22:23 -0700)]
bpf: charge user for creation of BPF maps and programs
since eBPF programs and maps use kernel memory consider it 'locked' memory
from user accounting point of view and charge it against RLIMIT_MEMLOCK limit.
This limit is typically set to 64Kbytes by distros, so almost all
bpf+tracing programs would need to increase it, since they use maps,
but kernel charges maximum map size upfront.
For example the hash map of 1024 elements will be charged as 64Kbyte.
It's inconvenient for current users and changes current behavior for root,
but probably worth doing to be consistent root vs non-root.
Similar accounting logic is done by mmap of perf_event.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov [Thu, 8 Oct 2015 05:23:21 +0000 (22:23 -0700)]
bpf: enable non-root eBPF programs
In order to let unprivileged users load and execute eBPF programs
teach verifier to prevent pointer leaks.
Verifier will prevent
- any arithmetic on pointers
(except R10+Imm which is used to compute stack addresses)
- comparison of pointers
(except if (map_value_ptr == 0) ... )
- passing pointers to helper functions
- indirectly passing pointers in stack to helper functions
- returning pointer from bpf program
- storing pointers into ctx or maps
Spill/fill of pointers into stack is allowed, but mangling
of pointers stored in the stack or reading them byte by byte is not.
Within bpf programs the pointers do exist, since programs need to
be able to access maps, pass skb pointer to LD_ABS insns, etc
but programs cannot pass such pointer values to the outside
or obfuscate them.
Only allow BPF_PROG_TYPE_SOCKET_FILTER unprivileged programs,
so that socket filters (tcpdump), af_packet (quic acceleration)
and future kcm can use it.
tracing and tc cls/act program types still require root permissions,
since tracing actually needs to be able to see all kernel pointers
and tc is for root only.
For example, the following unprivileged socket filter program is allowed:
int bpf_prog1(struct __sk_buff *skb)
{
u32 index = load_byte(skb, ETH_HLEN + offsetof(struct iphdr, protocol));
u64 *value = bpf_map_lookup_elem(&my_map, &index);
if (value)
*value += skb->len;
return 0;
}
but the following program is not:
int bpf_prog1(struct __sk_buff *skb)
{
u32 index = load_byte(skb, ETH_HLEN + offsetof(struct iphdr, protocol));
u64 *value = bpf_map_lookup_elem(&my_map, &index);
if (value)
*value += (u64) skb;
return 0;
}
since it would leak the kernel address into the map.
Unprivileged socket filter bpf programs have access to the
following helper functions:
- map lookup/update/delete (but they cannot store kernel pointers into them)
- get_random (it's already exposed to unprivileged user space)
- get_smp_processor_id
- tail_call into another socket filter program
- ktime_get_ns
The feature is controlled by sysctl kernel.unprivileged_bpf_disabled.
This toggle defaults to off (0), but can be set true (1). Once true,
bpf programs and maps cannot be accessed from unprivileged process,
and the toggle cannot be set back to false.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Fri, 9 Oct 2015 12:53:54 +0000 (14:53 +0200)]
net: HNS: fix MDIO dependencies
The newly introduced HNS_MDIO Kconfig symbol selects 'MDIO', but
that is the wrong symbol as the code used by this driver is
provided by PHYLIB rather than the MDIO driver. Also, there is
no need to make this driver user selectable, because it is already
selected by all drivers that need it.
This changes the Kconfig file to select the correct library, and
to make the option silent.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes:
5b904d39406 ("net: add Hisilicon Network Subsystem MDIO support")
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Pieczko [Fri, 9 Oct 2015 09:40:35 +0000 (10:40 +0100)]
sfc: fully reset if MC_REBOOT event received without warm_boot_count increment
On EF10, MC_CMD_VPORT_RECONFIGURE can cause a CODE_MC_REBOOT event
to be sent to a function without incrementing the (adapter-wide)
warm_boot_count. In this case, the reboot is not detected by the
loop on efx_mcdi_poll_reboot(), so prepare for recovery from an MC
reboot anyway. When this codepath is run, the MC has always just
rebooted, so this recovery is valid.
The loop on efx_mcdi_poll_reboot() is still required for other MC
reboot cases, so that actions in response to an MC reboot are
performed, such as clearing locally calculated statistics.
Siena NICs are unaffected by this change as the above scenario
does not apply.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 12 Oct 2015 12:20:28 +0000 (05:20 -0700)]
Merge branch 'switchdev_ageing_time'
Scott Feldman says:
====================
switchdev: push bridge ageing_time attribute down
Push bridge-level attributes down to switchdev drivers. This patchset
adds the infrastructure and then pushes, as an example, ageing_time attribute
down from bridge to switchdev (rocker) driver. Add some range-checking
for ageing_time.
RTNETLINK answers: Numerical result out of range
Up until now, switchdev attrs where port-level attrs, so the netdev used in
switchdev_attr_set() would be a switch port or bond of switch ports. With
bridge-level attrs, the netdev passed to switchdev_attr_set() is the bridge
netdev. The same recusive algo is used to visit the leaves of the stacked
drivers to set the attr, it's just in this case we start one layer higher in
the stack. One note is not all ports in the bridge may support setting a
bridge-level attribute, so rather than failing the entire set, we'll skip over
those ports returning -EOPNOTSUPP.
v2->v3: Per Jiri review: push only ageing_time attr down at this time, and
don't pass raw bridge IFLA_BR_* values; rather use new switchdev attr ID for
ageing_time.
v1->v2: rebase w/ net-next
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Scott Feldman [Fri, 9 Oct 2015 02:23:20 +0000 (19:23 -0700)]
rocker: handle setting bridge ageing_time
The FDB cleanup timer will get rescheduled to re-evaluate FDB entries
based on new ageing_time.
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Scott Feldman [Fri, 9 Oct 2015 02:23:19 +0000 (19:23 -0700)]
bridge: push bridge setting ageing_time down to switchdev
Use SWITCHDEV_F_SKIP_EOPNOTSUPP to skip over ports in bridge that don't
support setting ageing_time (or setting bridge attrs in general).
If push fails, don't update ageing_time in bridge and return err to user.
If push succeeds, update ageing_time in bridge and run gc_timer now to
recalabrate when to run gc_timer next, based on new ageing_time.
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Scott Feldman [Fri, 9 Oct 2015 02:23:18 +0000 (19:23 -0700)]
switchdev: skip over ports returning -EOPNOTSUPP when recursing ports
This allows us to recurse over all the ports, skipping over unsupporting
ports. Without the change, the recursion would stop at first unsupported
port.
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Scott Feldman [Fri, 9 Oct 2015 02:23:17 +0000 (19:23 -0700)]
switchdev: add bridge ageing_time attribute
Setting the stage to push bridge-level attributes down to port driver so
hardware can be programmed accordingly. Bridge-level attribute example is
ageing_time. This is a per-bridge attribute, not a per-bridge-port attr.
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Sailer [Fri, 9 Oct 2015 00:41:37 +0000 (02:41 +0200)]
tcp: change type of alive from int to bool
The alive parameter of tcp_orphan_retries, indicates
whether the connection is assumed alive or not.
In the function and all places calling it is used as a boolean value.
Therefore this changes the type of alive to bool in the function
definition and all calling locations.
Since tcp_orphan_tries is a tcp_timer.c local function no change in
any other file or header is necessary.
Signed-off-by: Richard Sailer <richard@weltraumpflege.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roopa Prabhu [Thu, 8 Oct 2015 17:38:52 +0000 (10:38 -0700)]
bridge: allow adding of fdb entries pointing to the bridge device
This patch enables adding of fdb entries pointing to the bridge device.
This can be used to propagate mac address of vlan interfaces
configured on top of the vlan filtering bridge.
Before:
$bridge fdb add 44:38:39:00:27:9f dev bridge
RTNETLINK answers: Invalid argument
After:
$bridge fdb add 44:38:39:00:27:9f dev bridge
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 8 Oct 2015 18:16:48 +0000 (11:16 -0700)]
tcp: fix RFS vs lockless listeners
Before recent TCP listener patches, we were updating listener
sk->sk_rxhash before the cloning of master socket.
children sk_rxhash was therefore correct after the normal 3WHS.
But with lockless listener, we no longer dirty/change listener sk_rxhash
as it would be racy.
We need to correctly update the child sk_rxhash, otherwise first data
packet wont hit correct cpu if RFS is used.
Fixes:
079096f103fa ("tcp/dccp: install syn_recv requests into ehash table")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Willem de Bruijn <willemb@google.com>
Cc: Tom Herbert <tom@herbertland.com>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 11 Oct 2015 12:28:57 +0000 (05:28 -0700)]
Merge branch 'dsa-next'
Vivien Didelot says:
====================
net: dsa: push switchdev prepare phase in FDB ops
This patchset pushes the switchdev prepare phase for the FDB add and del
operations down to the DSA drivers. Currently only mv88e6xxx is affected.
Since the dump requires a bit of refactoring in the driver, it'll come in a
future patchset.
Changes in v2:
* forward declare switchdev structs instead of fixing the dsa.h include.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 8 Oct 2015 15:35:14 +0000 (11:35 -0400)]
net: dsa: use switchdev obj in port_fdb_del
For consistency with the FDB add operation, propagate the
switchdev_obj_port_fdb structure in the DSA drivers.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 8 Oct 2015 15:35:13 +0000 (11:35 -0400)]
net: dsa: push prepare phase in port_fdb_add
Now that the prepare phase is pushed down to the DSA drivers, propagate
it to the port_fdb_add function.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 8 Oct 2015 15:35:12 +0000 (11:35 -0400)]
net: dsa: add port_fdb_prepare
Push the prepare phase for FDB operations down to the DSA drivers, with
a new port_fdb_prepare function. Currently only mv88e6xxx is affected.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Ringle [Thu, 8 Oct 2015 15:10:19 +0000 (11:10 -0400)]
net: encx24j600: Fix typos in Kconfig
Signed-off-by: Jon Ringle <jringle@gridpoint.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 11 Oct 2015 12:15:30 +0000 (05:15 -0700)]
Merge branch 'for-upstream' of git://git./linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:
====================
pull request: bluetooth-next 2015-10-08
Here's another set of Bluetooth & 802.15.4 patches for the 4.4 kernel.
802.15.4:
- Many improvements & fixes to the mrf24j40 driver
- Fixes and cleanups to nl802154, mac802154 & ieee802154 code
Bluetooth:
- New chipset support in btmrvl driver
- Fixes & cleanups to btbcm, btmrvl, bpa10x & btintel drivers
- Support for vendor specific diagnostic data through common API
- Cleanups to the 6lowpan code
- New events & message types for monitor channel
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
wangweidong [Thu, 8 Oct 2015 10:03:47 +0000 (18:03 +0800)]
BNX2: fix a Null Pointer for stats_blk
we have two processes to do:
P1#: ifconfig eth0 down; which will call bnx2_close, then will
, and set Null to stats_blk
P2#: ifconfig eth0; which will call bnx2_get_stats64, it will
use stats_blk.
In one case:
--P1#-- --P2#--
stats_blk(no null)
bnx2_free_mem
->bp->stats_blk = NULL
GET_64BIT_NET_STATS
then it will cause 'NULL Pointer' Problem.
it is as well with 'ethtool -S ethx'.
Allocate the statistics block at probe time so that this problem is
impossible
Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 8 Oct 2015 12:01:55 +0000 (05:01 -0700)]
net: synack packets can be attached to request sockets
selinux needs few changes to accommodate fact that SYNACK messages
can be attached to a request socket, lacking sk_security pointer
(Only syncookies are still attached to a TCP_LISTEN socket)
Adds a new sk_listener() helper, and use it in selinux and sch_fq
Fixes:
ca6fb0651883 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported by: kernel test robot <ying.huang@linux.intel.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Eric Paris <eparis@parisplace.org>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Shenai [Thu, 8 Oct 2015 04:38:23 +0000 (10:08 +0530)]
cxgb4: Enhance driver to update FW, when FW is too old
t4_check_fw_version() can return several error codes (-EINVAL, -EBUSY,
-EAGAIN). The present code sets the adapter state to UNINIT only if its
an EFAULT. In all the error cases set the adapter to uninitialized state.
In t4_check_fw_version() if call to t4_get_fw_version() fails, repeat the
operation a few times before returning failure.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov [Wed, 7 Oct 2015 17:55:41 +0000 (10:55 -0700)]
bpf: fix cb access in socket filter programs
eBPF socket filter programs may see junk in 'u32 cb[5]' area,
since it could have been used by protocol layers earlier.
For socket filter programs used in af_packet we need to clean
20 bytes of skb->cb area if it could be used by the program.
For programs attached to TCP/UDP sockets we need to save/restore
these 20 bytes, since it's used by protocol layers.
Remove SK_RUN_FILTER macro, since it's no longer used.
Long term we may move this bpf cb area to per-cpu scratch, but that
requires addition of new 'per-cpu load/store' instructions,
so not suitable as a short term fix.
Fixes:
d691f9e8d440 ("bpf: allow programs to write to certain skb fields")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 9 Oct 2015 14:52:35 +0000 (07:52 -0700)]
Merge branch 'net-non-modular'
Paul Gortmaker says:
====================
make non-modular code explicitly non-modular
[v2: drop m68k patches that Geert converted to modules; add one ARM
driver patch ; update net-next baseline to today; switch to ARM
for build testing.]
In a previous merge window, we made changes to allow better
delineation between modular and non-modular code in commit
0fd972a7d91d6e15393c449492a04d94c0b89351 ("module: relocate module_init
from init.h to module.h"). This allows us to now ensure module code
looks modular and non-modular code does not accidentally look modular
just to avoid suffering build breakage.
Here we target code that is, by nature of their Makefile and/or
Kconfig settings, only available to be built-in, but implicitly
presenting itself as being possibly modular by way of using modular
headers, macros, and functions.
The goal here is to remove that illusion of modularity from these
files, but in a way that leaves the actual runtime unchanged.
In doing so, we remove code that has never been tested and adds
no value to the tree. And we continue the process of expecting a
level of consistency between the Kconfig/Makefile of code and the
code in use itself.
Fortuntately the net subsystem has relatively few instances, given
the overall amount of code and drivers it contains. For comparison
there are over 300 instances tree wide, resulting in a possible net
removal of on the order of 5000 lines of unused code.
Build tested on net-next from today, on ARM, since that is the arch
where the one ethernet driver changed here is available.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Gortmaker [Wed, 7 Oct 2015 21:27:46 +0000 (17:27 -0400)]
drivers/net/ethernet: make ti/cpsw-phy-sel.c explicitly non-modular
The Kconfig currently controlling compilation of this code is:
drivers/net/ethernet/ti/Kconfig:config TI_CPSW_PHY_SEL
drivers/net/ethernet/ti/Kconfig: bool "TI CPSW Switch Phy sel Support"
...meaning that it currently is not being built as a module by anyone.
Lets remove the couple traces of modularity so that when reading the
driver there is no doubt it is builtin-only.
Since module_platform_driver() uses the same init level priority as
builtin_platform_driver() the init ordering remains unchanged with
this commit.
Also note that MODULE_DEVICE_TABLE is a no-op for non-modular code.
We also delete the MODULE_LICENSE tag etc. since all that information
was (or is now) contained at the top of the file in the comments.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Varka Bhadram <varkabhadram@gmail.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Gortmaker [Wed, 7 Oct 2015 21:27:45 +0000 (17:27 -0400)]
net/sched: make sch_blackhole.c explicitly non-modular
The Kconfig currently controlling compilation of this code is:
net/sched/Kconfig:menuconfig NET_SCHED
net/sched/Kconfig: bool "QoS and/or fair queueing"
...meaning that it currently is not being built as a module by anyone.
Lets remove the modular code that is essentially orphaned, so that
when reading the driver there is no doubt it is builtin-only.
Since module_init translates to device_initcall in the non-modular
case, the init ordering remains unchanged with this commit. We can
change to one of the other priority initcalls (subsys?) at any later
date, if desired.
We also delete the MODULE_LICENSE tag since all that information
is already contained at the top of the file in the comments.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Gortmaker [Wed, 7 Oct 2015 21:27:44 +0000 (17:27 -0400)]
net/dcb: make dcbnl.c explicitly non-modular
The Kconfig currently controlling compilation of this code is:
net/dcb/Kconfig:config DCB
net/dcb/Kconfig: bool "Data Center Bridging support"
...meaning that it currently is not being built as a module by anyone.
Lets remove the modular code that is essentially orphaned, so that
when reading the driver there is no doubt it is builtin-only.
Since module_init translates to device_initcall in the non-modular
case, the init ordering remains unchanged with this commit. We can
change to one of the other priority initcalls (subsys?) at any later
date, if desired.
We also delete the MODULE_LICENSE tag etc. since all that information
is (or is now) already contained at the top of the file in the comments.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Anish Bhatt <anish@chelsio.com>
Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: Shani Michaeli <shanim@mellanox.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Gortmaker [Wed, 7 Oct 2015 21:27:43 +0000 (17:27 -0400)]
net/core: make sock_diag.c explicitly non-modular
The Makefile currently controlling compilation of this code lists
it under "obj-y" ...meaning that it currently is not being built as
a module by anyone.
Lets remove the modular code that is essentially orphaned, so that
when reading the driver there is no doubt it is builtin-only.
Since module_init translates to device_initcall in the non-modular
case, the init ordering remains unchanged with this commit. We can
change to one of the other priority initcalls (subsys?) at any later
date, if desired.
We can't remove module.h since the file uses other module related
stuff even though it is not modular itself.
We move the information from the MODULE_LICENSE tag to the top of the
file, since that information is not captured anywhere else. The
MODULE_ALIAS_NET_PF_PROTO becomes a no-op in the non modular case, so
it is removed.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Craig Gallek <kraig@google.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 9 Oct 2015 14:49:08 +0000 (07:49 -0700)]
Merge branch 'net-bool'
Yaowei Bai says:
====================
net: small improvement
This patchset makes several functions in net return bool to improve
readability and/or simplicity because these functions only use one
or zero as their return value.
No functional changes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Yaowei Bai [Thu, 8 Oct 2015 13:29:02 +0000 (21:29 +0800)]
net/core: lockdep_rtnl_is_held can be boolean
This patch makes lockdep_rtnl_is_held return bool due to this
particular function only using either one or zero as its return
value.
In another patch lockdep_is_held is also made return bool.
No functional change.
Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yaowei Bai [Thu, 8 Oct 2015 13:29:01 +0000 (21:29 +0800)]
net/inetdevice: bad_mask can be boolean
This patch makes bad_mask return bool due to this particular function
only using either one or zero as its return value.
No functional change.
Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yaowei Bai [Thu, 8 Oct 2015 13:29:00 +0000 (21:29 +0800)]
net/inetdevice: inet_ifa_match can be boolean
This patch makes inet_ifa_match return bool due to this
particular function only using either one or zero as its return
value.
No functional change.
Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yaowei Bai [Thu, 8 Oct 2015 13:28:59 +0000 (21:28 +0800)]
net/dccp: dccp_bad_service_code can be boolean
This patch makes dccp_bad_service_code return bool due to these
particular functions only using either one or zero as their return
value.
dccp_list_has_service is also been made return bool in this patchset.
No functional change.
Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yaowei Bai [Thu, 8 Oct 2015 13:28:58 +0000 (21:28 +0800)]
net/dccp: dccp_list_has_service can be boolean
This patch makes dccp_list_has_service return bool due to this
particular function only using either one or zero as its return
value.
No functional change.
Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yaowei Bai [Thu, 8 Oct 2015 13:28:57 +0000 (21:28 +0800)]
net/can: can_dropped_invalid_skb can be boolean
This patch makes can_dropped_invalid_skb return bool due to this
particular function only using either one or zero as its return
value.
No functional change.
Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yaowei Bai [Thu, 8 Oct 2015 13:28:56 +0000 (21:28 +0800)]
net/nfnetlink: lockdep_nfnl_is_held can be boolean
This patch makes lockdep_nfnl_is_held return bool to improve
readability due to this particular function only using either
one or zero as its return value.
No functional change.
Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yaowei Bai [Thu, 8 Oct 2015 13:28:55 +0000 (21:28 +0800)]
net/ieee80211: ieee80211_is_* can be boolean
This patch makes ieee80211_is_* return bool to improve
readability due to these particular functions only using either
one or zero as their return value.
No functional change.
Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yaowei Bai [Thu, 8 Oct 2015 13:28:54 +0000 (21:28 +0800)]
net/netlink: lockdep_genl_is_held can be boolean
This patch makes lockdep_genl_is_held return bool to improve
readability due to this particular function only using either
one or zero as its return value.
No functional change.
Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 9 Oct 2015 14:28:38 +0000 (07:28 -0700)]
Merge branch 'mlx-next'
Or Gerlitz says:
==============================
Mellanox driver update for net-next
Some small fixes and small enhancements from the team.
Series applies over net-next commit
acb4a6b "tcp: ensure prior synack rtx behavior
with small backlogs".
==============================
Signed-off-by: David S. Miller <davem@davemloft.net>
Saeed Mahameed [Thu, 8 Oct 2015 14:14:03 +0000 (17:14 +0300)]
net/mlx4_core: Fix resource tracker error flow in add_res_range
The 'for' loop when undoing rb-tree insertions and list-adds in the
error flow in add_res_range had errors, fix them.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jack Morgenstein [Thu, 8 Oct 2015 14:14:02 +0000 (17:14 +0300)]
net/mlx4_core: Fix mailbox leak in error flow when performing update qp
The procedure mlx4_update_qp leaks mailboxes in its error-flow, fix that.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Shamay [Thu, 8 Oct 2015 14:14:01 +0000 (17:14 +0300)]
net/mlx4_en: Add steering rules after RSS creation
Changed the receive control flow in a way that steering
rules are added only when the RSS object is already in RTR/RTS mode.
Some optimization features, which are enabled by the device firmware,
require this condition in order to be effective.
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eli Cohen [Thu, 8 Oct 2015 14:14:00 +0000 (17:14 +0300)]
net/mlx5_core: Use private health thread for each device
Use a single threaded work queue for each device in the system instead of
using one thread for any device. This is required so we can concurrently
process system error handling for all the devices that need that.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eli Cohen [Thu, 8 Oct 2015 14:13:59 +0000 (17:13 +0300)]
net/mlx5_core: Use accessor functions to read from device memory
Use ioread function to read health buffer data. In addition, print the
firmware version as a string for readability and also use dev_err to have
the device string to be printed.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eli Cohen [Thu, 8 Oct 2015 14:13:58 +0000 (17:13 +0300)]
net/mlx5_core: Prepare cmd interface to system errors handling
In preparation to handling system errors at the mlx5_core level, change the
interface of cmd_work_handler to accept a 64 bit argument for the vector.
This allows to encode a flag that signifies when the handler is called
as a result of a driver logic that wishes to terminate commands that
the hardware may not be able to terminate. Such command completions
are detected at the handler and proper return status is encoded.
To be able to terminate page handler commands, we make sure to set
the corresponding bit in the bitmask.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eli Cohen [Thu, 8 Oct 2015 14:13:57 +0000 (17:13 +0300)]
net/mlx5_core: Improve mlx5 messages
Improve the messages printed by the mlx5 macros to include the device
string. In addition, prefix names used by the macros with two underscores
to avoid possible name collisions.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 9 Oct 2015 14:05:49 +0000 (07:05 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2015-10-08
This series contains updates to i40e and i40evf only (again).
Jesse fixes an issue where the driver was issuing a WARN_ON during ring
size changes because the code was cloning the rx_ring struct but not
zeroing out the pointers before allocating new memory, so simply zero
out the pointers. Also reduced the function call overhead by moving
the interrupt enable function by moving it to the header file, which it
in turn allows us to inline it. Also does a thorough job of code
cleanup to fix spaces after declarations, remove unnecessary braces
and breaks, remove another __func__ use and general code tidiness.
Mitch adds mover verbose error messages when the number of supported VFs
is reported in driver init and it different from the number reported in
config space. Updated the VF driver to now detect a reset with the
VF_ARQLEN register since the enable bit is cleared when the VF is reset
and it stays cleared until the VF driver processes the reset and
re-enables the admin queue which is more reliable than using the
VFGEN_RSTAT as previously.
Neerav adds parsing for CEE DCBx TLVs from the LLDP MIB since there is
a need to get the CEE DesiredCfg Tx by firmware and DCB configuration
Rx from peer for debug and other application purposes.
Carolyn fixes a problem where the PF's Flow Director filter table would
have an entry that the hardware was unable to add, when this occurs an
invalid entry gets replayed and a valid one is lost.
Matt fixes an issue where multiple link up messages can be logged
resulting from admin queue link status timing when link properties are
changed.
Shannon adds the ability to control the period link polling through
ethtool to be able to switch it off and on for debugging link issues.
Serey explicitly assigns the enum index for each VSI type so that the PF
and VF always reference to the same VSI type event if the enum lists
are different.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesse Brandeburg [Fri, 28 Aug 2015 21:55:59 +0000 (17:55 -0400)]
i40e: print neato new features
To help users and developers know what compile options
and hardware features are enabled at compile time, print
VxLAN is available.
Change-ID: I3162f3b7678dc725a597f964217920eb218b480b
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Fri, 28 Aug 2015 21:55:58 +0000 (17:55 -0400)]
i40e/i40evf: pass QOS handle to VF
The VF really doesn't care about the QOS handle but it will in the
future. Since the VF only uses TC0, send it that handle. On the VF
side, save the handle and use it to populate the QOS params when we call
into the client interface.
Change-ID: I76f41b070baeaa09b19383e9168bc677837e0761
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Fri, 28 Aug 2015 21:55:57 +0000 (17:55 -0400)]
i40evf: use capabilities flags properly
Use the capabilities passed to us by the PF driver to control VF driver
behavior. In the process, clean up the VLAN add/remove code so it's not
a horrible morass of ifdefs.
Change-ID: I1050eaf12b658a26fea6813047c9964163c70a73
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jesse Brandeburg [Fri, 28 Aug 2015 21:55:56 +0000 (17:55 -0400)]
i40e: refactor code to remove indent
I found a code indent that was avoidable because a whole function is inside
an if block, reverse the if and move the code back a tab.
Change-ID: I9989c8750ee61678fbf96a3b0fd7bf7cc7ef300a
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jesse Brandeburg [Fri, 28 Aug 2015 21:55:54 +0000 (17:55 -0400)]
i40e/i40evf: clean up some code
Add missings spaces after declarations, remove another __func__ use,
remove uncessary braces, remove unneeded breaks, and useless returns,
and generally fix up some code.
Change-ID: Ie715d6b64976c50e1c21531685fe0a2bd38c4244
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Fri, 28 Aug 2015 21:55:53 +0000 (17:55 -0400)]
i40evf: detect reset more reliably
Using VFGEN_RSTAT to detect a VF reset is an endeavor that is fraught
with peril. It's entirely too easy to miss a reset because none of the
bits are sticky. By the time the VF driver reads the register, the reset
may have been processed and cleaned up by the PF driver, leaving the
register in the same state that it was before the reset.
Instead, detect a reset with the VF_ARQLEN register. When the VF is
reset, the enable bit in this register is cleared, and it stays cleared
until the VF driver processes the reset and re-enables the admin queue.
Because we now deal with multiple registers in the reset and watchdog
tasks, rename the rstat_val variable to reg_val.
Change-ID: Id1df17045c0992e607da0162d31807f7fc20d199
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Greg Bowers [Fri, 28 Aug 2015 21:55:52 +0000 (17:55 -0400)]
i40e: Support FW CEE DCB UP to TC map nibble swap
Changes parsing of AQ command Get CEE DCBX OPER CFG (0x0A07). Change is
required because FW creates the oper_prio_tc nibbles reversed from those
in the CEE Priority Group sub-TLV.
Change-ID: I7d9d8641bb430d30e286fc3fac909866ef8a0de8
Signed-off-by: Greg Bowers <gregory.j.bowers@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Serey Kong [Thu, 27 Aug 2015 15:42:41 +0000 (11:42 -0400)]
i40e/i40evf: Explicitly assign enum index for VSI type
Ran into an issue where PF's VSI type list was different from VF's,
which was resulted in different enum index. The VSI type list can
be different depending on what build flag is used for PF and VF.
The change is to explicitly assign enum index for each VSI type
so that PF and VF always reference to the same VSI type event if the
enum lists are different.
Change-ID: I8c0e5fdb515f324f7964df863a458073cf467e57
Signed-off-by: Serey Kong <serey.kong@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Shannon Nelson [Thu, 27 Aug 2015 15:42:40 +0000 (11:42 -0400)]
i40e: add switch for link polling
There's been some need for controlling the periodic link polling for
debugging link issues. This patch enables switching it off and on
through an ethtool private flag. The link poll remains on by default,
but can be turned off with
ethtool --set-priv-flags p261p1 LinkPolling off
and later turned back on with
ethtool --set-priv-flags p261p1 LinkPolling on
To check the current status, use
ethtool --show-priv-flags p261p1
Change-ID: I32e4ab654ff3eec90a06cf144899971b82d71c40
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Matt Jared [Thu, 27 Aug 2015 15:42:39 +0000 (11:42 -0400)]
i40e: Fix multiple link up messages
This patch addresses an issue where multiple link up messages can be logged
resulting from aq link status timing when link properties are changed (fc,
speed, etc.); solved by using a single function to handle status printing
and adding a mechanism to track whether link state (up or down) has
actually changed.
Change-ID: Ied6ed6e49dc397c77d992adc0bc9ed3767152b9d
Signed-off-by: Matt Jared <matthew.a.jared@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Carolyn Wyborny [Thu, 27 Aug 2015 15:42:38 +0000 (11:42 -0400)]
i40e: Fix for extra Flow Director filter in table after error
This patch fixes a problem where the PF's fdir filter table would have an
entry that the hw was unable to add. This notification happens in the hot
path, so instead of trying to fix it then, we note the location in the
failure case and delete it during regular fdir subtask callback. Without
this patch, a case can occur where an invalid entry gets replayed and a
valid one is not.
Change-ID: I67831c183b5d0309876de807cc434809b74c9cb7
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Neerav Parikh [Thu, 27 Aug 2015 15:42:37 +0000 (11:42 -0400)]
i40e/i40evf: Store CEE DCBX DesiredCfg and RemoteCfg
This patch adds capability to query and store the CEE DCBX DesiredCfg
and RemoteCfg data from the LLDP MIB.
Added new member "desired_dcbx_config" in the i40e_hw data structure
to hold CEE only DesiredCfg data.
Change-ID: I19c550369594384eaff4cc63e690ca740231195d
Signed-off-by: Neerav Parikh <neerav.parikh@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>