Michael J. Ruhl [Fri, 26 May 2017 12:35:25 +0000 (05:35 -0700)]
IB/hfi1: Name function prototype parameters for affinity module
To improve the readability of function prototypes, give the parameters
names in the affinity module.
Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sebastian Sanchez [Fri, 26 May 2017 12:35:18 +0000 (05:35 -0700)]
IB/hfi1: Optimize cachelines for user SDMA request structure
The current user SDMA request structure layout has holes.
The cachelines can be reduced to improve cacheline trading.
Separate fields in the following categories: mostly read,
writable and shared with interrupt.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sebastian Sanchez [Fri, 26 May 2017 12:35:12 +0000 (05:35 -0700)]
IB/hfi1: Don't remove RB entry when not needed.
An RB tree is used for the SDMA pinning cache. Cache
entries are extracted and reinserted from the tree
in case the address range for it changes. However,
if the address range for the entry doesn't change,
deleting the entry from the RB tree is not necessary.
This affects performance since the tree needs to be
rebalanced for each insertion, and this happens in
the hot path. Optimize RB search by not removing
entries when it's not needed.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mike Marciniszyn [Fri, 12 May 2017 16:20:31 +0000 (09:20 -0700)]
IB/rdmavt: Compress adjacent SGEs in rvt_lkey_ok()
SGEs that are contiguous needlessly consume driver dependent TX resources.
The lkey validation logic is enhanced to compress the SGE that ends
up in the send wqe when consecutive addresses are detected.
The lkey validation API used to return 1 (success) or 0 (fail).
The return value is now an -errno, 0 (compressed), or 1 (uncompressed). A
additional argument is added to pass the last SQE for the compression.
Loopback callers always pass a NULL to last_sge since the optimization is
of little benefit in that situation.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Brian Welty <brian.welty@intel.com>
Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Don Hiatt [Fri, 12 May 2017 16:20:20 +0000 (09:20 -0700)]
IB/hfi1: Setup common IB fields in hfi1_packet struct
We move many common IB fields into the hfi1_packet structure and
set them up in a single function. This allows us to set the fields
in a single place and not deal with them throughout the driver.
Reviewed-by: Brian Welty <brian.welty@intel.com>
Reviewed-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Don Hiatt [Fri, 12 May 2017 16:20:08 +0000 (09:20 -0700)]
IB/hfi1: Separate input/output header tracing
Calls to trace incoming packets will now receive the packet
context as parameter. This enables trace support for future
packet types.
Header trace output is in the format <field>:<value>
which makes parsing easier.
input_ibhdr trace before change:
<idle>-0 [001] d.h. 5904.250925: input_ibhdr: [0000:05:00.0] vl 0
lver 0 sl 0 lnh 2,LRH_BTH dlid 0002 len 18 slid 0001 op
0x64,UD_SEND_ONLY se 0 m 0 pad 0 tver 0 pkey 0xffff f 0 b 0 qpn 0x000001
a 0 psn 0x000001b2 deth qkey 0x80010000 sqpn 0x000001
input_ibhdr trace after change:
<idle>-0 [001] d.h. 6655.714488: input_ibhdr: [0000:05:00.0] (IB)
len:124 sc:0 dlid:0x0001 slid:0x0002 lnh:2,LRH_BTH lver:0 sl:0 age:0
becn:0 fecn:0 l4:0 rc:0 entropy:0 op:0x64,UD_SEND_ONLY se:0 m:0 pad:0
tver:0 pkey:0x7fff f:0 b:0 qpn:0x000001 a:0 psn:0x00000036 hlen:8 deth
qkey:0x80010000 sqpn:0x000001
Reviewed-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Don Hiatt [Fri, 12 May 2017 16:19:55 +0000 (09:19 -0700)]
IB/hfi1: Add functions to parse BTH/IB headers
Improve code readablity by adding inline functions
to read specific BTH/IB fields without knowledge of
byte offsets.
Reviewed-by: Brian Welty <brian.welty@intel.com>
Reviewed-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Ira Weiny [Fri, 12 May 2017 16:19:47 +0000 (09:19 -0700)]
IB/hfi1: Remove unused mk_qpn function
Leftover function that is not used. Remove it.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sebastian Sanchez [Fri, 12 May 2017 16:19:36 +0000 (09:19 -0700)]
IB/hfi1: Remove unnecessary initialization from tx request
The tx request is unnecessarily initialized in the hot
code path with memset(), however, there's no need to do
this as most fields are initialized later on. this
initialization shows to be costly in the profile.
Remove unnecessary initialization from tx request and make
sure all variables are initialized properly.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Doug Ledford [Tue, 27 Jun 2017 20:55:59 +0000 (16:55 -0400)]
Merge branch 'k.o/for-4.12-rc' into k.o/for-4.13-mlx-shared
Saeed Mahameed [Thu, 15 Jun 2017 11:35:32 +0000 (14:35 +0300)]
net/mlx4_en: Optimized single ring steering
Avoid touching RX QP RSS context when loading with only
one RX ring, to allow optimized A0 RX steering.
Enable by:
- loading mlx4_core with module param: log_num_mgm_entry_size = -6.
- then: ethtool -L <interface> rx 1
Performance tests:
Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
XDP_DROP packet rate:
-------------------------------------
| Before | After | Gain |
IPv4 | 20.5 Mpps | 28.1 Mpps | 37% |
IPv6 | 18.4 Mpps | 28.1 Mpps | 53% |
-------------------------------------
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Cc: kernel-team@fb.com
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tariq Toukan [Thu, 15 Jun 2017 11:35:31 +0000 (14:35 +0300)]
net/mlx4_en: Remove unused argument in TX datapath function
Remove owner argument, as it is obsolete and unused.
This also saves the overhead of calculating its value in data-path.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: kernel-team@fb.com
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gustavo A. R. Silva [Thu, 15 Jun 2017 19:56:21 +0000 (14:56 -0500)]
atm: solos-pci: remove useless variable assignments
Value assigned to variable _data32_ at lines 1254 and 1257 is
overwritten at line 1260 before it can be used. This makes
such variable assignments useless.
Addresses-Coverity-ID:
1227049
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 15 Jun 2017 19:06:54 +0000 (15:06 -0400)]
net: dsa: assign default CPU port to all ports
The current code only assigns the default cpu_dp to all user ports of
the switch to which the CPU port belongs. The user ports of the other
switches of the fabric thus don't have a default CPU port.
This patch fixes this by assigning the cpu_dp of all user ports of all
switches of the fabric when the tree is fully parsed.
Fixes:
a29342e73911 ("net: dsa: Associate slave network device with CPU port")
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 15 Jun 2017 18:31:56 +0000 (14:31 -0400)]
Merge branch 'r8152-support-new-chips'
Hayes Wang says:
====================
r8152: support new chips
These patches are used to support new chips.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
hayeswang [Thu, 15 Jun 2017 06:44:04 +0000 (14:44 +0800)]
r8152: add byte_enable for ocp_read_word function
Add byte_enable for ocp_read_word() to replace reading 4
bytes data with reading the desired 2 bytes data.
This is used to avoid the issue which is described in
commit
b4d99def0938 ("r8152: remove sram_read"). The
original method always reads 4 bytes data, and it may
have problem when reading the PHY registers.
The new method is supported since RTL8153B, but it
doesn't influence the previous chips. The bits of the
byte_enable for the previous chips are the reserved
bits, and the hw would ignore them.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
hayeswang [Thu, 15 Jun 2017 06:44:03 +0000 (14:44 +0800)]
r8152: support RTL8153B
This patch supports two new chips for RTL8153B.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
hayeswang [Thu, 15 Jun 2017 06:44:02 +0000 (14:44 +0800)]
r8152: support new chip 8050
The settings of the new chip are the same with RTL8152, except that
its product ID is 0x8050.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 15 Jun 2017 18:29:01 +0000 (14:29 -0400)]
Merge branch 'ibmvnic-LPM-bug-fixes'
Thomas Falcon says:
====================
ibmvnic: LPM bug fixes
This series of small patches is meant to resolve a number of
bugs, mostly occurring during an ibmvnic driver reset when
recovering from a logical partition migration (LPM).
The first patch ensures that RX buffer pools are properly
activated following an adapter reset by setting the proper
flag in the pool data structure.
The second patch uses netif_tx_disable to stop TX queues when
closing the device during a reset.
Third, fixup a typo that resulted in partial sanitization of
TX/RX descriptor queues following a device reset.
Fourth, remove an ambiguous conditional check that was resulting
in a kernel panic as null RX/TX completion descriptors were being
processed during napi polling while the device is closing.
Finally, fix a condition where the napi polling routine exits
before it has completed its work budget without notifying the
upper network layers. This omission could result in the
napi_disable function sleeping indefinitely under certain conditions.
v2: Attempt to provide a proper cover letter
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Falcon [Thu, 15 Jun 2017 04:50:09 +0000 (23:50 -0500)]
ibmvnic: Exit polling routine correctly during adapter reset
This patch fixes a bug where, in the case of a device reset,
the polling routine will never complete, causing napi_disable
to sleep indefinitely when attempting to close the device.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Falcon [Thu, 15 Jun 2017 04:50:08 +0000 (23:50 -0500)]
ibmvnic: Remove VNIC_CLOSING check from pending_scrq
Fix a kernel panic resulting from data access of a NULL
pointer during device close. The pending_scrq routine is
meant to determine whether there is a valid sub-CRQ message
awaiting processing. When the device is closing, however,
there is a possibility that NULL messages can be processed
because pending_scrq will always return 1 even if there
no valid message in the queue.
It's not clear what this closing state check was originally
meant to accomplish, so just remove it.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Falcon [Thu, 15 Jun 2017 04:50:07 +0000 (23:50 -0500)]
ibmvnic: Sanitize entire SCRQ buffer on reset
Fixup a typo so that the entire SCRQ buffer is cleaned.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Falcon [Thu, 15 Jun 2017 04:50:06 +0000 (23:50 -0500)]
ibmvnic: Ensure that TX queues are disabled in __ibmvnic_close
Use netif_tx_disable to guarantee that TX queues are disabled
when __ibmvnic_close is called by the device reset routine.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Falcon [Thu, 15 Jun 2017 04:50:05 +0000 (23:50 -0500)]
ibmvnic: Activate disabled RX buffer pools on reset
RX buffer pools are disabled while awaiting a device
reset if firmware indicates that the resource is closed.
This patch fixes a bug where pools were not being
subsequently enabled after the device reset, causing
the device to become inoperable.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Wed, 14 Jun 2017 22:43:37 +0000 (15:43 -0700)]
sunvnet: restrict advertized checksum offloads to just IP
As much as we'd like to play well with others, we really aren't
handling the checksums on non-IP protocol packets very well. This
is easily seen when trying to do TCP over ipv6 - the checksums are
garbage.
Here we restrict the checksum feature flag to just IP traffic so
that we aren't given work we can't yet do.
Orabug:
26175391,
26259755
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 15 Jun 2017 18:21:04 +0000 (14:21 -0400)]
Merge branch 'sched-act_tunnel_key-UDP-checksusm'
Jiri Benc says:
====================
net: sched: act_tunnel_key: UDP checksums
Currently, the tunnel_key tc action does not set TUNNEL_CSUM, thus
transmitting packets with zero UDP checksum. This is inconsistent with how
we treat non-lwt UDP tunnels where the default is to fill in the UDP
checksum. Non-zero UDP checksum is the better default anyway for various
reasons previously discussed.
Make this configurable for the tunnel_key tc action with the default being
non-zero checksum. Saves a lot of surprises especially with IPv6.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Benc [Wed, 14 Jun 2017 19:19:31 +0000 (21:19 +0200)]
net: sched: act_tunnel_key: make UDP checksum configurable
Allow requesting of zero UDP checksum for encapsulated packets. The name and
meaning of the attribute is "NO_CSUM" in order to have the same meaning of
the attribute missing and being 0.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Benc [Wed, 14 Jun 2017 19:19:30 +0000 (21:19 +0200)]
net: sched: act_tunnel_key: request UDP checksum by default
There's currently no way to request (outer) UDP checksum with
act_tunnel_key. This is problem especially for IPv6. Right now, tunnel_key
action with IPv6 does not work without going through hassles: both sides
have to have udp6zerocsumrx configured on the tunnel interface. This is
obviously not a good solution universally.
It makes more sense to compute the UDP checksum by default even for IPv4.
Just set the default to request the checksum when using act_tunnel_key.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gustavo A. R. Silva [Thu, 15 Jun 2017 02:58:17 +0000 (21:58 -0500)]
net: s2io: remove useless variable in fill_rx_buffers
Remove useless variable rxd_index and code related.
Addresses-Coverity-ID:
1397691
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 15 Jun 2017 18:07:51 +0000 (14:07 -0400)]
Merge branch 'dsa-prefix-Global-macros'
Vivien Didelot says:
====================
net: dsa: prefix Global macros
This patch series is the 2/3 step of the register definitions cleanup.
It brings no functional changes.
It prefixes and documents all Global (1) registers with MV88E6XXX_G1_
(or a specific model like MV88E6352_G1_STS_PPU_STATE), and prefers a
16-bit hexadecimal representation of the Marvell registers layout.
The next and last patchset will prefix the Global 2 registers.
====================
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 15 Jun 2017 16:14:06 +0000 (12:14 -0400)]
net: dsa: mv88e6xxx: prefix Global Prio and Tag macros
Prefix and document the remaining Global IP and IEEE Priority and Core
Tag Type registers and give them a clear 16-bit register representation.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 15 Jun 2017 16:14:05 +0000 (12:14 -0400)]
net: dsa: mv88e6xxx: prefix Global Stats macros
Prefix and document the Global Stats Operation and Counter registers and
give them a clear 16-bit registers representation.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 15 Jun 2017 16:14:04 +0000 (12:14 -0400)]
net: dsa: mv88e6xxx: prefix Global Monitor Control macros
Prefix and document the Global Monitor Control Register macros
(which became the Global Monitor & MGMT Control Register with
88E6390)
and give a clear 16-bit registers representation.
Use __bf_shf to get the shift value at compile time instead of adding
new defined macros for it.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 15 Jun 2017 16:14:03 +0000 (12:14 -0400)]
net: dsa: mv88e6xxx: prefix Global Control macros
Prefix and document the Global Control and Control 2 registers macros
and give a clear 16-bit registers representation.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 15 Jun 2017 16:14:02 +0000 (12:14 -0400)]
net: dsa: mv88e6xxx: prefix Global VTU macros
Prefix and document the Global VTU registers macros and give a clear
16-bit registers representation.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 15 Jun 2017 16:14:01 +0000 (12:14 -0400)]
net: dsa: mv88e6xxx: prefix Global ATU macros
Prefix and document the Global ATU Registers macros and give clear
16-bit registers representation.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 15 Jun 2017 16:14:00 +0000 (12:14 -0400)]
net: dsa: mv88e6xxx: prefix Global Switch MAC macros
Prefix and document the Global Switch MAC Address Register macros and
give clear 16-bit register representation.
At the same time, move mv88e6xxx_g1_set_switch_mac in global1.c, where
it belongs.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 15 Jun 2017 16:13:59 +0000 (12:13 -0400)]
net: dsa: mv88e6xxx: prefix Global Status macros
Prefix and document the Global Status Register macros and give clear
16-bit register representation.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg [Wed, 14 Jun 2017 20:17:20 +0000 (22:17 +0200)]
skbuff: make skb_put_zero() return void
It's nicer to return void, since then there's no need to
cast to any structures. Currently none of the users have
a cast, but a number of future conversions do.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 15 Jun 2017 16:12:41 +0000 (12:12 -0400)]
Merge branch 'net-ktls'
Dave Watson says:
====================
net: kernel TLS
This series adds support for kernel TLS encryption over TCP sockets.
A standard TCP socket is converted to a TLS socket using a setsockopt.
Only symmetric crypto is done in the kernel, as well as TLS record
framing. The handshake remains in userspace, and the negotiated
cipher keys/iv are provided to the TCP socket.
We implemented support for this API in OpenSSL 1.1.0, the code is
available at https://github.com/Mellanox/tls-openssl/tree/master
It should work with any TLS library with similar modifications,
a test tool using gnutls is here: https://github.com/Mellanox/tls-af_ktls_tool
RFC patch to openssl:
https://mta.openssl.org/pipermail/openssl-dev/2017-June/009384.html
Changes from V2:
* EXPORT_SYMBOL_GPL in patch 1
* Ensure cleanup code always called before sk_stream_kill_queues to
avoid warnings
Changes from V1:
* EXPORT_SYMBOL GPL in patch 2
* Add link to OpenSSL patch & gnutls example in documentation patch.
* sk_write_pending check was rolled in to wait_for_memory path,
avoids special case and fixes lock inbalance issue.
* Unify flag handling for sendmsg/sendfile
Changes from RFC V2:
* Generic ULP (upper layer protocol) framework instead of TLS specific
setsockopts
* Dropped Mellanox hardware patches, will come as separate series.
Framework will work for both.
RFC V2:
http://www.mail-archive.com/netdev@vger.kernel.org/msg160317.html
Changes from RFC V1:
* Socket based on changing TCP proto_ops instead of crypto framework
* Merged code with Mellanox's hardware tls offload
* Zerocopy sendmsg support added - sendpage/sendfile is no longer
necessary for zerocopy optimization
RFC V1:
http://www.mail-archive.com/netdev@vger.kernel.org/msg88021.html
* Socket based on crypto userspace API framework, required two
sockets in userspace, one encrypted, one unencrypted.
Paper: https://netdevconf.org/1.2/papers/ktls.pdf
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Dave Watson [Wed, 14 Jun 2017 18:37:51 +0000 (11:37 -0700)]
tls: Documentation
Add documentation for the tcp ULP tls interface.
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dave Watson [Wed, 14 Jun 2017 18:37:39 +0000 (11:37 -0700)]
tls: kernel TLS support
Software implementation of transport layer security, implemented using ULP
infrastructure. tcp proto_ops are replaced with tls equivalents of sendmsg and
sendpage.
Only symmetric crypto is done in the kernel, keys are passed by setsockopt
after the handshake is complete. All control messages are supported via CMSG
data - the actual symmetric encryption is the same, just the message type needs
to be passed separately.
For user API, please see Documentation patch.
Pieces that can be shared between hw and sw implementation
are in tls_main.c
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dave Watson [Wed, 14 Jun 2017 18:37:26 +0000 (11:37 -0700)]
tcp: export do_tcp_sendpages and tcp_rate_check_app_limited functions
Export do_tcp_sendpages and tcp_rate_check_app_limited, since tls will need to
sendpages while the socket is already locked.
tcp_sendpage is exported, but requires the socket lock to not be held already.
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dave Watson [Wed, 14 Jun 2017 18:37:14 +0000 (11:37 -0700)]
tcp: ULP infrastructure
Add the infrustructure for attaching Upper Layer Protocols (ULPs) over TCP
sockets. Based on a similar infrastructure in tcp_cong. The idea is that any
ULP can add its own logic by changing the TCP proto_ops structure to its own
methods.
Example usage:
setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls"));
modules will call:
tcp_register_ulp(&tcp_tls_ulp_ops);
to register/unregister their ulp, with an init function and name.
A list of registered ulps will be returned by tcp_get_available_ulp, which is
hooked up to /proc. Example:
$ cat /proc/sys/net/ipv4/tcp_available_ulp
tls
There is currently no functionality to remove or chain ULPs, but
it should be possible to add these in the future if needed.
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 15 Jun 2017 16:07:16 +0000 (12:07 -0400)]
Merge branch 'Broadcom-DTE-based-PTP-clock'
Arun Parameswaran says:
====================
Add support for Broadcom DTE based PTP clock
This patchset adds support for the DTE based PTP clock for Broadcom SoCs.
The DTE nco based PTP clock can be used in both wired and wireless networks
for precision time-stmaping purposes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Arun Parameswaran [Mon, 12 Jun 2017 20:26:01 +0000 (13:26 -0700)]
ptp: Add a ptp clock driver for Broadcom DTE
This patch adds a ptp clock driver for the Broadcom SoCs using
the Digital timing Engine (DTE) nco.
Signed-off-by: Arun Parameswaran <arun.parameswaran@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arun Parameswaran [Mon, 12 Jun 2017 20:26:00 +0000 (13:26 -0700)]
dt-binding: ptp: add bindings document for dte based ptp clock
Add device tree binding documentation for the Broadcom DTE
PTP clock driver.
Signed-off-by: Arun Parameswaran <arun.parameswaran@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 15 Jun 2017 15:31:37 +0000 (11:31 -0400)]
Merge git://git./linux/kernel/git/davem/net
The conflicts were two cases of overlapping changes in
batman-adv and the qed driver.
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 15 Jun 2017 09:09:47 +0000 (18:09 +0900)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) The netlink attribute passed in to dev_set_alias() is not
necessarily NULL terminated, don't use strlcpy() on it. From
Alexander Potapenko.
2) Fix implementation of atomics in arm64 bpf JIT, from Daniel
Borkmann.
3) Correct the release of netdevs and driver private data in certain
circumstances.
4) Sanitize netlink message length properly in decnet, from Mateusz
Jurczyk.
5) Don't leak kernel data in rtnl_fill_vfinfo() netlink blobs. From
Yuval Mintz.
6) Hash secret is never initialized in ipv6 ILA translation code, from
Arnd Bergmann. I guess those clang warnings about unused inline
functions are useful for something!
7) Fix endian selection in bpf_endian.h, from Daniel Borkmann.
8) Sanitize sockaddr length before dereferncing any fields in AF_UNIX
and CAIF. From Mateusz Jurczyk.
9) Fix timestamping for GMAC3 chips in stmmac driver, from Mario
Molitor.
10) Do not leak netdev on dev_alloc_name() errors in mac80211, from
Johannes Berg.
11) Fix locking in sctp_for_each_endpoint(), from Xin Long.
12) Fix wrong memset size on 32-bit in snmp6, from Christian Perle.
13) Fix use after free in ip_mc_clear_src(), from WANG Cong.
14) Fix regressions caused by ICMP rate limiting changes in 4.11, from
Jesper Dangaard Brouer.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (91 commits)
i40e: Fix a sleep-in-atomic bug
net: don't global ICMP rate limit packets originating from loopback
net/act_pedit: fix an error code
net: update undefined ->ndo_change_mtu() comment
net_sched: move tcf_lock down after gen_replace_estimator()
caif: Add sockaddr length check before accessing sa_family in connect handler
qed: fix dump of context data
qmi_wwan: new Telewell and Sierra device IDs
net: phy: Fix MDIO_THUNDER dependencies
netconsole: Remove duplicate "netconsole: " logging prefix
igmp: acquire pmc lock for ip_mc_clear_src()
r8152: give the device version
net: rps: fix uninitialized symbol warning
mac80211: don't send SMPS action frame in AP mode when not needed
mac80211/wpa: use constant time memory comparison for MACs
mac80211: set bss_info data before configuring the channel
mac80211: remove 5/10 MHz rate code from station MLME
mac80211: Fix incorrect condition when checking rx timestamp
mac80211: don't look at the PM bit of BAR frames
i40e: fix handling of HW ATR eviction
...
Linus Torvalds [Thu, 15 Jun 2017 08:54:51 +0000 (17:54 +0900)]
Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
"This fixes a bug on sparc where we may dereference freed stack memory"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: Work around deallocated stack frame reference gcc bug on sparc.
Linus Torvalds [Thu, 15 Jun 2017 08:51:19 +0000 (17:51 +0900)]
Merge tag 'acpi-4.12-rc6' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These revert an ACPICA commit from the 4.11 cycle that causes problems
to happen on some systems and add a protection against possible kernel
crashes due to table reference counter imbalance.
Specifics:
- Revert a 4.11 ACPICA change that made assumptions which are not
satisfied on some systems and caused the enumeration of resources
to fail on them (Rafael Wysocki).
- Add a mechanism to prevent tables from being unmapped prematurely
due to reference counter overflows (Lv Zheng)"
* tag 'acpi-4.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPICA: Tables: Mechanism to handle late stage acpi_get_table() imbalance
Revert "ACPICA: Disassembler: Enhance resource descriptor detection"
Linus Torvalds [Thu, 15 Jun 2017 08:47:46 +0000 (17:47 +0900)]
Merge tag 'pm-4.12-rc6' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These revert a recent cpufreq schedutil governor change that turned
out to be problematic and fix a few minor issues in cpufreq, cpuidle
and the Exynos devfreq drivers.
Specifics:
- Revert a recent cpufreq schedutil governor change that caused some
systems to behave undesirably (Rafael Wysocki).
- Fix a cpufreq conservative governor issue introduced during the
3.10 cycle that prevents it from working as expected in some
situations (Tomasz Wilczyński).
- Fix an error code path in the generic cpuidle driver for DT-based
systems (Christophe Jaillet).
- Fix three minor issues in devfreq drivers for Exynos (Arvind Yadav,
Krzysztof Kozlowski)"
* tag 'pm-4.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpuidle: dt: Add missing 'of_node_put()'
cpufreq: conservative: Allow down_threshold to take values from 1 to 10
Revert "cpufreq: schedutil: Reduce frequencies slower"
PM / devfreq: exynos-ppmu: Staticize event list
PM / devfreq: exynos-ppmu: Handle return value of clk_prepare_enable
PM / devfreq: exynos-nocp: Handle return value of clk_prepare_enable
Linus Torvalds [Thu, 15 Jun 2017 08:44:41 +0000 (17:44 +0900)]
Merge branch 'for-4.12/driver-matching-fix' of git://git./linux/kernel/git/jikos/hid
Pull HID fix from Jiri Kosina:
- ifdef-based bandaid for a long-standing issue with HID driver
matching, avoiding regressions in cases where specific driver is not
enabled in kernel .config, from Jiri Kosina
* 'for-4.12/driver-matching-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
HID: let generic driver yield control iff specific driver has been enabled
Linus Torvalds [Thu, 15 Jun 2017 08:37:40 +0000 (17:37 +0900)]
Merge tag 'media/v4.12-3' of git://git./linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
- some build dependency issues at CEC core with randconfigs
- fix an off by one error at vb2
- a race fix at cec core
- driver fixes at tc358743, sir_ir and rainshadow-cec
* tag 'media/v4.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
[media] media/cec.h: use IS_REACHABLE instead of IS_ENABLED
[media] cec: race fix: don't return -ENONET in cec_receive()
[media] sir_ir: infinite loop in interrupt handler
[media] cec-notifier.h: handle unreachable CONFIG_CEC_CORE
[media] cec: improve MEDIA_CEC_RC dependencies
[media] vb2: Fix an off by one error in 'vb2_plane_vaddr'
[media] rainshadow-cec: Fix missing spin_lock_init()
[media] tc358743: fix register i2c_rd/wr function fix
Jia-Ju Bai [Wed, 14 Jun 2017 23:35:31 +0000 (16:35 -0700)]
i40e: Fix a sleep-in-atomic bug
The driver may sleep under a spin lock, and the function call path is:
i40e_ndo_set_vf_port_vlan (acquire the lock by spin_lock_bh)
i40e_vsi_remove_pvid
i40e_vlan_stripping_disable
i40e_aq_update_vsi_params
i40e_asq_send_command
mutex_lock --> may sleep
To fixed it, the spin lock is released before "i40e_vsi_remove_pvid", and
the lock is acquired again after this function.
Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rafael J. Wysocki [Wed, 14 Jun 2017 23:52:32 +0000 (01:52 +0200)]
Merge branch 'acpica-fixes'
* acpica-fixes:
ACPICA: Tables: Mechanism to handle late stage acpi_get_table() imbalance
Revert "ACPICA: Disassembler: Enhance resource descriptor detection"
Rafael J. Wysocki [Wed, 14 Jun 2017 23:51:33 +0000 (01:51 +0200)]
Merge branches 'pm-cpufreq', 'pm-cpuidle' and 'pm-devfreq'
* pm-cpufreq:
cpufreq: conservative: Allow down_threshold to take values from 1 to 10
Revert "cpufreq: schedutil: Reduce frequencies slower"
* pm-cpuidle:
cpuidle: dt: Add missing 'of_node_put()'
* pm-devfreq:
PM / devfreq: exynos-ppmu: Staticize event list
PM / devfreq: exynos-ppmu: Handle return value of clk_prepare_enable
PM / devfreq: exynos-nocp: Handle return value of clk_prepare_enable
David Howells [Wed, 14 Jun 2017 16:56:50 +0000 (17:56 +0100)]
rxrpc: Cache the congestion window setting
Cache the congestion window setting that was determined during a call's
transmission phase when it finishes so that it can be used by the next call
to the same peer, thereby shortcutting the slow-start algorithm.
The value is stored in the rxrpc_peer struct and is accessed without
locking. Each call takes the value that happens to be there when it starts
and just overwrites the value when it finishes.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Weilin Chang [Wed, 14 Jun 2017 16:11:31 +0000 (09:11 -0700)]
liquidio: fix VF driver off-by-one bug when setting ethtool -C ethX rx-frames
Signed-off-by: Weilin Chang <weilin.chang@cavium.com>
Signed-off-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesper Dangaard Brouer [Wed, 14 Jun 2017 11:27:37 +0000 (13:27 +0200)]
net: don't global ICMP rate limit packets originating from loopback
Florian Weimer seems to have a glibc test-case which requires that
loopback interfaces does not get ICMP ratelimited. This was broken by
commit
c0303efeab73 ("net: reduce cycles spend on ICMP replies that
gets rate limited").
An ICMP response will usually be routed back-out the same incoming
interface. Thus, take advantage of this and skip global ICMP
ratelimit when the incoming device is loopback. In the unlikely event
that the outgoing it not loopback, due to strange routing policy
rules, ICMP rate limiting still works via peer ratelimiting via
icmpv4_xrlim_allow(). Thus, we should still comply with RFC1812
(section 4.3.2.8 "Rate Limiting").
This seems to fix the reproducer given by Florian. While still
avoiding to perform expensive and unneeded outgoing route lookup for
rate limited packets (in the non-loopback case).
Fixes:
c0303efeab73 ("net: reduce cycles spend on ICMP replies that gets rate limited")
Reported-by: Florian Weimer <fweimer@redhat.com>
Reported-by: "H.J. Lu" <hjl.tools@gmail.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Wed, 14 Jun 2017 10:41:52 +0000 (13:41 +0300)]
net/mlxfw: fix a NULL dereference
If we hit this error path we end up returning ERR_PTR(0) which is NULL.
The caller is not expecting that so it results in a NULL dereference.
Fixes:
410ed13cae39 ("Add the mlxfw module for Mellanox firmware flash process")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Raju Rangoju [Fri, 9 Jun 2017 16:47:49 +0000 (22:17 +0530)]
rdma/cxgb4: Fix memory leaks during module exit
Fix memory leaks of iw_cxgb4 module in the exit path
Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dan Carpenter [Wed, 14 Jun 2017 10:29:31 +0000 (13:29 +0300)]
net/act_pedit: fix an error code
I'm reviewing static checker warnings where we do ERR_PTR(0), which is
the same as NULL. I'm pretty sure we intended to return ERR_PTR(-EINVAL)
here. Sometimes these bugs lead to a NULL dereference but I don't
immediately see that problem here.
Fixes:
71d0ed7079df ("net/act_pedit: Support using offset relative to the conventional network headers")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Amir Vadai <amir@vadai.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Wed, 14 Jun 2017 09:48:48 +0000 (11:48 +0200)]
net: use skb_unref() in napi_consume_skb()
The commit
83ada39bb79d ("net: factor out a helper to decrement the
skb refcount") provided and used a helper for decrementing skb usage,
but I missed at least a spot for it.
This change remove some more duplicated code reusing skb_unref() in
napi_consume_skb(), too. The helper uses an additional, unneeded
unlikely(!skb) test - napi_consume_skb() already check it a few lines
above - but the compiler is smart enough to optimize the duplicated
test out.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 14 Jun 2017 19:22:17 +0000 (15:22 -0400)]
Merge branch 'for-upstream' of git://git./linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:
====================
pull request: bluetooth-next 2017-06-14
Here's another batch of Bluetooth patches for the 4.13 kernel:
- Fix for Broadcom controllers not supporting Event Mask Page 2
- New QCA ROME USB ID for btusb
- Fix for Security Manager Protocol to use constant-time memcmp
- Improved support for TI WiLink chips
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Wed, 14 Jun 2017 09:10:10 +0000 (12:10 +0300)]
qed: Fix an off by one bug
The p_l2_info->pp_qid_usage[] array has "p_l2_info->queues" elements so
the > here should be a >= or we write beyond the end of the array.
Fixes:
bbe3f233ec5e ("qed: Assign a unique per-queue index to queue-cid")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 14 Jun 2017 19:16:31 +0000 (15:16 -0400)]
Merge branch 'mlxsw-Add-support-for-cable-info-access'
Jiri Pirko says:
====================
mlxsw: Add support for cable info access
Add support for cable info access via ethtool. This is done by accessing
the SFP+/QSFP internal EEPROM.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Arkadi Sharshevsky [Wed, 14 Jun 2017 07:27:40 +0000 (09:27 +0200)]
mlxsw: spectrum: Add support for access cable info via ethtool
Add support for access cable info via ethtool.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arkadi Sharshevsky [Wed, 14 Jun 2017 07:27:39 +0000 (09:27 +0200)]
mlxsw: reg: Add MCIA register for cable info access
The MCIA register is used to access the SFP+ and QSFP connector's
EPROM. It will be used to query the cable info.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Feras Daoud [Wed, 14 Jun 2017 06:59:09 +0000 (09:59 +0300)]
IB/ipoib: Fix memory leak in create child syscall
The flow of creating a new child goes through ipoib_vlan_add
which allocates a new interface and checks the rtnl_lock.
If the lock is taken, restart_syscall will be called to restart
the system call again. In this case we are not releasing the
already allocated interface, causing a leak.
Fixes:
9baa0b036410 ("IB/ipoib: Add rtnl_link_ops support")
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Alex Vesker [Wed, 14 Jun 2017 06:59:08 +0000 (09:59 +0300)]
IB/ipoib: Fix access to un-initialized napi struct
There is no need to re-enable napi since we set the initialized
flag before calling ipoib_ib_dev_stop which will disable napi,
disabling napi twice is harmless in case it was already disabled.
One more reason for this fix is that when using IPoIB new device
driver napi is not added to priv, this can lead to kernel panic
when rn_ops ndo_open fails.
[ 289.755840] invalid opcode: 0000 [#1] SMP
[ 289.757111] task:
ffff880036964440 ti:
ffff880178ee8000 task.ti:
ffff880178ee8000
[ 289.757111] RIP: 0010:[<
ffffffffa05368d6>] [<
ffffffffa05368d6>] napi_enable.part.24+0x4/0x6 [ib_ipoib]
[ 289.757111] RSP: 0018:
ffff880178eeb6d8 EFLAGS:
00010246
[ 289.757111] RAX:
0000000000000000 RBX:
ffff880177a80010 RCX:
000000007fffffff
[ 289.757111] RDX:
ffffffff81d5f118 RSI:
0000000000000000 RDI:
ffff880177a80010
[ 289.757111] RBP:
ffff880178eeb6d8 R08:
0000000000000082 R09:
0000000000000283
[ 289.757111] R10:
0000000000000000 R11:
0000000000000000 R12:
ffff880175a00000
[ 289.757111] R13:
ffff880177a80080 R14:
0000000000000000 R15:
0000000000000001
[ 289.757111] FS:
00007fe2ee346880(0000) GS:
ffff88017fc00000(0000) knlGS:
0000000000000000
[ 289.757111] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 289.757111] CR2:
00007fffca979020 CR3:
00000001792e4000 CR4:
00000000000006f0
[ 289.757111] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[ 289.757111] DR3:
0000000000000000 DR6:
00000000ffff0ff0 DR7:
0000000000000400
[ 289.757111] Stack:
[ 289.796027]
ffff880178eeb6f0 ffffffffa05251f5 ffff880177a80000 ffff880178eeb718
[ 289.796027]
ffffffffa0528505 ffff880175a00000 ffff880177a80000 0000000000000000
[ 289.796027]
ffff880178eeb748 ffffffffa051f0ab ffff880175a00000 ffffffffa0537d60
[ 289.796027] Call Trace:
[ 289.796027] [<
ffffffffa05251f5>] napi_enable+0x25/0x30 [ib_ipoib]
[ 289.796027] [<
ffffffffa0528505>] ipoib_ib_dev_open+0x175/0x190 [ib_ipoib]
[ 289.796027] [<
ffffffffa051f0ab>] ipoib_open+0x4b/0x160 [ib_ipoib]
[ 289.796027] [<
ffffffff814fe33f>] _dev_open+0xbf/0x130
[ 289.796027] [<
ffffffff814fe62d>] __dev_change_flags+0x9d/0x170
[ 289.796027] [<
ffffffff814fe729>] dev_change_flags+0x29/0x60
[ 289.796027] [<
ffffffff8150caf7>] do_setlink+0x397/0xa40
Fixes:
cd565b4b51e5 ('IB/IPoIB: Support acceleration options callbacks')
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Alex Vesker [Wed, 14 Jun 2017 06:59:07 +0000 (09:59 +0300)]
IB/ipoib: Delete napi in device uninit default
This patch mekas init_default and uninit_default symmetric
with a call to delete napi. Additionally, the uninit_default
gained delete napi call in case of init_default fails.
Fixes:
515ed4f3aab4 ('IB/IPoIB: Separate control and data related initializations')
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Alex Vesker [Wed, 14 Jun 2017 06:59:06 +0000 (09:59 +0300)]
IB/ipoib: Limit call to free rdma_netdev for capable devices
Limit calls to free_rdma_netdev() for capable devices only.
Fixes:
cd565b4b51e5 ('IB/IPoIB: Support acceleration options callbacks')
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Alex Vesker [Wed, 14 Jun 2017 06:59:05 +0000 (09:59 +0300)]
IB/ipoib: Fix memory leaks for child interfaces priv
There is a need to free priv explicitly and not just to release
the device, child priv is freed explicitly on remove flow and this
patch also includes priv free on error flow in P_key creation
and also in add_port.
Fixes:
cd565b4b51e5 ('IB/IPoIB: Support acceleration options callbacks')
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Magnus Damm [Wed, 14 Jun 2017 07:15:24 +0000 (16:15 +0900)]
net: update undefined ->ndo_change_mtu() comment
Update ->ndo_change_mtu() callback comment to remove text
about returning error in case of undefined callback. This
change makes the comment match the existing code behavior.
Signed-off-by: Magnus Damm <damm+renesas@opensource.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 14 Jun 2017 19:03:23 +0000 (15:03 -0400)]
Merge branch 'bpf-MIPS-infra'
David Daney says:
====================
bpf: Changes needed (or desired) for MIPS support
This is a grab bag of changes to the bpf testing infrastructure I
developed working on MIPS eBPF JIT support. The change to
bpf_jit_disasm is probably universally beneficial, the others are more
MIPS specific.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David Daney [Tue, 13 Jun 2017 23:49:38 +0000 (16:49 -0700)]
samples/bpf: Fix tracex5 to work with MIPS syscalls.
There are two problems:
1) In MIPS the __NR_* macros expand to an expression, this causes the
sections of the object file to be named like:
.
.
.
[ 5] kprobe/(5000 + 1) PROGBITS
0000000000000000 000160 ...
[ 6] kprobe/(5000 + 0) PROGBITS
0000000000000000 000258 ...
[ 7] kprobe/(5000 + 9) PROGBITS
0000000000000000 000348 ...
.
.
.
The fix here is to use the "asm_offsets" trick to evaluate the macros
in the C compiler and generate a header file with a usable form of the
macros.
2) MIPS syscall numbers start at 5000, so we need a bigger map to hold
the sub-programs.
Signed-off-by: David Daney <david.daney@cavium.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Daney [Tue, 13 Jun 2017 23:49:37 +0000 (16:49 -0700)]
bpf: Add MIPS support to samples/bpf.
Signed-off-by: David Daney <david.daney@cavium.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Daney [Tue, 13 Jun 2017 23:49:36 +0000 (16:49 -0700)]
test_bpf: Add test to make conditional jump cross a large number of insns.
On MIPS, conditional branches can only span 32k instructions. To
exceed this limit in the JIT with the BPF maximum of 4k insns, we need
to choose eBPF insns that expand to more than 8 machine instructions.
Use BPF_LD_ABS as it is quite complex. This forces the JIT to invert
the sense of the branch to branch around a long jump to the end.
This (somewhat) verifies that the branch inversion logic and target
address calculation of the long jumps are done correctly.
Signed-off-by: David Daney <david.daney@cavium.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Daney [Tue, 13 Jun 2017 23:49:35 +0000 (16:49 -0700)]
tools: bpf_jit_disasm: Handle large images.
Dynamically allocate memory so that JIT images larger than the size of
the statically allocated array can be handled.
Signed-off-by: David Daney <david.daney@cavium.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 14 Jun 2017 18:56:26 +0000 (14:56 -0400)]
Merge branch 'bpf-ctx-narrow'
Yonghong Song says:
====================
bpf: permit bpf program narrower loads for ctx fields
Today, if users try to access a ctx field through a narrower load, e.g.,
__be16 prot = __sk_buff->protocol, verifier will fail.
This set contains the verifier change to permit such loads for
certain ctx fields as well as the new test cases in selftests/bpf.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Yonghong Song [Tue, 13 Jun 2017 22:52:14 +0000 (15:52 -0700)]
selftests/bpf: Add test cases to test narrower ctx field loads
Add test cases in test_verifier and test_progs.
Negative tests are added in test_verifier as well.
The test in test_progs will compare the value of narrower ctx field
load result vs. the masked value of normal full-field load result,
and will fail if they are not the same.
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yonghong Song [Tue, 13 Jun 2017 22:52:13 +0000 (15:52 -0700)]
bpf: permits narrower load from bpf program context fields
Currently, verifier will reject a program if it contains an
narrower load from the bpf context structure. For example,
__u8 h = __sk_buff->hash, or
__u16 p = __sk_buff->protocol
__u32 sample_period = bpf_perf_event_data->sample_period
which are narrower loads of 4-byte or 8-byte field.
This patch solves the issue by:
. Introduce a new parameter ctx_field_size to carry the
field size of narrower load from prog type
specific *__is_valid_access validator back to verifier.
. The non-zero ctx_field_size for a memory access indicates
(1). underlying prog type specific convert_ctx_accesses
supporting non-whole-field access
(2). the current insn is a narrower or whole field access.
. In verifier, for such loads where load memory size is
less than ctx_field_size, verifier transforms it
to a full field load followed by proper masking.
. Currently, __sk_buff and bpf_perf_event_data->sample_period
are supporting narrowing loads.
. Narrower stores are still not allowed as typical ctx stores
are just normal stores.
Because of this change, some tests in verifier will fail and
these tests are removed. As a bonus, rename some out of bound
__sk_buff->cb access to proper field name and remove two
redundant "skb cb oob" tests.
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Tue, 13 Jun 2017 20:36:24 +0000 (13:36 -0700)]
net_sched: move tcf_lock down after gen_replace_estimator()
Laura reported a sleep-in-atomic kernel warning inside
tcf_act_police_init() which calls gen_replace_estimator() with
spinlock protection.
It is not necessary in this case, we already have RTNL lock here
so it is enough to protect concurrent writers. For the reader,
i.e. tcf_act_police(), it needs to make decision based on this
rate estimator, in the worst case we drop more/less packets than
necessary while changing the rate in parallel, it is still acceptable.
Reported-by: Laura Abbott <labbott@redhat.com>
Reported-by: Nick Huber <nicholashuber@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zhang Shengju [Tue, 13 Jun 2017 14:45:11 +0000 (22:45 +0800)]
macvlan: propagate the mac address change status for lowerdev
The macvlan dev should propagate the return value of mac address change for
lower device in the passthru mode, instead of always return 0.
Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 14 Jun 2017 18:26:21 +0000 (14:26 -0400)]
Merge branch '10GbE' of git://git./linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
10GbE Intel Wired LAN Driver Updates 2017-06-13
This series contains updates to ixgbe and ixgbevf only.
Jake completes his fix ups for our drivers with the ixgbe changes to
resolve a race condition in processing timestamp requests. These fixes
are the same fixes Jake applied earlier to the other drivers, including
the added statistic to help administrators know when an application
timestamp request is ignored.
With all the recent ixgbe/ixgbevf changes and fixes, Tony bumps the
the driver versions. Then Tony provides a fix to resolve a static
analysis warning by changing a variable to unsigned integer since the
value can never be negative.
Emil fixes an issue for X550 devices where the qde parameter was being
ignored, so PFQDE.HIDE_VLAN was not being set.
Jeff Mahoney from SuSE fixes a possible kernel crash, where there was
a small window where tasks writing to the sriov_numvfs sysfs attribute
can sneak in after we call register_netdev(). So we need to call
pci_set_drvdata() before and not after register_netdev() to preserve the
intent of commit
0fb6a55cc31f ("ixgbe: fix crash on rmmod after probe
fail").
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jia-Ju Bai [Mon, 5 Jun 2017 12:23:40 +0000 (20:23 +0800)]
rxe: Fix a sleep-in-atomic bug in post_one_send
The driver may sleep under a spin lock, and the function call path is:
post_one_send (acquire the lock by spin_lock_irqsave)
init_send_wqe
copy_from_user --> may sleep
There is no flow that makes "qp->is_user" true, and copy_from_user may
cause bug when a non-user pointer is used. So the lines of copy_from_user
and check of "qp->is_user" are removed.
Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Ram Amrani [Mon, 5 Jun 2017 13:32:27 +0000 (16:32 +0300)]
RDMA/qedr: Add 64KB PAGE_SIZE support to user-space queues
Add 64KB PAGE_SIZE support to user-space CQ, SQ and RQ queues.
De-facto it means that code was added to translate 64KB
pages to smaller 4KB pages that the FW can handle. Otherwise,
the FW would wrap (or jump to the next page) when reaching 4KB
while the user space library will continue on the same large page.
Note that MR code remains as is since the FW supports larger pages
for MRs.
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Michal Kalderon [Mon, 5 Jun 2017 13:32:26 +0000 (16:32 +0300)]
RDMA/qedr: Initialize byte_len in WC of READ and SEND commands
Initialize byte_len in work completion of RDMA_READ and RDMA_SEND.
Exposed by uDAPL application.
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Selvin Xavier [Mon, 22 May 2017 10:15:44 +0000 (03:15 -0700)]
RDMA/bnxt_re: Remove FMR support
Some issues observed with FMR implementation
while running stress traffic. So removing the
FMR verbs support for now.
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Devesh Sharma [Mon, 22 May 2017 10:15:40 +0000 (03:15 -0700)]
RDMA/bnxt_re: Fix RQE posting logic
This patch adds code to ring RQ Doorbell aggressively
so that the adapter can DMA RQ buffers sooner, instead
of DMA all WQEs in the post_recv WR list together at the
end of the post_recv verb.
Also use spinlock to serialize RQ posting
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Somnath Kotur [Mon, 22 May 2017 10:15:36 +0000 (03:15 -0700)]
RDMA/bnxt_re: Add HW workaround for avoiding stall for UD QPs
HW stalls out after 0x800000 WQEs are posted for UD QPs.
To workaround this problem, driver will send a modify_qp cmd
to the HW at around the halfway mark(0x400000) so that FW
can accordingly modify the QP context in the HW to prevent this
stall.
This workaround needs to be done for UD, QP1 and Raw Ethertype
packets. Added a counter to keep track of WQEs posted during post_send.
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Selvin Xavier [Mon, 22 May 2017 10:15:34 +0000 (03:15 -0700)]
RDMA/bnxt_re: Dereg MR in FW before freeing the fast_reg_page_list
If the host buffers are freed before destroying MR in HW,
HW could try accessing these buffers. This could cause a host
crash. Fixing the code to avoid this condition.
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Eddie Wai [Wed, 14 Jun 2017 10:26:23 +0000 (03:26 -0700)]
RDMA/bnxt_re: HW workarounds for handling specific conditions
This patch implements the following HW workarounds
1. The SQ depth needs to be augmented by 128 + 1 to avoid running
into an Out of order CQE issue
2. Workaround to handle the problem where the HW fast path engine continues
to access DMA memory in retranmission mode even after the WQE has
already been completed. If the HW reports this condition, driver detects
it and posts a Fence WQE. The driver stops reporting the completions
to stack until it receives completion for Fence WQE.
Signed-off-by: Eddie Wai <eddie.wai@broadcom.com>
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Shih-Yuan Lee (FourDollars) [Wed, 14 Jun 2017 06:42:00 +0000 (14:42 +0800)]
Bluetooth: btusb: Add support for 0489:e0a2 QCA_ROME device
T: Bus=01 Lev=01 Prnt=01 Port=06 Cnt=03 Dev#= 3 Spd=12 MxCh= 0
D: Ver= 1.10 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs= 1
P: Vendor=0489 ProdID=e0a2 Rev=00.01
C: #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=100mA
I: If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
I: If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
Signed-off-by: Shih-Yuan Lee (FourDollars) <sylee@canonical.com>
Suggested-by: Owen Lin <olin@rivetnetworks.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Jeff Mahoney [Sat, 3 Jun 2017 22:01:17 +0000 (18:01 -0400)]
ixgbe: pci_set_drvdata must be called before register_netdev
We call pci_set_drvdata immediately after calling register_netdev,
which leaves a window where tasks writing to the sriov_numvfs sysfs
attribute can sneak in and crash the kernel. register_netdev cleans
up after itself so placing pci_set_drvdata immediately before it
should preserve the intent of commit
0fb6a55cc31f ("ixgbe: fix crash
on rmmod after probe fail").
Fixes:
0fb6a55cc31f ("ixgbe: fix crash on rmmod after probe fail")
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tony Nguyen [Thu, 1 Jun 2017 19:06:05 +0000 (12:06 -0700)]
ixgbe: Resolve cppcheck format string warning
cppcheck warns that the format string is incorrect in the function
ixgbe_get_strings(). Since the value cannot be negative, change the
variable to unsigned which matches the format specifier.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Emil Tantilov [Tue, 23 May 2017 21:02:23 +0000 (14:02 -0700)]
ixgbe: fix writes to PFQDE
ixgbe_write_qde() was ignoring the qde parameter which resulted
in PFQDE.HIDE_VLAN not being set for X550.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tony Nguyen [Thu, 18 May 2017 21:55:23 +0000 (14:55 -0700)]
ixgbevf: Bump version number
Update ixgbevf version number.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tony Nguyen [Thu, 18 May 2017 21:55:07 +0000 (14:55 -0700)]
ixgbe: Bump version number
Update ixgbe version number.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>