Majd Dibbiny [Thu, 9 Feb 2017 12:20:12 +0000 (14:20 +0200)]
net/mlx5: Add fast unload support in shutdown flow
Adding a support to flush all HW resources with one FW command and
skip all the heavy unload flows of the driver on kernel shutdown.
There's no need to free all the SW context since a new fresh kernel
will be loaded afterwards.
Regarding the FW resources, they should be closed, otherwise we will
have leakage in the FW. To accelerate this flow, we execute one command
in the beginning that tells the FW that the driver isn't going to close
any of the FW resources and asks the FW to clean up everything.
Once the commands complete, it's safe to close the PCI resources and
finish the routine.
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Majd Dibbiny [Thu, 9 Feb 2017 11:20:46 +0000 (13:20 +0200)]
net/mlx5: Expose command polling interface
Add a new interface for commands execution that allows the
caller to wait for the command's completion in a busy-wait
loop (polling mode).
This is useful if we want to execute a command in a polling mode
while the driver is working in events mode for the rest of
the commands.
This interface will be used in the downstream patches.
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Gal Pressman [Wed, 10 May 2017 12:10:33 +0000 (15:10 +0300)]
net/mlx5e: Optimize update stats work
Unlike ethtool stats, get_stats ndo provides information cached by
update stats work that is running in the background without updating
them explicitly.
We cannot update all counters inside the ndo because some
updates require firmware commands that cannot be performed under a
spinlock.
update_stats work does not need to update ALL counters, since only
some of them are needed by ndo_get_stats.
This patch will allow for a minimal run of update_stats using an extra
parameter which will update necessary counters only and cut 13
firmware commands in each iteration of the work.
Work duration previous to this patch: ~4200us.
Work duration after this patch: ~700us (17% of the original time).
Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: kernel-team@fb.com
Gal Pressman [Wed, 14 Jun 2017 08:52:33 +0000 (11:52 +0300)]
net/mlx5e: Move and optimize query out of buffer function
Move "query queue counter out of buffer" helper function out of
qp.c to en_main.c, since mlx5e netdev driver is the only one to use it.
Also allocate the output buffer on the stack instead of the heap, to reduce
number of heap allocs on update_stats work.
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: kernel-team@fb.com
Gal Pressman [Mon, 5 Jun 2017 13:45:45 +0000 (16:45 +0300)]
net/mlx5e: Reduce number of heap allocated buffers for update stats
Allocating buffers on the heap every 200ms is something we should avoid,
let's use buffers located on the stack instead.
Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: kernel-team@fb.com
Gal Pressman [Wed, 19 Apr 2017 15:10:37 +0000 (18:10 +0300)]
net/mlx5e: Rename physical symbol errors counter
Rename rx_symbol_errors_phy to rx_pcs_symbol_err_phy, in order to
prevent confusion with rx_symbol_err_phy counter.
rx_pcs_symbol_err_phy counter counts the number of symbol errors that
were detected on the PCS (regardless of traffic) and weren't
corrected by FEC correction algorithm or that FEC algorithm was
not active on this interface.
rx_symbol_err_phy refers to errors on packet level (physical error
during a packet receive).
Fixes:
5db0a4f64c04 ("net/mlx5e: Expose physical layer statistical...")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: kernel-team@fb.com
Itay Aveksis [Wed, 7 Jun 2017 14:01:51 +0000 (17:01 +0300)]
net/mlx5e: Fix typo in warning if CQ moderation is not supported
Signed-off-by: Itay Aveksis <itayav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Tariq Toukan [Sun, 21 May 2017 08:08:18 +0000 (11:08 +0300)]
net/mlx5e: Use function to map aRFS into traffic type
For a better code reuse and readability, use the existing
function arfs_get_tt() to map arfs_type into mlx5e_traffic_types,
instead of duplicating the switch-case logic.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Moni Shoua [Tue, 9 May 2017 09:20:55 +0000 (12:20 +0300)]
net/mlx5: Undo LAG upon request to create virtual functions
LAG cannot work if virtual functions are present. Therefore, if LAG is
configured, the attempt to create virtual functions will fail. This gives
precedence to LAG over SRIOV which is not the desired behavior as users
might want to use the bonding/teaming driver also want to work with SRIOV.
In that case we don't want to force an order of actions, first create
virtual functions and only than configure a bonding/teaming net device.
To fix, if LAG is configured during a request to create virtual
functions, remove it and continue.
We ignore ENODEV when trying to forbid lag. This makes sense
because "No such device" means that lag is forbidden anyway.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Or Gerlitz [Sun, 28 May 2017 12:50:50 +0000 (15:50 +0300)]
net/mlx5: Avoid space after casting
Fix checkpatch complaints on that:
CHECK: No space is necessary after a cast
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Or Gerlitz [Sun, 28 May 2017 12:40:43 +0000 (15:40 +0300)]
net/mlx5: Align to match opening parenthesis
Fixed checkpatch complaints of the form:
CHECK: Alignment should match open parenthesis
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Or Gerlitz [Sun, 28 May 2017 12:35:27 +0000 (15:35 +0300)]
net/mlx5: Avoid blank lines before/after closing/opening braces
Fixed checkpatch complaints on that:
CHECK: Blank lines aren't necessary before a close brace '}'
CHECK: Blank lines aren't necessary after an open brace '{'
and one on missing blank line..
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Or Gerlitz [Sun, 28 May 2017 12:32:35 +0000 (15:32 +0300)]
net/mlx5: Avoid using multiple blank lines
Fixed bunch of this checkpatch complaint:
CHECK: Please don't use multiple blank lines
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Or Gerlitz [Sun, 28 May 2017 12:24:17 +0000 (15:24 +0300)]
net/mlx5: Fix some spelling mistakes
Fixed few places where endianness was misspelled and
one spot whwere output was:
CHECK: 'endianess' may be misspelled - perhaps 'endianness'?
CHECK: 'ouput' may be misspelled - perhaps 'output'?
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Eli Cohen [Fri, 28 Oct 2016 14:52:56 +0000 (09:52 -0500)]
net/mlx5: Update eqe_type_str() event names
Add missing NIC_VPORT_CHANGE event.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
David S. Miller [Thu, 15 Jun 2017 18:31:56 +0000 (14:31 -0400)]
Merge branch 'r8152-support-new-chips'
Hayes Wang says:
====================
r8152: support new chips
These patches are used to support new chips.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
hayeswang [Thu, 15 Jun 2017 06:44:04 +0000 (14:44 +0800)]
r8152: add byte_enable for ocp_read_word function
Add byte_enable for ocp_read_word() to replace reading 4
bytes data with reading the desired 2 bytes data.
This is used to avoid the issue which is described in
commit
b4d99def0938 ("r8152: remove sram_read"). The
original method always reads 4 bytes data, and it may
have problem when reading the PHY registers.
The new method is supported since RTL8153B, but it
doesn't influence the previous chips. The bits of the
byte_enable for the previous chips are the reserved
bits, and the hw would ignore them.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
hayeswang [Thu, 15 Jun 2017 06:44:03 +0000 (14:44 +0800)]
r8152: support RTL8153B
This patch supports two new chips for RTL8153B.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
hayeswang [Thu, 15 Jun 2017 06:44:02 +0000 (14:44 +0800)]
r8152: support new chip 8050
The settings of the new chip are the same with RTL8152, except that
its product ID is 0x8050.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 15 Jun 2017 18:29:01 +0000 (14:29 -0400)]
Merge branch 'ibmvnic-LPM-bug-fixes'
Thomas Falcon says:
====================
ibmvnic: LPM bug fixes
This series of small patches is meant to resolve a number of
bugs, mostly occurring during an ibmvnic driver reset when
recovering from a logical partition migration (LPM).
The first patch ensures that RX buffer pools are properly
activated following an adapter reset by setting the proper
flag in the pool data structure.
The second patch uses netif_tx_disable to stop TX queues when
closing the device during a reset.
Third, fixup a typo that resulted in partial sanitization of
TX/RX descriptor queues following a device reset.
Fourth, remove an ambiguous conditional check that was resulting
in a kernel panic as null RX/TX completion descriptors were being
processed during napi polling while the device is closing.
Finally, fix a condition where the napi polling routine exits
before it has completed its work budget without notifying the
upper network layers. This omission could result in the
napi_disable function sleeping indefinitely under certain conditions.
v2: Attempt to provide a proper cover letter
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Falcon [Thu, 15 Jun 2017 04:50:09 +0000 (23:50 -0500)]
ibmvnic: Exit polling routine correctly during adapter reset
This patch fixes a bug where, in the case of a device reset,
the polling routine will never complete, causing napi_disable
to sleep indefinitely when attempting to close the device.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Falcon [Thu, 15 Jun 2017 04:50:08 +0000 (23:50 -0500)]
ibmvnic: Remove VNIC_CLOSING check from pending_scrq
Fix a kernel panic resulting from data access of a NULL
pointer during device close. The pending_scrq routine is
meant to determine whether there is a valid sub-CRQ message
awaiting processing. When the device is closing, however,
there is a possibility that NULL messages can be processed
because pending_scrq will always return 1 even if there
no valid message in the queue.
It's not clear what this closing state check was originally
meant to accomplish, so just remove it.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Falcon [Thu, 15 Jun 2017 04:50:07 +0000 (23:50 -0500)]
ibmvnic: Sanitize entire SCRQ buffer on reset
Fixup a typo so that the entire SCRQ buffer is cleaned.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Falcon [Thu, 15 Jun 2017 04:50:06 +0000 (23:50 -0500)]
ibmvnic: Ensure that TX queues are disabled in __ibmvnic_close
Use netif_tx_disable to guarantee that TX queues are disabled
when __ibmvnic_close is called by the device reset routine.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Falcon [Thu, 15 Jun 2017 04:50:05 +0000 (23:50 -0500)]
ibmvnic: Activate disabled RX buffer pools on reset
RX buffer pools are disabled while awaiting a device
reset if firmware indicates that the resource is closed.
This patch fixes a bug where pools were not being
subsequently enabled after the device reset, causing
the device to become inoperable.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Wed, 14 Jun 2017 22:43:37 +0000 (15:43 -0700)]
sunvnet: restrict advertized checksum offloads to just IP
As much as we'd like to play well with others, we really aren't
handling the checksums on non-IP protocol packets very well. This
is easily seen when trying to do TCP over ipv6 - the checksums are
garbage.
Here we restrict the checksum feature flag to just IP traffic so
that we aren't given work we can't yet do.
Orabug:
26175391,
26259755
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 15 Jun 2017 18:21:04 +0000 (14:21 -0400)]
Merge branch 'sched-act_tunnel_key-UDP-checksusm'
Jiri Benc says:
====================
net: sched: act_tunnel_key: UDP checksums
Currently, the tunnel_key tc action does not set TUNNEL_CSUM, thus
transmitting packets with zero UDP checksum. This is inconsistent with how
we treat non-lwt UDP tunnels where the default is to fill in the UDP
checksum. Non-zero UDP checksum is the better default anyway for various
reasons previously discussed.
Make this configurable for the tunnel_key tc action with the default being
non-zero checksum. Saves a lot of surprises especially with IPv6.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Benc [Wed, 14 Jun 2017 19:19:31 +0000 (21:19 +0200)]
net: sched: act_tunnel_key: make UDP checksum configurable
Allow requesting of zero UDP checksum for encapsulated packets. The name and
meaning of the attribute is "NO_CSUM" in order to have the same meaning of
the attribute missing and being 0.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Benc [Wed, 14 Jun 2017 19:19:30 +0000 (21:19 +0200)]
net: sched: act_tunnel_key: request UDP checksum by default
There's currently no way to request (outer) UDP checksum with
act_tunnel_key. This is problem especially for IPv6. Right now, tunnel_key
action with IPv6 does not work without going through hassles: both sides
have to have udp6zerocsumrx configured on the tunnel interface. This is
obviously not a good solution universally.
It makes more sense to compute the UDP checksum by default even for IPv4.
Just set the default to request the checksum when using act_tunnel_key.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gustavo A. R. Silva [Thu, 15 Jun 2017 02:58:17 +0000 (21:58 -0500)]
net: s2io: remove useless variable in fill_rx_buffers
Remove useless variable rxd_index and code related.
Addresses-Coverity-ID:
1397691
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 15 Jun 2017 18:07:51 +0000 (14:07 -0400)]
Merge branch 'dsa-prefix-Global-macros'
Vivien Didelot says:
====================
net: dsa: prefix Global macros
This patch series is the 2/3 step of the register definitions cleanup.
It brings no functional changes.
It prefixes and documents all Global (1) registers with MV88E6XXX_G1_
(or a specific model like MV88E6352_G1_STS_PPU_STATE), and prefers a
16-bit hexadecimal representation of the Marvell registers layout.
The next and last patchset will prefix the Global 2 registers.
====================
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 15 Jun 2017 16:14:06 +0000 (12:14 -0400)]
net: dsa: mv88e6xxx: prefix Global Prio and Tag macros
Prefix and document the remaining Global IP and IEEE Priority and Core
Tag Type registers and give them a clear 16-bit register representation.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 15 Jun 2017 16:14:05 +0000 (12:14 -0400)]
net: dsa: mv88e6xxx: prefix Global Stats macros
Prefix and document the Global Stats Operation and Counter registers and
give them a clear 16-bit registers representation.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 15 Jun 2017 16:14:04 +0000 (12:14 -0400)]
net: dsa: mv88e6xxx: prefix Global Monitor Control macros
Prefix and document the Global Monitor Control Register macros
(which became the Global Monitor & MGMT Control Register with
88E6390)
and give a clear 16-bit registers representation.
Use __bf_shf to get the shift value at compile time instead of adding
new defined macros for it.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 15 Jun 2017 16:14:03 +0000 (12:14 -0400)]
net: dsa: mv88e6xxx: prefix Global Control macros
Prefix and document the Global Control and Control 2 registers macros
and give a clear 16-bit registers representation.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 15 Jun 2017 16:14:02 +0000 (12:14 -0400)]
net: dsa: mv88e6xxx: prefix Global VTU macros
Prefix and document the Global VTU registers macros and give a clear
16-bit registers representation.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 15 Jun 2017 16:14:01 +0000 (12:14 -0400)]
net: dsa: mv88e6xxx: prefix Global ATU macros
Prefix and document the Global ATU Registers macros and give clear
16-bit registers representation.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 15 Jun 2017 16:14:00 +0000 (12:14 -0400)]
net: dsa: mv88e6xxx: prefix Global Switch MAC macros
Prefix and document the Global Switch MAC Address Register macros and
give clear 16-bit register representation.
At the same time, move mv88e6xxx_g1_set_switch_mac in global1.c, where
it belongs.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 15 Jun 2017 16:13:59 +0000 (12:13 -0400)]
net: dsa: mv88e6xxx: prefix Global Status macros
Prefix and document the Global Status Register macros and give clear
16-bit register representation.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg [Wed, 14 Jun 2017 20:17:20 +0000 (22:17 +0200)]
skbuff: make skb_put_zero() return void
It's nicer to return void, since then there's no need to
cast to any structures. Currently none of the users have
a cast, but a number of future conversions do.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 15 Jun 2017 16:12:41 +0000 (12:12 -0400)]
Merge branch 'net-ktls'
Dave Watson says:
====================
net: kernel TLS
This series adds support for kernel TLS encryption over TCP sockets.
A standard TCP socket is converted to a TLS socket using a setsockopt.
Only symmetric crypto is done in the kernel, as well as TLS record
framing. The handshake remains in userspace, and the negotiated
cipher keys/iv are provided to the TCP socket.
We implemented support for this API in OpenSSL 1.1.0, the code is
available at https://github.com/Mellanox/tls-openssl/tree/master
It should work with any TLS library with similar modifications,
a test tool using gnutls is here: https://github.com/Mellanox/tls-af_ktls_tool
RFC patch to openssl:
https://mta.openssl.org/pipermail/openssl-dev/2017-June/009384.html
Changes from V2:
* EXPORT_SYMBOL_GPL in patch 1
* Ensure cleanup code always called before sk_stream_kill_queues to
avoid warnings
Changes from V1:
* EXPORT_SYMBOL GPL in patch 2
* Add link to OpenSSL patch & gnutls example in documentation patch.
* sk_write_pending check was rolled in to wait_for_memory path,
avoids special case and fixes lock inbalance issue.
* Unify flag handling for sendmsg/sendfile
Changes from RFC V2:
* Generic ULP (upper layer protocol) framework instead of TLS specific
setsockopts
* Dropped Mellanox hardware patches, will come as separate series.
Framework will work for both.
RFC V2:
http://www.mail-archive.com/netdev@vger.kernel.org/msg160317.html
Changes from RFC V1:
* Socket based on changing TCP proto_ops instead of crypto framework
* Merged code with Mellanox's hardware tls offload
* Zerocopy sendmsg support added - sendpage/sendfile is no longer
necessary for zerocopy optimization
RFC V1:
http://www.mail-archive.com/netdev@vger.kernel.org/msg88021.html
* Socket based on crypto userspace API framework, required two
sockets in userspace, one encrypted, one unencrypted.
Paper: https://netdevconf.org/1.2/papers/ktls.pdf
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Dave Watson [Wed, 14 Jun 2017 18:37:51 +0000 (11:37 -0700)]
tls: Documentation
Add documentation for the tcp ULP tls interface.
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dave Watson [Wed, 14 Jun 2017 18:37:39 +0000 (11:37 -0700)]
tls: kernel TLS support
Software implementation of transport layer security, implemented using ULP
infrastructure. tcp proto_ops are replaced with tls equivalents of sendmsg and
sendpage.
Only symmetric crypto is done in the kernel, keys are passed by setsockopt
after the handshake is complete. All control messages are supported via CMSG
data - the actual symmetric encryption is the same, just the message type needs
to be passed separately.
For user API, please see Documentation patch.
Pieces that can be shared between hw and sw implementation
are in tls_main.c
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dave Watson [Wed, 14 Jun 2017 18:37:26 +0000 (11:37 -0700)]
tcp: export do_tcp_sendpages and tcp_rate_check_app_limited functions
Export do_tcp_sendpages and tcp_rate_check_app_limited, since tls will need to
sendpages while the socket is already locked.
tcp_sendpage is exported, but requires the socket lock to not be held already.
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dave Watson [Wed, 14 Jun 2017 18:37:14 +0000 (11:37 -0700)]
tcp: ULP infrastructure
Add the infrustructure for attaching Upper Layer Protocols (ULPs) over TCP
sockets. Based on a similar infrastructure in tcp_cong. The idea is that any
ULP can add its own logic by changing the TCP proto_ops structure to its own
methods.
Example usage:
setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls"));
modules will call:
tcp_register_ulp(&tcp_tls_ulp_ops);
to register/unregister their ulp, with an init function and name.
A list of registered ulps will be returned by tcp_get_available_ulp, which is
hooked up to /proc. Example:
$ cat /proc/sys/net/ipv4/tcp_available_ulp
tls
There is currently no functionality to remove or chain ULPs, but
it should be possible to add these in the future if needed.
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 15 Jun 2017 16:07:16 +0000 (12:07 -0400)]
Merge branch 'Broadcom-DTE-based-PTP-clock'
Arun Parameswaran says:
====================
Add support for Broadcom DTE based PTP clock
This patchset adds support for the DTE based PTP clock for Broadcom SoCs.
The DTE nco based PTP clock can be used in both wired and wireless networks
for precision time-stmaping purposes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Arun Parameswaran [Mon, 12 Jun 2017 20:26:01 +0000 (13:26 -0700)]
ptp: Add a ptp clock driver for Broadcom DTE
This patch adds a ptp clock driver for the Broadcom SoCs using
the Digital timing Engine (DTE) nco.
Signed-off-by: Arun Parameswaran <arun.parameswaran@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arun Parameswaran [Mon, 12 Jun 2017 20:26:00 +0000 (13:26 -0700)]
dt-binding: ptp: add bindings document for dte based ptp clock
Add device tree binding documentation for the Broadcom DTE
PTP clock driver.
Signed-off-by: Arun Parameswaran <arun.parameswaran@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 15 Jun 2017 15:31:37 +0000 (11:31 -0400)]
Merge git://git./linux/kernel/git/davem/net
The conflicts were two cases of overlapping changes in
batman-adv and the qed driver.
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 15 Jun 2017 09:09:47 +0000 (18:09 +0900)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) The netlink attribute passed in to dev_set_alias() is not
necessarily NULL terminated, don't use strlcpy() on it. From
Alexander Potapenko.
2) Fix implementation of atomics in arm64 bpf JIT, from Daniel
Borkmann.
3) Correct the release of netdevs and driver private data in certain
circumstances.
4) Sanitize netlink message length properly in decnet, from Mateusz
Jurczyk.
5) Don't leak kernel data in rtnl_fill_vfinfo() netlink blobs. From
Yuval Mintz.
6) Hash secret is never initialized in ipv6 ILA translation code, from
Arnd Bergmann. I guess those clang warnings about unused inline
functions are useful for something!
7) Fix endian selection in bpf_endian.h, from Daniel Borkmann.
8) Sanitize sockaddr length before dereferncing any fields in AF_UNIX
and CAIF. From Mateusz Jurczyk.
9) Fix timestamping for GMAC3 chips in stmmac driver, from Mario
Molitor.
10) Do not leak netdev on dev_alloc_name() errors in mac80211, from
Johannes Berg.
11) Fix locking in sctp_for_each_endpoint(), from Xin Long.
12) Fix wrong memset size on 32-bit in snmp6, from Christian Perle.
13) Fix use after free in ip_mc_clear_src(), from WANG Cong.
14) Fix regressions caused by ICMP rate limiting changes in 4.11, from
Jesper Dangaard Brouer.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (91 commits)
i40e: Fix a sleep-in-atomic bug
net: don't global ICMP rate limit packets originating from loopback
net/act_pedit: fix an error code
net: update undefined ->ndo_change_mtu() comment
net_sched: move tcf_lock down after gen_replace_estimator()
caif: Add sockaddr length check before accessing sa_family in connect handler
qed: fix dump of context data
qmi_wwan: new Telewell and Sierra device IDs
net: phy: Fix MDIO_THUNDER dependencies
netconsole: Remove duplicate "netconsole: " logging prefix
igmp: acquire pmc lock for ip_mc_clear_src()
r8152: give the device version
net: rps: fix uninitialized symbol warning
mac80211: don't send SMPS action frame in AP mode when not needed
mac80211/wpa: use constant time memory comparison for MACs
mac80211: set bss_info data before configuring the channel
mac80211: remove 5/10 MHz rate code from station MLME
mac80211: Fix incorrect condition when checking rx timestamp
mac80211: don't look at the PM bit of BAR frames
i40e: fix handling of HW ATR eviction
...
Linus Torvalds [Thu, 15 Jun 2017 08:54:51 +0000 (17:54 +0900)]
Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
"This fixes a bug on sparc where we may dereference freed stack memory"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: Work around deallocated stack frame reference gcc bug on sparc.
Linus Torvalds [Thu, 15 Jun 2017 08:51:19 +0000 (17:51 +0900)]
Merge tag 'acpi-4.12-rc6' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These revert an ACPICA commit from the 4.11 cycle that causes problems
to happen on some systems and add a protection against possible kernel
crashes due to table reference counter imbalance.
Specifics:
- Revert a 4.11 ACPICA change that made assumptions which are not
satisfied on some systems and caused the enumeration of resources
to fail on them (Rafael Wysocki).
- Add a mechanism to prevent tables from being unmapped prematurely
due to reference counter overflows (Lv Zheng)"
* tag 'acpi-4.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPICA: Tables: Mechanism to handle late stage acpi_get_table() imbalance
Revert "ACPICA: Disassembler: Enhance resource descriptor detection"
Linus Torvalds [Thu, 15 Jun 2017 08:47:46 +0000 (17:47 +0900)]
Merge tag 'pm-4.12-rc6' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These revert a recent cpufreq schedutil governor change that turned
out to be problematic and fix a few minor issues in cpufreq, cpuidle
and the Exynos devfreq drivers.
Specifics:
- Revert a recent cpufreq schedutil governor change that caused some
systems to behave undesirably (Rafael Wysocki).
- Fix a cpufreq conservative governor issue introduced during the
3.10 cycle that prevents it from working as expected in some
situations (Tomasz Wilczyński).
- Fix an error code path in the generic cpuidle driver for DT-based
systems (Christophe Jaillet).
- Fix three minor issues in devfreq drivers for Exynos (Arvind Yadav,
Krzysztof Kozlowski)"
* tag 'pm-4.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpuidle: dt: Add missing 'of_node_put()'
cpufreq: conservative: Allow down_threshold to take values from 1 to 10
Revert "cpufreq: schedutil: Reduce frequencies slower"
PM / devfreq: exynos-ppmu: Staticize event list
PM / devfreq: exynos-ppmu: Handle return value of clk_prepare_enable
PM / devfreq: exynos-nocp: Handle return value of clk_prepare_enable
Linus Torvalds [Thu, 15 Jun 2017 08:44:41 +0000 (17:44 +0900)]
Merge branch 'for-4.12/driver-matching-fix' of git://git./linux/kernel/git/jikos/hid
Pull HID fix from Jiri Kosina:
- ifdef-based bandaid for a long-standing issue with HID driver
matching, avoiding regressions in cases where specific driver is not
enabled in kernel .config, from Jiri Kosina
* 'for-4.12/driver-matching-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
HID: let generic driver yield control iff specific driver has been enabled
Linus Torvalds [Thu, 15 Jun 2017 08:37:40 +0000 (17:37 +0900)]
Merge tag 'media/v4.12-3' of git://git./linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
- some build dependency issues at CEC core with randconfigs
- fix an off by one error at vb2
- a race fix at cec core
- driver fixes at tc358743, sir_ir and rainshadow-cec
* tag 'media/v4.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
[media] media/cec.h: use IS_REACHABLE instead of IS_ENABLED
[media] cec: race fix: don't return -ENONET in cec_receive()
[media] sir_ir: infinite loop in interrupt handler
[media] cec-notifier.h: handle unreachable CONFIG_CEC_CORE
[media] cec: improve MEDIA_CEC_RC dependencies
[media] vb2: Fix an off by one error in 'vb2_plane_vaddr'
[media] rainshadow-cec: Fix missing spin_lock_init()
[media] tc358743: fix register i2c_rd/wr function fix
Jia-Ju Bai [Wed, 14 Jun 2017 23:35:31 +0000 (16:35 -0700)]
i40e: Fix a sleep-in-atomic bug
The driver may sleep under a spin lock, and the function call path is:
i40e_ndo_set_vf_port_vlan (acquire the lock by spin_lock_bh)
i40e_vsi_remove_pvid
i40e_vlan_stripping_disable
i40e_aq_update_vsi_params
i40e_asq_send_command
mutex_lock --> may sleep
To fixed it, the spin lock is released before "i40e_vsi_remove_pvid", and
the lock is acquired again after this function.
Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rafael J. Wysocki [Wed, 14 Jun 2017 23:52:32 +0000 (01:52 +0200)]
Merge branch 'acpica-fixes'
* acpica-fixes:
ACPICA: Tables: Mechanism to handle late stage acpi_get_table() imbalance
Revert "ACPICA: Disassembler: Enhance resource descriptor detection"
Rafael J. Wysocki [Wed, 14 Jun 2017 23:51:33 +0000 (01:51 +0200)]
Merge branches 'pm-cpufreq', 'pm-cpuidle' and 'pm-devfreq'
* pm-cpufreq:
cpufreq: conservative: Allow down_threshold to take values from 1 to 10
Revert "cpufreq: schedutil: Reduce frequencies slower"
* pm-cpuidle:
cpuidle: dt: Add missing 'of_node_put()'
* pm-devfreq:
PM / devfreq: exynos-ppmu: Staticize event list
PM / devfreq: exynos-ppmu: Handle return value of clk_prepare_enable
PM / devfreq: exynos-nocp: Handle return value of clk_prepare_enable
David Howells [Wed, 14 Jun 2017 16:56:50 +0000 (17:56 +0100)]
rxrpc: Cache the congestion window setting
Cache the congestion window setting that was determined during a call's
transmission phase when it finishes so that it can be used by the next call
to the same peer, thereby shortcutting the slow-start algorithm.
The value is stored in the rxrpc_peer struct and is accessed without
locking. Each call takes the value that happens to be there when it starts
and just overwrites the value when it finishes.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Weilin Chang [Wed, 14 Jun 2017 16:11:31 +0000 (09:11 -0700)]
liquidio: fix VF driver off-by-one bug when setting ethtool -C ethX rx-frames
Signed-off-by: Weilin Chang <weilin.chang@cavium.com>
Signed-off-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesper Dangaard Brouer [Wed, 14 Jun 2017 11:27:37 +0000 (13:27 +0200)]
net: don't global ICMP rate limit packets originating from loopback
Florian Weimer seems to have a glibc test-case which requires that
loopback interfaces does not get ICMP ratelimited. This was broken by
commit
c0303efeab73 ("net: reduce cycles spend on ICMP replies that
gets rate limited").
An ICMP response will usually be routed back-out the same incoming
interface. Thus, take advantage of this and skip global ICMP
ratelimit when the incoming device is loopback. In the unlikely event
that the outgoing it not loopback, due to strange routing policy
rules, ICMP rate limiting still works via peer ratelimiting via
icmpv4_xrlim_allow(). Thus, we should still comply with RFC1812
(section 4.3.2.8 "Rate Limiting").
This seems to fix the reproducer given by Florian. While still
avoiding to perform expensive and unneeded outgoing route lookup for
rate limited packets (in the non-loopback case).
Fixes:
c0303efeab73 ("net: reduce cycles spend on ICMP replies that gets rate limited")
Reported-by: Florian Weimer <fweimer@redhat.com>
Reported-by: "H.J. Lu" <hjl.tools@gmail.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Wed, 14 Jun 2017 10:41:52 +0000 (13:41 +0300)]
net/mlxfw: fix a NULL dereference
If we hit this error path we end up returning ERR_PTR(0) which is NULL.
The caller is not expecting that so it results in a NULL dereference.
Fixes:
410ed13cae39 ("Add the mlxfw module for Mellanox firmware flash process")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Wed, 14 Jun 2017 10:29:31 +0000 (13:29 +0300)]
net/act_pedit: fix an error code
I'm reviewing static checker warnings where we do ERR_PTR(0), which is
the same as NULL. I'm pretty sure we intended to return ERR_PTR(-EINVAL)
here. Sometimes these bugs lead to a NULL dereference but I don't
immediately see that problem here.
Fixes:
71d0ed7079df ("net/act_pedit: Support using offset relative to the conventional network headers")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Amir Vadai <amir@vadai.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Wed, 14 Jun 2017 09:48:48 +0000 (11:48 +0200)]
net: use skb_unref() in napi_consume_skb()
The commit
83ada39bb79d ("net: factor out a helper to decrement the
skb refcount") provided and used a helper for decrementing skb usage,
but I missed at least a spot for it.
This change remove some more duplicated code reusing skb_unref() in
napi_consume_skb(), too. The helper uses an additional, unneeded
unlikely(!skb) test - napi_consume_skb() already check it a few lines
above - but the compiler is smart enough to optimize the duplicated
test out.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 14 Jun 2017 19:22:17 +0000 (15:22 -0400)]
Merge branch 'for-upstream' of git://git./linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:
====================
pull request: bluetooth-next 2017-06-14
Here's another batch of Bluetooth patches for the 4.13 kernel:
- Fix for Broadcom controllers not supporting Event Mask Page 2
- New QCA ROME USB ID for btusb
- Fix for Security Manager Protocol to use constant-time memcmp
- Improved support for TI WiLink chips
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Wed, 14 Jun 2017 09:10:10 +0000 (12:10 +0300)]
qed: Fix an off by one bug
The p_l2_info->pp_qid_usage[] array has "p_l2_info->queues" elements so
the > here should be a >= or we write beyond the end of the array.
Fixes:
bbe3f233ec5e ("qed: Assign a unique per-queue index to queue-cid")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 14 Jun 2017 19:16:31 +0000 (15:16 -0400)]
Merge branch 'mlxsw-Add-support-for-cable-info-access'
Jiri Pirko says:
====================
mlxsw: Add support for cable info access
Add support for cable info access via ethtool. This is done by accessing
the SFP+/QSFP internal EEPROM.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Arkadi Sharshevsky [Wed, 14 Jun 2017 07:27:40 +0000 (09:27 +0200)]
mlxsw: spectrum: Add support for access cable info via ethtool
Add support for access cable info via ethtool.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arkadi Sharshevsky [Wed, 14 Jun 2017 07:27:39 +0000 (09:27 +0200)]
mlxsw: reg: Add MCIA register for cable info access
The MCIA register is used to access the SFP+ and QSFP connector's
EPROM. It will be used to query the cable info.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Magnus Damm [Wed, 14 Jun 2017 07:15:24 +0000 (16:15 +0900)]
net: update undefined ->ndo_change_mtu() comment
Update ->ndo_change_mtu() callback comment to remove text
about returning error in case of undefined callback. This
change makes the comment match the existing code behavior.
Signed-off-by: Magnus Damm <damm+renesas@opensource.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 14 Jun 2017 19:03:23 +0000 (15:03 -0400)]
Merge branch 'bpf-MIPS-infra'
David Daney says:
====================
bpf: Changes needed (or desired) for MIPS support
This is a grab bag of changes to the bpf testing infrastructure I
developed working on MIPS eBPF JIT support. The change to
bpf_jit_disasm is probably universally beneficial, the others are more
MIPS specific.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David Daney [Tue, 13 Jun 2017 23:49:38 +0000 (16:49 -0700)]
samples/bpf: Fix tracex5 to work with MIPS syscalls.
There are two problems:
1) In MIPS the __NR_* macros expand to an expression, this causes the
sections of the object file to be named like:
.
.
.
[ 5] kprobe/(5000 + 1) PROGBITS
0000000000000000 000160 ...
[ 6] kprobe/(5000 + 0) PROGBITS
0000000000000000 000258 ...
[ 7] kprobe/(5000 + 9) PROGBITS
0000000000000000 000348 ...
.
.
.
The fix here is to use the "asm_offsets" trick to evaluate the macros
in the C compiler and generate a header file with a usable form of the
macros.
2) MIPS syscall numbers start at 5000, so we need a bigger map to hold
the sub-programs.
Signed-off-by: David Daney <david.daney@cavium.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Daney [Tue, 13 Jun 2017 23:49:37 +0000 (16:49 -0700)]
bpf: Add MIPS support to samples/bpf.
Signed-off-by: David Daney <david.daney@cavium.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Daney [Tue, 13 Jun 2017 23:49:36 +0000 (16:49 -0700)]
test_bpf: Add test to make conditional jump cross a large number of insns.
On MIPS, conditional branches can only span 32k instructions. To
exceed this limit in the JIT with the BPF maximum of 4k insns, we need
to choose eBPF insns that expand to more than 8 machine instructions.
Use BPF_LD_ABS as it is quite complex. This forces the JIT to invert
the sense of the branch to branch around a long jump to the end.
This (somewhat) verifies that the branch inversion logic and target
address calculation of the long jumps are done correctly.
Signed-off-by: David Daney <david.daney@cavium.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Daney [Tue, 13 Jun 2017 23:49:35 +0000 (16:49 -0700)]
tools: bpf_jit_disasm: Handle large images.
Dynamically allocate memory so that JIT images larger than the size of
the statically allocated array can be handled.
Signed-off-by: David Daney <david.daney@cavium.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 14 Jun 2017 18:56:26 +0000 (14:56 -0400)]
Merge branch 'bpf-ctx-narrow'
Yonghong Song says:
====================
bpf: permit bpf program narrower loads for ctx fields
Today, if users try to access a ctx field through a narrower load, e.g.,
__be16 prot = __sk_buff->protocol, verifier will fail.
This set contains the verifier change to permit such loads for
certain ctx fields as well as the new test cases in selftests/bpf.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Yonghong Song [Tue, 13 Jun 2017 22:52:14 +0000 (15:52 -0700)]
selftests/bpf: Add test cases to test narrower ctx field loads
Add test cases in test_verifier and test_progs.
Negative tests are added in test_verifier as well.
The test in test_progs will compare the value of narrower ctx field
load result vs. the masked value of normal full-field load result,
and will fail if they are not the same.
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yonghong Song [Tue, 13 Jun 2017 22:52:13 +0000 (15:52 -0700)]
bpf: permits narrower load from bpf program context fields
Currently, verifier will reject a program if it contains an
narrower load from the bpf context structure. For example,
__u8 h = __sk_buff->hash, or
__u16 p = __sk_buff->protocol
__u32 sample_period = bpf_perf_event_data->sample_period
which are narrower loads of 4-byte or 8-byte field.
This patch solves the issue by:
. Introduce a new parameter ctx_field_size to carry the
field size of narrower load from prog type
specific *__is_valid_access validator back to verifier.
. The non-zero ctx_field_size for a memory access indicates
(1). underlying prog type specific convert_ctx_accesses
supporting non-whole-field access
(2). the current insn is a narrower or whole field access.
. In verifier, for such loads where load memory size is
less than ctx_field_size, verifier transforms it
to a full field load followed by proper masking.
. Currently, __sk_buff and bpf_perf_event_data->sample_period
are supporting narrowing loads.
. Narrower stores are still not allowed as typical ctx stores
are just normal stores.
Because of this change, some tests in verifier will fail and
these tests are removed. As a bonus, rename some out of bound
__sk_buff->cb access to proper field name and remove two
redundant "skb cb oob" tests.
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Tue, 13 Jun 2017 20:36:24 +0000 (13:36 -0700)]
net_sched: move tcf_lock down after gen_replace_estimator()
Laura reported a sleep-in-atomic kernel warning inside
tcf_act_police_init() which calls gen_replace_estimator() with
spinlock protection.
It is not necessary in this case, we already have RTNL lock here
so it is enough to protect concurrent writers. For the reader,
i.e. tcf_act_police(), it needs to make decision based on this
rate estimator, in the worst case we drop more/less packets than
necessary while changing the rate in parallel, it is still acceptable.
Reported-by: Laura Abbott <labbott@redhat.com>
Reported-by: Nick Huber <nicholashuber@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zhang Shengju [Tue, 13 Jun 2017 14:45:11 +0000 (22:45 +0800)]
macvlan: propagate the mac address change status for lowerdev
The macvlan dev should propagate the return value of mac address change for
lower device in the passthru mode, instead of always return 0.
Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 14 Jun 2017 18:26:21 +0000 (14:26 -0400)]
Merge branch '10GbE' of git://git./linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
10GbE Intel Wired LAN Driver Updates 2017-06-13
This series contains updates to ixgbe and ixgbevf only.
Jake completes his fix ups for our drivers with the ixgbe changes to
resolve a race condition in processing timestamp requests. These fixes
are the same fixes Jake applied earlier to the other drivers, including
the added statistic to help administrators know when an application
timestamp request is ignored.
With all the recent ixgbe/ixgbevf changes and fixes, Tony bumps the
the driver versions. Then Tony provides a fix to resolve a static
analysis warning by changing a variable to unsigned integer since the
value can never be negative.
Emil fixes an issue for X550 devices where the qde parameter was being
ignored, so PFQDE.HIDE_VLAN was not being set.
Jeff Mahoney from SuSE fixes a possible kernel crash, where there was
a small window where tasks writing to the sriov_numvfs sysfs attribute
can sneak in after we call register_netdev(). So we need to call
pci_set_drvdata() before and not after register_netdev() to preserve the
intent of commit
0fb6a55cc31f ("ixgbe: fix crash on rmmod after probe
fail").
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Shih-Yuan Lee (FourDollars) [Wed, 14 Jun 2017 06:42:00 +0000 (14:42 +0800)]
Bluetooth: btusb: Add support for 0489:e0a2 QCA_ROME device
T: Bus=01 Lev=01 Prnt=01 Port=06 Cnt=03 Dev#= 3 Spd=12 MxCh= 0
D: Ver= 1.10 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs= 1
P: Vendor=0489 ProdID=e0a2 Rev=00.01
C: #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=100mA
I: If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
I: If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
Signed-off-by: Shih-Yuan Lee (FourDollars) <sylee@canonical.com>
Suggested-by: Owen Lin <olin@rivetnetworks.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Jeff Mahoney [Sat, 3 Jun 2017 22:01:17 +0000 (18:01 -0400)]
ixgbe: pci_set_drvdata must be called before register_netdev
We call pci_set_drvdata immediately after calling register_netdev,
which leaves a window where tasks writing to the sriov_numvfs sysfs
attribute can sneak in and crash the kernel. register_netdev cleans
up after itself so placing pci_set_drvdata immediately before it
should preserve the intent of commit
0fb6a55cc31f ("ixgbe: fix crash
on rmmod after probe fail").
Fixes:
0fb6a55cc31f ("ixgbe: fix crash on rmmod after probe fail")
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tony Nguyen [Thu, 1 Jun 2017 19:06:05 +0000 (12:06 -0700)]
ixgbe: Resolve cppcheck format string warning
cppcheck warns that the format string is incorrect in the function
ixgbe_get_strings(). Since the value cannot be negative, change the
variable to unsigned which matches the format specifier.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Emil Tantilov [Tue, 23 May 2017 21:02:23 +0000 (14:02 -0700)]
ixgbe: fix writes to PFQDE
ixgbe_write_qde() was ignoring the qde parameter which resulted
in PFQDE.HIDE_VLAN not being set for X550.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tony Nguyen [Thu, 18 May 2017 21:55:23 +0000 (14:55 -0700)]
ixgbevf: Bump version number
Update ixgbevf version number.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tony Nguyen [Thu, 18 May 2017 21:55:07 +0000 (14:55 -0700)]
ixgbe: Bump version number
Update ixgbe version number.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Wed, 3 May 2017 17:29:04 +0000 (10:29 -0700)]
ixgbe: check for Tx timestamp timeouts during watchdog
The ixgbe driver has logic to handle only one Tx timestamp at a time,
using a state bit lock to avoid multiple requests at once.
It may be possible, if incredibly unlikely, that a Tx timestamp event is
requested but never completes. Since we use an interrupt scheme to
determine when the Tx timestamp occurred we would never clear the state
bit in this case.
Add an ixgbe_ptp_tx_hang() function similar to the already existing
ixgbe_ptp_rx_hang() function. This function runs in the watchdog routine
and makes sure we eventually recover from this case instead of
permanently disabling Tx timestamps.
Note: there is no currently known way to cause this without hacking the
driver code to force it.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Wed, 3 May 2017 17:29:00 +0000 (10:29 -0700)]
ixgbe: add statistic indicating number of skipped Tx timestamps
The ixgbe driver can only handle one Tx timestamp request at a time.
This means it is possible for an application timestamp request to be
ignored.
There is no easy way for an administrator to determine if this occurred.
Add a new statistic which tracks this, tx_hwtstamp_skipped.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Wed, 3 May 2017 17:28:56 +0000 (10:28 -0700)]
ixgbe: avoid permanent lock of *_PTP_TX_IN_PROGRESS
The ixgbe driver uses a state bit lock to avoid handling more than one Tx
timestamp request at once. This is required because hardware is limited
to a single set of registers for Tx timestamps.
The state bit lock is not properly cleaned up during
ixgbe_xmit_frame_ring() if the transmit fails such as due to DMA or TSO
failure. In some hardware this results in blocking timestamps until the
service task times out. In other hardware this results in a permanent
lock of the timestamp bit because we never receive an interrupt
indicating the timestamp occurred, since indeed the packet was never
transmitted.
Fix this by checking for DMA and TSO errors in ixgbe_xmit_frame_ring() and
properly cleaning up after ourselves when these occur.
Reported-by: Reported-by: David Mirabito <davidm@metamako.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Wed, 3 May 2017 17:28:53 +0000 (10:28 -0700)]
ixgbe: fix race condition with PTP_TX_IN_PROGRESS bits
Hardware related to the ixgbe driver is limited to handling a single Tx
timestamp request at a time. Thus, the driver ignores requests for Tx
timestamp while waiting for the current request to finish. It uses
a state bit lock which enforces that only one timestamp request is
honored at a time.
Unfortunately this suffers from a simple race condition. The bit lock is
not cleared until after skb_tstamp_tx() is called notifying applications
of a new Tx timestamp. Even a well behaved application sending only one
packet at a time and waiting for a response can wake up and send a new
packet before the bit lock is cleared. This results in needlessly
dropping some Tx timestamp requests.
We can fix this by unlocking the state bit as soon as we read the
Timestamp register, as this is the first point at which it is safe to
unlock.
To avoid issues with the skb pointer, we'll use a copy of the pointer
and set the global variable in the driver structure to NULL first. This
ensures that the next timestamp request does not modify our local copy
of the skb pointer.
This ensures that well behaved applications do not accidentally race
with the unlock bit. Obviously an application which sends multiple Tx
timestamp requests at once will still only timestamp one packet at
a time. Unfortunately there is nothing we can do about this.
Reported-by: David Mirabito <davidm@metamako.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
David S. Miller [Tue, 13 Jun 2017 20:35:04 +0000 (16:35 -0400)]
Merge branch 'net-dsa-Multi-CPU-ground-work'
Florian Fainelli says:
====================
net: dsa: Multi-CPU ground work (v4)
This patch series prepares the ground for adding mutliple CPU port support to
DSA, and starts by removing redundant pieces of information such as
master_netdev which is cpu_dp->ethernet. Finally drivers are moved away from
directly accessing ds->dst->cpu_dp and use appropriate helper functions.
Note that if you have Device Tree blobs/platform configurations that are
currently listing multiple CPU ports, the proposed behavior in
dsa_ds_get_cpu_dp() will be to return the last bit set in ds->cpu_port_mask.
Future plans include:
- making dst->cpu_dp a flexible data structure (array, list, you name it)
- having the ability for drivers to return a default/preferred CPU port (if
necessary)
Changes in v4:
- fixed build warning with NETPOLL enabled
Changes in v3:
- removed the last patch since it causes problems with bcm_sf2/b53 in a
dual-CPU case (root cause known, proper fix underway)
- removed dsa_ds_get_cpu_dp()
Changes in v2:
- added Reviewed-by tags
- assign port->cpu_dp earlier before ops->setup() has run
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 13 Jun 2017 20:27:22 +0000 (13:27 -0700)]
net: dsa: Introduce dsa_get_cpu_port()
Introduce a helper function which will return a reference to the CPU
port used in a dsa_switch_tree. Right now this is a singleton, but this
will change once we introduce multi-CPU port support, so ease the
transition by converting the affected code paths.
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 13 Jun 2017 20:27:21 +0000 (13:27 -0700)]
net: dsa: Associate slave network device with CPU port
In preparation for supporting multiple CPU ports with DSA, have the
dsa_port structure know which CPU it is associated with. This will be
important in order to make sure the correct CPU is used for transmission
of the frames. If not for functional reasons, for performance (e.g: load
balancing) and forwarding decisions.
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 13 Jun 2017 20:27:20 +0000 (13:27 -0700)]
net: dsa: Relocate master ethtool operations
Relocate master_ethtool_ops and master_orig_ethtool_ops into struct
dsa_port in order to be both consistent, and make things self contained
within the dsa_port structure.
This is a preliminary change to supporting multiple CPU port interfaces.
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 13 Jun 2017 20:27:19 +0000 (13:27 -0700)]
net: dsa: Remove master_netdev and use dst->cpu_dp->netdev
In preparation for supporting multiple CPU ports, remove
dst->master_netdev and ds->master_netdev and replace them with only one
instance of the common object we have for a port: struct
dsa_port::netdev. ds->master_netdev is currently write only and would be
helpful in the case where we have two switches, both with CPU ports, and
also connected within each other, which the multi-CPU port patch series
would address.
While at it, introduce a helper function used in net/dsa/slave.c to
immediately get a reference on the master network device called
dsa_master_netdev().
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mateusz Jurczyk [Tue, 13 Jun 2017 18:06:12 +0000 (20:06 +0200)]
caif: Add sockaddr length check before accessing sa_family in connect handler
Verify that the caller-provided sockaddr structure is large enough to
contain the sa_family field, before accessing it in the connect()
handler of the AF_CAIF socket. Since the syscall doesn't enforce a minimum
size of the corresponding memory region, very short sockaddrs (zero or one
byte long) result in operating on uninitialized memory while referencing
sa_family.
Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ganesh Goudar [Tue, 13 Jun 2017 19:15:43 +0000 (00:45 +0530)]
cxgb4: handle serial flash interrupt
If SF bit is not cleared in PL_INT_CAUSE, subsequent non-data
interrupts are not raised. Enable SF bit in Global Interrupt
Mask and handle it as non-fatal and hence eventually clear it.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Mason [Tue, 13 Jun 2017 14:56:08 +0000 (10:56 -0400)]
of_mdio: move of_mdio_parse_addr to header file
The of_mdio_parse_addr() helper function is useful to other code, but
the module dependency chain causes issues. To work around this, we can
move of_mdio_parse_addr() to be an inline function in the header file.
This gets rid of the dependencies and still allows for the reuse of
code.
Reported-by: Liviu Dudau <liviu@dudau.co.uk>
Signed-off-by: Jon Mason <jon.mason@broadcom.com>
Fixes:
342fa1964439 ("mdio: mux: make child bus walking more permissive and errors more verbose")
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesper Dangaard Brouer [Tue, 13 Jun 2017 13:17:19 +0000 (15:17 +0200)]
selftests/bpf: make correct use of exit codes in bpf selftests
The selftests depend on using the shell exit code as a mean of
detecting the success or failure of test-binary executed. The
appropiate output "[PASS]" or "[FAIL]" in generated by
tools/testing/selftests/lib.mk.
Notice that the exit code is masked with 255. Thus, be careful if
using the number of errors as the exit code, as 256 errors would be
seen as a success.
There are two standard defined exit(3) codes:
/usr/include/stdlib.h
#define EXIT_FAILURE 1 /* Failing exit status. */
#define EXIT_SUCCESS 0 /* Successful exit status. */
Fix test_verifier.c to not use the negative value of variable
"results", but instead return EXIT_FAILURE.
Fix test_align.c and test_progs.c to actually use exit codes, before
they were always indicating success regardless of results.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>