Jakub Kicinski [Mon, 22 May 2017 17:59:32 +0000 (10:59 -0700)]
nfp: mark port state as stale after reconfig
After port configuration is performed mark it as changed. This
will close a window of time between configuration and async
state refresh which runs from a workqueue where old port state
would be reported.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 22 May 2017 17:59:31 +0000 (10:59 -0700)]
nfp: provide linking on port structures
Add link to nfp_ports to make it possible to iterate over all ports.
This will come in handy when some ports may be representors.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 22 May 2017 17:59:30 +0000 (10:59 -0700)]
nfp: move refresh tracking into the port structure
Track whether the physical port's state has changed since the last refresh
inside the nfp_port structure instead of the vNIC structure.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 22 May 2017 17:59:29 +0000 (10:59 -0700)]
nfp: update port state in place
Always updating port state in place by overriding values in the existing
pf->eth_tbl makes things easier to manage and allows us to have a
common helper for both full and per-port refresh.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 22 May 2017 17:59:28 +0000 (10:59 -0700)]
nfp: introduce nfp_port
Encapsulate port information into struct nfp_port. nfp_port will
soon be extended to contain devlink_port information. It also makes
it easier to reuse port-related code between vNICs and representors.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 22 May 2017 17:59:27 +0000 (10:59 -0700)]
nfp: disallow mixing vNICs with and without NSP port entry
We only support core NIC apps which have vNICs for each physical port/
split and no representors right now. Enforce that either each vNIC has
an NSP eth_table entry or, if the NSP port table is not available, none do.
One scenario this will prevent is a user force-loading the
wrong firmware file if the FW app requires different firmware per media
config.
While at it move some code to nfp_net_pf_alloc_vnic() to make it
counter-match nfp_net_pf_free_vnic() better.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 22 May 2017 17:59:26 +0000 (10:59 -0700)]
nfp: introduce very minimal nfp_app
Introduce a concept of an application. For now it's just grouping
pointers and serving as a layer of indirection. It will help us
weaken the dependency on nfp_net in ethtool code. Later series
will flesh out support for different apps in the driver.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 22 May 2017 17:59:25 +0000 (10:59 -0700)]
nfp: add nfp_net_pf_free_vnic() function
Soon a third place will need to free a struct nfp_net. Add a free
counterpart to nfp_net_pf_alloc_vnic().
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 22 May 2017 17:59:24 +0000 (10:59 -0700)]
nfp: rename netdev/port to vNIC
vNIC is a PCIe-side abstraction NFP firmwares supported by this
driver use. It was initially meant to represent a device port
and therefore a netdev but today should be thought of as a way
of grouping descriptor rings and associated state. Advanced apps
will have vNICs without netdevs and ports without a vNIC (using
representors instead).
Make sure code refers to vNICs as vNICs and not ports or netdevs.
No functional changes.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 22 May 2017 17:59:23 +0000 (10:59 -0700)]
nfp: make nfp_net alloc/init/cleanup/free not depend on netdevs
struct nfp_net represents a vNIC; we will be moving away from the
requirement for every vNIC to have a netdev associated with it.
Remove "netdev" from some function names and prefer passing
struct nfp_net pointer as argument instead of struct net_device *.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Horman [Mon, 22 May 2017 17:59:22 +0000 (10:59 -0700)]
nfp: add nfp_cppcore_pcie_unit() helper
Add nfp_cppcore_pcie_unit() helper to retrieve the PCIE unit of a CPP
handle and use the new helper as appropriate.
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Vecera [Fri, 19 May 2017 17:30:43 +0000 (19:30 +0200)]
bridge: fix hello and hold timers starting/stopping
Current bridge code incorrectly handles starting/stopping of hello and
hold timers during STP enable/disable.
1. Timers are stopped in br_stp_start() during NO_STP->USER_STP
transition. The timers are already stopped in NO_STP state so
this is a confusing no-op.
2. During USER_STP->NO_STP transition the timers are started. This
does not make sense and is confusing because the timers should not be
active in NO_STP state.
Cc: davem@davemloft.net
Cc: sashok@cumulusnetworks.com
Cc: stephen@networkplumber.org
Cc: bridge@lists.linux-foundation.org
Cc: lucien.xin@gmail.com
Cc: nikolay@cumulusnetworks.com
Signed-off-by: Ivan Vecera <cera@cera.cz>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Holger Brunck [Mon, 22 May 2017 07:31:15 +0000 (09:31 +0200)]
net/wan/fsl_ucc_hdlc: fix muram allocation error
sizeof(priv->ucc_pram) is 4 as it is the size of a pointer, but we want
to reserve space for the struct ucc_hdlc_param.
Signed-off-by: Holger Brunck <holger.brunck@keymile.com>
Cc: Zhao Qiang <qiang.zhao@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
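The sizing mistake described in the fsl_ucc_hdlc fix above can be shown in
isolation. This is a minimal sketch, not the driver code: struct demo_param
and struct demo_priv are hypothetical stand-ins for struct ucc_hdlc_param and
the driver's private data, and the layout is illustrative only.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct ucc_hdlc_param; layout is illustrative. */
struct demo_param {
	unsigned int words[16];
};

/* Hypothetical stand-in for the driver's private structure. */
struct demo_priv {
	struct demo_param *ucc_pram;	/* pointer to the parameter block */
};

/* Buggy sizing: yields the size of the pointer, not of the block. */
static size_t demo_size_buggy(const struct demo_priv *priv)
{
	return sizeof(priv->ucc_pram);
}

/* Intended sizing: the size of the structure being pointed to. */
static size_t demo_size_fixed(const struct demo_priv *priv)
{
	return sizeof(*priv->ucc_pram);
}
```

Neither sizeof expression dereferences the pointer at run time, so both
helpers are safe to call even when ucc_pram is NULL.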
Rohit Chavan [Mon, 22 May 2017 06:29:15 +0000 (11:59 +0530)]
net: ipv4: tcp: fixed comment coding style issue
Fixed a coding style issue
Signed-off-by: Rohit Chavan <roheetchavan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rosen, Rami [Sun, 21 May 2017 19:12:38 +0000 (22:12 +0300)]
net: socket: fix a typo in sockfd_lookup().
This patch fixes a typo in sockfd_lookup() in net/socket.c.
Signed-off-by: Rami Rosen <rami.rosen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 22 May 2017 16:12:21 +0000 (12:12 -0400)]
Merge branch 'netlink-extack-route-add-del'
David Ahern says:
====================
net: Add extack for route add/delete failures
Use the extack feature to improve error messages to user on route
add and delete failures.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Sun, 21 May 2017 16:12:05 +0000 (10:12 -0600)]
net: ipv6: Add extack messages for route add failures
Add messages for non-obvious errors (e.g., no need to add text for malloc
failures or ENODEV failures). This mostly covers the annoying EINVAL errors.
Some message strings violate the 80-column limit, but searchable strings need to
trump that rule.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Sun, 21 May 2017 16:12:04 +0000 (10:12 -0600)]
net: ipv6: Plumb extack through route add functions
Plumb extack argument down to route add functions.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Sun, 21 May 2017 16:12:03 +0000 (10:12 -0600)]
net: ipv4: Add extack messages for route add failures
Add messages for non-obvious errors (e.g., no need to add text for malloc
failures or ENODEV failures). This mostly covers the annoying EINVAL errors.
Some message strings violate the 80-column limit, but searchable strings need to
trump that rule.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Sun, 21 May 2017 16:12:02 +0000 (10:12 -0600)]
net: ipv4: Plumb extack through route add functions
Plumb extack argument down to route add functions.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Girish Moodalbail [Fri, 19 May 2017 22:25:44 +0000 (15:25 -0700)]
macsec: double accounting of dropped rx/tx packets
The macsec implementation shouldn't account for rx/tx packets that are
dropped in the netdev framework. The netdev framework itself accounts
for such packets by atomically updating struct net_device`rx_dropped and
struct net_device`tx_dropped fields. Later on, when the stats for the macsec
link are retrieved, the packets dropped in the netdev framework will be
included in dev_get_stats() after calling macsec.c`macsec_get_stats64().
Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 22 May 2017 14:26:24 +0000 (10:26 -0400)]
net: Fix parisc SCM_TIMESTAMPING_PKTINFO value.
Needs to follow the existing sequence.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 22 May 2017 03:13:37 +0000 (23:13 -0400)]
net: Define SCM_TIMESTAMPING_PKTINFO on all architectures.
A definition was only provided for platforms using
asm-generic/socket.h; define it for the others as well.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sun, 21 May 2017 17:39:00 +0000 (10:39 -0700)]
tcp: fix tcp_probe_timer() for TCP_USER_TIMEOUT
TCP_USER_TIMEOUT is still converted to a jiffies value in
icsk_user_timeout, so we need to make a conversion for the cases
where HZ != 1000.
Fixes: 9a568de4818d ("tcp: switch TCP TS option (RFC 7323) to 1ms clock")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
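The unit mismatch behind the tcp_probe_timer() fix above can be sketched
as follows. This is an illustration under assumptions, not the kernel code:
demo_jiffies_to_msecs() and demo_timeout_expired() are hypothetical helpers,
and the tick rate is passed in as a parameter instead of using the kernel's
HZ constant.

```c
/* Hypothetical helper: convert a tick count to milliseconds for a tick rate. */
static unsigned int demo_jiffies_to_msecs(unsigned int ticks, unsigned int hz)
{
	return (unsigned int)((unsigned long long)ticks * 1000ULL / hz);
}

/*
 * Returns 1 when elapsed_ms has reached a timeout that is stored in ticks.
 * The timeout is converted to milliseconds first so both sides of the
 * comparison use the same unit.
 */
static int demo_timeout_expired(unsigned int elapsed_ms,
				unsigned int timeout_ticks, unsigned int hz)
{
	return elapsed_ms >= demo_jiffies_to_msecs(timeout_ticks, hz);
}
```

With a tick rate of 250, a timeout of 250 ticks is one second. Comparing 500
elapsed milliseconds directly against the raw value 250 would report an
expiry after only half of that time, which is the kind of error the
conversion avoids. With a tick rate of 1000 the two units coincide.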
stephen hemminger [Fri, 19 May 2017 16:55:55 +0000 (09:55 -0700)]
ipv6: drop unused variables in seg6_genl_dumphac
The seg6_pernet_data variable was set but never used.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Fri, 19 May 2017 16:55:54 +0000 (09:55 -0700)]
fou: make local function static
The build header functions are not used by any other code.
net/ipv6/fou6.c:36:5: warning: no previous prototype for ‘fou6_build_header’ [-Wmissing-prototypes]
net/ipv6/fou6.c:54:5: warning: no previous prototype for ‘gue6_build_header’ [-Wmissing-prototypes]
Need to do some code rearranging to satisfy different Kconfig possibilities.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Fri, 19 May 2017 16:55:52 +0000 (09:55 -0700)]
tcpnv: do not export local function
The TCP New Vegas congestion control was exporting an internal
function tcpnv_get_info which is not used by any other in tree
kernel code. Make it static.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Fri, 19 May 2017 16:55:51 +0000 (09:55 -0700)]
inet: fix warning about missing prototype
The prototype for inet_rcv_saddr_equal was not being included.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Fri, 19 May 2017 16:55:49 +0000 (09:55 -0700)]
ila: propagate error code in ila_output
This warning:
net/ipv6/ila/ila_lwt.c: In function ‘ila_output’:
net/ipv6/ila/ila_lwt.c:42:6: warning: variable ‘err’ set but not used [-Wunused-but-set-variable]
It looks like the code attempts to propagate different error
values, but always returned -EINVAL.
Compile tested only. Needs review by original author.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
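The error-propagation pattern that the ila_output change above moves to can
be sketched like this. It is a generic illustration, not the ila code:
demo_step() and demo_output() are hypothetical names, and the failing step is
simulated by a parameter.

```c
#include <errno.h>

/* Hypothetical step that fails with the given errno value, or succeeds on 0. */
static int demo_step(int fail_with)
{
	return fail_with ? -fail_with : 0;
}

/*
 * The error path returns the code recorded in err. The pre-fix shape
 * described in the commit also assigned err but then returned -EINVAL
 * unconditionally, which is what triggered the compiler warning.
 */
static int demo_output(int fail_with)
{
	int err = -EINVAL;

	err = demo_step(fail_with);
	if (err)
		goto drop;

	return 0;

drop:
	return err;
}
```

Callers can then distinguish, for example, an allocation failure from an
invalid argument instead of always seeing -EINVAL.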
stephen hemminger [Fri, 19 May 2017 16:55:48 +0000 (09:55 -0700)]
dcb: enforce minimum length on IEEE_APPS attribute
Found by reviewing the warning about an unused policy table.
The code implies that it meant to check for size, but since
it unrolled the loop for attribute validation, the policy table is never
used. Instead do an explicit check for the attribute.
Compile tested only. Needs review by original author.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 21 May 2017 17:37:35 +0000 (13:37 -0400)]
Merge branch 'net-extend-socket-timestamping-API'
Miroslav Lichvar says:
====================
Extend socket timestamping API
Changes v5->v6:
- fixed skb_is_swtx_tstamp() when OPT_TX_SWHW is disabled and improved
its description
- improved OPT_PKTINFO documentation
- improved scm_timestamping documentation
Changes v4->v5:
- fixed initialization of reserved fields in struct scm_ts_pktinfo
Changes v3->v4:
- added reserved fields to struct scm_ts_pktinfo
- replaced patch fixing false SW timestamps with a documentation fix
- updated OPT_TX_SWHW patch to handle false SW timestamps
Changes v2->v3:
- modified struct scm_ts_pktinfo to use fixed-width integer types
- added WARN_ON_ONCE for missing RCU lock in dev_get_by_napi_id()
- modified dev_get_by_napi_id() to not return dev in unexpected branch
- modified recv to return SCM_TIMESTAMPING_PKTINFO even if the interface
index is unknown
Changes v1->v2:
- added separate patch for new NAPI functions
- split code from __sock_recv_timestamp() for better readability
- fixed RCU locking
- fixed compiler warning (missing case in switch in first patch)
- inline sw_tx_timestamp() in its only user
Changes RFC->v1:
- reworked SOF_TIMESTAMPING_OPT_PKTINFO patch to not add new fields to
skb shared info (net device is now looked up by napi_id), not require
any changes in drivers, and restrict the cmsg to incoming packets
- renamed SOF_TIMESTAMPING_OPT_MULTIMSG to SOF_TIMESTAMPING_OPT_TX_SWHW
and fixed its description
- moved struct scm_ts_pktinfo from errqueue.h to net_tstamp.h as it
can't be received from the error queue anymore
- improved commit descriptions and removed incorrect comment
This patchset adds new options to the timestamping API that will be
useful for NTP implementations and possibly other applications.
The first patch specifies a timestamp filter for NTP packets. The second
patch updates drivers that can timestamp all packets, or need to list
the filter as unsupported. There is no attempt to add the support to the
phyter driver.
The third patch adds two helper functions working with NAPI ID, which is
needed by the next patch. The fourth patch adds a new option to get a
new control message with the L2 length and interface index for incoming
packets with hardware timestamps.
The fifth patch fixes documentation on number of non-zero fields in
scm_timestamping and warns about false software timestamps when
SO_TIMESTAMP(NS) is combined with SCM_TIMESTAMPING.
The sixth patch adds a new option to request both software and hardware
timestamps for outgoing packets. The seventh patch updates drivers that
assumed software timestamping cannot be used together with hardware
timestamping.
The patches have been tested on x86_64 machines with igb and e1000e
drivers.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Miroslav Lichvar [Fri, 19 May 2017 15:52:41 +0000 (17:52 +0200)]
net: ethernet: update drivers to make both SW and HW TX timestamps
Some drivers were calling the skb_tx_timestamp() function only when
a hardware timestamp was not requested. Now that applications can use
the SOF_TIMESTAMPING_OPT_TX_SWHW option to request both software and
hardware timestamps, the drivers need to be modified to unconditionally
call skb_tx_timestamp().
CC: Richard Cochran <richardcochran@gmail.com>
CC: Willem de Bruijn <willemb@google.com>
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Miroslav Lichvar [Fri, 19 May 2017 15:52:40 +0000 (17:52 +0200)]
net: allow simultaneous SW and HW transmit timestamping
Add SOF_TIMESTAMPING_OPT_TX_SWHW option to allow an outgoing packet to
be looped to the socket's error queue with a software timestamp even
when a hardware transmit timestamp is expected to be provided by the
driver.
Applications using this option will receive two separate messages from
the error queue, one with a software timestamp and the other with a
hardware timestamp. As the hardware timestamp is saved to the shared skb
info, which may happen before the first message with software timestamp
is received by the application, the hardware timestamp is copied to the
SCM_TIMESTAMPING control message only when the skb has no software
timestamp or it is an incoming packet.
While changing sw_tx_timestamp(), inline it in skb_tx_timestamp() as
there are no other users.
CC: Richard Cochran <richardcochran@gmail.com>
CC: Willem de Bruijn <willemb@google.com>
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Miroslav Lichvar [Fri, 19 May 2017 15:52:39 +0000 (17:52 +0200)]
net: fix documentation of struct scm_timestamping
The scm_timestamping struct may return multiple non-zero fields, e.g.
when both software and hardware RX timestamping is enabled, or when the
SO_TIMESTAMP(NS) option is combined with SCM_TIMESTAMPING and a false
software timestamp is generated in the recvmsg() call in order to always
return a SCM_TIMESTAMP(NS) message.
CC: Richard Cochran <richardcochran@gmail.com>
CC: Willem de Bruijn <willemb@google.com>
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Miroslav Lichvar [Fri, 19 May 2017 15:52:38 +0000 (17:52 +0200)]
net: add new control message for incoming HW-timestamped packets
Add SOF_TIMESTAMPING_OPT_PKTINFO option to request a new control message
for incoming packets with hardware timestamps. It contains the index of
the real interface which received the packet and the length of the
packet at layer 2.
The index is useful with bonding, bridges and other interfaces, where
IP_PKTINFO doesn't allow applications to determine which PHC made the
timestamp. With the L2 length (and link speed) it is possible to
transpose preamble timestamps to trailer timestamps, which are used in
the NTP protocol.
While this information could be provided by two new socket options
independently from timestamping, it doesn't look like they would be very
useful. With this option any performance impact is limited to hardware
timestamping.
Use dev_get_by_napi_id() to get the device and its index. On kernels
with disabled CONFIG_NET_RX_BUSY_POLL or drivers not using NAPI, a zero
index will be returned in the control message.
CC: Richard Cochran <richardcochran@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
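The transposition mentioned in the PKTINFO commit above, from a preamble
timestamp to a trailer timestamp using the L2 length and the link speed,
comes down to adding the time the frame occupies the wire. The following is a
rough sketch under assumptions: the helper names are hypothetical, timestamps
are in nanoseconds, the speed is in Mbit/s, and any fixed offsets for
preamble, start-of-frame delimiter or FCS are ignored.

```c
#include <stdint.h>

/* Time on the wire for l2_len_bytes at speed_mbps, in nanoseconds. */
static uint64_t demo_frame_duration_ns(uint32_t l2_len_bytes,
				       uint32_t speed_mbps)
{
	/* bits * 1000 / (Mbit/s) gives nanoseconds */
	return (uint64_t)l2_len_bytes * 8u * 1000u / speed_mbps;
}

/* Shift a timestamp taken at the start of the frame to its end. */
static uint64_t demo_preamble_to_trailer_ns(uint64_t preamble_ts_ns,
					    uint32_t l2_len_bytes,
					    uint32_t speed_mbps)
{
	return preamble_ts_ns +
	       demo_frame_duration_ns(l2_len_bytes, speed_mbps);
}
```

For a 1500-byte frame at 1000 Mbit/s the adjustment is 12000 ns; for a
90-byte frame at 100 Mbit/s it is 7200 ns.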
Miroslav Lichvar [Fri, 19 May 2017 15:52:37 +0000 (17:52 +0200)]
net: add function to retrieve original skb device using NAPI ID
Since commit b68581778cd0 ("net: Make skb->skb_iif always track
skb->dev") skbs don't have the original index of the interface which
received the packet. This information is now needed for a new control
message related to hardware timestamping.
Instead of adding a new field to skb, we can find the device by the NAPI
ID if it is available, i.e. CONFIG_NET_RX_BUSY_POLL is enabled and the
driver is using NAPI. Add dev_get_by_napi_id() and also skb_napi_id() to
hide the CONFIG_NET_RX_BUSY_POLL ifdef.
CC: Richard Cochran <richardcochran@gmail.com>
Suggested-by: Willem de Bruijn <willemb@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
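The idea of hiding a config ifdef behind an accessor, as the commit above
does with skb_napi_id(), can be sketched in miniature. Everything here is
hypothetical: struct demo_skb is a toy structure and DEMO_NET_RX_BUSY_POLL
stands in for CONFIG_NET_RX_BUSY_POLL.

```c
/* Hypothetical miniature of an skb; only what the sketch needs. */
struct demo_skb {
	unsigned int len;
#ifdef DEMO_NET_RX_BUSY_POLL
	unsigned int napi_id;	/* present only when busy polling is built in */
#endif
};

/*
 * Accessor that hides the ifdef from callers: returns the NAPI ID when the
 * field exists, and 0 (no ID available) otherwise.
 */
static unsigned int demo_skb_napi_id(const struct demo_skb *skb)
{
#ifdef DEMO_NET_RX_BUSY_POLL
	return skb->napi_id;
#else
	(void)skb;
	return 0;
#endif
}
```

Callers can use the accessor unconditionally and treat 0 as "unknown", which
matches the zero interface index the PKTINFO commit describes for kernels
without busy polling.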
Miroslav Lichvar [Fri, 19 May 2017 15:52:36 +0000 (17:52 +0200)]
net: ethernet: update drivers to handle HWTSTAMP_FILTER_NTP_ALL
Include HWTSTAMP_FILTER_NTP_ALL in net_hwtstamp_validate() as a valid
filter and update drivers which can timestamp all packets, or which
explicitly list unsupported filters instead of using a default case, to
handle the filter.
CC: Richard Cochran <richardcochran@gmail.com>
CC: Willem de Bruijn <willemb@google.com>
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Miroslav Lichvar [Fri, 19 May 2017 15:52:35 +0000 (17:52 +0200)]
net: define receive timestamp filter for NTP
Add HWTSTAMP_FILTER_NTP_ALL to the hwtstamp_rx_filters enum for
timestamping of NTP packets. There is currently only one driver
(phyter) that could support it directly.
CC: Richard Cochran <richardcochran@gmail.com>
CC: Willem de Bruijn <willemb@google.com>
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ganesh Goudar [Fri, 19 May 2017 12:20:15 +0000 (17:50 +0530)]
cxgb4 : retrieve port information from firmware
Issue the get port information command to firmware to retrieve port
information and update it if it differs from what was last
recorded. Also add indication of supported link modes for
firmware port types FW_PORT_TYPE_SFP28, FW_PORT_TYPE_KR_SFP28,
FW_PORT_TYPE_CR4_QSFP.
Based on the original work by Casey Leedom <leedom@chelsio.com>
Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sivakumar Krishnasamy [Fri, 19 May 2017 09:30:38 +0000 (05:30 -0400)]
ibmveth: Support to enable LSO/CSO for Trunk VEA.
The current largesend and checksum offload features in the ibmveth driver work as follows:
- Source VM sends the TCP packets with ip_summed field set as
CHECKSUM_PARTIAL and TCP pseudo header checksum is placed in
checksum field
- CHECKSUM_PARTIAL flag in SKB will enable ibmveth driver to mark
"no checksum" and "checksum good" bits in transmit buffer descriptor
before the packet is delivered to pseries PowerVM Hypervisor
- If ibmveth has largesend capability enabled, transmit buffer descriptors
are marked accordingly before packet is delivered to Hypervisor
(along with mss value for packets with length > MSS)
- Destination VM's ibmveth driver receives the packet with "checksum good"
bit set and so, SKB's ip_summed field is set with CHECKSUM_UNNECESSARY
- If "largesend" bit was on, mss value is copied from receive descriptor
into SKB's gso_size and other flags are appropriately set for
packets > MSS size
- The packet is now successfully delivered up the stack in destination VM
The offloads described above work fine for TCP communication among VMs in
the same pseries server ( VM A <=> PowerVM Hypervisor <=> VM B )
We are now enabling support for OVS in pseries PowerVM environment. One of
our requirements is to have ibmveth driver configured in "Trunk" mode, when
they are used with OVS. This is because the PowerVM Hypervisor will no longer
bridge the packets between VMs, instead the packets are delivered to
IO Server which hosts OVS to bridge them between VMs or to external
networks (flow shown below),
VM A <=> PowerVM Hypervisor <=> IO Server(OVS) <=> PowerVM Hypervisor
<=> VM B
In "IO server" the packet is received by inbound Trunk ibmveth and then
delivered to OVS, which is then bridged to outbound Trunk ibmveth (shown
below),
Inbound Trunk ibmveth <=> OVS <=> Outbound Trunk ibmveth
In this model, we hit the following issues which impacted the VM
communication performance,
- Issue 1: ibmveth doesn't support largesend and checksum offload features
when configured as "Trunk". Driver has explicit checks to prevent
enabling these offloads.
- Issue 2: SYN packet drops seen at destination VM. When the packet
originates, it has CHECKSUM_PARTIAL flag set and as it gets delivered to
IO server's inbound Trunk ibmveth, on validating "checksum good" bits
in ibmveth receive routine, SKB's ip_summed field is set with
CHECKSUM_UNNECESSARY flag. This packet is then bridged by OVS (or Linux
Bridge) and delivered to outbound Trunk ibmveth. At this point the
outbound ibmveth transmit routine will not set "no checksum" and
"checksum good" bits in transmit buffer descriptor, as it does so only
when the ip_summed field is CHECKSUM_PARTIAL. When this packet gets
delivered to destination VM, TCP layer receives the packet with checksum
value of 0 and with no checksum related flags in ip_summed field. This
leads to packet drops, so TCP connections never succeed.
- Issue 3: First packet of a TCP connection will be dropped, if there is
no OVS flow cached in datapath. OVS while trying to identify the flow,
computes the checksum. The computed checksum will be invalid at the
receiving end, as ibmveth transmit routine zeroes out the pseudo
checksum value in the packet. This leads to packet drop.
- Issue 4: ibmveth driver doesn't have support for SKB's with frag_list.
When Physical NIC has GRO enabled and when OVS bridges these packets,
OVS vport send code will end up calling dev_queue_xmit, which in turn
calls validate_xmit_skb.
In validate_xmit_skb routine, the larger packets will get segmented into
MSS sized segments, if SKB has a frag_list and if the driver to which
they are delivered to doesn't support NETIF_F_FRAGLIST feature.
This patch addresses the above four issues, thereby enabling end to end
largesend and checksum offload support for better performance.
- Fix for Issue 1 : Remove checks which prevent enabling TCP largesend and
checksum offloads.
- Fix for Issue 2 : When ibmveth receives a packet with "checksum good"
bit set and it is configured in Trunk mode, set appropriate SKB fields
using skb_partial_csum_set (ip_summed field is set with
CHECKSUM_PARTIAL)
- Fix for Issue 3: Recompute the pseudo header checksum before sending the
SKB up the stack.
- Fix for Issue 4: Linearize the SKBs with frag_list. Though we end up
allocating buffers and copying data, this fix gives
up to 4X throughput increase.
Note: All these fixes need to go in together, as fixing just one of
them will lead to other issues immediately (especially for Issues 1,2 & 3).
Signed-off-by: Sivakumar Krishnasamy <ksiva@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 21 May 2017 16:56:57 +0000 (12:56 -0400)]
Merge branch 'qed-next'
Yuval Mintz says:
====================
qed/qede updates
This series contains some general minor fixes and enhancements:
- #1, #2 and #9 correct small missing ethtool functionality.
- #3, #6 and #8 correct minor issues in driver, but those are either
print-related or unexposed in existing code.
- #4 adds proper support to TLB mode bonding.
- #10 is meant to improve performance on varying cache-line sizes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sudarsana Reddy Kalluru [Sun, 21 May 2017 09:11:00 +0000 (12:11 +0300)]
qede: Support 1G advertisement.
Some variants of adapters support the 1G speed capability. Need to
allow the configuration of 1G speed if adapter supports it.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tomer Tayar [Sun, 21 May 2017 09:10:59 +0000 (12:10 +0300)]
qed: Fix setting of Management bitfields
The management firmware HSI contains masks which are already
shifted to their right place, so QED_MFW_SET_FIELD() is clearing
incorrect fields by shifting the mask by the offset.
Luckily, today we set the fields in an incrementing order [so we're
not erasing any previously set fields], but this still needs fixing.
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
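The double-shift problem described in the QED_MFW_SET_FIELD() fix above can
be reproduced with plain integers. This sketch does not use the qed macros;
the field layout, the DEMO_ names and both helper functions are hypothetical
and only mirror the description in the commit message.

```c
#include <stdint.h>

/* Hypothetical field B: mask is already shifted into place (bits 8..15). */
#define DEMO_FIELD_B_MASK	0x0000ff00u
#define DEMO_FIELD_B_OFFSET	8

/* Buggy update: shifts a mask that is already in place. */
static uint32_t demo_set_field_buggy(uint32_t reg, uint32_t mask,
				     unsigned int offset, uint32_t value)
{
	reg &= ~(mask << offset);	/* clears the wrong bits */
	reg |= value << offset;
	return reg;
}

/* Fixed update: use the mask as-is and shift only the value. */
static uint32_t demo_set_field_fixed(uint32_t reg, uint32_t mask,
				     unsigned int offset, uint32_t value)
{
	reg &= ~mask;
	reg |= (value << offset) & mask;
	return reg;
}
```

Starting from 0x00aa0f34 and writing 0xf0 into field B, the buggy variant
clears the neighbouring field at bits 16..23 and ORs the new value on top of
the old one, giving 0x0000ff34, while the fixed variant gives 0x00aaf034.
Fields above the one being written are the ones at risk, which fits the
commit's remark that setting fields in incrementing order hides the problem.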
Mintz, Yuval [Sun, 21 May 2017 09:10:58 +0000 (12:10 +0300)]
qede: qedr closure after setting state
This is benign, but it makes more sense to start the close sequence
only after changing the internal state [in case it would once care].
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mintz, Yuval [Sun, 21 May 2017 09:10:57 +0000 (12:10 +0300)]
qed: Correct print in iscsi error-flow
If too many CQs are requested, qed would print the available
number as if it's a resource and not a feature leading to the
wrong print.
Fixes: 08737a3fa30a ("qed: Inform qedi the number of possible CQs")
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tomer Tayar [Sun, 21 May 2017 09:10:56 +0000 (12:10 +0300)]
qed: Revise alloc/setup/free flow
Re-organize the logic that allocates and frees memory of various
sub-components of the hw-function -
a. No need to pass pointers to said structure as parameters;
The internal logic knows exactly where to find/set the data.
b. Nullify pointers after cleanup to prevent possible errors in
re-entrant code.
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
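Point b of the alloc/setup/free rework above, clearing pointers after
cleanup, can be sketched in a few lines. The names are hypothetical and the
sub-component is reduced to a single integer; this is not the qed code.

```c
#include <stdlib.h>

/* Hypothetical hw-function with one dynamically allocated sub-component. */
struct demo_hwfn {
	int *p_sub;
};

/* Allocate the sub-component; the function knows where to store it. */
static int demo_sub_alloc(struct demo_hwfn *hwfn)
{
	hwfn->p_sub = calloc(1, sizeof(*hwfn->p_sub));
	return hwfn->p_sub ? 0 : -1;
}

/* Free the sub-component and clear the pointer so a repeat call is harmless. */
static void demo_sub_free(struct demo_hwfn *hwfn)
{
	free(hwfn->p_sub);
	hwfn->p_sub = NULL;
}
```

Because free(NULL) is a no-op, calling demo_sub_free() twice on the same
structure does not double-free.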
Mintz, Yuval [Sun, 21 May 2017 09:10:55 +0000 (12:10 +0300)]
qede: Don't use an internal MAC field
Driver maintains its primary MAC in a private field which
gets updated when ndo_dev_set_mac() gets called.
However, there are flows where the primary MAC of the device can change
without said NDO being called [bond device in TLB mode configuring
slaves' addresses], resulting in a configuration where there's a mismatch
between what's apparent to user [the netdevice's value] and what's
configured in the HW [the private value].
As we don't have any real motivation of maintaining this
private field, simply remove it and start using the netdevice's
field instead.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sudarsana Reddy Kalluru [Sun, 21 May 2017 09:10:54 +0000 (12:10 +0300)]
qede: Add missing Status-block free
When destroying the datapath channels, qede doesn't notify qed of the
released status blocks which were acquired during the initialization.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sudarsana Reddy Kalluru [Sun, 21 May 2017 09:10:53 +0000 (12:10 +0300)]
qede: Honor user request for Tx buffers
Driver always allocates the maximal number of tx-buffers irrespective of
actual Tx ring config.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mintz, Yuval [Sun, 21 May 2017 09:10:52 +0000 (12:10 +0300)]
qede: Allow WoL to activate by default
When management firmware declares that the device is WoL-capable,
the default driver behavior would be to allow the management firmware
to take the decision of whether it's actually needed or not.
Problem is ethtool interface doesn't have a 'default' kind
of option, and user would see the interface WoL as disabled,
which doesn't accurately reflect the actual configuration.
Moreover, if the user actually wants to explicitly disable WoL he'd have
to first enable it [otherwise ethtool would block the command].
Instead of allowing management to make the decision, enable WoL by
default on all devices capable of it.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 19 May 2017 23:41:45 +0000 (19:41 -0400)]
Merge branch 'xgene-check-all-RGMII-phy-mode-variants'
Iyappan Subramanian says:
====================
Check all RGMII phy mode variants
This patch set,
- adds phy_interface_mode_is_rgmii() helper function
- addresses review comment from previous patch set, by calling
phy_interface_mode_is_rgmii() to address all RGMII variants
v2: Address review comments from v1
- adds phy_interface_mode_is_rgmii() helper function
- addresses review comment from previous patch set, by calling
phy_interface_mode_is_rgmii() to address all RGMII variants
v1:
- Initial version
====================
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Iyappan Subramanian [Thu, 18 May 2017 22:13:44 +0000 (15:13 -0700)]
xgene: Check all RGMII phy mode variants
This patch addresses the review comment from the previous patch set,
by using phy_interface_mode_is_rgmii() helper function to address
all RGMII phy mode variants.
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: Quan Nguyen <qnguyen@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Iyappan Subramanian [Thu, 18 May 2017 22:13:43 +0000 (15:13 -0700)]
phy: Add helper function to check phy interface mode
Added helper function that checks phy_mode is RGMII (all variants)
'bool phy_interface_mode_is_rgmii(phy_interface_t mode)'
Changed the following function, to use the above.
'bool phy_interface_is_rgmii(struct phy_device *phydev)'
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Suggested-by: Florian Fainelli <f.fainelli@gmail.com>
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
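A minimal user-space sketch of the kind of check such a helper can perform,
assuming the RGMII variants are declared contiguously in the mode enum. The
enum and function names below are hypothetical stand-ins for this
illustration, not the kernel's phy_interface_t or its helper.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a PHY interface mode enum; only the relative
 * ordering of the RGMII variants matters for this sketch. */
enum demo_phy_mode {
	DEMO_MODE_MII,
	DEMO_MODE_SGMII,
	DEMO_MODE_RGMII,      /* no internal delay */
	DEMO_MODE_RGMII_ID,   /* RX and TX internal delay */
	DEMO_MODE_RGMII_RXID, /* RX internal delay only */
	DEMO_MODE_RGMII_TXID, /* TX internal delay only */
	DEMO_MODE_MAX
};

/* True for every RGMII variant; relies on the variants being contiguous. */
static bool demo_mode_is_rgmii(enum demo_phy_mode mode)
{
	assert(mode < DEMO_MODE_MAX);
	return mode >= DEMO_MODE_RGMII && mode <= DEMO_MODE_RGMII_TXID;
}
```

A driver that compared only against the plain RGMII value would miss the
three delay variants; a range check of this kind covers all four.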
David S. Miller [Fri, 19 May 2017 23:21:32 +0000 (19:21 -0400)]
Merge branch 'net-fix-CRC32c-in-the-forwarding-path'
Davide Caratti says:
====================
net: fix CRC32c in the forwarding path
Current kernel allows offloading CRC32c computation when SCTP packets
are generated, setting skb->ip_summed to CHECKSUM_PARTIAL, if the
underlying device features have NETIF_F_SCTP_CRC set. However, after these
packets are forwarded, they may land on a device where CRC32c offloading is
not available: as a consequence, transmission is done with wrong CRC32c.
It's not possible to use sctp_compute_cksum() in the forwarding path
and in most drivers, because it needs symbols exported by libcrc32c module.
Patch 1 and 2 of this series try to solve this problem, introducing a new
helper function, namely skb_crc32c_csum_help(), that can be used to resolve
CHECKSUM_PARTIAL when crc32c is needed instead of Internet Checksum.
Currently, we need to parse the packet headers to understand what algorithm
is needed to resolve CHECKSUM_PARTIAL. We can speed things up by storing
this information in the skb metadata, and use it to call an appropriate
helper (skb_checksum_help or skb_crc32c_csum_help), or leave the packet
unmodified when the NIC is able to offload the checksum computation.
Patch 3 deprecates skb->csum_bad to free one bit in skb metadata; patch 4
introduces skb->csum_not_inet, providing skb with an indication on the
algorithm needed to resolve CHECKSUM_PARTIAL.
Patch 5 and 6 fix the kernel forwarding path and openvswitch datapath,
where skb_checksum_help was unconditionally called to resolve CHECKSUM_PARTIAL,
thus generating wrong CRC32c in forwarded SCTP packets.
Finally, patch 7 updates documentation to provide a better description of
possible values of skb->ip_summed.
Some further work is still possible:
* drivers that parse the packet header to correctly resolve CHECKSUM_PARTIAL
(e.g. ixgbe_tx_csum()) can benefit from testing skb->csum_not_inet to avoid
calling ip_hdr(skb)->protocol or ixgbe_ipv6_csum_is_sctp(skb).
* drivers that call skb_checksum_help() to resolve CHECKSUM_PARTIAL can
call skb_csum_hwoffload_help to avoid corrupting SCTP packets.
Changes v2->v3:
- patch 1/7: more standard declaration of stub variables
Changes v1->v2:
- none
Changes RFCv4->v1:
- patch 2/7: use WARN_ON_ONCE() instead of BUG_ON(), and avoid computing
CRC32c on the error path.
- patch 3/7: don't invert tests on the values of same_flow and
NAPI_GRO_CB(skb)->flush in dev_gro_receive(), it's useless and it breaks
GRO functionality as reported by kernel test robot.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Davide Caratti [Thu, 18 May 2017 13:44:43 +0000 (15:44 +0200)]
sk_buff.h: improve description of CHECKSUM_{COMPLETE, UNNECESSARY}
Add FCoE to the list of protocols that can set CHECKSUM_UNNECESSARY; add a
note to CHECKSUM_COMPLETE section to specify that it does not apply to SCTP
and FCoE protocols.
Suggested-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Davide Caratti [Thu, 18 May 2017 13:44:42 +0000 (15:44 +0200)]
openvswitch: more accurate checksumming in queue_userspace_packet()
If skb carries an SCTP packet and ip_summed is CHECKSUM_PARTIAL, it needs
CRC32c in place of Internet Checksum: use skb_csum_hwoffload_help to avoid
corrupting such packets while queueing them towards userspace.
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Davide Caratti [Thu, 18 May 2017 13:44:41 +0000 (15:44 +0200)]
net: more accurate checksumming in validate_xmit_skb()
skb_csum_hwoffload_help() uses netdev features and skb->csum_not_inet to
determine if skb needs software computation of Internet Checksum or crc32c
(or nothing, if this computation can be done by the hardware). Use it in
place of skb_checksum_help() in validate_xmit_skb() to avoid corruption
of non-GSO SCTP packets having skb->ip_summed equal to CHECKSUM_PARTIAL.
While at it, remove references to skb_csum_off_chk* functions, since they
are not present anymore in Linux; see commit cf53b1da73bd ("Revert "net:
Add driver helper functions to determine checksum offloadability"").
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
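The decision described above can be modelled in user space roughly as
follows. This is a sketch under stated assumptions: the struct, the feature
bits and the function name are hypothetical, and the packet is assumed to
already be in the CHECKSUM_PARTIAL state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical feature bits standing in for the device feature flags. */
#define DEMO_F_INET_CSUM (1u << 0) /* device offloads Internet Checksum */
#define DEMO_F_SCTP_CRC  (1u << 1) /* device offloads SCTP CRC32c */

enum demo_action {
	DEMO_HW_OFFLOAD,   /* leave the packet alone, hardware finishes it */
	DEMO_SW_INET_CSUM, /* compute Internet Checksum in software */
	DEMO_SW_CRC32C     /* compute CRC32c in software */
};

/* Hypothetical cut-down packet metadata. */
struct demo_skb {
	bool csum_not_inet; /* set when CRC32c, not Internet Checksum, is needed */
};

/* Pick how to resolve a pending checksum for the given device features. */
static enum demo_action demo_resolve_partial(const struct demo_skb *skb,
					     unsigned int features)
{
	assert(skb != NULL);
	if (skb->csum_not_inet)
		return (features & DEMO_F_SCTP_CRC) ? DEMO_HW_OFFLOAD
						    : DEMO_SW_CRC32C;
	return (features & DEMO_F_INET_CSUM) ? DEMO_HW_OFFLOAD
					     : DEMO_SW_INET_CSUM;
}
```

The point of the sketch is the first branch: an SCTP packet on a device
without CRC32c offload must not fall through to the Internet Checksum path.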
Davide Caratti [Thu, 18 May 2017 13:44:40 +0000 (15:44 +0200)]
net: use skb->csum_not_inet to identify packets needing crc32c
skb->csum_not_inet carries the indication on which algorithm is needed to
compute checksum on skb in the transmit path, when skb->ip_summed is equal
to CHECKSUM_PARTIAL. If skb carries an SCTP packet and crc32c hasn't yet
been written in L4 header, skb->csum_not_inet is assigned to 1; otherwise,
assume Internet Checksum is needed and thus set skb->csum_not_inet to 0.
Suggested-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Davide Caratti [Thu, 18 May 2017 13:44:39 +0000 (15:44 +0200)]
sk_buff: remove support for csum_bad in sk_buff
This bit was introduced with commit 5a21232983aa ("net: Support for
csum_bad in skbuff") to reduce the stack workload when processing RX
packets carrying a wrong Internet Checksum. Up to now, only one driver and
GRO core are setting it.
Suggested-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Davide Caratti [Thu, 18 May 2017 13:44:38 +0000 (15:44 +0200)]
net: introduce skb_crc32c_csum_help
skb_crc32c_csum_help is like skb_checksum_help, but it is designed for
checksumming SCTP packets using crc32c (see RFC3309), provided that
libcrc32c.ko has been loaded before. In case libcrc32c is not loaded,
invoking skb_crc32c_csum_help on a skb results in one of the following
printouts:
warn_crc32c_csum_update: attempt to compute crc32c without libcrc32c.ko
warn_crc32c_csum_combine: attempt to compute crc32c without libcrc32c.ko
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
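For readers unfamiliar with the algorithm itself, here is a minimal bitwise
CRC32c (Castagnoli polynomial) in user-space C. It illustrates the checksum
only; it is not the kernel helper, it ignores how SCTP places the value in
the packet, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32c using the reflected Castagnoli polynomial 0x82F63B78,
 * initial value 0xFFFFFFFF and a final XOR with 0xFFFFFFFF. */
static uint32_t demo_crc32c(const uint8_t *buf, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	assert(buf != NULL || len == 0);
	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int bit = 0; bit < 8; bit++) {
			/* Shift out one bit; XOR in the polynomial if it was set. */
			uint32_t mask = 0u - (crc & 1u);
			crc = (crc >> 1) ^ (0x82F63B78u & mask);
		}
	}
	return crc ^ 0xFFFFFFFFu;
}
```

The standard check input "123456789" yields 0xE3069283 with these
parameters, which is a convenient sanity test for any implementation.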
Davide Caratti [Thu, 18 May 2017 13:44:37 +0000 (15:44 +0200)]
skbuff: add stub to help computing crc32c on SCTP packets
sctp_compute_checksum requires crc32c symbol (provided by libcrc32c), so
it can't be used in net core. Like it has been done previously with other
symbols (e.g. ipv6_dst_lookup), introduce a stub struct skb_checksum_ops
to allow computation of crc32c checksum in net core after sctp.ko (and thus
libcrc32c) has been loaded.
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
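The stub approach can be sketched as an ops table that starts out pointing
at warning-only defaults and is swapped when the providing module loads.
All names below are hypothetical, and a toy byte sum stands in for crc32c
so the sketch stays short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ops table; core code only ever calls through it. */
struct demo_csum_ops {
	uint32_t (*update)(const void *mem, size_t len, uint32_t seed);
};

static int demo_warned; /* counts calls that hit the default stub */

/* Default stub: records a warning and returns the seed unchanged. */
static uint32_t demo_warn_update(const void *mem, size_t len, uint32_t seed)
{
	(void)mem;
	(void)len;
	demo_warned++;
	return seed;
}

static const struct demo_csum_ops demo_default_ops = {
	.update = demo_warn_update,
};

/* Pointer the core dereferences; starts at the warning stub. */
static const struct demo_csum_ops *demo_ops = &demo_default_ops;

/* Called by the providing module on load (ops) or unload (NULL). */
static void demo_register_ops(const struct demo_csum_ops *ops)
{
	demo_ops = ops ? ops : &demo_default_ops;
}

/* Toy implementation standing in for the real checksum routine. */
static uint32_t demo_real_update(const void *mem, size_t len, uint32_t seed)
{
	const uint8_t *p = mem;

	assert(p != NULL || len == 0);
	while (len--)
		seed += *p++;
	return seed;
}

static const struct demo_csum_ops demo_module_ops = {
	.update = demo_real_update,
};
```

The core never links against the module's symbols directly, so it builds
and runs whether or not the provider is present.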
Soheil Hassas Yeganeh [Tue, 16 May 2017 21:39:02 +0000 (17:39 -0400)]
tcp: warn on negative reordering values
Commit bafbb9c73241 ("tcp: eliminate negative reordering
in tcp_clean_rtx_queue") fixes an issue for negative
reordering metrics.
To be resilient to such errors, warn and return
when a negative metric is passed to tcp_update_reordering().
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 18 May 2017 20:11:32 +0000 (16:11 -0400)]
Merge git://git./linux/kernel/git/davem/net
Wei Yongjun [Thu, 18 May 2017 15:34:41 +0000 (15:34 +0000)]
net/mlx5e: Fix possible memory leak
'encap_header' is malloced and should be freed before leaving from
the error handling cases, otherwise it will cause memory leak.
Fixes: 232c001398ae ("net/mlx5e: Add support to neighbour update flow")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Thu, 18 May 2017 15:26:29 +0000 (15:26 +0000)]
qed: Remove unused including <linux/version.h>
Remove including <linux/version.h> that is not needed.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Thu, 18 May 2017 15:24:52 +0000 (15:24 +0000)]
ibmvnic: fix missing unlock on error in __ibmvnic_reset()
Add the missing unlock before return from function __ibmvnic_reset()
in the error handling case.
Fixes: ed651a10875f ("ibmvnic: Updated reset handling")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 18 May 2017 19:04:41 +0000 (12:04 -0700)]
Merge tag 'md/4.12-rc2' of git://git./linux/kernel/git/shli/md
Pull MD fixes from Shaohua Li:
- Several bug fixes for raid5-cache from Song Liu, mainly handle
journal disk error
- Fix bad block handling in choosing raid1 disk from Tomasz Majchrzak
- Simplify external metadata array sysfs handling from Artur
Paszkiewicz
- Optimize raid0 discard handling from me, now raid0 will dispatch
large discard IO directly to underlayer disks.
* tag 'md/4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
raid1: prefer disk without bad blocks
md/r5cache: handle sync with data in write back cache
md/r5cache: gracefully handle journal device errors for writeback mode
md/raid1/10: avoid unnecessary locking
md/raid5-cache: in r5l_do_submit_io(), submit io->split_bio first
md/md0: optimize raid0 discard handling
md: don't return -EAGAIN in md_allow_write for external metadata arrays
md/raid5: make use of spin_lock_irq over local_irq_disable + spin_lock
Matthias Kaehlcke [Thu, 18 May 2017 17:57:19 +0000 (10:57 -0700)]
net1080: Remove unused function nc_dump_ttl()
The function is not used, removing it fixes the following warning when
building with clang:
drivers/net/usb/net1080.c:271:20: error: unused function
'nc_dump_ttl' [-Werror,-Wunused-function]
Also remove the definition of TTL_THIS, which is only used in
nc_dump_ttl()
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Matthias Kaehlcke [Thu, 18 May 2017 17:45:33 +0000 (10:45 -0700)]
r8152: Remove unused function usb_ocp_read()
The function is not used, removing it fixes the following warning when
building with clang:
drivers/net/usb/r8152.c:825:5: error: unused function 'usb_ocp_read'
[-Werror,-Wunused-function]
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 18 May 2017 18:40:21 +0000 (11:40 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Don't allow negative TCP reordering values, from Soheil Hassas
Yeganeh.
2) Don't overflow while parsing ipv6 header options, from Craig Gallek.
3) Handle more cleanly the case where an individual route entry during
a dump will not fit into the allocated netlink SKB, from David
Ahern.
4) Add missing CONFIG_INET dependency for mlx5e, from Arnd Bergmann.
5) Allow neighbour updates to converge more quickly via gratuitous
ARPs, from Ihar Hrachyshka.
6) Fix compile error when CONFIG_INET is disabled, from Eric Dumazet.
7) Fix use after free in x25 protocol init, from Lin Zhang.
8) Validate VLAN pvid ranges passed into br_validate(), from Tobias
Jungel.
9) NULL out address lists in child sockets in SCTP, this is similar to
the fix we made for inet connection sockets last week. From Eric
Dumazet.
10) Fix NULL deref in mlxsw driver, from Ido Schimmel.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (27 commits)
mlxsw: spectrum: Avoid possible NULL pointer dereference
sh_eth: Do not print an error message for probe deferral
sh_eth: Use platform device for printing before register_netdev()
mlxsw: spectrum_router: Fix rif counter freeing routine
mlxsw: spectrum_dpipe: Fix incorrect entry index
cxgb4: update latest firmware version supported
qmi_wwan: add another Lenovo EM74xx device ID
sctp: do not inherit ipv6_{mc|ac|fl}_list from parent
udp: make *udp*_queue_rcv_skb() functions static
bridge: netlink: check vlan_default_pvid range
net: ethernet: faraday: To support device tree usage.
net: x25: fix one potential use-after-free issue
bpf: adjust verifier heuristics
ipv6: Check ip6_find_1stfragopt() return value properly.
selftests/bpf: fix broken build due to types.h
bnxt_en: Check status of firmware DCBX agent before setting DCB_CAP_DCBX_HOST.
bnxt_en: Call bnxt_dcb_init() after getting firmware DCBX configuration.
net: fix compile error in skb_orphan_partial()
ipv6: Prevent overrun when parsing v6 header options
neighbour: update neigh timestamps iff update is effective
...
Linus Torvalds [Thu, 18 May 2017 18:21:10 +0000 (11:21 -0700)]
Merge git://git./linux/kernel/git/davem/sparc
Pull sparc fixes from David Miller:
"Three sparc bug fixes"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc/ftrace: Fix ftrace graph time measurement
sparc: Fix -Wstringop-overflow warning
sparc64: Fix mapping of 64k pages with MAP_FIXED
Linus Torvalds [Thu, 18 May 2017 18:17:34 +0000 (11:17 -0700)]
Merge tag 'kbuild-fixes-v4.12' of git://git./linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fix from Masahiro Yamada:
"Fix headers_install to not delete pre-existing headers in the install
destination"
* tag 'kbuild-fixes-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: skip install/check of headers right under uapi directories
Mintz, Yuval [Thu, 18 May 2017 16:41:04 +0000 (19:41 +0300)]
qed: Utilize FW 8.20.0.0
This pushes qed [and as a result, all qed* drivers] into using 8.20.0.0
firmware. The changes are mostly contained in qed with minor changes
to qedi due to some HSI changes.
Content-wise, the firmware contains fixes to various issues exposed
since the release of the previous firmware, including:
- Corrects iSCSI fast retransmit when data digest is enabled.
- Stop draining packets when receiving several consecutive PFCs.
- Prevent possible assertion when consecutively opening/closing
many connections.
- Prevent possible assertion due to too long BDQ fetch time.
In addition, the new firmware would allow us to later add iWARP support
in qed and qedr.
Changes from previous version
-----------------------------
- V2: Fix warning in qed_debug.c
Signed-off-by: Chad Dupuis <Chad.Dupuis@cavium.com>
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Manish Rangankar <Manish.Rangankar@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 18 May 2017 16:15:58 +0000 (09:15 -0700)]
tcp: fix tcp_rearm_rto()
skbs in (re)transmit queue no longer have a copy of jiffies
at the time of the transmit: skb->skb_mstamp is now in usec unit,
with no correlation to tcp_jiffies32.
We have to convert rto from jiffies to usec, compute a time difference
in usec, then convert the delta to HZ units.
Fixes: 9a568de4818d ("tcp: switch TCP TS option (RFC 7323) to 1ms clock")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
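The unit conversion described above can be modelled in user space as
follows. This is a sketch, not the kernel code: the tick rate, the helper
names and the choice to return the RTO unchanged when the deadline has
already passed are all assumptions of the example.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_HZ 250u /* assumed tick rate, 4000 usec per jiffy */
#define DEMO_USEC_PER_JIFFY (1000000u / DEMO_HZ)

static uint64_t demo_jiffies_to_usecs(uint32_t j)
{
	return (uint64_t)j * DEMO_USEC_PER_JIFFY;
}

/* Round up so a partial tick still counts as a full one. */
static uint32_t demo_usecs_to_jiffies(uint64_t us)
{
	return (uint32_t)((us + DEMO_USEC_PER_JIFFY - 1) / DEMO_USEC_PER_JIFFY);
}

/* Time left (in jiffies) before a segment sent at sent_us times out,
 * given an RTO expressed in jiffies and the current time in usec. */
static uint32_t demo_remaining_rto(uint32_t rto, uint64_t sent_us,
				   uint64_t now_us)
{
	int64_t delta_us;

	assert(now_us >= sent_us);
	/* 1. convert rto to usec, 2. take the difference in usec */
	delta_us = (int64_t)(sent_us + demo_jiffies_to_usecs(rto)) -
		   (int64_t)now_us;
	/* 3. convert the remaining time back to jiffies */
	if (delta_us > 0)
		return demo_usecs_to_jiffies((uint64_t)delta_us);
	return rto; /* deadline already passed: keep the full RTO (assumption) */
}
```

With these assumed values, an RTO of 50 jiffies (200000 usec) on a segment
sent 150000 usec ago leaves 50000 usec, which rounds up to 13 jiffies.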
Linus Torvalds [Thu, 18 May 2017 17:04:42 +0000 (10:04 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/ebiederm/user-namespace
Pull pid namespace fixes from Eric Biederman:
"These are two bugs that turn out to have simple fixes that were
reported during the merge window. Both of these issues have existed
for a while and it just happens that they both were reported at almost
the same time"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
pid_ns: Fix race between setns'ed fork() and zap_pid_ns_processes()
pid_ns: Sleep in TASK_INTERRUPTIBLE in zap_pid_ns_processes
Linus Torvalds [Thu, 18 May 2017 16:38:09 +0000 (09:38 -0700)]
Merge tag 'hwmon-for-linus-v4.12-rc2' of git://git./linux/kernel/git/groeck/linux-staging
Pull hwmon fix from Guenter Roeck:
"Fix problem with hotplug state machine in coretemp driver"
* tag 'hwmon-for-linus-v4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (coretemp) Handle frozen hotplug state correctly
Ido Schimmel [Thu, 18 May 2017 11:03:52 +0000 (13:03 +0200)]
mlxsw: spectrum: Avoid possible NULL pointer dereference
In case we got an FDB notification for a port that doesn't exist we
execute an FDB entry delete to prevent it from re-appearing the next
time we poll for notifications.
If the operation failed we would trigger a NULL pointer dereference as
'mlxsw_sp_port' is NULL.
Fix it by reporting the error using the underlying bus device instead.
Fixes: 12f1501e7511 ("mlxsw: spectrum: remove FDB entry in case we get unknown object notification")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Thu, 18 May 2017 09:14:01 +0000 (10:14 +0100)]
liquidio: make the spinlock octeon_devices_lock static
octeon_devices_lock can be made static as it does not need to be
in global scope.
Cleans up sparse warning: "warning: symbol 'octeon_devices_lock'
was not declared. Should it be static?"
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geert Uytterhoeven [Thu, 18 May 2017 13:01:35 +0000 (15:01 +0200)]
sh_eth: Do not print an error message for probe deferral
EPROBE_DEFER is not an error, hence printing an error message like
sh-eth ee700000.ethernet: failed to initialise MDIO
may confuse the user.
To fix this, suppress the error message in case of probe deferral.
While at it, shorten the message, and add the actual error code.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
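The pattern of staying quiet on probe deferral can be sketched in
user-space C as below. The constant, the function name and the message
text are illustrative assumptions; in a driver the print would go through
the device logging helpers rather than a FILE stream.

```c
#include <assert.h>
#include <stdio.h>

/* Local stand-in for the deferral error code used in this sketch. */
#define DEMO_EPROBE_DEFER 517

/* Report an MDIO setup failure unless it is merely a probe deferral.
 * Returns 1 if a message was printed, 0 if it was suppressed. */
static int demo_report_mdio_failure(FILE *log, int ret)
{
	assert(log != NULL);
	if (ret == -DEMO_EPROBE_DEFER)
		return 0; /* not an error: the probe will be retried later */
	fprintf(log, "MDIO init failed: %d\n", ret);
	return 1;
}
```

Including the numeric error code in the message makes the remaining,
genuine failures easier to diagnose from the log alone.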
Geert Uytterhoeven [Thu, 18 May 2017 13:01:34 +0000 (15:01 +0200)]
sh_eth: Use platform device for printing before register_netdev()
The MDIO initialization failure message is printed using the network
device, before it has been registered, leading to:
(null): failed to initialise MDIO
Use the platform device instead to fix this:
sh-eth ee700000.ethernet: failed to initialise MDIO
Fixes: daacf03f0bbfefee ("sh_eth: Register MDIO bus before registering the network device")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 18 May 2017 15:18:20 +0000 (11:18 -0400)]
Merge tag 'linux-can-next-for-4.13-20170518' of git://git./linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:
====================
pull-request: can-next 2017-05-18
this is a pull request of 4 patches for net-next/master.
All 4 patches are by Quentin Schulz, they add deep Suspend/Resume
support to the m_can driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Arkadi Sharshevsky [Thu, 18 May 2017 07:22:45 +0000 (09:22 +0200)]
mlxsw: spectrum_dpipe: Fix sparse warnings
drivers/net/ethernet/mellanox/mlxsw//spectrum_dpipe.c:221:52: warning:
Using plain integer as NULL pointer
drivers/net/ethernet/mellanox/mlxsw//spectrum_dpipe.c:221:74: warning:
Using plain integer as NULL pointer
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 18 May 2017 15:04:00 +0000 (11:04 -0400)]
Merge branch 'mlxsw-fixes'
Jiri Pirko says:
====================
mlxsw: couple of fixes
Couple of fixes from Arkadi
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Arkadi Sharshevsky [Thu, 18 May 2017 07:18:53 +0000 (09:18 +0200)]
mlxsw: spectrum_router: Fix rif counter freeing routine
During rif counter freeing the counter index can be invalid. Add check
of validity before freeing the counter.
Fixes: e0c0afd8aa4e ("mlxsw: spectrum: Support for counters on router interfaces")
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arkadi Sharshevsky [Thu, 18 May 2017 07:18:52 +0000 (09:18 +0200)]
mlxsw: spectrum_dpipe: Fix incorrect entry index
In case of disabled counters the entry index will be incorrect. Fix this
by moving the entry index set before the counter status check.
Fixes: 2ba5999f009d ("mlxsw: spectrum: Add Support for erif table entries access")
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Anholt [Thu, 18 May 2017 00:32:12 +0000 (17:32 -0700)]
net: dsa: b53: Add compatible strings for the Cygnus-family BCM11360.
Cygnus is a small family of SoCs, of which we currently have
devicetree for BCM11360 and BCM58300. The 11360's B53 is mostly the
same as 58xx, just requiring a tiny bit of setup that was previously
missing.
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 18 May 2017 14:40:20 +0000 (10:40 -0400)]
Merge branch 'dsa-headers-cleanup'
Vivien Didelot says:
====================
net: dsa: headers cleanup
The DSA core files share a common private header file. Include the DSA
public header there instead of independently in each core source file.
DSA core and its drivers use switchdev, thus include switchdev.h in the
public DSA header. This allows us to get rid of the forward declaration
and use typedef defined by switchdev.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Wed, 17 May 2017 19:46:05 +0000 (15:46 -0400)]
net: dsa: use switchdev_obj_dump_cb_t everywhere
Now that the DSA public header includes switchdev.h, use the provided
switchdev_obj_dump_cb_t typedef for the object dump callback.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Wed, 17 May 2017 19:46:04 +0000 (15:46 -0400)]
net: dsa: include switchdev.h only once
DSA drivers and core use switchdev. Include switchdev.h only once, in
the dsa.h public header, so that inclusion in DSA drivers or forward
declarations of switchdev structures is not necessary anymore.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Wed, 17 May 2017 19:46:03 +0000 (15:46 -0400)]
net: dsa: include dsa.h only once
The public include/net/dsa.h file is meant for DSA drivers, while all
DSA core files share a common private header net/dsa/dsa_priv.h file.
Ensure that dsa_priv.h is the only DSA core file to include net/dsa.h,
and add a new line to separate absolute and relative headers at the same
time.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrey Vagin [Wed, 17 May 2017 18:39:05 +0000 (11:39 -0700)]
net: fix __skb_try_recv_from_queue to return the old behavior
This function has to return NULL on an error case, because there is a
separate error variable.
The offset has to be changed only if skb is returned.
v2: fix udp code to not use an extra variable
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: David S. Miller <davem@davemloft.net>
Fixes: 65101aeca522 ("net/sock: factor out dequeue/peek with offset cod")
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ganesh Goudar [Wed, 17 May 2017 18:38:16 +0000 (00:08 +0530)]
cxgb4: update latest firmware version supported
Change t4fw_version.h to update latest firmware version
number to 1.16.43.0.
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexey Dobriyan [Wed, 17 May 2017 16:31:39 +0000 (19:31 +0300)]
net: make struct dst_entry::dev first member
struct dst_entry::dev is used most often. Move it so it can be
accessed without imm8 offset on x86_64.
add/remove: 0/0 grow/shrink: 9/239 up/down: 52/-413 (-361)
function                                     old     new   delta
dst_rcu_free                                 126     138     +12
fnhe_flush_routes                            211     219      +8
rt_set_nexthop                               747     754      +7
rt_cache_route                                85      91      +6
rt6_release                                  209     215      +6
dst_release                                  107     111      +4
dst_destroy_rcu                               29      33      +4
dn_dst_check_expire                          329     333      +4
dn_insert_route                              484     485      +1
xfrm_resolve_and_create_bundle              2991    2990      -1
...
ip_route_me_harder                          1163    1157      -6
__ip_append_data.isra                       2730    2724      -6
ip6_forward                                 3052    3045      -7
callforward_do_filter                        659     651      -8
dst_gc_task                                  571     549     -22
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
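A small user-space illustration of the layout idea, using hypothetical
cut-down structs rather than the real definitions: when the most
frequently read pointer is the first member, it sits at offset 0 of the
enclosing struct.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real structs have many more members. */
struct demo_net_device {
	int ifindex;
};

struct demo_dst_entry {
	struct demo_net_device *dev; /* hot member placed first: offset 0 */
	unsigned long expires;
	int flags;
};

/* Typical hot-path access: follow the first member of the entry. */
static int demo_dst_ifindex(const struct demo_dst_entry *dst)
{
	assert(dst != NULL && dst->dev != NULL);
	return dst->dev->ifindex;
}
```

Because the member is at offset 0, a load of dst->dev needs no
displacement in the instruction encoding, which is where the per-function
size savings reported above come from.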
David S. Miller [Thu, 18 May 2017 14:28:49 +0000 (10:28 -0400)]
Merge branch 'fsl_ucc_hdlc-enhancements'
Signed-off-by: David S. Miller <davem@davemloft.net>
Holger Brunck [Wed, 17 May 2017 15:24:39 +0000 (17:24 +0200)]
powerpc/85xx/kmcent2: use hdlc busmode for UCC1
Signed-off-by: Holger Brunck <holger.brunck@keymile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Holger Brunck [Wed, 17 May 2017 15:24:38 +0000 (17:24 +0200)]
net/wan/fsl_ucc_hdlc: add hdlc-bus support
This adds support for hdlc-bus mode to the fsl_ucc_hdlc driver. This can
be enabled with the "fsl,hdlc-bus" property in the DTS node of the
corresponding ucc.
This aligns the configuration of the UPSMR and GUMR registers to what is
done in our ucc_hdlc driver (that only supports hdlc-bus mode) and with
the QuickEngine's documentation for hdlc-bus mode.
GUMR/SYNL is set to AUTO for the busmode as in this case the CD signal
is ignored. The brkpt_support is enabled to set the HBM1 bit in the
CMXUCR register to configure an open-drain connected HDLC bus.
Signed-off-by: Holger Brunck <holger.brunck@keymile.com>
Cc: Zhao Qiang <qiang.zhao@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Holger Brunck [Wed, 17 May 2017 15:24:37 +0000 (17:24 +0200)]
fsl/qe: add bit description for SYNL register for GUMR
Add the bitmask for the two bit SYNL register according to the QUICK
Engine Reference Manual.
Signed-off-by: Holger Brunck <holger.brunck@keymile.com>
Cc: Zhao Qiang <qiang.zhao@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Holger Brunck [Wed, 17 May 2017 15:24:36 +0000 (17:24 +0200)]
net/wan/fsl_ucc_hdlc: call qe_setbrg only for loopback mode
We can't assume that we are always in loopback mode if rx and tx clock
have the same clock source. If we want to use HDLC busmode we also have
the same clock source but we are not in loopback mode. So move the
setting of the baudrate generator after the check for property for the
loopback mode.
Signed-off-by: Holger Brunck <holger.brunck@keymile.com>
Cc: Zhao Qiang <qiang.zhao@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Holger Brunck [Wed, 17 May 2017 15:24:35 +0000 (17:24 +0200)]
net/wan/fsl_ucc_hdlc: fix incorrect memory allocation
We need space for the struct qe_bd and not for a pointer to this struct.
Signed-off-by: Holger Brunck <holger.brunck@keymile.com>
Cc: Zhao Qiang <qiang.zhao@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
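The class of bug fixed here, sizing an allocation by the pointer instead
of the pointed-to struct, can be sketched in user-space C. The descriptor
layout and names below are hypothetical and chosen only for illustration;
they are not the driver's struct qe_bd.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical buffer descriptor, deliberately larger than a pointer. */
struct demo_bd {
	unsigned short status;
	unsigned short length;
	unsigned int buf;
	unsigned int reserved[4];
};

/* Allocate a zeroed ring of count descriptors. Using sizeof(*ring) ties
 * the element size to the struct; sizeof(ring) would be the size of a
 * pointer and could under-allocate. */
static struct demo_bd *demo_alloc_ring(size_t count)
{
	struct demo_bd *ring;

	assert(count > 0);
	ring = calloc(count, sizeof(*ring));
	return ring; /* NULL on allocation failure; caller must check */
}
```

Writing the size as sizeof(*ring) also keeps the allocation correct if the
element type is changed later.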
Holger Brunck [Wed, 17 May 2017 15:24:34 +0000 (17:24 +0200)]
net/wan/fsl_ucc_hdlc: fix wrong indentation
Signed-off-by: Holger Brunck <holger.brunck@keymile.com>
Cc: Zhao Qiang <qiang.zhao@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>