GitHub/moto-9609/android_kernel_motorola_exynos9610.git
9 years agoipv4: Namespecify TCP PMTU mechanism
Fan Du [Tue, 10 Feb 2015 01:53:16 +0000 (09:53 +0800)]
ipv4: Namespecify TCP PMTU mechanism

Packetization Layer Path MTU Discovery works separately beside
Path MTU Discovery at IP level, different net namespace has
various requirements on which one to chose, e.g., a virutalized
container instance would require TCP PMTU to probe an usable
effective mtu for underlying tunnel, while the host would
employ classical ICMP based PMTU to function.

Hence making TCP PMTU mechanism per net namespace to decouple
two functionality. Furthermore the probe base MSS should also
be configured separately for each namespace.

Signed-off-by: Fan Du <fan.du@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoi40e: Fix for stats init function call in Rx setup
Carolyn Wyborny [Tue, 10 Feb 2015 01:42:31 +0000 (17:42 -0800)]
i40e: Fix for stats init function call in Rx setup

This patch fixes indentation issue and error found in argument
reported by static analysis.  Without this patch, sparse and other
static analysis errors will be found.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
David S. Miller [Mon, 9 Feb 2015 22:35:57 +0000 (14:35 -0800)]
Merge git://git./linux/kernel/git/davem/net

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotcp: don't include Fast Open option in SYN-ACK on pure SYN-data
Yuchung Cheng [Mon, 9 Feb 2015 20:35:23 +0000 (12:35 -0800)]
tcp: don't include Fast Open option in SYN-ACK on pure SYN-data

If a server has enabled Fast Open and it receives a pure SYN-data
packet (without a Fast Open option), it won't accept the data but it
incorrectly returns a SYN-ACK with a Fast Open cookie and also
increments the SNMP stat LINUX_MIB_TCPFASTOPENPASSIVEFAIL.

This patch makes the server include a Fast Open cookie in SYN-ACK
only if the SYN has some Fast Open option (i.e., when client
requests or presents a cookie).

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoopenvswitch: Only set TUNNEL_VXLAN_OPT if VXLAN-GBP metadata is set
Thomas Graf [Mon, 9 Feb 2015 15:56:37 +0000 (16:56 +0100)]
openvswitch: Only set TUNNEL_VXLAN_OPT if VXLAN-GBP metadata is set

This avoids setting TUNNEL_VXLAN_OPT for VXLAN frames which don't
have any GBP metadata set. It is not invalid to set it but unnecessary.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'ipv6_ufo_fix'
David S. Miller [Mon, 9 Feb 2015 22:21:10 +0000 (14:21 -0800)]
Merge branch 'ipv6_ufo_fix'

Vladislav Yasevich says:

====================
IPv6 Fix 2 small issues with UFO restoration code

This series fixes 2 small issues introduced by the
"Restore UFO support to virtio_net devices" series.

V2:  Fixed patch title and description for patch1.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoipv6: Make __ipv6_select_ident static
Vlad Yasevich [Mon, 9 Feb 2015 14:38:21 +0000 (09:38 -0500)]
ipv6: Make __ipv6_select_ident static

Make __ipv6_select_ident() static as it isn't used outside
the file.

Fixes: 0508c07f5e0c9 (ipv6: Select fragment id during UFO segmentation if not set.)
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoipv6: Fix fragment id assignment on LE arches.
Vlad Yasevich [Mon, 9 Feb 2015 14:38:20 +0000 (09:38 -0500)]
ipv6: Fix fragment id assignment on LE arches.

Recent commit:
0508c07f5e0c94f38afd5434e8b2a55b84553077
Author: Vlad Yasevich <vyasevich@gmail.com>
Date:   Tue Feb 3 16:36:15 2015 -0500

    ipv6: Select fragment id during UFO segmentation if not set.

Introduced a bug on LE in how ipv6 fragment id is assigned.
This was cought by nightly sparce check:

Resolve the following sparce error:
 net/ipv6/output_core.c:57:38: sparse: incorrect type in assignment
 (different base types)
   net/ipv6/output_core.c:57:38:    expected restricted __be32
[usertype] ip6_frag_id
   net/ipv6/output_core.c:57:38:    got unsigned int [unsigned]
[assigned] [usertype] id

Fixes: 0508c07f5e0c9 (ipv6: Select fragment id during UFO segmentation if not set.)
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobridge: Fix inability to add non-vlan fdb entry
Toshiaki Makita [Mon, 9 Feb 2015 11:16:17 +0000 (20:16 +0900)]
bridge: Fix inability to add non-vlan fdb entry

Bridge's default_pvid adds a vid by default, by which we cannot add a
non-vlan fdb entry by default, because br_fdb_add() adds fdb entries for
all vlans instead of a non-vlan one when any vlan is configured.

 # ip link add br0 type bridge
 # ip link set eth0 master br0
 # bridge fdb add 12:34:56:78:90:ab dev eth0 master temp
 # bridge fdb show brport eth0 | grep 12:34:56:78:90:ab
 12:34:56:78:90:ab dev eth0 vlan 1 static

We expect a non-vlan fdb entry as well as vlan 1:
 12:34:56:78:90:ab dev eth0 static

To fix this, we need to insert a non-vlan fdb entry if vlan is not
specified, even when any vlan is configured.

Fixes: 5be5a2df40f0 ("bridge: Add filtering support for default_pvid")
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net...
David S. Miller [Mon, 9 Feb 2015 22:17:19 +0000 (14:17 -0800)]
Merge branch 'master' of git://git./linux/kernel/git/jkirsher/net-next

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2015-02-09

This series contains updates to i40e and i40evf only.

Rickard Strandqvist removes an unused function for i40e.

John Linville reorders a piece of code so that y_budget does not get used
by the FCoE code before it gets initialized.

Mitch adds a delay after VF reset with a minimum of 10ms to allow the
hardware internal FIFOs to flush.  Bumps up the ARQ descriptors, since
we can easily overrun the PF's admin receive queue.  Added locking around
the VF reset code, since during VF deallocation, we cannot depend on
simply masking the interrupt since this does not lock out the service task,
which can still call the reset routine.  Fix a potential multi-minute
delay on driver unload, VF disable or system shutdown.  When the module
is being unloaded, waiting for the PF to politely handle all of our admin
queue requests might take forever with a lot of VFs enabled, so just
stop everything and request a VF reset.  Also stops the watchdog during
shutdown to prevent a log full of admin queue errors and the occasional
hang when the system is shut down.

Anjali forces Tx descriptor writebacks on ITR by kicking off the SWINT
interrupt since we noticed that there are non-cache-aligned Tx
descriptors waiting in the ring while interrupts are disabled under NAPI.
Enables loopback for the FCoE VSI, so that VSIs can directly talk to each
other without going out on the wire.

Matthew fixes an LED blink issue by making sure to clear the GPIO blink
field, instead of OR'ing against zero if the field is already '1'.

Greg cleans up a function header comment.

Vasu helps biosdevname user tool to differentiate dev_port for PCoE netdev
and PF netdev, by setting different dev_port value for FCoE netdev.

Carolyn adds a call to u64_stats_init to the receive setup in order to
avoid lockdep errors with seqcount on newer kernels.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: Mellanox: Delete unnecessary checks before the function call "vunmap"
Markus Elfring [Mon, 9 Feb 2015 10:10:41 +0000 (11:10 +0100)]
net: Mellanox: Delete unnecessary checks before the function call "vunmap"

The vunmap() function performs also input parameter validation.
Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Acked-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'expansion_rom'
David S. Miller [Mon, 9 Feb 2015 22:07:59 +0000 (14:07 -0800)]
Merge branch 'expansion_rom'

Hariprasad Shenai says:

====================
Add support in ethtool to get expansion ROM version

This series adds support to get expansion ROM version via ethtool getdrvinfo.
PATCH 1/3 ("ethtool: rename reserved1 memeber in ethtool_drvinfo for expansion
    ROM version") is created against net-next tree.
PATCH 2/3 ("cxgb4: Add support in cxgb4 to get expansion rom version via
ethtool") is created against net-next tree.
PATCH 3/3 ("ethtool: Add support to get expansion ROM version in ethtool
getdrvinfo") is created against ethtool tree.

We have included all the maintainers of respective trees. Kindly review the
change and let us know in case of any review comments.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agocxgb4: Add support in cxgb4 to get expansion rom version via ethtool
Hariprasad Shenai [Mon, 9 Feb 2015 06:37:30 +0000 (12:07 +0530)]
cxgb4: Add support in cxgb4 to get expansion rom version via ethtool

Add support to get option/expansion rom version flashed in the adapter via
ethtool getdrvinfo function.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoethtool: rename reserved1 memeber in ethtool_drvinfo for expansion ROM version
Hariprasad Shenai [Mon, 9 Feb 2015 06:37:29 +0000 (12:07 +0530)]
ethtool: rename reserved1 memeber in ethtool_drvinfo for expansion ROM version

Renamed the reserved1 member of struct ethtool_drvinfo to erom_version to get
expansion/option ROM version of the adapter if present.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: dsa: Remove redundant phy_attach()
Andrew Lunn [Mon, 9 Feb 2015 01:29:55 +0000 (02:29 +0100)]
net: dsa: Remove redundant phy_attach()

dsa_slave_phy_setup() finds the phy for the port via device tree and
using of_phy_connect(), or it uses the fall back of taking a phy from
the switch internal mdio bus and calling phy_connect_direct(). Either
way, if a phy is found, phy_attach_direct() is called to attach the
phy to the slave device.

In dsa_slave_create(), a second call to phy_attach() is made. This
results in the warning "PHY already attached". Remove this second,
redundant attaching of the phy.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'mlx4_bond_notify'
David S. Miller [Mon, 9 Feb 2015 22:03:59 +0000 (14:03 -0800)]
Merge branch 'mlx4_bond_notify'

Or Gerlitz says:

====================
bonding and mlx4 fixes for the HA/LAG support and mlx4 reset flow

There are two fixes to the boding + mlx4 HA/LAG support from Moni and a patch from Yishai
which does further hardening of the mlx4 reset support for IB kernel ULPs.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoIB/mlx4: Reset flow support for IB kernel ULPs
Yishai Hadas [Sun, 8 Feb 2015 09:49:34 +0000 (11:49 +0200)]
IB/mlx4: Reset flow support for IB kernel ULPs

The driver exposes interfaces that directly relate to HW state. Upon fatal
error, consumers of these interfaces (ULPs) that rely on completion of
all their posted work-request could hang, thereby introducing dependencies
in shutdown order.  To prevent this from happening, we manage the
relevant resources (CQs, QPs) that are used by the device. Upon a fatal error,
we now generate simulated completions for outstanding WQEs that were not
completed at the time the HW was reset.

It includes invoking the completion event handler for all involved CQs so that
the ULPs will poll those CQs. When polled we return simulated CQEs with
IB_WC_WR_FLUSH_ERR return code enabling ULPs to clean up their resources and
not wait forever for completions upon receiving remove_one.

The above change requires an extra check in the data path to make sure that when
device is in error state, the simulated CQEs will be returned and no further
WQEs will be posted.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoIB/mlx4: Always use the correct port for mirrored multicast attachments
Moni Shoua [Sun, 8 Feb 2015 09:49:33 +0000 (11:49 +0200)]
IB/mlx4: Always use the correct port for mirrored multicast attachments

When attaching a QP to a multicast address in bonded mode, there was an
assumption that the port of the QP must be #1. This assumption isn't the
case under the flow which enables maximal usage of the physical ports.

Fix it by always checking the port of the original flow and create the
mirrored flow on the other port.

Fixes: c6215745b66a ('IB/mlx4: Load balance ports in port aggregation mode')
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet/bonding: Fix potential bad memory access during bonding events
Moni Shoua [Sun, 8 Feb 2015 09:49:32 +0000 (11:49 +0200)]
net/bonding: Fix potential bad memory access during bonding events

When queuing work to send the NETDEV_BONDING_INFO netdev event, it's
possible that when the work is executed, the pointer to the slave
becomes invalid. This can happen if between queuing the event and the
execution of the work, the net-device was un-ensvaled and re-enslaved.

Fix that by queuing a work with the data of the slave instead of the
slave structure.

Fixes: 69e6113343cf ('net/bonding: Notify state change on slaves')
Reported-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'tipc-next'
David S. Miller [Mon, 9 Feb 2015 21:20:53 +0000 (13:20 -0800)]
Merge branch 'tipc-next'

Richard Alpe says:

====================
tipc: new compat layer for the legacy NL API

This is a compatibility / transcoding layer for the old netlink API.
It relies on the new netlink API to collect data or perform actions
(dumpit / doit).

The main benefit of this compat layer is that it removes a lot of
complex code from the tipc core as only the new API needs to be able
harness data or perform actions. I.e. the compat layer isn't concerned
with locking or how the internal data-structures look. As long as the
new API stays relatively intact the compat layer should be fine.

The main challenge in this compat layer is the randomness of the legacy
API. Some commands send binary data and some send ASCII data, some are
very picky in optimizing there buffer sizes and some just don't care.
Most legacy commands put there data in a single TLV (data container) but some
segment the data into multiple TLV's. This list of randomness goes on and on..
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: remove tipc_snprintf
Richard Alpe [Mon, 9 Feb 2015 08:50:19 +0000 (09:50 +0100)]
tipc: remove tipc_snprintf

tipc_snprintf() was heavily utilized by the old netlink API which no
longer exists (now netlink compat).

In this patch we swap tipc_snprintf() to the identical scnprintf() in
the only remaining occurrence.

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: nl compat add noop and remove legacy nl framework
Richard Alpe [Mon, 9 Feb 2015 08:50:18 +0000 (09:50 +0100)]
tipc: nl compat add noop and remove legacy nl framework

Add TIPC_CMD_NOOP to compat layer and remove the old framework.

All legacy nl commands are now converted to the compat layer in
netlink_compat.c.

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: convert legacy nl stats show to nl compat
Richard Alpe [Mon, 9 Feb 2015 08:50:17 +0000 (09:50 +0100)]
tipc: convert legacy nl stats show to nl compat

Convert TIPC_CMD_SHOW_STATS to compat layer. This command does not
have any counterpart in the new API, meaning it now solely exists as a
function in the compat layer.

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: convert legacy nl net id get to nl compat
Richard Alpe [Mon, 9 Feb 2015 08:50:16 +0000 (09:50 +0100)]
tipc: convert legacy nl net id get to nl compat

Convert TIPC_CMD_GET_NETID to compat dumpit.

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: convert legacy nl net id set to nl compat
Richard Alpe [Mon, 9 Feb 2015 08:50:15 +0000 (09:50 +0100)]
tipc: convert legacy nl net id set to nl compat

Convert TIPC_CMD_SET_NETID to compat doit.

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: convert legacy nl node addr set to nl compat
Richard Alpe [Mon, 9 Feb 2015 08:50:14 +0000 (09:50 +0100)]
tipc: convert legacy nl node addr set to nl compat

Convert TIPC_CMD_SET_NODE_ADDR to compat doit.

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: convert legacy nl node dump to nl compat
Richard Alpe [Mon, 9 Feb 2015 08:50:13 +0000 (09:50 +0100)]
tipc: convert legacy nl node dump to nl compat

Convert TIPC_CMD_GET_NODES to compat dumpit and remove global node
counter solely used by the legacy API.

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: convert legacy nl media dump to nl compat
Richard Alpe [Mon, 9 Feb 2015 08:50:12 +0000 (09:50 +0100)]
tipc: convert legacy nl media dump to nl compat

Convert TIPC_CMD_GET_MEDIA_NAMES to compat dumpit.

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: convert legacy nl socket dump to nl compat
Richard Alpe [Mon, 9 Feb 2015 08:50:11 +0000 (09:50 +0100)]
tipc: convert legacy nl socket dump to nl compat

Convert socket (port) listing to compat dumpit call. If a socket
(port) has publications a second dumpit call is issued to collect them
and format then into the legacy buffer before continuing to process
the sockets (ports).

Command converted in this patch:
TIPC_CMD_SHOW_PORTS

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: convert legacy nl name table dump to nl compat
Richard Alpe [Mon, 9 Feb 2015 08:50:10 +0000 (09:50 +0100)]
tipc: convert legacy nl name table dump to nl compat

Add functionality for printing a dump header and convert
TIPC_CMD_SHOW_NAME_TABLE to compat dumpit.

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: convert legacy nl link stat reset to nl compat
Richard Alpe [Mon, 9 Feb 2015 08:50:09 +0000 (09:50 +0100)]
tipc: convert legacy nl link stat reset to nl compat

Convert TIPC_CMD_RESET_LINK_STATS to compat doit.

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: convert legacy nl link prop set to nl compat
Richard Alpe [Mon, 9 Feb 2015 08:50:08 +0000 (09:50 +0100)]
tipc: convert legacy nl link prop set to nl compat

Convert setting of link proprieties to compat doit calls.

Commands converted in this patch:
TIPC_CMD_SET_LINK_TOL
TIPC_CMD_SET_LINK_PRI
TIPC_CMD_SET_LINK_WINDOW

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: convert legacy nl link dump to nl compat
Richard Alpe [Mon, 9 Feb 2015 08:50:07 +0000 (09:50 +0100)]
tipc: convert legacy nl link dump to nl compat

Convert TIPC_CMD_GET_LINKS to compat dumpit and remove global link
counter solely used by the legacy API.

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: convert legacy nl link stat to nl compat
Richard Alpe [Mon, 9 Feb 2015 08:50:06 +0000 (09:50 +0100)]
tipc: convert legacy nl link stat to nl compat

Add functionality for safely appending string data to a TLV without
keeping write count in the caller.

Convert TIPC_CMD_SHOW_LINK_STATS to compat dumpit.

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: convert legacy nl bearer enable/disable to nl compat
Richard Alpe [Mon, 9 Feb 2015 08:50:05 +0000 (09:50 +0100)]
tipc: convert legacy nl bearer enable/disable to nl compat

Introduce a framework for transcoding legacy nl action into actions
(.doit) calls from the new nl API. This is done by converting the
incoming TLV data into netlink data with nested netlink attributes.
Unfortunately due to the randomness of the legacy API we can't do this
generically so each legacy netlink command requires a specific
transcoding recipe. In this case for bearer enable and bearer disable.

Convert TIPC_CMD_ENABLE_BEARER and TIPC_CMD_DISABLE_BEARER into doit
compat calls.

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: convert legacy nl bearer dump to nl compat
Richard Alpe [Mon, 9 Feb 2015 08:50:04 +0000 (09:50 +0100)]
tipc: convert legacy nl bearer dump to nl compat

Introduce a framework for dumping netlink data from the new netlink
API and formatting it to the old legacy API format. This is done by
looping the dump data and calling a format handler for each entity, in
this case a bearer.

We dump until either all data is dumped or we reach the limited buffer
size of the legacy API. Remember, the legacy API doesn't scale.

In this commit we convert TIPC_CMD_GET_BEARER_NAMES to use the compat
layer.

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: move and rename the legacy nl api to "nl compat"
Richard Alpe [Mon, 9 Feb 2015 08:50:03 +0000 (09:50 +0100)]
tipc: move and rename the legacy nl api to "nl compat"

The new netlink API is no longer "v2" but rather the standard API and
the legacy API is now "nl compat". We split them into separate
start/stop and put them in different files in order to further
distinguish them.

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge tag 'wireless-drivers-next-for-davem-2015-02-07' of git://git.kernel.org/pub...
David S. Miller [Mon, 9 Feb 2015 20:07:20 +0000 (12:07 -0800)]
Merge tag 'wireless-drivers-next-for-davem-2015-02-07' of git://git./linux/kernel/git/kvalo/wireless-drivers-next

Major changes:

iwlwifi:

* more work for new devices (4165 / 8260)
* cleanups / improvemnts in rate control
* fixes for TDLS
* major statistics work from Johannes - more to come
* improvements for the fw error dump infrastructure
* usual amount of small fixes here and there (scan, D0i3 etc...)
* add support for beamforming
* enable stuck queue detection for iwlmvm
* a few fixes for EBS scan
* fixes for various failure paths
* improvements for TDLS Offchannel

wil6210:

* performance tuning
* some AP features

brcm80211:

* rework some code in SDIO part of the brcmfmac driver related to
  suspend/resume that were found doing stress testing
* in PCIe part scheduling of worker thread needed to be relaxed
* minor fixes and exposing firmware revision information to
  user-space, ie. ethtool.

mwifiex:

* enhancements for change virtual interface handling
* remove coupling between netdev and FW supported interface
  combination, now conversion from any type of supported interface
  types to any other type is possible
* DFS support in AP mode

ath9k:

* fix calibration issues on some boards
* Wake-on-WLAN improvements

ath10k:

* add support for qca6174 hardware
* enable RX batching to reduce CPU load

Conflicts:
drivers/net/wireless/rtlwifi/pci.c

Conflict resolution is to get rid of the 'end' label and keep
the rest.

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoi40e/i40evf: Add call to u64_stats_init to init
Carolyn Wyborny [Sat, 24 Jan 2015 09:58:32 +0000 (09:58 +0000)]
i40e/i40evf: Add call to u64_stats_init to init

This patch adds a call to u64_stats_init to Rx setup.
This done in order to avoid lockdep errors with seqcount on newer kernels.

Change-ID: Ia8ba8f0bcbd1c0e926f97d70aeee4ce4fd055e93
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40e: Enable Loopback for the FCOE vsi as well
Anjali Singhai Jain [Sat, 24 Jan 2015 09:58:31 +0000 (09:58 +0000)]
i40e: Enable Loopback for the FCOE vsi as well

For all VSIs on a VEB, Loopback mode should be either on or off.
Our configuration requires them to be ON so that VSIs can directly
talk to each other without going out on the wire.

Change-ID: I77b8310bc846329972b13b185949ab1431a46c30
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40e: use dev_port for fcoe netdev
Vasu Dev [Sat, 24 Jan 2015 09:58:30 +0000 (09:58 +0000)]
i40e: use dev_port for fcoe netdev

Set different dev_port value 1 for FCoE netdev than the default zero
dev_port value for PF netdev, this helps biosdevname user tool to
differentiate them correctly while both attached to the same PCI
function.

Change-ID: I8fb90e4ef52a1242f7580e49a3f0918735aee8ef
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40e: Fix function header
Greg Rose [Sat, 24 Jan 2015 09:58:29 +0000 (09:58 +0000)]
i40e: Fix function header

s/enable/disable

Change-ID: Ic0572a6c59d03e05a0a35d2e2e9d532e0512638d
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40e: fix led blink toggle to enable steady state
Matt Jared [Sat, 24 Jan 2015 09:58:28 +0000 (09:58 +0000)]
i40e: fix led blink toggle to enable steady state

Make sure to clear the GPIO blink field, instead of OR'ing against zero
if the field is already '1'.

Change-ID: Ie52a52abd48f6f52b20778a6b8b0c542dfc9245c
Signed-off-by: Matt Jared <matthew.a.jared@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40evf: Force Tx writeback on ITR
Anjali Singhai Jain [Sat, 10 Jan 2015 01:07:19 +0000 (01:07 +0000)]
i40evf: Force Tx writeback on ITR

This patch forces Tx descriptor writebacks on ITR by kicking
off the SWINT interrupt when we notice that there are non-cache-aligned
Tx descriptors waiting in the ring while interrupts are disabled
under NAPI.

Change-ID: dd6d9675629bf266c7515ad7a201394618c35444
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40e: stop the service task at shutdown
Mitch Williams [Fri, 9 Jan 2015 11:18:19 +0000 (11:18 +0000)]
i40e: stop the service task at shutdown

Stop the service task in the shutdown handler, preventing it from
accessing the admin queue after it had been closed. This fixes a panic
that could occur when the system was shut down with a lot of VFs
enabled.

Change-ID: I286735e3842de472385bbf7ad68d30331e508add
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agonet:rfs: adjust table size checking
Eric Dumazet [Mon, 9 Feb 2015 04:39:13 +0000 (20:39 -0800)]
net:rfs: adjust table size checking

Make sure root user does not try something stupid.

Also make sure mask field in struct rps_sock_flow_table
does not share a cache line with the potentially often dirtied
flow table.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 567e4b79731c ("net: rfs: add hash collision detection")
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agocxgb4: Fix trace observed while dumping clip_tbl
Hariprasad Shenai [Mon, 9 Feb 2015 04:17:10 +0000 (09:47 +0530)]
cxgb4: Fix trace observed while dumping clip_tbl

Handle clip_tbl debugfs entry, when clip_tbl isn't allocated.
In commit b5a02f503caa0837 ("cxgb4: Update ipv6 address handling api") wrong
argument was passed for single_open for clip_tbl debugfs entry, which led to
below trace. Fixing it.

======
call Trace:
 [<ffffffffa073c606>] clip_tbl_open+0x16/0x30 [cxgb4]
 [<ffffffff8119e2fa>] do_dentry_open+0x21a/0x370
 [<ffffffff8119e499>] vfs_open+0x49/0x50
 [<ffffffff811b0d0e>] do_last+0x21e/0x800
 [<ffffffff811b1382>] path_openat+0x92/0x470
 [<ffffffff8110569f>] ? rb_reserve_next_event+0xaf/0x380
 [<ffffffff8110569f>] ? rb_reserve_next_event+0xaf/0x380
 [<ffffffff811b189a>] do_filp_open+0x4a/0xa0
 [<ffffffff811bdc5d>] ? __alloc_fd+0xcd/0x140
 [<ffffffff8119fa4a>] do_sys_open+0x11a/0x230
 [<ffffffff8101219f>] ? syscall_trace_enter_phase2+0xaf/0x1b0
 [<ffffffff8119fb9e>] SyS_open+0x1e/0x20
 [<ffffffff815bf6f0>] tracesys_phase2+0xd4/0xd9
Code: 89 e5 66 66 66 66 90 48 8b 47 e0 48 8b 40 30 48 8b 40 58 c9 c3 66 0f 1f
84 00 00 00 00 00 55 48 89 e5 66 66 66 66 90 48 8b 47 e0 <48> 8b 40 58 c9 c3 66
66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48
RIP  [<ffffffff8120898d>] PDE_DATA+0xd/0x20
 RSP <ffff8800b08c3c48>
CR2: 0000000000000058

=====

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agorhashtable: using ERR_PTR requires linux/err.h
Stephen Rothwell [Mon, 9 Feb 2015 03:04:03 +0000 (14:04 +1100)]
rhashtable: using ERR_PTR requires linux/err.h

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoi40evf: stop the watchdog for shutdown
Mitch Williams [Fri, 9 Jan 2015 11:18:18 +0000 (11:18 +0000)]
i40evf: stop the watchdog for shutdown

Stop the watchdog during shutdown. Failing to do this causes a log full
of admin queue errors and the occasional hang when the system is shut
down.

Change-ID: Ib2fd11213cca2fa589eb68577e86b1000c23c250
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40evf: ignore bogus messages from FW
Mitch Williams [Fri, 9 Jan 2015 11:18:17 +0000 (11:18 +0000)]
i40evf: ignore bogus messages from FW

Occasionally on shutdown, the FW will hand us a bunch of messages filled
with zeros, which can cause us to spin trying to handle them. Just
ignore these and get on with shutting down.

Change-ID: I347e9648f7153ad5a7b7e0847b87f7aad5f3e0da
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40evf: reset on module unload
Mitch Williams [Fri, 9 Jan 2015 11:18:16 +0000 (11:18 +0000)]
i40evf: reset on module unload

When the module is being unloaded, don't wait for the PF to politely
handle all of our admin queue requests, as that might take forever with
a lot of VFs enabled. Instead, just stop everything and request a VF
reset.

When the original shutdown code was written, VF resets were unreliable,
so we avoided them. But with production hardware and firmware, and the
1.x PF driver, this is no longer the case.

This fixes a potential multi-minute delay on driver unload, VF disable,
or system shutdown.

Change-ID: Ib43d6d860ef6b9b8f26e8dce0615a0302608c7d9
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40e: add locking around VF reset
Mitch Williams [Fri, 9 Jan 2015 11:18:15 +0000 (11:18 +0000)]
i40e: add locking around VF reset

During VF deallocation, we need to lock out the VF reset code. However,
we cannot depend on simply masking the interrupt, as this does not lock
out the service task, which can still call the reset routine. Instead,
leave the interrupt enabled, but add locking around the VF disable and
reset routines.

For the disable code, we wait to get the lock, as the reset code will
take a finite amount of time to run. For the reset code, we just return
if we fail to get the lock. Since we know that the VFs are being
disabled, we don't need to handle the reset.
This fixes a panic when disabling SR-IOV.

Change-ID: Iea0a6cdef35c331f48c6d5b2f8e6f0e86322e7d8
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40e: Use even more ARQ descriptors
Mitch Williams [Fri, 9 Jan 2015 11:18:14 +0000 (11:18 +0000)]
i40e: Use even more ARQ descriptors

When enabling 64 VFs and loading the VF driver in the host kernel, we
can easily overrun the PF's admin receive queue. Double the size of this
queue, and increase the work limit to allow the PF to handle more
requests in a single pass through the service task.

Change-ID: I0efbbdc61954bffad422a2f33c4b948a59370bf5
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40e: delay after VF reset
Mitch Williams [Fri, 9 Jan 2015 11:18:13 +0000 (11:18 +0000)]
i40e: delay after VF reset

Delay a minimum of 10ms after VF reset, to allow the hardware's internal
FIFOs to flush.

Change-ID: I8a02ddb28c9f0d7303a1eb21d0b2443e5b4c1cda
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40e: avoid use of uninitialized v_budget in i40e_init_msix
John W Linville [Wed, 14 Jan 2015 03:06:28 +0000 (03:06 +0000)]
i40e: avoid use of uninitialized v_budget in i40e_init_msix

This I40E_FCOE block increments v_budget before it has been initialized,
then v_budget gets overwritten a few lines later.  This patch just
reorders the code hunks in what I believe was the intended sequence.

Coverity: CID 1260099

Signed-off-by: John W Linville <linville@tuxdriver.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40e: i40e_fcoe.c: Remove unused function
Rickard Strandqvist [Wed, 7 Jan 2015 11:40:17 +0000 (11:40 +0000)]
i40e: i40e_fcoe.c: Remove unused function

Remove the function i40e_rx_is_fip() that is not used anywhere.

This was partially found by using a static code analysis program
called cppcheck.

Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agovxlan: Wrong type passed to %pIS
Rasmus Villemoes [Sat, 7 Feb 2015 02:17:31 +0000 (03:17 +0100)]
vxlan: Wrong type passed to %pIS

src_ip is a pointer to a union vxlan_addr, one member of which is a
struct sockaddr. Passing a pointer to src_ip is wrong; one should pass
the value of src_ip itself. Since %pIS formally expects something of
type struct sockaddr*, let's pass a pointer to the appropriate union
member, though this of course doesn't change the generated code.

Fixes: e4c7ed415387 ("vxlan: add ipv6 support")
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoDriver: Vmxnet3: Change the hex constant to its decimal equivalent
Shrikrishna Khare [Fri, 6 Feb 2015 21:48:28 +0000 (13:48 -0800)]
Driver: Vmxnet3: Change the hex constant to its decimal equivalent

The hex constant chosen for VMXNET3_REV1_MAGIC is offensive,
replace it with its decimal equivalent.

Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
Reviewed-by: Shreyas Bhatewara <sbhatewara@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: rfs: add hash collision detection
Eric Dumazet [Fri, 6 Feb 2015 20:59:01 +0000 (12:59 -0800)]
net: rfs: add hash collision detection

Receive Flow Steering is a nice solution but suffers from
hash collisions when a mix of connected and unconnected traffic
is received on the host, when flow hash table is populated.

Also, clearing flow in inet_release() makes RFS not very good
for short lived flows, as many packets can follow close().
(FIN , ACK packets, ...)

This patch extends the information stored into global hash table
to not only include cpu number, but upper part of the hash value.

I use a 32bit value, and dynamically split it in two parts.

For host with less than 64 possible cpus, this gives 6 bits for the
cpu number, and 26 (32-6) bits for the upper part of the hash.

Since hash bucket selection use low order bits of the hash, we have
a full hash match, if /proc/sys/net/core/rps_sock_flow_entries is big
enough.

If the hash found in flow table does not match, we fallback to RPS (if
it is enabled for the rxqueue).

This means that a packet for an non connected flow can avoid the
IPI through a unrelated/victim CPU.

This also means we no longer have to clear the table at socket
close time, and this helps short lived flows performance.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: fix a typo in skb_checksum_validate_zero_check
Sabrina Dubroca [Fri, 6 Feb 2015 17:54:19 +0000 (18:54 +0100)]
net: fix a typo in skb_checksum_validate_zero_check

Remove trailing underscore.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agogre/ipip: use be16 variants of netlink functions
Sabrina Dubroca [Fri, 6 Feb 2015 16:22:22 +0000 (17:22 +0100)]
gre/ipip: use be16 variants of netlink functions

encap.sport and encap.dport are __be16, use nla_{get,put}_be16 instead
of nla_{get,put}_u16.

Fixes the sparse warnings:

warning: incorrect type in assignment (different base types)
   expected restricted __be32 [addressable] [usertype] o_key
   got restricted __be16 [addressable] [usertype] i_flags
warning: incorrect type in assignment (different base types)
   expected restricted __be16 [usertype] sport
   got unsigned short
warning: incorrect type in assignment (different base types)
   expected restricted __be16 [usertype] dport
   got unsigned short
warning: incorrect type in argument 3 (different base types)
   expected unsigned short [unsigned] [usertype] value
   got restricted __be16 [usertype] sport
warning: incorrect type in argument 3 (different base types)
   expected unsigned short [unsigned] [usertype] value
   got restricted __be16 [usertype] dport

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: fix bug in socket reception function
Jon Paul Maloy [Sun, 8 Feb 2015 16:10:50 +0000 (11:10 -0500)]
tipc: fix bug in socket reception function

In commit c637c1035534867b85b78b453c38c495b58e2c5a ("tipc: resolve race
problem at unicast message reception") we introduced a time limit
for how long the function tipc_sk_eneque() would be allowed to execute
its loop. Unfortunately, the test for when this limit is passed was put
in the wrong place, resulting in a lost message when the test is true.

We fix this by moving the test to before we dequeue the next buffer
from the input queue.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agort6_probe_deferred: Do not depend on struct ordering
Michael Büsch [Sun, 8 Feb 2015 09:14:07 +0000 (10:14 +0100)]
rt6_probe_deferred: Do not depend on struct ordering

rt6_probe allocates a struct __rt6_probe_work and schedules a work handler rt6_probe_deferred.
But rt6_probe_deferred kfree's the struct work_struct instead of struct __rt6_probe_work.
This works, because struct work_struct is the first element of struct __rt6_probe_work.

Change it to kfree struct __rt6_probe_work to not implicitly depend on
struct work_struct being the first element.

This does not affect the generated code.

Signed-off-by: Michael Buesch <m@bues.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'tcp_ack_loops'
David S. Miller [Sun, 8 Feb 2015 09:03:26 +0000 (01:03 -0800)]
Merge branch 'tcp_ack_loops'

Neal Cardwellsays:

====================
tcp: mitigate TCP ACK loops due to out-of-window validation dupacks

This patch series mitigates "ack loop" DoS scenarios by rate-limiting
outgoing duplicate ACKs sent in response to incoming "out of window"
segments.

Background
-----------

There are several cases in which the TCP RFCs specify that a TCP
endpoint should send a pure duplicate ACK in response to a pure
duplicate ACK that appears to be invalid due to being "out of window":

(1) RFC 793 (section 3.9, page 69) specifies that endpoints should
    send a duplicate ACK in response to an ACK when the incoming
    sequence number is invalid due to being outside the receive
    window: "If an incoming segment is not acceptable, an
    acknowledgment should be sent in reply".

(2) RFC 793 (section 3.9, page 72) says: "If the ACK acknowledges
    something not yet sent (SEG.ACK > SND.NXT) then send an ACK".

(3) RFC 1323 (section 4.2.1, page 18) specifies that endpoints should
    send a duplicate ACK in response to an ACK when the PAWS check for
    the incoming timestamp value fails: "If .... SEG.TSval < TS.Recent
    and if TS.Recent is valid ... Send an acknowledgement in reply"

The problem
------------

Normally, this is not a problem. However, a buggy middlebox or
malicious man-in-the-middle can inject a few packets into the
conversation that advance each endpoint's notion of the current window
(sequence, ACK, or timestamp), without either side noticing. In this
case, from then on each side can think the other is sending invalid
segments. Thus an infinite feedback loop of duplicate ACKs can ensue,
as each endpoint receives a duplicate ACK, decides that it is invalid
(due to sequence number, ACK number, or timestamp), and then sends a
dupack in reply, which the other side decides is invalid, responding
with a dupack... ad infinitum. This ping-pong feedback loop can happen
at a very high rate.

This phenomenon can and does happen in practice. It has been seen in
datacenter and Internet contexts at Google, and has been documented by
Anil Agarwal in the Nov 2013 tcpm thread "TCP mismatched sequence
numbers issue", and Avery Fay in the Feb 2015 Linux netdev thread
"Invalid timestamp? causing tight ack loop (hundreds of thousands of
packets / sec)".

This patch series
------------------

This patch series mitigates such ack loops by rate-limiting outgoing
duplicate ACKs sent in response to incoming TCP packets that are for
an existing connection but that are invalid due to any of the reasons
mentioned above: sequence number (1), ACK field (2), or timestamp
value (3). The rate limit for such duplicate ACKs is specified by a
new sysctl, tcp_invalid_ratelimit, which specifies the minimal space
between such outbound duplicate ACKs, in milliseconds. The default is
500 (500ms), and 0 disables the mechanism.

We rate-limit these duplicate ACK responses rather than blocking them
entirely or resetting the connection, because legitimate connections
can rely on dupacks in response to some out-of-window segments. For
example, zero window probes are typically sent with a sequence number
that is below the current window, and ZWPs thus expect to thus elicit
a dupack in response.

Testing: this approach has been in use at Google for a while.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotcp: mitigate ACK loops for connections as tcp_timewait_sock
Neal Cardwell [Fri, 6 Feb 2015 21:04:41 +0000 (16:04 -0500)]
tcp: mitigate ACK loops for connections as tcp_timewait_sock

Ensure that in state FIN_WAIT2 or TIME_WAIT, where the connection is
represented by a tcp_timewait_sock, we rate limit dupacks in response
to incoming packets (a) with TCP timestamps that fail PAWS checks, or
(b) with sequence numbers that are out of the acceptable window.

We do not send a dupack in response to out-of-window packets if it has
been less than sysctl_tcp_invalid_ratelimit (default 500ms) since we
last sent a dupack in response to an out-of-window packet.

Reported-by: Avery Fay <avery@mixpanel.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotcp: mitigate ACK loops for connections as tcp_sock
Neal Cardwell [Fri, 6 Feb 2015 21:04:40 +0000 (16:04 -0500)]
tcp: mitigate ACK loops for connections as tcp_sock

Ensure that in state ESTABLISHED, where the connection is represented
by a tcp_sock, we rate limit dupacks in response to incoming packets
(a) with TCP timestamps that fail PAWS checks, or (b) with sequence
numbers or ACK numbers that are out of the acceptable window.

We do not send a dupack in response to out-of-window packets if it has
been less than sysctl_tcp_invalid_ratelimit (default 500ms) since we
last sent a dupack in response to an out-of-window packet.

There is already a similar (although global) rate-limiting mechanism
for "challenge ACKs". When deciding whether to send a challence ACK,
we first consult the new per-connection rate limit, and then the
global rate limit.

Reported-by: Avery Fay <avery@mixpanel.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotcp: mitigate ACK loops for connections as tcp_request_sock
Neal Cardwell [Fri, 6 Feb 2015 21:04:39 +0000 (16:04 -0500)]
tcp: mitigate ACK loops for connections as tcp_request_sock

In the SYN_RECV state, where the TCP connection is represented by
tcp_request_sock, we now rate-limit SYNACKs in response to a client's
retransmitted SYNs: we do not send a SYNACK in response to client SYN
if it has been less than sysctl_tcp_invalid_ratelimit (default 500ms)
since we last sent a SYNACK in response to a client's retransmitted
SYN.

This allows the vast majority of legitimate client connections to
proceed unimpeded, even for the most aggressive platforms, iOS and
MacOS, which actually retransmit SYNs 1-second intervals for several
times in a row. They use SYN RTO timeouts following the progression:
1,1,1,1,1,2,4,8,16,32.

Reported-by: Avery Fay <avery@mixpanel.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotcp: helpers to mitigate ACK loops by rate-limiting out-of-window dupacks
Neal Cardwell [Fri, 6 Feb 2015 21:04:38 +0000 (16:04 -0500)]
tcp: helpers to mitigate ACK loops by rate-limiting out-of-window dupacks

Helpers for mitigating ACK loops by rate-limiting dupacks sent in
response to incoming out-of-window packets.

This patch includes:

- rate-limiting logic
- sysctl to control how often we allow dupacks to out-of-window packets
- SNMP counter for cases where we rate-limited our dupack sending

The rate-limiting logic in this patch decides to not send dupacks in
response to out-of-window segments if (a) they are SYNs or pure ACKs
and (b) the remote endpoint is sending them faster than the configured
rate limit.

We rate-limit our responses rather than blocking them entirely or
resetting the connection, because legitimate connections can rely on
dupacks in response to some out-of-window segments. For example, zero
window probes are typically sent with a sequence number that is below
the current window, and ZWPs thus expect to thus elicit a dupack in
response.

We allow dupacks in response to TCP segments with data, because these
may be spurious retransmissions for which the remote endpoint wants to
receive DSACKs. This is safe because segments with data can't
realistically be part of ACK loops, which by their nature consist of
each side sending pure/data-less ACKs to each other.

The dupack interval is controlled by a new sysctl knob,
tcp_invalid_ratelimit, given in milliseconds, in case an administrator
needs to dial this upward in the face of a high-rate DoS attack. The
name and units are chosen to be analogous to the existing analogous
knob for ICMP, icmp_ratelimit.

The default value for tcp_invalid_ratelimit is 500ms, which allows at
most one such dupack per 500ms. This is chosen to be 2x faster than
the 1-second minimum RTO interval allowed by RFC 6298 (section 2, rule
2.4). We allow the extra 2x factor because network delay variations
can cause packets sent at 1 second intervals to be compressed and
arrive much closer.

Reported-by: Avery Fay <avery@mixpanel.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoopenvswitch: Initialize unmasked key and uid len
Pravin B Shelar [Fri, 6 Feb 2015 19:17:13 +0000 (11:17 -0800)]
openvswitch: Initialize unmasked key and uid len

Flow alloc needs to initialize unmasked key pointer. Otherwise
it can crash kernel trying to free random unmasked-key pointer.

general protection fault: 0000 [#1] SMP
3.19.0-rc6-net-next+ #457
Hardware name: Supermicro X7DWU/X7DWU, BIOS  1.1 04/30/2008
RIP: 0010:[<ffffffff8111df0e>] [<ffffffff8111df0e>] kfree+0xac/0x196
Call Trace:
 [<ffffffffa060bd87>] flow_free+0x21/0x59 [openvswitch]
 [<ffffffffa060bde0>] ovs_flow_free+0x21/0x23 [openvswitch]
 [<ffffffffa0605b4a>] ovs_packet_cmd_execute+0x2f3/0x35f [openvswitch]
 [<ffffffffa0605995>] ? ovs_packet_cmd_execute+0x13e/0x35f [openvswitch]
 [<ffffffff811fe6fb>] ? nla_parse+0x4f/0xec
 [<ffffffff8139a2fc>] genl_family_rcv_msg+0x26d/0x2c9
 [<ffffffff8107620f>] ? __lock_acquire+0x90e/0x9aa
 [<ffffffff8139a3be>] genl_rcv_msg+0x66/0x89
 [<ffffffff8139a358>] ? genl_family_rcv_msg+0x2c9/0x2c9
 [<ffffffff81399591>] netlink_rcv_skb+0x3e/0x95
 [<ffffffff81399898>] ? genl_rcv+0x18/0x37
 [<ffffffff813998a7>] genl_rcv+0x27/0x37
 [<ffffffff81399033>] netlink_unicast+0x103/0x191
 [<ffffffff81399382>] netlink_sendmsg+0x2c1/0x310
 [<ffffffff811007ad>] ? might_fault+0x50/0xa0
 [<ffffffff8135c773>] do_sock_sendmsg+0x5f/0x7a
 [<ffffffff8135c799>] sock_sendmsg+0xb/0xd
 [<ffffffff8135cacf>] ___sys_sendmsg+0x1a3/0x218
 [<ffffffff8113e54b>] ? get_close_on_exec+0x86/0x86
 [<ffffffff8115a9d0>] ? fsnotify+0x32c/0x348
 [<ffffffff8115a720>] ? fsnotify+0x7c/0x348
 [<ffffffff8113e5f5>] ? __fget+0xaa/0xbf
 [<ffffffff8113e54b>] ? get_close_on_exec+0x86/0x86
 [<ffffffff8135cccd>] __sys_sendmsg+0x3d/0x5e
 [<ffffffff8135cd02>] SyS_sendmsg+0x14/0x16
 [<ffffffff81411852>] system_call_fastpath+0x12/0x17

Fixes: 74ed7ab9264("openvswitch: Add support for unique flow IDs.")
CC: Joe Stringer <joestringer@nicira.com>
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'cxgb4'
David S. Miller [Sun, 8 Feb 2015 06:53:03 +0000 (22:53 -0800)]
Merge branch 'cxgb4'

Hariprasad Shenai says:

====================
Add support to dump some hw debug info

This patch series adds support to dump sensor info, dump Transport Processor
event trace, dump Upper Layer Protocol RX module command trace, dump mailbox
contents and dump Transport Processor congestion control configuration.

Will send a separate patch series for all the hw stats patches, by moving them
to ethtool.

The patches series is created against 'net-next' tree.
And includes patches on cxgb4 driver.

We have included all the maintainers of respective drivers. Kindly review the
change and let us know in case of any review comments.

V2: Dopped all hw stats related patches. Added a new patch which adds support to
dump congestion control table.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agocxgb4: Add support in debugfs to dump the congestion control table
Hariprasad Shenai [Fri, 6 Feb 2015 14:02:55 +0000 (19:32 +0530)]
cxgb4: Add support in debugfs to dump the congestion control table

Dump Transport Processor modules congestion control configuration

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agocxgb4: Add support to dump mailbox content in debugfs
Hariprasad Shenai [Fri, 6 Feb 2015 14:02:54 +0000 (19:32 +0530)]
cxgb4: Add support to dump mailbox content in debugfs

Adds support to dump the current contents of mailbox and the driver which owns
it.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agocxgb4: Add support for ULP RX logic analyzer output in debugfs
Hariprasad Shenai [Fri, 6 Feb 2015 14:02:53 +0000 (19:32 +0530)]
cxgb4: Add support for ULP RX logic analyzer output in debugfs

Dump Upper Layer Protocol RX module command trace

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agocxgb4: Added support in debugfs to display TP logic analyzer output
Hariprasad Shenai [Fri, 6 Feb 2015 14:02:52 +0000 (19:32 +0530)]
cxgb4: Added support in debugfs to display TP logic analyzer output

Dump Transport Processor event trace.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agocxgb4: Add support in debugfs to display sensor information
Hariprasad Shenai [Fri, 6 Feb 2015 14:02:51 +0000 (19:32 +0530)]
cxgb4: Add support in debugfs to display sensor information

Dump out various chip sensor information. Currently Chip Temperature
and Core Voltage.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'be2net'
David S. Miller [Sun, 8 Feb 2015 06:51:02 +0000 (22:51 -0800)]
Merge branch 'be2net'

Sathya Perla says:

====================
be2net: patch set

Hi Dave, pls consider applying the following patch-set to the
net-next tree. It has 5 code/style cleanup patches and 4 patches that
add functionality to the driver.

Patch 1 moves routines that were not needed to be in be.h to the respective
src files, to avoid unnecessary compilation.

Patch 2 replaces (1 << x) with BIT(x) macro

Patch 3 refactors code that checks if a FW flash file is compatible
with the adapter. The code is now refactored into 2 routines, the first one
gets the file type from the image file and the 2nd routine checks if the
file type is compatible with the adapter.

Patch 4 adds compatibility checks for flashing a FW image on the new
Skyhawk P2 HW revision.

Patch 5 adds support for a new "offset based" flashing scheme, wherein
the driver informs the FW of the offset at which each component in the flash
file is to be flashed at. This helps flashing components that were
previously not recognized by the running FW.

Patch 6 simplifies the be_cmd_rx_filter() routine, by passing to it the
filter flags already used in the FW cmd, instead of the netdev flags that
were converted to the FW-cmd flags.

Patch 7 introduces helper routines in be_set_rx_mode() and be_vid_config()
to improve code readability.

Patch 8 adds processing of port-misconfig async event sent by the FW.

Patch 9 removes unnecessary swapping of a field in the TX desc.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobe2net: avoid unncessary swapping of fields in eth_tx_wrb
Sathya Perla [Fri, 6 Feb 2015 13:18:43 +0000 (08:18 -0500)]
be2net: avoid unncessary swapping of fields in eth_tx_wrb

The 32-bit fields of a tx-wrb are little endian. The driver is currently
using be_dws_le_to_cpu() routine to swap (cpu to le) all the fields of
a tx-wrb. So, the rsvd field is also unnecessarily swapped.

This patch fixes this by individually swapping the required fields.
Also, the type of the fields in eth_tx_wrb{} is now changed to __le32
from u32 to avoid sparse warnings.

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobe2net: process port misconfig async event
Vasundhara Volam [Fri, 6 Feb 2015 13:18:42 +0000 (08:18 -0500)]
be2net: process port misconfig async event

This patch adds support for processing the port misconfigure async
event generated by the FW. This event is generated typically when an
optical module is incorrectly installed or is faulty.

This patch also moves the port_name field to the adapter struct for
logging the event. As the be_cmd_query_port_name() call is now moved
to be_get_config(), it is modified to use the mailbox instead of MCCQ

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobe2net: refactor be_set_rx_mode() and be_vid_config() for readability
Sathya Perla [Fri, 6 Feb 2015 13:18:41 +0000 (08:18 -0500)]
be2net: refactor be_set_rx_mode() and be_vid_config() for readability

This patch re-factors the filter setting (uc-list, mc-list, promisc, vlan)
code in be_set_rx_mode() and be_vid_config() to make it more readable
and reduce code duplication.
This patch adds a separate field to track the state/mode of filtering,
along with moving all the filtering related fields to one place in be
be_adapter structure.

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobe2net: remove duplicate code in be_cmd_rx_filter()
Sathya Perla [Fri, 6 Feb 2015 13:18:40 +0000 (08:18 -0500)]
be2net: remove duplicate code in be_cmd_rx_filter()

This patch passes BE_IF_FLAGS_XXX flags to be_cmd_rx_filter() routine
instead of the IFF_XXX flags. Doing this gets rid of the code to convert
the IFF_XXX flags to the BE_IF_FLAGS_XXX used by the FW cmd. The patch
also removes code for setting if_flags_mask that was duplicated for each
filter mode.

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: Kalesh AP <kalesh.purayil@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobe2net: use offset based FW flashing for Skyhawk chip
Vasundhara Volam [Fri, 6 Feb 2015 13:18:39 +0000 (08:18 -0500)]
be2net: use offset based FW flashing for Skyhawk chip

While sending FW update cmds to the FW, the driver specifies the "type"
of each component that needs to be flashed. The FW then picks the offset
in the flash area at which the componnet is to be flashed. This doesn't work
when new components that the current FW doesn't recognize, need to be
flashed. Recent FWs (10.2 and above) support a scheme of FW-update wherein
the "offset" of the component in the flash area can be specified instead
of the "type". This patch uses the "offset" based FW-update mechanism and
only when it fails, it fallsback to the old "type" based update.

Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobe2net: avoid flashing SH-B0 UFI image on SH-P2 chip
Vasundhara Volam [Fri, 6 Feb 2015 13:18:38 +0000 (08:18 -0500)]
be2net: avoid flashing SH-B0 UFI image on SH-P2 chip

Skyhawk-B0 FW UFI is not compatible to flash on Skyhawk-P2 ASIC.
But, Skyhawk-P2 FW UFI is compatible with both B0 and P2 chips.

Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobe2net: refactor code that checks flash file compatibility
Vasundhara Volam [Fri, 6 Feb 2015 13:18:37 +0000 (08:18 -0500)]
be2net: refactor code that checks flash file compatibility

This patch re-factors the code that checks for flash file compatibility with
the chip type, for better readability, as follows:
- be_get_ufi_type() returns the UFI type from the flash file
- be_check_ufi_compatibility() checks if the UFI type is compatible
  with the adapter/chip that is being flashed

Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobe2net: replace (1 << x) with BIT(x)
Vasundhara Volam [Fri, 6 Feb 2015 13:18:36 +0000 (08:18 -0500)]
be2net: replace (1 << x) with BIT(x)

BIT(x) is the preffered usage.
Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobe2net: move un-exported routines from be.h to respective src files
Sathya Perla [Fri, 6 Feb 2015 13:18:35 +0000 (08:18 -0500)]
be2net: move un-exported routines from be.h to respective src files

Routines that are called only inside one src file must remain in that
file itself. Including them in a header file that is used for exporting
routine/struct definitions, causes unnecessary compilation of other
src files, when such a routine is modified.

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobridge: add missing bridge port check for offloads
Roopa Prabhu [Fri, 6 Feb 2015 06:24:45 +0000 (22:24 -0800)]
bridge: add missing bridge port check for offloads

This patch fixes a missing bridge port check caught by smatch.

setlink/dellink of attributes like vlans can come for a bridge device
and there is no need to offload those today. So, this patch adds a bridge
port check. (In these cases however, the BRIDGE_SELF flags will always be set
and we may not hit a problem with the current code).

smatch complaint:

The patch 68e331c785b8: "bridge: offload bridge port attributes to
switch asic if feature flag set" from Jan 29, 2015, leads to the
following Smatch complaint:

net/bridge/br_netlink.c:552 br_setlink()
 error: we previously assumed 'p' could be null (see line 518)

net/bridge/br_netlink.c
   517
   518 if (p && protinfo) {
                    ^
Check for NULL.

Reported-By: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net...
David S. Miller [Sun, 8 Feb 2015 06:48:50 +0000 (22:48 -0800)]
Merge branch 'master' of git://git./linux/kernel/git/jkirsher/net-next

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2015-02-05

This series contains updates to fm10k, ixgbe and ixgbevf.

Matthew fixes an issue where fm10k does not properly drop the upper-most four
bits on of the VLAN ID due to type promotion, so resolve the issue by not
masking off the bits, but by throwing an error if the VLAN ID is out-of-bounds.
Then cleans up two cases where variables were not being used, but were
being set, so just remove the unused variables.

Don cleans up sparse errors in the x550 family file for ixgbe.  Fixed up
a redundant setting of the default value for set_rxpba, which was done
twice accidentally.  Cleaned up the probe routine to remove a redundant
attempt to identify the PHY, which could lead to a panic on x550.  Added
support for VXLAN receive checksum offload in x550 hardware.  Added the
Ethertype Anti-spoofing feature for affected devices.

Emil enables ixgbe and ixgbevf to allow multiple queues in SRIOV mode.
Adds RSS support for x550 per VF.  Fixed up a couple of issues introduced
in commit 2b509c0cd292 ("ixgbe: cleanup ixgbe_ndo_set_vf_vlan"), fixed
setting of the VLAN inside ixgbe_enable_port_vlan() and disable the
"hide VLAN" bit in PFQDE when port VLAN is disabled.  Cleaned up the
setting of vlan_features by enabling all features at once.  Fixed the
ordering of the shutdown patch so that we attempt to shutdown the rings
more gracefully.  We shutdown the main Rx filter in the case of Rx and we
set the carrier_off state in the case of Tx so that packets stop being
delivered from outside the driver.  Then we shutdown interrupts and NAPI,
then finally stop the rings from performing DMA and clean them.  Added
code to allow for Tx hang checking to provide more robust debug info in
the event of a transmit unit hang in ixgbevf.  Cleaned up ixgbevf logic
dealing with link up/down by breaking down the link detection and up/down
events into separate functions, similar to how these events are handled
in other drivers.  Combined the ixgbevf reset and watchdog tasks into a
single task so that we can avoid multiple schedules of the reset task when
we have a reset event needed due to either the mailbox going down or
transmit packets being present on a link down.

v2: Fixed up patch #03 of the series to remove the variable type change
    based on feedback from David Laight
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'r8152'
David S. Miller [Sun, 8 Feb 2015 06:46:34 +0000 (22:46 -0800)]
Merge branch 'r8152'

Hayes Wang says:

====================
r8152: adjust the code

V2:
Correct the subject of patch #5. Replace "link feed" with "line feed".

v1:
Code adjustment.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agor8152: use BIT macro
hayeswang [Fri, 6 Feb 2015 03:30:51 +0000 (11:30 +0800)]
r8152: use BIT macro

Use BIT macro to replace (1 << bits).

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agor8152: replace get_protocol with vlan_get_protocol
hayeswang [Fri, 6 Feb 2015 03:30:50 +0000 (11:30 +0800)]
r8152: replace get_protocol with vlan_get_protocol

vlan_get_protocol() has been defined and use it to replace
get_protocol().

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agor8152: adjust the line feed for hw_features
hayeswang [Fri, 6 Feb 2015 03:30:49 +0000 (11:30 +0800)]
r8152: adjust the line feed for hw_features

Keep NETIF_F_HW_VLAN_CTAG_RX and NETIF_F_HW_VLAN_CTAG_TX at the
same line.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agor8152: check RTL8152_UNPLUG for rtl8152_close
hayeswang [Fri, 6 Feb 2015 03:30:48 +0000 (11:30 +0800)]
r8152: check RTL8152_UNPLUG for rtl8152_close

It is unnecessary to accress the hw register if the device is unplugged.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agor8152: check linking status with netif_carrier_ok
hayeswang [Fri, 6 Feb 2015 03:30:47 +0000 (11:30 +0800)]
r8152: check linking status with netif_carrier_ok

Replace (tp->speed & LINK_STATUS) with netif_carrier_ok().

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agor8152: adjust lpm timer
hayeswang [Fri, 6 Feb 2015 03:30:46 +0000 (11:30 +0800)]
r8152: adjust lpm timer

Set LPM timer to 500us, except for RTL_VER_04 which doesn't link at
USB 3.0.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agor8152: adjust rx_bottom
hayeswang [Fri, 6 Feb 2015 03:30:45 +0000 (11:30 +0800)]
r8152: adjust rx_bottom

If a error occurs when submitting rx, skip the remaining submissions
and try to submit them again next time.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoamd-xgbe: Check per channel DMA interrupt use in main ISR
Lendacky, Thomas [Fri, 6 Feb 2015 01:17:14 +0000 (19:17 -0600)]
amd-xgbe: Check per channel DMA interrupt use in main ISR

When using per channel DMA interrupts the transmit interrupt (TI) and the
receive interrupt (RI) are masked off so as to not generate an interrupt
to the main ISR. However, should another interrupt fire for the DMA channel
that is handled by the main ISR the TI/RI bits can still be set. This
will cause the wrong and uninitialized napi structure to be used causing a
panic. Add a check to be sure per channel DMA interrupts are not enabled
before acting on those bit flags.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: use netif_rx_ni() from process context
Eric Dumazet [Thu, 5 Feb 2015 22:58:14 +0000 (14:58 -0800)]
net: use netif_rx_ni() from process context

Hotpluging a cpu might be rare, yet we have to use proper
handlers when taking over packets found in backlog queues.

dev_cpu_callback() runs from process context, thus we should
call netif_rx_ni() to properly invoke softirq handler.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agords: Make rds_message_copy_from_user() return 0 on success.
Sowmini Varadhan [Thu, 5 Feb 2015 22:41:43 +0000 (17:41 -0500)]
rds: Make rds_message_copy_from_user() return 0 on success.

Commit 083735f4b01b ("rds: switch rds_message_copy_from_user() to iov_iter")
breaks rds_message_copy_from_user() semantics on success, and causes it
to return nbytes copied, when it should return 0.  This commit fixes that bug.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: rds: Remove repeated function names from debug output
Rasmus Villemoes [Thu, 5 Feb 2015 22:17:20 +0000 (23:17 +0100)]
net: rds: Remove repeated function names from debug output

The macro rdsdebug is defined as

  pr_debug("%s(): " fmt, __func__ , ##args)

Hence it doesn't make sense to include the name of the calling
function explicitly in the format string passed to rdsdebug.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: openvswitch: Support masked set actions.
Jarno Rajahalme [Thu, 5 Feb 2015 21:40:49 +0000 (13:40 -0800)]
net: openvswitch: Support masked set actions.

OVS userspace already probes the openvswitch kernel module for
OVS_ACTION_ATTR_SET_MASKED support.  This patch adds the kernel module
implementation of masked set actions.

The existing set action sets many fields at once.  When only a subset
of the IP header fields, for example, should be modified, all the IP
fields need to be exact matched so that the other field values can be
copied to the set action.  A masked set action allows modification of
an arbitrary subset of the supported header bits without requiring the
rest to be matched.

Masked set action is now supported for all writeable key types, except
for the tunnel key.  The set tunnel action is an exception as any
input tunnel info is cleared before action processing starts, so there
is no tunnel info to mask.

The kernel module converts all (non-tunnel) set actions to masked set
actions.  This makes action processing more uniform, and results in
less branching and duplicating the action processing code.  When
returning actions to userspace, the fully masked set actions are
converted back to normal set actions.  We use a kernel internal action
code to be able to tell the userspace provided and converted masked
set actions apart.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>