GitHub/moto-9609/android_kernel_motorola_exynos9610.git
David S. Miller [Mon, 27 Jul 2015 22:00:37 +0000 (15:00 -0700)]
Merge branch 'mlx4-802.1ad-accel'

Amir Vadai says:

====================
net/mlx4_en: Hardware accelerated 802.1ad

This patchset by Hadar introduces support for hardware-accelerated 802.1ad
on ConnectX-3pro NICs. To remain compatible with existing deployments, and
because of some hardware limitations, the feature is disabled by default and
must be enabled through an ethtool private flag. Of course, the user can
enable the private flag only if the hardware supports it.
Once enabled, the standard ethtool -k/-K options can be used.

Patchset was applied and tested over commit 71790a2 ("hv_netvsc: Add structs
and handlers for VF messages")
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Mon, 27 Jul 2015 11:46:34 +0000 (14:46 +0300)]
net/mlx4_en: Add support for hardware accelerated 802.1ad vlan

To enable device support for accelerated 802.1ad VLAN, the port
capability "packet has vlan enable" (phv_en) must be set; the firmware
will not work properly if phv_en is not set.

The user can enable the "phv_en" port capability with the new ethtool
private flag phv-bit. The phv-bit private flag is OFF by default; users
who want 802.1ad hardware acceleration should turn it ON:
$ ethtool --set-priv-flags eth1 phv-bit on

Once the private flag is set, the device is ready for 802.1ad vlan
acceleration.

The user should also change the interface device features and turn on
"tx-vlan-stag-hw-insert", which is off by default:
$ ethtool -K eth1 tx-vlan-stag-hw-insert on

The "phv-bit" private flag can be set only on Physical Functions (PFs).
A Virtual Function (VF) can use the feature by setting the
"tx-vlan-stag-hw-insert" ethtool device feature, but only if the
feature was enabled by the hypervisor.
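For illustration only, the sketch below shows in software what S-tag insertion means on the wire. An 802.1ad service tag (TPID 0x88A8) is placed right after the source MAC address, ahead of any existing 802.1Q customer tag (TPID 0x8100). With tx-vlan-stag-hw-insert enabled, the NIC performs this insertion instead of the stack. The helper name insert_stag() and its flat-buffer interface are hypothetical and are not part of the mlx4 driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN      6
#define TPID_8021AD  0x88A8  /* 802.1ad service tag (S-tag) */
#define TPID_8021Q   0x8100  /* 802.1Q customer tag (C-tag), left in place */
#define VLAN_HLEN    4

/* Hypothetical helper: copy frame 'in' (len bytes) to 'out' with an S-tag
 * carrying 'tci' inserted after the destination and source MAC addresses.
 * 'out' must have room for len + VLAN_HLEN bytes. Returns the new length. */
static size_t insert_stag(const uint8_t *in, size_t len, uint16_t tci,
                          uint8_t *out)
{
    assert(len >= 2 * MAC_LEN);

    memcpy(out, in, 2 * MAC_LEN);            /* dst MAC + src MAC */
    out[12] = TPID_8021AD >> 8;              /* TPID, big endian */
    out[13] = TPID_8021AD & 0xff;
    out[14] = tci >> 8;                      /* PCP/DEI/VID, big endian */
    out[15] = tci & 0xff;
    /* Rest of the frame, including any C-tag, follows the new S-tag. */
    memcpy(out + 2 * MAC_LEN + VLAN_HLEN, in + 2 * MAC_LEN, len - 2 * MAC_LEN);
    return len + VLAN_HLEN;
}
```

The result is the QinQ layout: MACs, S-tag, C-tag, then the EtherType of the payload.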

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Mon, 27 Jul 2015 11:46:33 +0000 (14:46 +0300)]
net/mlx4: Prepare VLAN macros for 802.1ad Hardware accelerated support

To prepare for hardware-accelerated 802.1ad VLAN support, rename the
current VLAN macros to CVLAN.
Replace:
MLX4_WQE_CTRL_INS_VLAN
MLX4_CQE_VLAN_PRESENT_MASK
With:
MLX4_WQE_CTRL_INS_CVLAN
MLX4_CQE_CVLAN_PRESENT_MASK

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Mon, 27 Jul 2015 11:46:32 +0000 (14:46 +0300)]
net/mlx4_en: Prepare ethtool private flags to support more flags

Currently we support only one ethtool private flag. Prepare the
mlx4_en_set_priv_flags function to support more than one private flag.
This will be used in the next patch to support hardware-accelerated
802.1ad VLAN.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Mon, 27 Jul 2015 11:46:31 +0000 (14:46 +0300)]
net/mlx4_core: Preparations for 802.1ad VLAN support

mlx4_core preparation for supporting hardware-accelerated 802.1ad VLAN
devices.

An 802.1ad accelerated device requires the "packet has vlan" (phv)
firmware capability. Firmware without the phv capability won't behave
properly and can't support 802.1ad device acceleration.

The driver checks the firmware capability and sets the phv bit
accordingly in the SET_PORT command.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 27 Jul 2015 21:57:41 +0000 (14:57 -0700)]
Merge branch 'arm-bpf-next'

Nicolas Schichan says:

====================
ARM BPF JIT features

This series adds support for more instructions to the ARM BPF JIT,
namely skb netdevice type retrieval, skb payload offset retrieval, and
skb packet type retrieval.

With it, 35 tests use the JIT instead of 29 before.

This series depends on the "BPF JIT fixes for ARM" series sent earlier.
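For context, the sketch below shows how userspace reaches one of these ancillary loads in classic BPF, using the UAPI macros from <linux/filter.h>. The program loads the packet type (SKF_AD_PKTTYPE) and accepts only packets addressed to this host. It only builds the instruction array. Attaching it (for example with setsockopt(SO_ATTACH_FILTER)) is left out, and whether a given kernel JIT-compiles it depends on the architecture and kernel version. The array and variable names are illustrative.

```c
#include <linux/filter.h>     /* struct sock_filter/sock_fprog, BPF_STMT, BPF_JUMP, SKF_AD_* */
#include <linux/if_packet.h>  /* PACKET_HOST */

/* ld  #pkttype              ; A = packet type (ancillary load)
 * jeq #PACKET_HOST, 0, 1    ; host-bound? fall through : skip one
 * ret #0xffff               ; accept up to 0xffff bytes
 * ret #0                    ; drop                                  */
static struct sock_filter host_only[] = {
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, SKF_AD_OFF + SKF_AD_PKTTYPE),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, PACKET_HOST, 0, 1),
    BPF_STMT(BPF_RET | BPF_K, 0xffff),
    BPF_STMT(BPF_RET | BPF_K, 0),
};

static const struct sock_fprog host_only_prog = {
    .len    = sizeof(host_only) / sizeof(host_only[0]),
    .filter = host_only,
};
```

SKF_AD_HATYPE and SKF_AD_PAY_OFFSET are used the same way, by changing the constant added to SKF_AD_OFF.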
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Schichan [Mon, 27 Jul 2015 13:06:51 +0000 (15:06 +0200)]
ARM: net: add support for BPF_ANC | SKF_AD_HATYPE in ARM JIT.

Signed-off-by: Nicolas Schichan <nschichan@freebox.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Schichan [Mon, 27 Jul 2015 13:06:50 +0000 (15:06 +0200)]
ARM: net: add support for BPF_ANC | SKF_AD_PAY_OFFSET in ARM JIT.

Signed-off-by: Nicolas Schichan <nschichan@freebox.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Schichan [Mon, 27 Jul 2015 13:06:49 +0000 (15:06 +0200)]
ARM: net: add support for BPF_ANC | SKF_AD_PKTTYPE in ARM JIT.

Signed-off-by: Nicolas Schichan <nschichan@freebox.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Mon, 27 Jul 2015 08:07:47 +0000 (11:07 +0300)]
lwtunnel: use kfree_skb() instead of vanilla kfree()

kfree_skb() is correct here.

Fixes: ffce41962ef6 ('lwtunnel: support dst output redirect function')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sun, 26 Jul 2015 07:45:24 +0000 (09:45 +0200)]
tcp: tso: allow deferring under reordering state

While doing experiments with reordering resilience, we found that
Linux senders were not able to send at full speed under reordering,
because every incoming SACK was releasing one MSS.

This patch removes the limitation, as we did for CWR state
in commit a0ea700e409 ("tcp: tso: allow CA_CWR state in
tcp_tso_should_defer()").

Neal Cardwell had a concern about limited transmit, so
Yuchung conducted experiments on GFE and found nothing that
would justify adding an extra check on the fast path:

  if (icsk->icsk_ca_state == TCP_CA_Disorder &&
      tcp_sk(sk)->reordering == sysctl_tcp_reordering)
          goto send_now;

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Sat, 25 Jul 2015 20:42:01 +0000 (23:42 +0300)]
ravb: minimize TX data copying

The Renesas Ethernet AVB controller requires all data to be aligned on a
4-byte boundary. While this is easily achieved for RX data with the help of
skb_reserve() (we even align on a 128-byte boundary, as the manual
recommends), we can't do the same for TX data, which always comes unaligned
from the networking core. Originally we solved this the easy way, by copying
the whole packet to a preallocated aligned buffer; however, it is enough to
copy only up to the first 3 bytes of each packet and do the transfer using
2 TX descriptors instead of just 1. Here is an implementation of the new TX
algorithm, which significantly reduces the driver's memory requirements.
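The split described above is a little pointer arithmetic. The sketch below computes how many leading bytes (0 to 3) go into a preallocated aligned buffer, so that the second descriptor can point straight at the now 4-byte-aligned remainder of the packet. The 4-byte requirement comes from the text. The struct and function names are hypothetical and this is not the ravb code. In particular, it simply reports a zero-length head for an already aligned packet, and how the driver handles that case is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TX_ALIGN 4u  /* DMA alignment required by the controller */

struct tx_split {
    size_t head_len;  /* bytes copied into an aligned bounce buffer */
    size_t tail_off;  /* offset of the aligned remainder in the packet */
    size_t tail_len;  /* bytes DMA-mapped directly from the packet */
};

/* Hypothetical helper: split a packet at 'data' into a short copied head
 * and an aligned tail that needs no copying. */
static struct tx_split tx_split(const void *data, size_t len)
{
    uintptr_t addr = (uintptr_t)data;
    /* Distance to the next 4-byte boundary: 0..3. */
    size_t head = (TX_ALIGN - (addr & (TX_ALIGN - 1))) & (TX_ALIGN - 1);
    struct tx_split s;

    assert(data != NULL);
    if (head > len)   /* tiny packet: the head is the whole packet */
        head = len;
    s.head_len = head;
    s.tail_off = head;
    s.tail_len = len - head;
    return s;
}
```

Compared with copying the whole packet, the bounce buffer per descriptor shrinks to TX_ALIGN - 1 bytes, which is where the memory saving comes from.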

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guenter Roeck [Sat, 25 Jul 2015 16:42:28 +0000 (09:42 -0700)]
dsa: mv88e6352/mv88e6xxx: Move temperature sensor code to mv88e6xxx.c

Move the temperature sensing code for mv88e6352 and mv88e6320 families
into mv88e6xxx.c to simplify adding support for additional chips.

With this change, mv88e6xxx_6320_family() no longer needs to be
a global function and is made static.

Cc: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Haiyang Zhang [Fri, 24 Jul 2015 17:08:40 +0000 (10:08 -0700)]
hv_netvsc: Add structs and handlers for VF messages

This patch adds data structures and handlers for messages related
to SRIOV Virtual Function.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 27 Jul 2015 08:08:26 +0000 (01:08 -0700)]
Merge branch 'rt6_probe_write_lock'

Martin KaFai Lau says:

====================
ipv6: Avoid rt6_probe() taking writer lock in the fast path

v1 -> v2:
1. Separate the code re-arrangement into another patch
2. Fix style
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Fri, 24 Jul 2015 16:57:43 +0000 (09:57 -0700)]
ipv6: Avoid rt6_probe() taking writer lock in the fast path

The patch checks neigh->nud_state before acquiring the writer lock.
Note that rt6_probe() is only used with CONFIG_IPV6_ROUTER_PREF.

Test setup: 40 udpflood processes and a /64 gateway route; the gateway
is NUD_PERMANENT. Each run lasts 30s. Total number of completed
sendto() calls at the end:

Before: 55M
After: 95M
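The idea behind the fast-path check can be sketched with a userspace reader/writer lock. The neighbour state is read without the writer lock, the lock is taken only when a probe might actually be needed, and the state is re-checked under the lock. The struct, the probe bookkeeping, and the reduced VALID_MASK are hypothetical stand-ins. The kernel uses its own neighbour locking and a broader NUD_VALID mask.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative subset of neighbour states; NUD_VALID in the kernel
 * covers more states than this mask. */
#define NUD_NONE       0x00u
#define NUD_REACHABLE  0x02u
#define NUD_PERMANENT  0x80u
#define VALID_MASK     (NUD_REACHABLE | NUD_PERMANENT)

struct neigh {
    atomic_uint      nud_state;
    pthread_rwlock_t lock;
    unsigned         probes_sent;
    unsigned         write_locks;  /* instrumentation for the example */
};

/* Returns true if a probe was sent. */
static bool maybe_probe(struct neigh *n)
{
    bool probe;

    assert(n != NULL);
    /* Fast path: a valid neighbour needs no probe, so skip the writer lock. */
    if (atomic_load_explicit(&n->nud_state, memory_order_relaxed) & VALID_MASK)
        return false;

    pthread_rwlock_wrlock(&n->lock);
    n->write_locks++;
    /* Re-check under the lock: the state may have changed meanwhile. */
    probe = !(atomic_load(&n->nud_state) & VALID_MASK);
    if (probe)
        n->probes_sent++;
    pthread_rwlock_unlock(&n->lock);
    return probe;
}
```

With a NUD_PERMANENT gateway, as in the test above, every call returns from the fast path and the writer lock is never touched.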

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
CC: Julian Anastasov <ja@ssi.bg>
CC: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Fri, 24 Jul 2015 16:57:42 +0000 (09:57 -0700)]
ipv6: Re-arrange code in rt6_probe()

This is preparatory work for the next patch, which removes the
write_lock from rt6_probe().

1. Reduce the number of if (neigh) checks from 4 to 1.
2. Bring the write_(un)lock() closer to the operations that the
   lock is protecting.

Hopefully, the above makes rt6_probe() more readable.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov [Fri, 24 Jul 2015 13:50:31 +0000 (15:50 +0200)]
bonding: convert num_grat_arp to the new bonding option API

num_grat_arp wasn't converted to the new bonding option API, so do this
now and remove the specific sysfs store option in order to use the
standard one. num_grat_arp is the same as num_unsol_na, so add it as an
alias with the same option settings. An important difference is the option
name, which is matched in bond_sysfs_store_option().

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Veaceslav Falico <vfalico@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shaohui Xie [Fri, 24 Jul 2015 11:26:02 +0000 (19:26 +0800)]
net: phy: fix auto negotiation checking for teranetics

When the fiber port is used, the PHY cannot report its auto-negotiation
state, so the driver should always report auto-negotiation as done.

Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Fri, 24 Jul 2015 10:28:36 +0000 (12:28 +0200)]
lwtunnel: change prototype of lwtunnel_state_get()

Having this function return the state saves some lines and simplifies
the code a bit. It is also useful for handling a NULL entry.

To avoid overly long lines, I've also renamed lwtunnel_state_get() and
lwtunnel_state_put() to lwtstate_get() and lwtstate_put().
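A userspace sketch of the calling convention described above follows. The get helper tolerates NULL and returns its argument, so a caller can take a reference and store the pointer in one statement, for example copy->state = state_get(orig->state). The names state_get()/state_put() and struct lwt_state are illustrative stand-ins for lwtstate_get()/lwtstate_put(). The sketch uses C11 atomics instead of the kernel's atomic_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct lwt_state {
    atomic_int refcnt;
    /* encapsulation data would live here */
};

/* Take a reference if 's' is non-NULL and return 's' unchanged,
 * so the call can be used directly in an assignment. */
static struct lwt_state *state_get(struct lwt_state *s)
{
    if (s)
        atomic_fetch_add(&s->refcnt, 1);
    return s;
}

/* Drop a reference and free on the last one; NULL is a no-op. */
static void state_put(struct lwt_state *s)
{
    if (!s)
        return;
    assert(atomic_load(&s->refcnt) > 0);  /* catch put without get */
    if (atomic_fetch_sub(&s->refcnt, 1) == 1)
        free(s);                           /* last reference dropped */
}
```

Because both helpers accept NULL, callers copying a route that has no tunnel state need no extra branch.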

CC: Thomas Graf <tgraf@suug.ch>
CC: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Fri, 24 Jul 2015 10:28:35 +0000 (12:28 +0200)]
ipv6: copy lwtstate in ip6_rt_copy_init()

We need to copy this field (ip6_rt_cache_alloc() and ip6_rt_pcpu_alloc()
use ip6_rt_copy_init() to build a dst).

CC: Thomas Graf <tgraf@suug.ch>
CC: Roopa Prabhu <roopa@cumulusnetworks.com>
Fixes: 19e42e451506 ("ipv6: support for fib route lwtunnel encap attributes")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Fri, 24 Jul 2015 08:59:41 +0000 (10:59 +0200)]
ipv6: use lwtunnel_output6() only if flag redirect is set

This function makes sense only when LWTUNNEL_STATE_OUTPUT_REDIRECT is set.
The check is already done in IPv4.

CC: Thomas Graf <tgraf@suug.ch>
CC: Roopa Prabhu <roopa@cumulusnetworks.com>
Fixes: 74a0f2fe8ed5 ("ipv6: rt6_info output redirect to tunnel output")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wu Fengguang [Fri, 24 Jul 2015 06:16:10 +0000 (14:16 +0800)]
net: phy: dp83867: fix simple_return.cocci warnings

drivers/net/phy/dp83867.c:126:1-4: WARNING: end returns can be simpified
drivers/net/phy/dp83867.c:74:5-8: WARNING: end returns can be simpified if tested value is negative or 0

 Simplify a trivial if-return sequence.  Possibly combine with a
 preceding function call.

Generated by: scripts/coccinelle/misc/simple_return.cocci

CC: Dan Murphy <dmurphy@ti.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
subashab@codeaurora.org [Fri, 24 Jul 2015 03:03:29 +0000 (03:03 +0000)]
dev: Spelling fix in comments

Fix the following typo
- unchainged -> unchanged

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Gartrell [Thu, 23 Jul 2015 21:24:40 +0000 (14:24 -0700)]
ebpf: Allow dereferences of PTR_TO_STACK registers

Consider the following program, which stores through a register that
holds a stack address:

        mov %rsp, %r1           ; r1 = rsp
        add $-8, %r1            ; r1 = rsp - 8
        store_q $123, -8(%rsp)  ; *(u64*)r1 = 123  <- valid
        store_q $123, (%r1)     ; *(u64*)r1 = 123  <- previously invalid
        mov $0, %r0
        exit                    ; Always need to exit

Previously, loading it failed with the following error:

0: (bf) r1 = r10
1: (07) r1 += -8
2: (7a) *(u64 *)(r10 -8) = 999
3: (7a) *(u64 *)(r1 +0) = 999
R1 invalid mem access 'fp'

Unable to load program

The verifier already knows that the register holds a stack address and
the corresponding offset, so it should be able to validate such
references as well.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 27 Jul 2015 07:29:18 +0000 (00:29 -0700)]
Merge branch 'mlx5e-next'

Amir Vadai says:

====================
ConnectX-4 driver update 2015-07-23

This patchset introduces some performance enhancements to the ConnectX-4 driver:
1. Improve RSS distribution, and make the RSS hash function controllable using ethtool.
2. Allocate memory that is written by the NIC and read by the host CPU on the
   NUMA node local to the processing CPU.
3. Support TX copybreak.
4. Use a hardware feature called Blue Flame to save DMA reads when possible.

Another patch by Achiad fixes some cosmetic issues in the driver.

Patchset was applied and tested on top of commit 045a0fa ("ip_tunnel: Call
ip_tunnel_core_init() from inet_init()")
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Achiad Shochat [Thu, 23 Jul 2015 20:36:01 +0000 (23:36 +0300)]
net/mlx5e: Input IPSEC.SPI into the RX RSS hash function

Hash the IPsec SPI in addition to the source/destination IP addresses,
which are already hashed. This is done only for unicast traffic for now.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Achiad Shochat [Thu, 23 Jul 2015 20:36:00 +0000 (23:36 +0300)]
net/mlx5e: Cosmetics: use BIT() instead of "1 <<", and others

No logical change in this commit.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Achiad Shochat [Thu, 23 Jul 2015 20:35:59 +0000 (23:35 +0300)]
net/mlx5e: TX latency optimization to save DMA reads

A regular TX WQE execution involves two or more DMA reads -
one to fetch the WQE, and another one per WQE gather entry.

These DMA reads obviously increase the TX latency.
There are two mlx5 mechanisms to bypass these DMA reads:
1) Inline WQE
2) Blue Flame (BF)

An inline WQE contains the whole packet and thus saves the DMA reads
of the regular WQE gather entries. Inline WQE support was already
added in the previous commit.

A BF WQE is written directly to the device I/O mapped memory and thus
saves the DMA read that fetches the WQE.

The BF WQE I/O write must be done at cache-line granularity, so it uses
the CPU write-combining mechanism.
A BF WQE I/O write also acts as a TX doorbell that notifies the
device of new TX WQEs.
A BF WQE is written to the same I/O mapped address as the regular TX
doorbell, so this address is mapped twice: once by ioremap()
and once by io_mapping_map_wc().

While both mechanisms reduce the TX latency, they both consume more CPU
cycles than a regular WQE:
- A BF WQE must still be written to host memory, in addition to being
  written directly to the device I/O mapped memory.
- An inline WQE involves copying the SKB data into it.

To handle this tradeoff, we introduce a heuristic algorithm that
strives to avoid using these two mechanisms when the TX queue is
being back-pressured by the device, and limits their usage rate otherwise.

An inline WQE will always be "Blue Flamed" (written directly to the
device I/O mapped memory) while a BF WQE may not be inlined (may contain
gather entries).

Preliminary testing using netperf UDP_RR shows that the latency goes down
from 17.5us to 16.9us, while the message rate (tested with pktgen) stays
the same.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Achiad Shochat [Thu, 23 Jul 2015 20:35:58 +0000 (23:35 +0300)]
net/mlx5e: Support TX packet copy into WQE

Also known as inline WQE.
A TX latency optimization that saves the data gather DMA reads.
Controlled by ETHTOOL_TX_COPYBREAK.
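Here is a minimal sketch of the copybreak decision, assuming the usual meaning of a TX copybreak threshold. Packets no longer than the threshold are copied into the WQE itself, so there is no gather entry and no data DMA read. Larger packets go through a gather entry that points at the packet buffer. The WQE layout, the MAX_INLINE size limit, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_INLINE 256  /* hypothetical room for inline data in a WQE */

struct wqe {
    bool        inlined;
    size_t      inline_len;
    uint8_t     inline_data[MAX_INLINE];
    const void *gather_addr;  /* stands in for a DMA address */
    size_t      gather_len;
};

static void build_wqe(struct wqe *w, const void *pkt, size_t len,
                      size_t tx_copybreak)
{
    size_t limit = tx_copybreak < MAX_INLINE ? tx_copybreak : MAX_INLINE;

    assert(w != NULL && pkt != NULL);
    memset(w, 0, sizeof(*w));
    if (len <= limit) {
        w->inlined = true;        /* device only has to read the WQE */
        w->inline_len = len;
        memcpy(w->inline_data, pkt, len);
    } else {
        w->gather_addr = pkt;     /* device must DMA-read the packet data */
        w->gather_len = len;
    }
}
```

The threshold trades CPU copy cost against one DMA read, which is why it is exposed as a tunable rather than fixed.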

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Saeed Mahameed [Thu, 23 Jul 2015 20:35:57 +0000 (23:35 +0300)]
net/mlx5e: Allocate DMA coherent memory on reader NUMA node

Through affinity hints and XPS, each mlx5e channel is assigned a CPU
core.

Channel DMA coherent memory that is written by the NIC and read
by SW (e.g. the CQ buffer) is allocated on the NUMA node of the CPU
core assigned to the channel.

Channel DMA coherent memory that is written by SW and read by the
NIC (e.g. the SQ/RQ buffers) is allocated on the NUMA node of the NIC.

The doorbell record (written by SW and read by the NIC) is an
exception, since it is accessed by SW more frequently.
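The allocation rule above can be summarized as a small decision function. The buffer kinds and the function name are hypothetical. In a driver, the chosen node would be passed to a node-aware DMA-coherent allocation, which is not shown here.

```c
/* Channel buffer kinds named in the text above. */
enum buf_kind {
    BUF_CQ,        /* written by NIC, read by SW */
    BUF_SQ,        /* written by SW, read by NIC */
    BUF_RQ,        /* written by SW, read by NIC */
    BUF_DOORBELL,  /* written by SW, read by NIC, but touched by SW often */
};

/* Pick the NUMA node on which a channel buffer should be allocated. */
static int buf_numa_node(enum buf_kind kind, int cpu_node, int nic_node)
{
    switch (kind) {
    case BUF_CQ:
    case BUF_DOORBELL:
        return cpu_node;  /* local to the channel's CPU, the main reader/writer */
    case BUF_SQ:
    case BUF_RQ:
    default:
        return nic_node;  /* local to the NIC that reads it */
    }
}
```

On a single-node system both arguments are equal and the choice has no effect.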

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Saeed Mahameed [Thu, 23 Jul 2015 20:35:56 +0000 (23:35 +0300)]
net/mlx5e: Support ETH_RSS_HASH_XOR

The ConnectX-4 HW implements an inverted XOR8 hash.
To make it act as XOR, we reorder the HW RSS indirection table.

Set XOR to be the default RSS hash function and add ethtool API to
control it.
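The text does not spell out what "inverted" means. One reading is that the hardware indexes the indirection table with the bits of the XOR8 result reversed. Under that assumption, the fix-up is to store each entry at the bit-reversed position. Because bit reversal is its own inverse, a hardware lookup at the reversed index then returns the entry that plain XOR would have selected. The sketch uses hypothetical names and an assumed 128-entry table.

```c
#include <assert.h>
#include <stdint.h>

#define LOG_TBL_SIZE 7                   /* assumed: 128-entry table */
#define TBL_SIZE     (1u << LOG_TBL_SIZE)

/* XOR of all bytes of the hash input (e.g. addresses and ports). */
static uint8_t xor8(const uint8_t *in, unsigned len)
{
    uint8_t h = 0;

    for (unsigned i = 0; i < len; i++)
        h ^= in[i];
    return h;
}

/* Reverse the low 'bits' bits of 'v'. */
static unsigned bitrev(unsigned v, unsigned bits)
{
    unsigned r = 0;

    assert(bits <= 32);
    for (unsigned i = 0; i < bits; i++)
        if (v & (1u << i))
            r |= 1u << (bits - 1 - i);
    return r;
}

/* Fill the table programmed into HW so that hw[bitrev(idx)] holds the
 * queue software intended for idx. */
static void fill_hw_table(const uint32_t *desired, uint32_t *hw)
{
    for (unsigned i = 0; i < TBL_SIZE; i++)
        hw[bitrev(i, LOG_TBL_SIZE)] = desired[i];
}
```

If the hardware's transformation were something other than a bit reversal, only bitrev() would need to change, as long as its inverse is used when filling the table.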

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 27 Jul 2015 07:18:40 +0000 (00:18 -0700)]
Merge branch 'netcp-next'

WingMan Kwok says:

====================
net: netcp: Bug fixes of CPSW statistics collection

This patch set contains bug fixes and enhancements of hw Ethernet
statistics processing on TI's Keystone2 CPSW Ethernet switches.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
WingMan Kwok [Thu, 23 Jul 2015 19:57:24 +0000 (15:57 -0400)]
net: netcp: Adds missing statistics for K2L and K2E

This patch adds the missing statistics for the host
and slave ports of the CPSW on K2L and K2E platforms.

Signed-off-by: WingMan Kwok <w-kwok2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WingMan Kwok [Thu, 23 Jul 2015 19:57:23 +0000 (15:57 -0400)]
net: netcp: Fixes to CPSW statistics collection

In certain applications it's beneficial to allow the CPSW h/w
stats counters to continue to increment even while the kernel
polls them. This patch implements this behavior for both 1G
and 10G ethernet subsystem modules.

Signed-off-by: WingMan Kwok <w-kwok2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WingMan Kwok [Thu, 23 Jul 2015 19:57:22 +0000 (15:57 -0400)]
net: netcp: Consolidates statistics collection code

Different Keystone2 platforms have different number and
layouts of hw statistics modules.  This patch consolidates
the statistics processing of different Keystone2 platforms
for easy maintenance.

Signed-off-by: WingMan Kwok <w-kwok2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WingMan Kwok [Thu, 23 Jul 2015 19:57:21 +0000 (15:57 -0400)]
net: netcp: Fixes error in oversized memory allocation for statistics storage

The CPSW driver internally keeps some, but not all, of
the statistics available in the hw statistics modules. Furthermore,
some of the locations in the hw statistics modules are reserved and
contain no useful information. Prior to this patch, the driver
allocated memory the size of the whole hw statistics modules for
internal storage, instead of the size of the statistics entries of
interest (i.e. et_stats). This patch fixes that.

Signed-off-by: WingMan Kwok <w-kwok2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WingMan Kwok [Thu, 23 Jul 2015 19:57:20 +0000 (15:57 -0400)]
net: netcp: Fixes hw statistics module base setting error

This patch fixes an error in setting the hw statistics
module base for the K2HK platform. On K2HK, although there are
4 hw statistics modules, only 2 are visible at a time.
Thus, when setting up the pointers to the bases of the
corresponding hw statistics modules, modules 0 and 2 should
point to one base, while modules 1 and 3 should point to the
other.
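The mapping described above is simple modular arithmetic. With only two register windows visible, module i uses window i % 2. Below is a hypothetical sketch whose names and layout are illustrative and not the netcp driver's. How the driver selects which pair of modules is visible at a given time is not shown.

```c
#include <stddef.h>

#define NUM_HW_STATS_MODULES 4

/* Point each of the 4 statistics modules at one of the two visible
 * register windows: modules 0 and 2 share visible[0], 1 and 3 share
 * visible[1]. */
static void setup_stats_bases(void *const visible[2],
                              void *module_base[NUM_HW_STATS_MODULES])
{
    for (size_t i = 0; i < NUM_HW_STATS_MODULES; i++)
        module_base[i] = visible[i % 2];
}
```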

Signed-off-by: WingMan Kwok <w-kwok2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WingMan Kwok [Thu, 23 Jul 2015 19:57:19 +0000 (15:57 -0400)]
net: netcp: Fixes the use of spin_lock_bh in timer function

This patch fixes a bug in which the timer routine synchronized
against the ethtool-triggered statistics updates with spin_lock_bh().
A timer function is itself a bottom-half, so this should be
spin_lock().

Signed-off-by: WingMan Kwok <w-kwok2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Shenai [Thu, 23 Jul 2015 17:11:14 +0000 (22:41 +0530)]
cxgb4vf: Read correct FL congestion threshold for T5 and T6

The VF driver was reading an incorrect free-list congestion notification
threshold for FLM queues when packing is enabled on T5 and T6 adapters.
Fix it.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Thu, 23 Jul 2015 13:43:56 +0000 (15:43 +0200)]
lwtunnel: export linux/lwtunnel.h to userspace

Note also that include/linux/lwtunnel.h is not needed.

CC: Thomas Graf <tgraf@suug.ch>
CC: Roopa Prabhu <roopa@cumulusnetworks.com>
Fixes: 499a24256862 ("lwtunnel: infrastructure for handling light weight tunnels like mpls")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Satish Ashok [Thu, 23 Jul 2015 12:00:53 +0000 (05:00 -0700)]
bridge: mdb: notify on router port add and del

Send notifications on router port add and del/expire: reuse the
already existing MDBA_ROUTER attribute and send NEWMDB and DELMDB
netlink notifications, respectively.

Signed-off-by: Satish Ashok <sashok@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Graf [Thu, 23 Jul 2015 11:04:44 +0000 (13:04 +0200)]
openvswitch: Retrieve tunnel metadata when receiving from vport-netdev

Retrieve the tunnel metadata for packets received by a net_device and
provide it to ovs_vport_receive() for flow key extraction.

[This hunk was in the GRE patch in the initial series and missed the
 cut for the initial submission for merging.]

Fixes: 614732eaa12d ("openvswitch: Use regular VXLAN net_device device")
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Harini Katakam [Thu, 23 Jul 2015 10:14:25 +0000 (15:44 +0530)]
net: macb: Change capability mask for jumbo support

JUMBO and NO_GIGABIT_HALF had the same capability mask value, so the
two could not be told apart. Change one of them.
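Capability flags that share a bit cannot be distinguished, because testing for one also matches the other. A compile-time check along the following lines catches such collisions early. The macro names and bit values below are hypothetical and are not the ones used in the macb driver.

```c
#include <assert.h>  /* static_assert (C11) */

#define CAPS_NO_GIGABIT_HALF  0x00000002u  /* hypothetical values */
#define CAPS_JUMBO            0x00000008u

static_assert((CAPS_NO_GIGABIT_HALF & CAPS_JUMBO) == 0,
              "capability bits must not overlap");

static int has_cap(unsigned caps, unsigned cap)
{
    return (caps & cap) != 0;
}
```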

Signed-off-by: Harini Katakam <harinik@xilinx.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Thu, 23 Jul 2015 09:29:07 +0000 (11:29 +0200)]
openvswitch: fix compilation when vxlan is a module

With CONFIG_VXLAN=m and CONFIG_OPENVSWITCH=y, there was the following
compilation error:
  LD      init/built-in.o
  net/built-in.o: In function `vxlan_tnl_create':
  .../net/openvswitch/vport-netdev.c:322: undefined reference to `vxlan_dev_create'
  make: *** [vmlinux] Error 1

CC: Thomas Graf <tgraf@suug.ch>
Fixes: 614732eaa12d ("openvswitch: Use regular VXLAN net_device device")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Anastasov [Thu, 23 Jul 2015 07:39:35 +0000 (10:39 +0300)]
ipv4: be more aggressive when probing alternative gateways

Currently, we do not notice when new alternative gateways
are added. We can do so by checking for a present neigh
entry. Also, gateways that are currently being probed (NUD_INCOMPLETE)
can be skipped during round-robin probing.

Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei-Chun Chao [Thu, 23 Jul 2015 01:13:12 +0000 (18:13 -0700)]
ipv6: fix crash over flow-based vxlan device

A similar check was added to ip_rcv() but not to ipv6_rcv().

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff81734e0a>] ipv6_rcv+0xfa/0x500
Call Trace:
[<ffffffff816c9786>] ? ip_rcv+0x296/0x400
[<ffffffff817732d2>] ? packet_rcv+0x52/0x410
[<ffffffff8168e99f>] __netif_receive_skb_core+0x63f/0x9a0
[<ffffffffc02b34a0>] ? br_handle_frame_finish+0x580/0x580 [bridge]
[<ffffffff8109912c>] ? update_rq_clock.part.81+0x1c/0x40
[<ffffffff8168ed18>] __netif_receive_skb+0x18/0x60
[<ffffffff8168fa1f>] process_backlog+0x9f/0x150

Fixes: ee122c79d422 ("vxlan: Flow based tunneling")
Signed-off-by: Wei-Chun Chao <weichunc@plumgrid.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 23 Jul 2015 00:29:53 +0000 (17:29 -0700)]
net: bcmgenet: Register link_update callback for all MoCA PHYs

Commit 8d88c6ebb34c ("net: bcmgenet: enable MoCA link state change
detection") added a fixed PHY link_update callback for MoCA PHYs
registered exclusively through platform_data. The change is also
applicable to systems using Device Tree as their primary configuration
interface.

In order for this to work, move the link_update assignment into
bcmgenet_moca_phy_setup(), where we know for sure that we are running on
a MoCA GENET instance, and do not override phydev->link, since:

- this is properly taken care of by the PHY library, which gets the
  link UP/DOWN interrupts
- this now runs every time we call bcmgenet_open(), so we need to
  preserve whatever we detected before we went administratively DOWN and
  then UP
- we need to make sure that MoCA PHYs start with the link DOWN during
  probe in order to force a link transition to occur

To avoid a forward declaration, move bcmgenet_fixed_phy_link_update()
above its caller.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 23 Jul 2015 00:28:23 +0000 (17:28 -0700)]
net: bcmgenet: Remove checks on clock handles

Instead of multiplying the number of checks for IS_ERR(priv->clk),
simply NULLify the 'struct clk' pointer, which the Linux common clock
framework handles perfectly by returning early in each and every
clk_* API function.

Having every single function check for !IS_ERR(priv->clk) is both
redundant and error prone. As it turns out, we were doing it for the
main GENET clock, priv->clk, but not for the Wake-on-LAN or EEE clocks,
so let's just be consistent here.
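The pattern can be illustrated outside the kernel. If every API entry point returns early on a NULL handle, a driver can convert a failed optional clock lookup into NULL once at probe time and then call the API unconditionally everywhere else. The functions below are hypothetical stand-ins for clk_get()/clk_prepare_enable() and only mimic the NULL handling described above.

```c
#include <stddef.h>

struct clk { int enable_count; };

/* Stand-in for clk_prepare_enable(): a NULL clock is a valid no-op. */
static int fake_clk_enable(struct clk *clk)
{
    if (!clk)
        return 0;
    clk->enable_count++;
    return 0;
}

/* Stand-in for an optional lookup: on failure, store NULL rather than
 * an error value, as the driver does after seeing IS_ERR(). */
static struct clk *optional_clk(struct clk *found, int lookup_err)
{
    return lookup_err ? NULL : found;
}

struct priv { struct clk *clk, *clk_wol, *clk_eee; };

/* No per-clock error checks needed: every call is safe. */
static void enable_all(struct priv *p)
{
    fake_clk_enable(p->clk);
    fake_clk_enable(p->clk_wol);
    fake_clk_enable(p->clk_eee);
}
```

The error handling is concentrated at the single place where the handle is obtained, so no call site can forget it.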

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Petri Gynther <pgynther@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
KY Srinivasan [Wed, 22 Jul 2015 18:42:32 +0000 (11:42 -0700)]
hv_netvsc: Wait for sub-channels to be processed during probe

The current code returns from probe without waiting for the proper handling
of subchannels that may be requested. If the netvsc driver were to be rapidly
loaded/unloaded, we could trigger a panic, as the unload would be tearing
down state that may not have been fully set up yet. We fix this issue by making
sure that we return from the probe call only after ensuring that the
sub-channel offers in flight are properly handled.

Reviewed-and-tested-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agocxgb4: Allow firmware flash, only if cxgb4 is the master driver
Hariprasad Shenai [Wed, 22 Jul 2015 17:24:50 +0000 (22:54 +0530)]
cxgb4: Allow firmware flash, only if cxgb4 is the master driver

The adapter can go for a toss if cxgb4 is loaded as a slave driver and we
try to upgrade the firmware. So add a check that cxgb4 is the master driver
before flashing firmware using ethtool.
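
A minimal sketch of such a guard, assuming a hypothetical DEMO_MASTER_PF
flag in a simplified adapter structure and -EOPNOTSUPP as the error code;
the real cxgb4 flag name and return value may differ.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical flag: this driver instance is the master for the adapter. */
#define DEMO_MASTER_PF 0x1u

struct demo_adapter {
	unsigned int flags;
};

/* Refuse to flash unless we are the master driver for the adapter. */
static int demo_flash_firmware(const struct demo_adapter *adap)
{
	if (!(adap->flags & DEMO_MASTER_PF)) {
		fprintf(stderr, "firmware flash requires the master PF\n");
		return -EOPNOTSUPP;
	}
	/* ... the actual flash sequence would run here ... */
	return 0;
}
```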

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agovxlan: Use proper endian type for vni in vxlan[6]_xmit_skb
Thomas Graf [Wed, 22 Jul 2015 15:08:42 +0000 (17:08 +0200)]
vxlan: Use proper endian type for vni in vxlan[6]_xmit_skb

Silences the following sparse warnings:
drivers/net/vxlan.c:1818:21: warning: incorrect type in assignment (different base types)
drivers/net/vxlan.c:1818:21:    expected restricted __be32 [usertype] vx_vni
drivers/net/vxlan.c:1818:21:    got unsigned int [unsigned] [usertype] vni
drivers/net/vxlan.c:2014:58: warning: incorrect type in argument 11 (different base types)
drivers/net/vxlan.c:2014:58:    expected unsigned int [unsigned] [usertype] vni
drivers/net/vxlan.c:2014:58:    got restricted __be32 [usertype] <noident>
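
For context, VXLAN (RFC 7348) carries a 24-bit VNI in the upper three
bytes of a 32-bit header word. The sketch below uses hypothetical demo_*
names and plain uint32_t where the kernel would use __be32 (so sparse
cannot check it here); it converts the host-order VNI once and then
passes only the big-endian value around, which is the discipline the
sparse warnings enforce.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Simplified VXLAN header: the VNI sits in the top 24 bits of vx_vni. */
struct demo_vxlanhdr {
	uint32_t vx_flags;
	uint32_t vx_vni;	/* big-endian on the wire (__be32 in the kernel) */
};

/* Convert a host-order 24-bit VNI to its wire form, exactly once. */
static uint32_t demo_vni_to_be(uint32_t vni)
{
	return htonl(vni << 8);
}

/* Takes an already big-endian value: no further byte swapping here. */
static void demo_fill_vni(struct demo_vxlanhdr *h, uint32_t vni_be)
{
	h->vx_vni = vni_be;
}
```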

Fixes: 614732eaa12d ("openvswitch: Use regular VXLAN net_device device")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'tipc'
David S. Miller [Sun, 26 Jul 2015 23:31:50 +0000 (16:31 -0700)]
Merge branch 'tipc'

Jon Maloy says:

====================
tipc: clean up socket message reception

Despite recent improvements the message reception code in socket.c is
perceived as obscure and hard to follow, especially regarding the logic
for message rejection. With the commits in this series we try to remedy
this situation.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: clean up socket layer message reception
Jon Paul Maloy [Wed, 22 Jul 2015 14:11:20 +0000 (10:11 -0400)]
tipc: clean up socket layer message reception

When a message is received in a socket, one of the call chains
tipc_sk_rcv()->tipc_sk_enqueue()->filter_rcv()(->tipc_sk_proto_rcv())
or
tipc_sk_backlog_rcv()->filter_rcv()(->tipc_sk_proto_rcv())
is followed. At each of these levels we may encounter situations
where the message may need to be rejected, or a new message
produced for transfer back to the sender. Despite recent
improvements, the current code for doing this is perceived
as awkward and hard to follow.

Leveraging the two previous commits in this series, we now
introduce a more uniform handling of such situations. We
let each of the functions in the chain itself produce/reverse
the message to be returned to the sender, but also perform the
actual forwarding. This simplifies the necessary logic within
each function.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: introduce new tipc_sk_respond() function
Jon Paul Maloy [Wed, 22 Jul 2015 14:11:19 +0000 (10:11 -0400)]
tipc: introduce new tipc_sk_respond() function

Currently, we use the code sequence

if (msg_reverse())
   tipc_link_xmit_skb()

at numerous locations in socket.c. The preparation of arguments
for these calls, as well as the sequence itself, makes the code
unnecessarily complex.

In this commit, we introduce a new function, tipc_sk_respond(),
that performs this call combination. We also replace some, but not
yet all, of these explicit call sequences with calls to the new
function. Notably, we let the function tipc_sk_proto_rcv() use
the new function to directly send out PROBE_REPLY messages,
instead of deferring this to the calling tipc_sk_rcv() function,
as we do now.
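
A simplified model of the combined helper, assuming a hypothetical
demo_msg with only endpoint fields and an error code;
demo_msg_reverse() and demo_link_xmit() stand in for msg_reverse() and
tipc_link_xmit_skb() and are not the TIPC API. Dropping a message that
already carries an error code, rather than bouncing it again, is an
assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified message. */
struct demo_msg {
	uint32_t orig_node, dest_node;
	uint32_t orig_port, dest_port;
	int err;		/* 0 = no error code set */
};

static int demo_sent_count;	/* messages handed to the "link" */

static void demo_swap(uint32_t *a, uint32_t *b)
{
	uint32_t t = *a;

	*a = *b;
	*b = t;
}

/* Stand-in for msg_reverse(): turn the message around towards its
 * sender and record the error. Returns false if it must not be
 * returned (here: it already carries an error code). */
static bool demo_msg_reverse(struct demo_msg *m, int err)
{
	if (m->err)
		return false;
	demo_swap(&m->orig_node, &m->dest_node);
	demo_swap(&m->orig_port, &m->dest_port);
	m->err = err;
	return true;
}

/* Stand-in for tipc_link_xmit_skb(). */
static void demo_link_xmit(const struct demo_msg *m)
{
	(void)m;
	demo_sent_count++;
}

/* The combined helper: reverse and transmit in one call. */
static void demo_sk_respond(struct demo_msg *m, int err)
{
	if (demo_msg_reverse(m, err))
		demo_link_xmit(m);
}
```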

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: let function tipc_msg_reverse() expand header when needed
Jon Paul Maloy [Wed, 22 Jul 2015 14:11:18 +0000 (10:11 -0400)]
tipc: let function tipc_msg_reverse() expand header when needed

The shortest TIPC message header, for cluster local CONNECTED messages,
is 24 bytes long. With this format, the fields "dest_node" and
"orig_node" are optimized away, since they are in reality redundant
in this particular case.

However, the absence of these fields leads to code inconsistencies
that are difficult to handle in some cases, especially when we need
to reverse or reject messages at the socket layer.

In this commit, we concentrate the handling of the absent fields
to one place, by letting the function tipc_msg_reverse() reallocate
the buffer and expand the header to 32 bytes when necessary. This
means that the socket code now can assume that the two previously
absent fields are present in the header when a message needs to be
rejected. This opens the way for further simplifications of the
socket code.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next...
David S. Miller [Sat, 25 Jul 2015 07:14:46 +0000 (00:14 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2015-07-23

This series contains updates to e1000e, igb, ixgbevf, i40e and i40evf.

Emil extends the reporting of the RSS key and hash table by adding support
for X550 VFs.

Jia-Ju Bai fixes a QoS issue in e1000e where the error handling lacked a
call to pm_qos_remove_request() to clean up the QoS request made in
e1000_open().

Todd updates igb to report failure for ethtool coalesce settings
that are not supported.  Todd also updates the driver to use the
ARRAY_SIZE() macro.

Carolyn fixes and refactors the dynamic ITR code for i40e and i40evf,
where the ITR setting would never change dynamically.  The switch()
statement now has a default case and switches on "new_latency_range"
instead of the current ITR setting.

Shannon cleans up i40e code by removing unneeded gotos.  Shannon also
cleans up error status messages that were causing some confusion in
PHY and FCoE setup error reports.

Mitch updates the virtual channel interface to prepare for the X722 device
and other future devices, so that the VF driver can report what it is
capable of supporting to the PF driver.  Mitch also updates the i40evf
driver to handle bigger resets, such as Core or EMP resets, where the
device is reinitialized and the VF may not get the same VSI.

Jesse updates the i40e and i40evf driver to use the kernel BIT() and
BIT_ULL() macros.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobridge: Fix setting a flag in br_fill_ifvlaninfo_range().
Rosen, Rami [Wed, 22 Jul 2015 04:57:02 +0000 (07:57 +0300)]
bridge: Fix setting a flag in br_fill_ifvlaninfo_range().

This patch fixes setting of vinfo.flags in the br_fill_ifvlaninfo_range() method. The
assignment vinfo.flags &= ~BRIDGE_VLAN_INFO_RANGE_BEGIN has no effect and is
unneeded, as the vinfo.flags value is overridden by the immediately following
vinfo.flags = flags | BRIDGE_VLAN_INFO_RANGE_END assignment.
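
A userspace model of the two entries emitted for a VLAN range, with flag
values assumed to mirror include/uapi/linux/if_bridge.h;
demo_fill_range_old() and demo_fill_range_new() are hypothetical. Both
produce the same flags, which is why the removed line was dead code.

```c
#include <stdint.h>

/* Flag values assumed to mirror include/uapi/linux/if_bridge.h. */
#define BRIDGE_VLAN_INFO_RANGE_BEGIN (1 << 3)
#define BRIDGE_VLAN_INFO_RANGE_END   (1 << 4)

/* Flags carried by the two entries describing one VLAN range. */
struct demo_range_flags {
	uint16_t begin;
	uint16_t end;
};

/* Before the fix: the masking line is a dead store. */
static struct demo_range_flags demo_fill_range_old(uint16_t flags)
{
	struct demo_range_flags r;
	uint16_t vflags;

	vflags = flags | BRIDGE_VLAN_INFO_RANGE_BEGIN;
	r.begin = vflags;					/* first entry */
	vflags &= (uint16_t)~BRIDGE_VLAN_INFO_RANGE_BEGIN;	/* no effect... */
	vflags = flags | BRIDGE_VLAN_INFO_RANGE_END;		/* ...overwritten */
	r.end = vflags;						/* second entry */
	return r;
}

/* After the fix: identical result without the redundant line. */
static struct demo_range_flags demo_fill_range_new(uint16_t flags)
{
	struct demo_range_flags r;

	r.begin = flags | BRIDGE_VLAN_INFO_RANGE_BEGIN;
	r.end = flags | BRIDGE_VLAN_INFO_RANGE_END;
	return r;
}
```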

Signed-off-by: Rami Rosen <rami.rosen@intel.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobe2net: support ndo_get_phys_port_id()
Sriharsha Basavapatna [Wed, 22 Jul 2015 05:45:12 +0000 (11:15 +0530)]
be2net: support ndo_get_phys_port_id()

Add be_get_phys_port_id() function to report physical port id. The port id
should be unique across different be2net devices in the system. We use the
chip serial number along with the physical port number for this.

Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@avagotech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoi40e: use BIT and BIT_ULL macros
Jesse Brandeburg [Thu, 4 Jun 2015 20:24:02 +0000 (16:24 -0400)]
i40e: use BIT and BIT_ULL macros

Use the BIT(foo) and BIT_ULL(foo64) macros in place of (1 << foo)
and (1ULL << foo64), in order to better match kernel
requirements.
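
A userspace sketch of the macros, defined along the lines of the
kernel's own (in-kernel code should use the kernel headers directly);
the DEMO_FLAG_* names are hypothetical. Besides readability, BIT()
shifts an unsigned long, so bit 31 no longer overflows a signed int as
(1 << 31) would, and BIT_ULL() stays 64 bits wide even where unsigned
long is 32 bits.

```c
/* Userspace equivalents of the kernel helpers. */
#define BIT(nr)     (1UL << (nr))
#define BIT_ULL(nr) (1ULL << (nr))

/* Hypothetical flags, formerly written as (1 << 3) and (1ULL << 40). */
#define DEMO_FLAG_LINK_UP BIT(3)
#define DEMO_FLAG_WIDE    BIT_ULL(40)
```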

NOTE: the adminq_cmd.h file was not modified on purpose because
of the dependency upon firmware for that file.

Change-ID: I73ee2e48c880d671948aad19bd53ca6b2ac558fc
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40e: clean up error status messages
Shannon Nelson [Thu, 4 Jun 2015 20:24:01 +0000 (16:24 -0400)]
i40e: clean up error status messages

Clean up a little confusion in reporting error status in phy and fcoe
setup error reports by separating the return status from the AQ error.

Add two decoder functions to make this easier.
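
Separating the driver status from the admin queue (AQ) error can be
sketched with two small decoders; the enum values, strings and demo_*
names below are hypothetical, not the i40e definitions.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical driver status and AQ error codes. */
enum demo_status { DEMO_SUCCESS = 0, DEMO_ERR_PARAM = -5,
		   DEMO_ERR_ADMIN_QUEUE_ERROR = -53 };
enum demo_aq_err { DEMO_AQ_RC_OK = 0, DEMO_AQ_RC_EPERM = 1,
		   DEMO_AQ_RC_ENOENT = 2 };

/* Decoder 1: the status returned by the driver function. */
static const char *demo_stat_str(enum demo_status s)
{
	switch (s) {
	case DEMO_SUCCESS:		 return "OK";
	case DEMO_ERR_PARAM:		 return "ERR_PARAM";
	case DEMO_ERR_ADMIN_QUEUE_ERROR: return "ERR_ADMIN_QUEUE_ERROR";
	}
	return "UNKNOWN";
}

/* Decoder 2: the error reported by the firmware in the AQ descriptor. */
static const char *demo_aq_str(enum demo_aq_err e)
{
	switch (e) {
	case DEMO_AQ_RC_OK:	return "OK";
	case DEMO_AQ_RC_EPERM:	return "EPERM";
	case DEMO_AQ_RC_ENOENT:	return "ENOENT";
	}
	return "UNKNOWN";
}

/* Report both values side by side instead of mixing them up. */
static int demo_format_err(char *buf, size_t len, enum demo_status s,
			   enum demo_aq_err e)
{
	return snprintf(buf, len, "err %s aq_err %s",
			demo_stat_str(s), demo_aq_str(e));
}
```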

Change-ID: I960bcdeef3978a15fec1cdb5eff781d5cbae42fb
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40e: provide correct API version to older VF drivers
Mitch Williams [Thu, 4 Jun 2015 20:24:00 +0000 (16:24 -0400)]
i40e: provide correct API version to older VF drivers

This driver fully supports VF drivers using both the 1.0 and 1.1
versions of the virtual channel API. However, VF drivers using
version 1.0 get upset if we provide them with a version other than
that, and refuse to play with us.

Correct this by checking the VF's API version at the time that we
store it off, and provide the correct version number back to the VF
so we can all get along.
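
A minimal sketch of the version reply, assuming a hypothetical demo_ver
structure: the PF stores the VF's reported version and answers 1.0 to a
1.0 VF, otherwise its own 1.1.

```c
#include <stdint.h>

struct demo_ver {
	uint32_t major, minor;
};

/* Version implemented by this hypothetical PF driver. */
#define DEMO_PF_VER_MAJOR 1
#define DEMO_PF_VER_MINOR 1

/* Store off the VF's version, then answer with one it will accept. */
static struct demo_ver demo_pf_version_reply(struct demo_ver vf,
					     struct demo_ver *stored)
{
	struct demo_ver reply = { DEMO_PF_VER_MAJOR, DEMO_PF_VER_MINOR };

	*stored = vf;
	if (vf.major == 1 && vf.minor == 0)
		reply.minor = 0;	/* 1.0 VFs only accept "1.0" */
	return reply;
}
```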

Change-ID: I86dfe02e67b2bef336b4b49a1bb072f3e7229abc
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40evf: support virtual channel API version 1.1
Mitch Williams [Thu, 4 Jun 2015 20:23:59 +0000 (16:23 -0400)]
i40evf: support virtual channel API version 1.1

Store off the PF's API version, then use it to determine whether or not
to send it our capabilities. Change the version checking to allow for PF
drivers with lower API versions than our current version, so we can
still talk to PF drivers over the 1.0 API.

Change-ID: I8edc55d1229c7decf0ed3f285a63032694007c2e
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40evf: handle big resets
Mitch Williams [Thu, 4 Jun 2015 20:23:58 +0000 (16:23 -0400)]
i40evf: handle big resets

The most common type of reset that the VF will encounter is a PF reset
that cascades down into a VF reset for each VF. In this case, the VF
will always be assigned the same VSI and recovery is fairly simple.

However, in the case of 'bigger' resets, such as a Core or EMP reset,
when the device is reinitialized, it's probable that the VF will NOT get
the same VSI. When this happens, the VF will not be able to recover, as
it will continue to request resources for its original VSI.

Add an extra state to the admin queue state machine so that the driver
can re-request its configuration information at runtime. During reset
recovery, set this bit in the aq_required field, and fetch the (possibly
new) configuration information before attempting to bring the driver
back up. Since the driver doesn't know what kind of reset it has
encountered, this step is done even for a PF reset, but it doesn't hurt
anything - it just gets the same VSI back.
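
The recovery flow can be sketched with a bitmask of pending admin-queue
work; DEMO_AQ_GET_CONFIG, DEMO_AQ_ENABLE_QUEUES and the one-step
watchdog below are hypothetical simplifications of the driver's
aq_required handling.

```c
#include <stdint.h>

/* Hypothetical admin-queue work bits. */
#define DEMO_AQ_GET_CONFIG    (1u << 0)
#define DEMO_AQ_ENABLE_QUEUES (1u << 1)

struct demo_vf {
	uint32_t aq_required;
	int vsi_id;
};

/* On any reset: refetch configuration before bringing queues back up. */
static void demo_reset_recovery(struct demo_vf *vf)
{
	vf->aq_required |= DEMO_AQ_GET_CONFIG | DEMO_AQ_ENABLE_QUEUES;
}

/* One watchdog pass: handle the configuration request first. */
static void demo_process_aq(struct demo_vf *vf, int vsi_from_pf)
{
	if (vf->aq_required & DEMO_AQ_GET_CONFIG) {
		vf->vsi_id = vsi_from_pf;	/* may differ after Core/EMP reset */
		vf->aq_required &= ~DEMO_AQ_GET_CONFIG;
		return;
	}
	if (vf->aq_required & DEMO_AQ_ENABLE_QUEUES)
		vf->aq_required &= ~DEMO_AQ_ENABLE_QUEUES;
}
```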

Change-ID: I915d59ffb40375215117362f4ac7a37811aba748
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40e: support virtual channel API 1.1
Mitch Williams [Thu, 4 Jun 2015 20:23:57 +0000 (16:23 -0400)]
i40e: support virtual channel API 1.1

Store off the VF API version for use when figuring out the VF driver
capabilities. Add support for the VF driver handing its capabilities to
the PF driver and then use this information when sending VF resource
information back to the VF driver.

Change-ID: Ic00d0eeeb5b8118085e12f068ef857089a8f7c2d
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40e/i40evf: add macros for virtual channel API version and device capability
Mitch Williams [Thu, 4 Jun 2015 20:23:56 +0000 (16:23 -0400)]
i40e/i40evf: add macros for virtual channel API version and device capability

Now that we've rolled the virtual channel API version to 1.1, add some
macros to test what version is being used by our partner in crime. For the
VF, add some macros to determine what our device capabilities are.

Change-ID: I79f6683d4c23bd76a8ad9fd492776fcc1208e1dc
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40e: add VF capabilities to virtual channel interface
Mitch Williams [Thu, 4 Jun 2015 20:23:55 +0000 (16:23 -0400)]
i40e: add VF capabilities to virtual channel interface

To prepare for the changes coming up in the X722 device and future
devices, the virtual channel interface has to change slightly. The VF
driver can now report what it is capable of supporting, which then informs
the PF driver when it sends the configuration information back to the
VF.

A 1.1 VF driver on a 1.0 PF driver should not send its capabilities.
Likewise, a 1.1 PF driver controlling a 1.0 VF driver should not expect
or depend upon receiving the VF capabilities.

All other aspects of the API are unchanged.
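
The compatibility rules above can be sketched as two checks, assuming a
hypothetical demo_ver type. Falling back to a default capability set for
a 1.0 VF is an assumption of this sketch; the text only says the PF must
not depend on receiving capabilities.

```c
#include <stdbool.h>
#include <stdint.h>

struct demo_ver {
	uint32_t major, minor;
};

/* True when the partner speaks API 1.1 or later. */
static bool demo_ver_at_least_1_1(struct demo_ver v)
{
	return v.major > 1 || (v.major == 1 && v.minor >= 1);
}

/* VF side: only send capabilities to a PF that understands them. */
static bool demo_vf_should_send_caps(struct demo_ver pf)
{
	return demo_ver_at_least_1_1(pf);
}

/* PF side: never rely on caps from a 1.0 VF; use a default instead. */
static uint32_t demo_pf_effective_caps(struct demo_ver vf, uint32_t vf_caps,
				       uint32_t default_caps)
{
	return demo_ver_at_least_1_1(vf) ? vf_caps : default_caps;
}
```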

Change-ID: I530cc55f107edd1ee8bdf95830aa90b87854058a
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@intel.com>
Acked-by: Anjali Singhai <anjali.singhai@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40e: clean up unneeded gotos
Shannon Nelson [Mon, 1 Jun 2015 19:33:03 +0000 (19:33 +0000)]
i40e: clean up unneeded gotos

With a little work we can clean up some unnecessary logic jumping and
drop a variable.

Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Cc: Laurent Navet <laurent.navet@gmail.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40e/i40evf: Fix and refactor dynamic ITR code
Carolyn Wyborny [Wed, 10 Jun 2015 17:42:07 +0000 (13:42 -0400)]
i40e/i40evf: Fix and refactor dynamic ITR code

This patch changes the switch statement for dynamic interrupt throttling
and adds a default case. With this patch, we check the latency setting
instead of the current ITR settings and the included refactor improves
performance.

Without this patch, the ITR setting would never change dynamically, and
there was no default.
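
A sketch of the corrected switch, with hypothetical latency ranges and
ITR values (not the i40e register encodings): the switch is driven by
the newly computed latency range, and a default case guarantees the ITR
is always assigned.

```c
/* Hypothetical latency ranges and ITR values. */
enum demo_latency_range { DEMO_LOWEST_LATENCY, DEMO_LOW_LATENCY,
			  DEMO_BULK_LATENCY };

#define DEMO_ITR_LOWEST 10
#define DEMO_ITR_LOW    50
#define DEMO_ITR_BULK   125

static int demo_itr_for_range(enum demo_latency_range new_latency_range)
{
	int new_itr;

	/* Switch on the computed range, not on the current ITR value. */
	switch (new_latency_range) {
	case DEMO_LOWEST_LATENCY:
		new_itr = DEMO_ITR_LOWEST;
		break;
	case DEMO_LOW_LATENCY:
		new_itr = DEMO_ITR_LOW;
		break;
	case DEMO_BULK_LATENCY:
		new_itr = DEMO_ITR_BULK;
		break;
	default:
		new_itr = DEMO_ITR_LOW;	/* never leave new_itr unset */
		break;
	}
	return new_itr;
}
```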

Change-ID: Idb5a8a14c7109ec47c90f6e94bd43baa17d7ee37
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoigb: bump version to igb-5.3.0
Todd Fujinaka [Wed, 20 May 2015 22:40:20 +0000 (15:40 -0700)]
igb: bump version to igb-5.3.0

Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoigb: use ARRAY_SIZE to replace calculating sizeof(a)/sizeof(a[0])
Todd Fujinaka [Tue, 30 Jun 2015 22:16:55 +0000 (15:16 -0700)]
igb: use ARRAY_SIZE to replace calculating sizeof(a)/sizeof(a[0])

Use the ARRAY_SIZE macro rather than calculating sizeof(a)/sizeof(a[0]).
Also directly replace the code rather than using an unnecessary define.
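
A userspace version of the macro, for illustration; the kernel's
ARRAY_SIZE() additionally rejects pointers at compile time, which this
simplified definition does not. demo_regs is a hypothetical table.

```c
#include <stddef.h>

/* Simplified form of the kernel macro. */
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Hypothetical register table. */
static const unsigned int demo_regs[] = { 0x00, 0x08, 0x10, 0x18, 0x20 };

static size_t demo_reg_count(void)
{
	return ARRAY_SIZE(demo_regs);	/* no separate #define needed */
}
```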

Reported-by: Maninder Singh <maninder1.s@samsung.com>
Reported-by: Joe Perches <joe@perches.com>
Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoigb: report unsupported ethtool settings in set_coalesce
Todd Fujinaka [Thu, 4 Jun 2015 21:26:56 +0000 (14:26 -0700)]
igb: report unsupported ethtool settings in set_coalesce

There are many settings possible using ethtool -C/--coalesce, but not
all of them are supported in igb. Report failure when an unsupported
option is set.
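
A sketch of the check, assuming a hypothetical subset of struct
ethtool_coalesce fields and assuming, for illustration only, that just
the usecs settings are supported; the exact set igb accepts is not
stated here.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical subset of struct ethtool_coalesce. */
struct demo_coalesce {
	uint32_t rx_coalesce_usecs;
	uint32_t tx_coalesce_usecs;
	uint32_t rx_max_coalesced_frames;
	uint32_t use_adaptive_rx_coalesce;
};

/* Fail instead of silently ignoring options the hardware can't honor. */
static int demo_set_coalesce(const struct demo_coalesce *ec)
{
	if (ec->rx_max_coalesced_frames || ec->use_adaptive_rx_coalesce)
		return -EOPNOTSUPP;
	/* ... program rx/tx_coalesce_usecs here ... */
	return 0;
}
```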

Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoe1000e: Cleanup qos request in error handling of e1000_open
Jia-Ju Bai [Thu, 4 Jun 2015 13:07:27 +0000 (21:07 +0800)]
e1000e: Cleanup qos request in error handling of e1000_open

The driver lacks a pm_qos_remove_request() call in the error handling
(err_req_irq) of e1000_open(), so the QoS request inserted by
pm_qos_add_request() is not removed. This patch adds
pm_qos_remove_request() to the error handling to fix it.
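
A sketch of the goto-based unwind this fix completes;
demo_pm_qos_add_request() and demo_pm_qos_remove_request() are
hypothetical stand-ins that only count outstanding requests, and the
simulated IRQ failure replaces request_irq().

```c
#include <stdbool.h>

static int demo_qos_requests;	/* outstanding (hypothetical) requests */

static void demo_pm_qos_add_request(void)    { demo_qos_requests++; }
static void demo_pm_qos_remove_request(void) { demo_qos_requests--; }

/* Simplified open path: each step is undone in reverse on failure. */
static int demo_open(bool irq_fails)
{
	int err;

	demo_pm_qos_add_request();

	err = irq_fails ? -1 : 0;	/* stand-in for request_irq() */
	if (err)
		goto err_req_irq;

	return 0;

err_req_irq:
	demo_pm_qos_remove_request();	/* the previously missing call */
	return err;
}
```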

Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoixgbevf: add support for reporting RSS key and hash table for X550
Emil Tantilov [Thu, 30 Apr 2015 18:50:55 +0000 (11:50 -0700)]
ixgbevf: add support for reporting RSS key and hash table for X550

This patch extends the reporting of the RSS key and hash table by
adding support for X550 VFs. The difference is that X550 VFs have
their own registers for RSS key and indirection table, so there is
no need to query the PF.

The RSS key and indirection table are stored in the adapter structure
during the configuration of VFRSSRK and VFRETA, from where they can in
turn be used by ethtool for reporting.

The logic for writing VFRETA is also changed to make sure that the
indirection table is reported correctly.

In addition this patch adds defines for the VFRETA entries and number
of VFRSSRK registers as well as some whitespace cleanups.

Reported-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoip_tunnel: Call ip_tunnel_core_init() from inet_init()
Thomas Graf [Thu, 23 Jul 2015 08:08:44 +0000 (10:08 +0200)]
ip_tunnel: Call ip_tunnel_core_init() from inet_init()

Convert the module_init() to an invocation from inet_init() since
ip_tunnel_core is part of the INET built-in.

Fixes: 3093fbe7ff4 ("route: Per route IP tunnel metadata via lightweight tunnel")
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
David S. Miller [Thu, 23 Jul 2015 07:41:16 +0000 (00:41 -0700)]
Merge git://git./linux/kernel/git/davem/net

Conflicts:
net/bridge/br_mdb.c

br_mdb.c conflict was a function call being removed to fix a bug in
'net' but whose signature was changed in 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Wed, 22 Jul 2015 21:45:25 +0000 (14:45 -0700)]
Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) Don't use shared bluetooth antenna in iwlwifi driver for management
    frames, from Emmanuel Grumbach.

 2) Fix device ID check in ath9k driver, from Felix Fietkau.

 3) Off by one in xen-netback BUG checks, from Dan Carpenter.

 4) Fix IFLA_VF_PORT netlink attribute validation, from Daniel Borkmann.

 5) Fix races in setting the peeked bit flag in SKBs during datagram
    receive.  If the SKB is shared we have to clone it, otherwise the
    value can easily be corrupted.  Fix from Herbert Xu.

 6) Revert fec clock handling change, causes regressions.  From Fabio
    Estevam.

 7) Fix use after free in fq_codel and sfq packet schedulers, from WANG
    Cong.

 8) ipvlan bug fixes (memory leaks, missing rcu_dereference_bh, etc.)
    from WANG Cong and Konstantin Khlebnikov.

 9) Memory leak in act_bpf packet action, from Alexei Starovoitov.

10) ARM bpf JIT bug fixes from Nicolas Schichan.

11) Fix backwards compat of ANY_LAYOUT in virtio_net driver, from
    Michael S Tsirkin.

12) Destruction of a bond with different ARP header types was not handled
    correctly, fix from Nikolay Aleksandrov.

13) Revert GRO receive support in ipv6 SIT tunnel driver, causes
    regressions because the GRO packets created cannot be processed
    properly on the GSO side if we forward the frame.  From Herbert Xu.

14) TCCR update race and other fixes to ravb driver from Sergei
    Shtylyov.

15) Fix SKB leaks in caif_queue_rcv_skb(), from Eric Dumazet.

16) Fix panics on packet scheduler filter replace, from Daniel Borkmann.

17) Make sure AF_PACKET sees proper IP headers in defragmented frames
    (via the PACKET_FANOUT_FLAG_DEFRAG option), from Edward Hyunkoo Jee.

18) AF_NETLINK cannot hold mutex in RCU callback, fix from Florian
    Westphal.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (84 commits)
  ravb: fix ring memory allocation
  net: phy: dp83867: Fix warning check for setting the internal delay
  openvswitch: allocate nr_node_ids flow_stats instead of num_possible_nodes
  netlink: don't hold mutex in rcu callback when releasing mmapd ring
  ARM: net: fix vlan access instructions in ARM JIT.
  ARM: net: handle negative offsets in BPF JIT.
  ARM: net: fix condition for load_order > 0 when translating load instructions.
  tcp: suppress a division by zero warning
  drivers: net: cpsw: remove tx event processing in rx napi poll
  inet: frags: fix defragmented packet's IP header for af_packet
  net: mvneta: fix refilling for Rx DMA buffers
  stmmac: fix setting of driver data in stmmac_dvr_probe
  sched: cls_flow: fix panic on filter replace
  sched: cls_flower: fix panic on filter replace
  sched: cls_bpf: fix panic on filter replace
  net/mdio: fix mdio_bus_match for c45 PHY
  net: ratelimit warnings about dst entry refcount underflow or overflow
  caif: fix leaks and race in caif_queue_rcv_skb()
  qmi_wwan: add the second QMI/network interface for Sierra Wireless MC7305/MC7355
  ravb: fix race updating TCCR
  ...

9 years agoip_tunnel: Provide tunnel metadata API for CONFIG_INET=n
Thomas Graf [Wed, 22 Jul 2015 12:43:58 +0000 (14:43 +0200)]
ip_tunnel: Provide tunnel metadata API for CONFIG_INET=n

Account for the configuration FIB_RULES=y && INET=n as FIB_RULES can
be selected by IPV6 or DECNET without INET.

Fixes: e7030878fc84 ("fib: Add fib rule match on tunnel id")
Fixes: 3093fbe7ff4b ("route: Per route IP tunnel metadata via lightweight tunnel")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoipv6: sysctl to restrict candidate source addresses
Erik Kline [Wed, 22 Jul 2015 07:38:25 +0000 (16:38 +0900)]
ipv6: sysctl to restrict candidate source addresses

Per RFC 6724, section 4, "Candidate Source Addresses":

    It is RECOMMENDED that the candidate source addresses be the set
    of unicast addresses assigned to the interface that will be used
    to send to the destination (the "outgoing" interface).

Add a sysctl to enable this behaviour.
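
A sketch of the filtering step, assuming a hypothetical demo_addr record;
when the restriction is enabled, only addresses assigned to the outgoing
interface remain candidates.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified address record. */
struct demo_addr {
	int ifindex;	/* interface the address is assigned to */
	int id;		/* placeholder for the address itself */
};

/* Collect candidate source addresses into out[] and return the count. */
static size_t demo_candidates(const struct demo_addr *all, size_t n, int oif,
			      bool restrict_to_oif, struct demo_addr *out)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		if (restrict_to_oif && all[i].ifindex != oif)
			continue;	/* not on the outgoing interface */
		out[count++] = all[i];
	}
	return count;
}
```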

Signed-off-by: Erik Kline <ek@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agor8152: support the new RTL8153 chip
hayeswang [Wed, 22 Jul 2015 07:27:41 +0000 (15:27 +0800)]
r8152: support the new RTL8153 chip

Support the new USB gigabit ethernet.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agompls_iptunnel: fix sparse warn: remove incorrect rcu_dereference
Roopa Prabhu [Wed, 22 Jul 2015 05:49:00 +0000 (22:49 -0700)]
mpls_iptunnel: fix sparse warn: remove incorrect rcu_dereference

fix for:
net/mpls/mpls_iptunnel.c:73:19: sparse: incompatible types in comparison
expression (different address spaces)

remove incorrect rcu_dereference possibly left over from
earlier revisions of the code.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'bnx2x-next'
David S. Miller [Wed, 22 Jul 2015 17:47:27 +0000 (10:47 -0700)]
Merge branch 'bnx2x-next'

Yuval Mintz says:

====================
bnx2x: update FW, rebrand and more

This patch series does several things - it updates the bnx2x FW to
7.12.30, which both contains some small fixes and opens the door
for several new features for the device - mainly vxlan/geneve offloads
and vlan filtering offload.
It then adds a new Multi-function mode [BD] which requires this FW in
order to operate.

In addition, this finally rebrands the driver from a 'broadcom' driver
into a 'qlogic' driver [although it would still reside under Broadcom's
tree in the kernel].
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobnx2x: Bump up driver version to 1.712.30
Yuval Mintz [Wed, 22 Jul 2015 06:16:27 +0000 (09:16 +0300)]
bnx2x: Bump up driver version to 1.712.30

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobnx2x: Add MFW dump support
Yuval Mintz [Wed, 22 Jul 2015 06:16:26 +0000 (09:16 +0300)]
bnx2x: Add MFW dump support

Devices with up-to-date management FW will be able to store register dumps
on their persistent storage - in case management FW identifies a fatal
error it would gather and store such dumps, which could later be retrieved
using specific debug tools.

This patch adds the necessary part in the driver in order to make the
feature operational, as well as update users [under debug] during load
in case their device contains a dump of a previous crash.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobnx2x: new Multi-function mode - BD
Yuval Mintz [Wed, 22 Jul 2015 06:16:25 +0000 (09:16 +0300)]
bnx2x: new Multi-function mode - BD

This adds support for a new multi-function mode, enabling the driver to
initialize such devices and to interact correctly with the management FW
in order to fully utilize their features.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobnx2x: Add 84858 phy support
Yaniv Rosner [Wed, 22 Jul 2015 06:16:24 +0000 (09:16 +0300)]
bnx2x: Add 84858 phy support

This adds support for a new copper PHY.

Signed-off-by: Yaniv Rosner <Yaniv.Rosner@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobnx2x: Rebrand from 'broadcom' into 'qlogic'
Yuval Mintz [Wed, 22 Jul 2015 06:16:23 +0000 (09:16 +0300)]
bnx2x: Rebrand from 'broadcom' into 'qlogic'

bnx2x still appears as a Broadcom driver even though the devices it
utilizes have belonged to QLogic for more than a year.

This patch changes the various headers and the device strings to indicate
the correct ownership of the device.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobnx2x: Utilize FW 7.12.30
Yuval Mintz [Wed, 22 Jul 2015 06:16:22 +0000 (09:16 +0300)]
bnx2x: Utilize FW 7.12.30

This moves bnx2x into using 7.12.30 FW. Said firmware fixes the following:

 - Packets from a VF with pvid configured which were sent with a
   different vlan were transmitted instead of being discarded.

 - FCoE traffic might not recover after a failure while there's traffic
   to another function.

In addition, this FW opens the door for the driver to implement several
new features. Specifically, this enhances the device's support for
encapsulated packets and will allow vxlan/geneve offloads to be added in
the future, as well as vlan filtering offload.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Linus Torvalds [Wed, 22 Jul 2015 15:52:42 +0000 (08:52 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux

Pull ARM64 fixes from Catalin Marinas:

 - arm64 build fix following the move of the thread_struct to the end of
   task_struct and the asm offsets becoming too large for the AArch64
   ISA

 - preparatory patch for moving irq_data struct members (applied now to
   reduce dependency for the next merging window)

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  ARM64/irq: Use access helper irq_data_get_affinity_mask()
  arm64: switch_to: calculate cpu context pointer using separate register

9 years agoARM64/irq: Use access helper irq_data_get_affinity_mask()
Jiang Liu [Mon, 13 Jul 2015 20:30:04 +0000 (20:30 +0000)]
ARM64/irq: Use access helper irq_data_get_affinity_mask()

This is a preparatory patch for moving irq_data struct members.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org>
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
9 years agoarm64: switch_to: calculate cpu context pointer using separate register
Will Deacon [Mon, 20 Jul 2015 14:14:53 +0000 (15:14 +0100)]
arm64: switch_to: calculate cpu context pointer using separate register

Commit 0c8c0f03e3a2 ("x86/fpu, sched: Dynamically allocate 'struct fpu'")
moved the thread_struct to the bottom of task_struct. As a result, the
offset is now too large to be used in an immediate add on arm64 with
some kernel configs:

arch/arm64/kernel/entry.S: Assembler messages:
arch/arm64/kernel/entry.S:588: Error: immediate out of range
arch/arm64/kernel/entry.S:597: Error: immediate out of range

This patch calculates the offset using an additional register instead of
an immediate offset.
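
As a rough illustration of the failure: an AArch64 ADD (immediate)
encodes a 12-bit unsigned value, optionally shifted left by 12, so
larger offsets must first be moved into a register. The helper name
below is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Can imm be encoded directly in an AArch64 ADD (immediate)? */
static bool demo_add_imm_encodable(uint64_t imm)
{
	if (imm <= 0xfff)
		return true;		/* plain 12-bit immediate */
	/* otherwise it must be a 12-bit value shifted left by 12 */
	return (imm & 0xfff) == 0 && (imm >> 12) <= 0xfff;
}
```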

Fixes: 0c8c0f03e3a2 ("x86/fpu, sched: Dynamically allocate 'struct fpu'")
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
9 years agonet: track success and failure of TCP PMTU probing
Rick Jones [Tue, 21 Jul 2015 23:14:13 +0000 (16:14 -0700)]
net: track success and failure of TCP PMTU probing

Track success and failure of TCP PMTU probing.

Signed-off-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoravb: fix ring memory allocation
Sergei Shtylyov [Tue, 21 Jul 2015 22:31:59 +0000 (01:31 +0300)]
ravb: fix ring memory allocation

The driver is written as if it can adapt to a low-memory situation by allocating
fewer RX skbs and TX aligned buffers than the respective RX/TX ring sizes. In
reality, though, the driver would malfunction in this case. Stop being overly
smart and just fail in such a situation; this is achieved by moving the memory
allocation from ravb_ring_format() to ravb_ring_init().
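The "allocate everything up front or fail" approach can be sketched as follows. The ring type and function names are hypothetical, and malloc() stands in for the skb and aligned-buffer allocations; this is not ravb's actual code. A partial failure unwinds what was already allocated and reports an error instead of leaving a partly populated ring.

```c
#include <stdlib.h>

/* Hypothetical ring: one buffer per slot. */
struct ring_sketch {
	void **buf;
	size_t size;
};

/* Free everything allocated so far; safe on a partially built ring. */
static void ring_free(struct ring_sketch *r)
{
	if (r->buf) {
		for (size_t i = 0; i < r->size; i++)
			free(r->buf[i]);	/* free(NULL) is a no-op */
		free(r->buf);
	}
	r->buf = NULL;
	r->size = 0;
}

/* Allocate every slot at init time; any failure is fatal for the whole ring. */
static int ring_init(struct ring_sketch *r, size_t size, size_t buf_len)
{
	r->size = size;
	r->buf = calloc(size, sizeof(*r->buf));	/* slots start out NULL */
	if (!r->buf) {
		r->size = 0;
		return -1;
	}
	for (size_t i = 0; i < size; i++) {
		r->buf[i] = malloc(buf_len);
		if (!r->buf[i]) {
			ring_free(r);
			return -1;
		}
	}
	return 0;
}
```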

We leave the dma_map_single() calls in place but make their failure non-fatal
by marking the corresponding RX descriptors with zero data size, which should
prevent DMA to invalid addresses.
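The non-fatal handling of a mapping failure amounts to the pattern below. The descriptor layout and field names are hypothetical, and the `mapped` flag stands in for the result of dma_map_single() checked with dma_mapping_error(). A zero data size tells the DMA engine there is no room to write into that slot.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical RX descriptor: hardware writes at most `ds` bytes to dma_addr. */
struct rx_desc_sketch {
	uint64_t dma_addr;
	uint16_t ds;
};

/* Fill a descriptor from the result of a (simulated) DMA mapping attempt. */
static void rx_desc_fill(struct rx_desc_sketch *desc, bool mapped,
			 uint64_t dma, size_t len)
{
	if (mapped) {
		desc->dma_addr = dma;
		desc->ds = (uint16_t)len;
	} else {
		/* Non-fatal: zero size, so no DMA lands on an invalid address. */
		desc->dma_addr = 0;
		desc->ds = 0;
	}
}
```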

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agocxgb4: Add debugfs entry to enable backdoor access
Hariprasad Shenai [Tue, 21 Jul 2015 17:09:40 +0000 (22:39 +0530)]
cxgb4: Add debugfs entry to enable backdoor access

Add debugfs entry 'use_backdoor' to enable backdoor access for reading SGE
contexts. By default, SGE contexts are read via firmware. In case of FW
issues, one can enable backdoor access via debugfs to dump SGE contexts
for debugging purposes.
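The effect of the toggle is a simple dispatch between two read paths. Everything below is hypothetical: the reader functions only format a tag instead of touching hardware, and the `use_backdoor` variable merely mirrors the debugfs entry's meaning.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical readers; real code would query FW or read registers directly. */
static int read_ctx_via_fw(unsigned int id, char *out, size_t len)
{
	return snprintf(out, len, "fw:%u", id) < 0 ? -1 : 0;
}

static int read_ctx_via_backdoor(unsigned int id, char *out, size_t len)
{
	return snprintf(out, len, "backdoor:%u", id) < 0 ? -1 : 0;
}

/* Mirrors the 'use_backdoor' debugfs entry: off by default. */
static bool use_backdoor;

static int dump_sge_ctx(unsigned int id, char *out, size_t len)
{
	if (use_backdoor)
		return read_ctx_via_backdoor(id, out, len);
	return read_ctx_via_fw(id, out, len);	/* default path */
}
```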

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: phy: dp83867: Fix warning check for setting the internal delay
Dan Murphy [Tue, 21 Jul 2015 17:06:45 +0000 (12:06 -0500)]
net: phy: dp83867: Fix warning check for setting the internal delay

Fix warning: logical ‘or’ of collectively exhaustive tests is always true

Change the internal delay check from an 'or' condition to an 'and'
condition.
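Why the compiler flags the 'or' version: for any value, either `m >= lower` or `m <= upper` holds whenever `lower <= upper`, so the test can never be false. The enum below is a hypothetical ordering of interface modes used only to illustrate the range check; it is not the driver's actual code.

```c
#include <stdbool.h>

/* Hypothetical interface modes; the delay-capable RGMII modes are contiguous. */
enum iface_sketch { IF_GMII, IF_RGMII, IF_RGMII_ID, IF_RGMII_RXID, IF_RGMII_TXID, IF_SGMII };

/* Buggy: collectively exhaustive tests joined by ||, always true. */
static bool needs_delay_or(enum iface_sketch m)
{
	return m >= IF_RGMII_ID || m <= IF_RGMII_TXID;
}

/* Fixed: true only inside [IF_RGMII_ID, IF_RGMII_TXID]. */
static bool needs_delay_and(enum iface_sketch m)
{
	return m >= IF_RGMII_ID && m <= IF_RGMII_TXID;
}
```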

Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Dan Murphy <dmurphy@ti.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agompls: make RTA_OIF optional
Roopa Prabhu [Tue, 21 Jul 2015 16:16:24 +0000 (09:16 -0700)]
mpls: make RTA_OIF optional

If the user did not specify an oif (RTA_OIF), try to get it from the via address.
If no device can be found, return -ENODEV.
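The fallback order can be sketched as below. The request struct and the fixed `via_table` are hypothetical stand-ins: in the kernel the device is found by a lookup on the via address, not by a static table.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical route request: oif == 0 means RTA_OIF was not supplied. */
struct route_req_sketch {
	int oif;
	unsigned int via;	/* next-hop ("via") IPv4 address */
};

/* Toy stand-in for a lookup from next hop to output device. */
struct via_entry_sketch { unsigned int via; int ifindex; };
static const struct via_entry_sketch via_table[] = {
	{ 0x0a000001u, 2 },	/* 10.0.0.1 -> ifindex 2 */
	{ 0x0a000101u, 3 },	/* 10.0.1.1 -> ifindex 3 */
};

static int resolve_oif(const struct route_req_sketch *req)
{
	if (req->oif)
		return req->oif;		/* user-specified oif wins */
	for (size_t i = 0; i < sizeof(via_table) / sizeof(via_table[0]); i++)
		if (via_table[i].via == req->via)
			return via_table[i].ifindex;
	return -ENODEV;				/* no device for this via */
}
```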

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoopenvswitch: allocate nr_node_ids flow_stats instead of num_possible_nodes
Chris J Arges [Tue, 21 Jul 2015 17:36:33 +0000 (12:36 -0500)]
openvswitch: allocate nr_node_ids flow_stats instead of num_possible_nodes

Some architectures like POWER can have a NUMA node_possible_map that
contains sparse entries. This causes memory corruption with openvswitch
since it allocates flow_cache with a multiple of num_possible_nodes() and
assumes the node variable returned by for_each_node will index into
flow->stats[node].

Use nr_node_ids to allocate a maximal sparse array instead of
num_possible_nodes().
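The difference between the two sizes shows up with a sparse node map. In this sketch a 64-bit mask stands in for node_possible_map (a simplification). With possible nodes {0, 1, 16, 17}, counting set bits gives 4 (what num_possible_nodes() reports), while one past the highest id gives 18 (what nr_node_ids provides). An array sized 4 and indexed by node id 16 overflows.

```c
#include <stdint.h>

/* Like num_possible_nodes(): number of set bits in the possible map. */
static unsigned int count_nodes(uint64_t possible)
{
	unsigned int n = 0;

	for (; possible; possible &= possible - 1)	/* clear lowest set bit */
		n++;
	return n;
}

/* Like nr_node_ids: one past the highest possible node id. */
static unsigned int max_node_id_plus_one(uint64_t possible)
{
	unsigned int n = 0;

	for (unsigned int i = 0; i < 64; i++)
		if (possible & (UINT64_C(1) << i))
			n = i + 1;
	return n;
}
```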

The crash was noticed after 3af229f2 was applied, because it changed the
node_possible_map to match node_online_map on boot.
Fixes: 3af229f2071f5b5cb31664be6109561fbe19c861

Signed-off-by: Chris J Arges <chris.j.arges@canonical.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonetlink: don't hold mutex in rcu callback when releasing mmapd ring
Florian Westphal [Tue, 21 Jul 2015 14:33:50 +0000 (16:33 +0200)]
netlink: don't hold mutex in rcu callback when releasing mmapd ring

Kirill A. Shutemov says:

This simple test case triggers a few locking asserts in the kernel:

#include <stdlib.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <linux/netlink.h>      /* struct nl_mmap_req, NETLINK_RX_RING */

int main(int argc, char **argv)
{
        unsigned int block_size = 16 * 4096;
        struct nl_mmap_req req = {
                .nm_block_size          = block_size,
                .nm_block_nr            = 64,
                .nm_frame_size          = 16384,
                .nm_frame_nr            = 64 * block_size / 16384,
        };
        unsigned int ring_size;
        int fd;

        fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);
        if (fd < 0)
                exit(1);
        if (setsockopt(fd, SOL_NETLINK, NETLINK_RX_RING, &req, sizeof(req)) < 0)
                exit(1);
        if (setsockopt(fd, SOL_NETLINK, NETLINK_TX_RING, &req, sizeof(req)) < 0)
                exit(1);

        ring_size = req.nm_block_nr * req.nm_block_size;
        mmap(NULL, 2 * ring_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
        return 0;
}

+++ exited with 0 +++
BUG: sleeping function called from invalid context at /home/kas/git/public/linux-mm/kernel/locking/mutex.c:616
in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: init
3 locks held by init/1:
 #0:  (reboot_mutex){+.+...}, at: [<ffffffff81080959>] SyS_reboot+0xa9/0x220
 #1:  ((reboot_notifier_list).rwsem){.+.+..}, at: [<ffffffff8107f379>] __blocking_notifier_call_chain+0x39/0x70
 #2:  (rcu_callback){......}, at: [<ffffffff810d32e0>] rcu_do_batch.isra.49+0x160/0x10c0
Preemption disabled at:[<ffffffff8145365f>] __delay+0xf/0x20

CPU: 1 PID: 1 Comm: init Not tainted 4.1.0-00009-gbddf4c4818e0 #253
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Debian-1.8.2-1 04/01/2014
 ffff88017b3d8000 ffff88027bc03c38 ffffffff81929ceb 0000000000000102
 0000000000000000 ffff88027bc03c68 ffffffff81085a9d 0000000000000002
 ffffffff81ca2a20 0000000000000268 0000000000000000 ffff88027bc03c98
Call Trace:
 <IRQ>  [<ffffffff81929ceb>] dump_stack+0x4f/0x7b
 [<ffffffff81085a9d>] ___might_sleep+0x16d/0x270
 [<ffffffff81085bed>] __might_sleep+0x4d/0x90
 [<ffffffff8192e96f>] mutex_lock_nested+0x2f/0x430
 [<ffffffff81932fed>] ? _raw_spin_unlock_irqrestore+0x5d/0x80
 [<ffffffff81464143>] ? __this_cpu_preempt_check+0x13/0x20
 [<ffffffff8182fc3d>] netlink_set_ring+0x1ed/0x350
 [<ffffffff8182e000>] ? netlink_undo_bind+0x70/0x70
 [<ffffffff8182fe20>] netlink_sock_destruct+0x80/0x150
 [<ffffffff817e484d>] __sk_free+0x1d/0x160
 [<ffffffff817e49a9>] sk_free+0x19/0x20
[..]

Cong Wang says:

We can't hold a mutex lock in an RCU callback, [..]

Thomas Graf says:

The socket should be dead at this point. It might be simpler to
add a netlink_release_ring() function which doesn't require
locking at all.
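The suggested split between a locked setup path and a lock-free release path can be sketched in userspace C. The struct, the `ring_lock` name and both functions are hypothetical illustrations, not the netlink code. The setsockopt-style path runs on a live socket and must serialize against other users, while the destructor-style path runs when the socket is dead and holds the last reference, so it needs no lock (and in the kernel must not take a sleeping one from an RCU callback).

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical socket owning a ring buffer. */
struct sock_sketch {
	pthread_mutex_t ring_lock;	/* serializes ring changes on a live socket */
	void *ring;
};

/* Live-socket path: concurrent users are possible, so take the lock. */
static int sock_set_ring(struct sock_sketch *sk, size_t len)
{
	void *ring = malloc(len);
	void *old;

	if (!ring)
		return -1;
	pthread_mutex_lock(&sk->ring_lock);
	old = sk->ring;
	sk->ring = ring;
	pthread_mutex_unlock(&sk->ring_lock);
	free(old);
	return 0;
}

/* Destructor path: socket is dead, last reference, so no locking at all. */
static void sock_release_ring(struct sock_sketch *sk)
{
	free(sk->ring);
	sk->ring = NULL;
}
```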

Reported-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Diagnosed-by: Cong Wang <cwang@twopensource.com>
Suggested-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'sfc-filter-chaining'
David S. Miller [Wed, 22 Jul 2015 05:21:33 +0000 (22:21 -0700)]
Merge branch 'sfc-filter-chaining'

Edward Cree says:

====================
sfc: support for cascaded multicast filtering

Recent versions of firmware for SFC9100 adapters add support for filter
 chaining, in which packets matching multiple filters are delivered to all
 filters' recipients, rather than only the highest match-priority filter as was
 previously the case.
This patch series enables this feature and redesigns the filter handling code
 to make use of it; in particular, subscribing to a multicast address on one
 function no longer prevents traffic to that address reaching another function
 which is in promiscuous or allmulti mode.
If the firmware does not support filter chaining, the driver will fall back to
 the old behaviour.
====================
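The difference between the old and the chained delivery models can be sketched as follows. The filter struct, the `key` matching rule (0 meaning "match everything", as a promiscuous filter would) and the recipient numbers are hypothetical; on SFC9100 adapters the matching is done by firmware. With chaining off only the highest-priority match receives the packet; with chaining on every matching filter's recipient does.

```c
#include <stddef.h>

/* Hypothetical filter: key 0 matches everything (promiscuous/allmulti). */
struct filter_sketch {
	unsigned int key;	/* address key the filter matches */
	int priority;		/* higher wins when chaining is off */
	int recipient;		/* function receiving matching packets */
};

/* Writes recipients of a packet with `key` into out[]; returns how many. */
static size_t deliver(const struct filter_sketch *f, size_t n,
		      unsigned int key, int chaining, int *out)
{
	const struct filter_sketch *best = NULL;
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		if (f[i].key != 0 && f[i].key != key)
			continue;			/* no match */
		if (chaining)
			out[count++] = f[i].recipient;	/* every match */
		else if (!best || f[i].priority > best->priority)
			best = &f[i];
	}
	if (!chaining && best)
		out[count++] = best->recipient;		/* single winner */
	return count;
}
```

In the test data, function 1 subscribes to a multicast key while function 2 is promiscuous; only with chaining does function 2 also see that multicast traffic.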

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agosfc: clean fallbacks between promisc/normal in efx_ef10_filter_sync_rx_mode
Edward Cree [Tue, 21 Jul 2015 14:11:00 +0000 (15:11 +0100)]
sfc: clean fallbacks between promisc/normal in efx_ef10_filter_sync_rx_mode

Add separate functions for inserting individual and promisc filters, and make
 the fallback logic in efx_ef10_filter_sync_rx_mode() explicit, so that the
 'promisc' flag is not overloaded to also mean "fall back to promisc".
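The separation between "the user asked for promiscuous mode" and "we had to fall back to it" can be sketched as below. The table struct, its capacity model and all three functions are hypothetical and do not reflect the sfc driver's data structures; they only show the control flow of an explicit fallback.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical filter table with a fixed number of slots. */
struct filter_table_sketch {
	size_t used, capacity;
	bool promisc_installed;
};

/* One filter per address; fails if they don't all fit. */
static int insert_addr_filters(struct filter_table_sketch *t, size_t naddrs)
{
	if (t->used + naddrs > t->capacity)
		return -1;
	t->used += naddrs;
	return 0;
}

static int insert_promisc_filter(struct filter_table_sketch *t)
{
	if (t->used + 1 > t->capacity)
		return -1;
	t->used++;
	t->promisc_installed = true;
	return 0;
}

/* `promisc` only reflects the requested mode; fallback is decided here. */
static int sync_rx_mode(struct filter_table_sketch *t, bool promisc, size_t naddrs)
{
	if (promisc)
		return insert_promisc_filter(t);
	if (insert_addr_filters(t, naddrs) == 0)
		return 0;
	return insert_promisc_filter(t);	/* explicit fallback */
}
```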

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>