GitHub/MotorolaMobilityLLC/kernel-slsi.git
11 years agoxen-netback: remove page tracking facility
Wei Liu [Mon, 26 Aug 2013 11:59:37 +0000 (12:59 +0100)]
xen-netback: remove page tracking facility

The data flow from DomU to DomU on the same host in the current copying
scheme with the tracking facility:

       copy
DomU --------> Dom0          DomU
 |                            ^
 |____________________________|
             copy

The page in Dom0 is a page with a valid MFN, so we can always copy from
the Dom0 page, thus removing the need for a tracking facility.

       copy           copy
DomU --------> Dom0 -------> DomU

Simple iperf test shows no performance regression (obviously we copy
twice either way):

  W/  tracking: ~5.3Gb/s
  W/o tracking: ~5.4Gb/s

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Matt Wilson <msw@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch
David S. Miller [Wed, 28 Aug 2013 02:11:18 +0000 (22:11 -0400)]
Merge branch 'master' of git://git./linux/kernel/git/jesse/openvswitch

Jesse Gross says:

====================
A number of significant new features and optimizations for net-next/3.12.
Highlights are:
 * "Megaflows", an optimization that allows userspace to specify which
   flow fields were used to compute the results of the flow lookup.
   This allows for a major reduction in flow setups (the major
   performance bottleneck in Open vSwitch) without reducing flexibility.
 * Converting netlink dump operations to use RCU, allowing for
   additional parallelism in userspace.
 * Matching and modifying SCTP protocol fields.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
David S. Miller [Wed, 28 Aug 2013 02:07:02 +0000 (22:07 -0400)]
Merge branch 'master' of git://git./linux/kernel/git/pablo/nf-next

Pablo Neira Ayuso says:

====================
The following patchset contains Netfilter updates for your net-next tree,
they are:

* The new SYNPROXY target for iptables, including IPv4 and IPv6 support,
  from Patrick McHardy.

* nf_defrag_ipv6.o should be only linked to nf_defrag_ipv6.ko, from
  Nathan Hintz.

* Fix an old bug in REJECT, which replies with wrong MAC source address
  from the bridge, by Phil Oester.

* Fix uninitialized helper variable in the expectation support over
  nfnetlink_queue, from Florian Westphal.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMerge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/bwh/sfc...
David S. Miller [Wed, 28 Aug 2013 01:56:22 +0000 (21:56 -0400)]
Merge branch 'for-davem' of git://git./linux/kernel/git/bwh/sfc-next

Ben Hutchings says:

====================
More refactoring and cleanup, particularly around filter management.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonetfilter: ctnetlink: fix uninitialized variable
Florian Westphal [Tue, 27 Aug 2013 09:47:26 +0000 (11:47 +0200)]
netfilter: ctnetlink: fix uninitialized variable

net/netfilter/nf_conntrack_netlink.c: In function 'ctnetlink_nfqueue_attach_expect':
'helper' may be used uninitialized in this function

It was only initialized if the CTA_EXPECT_HELP_NAME attribute was
present; it must be NULL otherwise.

Problem added recently in bd077937
(netfilter: nfnetlink_queue: allow to attach expectations to conntracks).

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
11 years agonetfilter: add IPv6 SYNPROXY target
Patrick McHardy [Tue, 27 Aug 2013 06:50:16 +0000 (08:50 +0200)]
netfilter: add IPv6 SYNPROXY target

Add an IPv6 version of the SYNPROXY target. The main differences from the
IPv4 version are routing and IP header construction.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Tested-by: Martin Topholm <mph@one.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
11 years agonet: syncookies: export cookie_v6_init_sequence/cookie_v6_check
Patrick McHardy [Tue, 27 Aug 2013 06:50:15 +0000 (08:50 +0200)]
net: syncookies: export cookie_v6_init_sequence/cookie_v6_check

Extract the local TCP stack independent parts of tcp_v6_init_sequence()
and cookie_v6_check() and export them for use by the upcoming IPv6 SYNPROXY
target.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: David S. Miller <davem@davemloft.net>
Tested-by: Martin Topholm <mph@one.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
11 years agonetfilter: add SYNPROXY core/target
Patrick McHardy [Tue, 27 Aug 2013 06:50:14 +0000 (08:50 +0200)]
netfilter: add SYNPROXY core/target

Add a SYNPROXY for netfilter. The code is split into two parts, the synproxy
core with common functions and an address family specific target.

The SYNPROXY receives the connection request from the client, responds with
a SYN/ACK containing a SYN cookie and announcing a zero window and checks
whether the final ACK from the client contains a valid cookie.

It then establishes a connection to the original destination and, if
successful, sends a window update to the client with the window size
announced by the server.

Support for timestamps, SACK, window scaling and MSS options can be
statically configured as target parameters if the features of the server
are known. If timestamps are used, the timestamp value sent back to
the client in the SYN/ACK will be different from the real timestamp of
the server. In order to not break PAWS, the timestamps are translated in
the direction server->client.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Tested-by: Martin Topholm <mph@one.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
11 years agonet: syncookies: export cookie_v4_init_sequence/cookie_v4_check
Patrick McHardy [Tue, 27 Aug 2013 06:50:13 +0000 (08:50 +0200)]
net: syncookies: export cookie_v4_init_sequence/cookie_v4_check

Extract the local TCP stack independent parts of tcp_v4_init_sequence()
and cookie_v4_check() and export them for use by the upcoming SYNPROXY
target.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: David S. Miller <davem@davemloft.net>
Tested-by: Martin Topholm <mph@one.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
11 years agonetfilter: nf_conntrack: make sequence number adjustments usuable without NAT
Patrick McHardy [Tue, 27 Aug 2013 06:50:12 +0000 (08:50 +0200)]
netfilter: nf_conntrack: make sequence number adjustments usuable without NAT

Split out sequence number adjustments from NAT and move them to the conntrack
core to make them usable for SYN proxying. The sequence number adjustment
information is moved to a separate extend. The extend is added to new
conntracks when a NAT mapping is set up for a connection using a helper.

As a side effect, this saves 24 bytes per connection with NAT in the common
case that a connection does not have a helper assigned.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Tested-by: Martin Topholm <mph@one.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
11 years agonetfilter: nf_defrag_ipv6.o included twice
Nathan Hintz [Fri, 23 Aug 2013 05:09:12 +0000 (22:09 -0700)]
netfilter: nf_defrag_ipv6.o included twice

'nf_defrag_ipv6' is built as a separate module; it shouldn't be
included in the 'nf_conntrack_ipv6' module as well.

Signed-off-by: Nathan Hintz <nlhintz@hotmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
11 years agonetfilter: ip[6]t_REJECT: tcp-reset using wrong MAC source if bridged
Phil Oester [Wed, 26 Jun 2013 21:16:28 +0000 (17:16 -0400)]
netfilter: ip[6]t_REJECT: tcp-reset using wrong MAC source if bridged

As reported by Casper Gripenberg, in a bridged setup, using ip[6]t_REJECT
with the tcp-reset option sends out reset packets with the src MAC address
of the local bridge interface, instead of the MAC address of the intended
destination.  This causes some routers/firewalls to drop the reset packet
as it appears to be spoofed.  Fix this by bypassing ip[6]_local_out and
setting the MAC of the sender in the tcp reset packet.

This closes netfilter bugzilla #531.

Signed-off-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
11 years agoopenvswitch: optimize flow compare and mask functions
Andy Zhou [Tue, 27 Aug 2013 20:02:21 +0000 (13:02 -0700)]
openvswitch: optimize flow compare and mask functions

Make sure the sw_flow_key structure and valid mask boundaries are always
machine word aligned. Optimize the flow compare and mask operations
using machine word size operations. This patch improves throughput on
average by 15% when CPU is the bottleneck of forwarding packets.

This patch is inspired by ideas and code from a patch submitted by Peter
Klausler titled "replace memcmp() with specialized comparator".
However, the original patch only optimizes for architectures that
support unaligned machine word access. This patch optimizes for all
architectures.
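
A minimal sketch of the word-sized compare and mask idea described above.
It is not the datapath code: struct demo_key, DEMO_KEY_WORDS and the
demo_* helpers are hypothetical stand-ins for sw_flow_key and its
routines, and the [start, end) byte range is assumed to be word aligned,
which is the property the patch says it guarantees.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_KEY_WORDS 8

/* Hypothetical stand-in for sw_flow_key, stored as whole machine words. */
struct demo_key {
    unsigned long words[DEMO_KEY_WORDS];
};

/* Check that a byte range is word aligned and inside the key. */
static void demo_check_range(size_t start, size_t end)
{
    assert(start % sizeof(unsigned long) == 0);
    assert(end % sizeof(unsigned long) == 0);
    assert(start <= end);
    assert(end <= DEMO_KEY_WORDS * sizeof(unsigned long));
}

/* Compare the bytes in [start, end) one machine word at a time. */
static bool demo_key_equal(const struct demo_key *a, const struct demo_key *b,
                           size_t start, size_t end)
{
    unsigned long diff = 0;
    size_t i;

    demo_check_range(start, end);
    for (i = start / sizeof(unsigned long); i < end / sizeof(unsigned long); i++)
        diff |= a->words[i] ^ b->words[i];
    return diff == 0;
}

/* dst = src & mask over [start, end), again one word at a time. */
static void demo_key_mask(struct demo_key *dst, const struct demo_key *src,
                          const struct demo_key *mask, size_t start, size_t end)
{
    size_t i;

    demo_check_range(start, end);
    for (i = start / sizeof(unsigned long); i < end / sizeof(unsigned long); i++)
        dst->words[i] = src->words[i] & mask->words[i];
}
```

Accumulating the XOR of each word pair avoids a branch per word; whether
that is the exact loop shape used in the patch is an assumption.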

Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
11 years agoe1000e: balance semaphore put/get for 82573
Steven La [Sat, 24 Aug 2013 00:19:37 +0000 (17:19 -0700)]
e1000e: balance semaphore put/get for 82573

Steven (cc-ed) noticed an imbalance in semaphore put/get for
82573-based NICs. Don't we need something like the following
(untested) patch?

Signed-off-by: Steven La <sla@riverbed.com>
Acked-by: Arthur Kepner <akepner@riverbed.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoDocumentation/networking/: Update Intel wired LAN driver documentation
Jeff Kirsher [Sat, 24 Aug 2013 00:19:23 +0000 (17:19 -0700)]
Documentation/networking/: Update Intel wired LAN driver documentation

Updates the documentation to the Intel wired LAN drivers.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobna: firmware update to 3.2.1.1
Rasesh Mody [Fri, 23 Aug 2013 21:31:30 +0000 (14:31 -0700)]
bna: firmware update to 3.2.1.1

This patch updates the firmware to address the thermal notification issue.

Signed-off-by: Rasesh Mody <rmody@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoVMXNET3: Add support for virtual IOMMU
Andy King [Fri, 23 Aug 2013 16:33:49 +0000 (09:33 -0700)]
VMXNET3: Add support for virtual IOMMU

This patch adds support for virtual IOMMU to the vmxnet3 module.  We
switch to DMA consistent mappings for anything we pass to the device.
There were a few places where we already did this, but using pci_blah();
these have been fixed to use dma_blah(), along with all new occurrences
where we've replaced kmalloc() and friends.

Also fix two small bugs:
1) use after free of rq->buf_info in vmxnet3_rq_destroy()
2) a cpu_to_le32() that should have been a cpu_to_le64()
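
To illustrate why the second bug matters, here is a small userspace
sketch. The demo_put_le32(), demo_put_le64() and demo_get_le64() helpers
are hypothetical byte-wise stand-ins for the kernel conversion macros,
and the 64-bit value is an invented example, not one taken from the
driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store the low 32 bits of v in little-endian byte order. */
static void demo_put_le32(uint8_t *out, uint32_t v)
{
    int i;

    assert(out != NULL);
    for (i = 0; i < 4; i++)
        out[i] = (uint8_t)(v >> (8 * i));
}

/* Store all 64 bits of v in little-endian byte order. */
static void demo_put_le64(uint8_t *out, uint64_t v)
{
    int i;

    assert(out != NULL);
    for (i = 0; i < 8; i++)
        out[i] = (uint8_t)(v >> (8 * i));
}

/* Read a 64-bit little-endian field back, as the device would see it. */
static uint64_t demo_get_le64(const uint8_t *in)
{
    uint64_t v = 0;
    int i;

    assert(in != NULL);
    for (i = 0; i < 8; i++)
        v |= (uint64_t)in[i] << (8 * i);
    return v;
}
```

Writing a 64-bit field through the 32-bit helper silently drops the upper
half of the value, which only shows up once an address or length exceeds
32 bits.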

Acked-by: George Zhang <georgezhang@vmware.com>
Acked-by: Aditya Sarwade <asarwade@vmware.com>
Signed-off-by: Andy King <acking@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobe2net: implement ethtool set/get_channel hooks
Sathya Perla [Tue, 27 Aug 2013 11:27:35 +0000 (16:57 +0530)]
be2net: implement ethtool set/get_channel hooks

Support is provided only for combined channels. When SR-IOV is not
enabled, BE3 supports up to 16 channels and Lancer-R/SH-R support up to
32 channels.

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobe2net: refactor be_setup() to consolidate queue creation routines
Sathya Perla [Tue, 27 Aug 2013 11:27:34 +0000 (16:57 +0530)]
be2net: refactor be_setup() to consolidate queue creation routines

1) Move be_cmd_if_create() above queue create routines to allow
   TXQ creation (that requires if_handle) to be clubbed with TX-CQ creation.
2) Consolidate all queue create routines into be_setup_queues()

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobe2net: Fix be_cmd_if_create() to use MBOX if MCCQ is not created
Sathya Perla [Tue, 27 Aug 2013 11:27:33 +0000 (16:57 +0530)]
be2net: Fix be_cmd_if_create() to use MBOX if MCCQ is not created

Currently the IF_CREATE FW cmd is issued only *after* MCCQ is created as
it was coded to only use MCCQ. By fixing this, cmd_if_create() can be
called before MCCQ is created and the same routine for VF provisioning
can be called after.
This allows for consolidating all the queue create routines by moving
the be_cmd_if_create() call above all queue create calls in be_setup().

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobe2net: refactor be_get_resources() code
Sathya Perla [Tue, 27 Aug 2013 11:27:32 +0000 (16:57 +0530)]
be2net: refactor be_get_resources() code

1) use be_resources{} struct to query/store HW resource limits
2) The HW queue/resource limits for BE2/BE3 chips are mostly called out
   in driver as constants.  Code to handle this is scattered across various
   places in be_setup(). Consolidate this code into BEx_get_resources().
   For Lancer-R, Skyhawk-R, these limits are queried from FW.

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobe2net: Fixup profile management routines
Vasundhara Volam [Tue, 27 Aug 2013 11:27:31 +0000 (16:57 +0530)]
be2net: Fixup profile management routines

1) Parse PCIe descriptor for max-VFs supported by HW
2) Cleanup NIC descriptor parsing in get_func/profile_config() routines
3) Use common struct definitions for v0 and v1 versions of GET_FUNC_CONFIG

Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobe2net: use EQ_CREATEv2 for SH-R
Sathya Perla [Tue, 27 Aug 2013 11:27:30 +0000 (16:57 +0530)]
be2net: use EQ_CREATEv2 for SH-R

EQ_CREATEv2 explicitly returns the msix-index associated with an EQ.
For SH-R this is needed if EQs need to be deleted and re-created without
resetting a function.

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: tcp_probe: allow more advanced ingress filtering by mark
Daniel Borkmann [Fri, 23 Aug 2013 14:16:33 +0000 (16:16 +0200)]
net: tcp_probe: allow more advanced ingress filtering by mark

Currently, the tcp_probe snooper can either filter packets by a given
port (handed to the module via module parameter e.g. port=80) or lets
all TCP traffic pass (port=0, default). When a port is specified, the
port number is tested against the sk's source/destination port. Thus,
if one of them matches, the information will be further processed for
the log.

As this is quite limited, allow for more advanced filtering possibilities
which can facilitate debugging/analysis with the help of the tcp_probe
snooper. Therefore, similar to what was added to the BPF machine in commit 7e75f93e
("pkt_sched: ingress socket filter by mark"), add the possibility to
use skb->mark as a filter.

If the mark is not being used otherwise, this allows ingress filtering
by flow (e.g. in order to track updates from only a single flow, or a
subset of all flows for a given port) and other things such as dynamic
logging and reconfiguration without removing/re-inserting the tcp_probe
module, etc. Simple example:

  insmod net/ipv4/tcp_probe.ko fwmark=8888 full=1
  ...
  iptables -A INPUT -i eth4 -t mangle -p tcp --dport 22 \
           --sport 60952 -j MARK --set-mark 8888
  [... sampling interval ...]
  iptables -D INPUT -i eth4 -t mangle -p tcp --dport 22 \
           --sport 60952 -j MARK --set-mark 8888

The current option to filter by a given port is still being preserved. A
similar approach could be done for the sctp_probe module as a follow-up.
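
The combined port/mark filter described above can be sketched as a single
predicate. This is an illustration under assumptions, not the module
code: struct demo_pkt and demo_should_log() are hypothetical, ports are
taken to be in host byte order already, and the two filters are assumed
to combine with a logical OR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-packet view; the module reads these from the sk and skb. */
struct demo_pkt {
    uint16_t sport;
    uint16_t dport;
    uint32_t mark;
};

/*
 * port == 0 and fwmark == 0: no filter, log everything.
 * Otherwise log when either configured filter matches.
 */
static bool demo_should_log(const struct demo_pkt *p, uint16_t port,
                            uint32_t fwmark)
{
    assert(p != NULL);
    if (port == 0 && fwmark == 0)
        return true;
    if (port != 0 && (p->sport == port || p->dport == port))
        return true;
    return fwmark != 0 && p->mark == fwmark;
}
```

With fwmark=8888 and no port set, only packets that the iptables rule in
the example has marked would pass this check.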

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoqlcnic: Update version to 5.3.49.
Sucheta Chakraborty [Fri, 23 Aug 2013 17:38:29 +0000 (13:38 -0400)]
qlcnic: Update version to 5.3.49.

Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoqlcnic: dcb: Add support for CEE Netlink interface.
Sucheta Chakraborty [Fri, 23 Aug 2013 17:38:28 +0000 (13:38 -0400)]
qlcnic: dcb: Add support for CEE Netlink interface.

o Adapter and driver supports only CEE dcbnl ops. Only GET callbacks
  within dcbnl ops are supported currently.

Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoqlcnic: dcb: Register DCB AEN handler.
Sucheta Chakraborty [Fri, 23 Aug 2013 17:38:27 +0000 (13:38 -0400)]
qlcnic: dcb: Register DCB AEN handler.

o Adapter sends Asynchronous Event Notifications to the driver when
  there are changes in the switch or adapter DCBX configuration.
  AEN handler updates the driver DCBX parameters.

Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoqlcnic: dcb: Get DCB parameters from the adapter.
Sucheta Chakraborty [Fri, 23 Aug 2013 17:38:26 +0000 (13:38 -0400)]
qlcnic: dcb: Get DCB parameters from the adapter.

o Populate driver data structures with local, operational, and peer
  DCB parameters.

Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoqlcnic: dcb: Query adapter DCB capabilities.
Sucheta Chakraborty [Fri, 23 Aug 2013 17:38:25 +0000 (13:38 -0400)]
qlcnic: dcb: Query adapter DCB capabilities.

o Query adapter DCB capabilities and populate local data structures
  with relevant information.

o Add QLCNIC_DCB to Kconfig for enabling/disabling DCB.

Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMerge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/bwh/sfc...
David S. Miller [Tue, 27 Aug 2013 16:16:20 +0000 (12:16 -0400)]
Merge branch 'for-davem' of git://git./linux/kernel/git/bwh/sfc-next

Ben Hutchings says:

====================
1. Refactoring and cleanup in preparation for new hardware support.
2. Some bug fixes for firmware completion handling.  (They're not known
to cause real problems, otherwise I'd be submitting these for net and
stable.)
3. Update to the firmware protocol (MCDI) definitions.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoopenvswitch: Rename key_len to key_end
Andy Zhou [Thu, 22 Aug 2013 19:12:57 +0000 (12:12 -0700)]
openvswitch: Rename key_len to key_end

Key_end is a better name describing the ending boundary than key_len.
Rename those variables to make it less confusing.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
11 years agoopenvswitch: Add SCTP support
Joe Stringer [Thu, 22 Aug 2013 19:30:48 +0000 (12:30 -0700)]
openvswitch: Add SCTP support

This patch adds support for rewriting SCTP src,dst ports similar to the
functionality already available for TCP/UDP.

Rewriting SCTP ports is expensive due to double-recalculation of the
SCTP checksums; this is performed to ensure that packets traversing OVS
with invalid checksums will continue to the destination with any
checksum corruption intact.
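
One way to keep an existing checksum error intact across a port rewrite
is sketched below. It is an assumption-laden illustration rather than the
datapath code: struct demo_sctp is a hypothetical flattened packet whose
checksum is stored apart from the checksummed bytes, and demo_crc32c() is
a plain bitwise CRC32c written for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32c (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t demo_crc32c(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    size_t i;
    int bit;

    for (i = 0; i < len; i++) {
        crc ^= buf[i];
        for (bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hypothetical packet: bytes 0-1 src port, 2-3 dst port, rest payload. */
struct demo_sctp {
    uint8_t data[16];
    uint32_t checksum;   /* checksum as found in the packet */
};

/* Rewrite the source port and recompute the checksum twice. */
static void demo_set_src_port(struct demo_sctp *p, uint16_t port)
{
    uint32_t old_correct, old_stored, new_correct;

    assert(p != NULL);
    old_correct = demo_crc32c(p->data, sizeof(p->data));
    old_stored = p->checksum;

    p->data[0] = (uint8_t)(port >> 8);
    p->data[1] = (uint8_t)(port & 0xff);

    new_correct = demo_crc32c(p->data, sizeof(p->data));
    /* old_stored ^ old_correct is zero for a valid packet, so a valid
     * checksum stays valid and a corrupt one stays corrupt. */
    p->checksum = old_stored ^ old_correct ^ new_correct;
}
```

The two CRC passes over the packet are the "double-recalculation" cost
the message refers to.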

Reviewed-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
11 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
David S. Miller [Mon, 26 Aug 2013 20:37:08 +0000 (16:37 -0400)]
Merge git://git./linux/kernel/git/davem/net

Conflicts:
drivers/net/wireless/iwlwifi/pcie/trans.c
include/linux/inetdevice.h

The inetdevice.h conflict involves moving the IPV4_DEVCONF values
into a UAPI header, overlapping additions of some new entries.

The iwlwifi conflict is a context overlap.

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMerge branch 'cadence'
David S. Miller [Mon, 26 Aug 2013 20:04:26 +0000 (16:04 -0400)]
Merge branch 'cadence'

Boris BREZILLON says:

====================
net/cadence/macb: add support for dt phy definition

This patch series adds support for ethernet phy definition using device
tree.

This may help in moving some at91 boards to dt (some of them define an
interrupt pin).

Tested on samad31ek.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoARM: at91/dt: define phy available on sama5d3 mother board
Boris BREZILLON [Thu, 22 Aug 2013 15:58:29 +0000 (17:58 +0200)]
ARM: at91/dt: define phy available on sama5d3 mother board

This patch describes the phy used on the atmel sama5d3 mother board:
 - phy address
 - phy interrupt pin

Signed-off-by: Boris BREZILLON <b.brezillon@overkiz.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet/cadence/macb: add support for dt phy definition
Boris BREZILLON [Thu, 22 Aug 2013 15:57:28 +0000 (17:57 +0200)]
net/cadence/macb: add support for dt phy definition

The macb driver only handles PHY description through platform_data
(macb_platform_data).
Thus, when using dt you cannot define phy properties like phy address or
phy irq pin.

This patch makes use of the of_mdiobus_register to add support for
phy device definition using dt.
A fallback to the autoscan procedure is added in case there are no phy
devices defined in dt.

Signed-off-by: Boris BREZILLON <b.brezillon@overkiz.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoipip: potential race in ip_tunnel_init_net()
Dan Carpenter [Fri, 23 Aug 2013 08:15:37 +0000 (11:15 +0300)]
ipip: potential race in ip_tunnel_init_net()

Eric Dumazet says that my previous fix for an ERR_PTR dereference
(ea857f28ab 'ipip: dereferencing an ERR_PTR in ip_tunnel_init_net()')
could be racy and suggests the following fix instead.
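
For readers unfamiliar with the ERR_PTR convention mentioned above, a
simplified userspace sketch follows. The demo_* helpers are hypothetical
re-creations of the kernel idiom, struct demo_dev is invented, and the
sketch shows only the check-before-dereference pattern; it does not
reproduce the race itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static inline void *demo_err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
static inline long demo_ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* Error pointers occupy the top DEMO_MAX_ERRNO addresses. */
static inline bool demo_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

struct demo_dev {
    int id;
};

static struct demo_dev demo_static_dev = { 7 };

/* Hypothetical constructor that reports failure as an error pointer. */
static struct demo_dev *demo_create(bool fail)
{
    if (fail)
        return demo_err_ptr(-ENOMEM);
    return &demo_static_dev;
}

/* Check with demo_is_err() before touching the returned pointer. */
static int demo_init(bool fail)
{
    struct demo_dev *dev = demo_create(fail);

    if (demo_is_err(dev))
        return (int)demo_ptr_err(dev);
    return dev->id;
}
```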

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobonding: fix error return code in bond_enslave()
Wei Yongjun [Fri, 23 Aug 2013 02:45:07 +0000 (10:45 +0800)]
bonding: fix error return code in bond_enslave()

Fix to return a negative error code in the add bond vlan ids error
handling case instead of 0, as done elsewhere in this function.

Introduced by commit 1ff412ad7714f6952f76ffd77f0a7f2f563288a1.
(bonding: change the bond's vlan syncing functions with the standard ones)
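
The bug class described above can be sketched as follows. All names are
hypothetical: demo_add_vlan_ids() stands in for the failing call, and the
two demo_enslave_* functions show the shape of the code before and after
such a fix, not the bonding driver itself.

```c
#include <errno.h>

/* Hypothetical helper that fails with a negative errno when asked to. */
static int demo_add_vlan_ids(int should_fail)
{
    return should_fail ? -ENOMEM : 0;
}

/* Buggy shape: the failure is detected, but res still holds 0. */
static int demo_enslave_buggy(int should_fail)
{
    int res = 0;

    if (demo_add_vlan_ids(should_fail))
        goto err;
    return 0;
err:
    /* cleanup would go here */
    return res;
}

/* Fixed shape: capture the error code before jumping to cleanup. */
static int demo_enslave_fixed(int should_fail)
{
    int res;

    res = demo_add_vlan_ids(should_fail);
    if (res)
        goto err;
    return 0;
err:
    /* cleanup would go here */
    return res;
}
```

In the buggy shape the caller sees success even though the cleanup path
ran, which is why the error code has to be assigned before the jump.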

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMerge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/bwh/sfc...
David S. Miller [Sun, 25 Aug 2013 22:30:27 +0000 (18:30 -0400)]
Merge branch 'for-davem' of git://git./linux/kernel/git/bwh/sfc-next

Merge SFC driver changes from Ben Hutchings.

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: Add NEXTHDR_SCTP to ipv6.h
Joe Stringer [Tue, 23 Jul 2013 04:37:45 +0000 (13:37 +0900)]
net: Add NEXTHDR_SCTP to ipv6.h

Signed-off-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Jesse Gross <jesse@nicira.com>
11 years agoopenvswitch: Mega flow implementation
Andy Zhou [Thu, 8 Aug 2013 03:01:00 +0000 (20:01 -0700)]
openvswitch: Mega flow implementation

Add wildcarded flow support in kernel datapath.

Wildcarded flows can improve OVS flow setup performance by avoiding sending
matching new flows to the user space program. The exact performance boost
will largely depend on the wildcarded flow hit rate.

In case all new flows hit wildcard flows, the flow setup rate is
within 5% of that of the Linux bridge module.

Pravin has made significant contributions to this patch, including API
cleanups and bug fixes.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
11 years agoopenvswitch: check CONFIG_OPENVSWITCH_GRE in makefile
Cong Wang [Tue, 20 Aug 2013 17:48:15 +0000 (10:48 -0700)]
openvswitch: check CONFIG_OPENVSWITCH_GRE in makefile

Cc: Jesse Gross <jesse@nicira.com>
Cc: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
11 years agoopenvswitch: Fix argument descriptions in vport.c.
Justin Pettit [Tue, 20 Aug 2013 00:49:29 +0000 (17:49 -0700)]
openvswitch: Fix argument descriptions in vport.c.

Signed-off-by: Justin Pettit <jpettit@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
11 years agoopenvswitch:: link upper device for port devices
Jiri Pirko [Fri, 26 Jul 2013 12:01:54 +0000 (14:01 +0200)]
openvswitch:: link upper device for port devices

Link the upper device properly. That makes IFLA_MASTER get filled in.
Set the master to port 0 of the datapath to which the port belongs.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Jesse Gross <jesse@nicira.com>
11 years agoopenvswitch: Use non rcu hlist_del() flow table entry.
Pravin B Shelar [Tue, 30 Jul 2013 22:45:59 +0000 (15:45 -0700)]
openvswitch: Use non rcu hlist_del() flow table entry.

Flow table destroy is done in rcu call-back context.  Therefore
there is no need to use rcu variant of hlist_del().

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
11 years agoopenvswitch: Use RCU lock for dp dump operation.
Pravin B Shelar [Tue, 30 Jul 2013 22:42:19 +0000 (15:42 -0700)]
openvswitch: Use RCU lock for dp dump operation.

RCUfy dp-dump operation which is already read-only. This
makes all ovs dump operations lockless.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
11 years agoopenvswitch: Use RCU lock for flow dump operation.
Pravin B Shelar [Tue, 30 Jul 2013 22:39:39 +0000 (15:39 -0700)]
openvswitch: Use RCU lock for flow dump operation.

The flow dump operation is a read-only operation.  There is no need to
take ovs-lock.  The following patch uses rcu-lock for dumping flows.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
11 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Fri, 23 Aug 2013 16:54:21 +0000 (09:54 -0700)]
Merge git://git./linux/kernel/git/davem/net

Merge networking fixes from David Miller:

 1) Revert Johannes Berg's genetlink locking fix, because it causes
    regressions.

    Johannes and Pravin Shelar are working on fixing things properly.

 2) Do not drop ipv6 ICMP messages without a redirected header option,
    they are legal.  From Duan Jiong.

 3) Missing error return propagation in probing of via-ircc driver.
    From Alexey Khoroshilov.

 4) Do not clear out broadcast/multicast/unicast/WOL bits in r8169 when
    initializing, from Peter Wu.

 5) realtek phy driver programs wrong interrupt status bit, from
    Giuseppe CAVALLARO.

 6) Fix statistics regression in AF_PACKET code, from Willem de Bruijn.

 7) Bridge code uses wrong bitmap length, from Toshiaki Makita.

 8) SFC driver uses wrong indexes to look up MAC filters, from Ben
    Hutchings.

 9) Don't pass stack buffers into usb control operations in hso driver,
    from Daniel Gimpelevich.

10) Multiple ipv6 fragmentation headers in one packet are illegal and
    such packets should be dropped, from Hannes Frederic Sowa.

11) When TCP sockets are "repaired" as part of checkpoint/restart, the
    timestamp field of SKBs needs to be refreshed, otherwise RTOs can be
    wildly off.  From Andrey Vagin.

12) Fix memcpy args (uses 'address of pointer' instead of 'pointer') in
    hostp driver.  From Dan Carpenter.

13) nl80211hdr_put() doesn't return an ERR_PTR, but some code believes
    it does.  From Dan Carpenter.

14) Fix regression in wireless SME disconnects, from Johannes Berg.

15) Don't use a stack buffer for DMA in zd1201 USB wireless driver, from
    Jussi Kivilinna.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (33 commits)
  ipv4: expose IPV4_DEVCONF
  ipv6: handle Redirect ICMP Message with no Redirected Header option
  be2net: fix disabling TX in be_close()
  Revert "genetlink: fix family dump race"
  hso: Fix stack corruption on some architectures
  hso: Earlier catch of error condition
  sfc: Fix lookup of default RX MAC filters when steered using ethtool
  bridge: Use the correct bit length for bitmap functions in the VLAN code
  packet: restore packet statistics tp_packets to include drops
  net: phy: rtl8211: fix interrupt on status link change
  r8169: remember WOL preferences on driver load
  via-ircc: don't return zero if via_ircc_open() failed
  macvtap: Ignore tap features when VNET_HDR is off
  macvtap: Correctly set tap features when IFF_VNET_HDR is disabled.
  macvtap: simplify usage of tap_features
  tcp: set timestamps for restored skb-s
  bnx2x: set VF DMAE when first function has 0 supported VFs
  bnx2x: Protect against VFs' ndos when SR-IOV is disabled
  bnx2x: prevent VF benign attentions
  bnx2x: Consider DCBX remote error
  ...

11 years agoMerge branch 'akpm' (patches from Andrew Morton)
Linus Torvalds [Fri, 23 Aug 2013 16:52:32 +0000 (09:52 -0700)]
Merge branch 'akpm' (patches from Andrew Morton)

Merge fixes from Andrew Morton:
 "A few fixes.  One is a licensing change and I don't do licensing, so
  please eyeball that one"

Licensing eye-balled.

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  lib/lz4: correct the LZ4 license
  memcg: get rid of swapaccount leftovers
  nilfs2: fix issue with counting number of bio requests for BIO_EOPNOTSUPP error detection
  nilfs2: remove double bio_put() in nilfs_end_bio_write() for BIO_EOPNOTSUPP error
  drivers/platform/olpc/olpc-ec.c: initialise earlier

11 years agolib/lz4: correct the LZ4 license
Richard Laager [Thu, 22 Aug 2013 23:35:47 +0000 (16:35 -0700)]
lib/lz4: correct the LZ4 license

The LZ4 code is listed as using the "BSD 2-Clause License".

Signed-off-by: Richard Laager <rlaager@wiktel.com>
Acked-by: Kyungsik Lee <kyungsik.lee@lge.com>
Cc: Chanho Min <chanho.min@lge.com>
Cc: Richard Yao <ryao@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ The 2-clause BSD can be just converted into GPL, but that's rude and
  pointless, so don't do it   - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years agomemcg: get rid of swapaccount leftovers
Michal Hocko [Thu, 22 Aug 2013 23:35:46 +0000 (16:35 -0700)]
memcg: get rid of swapaccount leftovers

The swapaccount kernel parameter without any values has been removed by
commit a2c8990aed5a ("memsw: remove noswapaccount kernel parameter") but
it seems that we didn't get rid of all the leftovers.

Make sure that the menuconfig help text and kernel-parameters.txt are clear
about the value for the parameter and remove the stale comment, which is not
very useful on its own.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Reported-by: Gergely Risko <gergely@risko.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years agonilfs2: fix issue with counting number of bio requests for BIO_EOPNOTSUPP error detection
Vyacheslav Dubeyko [Thu, 22 Aug 2013 23:35:45 +0000 (16:35 -0700)]
nilfs2: fix issue with counting number of bio requests for BIO_EOPNOTSUPP error detection

Fix the improper counting of in-flight bio requests in the
BIO_EOPNOTSUPP error detection case.

sb_nbio must be incremented exactly as many times as complete() was
called (or will be called), because nilfs_segbuf_wait() calls
wait_for_completion() sb_nbio times:

  do {
      wait_for_completion(&segbuf->sb_bio_event);
  } while (--segbuf->sb_nbio > 0);

Two functions complete() and wait_for_completion() must be called the
same number of times for the same sb_bio_event.  Otherwise,
wait_for_completion() will hang or leak.
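
The pairing rule can be illustrated with a small userspace model. This is
only a sketch: toy_completion, toy_complete(), toy_try_wait() and
toy_segbuf_wait() are hypothetical stand-ins, not kernel APIs, and a wait
that would block forever in the kernel is modelled here as a false return.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct completion: counts complete() calls not yet consumed. */
struct toy_completion {
	unsigned int done;
};

static void toy_complete(struct toy_completion *c)
{
	c->done++;
}

/* Returns false where the real wait_for_completion() would block forever. */
static bool toy_try_wait(struct toy_completion *c)
{
	if (c->done == 0)
		return false;
	c->done--;
	return true;
}

/* Same shape as the do/while loop quoted above; true if every wait was satisfied. */
static bool toy_segbuf_wait(struct toy_completion *c, int nbio)
{
	assert(nbio > 0);
	do {
		if (!toy_try_wait(c))
			return false;	/* fewer complete() calls than nbio: hang */
	} while (--nbio > 0);
	return true;
}
```

In this model, a count that is too high makes the waiter stall, while a
count that is too low leaves unconsumed completions behind in done.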

Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years agonilfs2: remove double bio_put() in nilfs_end_bio_write() for BIO_EOPNOTSUPP error
Vyacheslav Dubeyko [Thu, 22 Aug 2013 23:35:44 +0000 (16:35 -0700)]
nilfs2: remove double bio_put() in nilfs_end_bio_write() for BIO_EOPNOTSUPP error

Remove the double call of bio_put() in nilfs_end_bio_write() for the
BIO_EOPNOTSUPP error detection case.  The issue was found by Dan Carpenter,
who also suggested the first version of the fix.

Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years agodrivers/platform/olpc/olpc-ec.c: initialise earlier
Daniel Drake [Thu, 22 Aug 2013 23:35:43 +0000 (16:35 -0700)]
drivers/platform/olpc/olpc-ec.c: initialise earlier

Various drivers (e.g.  olpc-battery) assume that it is ok to communicate
with the OLPC Embedded Controller, a low-level component, during probe.
Therefore the OLPC EC driver must be initialised before other drivers
try to use it.  This was the case until it was recently moved
out of arch/x86 and restructured around commits ac2504151f5a ("Platform:
OLPC: turn EC driver into a platform_driver") and 85f90cf6ca56 ("x86:
OLPC: switch over to using new EC driver on x86").

Use arch_initcall so that olpc-ec is readied earlier, matching the
previous behaviour.

Fixes a regression introduced in Linux-3.6 where various drivers such as
olpc-battery and olpc-xo1-sci failed to load due to an inability to
communicate with the EC.  The user-visible effect was a lack of battery
monitoring, missing ebook/lid switch input devices, etc.

Signed-off-by: Daniel Drake <dsd@laptop.org>
Cc: Andres Salomon <dilinger@queued.net>
Cc: Paul Fox <pgf@laptop.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years agofs_enet: cleanup clock API use
Gerhard Sittig [Thu, 22 Aug 2013 19:55:13 +0000 (21:55 +0200)]
fs_enet: cleanup clock API use

Make the Freescale Ethernet driver get, prepare and enable the FEC clock
during probe(), and disable and unprepare the clock upon remove(); the
put is handled by the devm approach.  A reference to the clock is held
over the period of use.

Clock lookup is non-fatal, as not all platforms provide clock specs in
their device tree; failure to enable specified clocks is fatal.

Signed-off-by: Gerhard Sittig <gsi@denx.de>
Signed-off-by: Anatolij Gustschin <agust@denx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agofs_enet: silence a build warning (unused variable)
Gerhard Sittig [Thu, 22 Aug 2013 19:55:12 +0000 (21:55 +0200)]
fs_enet: silence a build warning (unused variable)

Since commit 720a43efd30f04a0a492c85fb997361c44fbae05
(drivers:net: Remove unnecessary OOM messages after netdev_alloc_skb)
there is a build warning:

drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c: In function 'tx_skb_align_workaround':
drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c:586:26: warning: unused variable 'fep'

Fix it.

Signed-off-by: Gerhard Sittig <gsi@denx.de>
Signed-off-by: Anatolij Gustschin <agust@denx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: sctp_probe: simplify code by using %pISc format specifier
Daniel Borkmann [Thu, 22 Aug 2013 15:12:50 +0000 (17:12 +0200)]
net: sctp_probe: simplify code by using %pISc format specifier

We can simply use the %pISc format specifier that was recently added
and thus remove some code that distinguishes between IPv4 and IPv6.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMerge branch 'for-davem' of git://gitorious.org/linux-can/linux-can-next into cpsw
David S. Miller [Fri, 23 Aug 2013 03:49:59 +0000 (20:49 -0700)]
Merge branch 'for-davem' of git://gitorious.org/linux-can/linux-can-next into cpsw

Marc Kleine-Budde says:

====================
Another pull request for net-next. It consists of two patches by Libo
Chen, which make the at91 and flexcan drivers use platform_set_drvdata()
rather than open coding it. Chen Gang improves the error checking in
the c_can_platform driver's probe function.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: ethernet: davinci_cpdma: export cpdma_chan_get_stats
Daniel Mack [Thu, 22 Aug 2013 11:47:00 +0000 (13:47 +0200)]
net: ethernet: davinci_cpdma: export cpdma_chan_get_stats

This is needed when the cpsw driver is built as module.

Signed-off-by: Daniel Mack <zonque@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoipv4: expose IPV4_DEVCONF
stephen hemminger [Thu, 22 Aug 2013 04:09:47 +0000 (21:09 -0700)]
ipv4: expose IPV4_DEVCONF

IP sends device configuration (see inet_fill_link_af) as an array
in the netlink information, but the indices in that array are not
exposed to userspace through any current sanitized header file.

It was available back in 2.6.32 (in /usr/include/linux/sysctl.h)
but was broken by:
  commit 02291680ffba92e5b5865bc0c5e7d1f3056b80ec
  Author: Eric W. Biederman <ebiederm@xmission.com>
  Date:   Sun Feb 14 03:25:51 2010 +0000

    net ipv4: Decouple ipv4 interface parameters from binary sysctl numbers

Eric was solving the sysctl problem but then the indices were re-exposed
by a later addition of devconf support for IPV4

  commit 9f0f7272ac9506f4c8c05cc597b7e376b0b9f3e4
  Author: Thomas Graf <tgraf@infradead.org>
  Date:   Tue Nov 16 04:32:48 2010 +0000

    ipv4: AF_INET link address family

Putting them in /usr/include/linux/ip.h seemed the logical match
for the DEVCONF_ definitions for IPV6 in /usr/include/linux/ipv6.h

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoipv6: handle Redirect ICMP Message with no Redirected Header option
Duan Jiong [Thu, 22 Aug 2013 04:07:35 +0000 (12:07 +0800)]
ipv6: handle Redirect ICMP Message with no Redirected Header option

RFC 4861 says the Redirected Header option is optional, so
the kernel should not drop a Redirect Message that has no
Redirected Header option. This patch introduces the function
ip6_redirect_no_header() to deal with that condition.

Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
11 years agobe2net: fix disabling TX in be_close()
Sathya Perla [Thu, 22 Aug 2013 06:53:41 +0000 (12:23 +0530)]
be2net: fix disabling TX in be_close()

commit fba875591 ("disable TX in be_close()") disabled TX in be_close()
to protect be_xmit() from touching freed up queues in the AER recovery
flow.  But, TX must be disabled *before* cleaning up TX completions in
the close() path, not after. This allows be_tx_compl_clean() to free up
all TX-req skbs that were notified to the HW.

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: tcp_probe: add IPv6 support
Daniel Borkmann [Wed, 21 Aug 2013 17:48:00 +0000 (19:48 +0200)]
net: tcp_probe: add IPv6 support

The tcp_probe currently only supports analysis of IPv4 connections.
Therefore, it would be nice to have IPv6 supported as well. Since we
have the recently added %pISpc specifier that is IPv4/IPv6 generic,
build related sockaddress structures from the flow information and
pass this to our format string. Tested with SSH and HTTP sessions
on IPv4 and IPv6.
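
As a rough userspace analogue of a family-generic "address plus port"
format, the sketch below formats either a sockaddr_in or a sockaddr_in6
through one entry point. toy_format_sockaddr() is a hypothetical helper
built on inet_ntop() and snprintf(); it is not the kernel's %pISpc
implementation, and the bracketed IPv6 output is an assumption about the
desired layout.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Format "addr:port" for IPv4 or "[addr]:port" for IPv6.
 * Returns the snprintf() result, or -1 for an unsupported family
 * or a conversion failure.
 */
static int toy_format_sockaddr(const struct sockaddr *sa, char *buf, size_t len)
{
	char ip[INET6_ADDRSTRLEN];

	if (sa->sa_family == AF_INET) {
		const struct sockaddr_in *in4 = (const struct sockaddr_in *)sa;

		if (!inet_ntop(AF_INET, &in4->sin_addr, ip, sizeof(ip)))
			return -1;
		return snprintf(buf, len, "%s:%u", ip,
				(unsigned int)ntohs(in4->sin_port));
	}
	if (sa->sa_family == AF_INET6) {
		const struct sockaddr_in6 *in6 = (const struct sockaddr_in6 *)sa;

		if (!inet_ntop(AF_INET6, &in6->sin6_addr, ip, sizeof(ip)))
			return -1;
		return snprintf(buf, len, "[%s]:%u", ip,
				(unsigned int)ntohs(in6->sin6_port));
	}
	return -1;
}
```

The caller only builds the right sockaddr variant; the formatting code
itself no longer needs separate IPv4 and IPv6 output paths.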

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: tcp_probe: kprobes: adapt jtcp_rcv_established signature
Daniel Borkmann [Wed, 21 Aug 2013 17:47:59 +0000 (19:47 +0200)]
net: tcp_probe: kprobes: adapt jtcp_rcv_established signature

This patch fixes a rather unproblematic function signature mismatch:
the const specifier was missing for the th variable. It also adds a
build-time assertion so that future function signature mismatches for
kprobes will not end badly, similar to what commit 22222997
("net: sctp: add build check for sctp_sf_eat_sack_6_2/jsctp_sf_eat_sack")
did for SCTP.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: tcp_probe: also include rcv_wnd next to snd_wnd
Daniel Borkmann [Wed, 21 Aug 2013 17:47:58 +0000 (19:47 +0200)]
net: tcp_probe: also include rcv_wnd next to snd_wnd

It is helpful to sometimes know the TCP window sizes of an established
socket e.g. to confirm that window scaling is working or to tweak the
window size to improve high-latency connections, etc etc. Currently the
TCP snooper only exports the send window size, but not the receive window
size. Therefore, also add the receive window size to the end of the
output line.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec...
David S. Miller [Thu, 22 Aug 2013 23:04:41 +0000 (16:04 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/klassert/ipsec-next

Steffen Klassert says:

====================
1) Some constifications, from Mathias Krause.

2) Catch bugs if a hold timer is still active when xfrm_policy_destroy()
   is called, from Fan Du.

3) Remove a redundant address family check, from Fan Du.

4) Make xfrm_state timer monotonic to be independent of system clock changes,
   from Fan Du.

5) Remove an outdated comment on returning -EREMOTE in the xfrm_lookup(),
   from Rami Rosen.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoethernet: broadcom: remove unnecessary platform_set_drvdata()
Jingoo Han [Thu, 22 Aug 2013 01:57:20 +0000 (10:57 +0900)]
ethernet: broadcom: remove unnecessary platform_set_drvdata()

The driver core clears the driver data to NULL after device_release
or on probe failure. Thus, there is no need to manually clear the
device driver data to NULL.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoethernet: moxa: remove unnecessary platform_set_drvdata()
Jingoo Han [Thu, 22 Aug 2013 01:55:21 +0000 (10:55 +0900)]
ethernet: moxa: remove unnecessary platform_set_drvdata()

The driver core clears the driver data to NULL after device_release
or on probe failure. Thus, there is no need to manually clear the
device driver data to NULL.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotcp: increase throughput when reordering is high
Yuchung Cheng [Thu, 22 Aug 2013 00:29:23 +0000 (17:29 -0700)]
tcp: increase throughput when reordering is high

The stack currently detects reordering and avoids spurious
retransmission very well. However, the throughput is sub-optimal under
high reordering because cwnd is increased only if the data is delivered
in order, i.e., the FLAG_DATA_ACKED check in tcp_ack().  The more packets
are reordered, the worse the throughput is.

Therefore when reordering is proven high, cwnd should advance whenever
the data is delivered regardless of its ordering. If reordering is low,
conservatively advance cwnd only on ordered deliveries in Open state,
and retain cwnd in Disordered state (RFC5681).
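
The rule in the paragraph above can be written as a small predicate. This
is a simplified sketch, not the code in tcp_ack(): the enum, the function
name and the boolean inputs are all hypothetical, and how "reordering is
proven high" gets decided is left to the caller because the text above
does not specify it.

```c
#include <stdbool.h>

/* Hypothetical subset of congestion-avoidance states, for illustration only. */
enum toy_ca_state {
	TOY_CA_OPEN,
	TOY_CA_DISORDER,
};

/*
 * Decide whether an incoming ACK may advance cwnd.
 * reordering_high:    caller's verdict that reordering is proven high
 * data_delivered:     the ACK reports newly delivered data (in order or not)
 * delivered_in_order: the newly delivered data was cumulatively acked
 */
static bool toy_may_raise_cwnd(bool reordering_high, bool data_delivered,
			       bool delivered_in_order, enum toy_ca_state state)
{
	if (!data_delivered)
		return false;
	if (reordering_high)
		return true;	/* ordering no longer matters */
	/* Low reordering: only ordered deliveries in Open state count. */
	return state == TOY_CA_OPEN && delivered_in_order;
}
```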

Using netperf on a qdisc setup of 20Mbps BW and random RTT from 45ms
to 55ms (for reordering effect), this change increases TCP throughput
by 20 - 25% to near bottleneck BW.

A special case is the stretched ACK with new SACK and/or ECE mark.
For example, a receiver may receive an out of order or ECN packet with
unacked data buffered because of LRO or delayed ACK. The principle on
such an ACK is to advance cwnd on the cumulative acked part first,
then reduce cwnd in tcp_fastretrans_alert().

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMerge branch 'sfc-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/bwh/sfc
David S. Miller [Thu, 22 Aug 2013 21:34:13 +0000 (14:34 -0700)]
Merge branch 'sfc-3.11' of git://git./linux/kernel/git/bwh/sfc

Merge in a fix for an RX MAC address filter programming bug in the sfc
driver.

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoSolutionEngine7724: fix typo in Ether platform data
Sergei Shtylyov [Wed, 21 Aug 2013 22:18:30 +0000 (02:18 +0400)]
SolutionEngine7724: fix typo in Ether platform data

Commit bd61224b1cbec096694e89c4187119c8576fe186 (SolutionEngine7724: fix Ether
support) has a typo in the 'phy_interface' field name of the platform data which
causes a build error -- fix it.

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoSH7619: fix typo in Ether platform data
Sergei Shtylyov [Wed, 21 Aug 2013 22:17:25 +0000 (02:17 +0400)]
SH7619: fix typo in Ether platform data

Commit 06a64f91da72cb5827e2bedef2ead60a123fd66e (SH7619: fix Ether support) has
a typo in the 'phy_interface' field name of the platform data which causes a
build error -- fix it.

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agor8169: fix invalid register dump
Peter Wu [Wed, 21 Aug 2013 21:17:11 +0000 (23:17 +0200)]
r8169: fix invalid register dump

For some reason, my PCIe RTL8111E onboard NIC on a GA-Z68X-UD3H-B3
motherboard reads as FFs when reading from MMIO with a block size
larger than 7. Therefore change to reading blocks of four bytes.

Ben Hutchings noted that the buffer is large enough to hold all
registers, so now all registers are read.
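
A minimal sketch of dumping a register window with one 4-byte access per
step is shown below. It is a userspace model under stated assumptions:
toy_dump_regs() is a hypothetical name, the "registers" are an ordinary
array, and a real driver would go through the kernel's MMIO accessors on
an __iomem pointer instead of a plain volatile read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Copy len_bytes of register space into dst using only 32-bit reads.
 * len_bytes must be a multiple of four.
 */
static void toy_dump_regs(const volatile uint32_t *regs, uint32_t *dst,
			  size_t len_bytes)
{
	size_t i;

	assert(len_bytes % sizeof(uint32_t) == 0);
	for (i = 0; i < len_bytes / sizeof(uint32_t); i++)
		dst[i] = regs[i];	/* exactly one 4-byte access per word */
}
```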

Signed-off-by: Peter Wu <lekensteyn@gmail.com>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoRevert "genetlink: fix family dump race"
Johannes Berg [Wed, 21 Aug 2013 14:08:02 +0000 (16:08 +0200)]
Revert "genetlink: fix family dump race"

This reverts commit 58ad436fcf49810aa006016107f494c9ac9013db.

It turns out that the change introduced a potential deadlock
by causing a locking dependency with netlink's cb_mutex. I
can't seem to find a way to resolve this without doing major
changes to the locking, so revert this.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMerge branch 'linux-next' of git://cavan.codon.org.uk/platform-drivers-x86
Linus Torvalds [Thu, 22 Aug 2013 20:04:11 +0000 (13:04 -0700)]
Merge branch 'linux-next' of git://cavan.codon.org.uk/platform-drivers-x86

Pull x86 platform driver fixes from Matthew Garrett:
 "Three trivial fixes - the first reverts a patch that's broken some
  other devices (again - I'm trying to figure out a clean way to
  implement this), the other two fix minor issues in the sony-laptop
  driver"

* 'linux-next' of git://cavan.codon.org.uk/platform-drivers-x86:
  Revert "hp-wmi: Enable hotkeys on some systems"
  sony-laptop: Fix reporting of gfx_switch_status
  sony-laptop: return a negative error code in sonypi_compat_init()

11 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net...
David S. Miller [Thu, 22 Aug 2013 20:00:46 +0000 (13:00 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/jkirsher/net-next

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates

This series contains updates to igb and e1000e.

Akeem provides 3 igb patches, the first resets the link when EEE is enabled
or disabled if the link is up.  His second patch changes a register read
which normally stores the read value to "just-read" so that hardware
can accurately latch the register read.  Lastly, he adds rcu_lock to avoid
a possible race condition with igb_update_stats function.

Mitch provides a fix for SR-IOV, where MSI-X interrupts are required, so
make sure that MSI-X is enabled before allowing the user to turn on SR-IOV.

Alex's igb patch makes it so that we limit the lower bound for max_frame_size
to the size of a standard Ethernet frame.  This allows for feature parity
with other Intel based drivers such as ixgbe.

Carolyn adds a SKU for a flashless i210 device and a fix for get_fw_version()
so that it works for all parts for igb.  In addition, she has 2 igb patches
to refactor NVM code to accommodate devices with no flash.  Lastly, she
adds code to check for the failure of pci_disable_link_state() to attempt
to work around a problem found with some systems.

Laura provides the remaining 2 igb patches.  The first removes the hard-coded
value for the size of the RETA indirection table and creates a macro for it
instead.  The second adds the ethtool callbacks necessary to change the RETA
indirection table from userspace.

Bruce fixes a whitespace issue in a recent commit and resolves a jiffies
comparison warning by using time_after().

Li provides a fix for e1000e to avoid a kernel crash on shutdown by adding
one more check in e1000e_shutdown().  This is due to e1000e_shutdown()
trying to clear correctable errors on the upstream P2P bridge when, in
some cases, we do not have the upstream P2P bridge.

v2:
 - fixed patch 11 conditional statement from < to <= based on feedback
   from Ben Hutchings
 - fixed patch 12 patch description (adding the commit summary) based
   on feedback from Sergei Shtylyov
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agosfc: Eliminate struct efx_mtd
Ben Hutchings [Wed, 28 Nov 2012 04:38:10 +0000 (04:38 +0000)]
sfc: Eliminate struct efx_mtd

Currently we use struct efx_mtd to represent a physical NVRAM device
and struct efx_mtd_partition to represent a partition on that device.
But this only really makes sense for Falcon, as we don't know or care
whether MC-managed NVRAM partitions are on one or more physical
devices.  It complicates iteration and provides little benefit.
Therefore:

- Replace the pointer to efx_mtd in mtd_info::priv with a pointer to efx_nic
- Move the falcon_spi_device pointer into the union in struct efx_mtd_partition
- Move the device name to efx_mtd_partition::dev_type_name
- Move the efx_mtd_ops pointer to efx_nic::mtd_ops
- Make efx_nic::mtd_list a list of partitions

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
11 years agosfc: Rename SPI stuff to show that it is Falcon-specific
Ben Hutchings [Wed, 28 Nov 2012 04:12:41 +0000 (04:12 +0000)]
sfc: Rename SPI stuff to show that it is Falcon-specific

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
11 years agosfc: Cleanup Falcon-arch simple MAC filter state
Ben Hutchings [Mon, 19 Nov 2012 23:08:22 +0000 (23:08 +0000)]
sfc: Cleanup Falcon-arch simple MAC filter state

On Falcon we implement MAC filtering requested by the stack using the
MAC wrapper's single unicast filter and multicast hash filter.  Siena
is very similar, though MAC configuration is mediated by the MC.

Since MCDI operations may sleep, reconfiguration is deferred from
ndo_set_rx_mode to a work item.  However, it still updates the private
variables describing the filter state synchronously.  Contrary to
comments, the later use of these variables is not protected using the
address lock, resulting in race conditions.

Move the state update to a new function
efx_farch_filter_sync_rx_mode() and make the Falcon-arch MAC
configuration functions call that, so that its use is consistently
serialised by the mac_lock.

Invert and rename the promiscuous flag to the more accurate
unicast_filter, and comment that both this and multicast_hash are
not used on EF10.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
11 years agosfc: Define and use MCDI_POPULATE_DWORD_{1,2,3,4,5,6,7}
Ben Hutchings [Wed, 10 Oct 2012 22:24:51 +0000 (23:24 +0100)]
sfc: Define and use MCDI_POPULATE_DWORD_{1,2,3,4,5,6,7}

There is only one user now, but we're about to add many more.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
11 years agosfc: Add flag for stack-owned RX MAC filters
Ben Hutchings [Mon, 19 Nov 2012 23:08:20 +0000 (23:08 +0000)]
sfc: Add flag for stack-owned RX MAC filters

MAC filters inserted on request from the stack (ndo_set_rx_mode)
should allow manual steering but not removal.  Currently we have a
special case for Siena's all-multicast and all-unicast MAC filters,
but on EF10 we need to allow for steering of precise MAC filters as
well.

The EFX_FILTER_FLAG_RX_STACK flag changes the behaviour of replacement
and removal requests:

- Replacement *of* a filter with this flag never clears the flag but
  does change steering and saved priority
- Replacement *by* a filter with this flag only sets the flag but does
  not change steering
- Removal with priority < EFX_FILTER_PRI_REQUIRED really resets RX
  steering and saved priority

This could support precise MAC filtering on Siena in future.

As a side-benefit, the default MAC filters are hidden from ethtool
until they are steered.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
11 years agosfc: Refactor Falcon-arch filter removal
Ben Hutchings [Mon, 19 Nov 2012 23:08:19 +0000 (23:08 +0000)]
sfc: Refactor Falcon-arch filter removal

Move the special case for removal of default filters from
efx_farch_filter_table_clear_entry() into a wrapper function,
efx_farch_filter_table_remove().  Move the existence and priority
checks into the latter and use it where appropriate.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
11 years agosfc: Make most filter operations NIC-type-specific
Ben Hutchings [Thu, 8 Nov 2012 01:46:53 +0000 (01:46 +0000)]
sfc: Make most filter operations NIC-type-specific

Aside from accelerated RFS, there is almost nothing that can be shared
between the filter table implementations for the Falcon architecture
and EF10.

Move the few shared functions into efx.c and rx.c and the rest into
farch.c.  Introduce efx_nic_type operations for the implementation and
inline wrapper functions that call these.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
11 years agosfc: Refactor Falcon-arch search limit reset
Ben Hutchings [Fri, 26 Oct 2012 23:33:48 +0000 (00:33 +0100)]
sfc: Refactor Falcon-arch search limit reset

Currently every call to efx_farch_filter_table_clear_entry() is
shortly followed by a conditional reset of the table limits.  The new
limits (0) are not pushed to hardware until the next filter insertion.
Move both the reset and the hardware reconfiguration into
efx_farch_filter_table_clear_entry(), and add an explanatory comment.

Also, make consistent use of the term 'search limit' for the maximum
number of probes the NIC must make when searching for a filter of a
particular type.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
11 years agosfc: Split Falcon-arch-specific and common filter state
Ben Hutchings [Fri, 26 Oct 2012 23:33:30 +0000 (00:33 +0100)]
sfc: Split Falcon-arch-specific and common filter state

Move the common state from struct efx_filter_state into struct efx_nic.
Rename struct efx_filter_state to efx_farch_filter_state and change
the type of efx_nic::filter_state to void *.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
11 years agosfc: Extend and abstract efx_filter_spec to cover Huntington/EF10
Ben Hutchings [Fri, 26 Oct 2012 23:33:28 +0000 (00:33 +0100)]
sfc: Extend and abstract efx_filter_spec to cover Huntington/EF10

Replace type field with match_flags.  Add rss_context and match values
covering most of what is now in the MCDI protocol.

Change some fields into bitfields so that the structure size doesn't grow
beyond 64 bytes.

Ditch the filter decoding functions as it is now easier to pick apart
the abstract structure.

Rewrite ethtool NFC rule functions to set/get filter match flags and
values directly.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
11 years agosfc: Name the RX drop queue ID
Ben Hutchings [Tue, 30 Oct 2012 01:01:52 +0000 (01:01 +0000)]
sfc: Name the RX drop queue ID

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
11 years agosfc: Rename Falcon-arch filter implementation types and functions
Ben Hutchings [Fri, 26 Oct 2012 22:55:46 +0000 (23:55 +0100)]
sfc: Rename Falcon-arch filter implementation types and functions

The filter table(s) on EF10 are managed by firmware and will need
almost entirely separate code.  Rename the types and functions used
within the existing implementation.  The current definition of struct
efx_filter_spec is really implementation-specific, so we need to keep
it.  For now, define a separate structure for the internal
representation but leave them identical.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
11 years agosfc: Remove unused filter_flags variables and efx_farch_filter_id_flags()
Ben Hutchings [Mon, 19 Nov 2012 19:05:25 +0000 (19:05 +0000)]
sfc: Remove unused filter_flags variables and efx_farch_filter_id_flags()

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
11 years agoMerge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm...
Linus Torvalds [Thu, 22 Aug 2013 17:44:44 +0000 (10:44 -0700)]
Merge tag 'fixes-for-linus' of git://git./linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
 "A handful of fixes for 3.11 are still trickling in.  These are:
   - A couple of fixes for older OMAP platforms
   - Another few fixes for at91 (lateish due to European summer
     vacations)
   - A late-found problem with USB on Tegra, fix is to keep VBUS
     regulator on at all times
   - One fix for Exynos 5440 dealing with CPU detection
   - One MAINTAINERS update"

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: tegra: always enable USB VBUS regulators
  ARM: davinci: nand: specify ecc strength
  ARM: OMAP: rx51: change musb mode to OTG
  ARM: OMAP2: fix musb usage for n8x0
  MAINTAINERS: Update email address for Benoit Cousson
  ARM: at91/DT: fix at91sam9n12ek memory node
  ARM: at91: add missing uart clocks DT entries
  ARM: SAMSUNG: fix to support for missing cpu specific map_io
  ARM: at91/DT: at91sam9x5ek: fix USB host property to enable port C

11 years agoMerge tag 'devicetree-fixes-for-3.11' of git://sources.calxeda.com/kernel/linux
Linus Torvalds [Thu, 22 Aug 2013 17:43:47 +0000 (10:43 -0700)]
Merge tag 'devicetree-fixes-for-3.11' of git://sources.calxeda.com/kernel/linux

Pull device tree fix from Rob Herring:
 "For DT unflattening, add missing memory initialization.

  This is needed for arches like PPC that use memblock_alloc.  This
  appears to have been an issue for some time, but is a somewhat limited
  usecase of OF_DYNAMIC"

* tag 'devicetree-fixes-for-3.11' of git://sources.calxeda.com/kernel/linux:
  of: fdt: fix memory initialization for expanded DT

11 years agoMerge tag 'dm-3.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device...
Linus Torvalds [Thu, 22 Aug 2013 17:43:00 +0000 (10:43 -0700)]
Merge tag 'dm-3.11-fixes' of git://git./linux/kernel/git/device-mapper/linux-dm

Pull device mapper fix from Mike Snitzer:
 "A patch to fix dm-cache-policy-mq's remove_mapping() conflict with
  sparc32"

* tag 'dm-3.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm cache: avoid conflicting remove_mapping() in mq policy

11 years agox86 get_unmapped_area: Access mmap_legacy_base through mm_struct member
Radu Caragea [Wed, 21 Aug 2013 17:55:59 +0000 (20:55 +0300)]
x86 get_unmapped_area: Access mmap_legacy_base through mm_struct member

This is the updated version of df54d6fa5427 ("x86 get_unmapped_area():
use proper mmap base for bottom-up direction") that only randomizes the
mmap base address once.

Signed-off-by: Radu Caragea <sinaelgl@gmail.com>
Reported-and-tested-by: Jeff Shorey <shoreyjeff@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michel Lespinasse <walken@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Adrian Sendroiu <molecula2788@gmail.com>
Cc: Greg KH <greg@kroah.com>
Cc: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years agoRevert "x86 get_unmapped_area(): use proper mmap base for bottom-up direction"
Linus Torvalds [Thu, 22 Aug 2013 16:13:06 +0000 (09:13 -0700)]
Revert "x86 get_unmapped_area(): use proper mmap base for bottom-up direction"

This reverts commit df54d6fa54275ce59660453e29d1228c2b45a826.

The commit isn't necessarily wrong, but because it recalculates the
random mmap_base every time, it seems to confuse user memory allocators
that expect contiguous mmap allocations even when the mmap address isn't
specified.

In particular, the MATLAB Java runtime seems to be unhappy. See

  https://bugzilla.kernel.org/show_bug.cgi?id=60774

So we'll want to apply the random offset only once, and Radu has a patch
for that.  Revert this older commit in order to apply the other one.

Reported-by: Jeff Shorey <shoreyjeff@gmail.com>
Cc: Radu Caragea <sinaelgl@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years agoe1000e: resolve checkpatch JIFFIES_COMPARISON warning
Bruce Allan [Thu, 15 Aug 2013 03:43:24 +0000 (03:43 +0000)]
e1000e: resolve checkpatch JIFFIES_COMPARISON warning

WARNING:JIFFIES_COMPARISON: Comparing jiffies is almost always wrong;
prefer time_after, time_before and friends
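
The reason a plain comparison is fragile is counter wraparound. The sketch
below models the idea in userspace: toy_time_after() follows the
well-known subtract-then-cast-to-signed pattern that the kernel's
time_after() macro is built on, while naive_after() is the direct
comparison the warning is about. Both names are hypothetical, and the cast
assumes the usual two's-complement behaviour.

```c
#include <limits.h>
#include <stdbool.h>

/* True if tick count a is after b, even if the counter wrapped in between. */
static bool toy_time_after(unsigned long a, unsigned long b)
{
	/* The unsigned difference is reinterpreted as signed distance. */
	return (long)(b - a) < 0;
}

/* Direct comparison: gives the wrong answer once the counter wraps. */
static bool naive_after(unsigned long a, unsigned long b)
{
	return a > b;
}
```

With b = ULONG_MAX - 5 and a = b + 10 (which wraps around to 4),
toy_time_after(a, b) is true while naive_after(a, b) is false.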

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoe1000e: Avoid kernel crash during shutdown
Li Zhang [Tue, 13 Aug 2013 18:42:58 +0000 (18:42 +0000)]
e1000e: Avoid kernel crash during shutdown

While shutting down the PCI device, the corresponding callback
function e1000e_shutdown() tries to clear the correctable
errors on the upstream P2P bridge. Unfortunately, we don't have
the upstream P2P bridge in some cases (e.g. PCI passthrough for
KVM on Power). That eventually leads to a kernel crash.

The patch adds one more check on that to avoid kernel crash.

Signed-off-by: Li Zhang <zhlcindy@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoe1000e: Add code to check for failure of pci_disable_link_state call
Carolyn Wyborny [Sat, 3 Aug 2013 01:53:54 +0000 (01:53 +0000)]
e1000e: Add code to check for failure of pci_disable_link_state call

This patch attempts to work around a problem found on some systems where
the call to pci_disable_link_state_locked() fails.  As a result, ASPM is not,
in fact, disabled.  Change the ASPM-disable code to check whether the state
is actually disabled after the call and, if not, try another way to disable it.
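
A rough sketch of this verify-then-fall-back pattern, under stated assumptions: the device is modelled in userspace by struct fake_dev, helper_disable() stands in for the call that may silently fail, and direct_disable() stands in for the alternative method. All of these names are hypothetical; only the L0s/L1 bit positions in the low two bits of the Link Control register follow the PCI Express layout.

```c
#include <stdbool.h>
#include <stdint.h>

#define ASPM_L0S 0x1u   /* ASPM control bit for L0s */
#define ASPM_L1  0x2u   /* ASPM control bit for L1 */

/* Hypothetical device model. */
struct fake_dev {
    uint16_t link_ctrl;   /* low two bits model the ASPM control field */
    bool helper_works;    /* false models a system where the helper has no effect */
};

/* Stand-in for the call that may fail without reporting it. */
static void helper_disable(struct fake_dev *dev, uint16_t states)
{
    if (dev->helper_works)
        dev->link_ctrl &= (uint16_t)~states;
}

/* Stand-in for the alternative: clear the bits directly. */
static void direct_disable(struct fake_dev *dev, uint16_t states)
{
    dev->link_ctrl &= (uint16_t)~states;
}

/* Disable the given ASPM states, verifying the result afterwards.
 * Returns true if the fallback path had to be used. */
static bool disable_aspm(struct fake_dev *dev, uint16_t states)
{
    helper_disable(dev, states);

    if (!(dev->link_ctrl & states))
        return false;   /* the first attempt really took effect */

    direct_disable(dev, states);
    return true;
}
```

Reading the state back rather than trusting the call is what makes the workaround robust: the fallback runs only when the first attempt demonstrably left ASPM enabled.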

Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Acked-by: Bruce W. Allan <bruce.w.allan@intel.com>
Tested-by: Pavel Machek <pavel@ucw.cz>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years ago e1000e: cleanup whitespace in recent commit
Bruce Allan [Fri, 2 Aug 2013 03:33:32 +0000 (03:33 +0000)]
e1000e: cleanup whitespace in recent commit

Commit (c96ddb0b e1000e: Use marco instead of digit for defining
e1000_rx_desc_packet_split) moved a define from one file to another but
missed using proper indentation/whitespace.

CC: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years ago igb: Expose RSS indirection table for ethtool
Laura Mihaela Vasilescu [Wed, 31 Jul 2013 20:19:54 +0000 (20:19 +0000)]
igb: Expose RSS indirection table for ethtool

This patch adds the ethtool callbacks necessary to change the RETA
indirection table from userspace.

In order to achieve this, we add the indirection table field (rss_indir_tbl)
in the board specific data structure (struct igb_adapter) to preserve the
values across hardware resets.

The indirection table must be initialized with default values in the
following cases:
* at module init time
* when the number of RX queues changes.
For this reason we add a new field (rss_indir_tbl_init) in igb_adapter
that keeps track of the number of RX queues. Whenever the number of RX
queues changes, the rss_indir_tbl is modified and initialized with default
values. The rss_indir_tbl_init is updated accordingly.
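
The bookkeeping described above can be sketched in userspace as follows. The field names rss_indir_tbl and rss_indir_tbl_init come from the description; struct fake_adapter, refresh_indir_tbl(), the table size of 128 and the even-spread default are assumptions made for illustration and may differ from the driver.

```c
#include <stdint.h>

#define RETA_SIZE 128   /* assumed table size for this sketch */

/* Hypothetical stand-in for the relevant part of struct igb_adapter. */
struct fake_adapter {
    uint8_t  rss_indir_tbl[RETA_SIZE];  /* survives hardware resets */
    uint32_t rss_indir_tbl_init;        /* queue count the table was built for */
    uint32_t num_rx_queues;
};

/* Rebuild the table with default values only if the number of RX queues
 * differs from the one it was last initialized for. Returns 1 if the
 * table was rebuilt and 0 if the existing contents were kept. */
static int refresh_indir_tbl(struct fake_adapter *a)
{
    uint32_t i;

    if (a->rss_indir_tbl_init == a->num_rx_queues)
        return 0;   /* keep user-configured entries across resets */

    /* Assumed default: spread entries evenly over the queues. */
    for (i = 0; i < RETA_SIZE; i++)
        a->rss_indir_tbl[i] = (uint8_t)((i * a->num_rx_queues) / RETA_SIZE);

    a->rss_indir_tbl_init = a->num_rx_queues;
    return 1;
}
```

Because a zero-initialized adapter has rss_indir_tbl_init == 0 and at least one RX queue, the first call covers the module-init case, and later calls rebuild the table only after the queue count changes.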

CC: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Laura Mihaela Vasilescu <laura.vasilescu@rosedu.org>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years ago igb: Add macro for size of RETA indirection table
Laura Mihaela Vasilescu [Wed, 31 Jul 2013 20:19:48 +0000 (20:19 +0000)]
igb: Add macro for size of RETA indirection table

The RETA indirection table is used to assign received data to a CPU
in order to maintain an efficient distribution of network receive
processing across multiple CPUs.

This patch removes the hard-coded value for the size of the indirection
table and defines a new macro.
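
As a small illustration of why a named size helps, the sketch below uses a single macro for both filling the table and looking up a queue from a packet hash. RETA_SIZE, its value of 128, and the functions reta_fill_round_robin() and reta_lookup() are hypothetical names chosen for this example, not the identifiers added by the patch.

```c
#include <stdint.h>

#define RETA_SIZE 128   /* assumed size; one definition instead of scattered literals */

/* Fill the table so that entries cycle through the available queues. */
static void reta_fill_round_robin(uint8_t tbl[RETA_SIZE], unsigned int nqueues)
{
    unsigned int i;

    for (i = 0; i < RETA_SIZE; i++)
        tbl[i] = (uint8_t)(i % nqueues);
}

/* Map a receive hash to a queue: the hash selects a table slot, and the
 * slot holds the queue (and hence the CPU) that handles the packet. */
static uint8_t reta_lookup(const uint8_t tbl[RETA_SIZE], uint32_t hash)
{
    return tbl[hash % RETA_SIZE];
}
```

Both the loop bound and the index reduction depend on the table size, so expressing it once keeps the two in step if the size ever changes.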

Signed-off-by: Laura Mihaela Vasilescu <laura.vasilescu@rosedu.org>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>