GitHub/moto-9609/android_kernel_motorola_exynos9610.git
11 years agobonding: initial RCU conversion
nikolay@redhat.com [Thu, 1 Aug 2013 14:54:51 +0000 (16:54 +0200)]
bonding: initial RCU conversion

This patch does the initial bonding conversion to RCU. After it the
following modes are protected by RCU alone: roundrobin, active-backup,
broadcast and xor. Modes ALB/TLB and 3ad still acquire bond->lock for
reading, and will be dealt with later. curr_active_slave needs to be
dereferenced via rcu in the converted modes because the only thing
protecting the slave after this patch is rcu_read_lock, so we need the
proper barrier for weakly ordered archs and to make sure we don't have
stale pointer. It's not tagged with __rcu yet because there's still work
to be done to remove the curr_slave_lock, so sparse will complain when
rcu_assign_pointer and rcu_dereference are used, but the alternative to use
rcu_dereference_protected would've created much bigger code churn which is
more difficult to test and review. That will be converted in time.

1. Active-backup mode
 1.1 Perf recording while doing iperf -P 4
  - old bonding: iperf spent 0.55% in bonding, system spent 0.29% CPU
                 in bonding
  - new bonding: iperf spent 0.29% in bonding, system spent 0.15% CPU
                 in bonding
 1.2. Bandwidth measurements
  - old bonding: 16.1 gbps consistently
  - new bonding: 17.5 gbps consistently

2. Round-robin mode
 2.1 Perf recording while doing iperf -P 4
  - old bonding: iperf spent 0.51% in bonding, system spent 0.24% CPU
                 in bonding
  - new bonding: iperf spent 0.16% in bonding, system spent 0.11% CPU
                 in bonding
 2.2 Bandwidth measurements
  - old bonding: 8 gbps (variable due to packet reorderings)
  - new bonding: 10 gbps (variable due to packet reorderings)

Of course the latency has improved in all converted modes, and moreover
while
doing enslave/release (since it doesn't affect tx anymore).

Also I've stress tested all modes doing enslave/release in a loop while
transmitting traffic.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobonding: factor out slave id tx code and simplify xmit paths
Nikolay Aleksandrov [Thu, 1 Aug 2013 14:54:50 +0000 (16:54 +0200)]
bonding: factor out slave id tx code and simplify xmit paths

I factored out the tx xmit code which relies on slave id in
bond_xmit_slave_id. It is global because later it can be used also in
3ad mode xmit. Unnecessary obvious comments are removed. Active-backup
mode is simplified because bond_dev_queue_xmit always consumes the skb.
bond_xmit_xor becomes one line because of bond_xmit_slave_id.
bond_for_each_slave_from is not used in bond_xmit_slave_id because later
when RCU is used we can avoid important race condition by using standard
rculist routines.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobonding: simplify broadcast_xmit function
Nikolay Aleksandrov [Thu, 1 Aug 2013 14:54:49 +0000 (16:54 +0200)]
bonding: simplify broadcast_xmit function

We don't need to start from the curr_active_slave as the frame will be
sent to all eligible slaves anyway, so we remove the unnecessary local
variables, checks and comments, and make it use the standard list API.
This has the nice side-effect that later when it's converted to RCU
a race condition will be avoided which could lead to double packet tx.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobonding: remove unnecessary read_locks of curr_slave_lock
nikolay@redhat.com [Thu, 1 Aug 2013 14:54:48 +0000 (16:54 +0200)]
bonding: remove unnecessary read_locks of curr_slave_lock

In all the cases we already hold bond->lock for reading, so the slave
can't get away and the check != NULL is sufficient. curr_active_slave
can still change after the read_lock is unlocked prior to use of the
dereferenced value, so there's no need for it. It either contains a
valid slave which we use (and can't get away), or it is NULL which is
checked.
In some places the read_lock of curr_slave_lock was left because we need
it not to change while performing some action (e.g. syncing current
active slave's addresses, sending ARP requests through the active slave)
such cases will be dealt with individually while converting to RCU.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobonding: convert to list API and replace bond's custom list
nikolay@redhat.com [Thu, 1 Aug 2013 14:54:47 +0000 (16:54 +0200)]
bonding: convert to list API and replace bond's custom list

This patch aims to remove struct bonding's first_slave and struct
slave's next and prev pointers, and replace them with the standard Linux
list API. The old macros are converted to list API as well and some new
primitives are available now. The checks if there're slaves that used
slave_cnt have been replaced by the list_empty macro.
Also a few small style fixes, changing longest -> shortest line in local
variable declarations, leaving an empty line before return and removing
unnecessary brackets.
This is the first step to gradual RCU conversion.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoipv6: bump genid when delete/add address
fan.du [Thu, 1 Aug 2013 09:44:44 +0000 (17:44 +0800)]
ipv6: bump genid when delete/add address

Server           Client
2001:1::803/64  <-> 2001:1::805/64
2001:2::804/64  <-> 2001:2::806/64

Server side fib binary tree looks like this:

                                   (2001:/64)
                                   /
                                  /
                   ffff88002103c380
                 /                 \
     (2)        /                   \
 (2001::803/128)                     ffff880037ac07c0
                                    /               \
                                   /                 \  (3)
                      ffff880037ac0640               (2001::806/128)
                       /             \
             (1)      /               \
        (2001::804/128)               (2001::805/128)

Delete 2001::804/64 won't cause prefix route deleted as well as rt in (3)
destinate to 2001::806 with source address as 2001::804/64. That's because
2001::803/64 is still alive, which make onlink=1 in ipv6_del_addr, this is
where the substantial difference between same prefix configuration and
different prefix configuration :) So packet are still transmitted out to
2001::806 with source address as 2001::804/64.

So bump genid will clear rt in (3), and up layer protocol will eventually
find the right one for themselves.

This problem arised from the discussion in here:
http://marc.info/?l=linux-netdev&m=137404469219410&w=4

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMerge branch 'for-davem' of git://gitorious.org/linux-can/linux-can-next
David S. Miller [Thu, 1 Aug 2013 23:14:29 +0000 (16:14 -0700)]
Merge branch 'for-davem' of git://gitorious.org/linux-can/linux-can-next

Marc Kleine-Budde says:

====================
this is a pull-request for net-next/master. It consists of two patches
by Fabio Estevam. Them first convert the flexcan driver to use
devm_ioremap_resource(), the second adds return value checking for
clk_prepare_enable().
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobnx2x: Revising locking scheme for MAC configuration
Yuval Mintz [Thu, 1 Aug 2013 14:30:59 +0000 (17:30 +0300)]
bnx2x: Revising locking scheme for MAC configuration

On very rare occasions, repeated load/unload stress test in the presence of
our storage driver (bnx2i/bnx2fc) causes a kernel panic in bnx2x code
(NULL pointer dereference). Stack traces indicate the issue happens during MAC
configuration; thorough code review showed that indeed several races exist
in which one thread can iterate over the list of configured MACs while another
deletes entries from the same list.

This patch adds a varient on the single-writer/Multiple-reader lock mechanism -
It utilizes an already exsiting bottom-half lock, using it so that Whenever
a writer is unable to continue due to the existence of another writer/reader,
it pends its request for future deliverance.
The writer / last readers will check for the existence of such requests and
perform them instead of the original initiator.
This prevents the writer from having to sleep while waiting for the lock
to be accessible, which might cause deadlocks given the locks already
held by the writer.

Another result of this patch is that setting of Rx Mode is now made in
sleepable context - Setting of Rx Mode is made under a bottom-half lock, which
was always nontrivial for the bnx2x driver, as the HW/FW configuration requires
wait for completions.
Since sleep was impossible (due to the sleepless-context), various mechanisms
were utilized to prevent the calling thread from sleep, but the truth was that
when the caller thread (i.e, the one calling ndo_set_rx_mode()) returned, the
Rx mode was still not set in HW/FW.

bnx2x_set_rx_mode() will now overtly schedule for the Rx changes to be
configured by the sp_rtnl_task which hold the RTNL lock and is sleepable
context.

Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobonding: fix system hang due to fast igmp timer rescheduling
Nikolay Aleksandrov [Thu, 1 Aug 2013 09:51:42 +0000 (11:51 +0200)]
bonding: fix system hang due to fast igmp timer rescheduling

After commit 4aa5dee4d9 ("net: convert resend IGMP to notifier event")
we try to acquire rtnl in bond_resend_igmp_join_requests but it can be
scheduled with rtnl already held (e.g. when bond_change_active_slave is
called with rtnl) causing a loop of immediate reschedules + calls because
rtnl_trylock fails each time since it's being already held.
For me this issue leads to system hangs very easy:
modprobe bonding; ifconfig bond0 up; ifenslave bond0 eth0; rmmod
bonding;

The fix is to introduce a small (1 jiffy) delay which is enough for the
sections holding rtnl to finish without putting any strain on the system.
Also adjust the timer in bond_change_active_slave to be 1 jiffy, since
most of the time it's called with rtnl already held.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotile: support PTP using the tilegx mPIPE (IEEE 1588)
Chris Metcalf [Thu, 1 Aug 2013 15:36:42 +0000 (11:36 -0400)]
tile: support PTP using the tilegx mPIPE (IEEE 1588)

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotile: remove deprecated NETIF_F_LLTX flag from tile drivers
Chris Metcalf [Thu, 1 Aug 2013 15:36:42 +0000 (11:36 -0400)]
tile: remove deprecated NETIF_F_LLTX flag from tile drivers

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotile: make "tile_net.custom" a proper bool module parameter
Chris Metcalf [Thu, 1 Aug 2013 15:36:42 +0000 (11:36 -0400)]
tile: make "tile_net.custom" a proper bool module parameter

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotile: support TSO for IPv6 in tilegx network driver
Chris Metcalf [Thu, 1 Aug 2013 15:36:42 +0000 (11:36 -0400)]
tile: support TSO for IPv6 in tilegx network driver

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotile: support multiple mPIPE shims in tilegx network driver
Chris Metcalf [Thu, 1 Aug 2013 15:36:42 +0000 (11:36 -0400)]
tile: support multiple mPIPE shims in tilegx network driver

The initial driver support was for a single mPIPE shim on the chip
(as is the case for the Gx36 hardware).  The Gx72 chip has two mPIPE
shims, so we extend the driver to handle that case.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotile: enable GRO in the tilegx network driver
Chris Metcalf [Thu, 1 Aug 2013 15:36:42 +0000 (11:36 -0400)]
tile: enable GRO in the tilegx network driver

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotile: fix panic bug in napi support for tilegx network driver
Chris Metcalf [Thu, 1 Aug 2013 15:36:42 +0000 (11:36 -0400)]
tile: fix panic bug in napi support for tilegx network driver

The code used to call napi_disable() in an interrupt handler
(from smp_call_function), which in turn could call msleep().
Unfortunately you can't sleep in an interrupt context.

Luckily it turns out all the NAPI support functions are
just operating on data structures and not on any deeply
per-cpu data, so we can arrange to set up and tear down all
the NAPI state on the core driving the process, and just
do the IRQ enable/disable as a smp_call_function thing.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotile: update dev->stats directly in tilegx network driver
Chris Metcalf [Thu, 1 Aug 2013 15:36:42 +0000 (11:36 -0400)]
tile: update dev->stats directly in tilegx network driver

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotile: support jumbo frames in the tilegx network driver
Chris Metcalf [Thu, 1 Aug 2013 15:36:42 +0000 (11:36 -0400)]
tile: support jumbo frames in the tilegx network driver

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotile: remove dead is_dup_ack() function from tilepro net driver
Chris Metcalf [Thu, 1 Aug 2013 15:36:42 +0000 (11:36 -0400)]
tile: remove dead is_dup_ack() function from tilepro net driver

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotile: avoid bug in tilepro net driver built with old hypervisor
Chris Metcalf [Thu, 1 Aug 2013 15:36:42 +0000 (11:36 -0400)]
tile: avoid bug in tilepro net driver built with old hypervisor

Building against headers from an older Tilera hypervisor can cause
the frags[] array to be overrun.  Don't enable TSO in that case.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotile: support rx_dropped/rx_errors in tilepro net driver
Chris Metcalf [Thu, 1 Aug 2013 15:36:42 +0000 (11:36 -0400)]
tile: support rx_dropped/rx_errors in tilepro net driver

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotile: set hw_features and vlan_features in setup
Chris Metcalf [Thu, 1 Aug 2013 15:36:42 +0000 (11:36 -0400)]
tile: set hw_features and vlan_features in setup

This change allows the user to configure various features of the tile
networking drivers on and off.  There is no change to the default
initialization state of either the tilegx or tilepro drivers.

Neither driver needs the ndo_fix_features or ndo_set_features callbacks,
since the generic code already handles the dependencies for
fix_features, and there is no hardware state to tweak in set_features.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agogianfar: Remove unused field grp_id from gfar_priv_grp
Claudiu Manoil [Thu, 1 Aug 2013 07:07:05 +0000 (10:07 +0300)]
gianfar: Remove unused field grp_id from gfar_priv_grp

grp->grp_id is obsolete. It has no use in the current driver.
Remove it from gfar_priv_grp and put the 'rstat' member
in its place, in the 2nd cache line, as rstat needs fast access.

Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: add a temporary sanity check in skb_orphan()
Eric Dumazet [Thu, 1 Aug 2013 18:43:08 +0000 (11:43 -0700)]
net: add a temporary sanity check in skb_orphan()

David suggested to add a BUG_ON() to catch if some layer
sets skb->sk pointer without a corresponding destructor.

As skb can sit in a queue, it's mandatory to make sure the
socket cannot disappear, and it's usually done by taking a
reference on the socket, then releasing it from the skb
destructor.

This patch is a follow-up to commit c34a761231b5
("net: skb_orphan() changes") and will be reverted after
catching all possible offenders if any.

Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agocan: flexcan: Check the return value from clk_prepare_enable()
Fabio Estevam [Mon, 22 Jul 2013 15:41:40 +0000 (12:41 -0300)]
can: flexcan: Check the return value from clk_prepare_enable()

clk_prepare_enable() may fail, so let's check its return value and propagate it
in the case of error.

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
11 years agocan: flexcan: Use devm_ioremap_resource()
Fabio Estevam [Mon, 22 Jul 2013 15:41:39 +0000 (12:41 -0300)]
can: flexcan: Use devm_ioremap_resource()

Using devm_ioremap_resource() can make the code simpler and smaller.

Also, place alloc_candev() after of_match_device() to make error handling
easier.

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
11 years agoipv6: fib6_rules should return exact return value
Hannes Frederic Sowa [Thu, 1 Aug 2013 06:54:47 +0000 (08:54 +0200)]
ipv6: fib6_rules should return exact return value

With the addition of the suppress operation
(7764a45a8f1fe74d4f7d301eaca2e558e7e2831a ("fib_rules: add .suppress
operation") we rely on accurate error reporting of the fib_rules.actions.

fib6_rule_action always returned -EAGAIN in case we could not find a
matching route and 0 if a rule was matched. This also included a match
for blackhole or prohibited rule actions which could get suppressed by
the new logic.

So adapt fib6_rule_action to always return the correct error code as
its counterpart fib4_rule_action does. This also fixes a possiblity of
nullptr-deref where we don't find a table, thus rt == NULL. Because
the condition rt != ip6_null_entry still holdes it seems we could later
get a nullptr bug on dereference rt->dst.

v2:
a) Fixed a brain fart in the commit msg (the rule => a table, etc). No
   changes to the patch.

Cc: Stefan Tomanek <stefan.tomanek@wertarbyte.de>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agocls_cgroup.h netprio_cgroup.h: Remove extern from function prototypes
Joe Perches [Thu, 1 Aug 2013 00:31:39 +0000 (17:31 -0700)]
cls_cgroup.h netprio_cgroup.h: Remove extern from function prototypes

There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agochecksum: Remove extern from function prototypes
Joe Perches [Thu, 1 Aug 2013 00:31:38 +0000 (17:31 -0700)]
checksum: Remove extern from function prototypes

There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agocfg80211.h/mac80211.h: Remove extern from function prototypes
Joe Perches [Thu, 1 Aug 2013 00:31:37 +0000 (17:31 -0700)]
cfg80211.h/mac80211.h: Remove extern from function prototypes

There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Reflow modified prototypes to 80 columns.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoax25.h: Remove extern from function prototypes
Joe Perches [Thu, 1 Aug 2013 00:31:36 +0000 (17:31 -0700)]
ax25.h: Remove extern from function prototypes

There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Reflow modified prototypes to 80 columns.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoarp/neighbour.h: Remove extern from function prototypes
Joe Perches [Thu, 1 Aug 2013 00:31:35 +0000 (17:31 -0700)]
arp/neighbour.h: Remove extern from function prototypes

There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Reflow modified prototypes to 80 columns.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoaf_rxrpc.h: Remove extern from function prototypes
Joe Perches [Thu, 1 Aug 2013 00:31:34 +0000 (17:31 -0700)]
af_rxrpc.h: Remove extern from function prototypes

There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Reflow modified prototypes to 80 columns.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoaf_unix.h: Remove extern from function prototypes
Joe Perches [Thu, 1 Aug 2013 00:31:33 +0000 (17:31 -0700)]
af_unix.h: Remove extern from function prototypes

There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Reflow modified prototypes to 80 columns.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoaddrconf.h: Remove extern function prototypes
Joe Perches [Thu, 1 Aug 2013 00:31:32 +0000 (17:31 -0700)]
addrconf.h: Remove extern function prototypes

There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Reflow modified prototypes to 80 columns.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoDocumentation: add networking/netdev-FAQ.txt
Paul Gortmaker [Wed, 31 Jul 2013 19:16:20 +0000 (15:16 -0400)]
Documentation: add networking/netdev-FAQ.txt

A collection of expectations and operational details about how
networking development takes place in the context of the netdev
mailing list.

The content is meant to capture specific items that are unique
to netdev workflow, and not re-document generic linux expectations
that are already captured elsewhere.

This was originally proposed[1] as a regular posting mailing list
FAQ, but it probably is more universally accessible here in tree.

[1] https://lwn.net/Articles/559211/

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agofib_rules: add .suppress operation
Stefan Tomanek [Thu, 1 Aug 2013 00:17:15 +0000 (02:17 +0200)]
fib_rules: add .suppress operation

This change adds a new operation to the fib_rules_ops struct; it allows the
suppression of routing decisions if certain criteria are not met by its
results.

The first implemented constraint is a minimum prefix length added to the
structures of routing rules. If a rule is added with a minimum prefix length
>0, only routes meeting this threshold will be considered. Any other (more
general) routing table entries will be ignored.

When configuring a system with multiple network uplinks and default routes, it
is often convinient to reference the main routing table multiple times - but
omitting the default route. Using this patch and a modified "ip" utility, this
can be achieved by using the following command sequence:

  $ ip route add table secuplink default via 10.42.23.1

  $ ip rule add pref 100            table main prefixlength 1
  $ ip rule add pref 150 fwmark 0xA table secuplink

With this setup, packets marked 0xA will be processed by the additional routing
table "secuplink", but only if no suitable route in the main routing table can
be found. By using a minimal prefixlength of 1, the default route (/0) of the
table "main" is hidden to packets processed by rule 100; packets traveling to
destinations with more specific routing entries are processed as usual.

Signed-off-by: Stefan Tomanek <stefan.tomanek@wertarbyte.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: Remove extern from include/net/ scheduling prototypes
Joe Perches [Wed, 31 Jul 2013 05:47:13 +0000 (22:47 -0700)]
net: Remove extern from include/net/ scheduling prototypes

There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Reflow modified prototypes to 80 columns.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: skb_orphan() changes
Eric Dumazet [Tue, 30 Jul 2013 23:11:15 +0000 (16:11 -0700)]
net: skb_orphan() changes

It is illegal to set skb->sk without corresponding destructor.

Its therefore safe for skb_orphan() to not clear skb->sk if
skb->destructor is not set.

Also avoid clearing skb->destructor if already NULL.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonetem: Introduce skb_orphan_partial() helper
Eric Dumazet [Wed, 31 Jul 2013 00:55:08 +0000 (17:55 -0700)]
netem: Introduce skb_orphan_partial() helper

Commit 547669d483e578 ("tcp: xps: fix reordering issues") added
unexpected reorders in case netem is used in a MQ setup for high
performance test bed.

ETH=eth0
tc qd del dev $ETH root 2>/dev/null
tc qd add dev $ETH root handle 1: mq
for i in `seq 1 32`
do
 tc qd add dev $ETH parent 1:$i netem delay 100ms
done

As all tcp packets are orphaned by netem, TCP stack believes it can
set skb->ooo_okay on all packets.

In order to allow producers to send more packets, we want to
keep sk_wmem_alloc from reaching sk_sndbuf limit.

We can do that by accounting one byte per skb in netem queues,
so that TCP stack is not fooled too much.

Tested:

With above MQ/netem setup, scaling number of concurrent flows gives
linear results and no reorders/retransmits

lpq83:~# for n in 1 10 20 30 40 50 60 70 80 90 100
 do echo -n "n:$n " ; ./super_netperf $n -H 10.7.7.84; done
n:1 198.46
n:10 2002.69
n:20 4000.98
n:30 6006.35
n:40 8020.93
n:50 10032.3
n:60 12081.9
n:70 13971.3
n:80 16009.7
n:90 17117.3
n:100 17425.5

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: split rt_genid for ipv4 and ipv6
fan.du [Tue, 30 Jul 2013 00:33:53 +0000 (08:33 +0800)]
net: split rt_genid for ipv4 and ipv6

Current net name space has only one genid for both IPv4 and IPv6, it has below
drawbacks:

- Add/delete an IPv4 address will invalidate all IPv6 routing table entries.
- Insert/remove XFRM policy will also invalidate both IPv4/IPv6 routing table
  entries even when the policy is only applied for one address family.

Thus, this patch attempt to split one genid for two to cater for IPv4 and IPv6
separately in a fine granularity.

Signed-off-by: Fan Du <fan.du@windriver.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agosh_eth: r8a7790: Handle the RFE (Receive FIFO overflow Error) interrupt
Laurent Pinchart [Wed, 31 Jul 2013 07:42:11 +0000 (16:42 +0900)]
sh_eth: r8a7790: Handle the RFE (Receive FIFO overflow Error) interrupt

The RFE interrupt is enabled for the r8a7790 but isn't handled,
resulting in the interrupts core noticing unhandled interrupts, and
eventually disabling the ethernet IRQ.

Fix it by adding RFE to the bitmask of error interrupts to be handled
for r8a7790.

Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net...
David S. Miller [Wed, 31 Jul 2013 20:37:47 +0000 (13:37 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/jkirsher/net-next

Jeff Kirsher says:

====================
This series contains updates to ixgbe and pci.

The first patch for ixgbe from Greg Rose is the second submission.  The
first submission of "ixgbe: Retain VLAN filtering in promiscuous + VT
mode" had a typo, which Joe Perches pointed out and is fixed in this
submission.

Alex updates the ixgbe driver to use the generic helper pci_vfs_assigned
instead of the driver specific function ixgbe_vfs_are_assigned.

Don Skidmore provides 4 patches for ixgbe, the first being a fix for
flow control ethtool reporting.  Originally ixgbe_device_supports_autoneg_fc()
was expected to be called by only copper devices, which lead to false
information being displayed via ethtool.  Two other patches add support
for fixed fiber for SFP+ devices and the addition of a quad-port x520
adapter.  The last patch simply bumps the driver version.

Emil Tantilov provides 3 fixes for ixgbe, two of which resolve
semaphore lock issues.  The third fix resolves several issues in the
previous implementation of the SFF data dumps of SFP+ modules.

The remaining ixgbe and pci patches are from Jacob Keller.  The pci
patches exposes bus speed, link speed and bus width so that drivers
can take advantage of this information.  In addition, adds a pci function
which obtains minimum link width and speed.  Jacob also provides the
ixgbe patch to incorporate the pci function. He provides a patch that
fixes a lockdep issue created due to ixgbe_ptp_stop always running
cancel_work_sync even if the work item had not been created properly with
INIT_WORK. This issue was found and reported by Stephen Hemminger.

-v2-
* fix patch 3 to be a bool function based on David Miller's feedback
* fix patch 4 debug message based on David Miller's feedback
* fix patch 8 moved the extern declarations to pci.h based on Bjorn
  Helgaas's feedback
* fix patch 11 update the error message to include encoding loss based
* fix patch 8/9/10 title based on Bjorn's feedback
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotcp: Remove unused tcpct declarations and comments
Dmitry Popov [Wed, 31 Jul 2013 09:39:45 +0000 (13:39 +0400)]
tcp: Remove unused tcpct declarations and comments

Remove declaration, 4 defines and confusing comment that are no longer used
since 1a2c6181c4 ("tcp: Remove TCPCT").

Signed-off-by: Dmitry Popov <dp@highloadlab.com>
Acked-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoixgbe: add support for quad-port x520 adapter
Don Skidmore [Sat, 27 Jul 2013 06:25:38 +0000 (06:25 +0000)]
ixgbe: add support for quad-port x520 adapter

This is a x520 based quad-port (4x10Gbps) NIC with a single QSFP+
connector.  Changes were required to our identify functions due to
different eeprom address which is also included here.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoixgbe: clear semaphore bits on timeouts
Emil Tantilov [Tue, 23 Jul 2013 01:57:03 +0000 (01:57 +0000)]
ixgbe: clear semaphore bits on timeouts

This patch changes the error code path in ixgbe_acquire_swfw_sync() to deal
with cases where acquiring SW semaphore times out.

In cases where the SW/FW semaphore bits were set (i.e. due to a crash) the
driver will hang on load. With this patch the driver will clear
the stuck bits if the semaphore was not acquired in the allotted time.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoixgbe: rename LL_EXTENDED_STATS to use queue instead of q
Jacob Keller [Tue, 16 Jul 2013 07:57:46 +0000 (07:57 +0000)]
ixgbe: rename LL_EXTENDED_STATS to use queue instead of q

This patch renames the stats introduced by the busy poll feature so that they
are more inline with the current statistics naming schemes.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoixgbe: fix lockdep annotation issue for ptp's work item
Jacob Keller [Fri, 21 Jun 2013 08:14:32 +0000 (08:14 +0000)]
ixgbe: fix lockdep annotation issue for ptp's work item

This patch fixes a lockdep issue created due to ixgbe_ptp_stop always running
cancel_work_sync even if the work item had not been created properly with
INIT_WORK. This is caused because ixgbe_ptp_stop did not check to actually
ensure PTP was running first. The new implementation introduces a state in the
&adapter->state field which is used to indicate that PTP is running. (This
replaces the IXGBE_FLAG2_PTP_ENABLED field). This state will use the atomic
set_bit, test_bit, and test_and_clear_bit functions. ixgbe_ptp_stop will check
to ensure that PTP was enabled, (and if not, it will not attempt to do any
cleanup work from ixgbe_ptp_init). This resolves the lockdep annotation warning
found by Stephen Hemminger

Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoixgbe: call pcie_get_mimimum_link to check if device has enough bandwidth
Jacob Keller [Wed, 31 Jul 2013 06:53:31 +0000 (06:53 +0000)]
ixgbe: call pcie_get_mimimum_link to check if device has enough bandwidth

This patch uses the new pcie_get_minimum_link function to perform a check to
ensure that the adapter is hooked into a slot which is capable of providing the
necessary bandwidth. This check supersedes the original method which only
checked the current pci device. The new method is capable of determining the
minimum speed and link of an entire PCI chain.

-v2-
* update the error message to include encoding loss

CC: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoPCI: Add function to obtain minimum link width and speed
Jacob Keller [Wed, 31 Jul 2013 06:53:26 +0000 (06:53 +0000)]
PCI: Add function to obtain minimum link width and speed

A PCI Express device can potentially report a link width and speed which it will
not properly fulfill due to being plugged into a slower link higher in the
chain. This function walks up the PCI bus chain and calculates the minimum link
width and speed of this entire chain. This can be useful to enable a device to
determine if it has enough bandwidth for optimum functionality.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agonet: remove an unneeded check
Dan Carpenter [Mon, 29 Jul 2013 19:15:19 +0000 (22:15 +0300)]
net: remove an unneeded check

"ifa->ifa_label" is an array inside the in_ifaddr struct.  It can never
be NULL so we can remove this check.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoflow_dissector: add support for IPPROTO_IPV6
Tom Herbert [Mon, 29 Jul 2013 18:07:42 +0000 (11:07 -0700)]
flow_dissector: add support for IPPROTO_IPV6

Support IPPROTO_IPV6 similar to IPPROTO_IPIP

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoflow_dissector: clean up IPIP case
Tom Herbert [Mon, 29 Jul 2013 18:07:36 +0000 (11:07 -0700)]
flow_dissector: clean up IPIP case

Explicitly set proto to ETH_P_IP and jump directly to ip processing.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoPCI: move enum pcie_link_width into pci.h
Jacob Keller [Wed, 31 Jul 2013 06:53:21 +0000 (06:53 +0000)]
PCI: move enum pcie_link_width into pci.h

pcie_link_width is the enum used to define the link width values for a pcie
device. This enum should not be contained solely in pci_hotplug.h, and this
patch moves it next to pci_bus_speed in pci.h

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoPCI: expose pcie_link_speed and pcix_bus_speed arrays
Jacob Keller [Wed, 31 Jul 2013 06:53:16 +0000 (06:53 +0000)]
PCI: expose pcie_link_speed and pcix_bus_speed arrays

pcie_link_speed and pcix_bus_speed are arrays used by probe.c to correctly
convert lnksta register values into the pci_bus_speed enum. These static arrays
are useful outside probe for this purpose. This patch makes these defines into
conist arrays and exposes them with an extern header in drivers/pci/pci.h

-v2-
* move extern declarations to drivers/pci/pci.h

CC: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoixgbe: fix SFF data dumps of SFP+ modules
Emil Tantilov [Wed, 29 May 2013 06:23:10 +0000 (06:23 +0000)]
ixgbe: fix SFF data dumps of SFP+ modules

This patch fixes several issues with the previous implementation of the
SFF data dump of SFP+ modules:

- removed the __IXGBE_READ_I2C flag - I2C access locking is handled in the
  HW specific routines

- fixed the read loop to read data from ee->offset to ee->len

- the reads fail if __IXGBE_IN_SFP_INIT is set in the process - this is
  needed because on some HW I2C operations can take long time and disrupt
  the SFP and link detection process

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Reported-by: Ben Hutchings <bhutchings@solarflare.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoixgbe: fix semaphore lock for I2C read/writes on 82598
Emil Tantilov [Wed, 29 May 2013 06:23:05 +0000 (06:23 +0000)]
ixgbe: fix semaphore lock for I2C read/writes on 82598

ixgbe_read/write_i2c_phy_82598() does not hold the SWFW_SYNC
semaphore for the entire function. Instead the lock is held only
during the phy.ops.read/write_reg operations. As result when the
function is being called simultaneously the I2C read/writes can
be corrupted.

The following patch introduces the SWFW_SYNC semaphore for the
entire ixgbe_read/write_i2c_phy_82598() function. To accomplish
this I had to create 2 separate functions:

ixgbe_read_phy_reg_mdi()
ixgbe_write_phy_reg_mdi()

Those functions are identical to ixgbe_read/write_phy_reg_generic()
sans the locking, and can be used in ixgbe_read/write_i2c_phy_82598()
with the SWFW_SYNC semaphore being held.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoixgbe: bump version number
Don Skidmore [Wed, 15 May 2013 07:34:50 +0000 (07:34 +0000)]
ixgbe: bump version number

Bump the version number to better match with a similar version of the
out of tree driver.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoixgbe: add new media type.
Don Skidmore [Wed, 31 Jul 2013 02:17:40 +0000 (02:17 +0000)]
ixgbe: add new media type.

This patch adds support for a new media type fiber_fixed.  This is useful
to avoid all the SFP+ hot plug support path on devices who's fix fiber need
not worry about such things.  This patch is needed for a following patch
that adds support for "fiber_fixed" devices.

v2: cleaned up logging message based on feedback from David Miller

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoMerge branch 'phys_port'
David S. Miller [Wed, 31 Jul 2013 00:32:12 +0000 (17:32 -0700)]
Merge branch 'phys_port'

Jiri Pirko says:

====================
This patchset is based on patch by Narendra_K@Dell.com
Once device which can change phys port id during its lifetime adopts this,
NETDEV_CHANGEPHYSPORTID event will be added and driver will call
call_netdevice_notifiers(NETDEV_NETDEV_CHANGEPHYSPORTID, dev) to propagate
the change to userspace.

v1->v2: as suggested by Ben, handle -EOPNOTSUPP in rtnl code (wrapped up ndo call)
v2->v3: adjusted patch 1 commit message
v3->v4: used "%phN" for sysfs printf as suggested by DaveM
        added igb/igbvf implementation as requested by Or Gerlitz
v4->v5: used prandom_u32 to generate id in igb_probe
        removed duplicate code in ibgvf_probe
        pushed dev_err string into one line in igbvf_refresh_ppid
v5->v6: use uuid_le_gen for generating 16-byte phys port id for igb/igbvf
as suggested by BenH

1) Why do we need this, and why do existing facilities fail to provide
   a way to accomplish this?

Currenty there's very hard to tell if two netdevs are using the same physical
port. For sr-iov this can be get by sysfs. For other mechanisms, like NPAR
there's very hard to do it (one must learn it from NIC BIOS). But even for
sr-iov there's no way to say if two netdevs are using the same phys port when
these are passed through to virtual guests.

This patchset provides the generic way of letting this information know to
userspace. This info can be used by apps like NetworkManager, teamd, Wicked,
ovs daemon, etc, to do smarter bonding decisions.

2) Why is the physical port ID defined as a 32 byte opaque cookie?
   What formats and layouts need to be accomodated, and which
   influenced the design of the ID?

For user to distinguish if two netdevs are using the same port, he only needs
to compare their phys port ids. Nothing else is needed. This id has no
structure for security reasons. VF should not know anything about PF.

3) Are IDs globally unique?  Why or why not?  If IDs should be
   globally unique, but only in certain cases, what exactly are those
   cases.

Most of the time only uniqueness needed is in scope of single machine.
There might be case when the id should be unique between couple of machines
in virtualization environment. Given that for example for igb/igbvf 16B uuid
is used, there is no problem for this case as well. But each driver can
implement this differently focusing the hw capabilities and needs.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: export physical port id via sysfs
Jiri Pirko [Mon, 29 Jul 2013 16:16:51 +0000 (18:16 +0200)]
net: export physical port id via sysfs

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Narendra K <narendra_k@dell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agortnl: export physical port id via RT netlink
Jiri Pirko [Mon, 29 Jul 2013 16:16:50 +0000 (18:16 +0200)]
rtnl: export physical port id via RT netlink

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Narendra K <narendra_k@dell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: add ndo to get id of physical port of the device
Jiri Pirko [Mon, 29 Jul 2013 16:16:49 +0000 (18:16 +0200)]
net: add ndo to get id of physical port of the device

This patch adds a ndo for getting physical port of the device. Driver
which is aware of being virtual function of some physical port should
implement this ndo. This is applicable not only for IOV, but for other
solutions (NPAR, multichannel) as well. Basically if there is possible
to have multiple netdevs on the single hw port.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoixgbe: fix fc autoneg ethtool reporting.
Don Skidmore [Wed, 31 Jul 2013 02:19:24 +0000 (02:19 +0000)]
ixgbe: fix fc autoneg ethtool reporting.

Originally ixgbe_device_supports_autoneg_fc() was only expected to
be called by copper devices.  This would lead to false information
to be displayed via ethtool.

v2: changed ixgbe_device_supports_autoneg_fc() to a bool function,
    it returns bool.  Based on feedback from David Miller

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoixgbe: Use pci_vfs_assigned instead of ixgbe_vfs_are_assigned
Alexander Duyck [Tue, 26 Mar 2013 00:03:21 +0000 (00:03 +0000)]
ixgbe: Use pci_vfs_assigned instead of ixgbe_vfs_are_assigned

This change makes it so that the ixgbe driver uses the generic helper
pci_vfs_assigned instead of the ixgbe specific function
ixgbe_vfs_are_assigned.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoixgbe: Retain VLAN filtering in promiscuous + VT mode
Greg Rose [Fri, 22 Feb 2013 02:14:39 +0000 (02:14 +0000)]
ixgbe: Retain VLAN filtering in promiscuous + VT mode

When using the new bridge FDB interface to allow SR-IOV virtual function
network devices to communicate with SW bridged network devices the
physical function is placed into promiscuous mode and hardware VLAN
filtering is disabled.  This defeats the ability to use VLAN tagging
to isolate user networks.  When the device is in promiscuous mode and
VT mode simultaneously ensure that VLAN hardware filtering remains
enabled.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agonet: mvneta: support big endian
Thomas Petazzoni [Mon, 29 Jul 2013 13:21:28 +0000 (15:21 +0200)]
net: mvneta: support big endian

Use the "swap descriptor" feature of the hardware to properly swap the
descriptors when running in big endian mode. Since the swapping occurs
on 64 bits words, we also need to provide a separate structure layout
for the DMA descriptors between little endian and big endian mode,
like is done in the mv643xx_eth driver.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: mvneta: move the RX and TX desc macros outside of the structs
Thomas Petazzoni [Mon, 29 Jul 2013 13:21:27 +0000 (15:21 +0200)]
net: mvneta: move the RX and TX desc macros outside of the structs

The macros used for the various fields of the RX and TX descriptions
are currently declared next to those fields within the structure
definitions of the RX and TX descriptors.

However, in order to support big endian, we'll have to use the "swap
descriptors" features of the hardware, which swaps every byte within
each 64 bits word of the descriptors. This requires a separate
definition of the RX and TX descriptor structures for little and big
endian, as is done in the mv643xx_eth. Those macros can therefore no
longer be defined inside those structures.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agopktgen: Require CONFIG_INET due to use of IPv4 checksum function
Thomas Graf [Mon, 29 Jul 2013 11:44:15 +0000 (13:44 +0200)]
pktgen: Require CONFIG_INET due to use of IPv4 checksum function

Unlike for IPv6, the IPv4 checksum functions are only available
if CONFIG_INET is set.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotcp: add tcp_syncookies mode to allow unconditionally generation of syncookies
Hannes Frederic Sowa [Fri, 26 Jul 2013 15:43:23 +0000 (17:43 +0200)]
tcp: add tcp_syncookies mode to allow unconditionally generation of syncookies

| If you want to test which effects syncookies have to your
| network connections you can set this knob to 2 to enable
| unconditionally generation of syncookies.

Original idea and first implementation by Eric Dumazet.

Cc: Florian Westphal <fw@strlen.de>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agodrivers: net: cpsw: Add support for set MAC address
Mugunthan V N [Thu, 25 Jul 2013 18:14:01 +0000 (23:44 +0530)]
drivers: net: cpsw: Add support for set MAC address

Adding support for setting MAC address to cpsw device via ndo_set_mac_address

Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotile: handle 64-bit statistics in tilepro network driver
Chris Metcalf [Thu, 25 Jul 2013 16:41:15 +0000 (12:41 -0400)]
tile: handle 64-bit statistics in tilepro network driver

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years ago9p: client: remove unused code and any reference to "cancelled" function
Andi Shyti [Thu, 25 Jul 2013 08:54:24 +0000 (10:54 +0200)]
9p: client: remove unused code and any reference to "cancelled" function

This patch reverts commit

80b45261a0b263536b043c5ccfc4ba4fc27c2acc

which was implementing a 'cancelled' functionality to notify that
a cancelled request will not be replied.

This implementation was not used anywhere and therefore removed.

Signed-off-by: Andi Shyti <andi@etezian.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobe2net: don't use dev_err when AER enabling fails
Ivan Vecera [Thu, 25 Jul 2013 14:10:55 +0000 (16:10 +0200)]
be2net: don't use dev_err when AER enabling fails

The driver uses dev_err when enabling of AER fails (e.g. PCIe AER is not
supported). The dev_info is more appropriate to avoid console pollution.

Cc: sathya.perla@emulex.com
Cc: subbu.seetharaman@emulex.com
Cc: ajit.khaparde@emulex.com
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotg3: Update version to 3.133
Nithin Sujir [Mon, 29 Jul 2013 20:58:40 +0000 (13:58 -0700)]
tg3: Update version to 3.133

Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotg3: Fix UDP fragments treated as RMCP
Nithin Sujir [Mon, 29 Jul 2013 20:58:39 +0000 (13:58 -0700)]
tg3: Fix UDP fragments treated as RMCP

The 5762 devices sometimes incorrectly treat udp fragments as RMCP
packets and route to the APE. This patch sets the RX_MODE_IPV4_FRAG_FIX
bit for these devices which enables the proper behaviour.

Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotg3: Enable support for timesync gpio output
Nithin Sujir [Mon, 29 Jul 2013 20:58:38 +0000 (13:58 -0700)]
tg3: Enable support for timesync gpio output

The PTP_CAPABLE tg3 devices have a gpio output that is toggled when the
free running counter matches a watchdog value. This patch adds support
to set the watchdog and enable this feature.

Since the output is controlled via bits in the EAV_REF_CLCK_CTL
register, we have to read-modify-write it when we stop/resume.

Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotg3: Implement the shutdown handler
Nithin Sujir [Mon, 29 Jul 2013 20:58:37 +0000 (13:58 -0700)]
tg3: Implement the shutdown handler

Also remove the call to tg3_power_down_prepare() in tg3_power_down()
since tg3_close() calls it.

Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotg3: Allow NVRAM programming when interface is down
Nithin Sujir [Mon, 29 Jul 2013 20:58:36 +0000 (13:58 -0700)]
tg3: Allow NVRAM programming when interface is down

Previously, when the interface was brought down, the driver would set
the power state to D3hot. In D3hot, we don't have access to the NVRAM.
This patch removes the call to set the power state to PCI_D3hot in
close. A following patch will implement the shutdown handler to properly
set the D3hot state when the system is going down.

Doing the above means that the TG3_PHYFLG_IS_LOW_POWER should not be
checked to validate access to the NVRAM.

Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotg3: Remove incorrect switch to aux power
Nithin Sujir [Mon, 29 Jul 2013 20:58:35 +0000 (13:58 -0700)]
tg3: Remove incorrect switch to aux power

During probe, the driver is incorrectly switching the power to Vaux on
the 5717 and later devices. At this point, we are in D0 state and
drawing maximum power. We also definitely have Vmain available. It
doesn't make sense to switch to Vaux since it has a lesser maximum power
draw and we might go over the limit. On a new system, we observe that
not all ports are recognized in some of the slots with this call in
place.

Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agocnic: Update version to 2.5.17 and copyright year.
Michael Chan [Mon, 29 Jul 2013 02:04:00 +0000 (19:04 -0700)]
cnic: Update version to 2.5.17 and copyright year.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agocnic: Add missing error checking for RAMROD_CMD_ID_CLOSE
Eddie Wai [Mon, 29 Jul 2013 02:03:59 +0000 (19:03 -0700)]
cnic: Add missing error checking for RAMROD_CMD_ID_CLOSE

Completion status field should also be checked for non-zero error
condition.

Signed-off-by: Eddie Wai <eddie.wai@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agocnic: Update TCP options setup for iSCSI.
Eddie Wai [Mon, 29 Jul 2013 02:03:58 +0000 (19:03 -0700)]
cnic: Update TCP options setup for iSCSI.

Update TCP delayed ACK and timestamp options setup to match latest bnx2x
firmware.

Signed-off-by: Eddie Wai <eddie.wai@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agocnic: Reset tcp_flags during cnic_cm_create().
Eddie Wai [Mon, 29 Jul 2013 02:03:57 +0000 (19:03 -0700)]
cnic: Reset tcp_flags during cnic_cm_create().

Without resetting it, the bnx2i driver cannot use different options for
different iSCSI connections.

Signed-off-by: Eddie Wai <eddie.wai@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agocnic: Simplify cnic_release().
Michael Chan [Mon, 29 Jul 2013 02:03:56 +0000 (19:03 -0700)]
cnic: Simplify cnic_release().

Since unregister_netdevice_notifier() will replay the NETDEV_DOWN and
NETDEV_UNREGISTER_EVENTS, the cnic_dev_list will be cleaned up automatically.
The loop to cleanup the cnic_dev_list can be removed in cnic_release().

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agocnic: Simplify netdev events handling.
Michael Chan [Mon, 29 Jul 2013 02:03:55 +0000 (19:03 -0700)]
cnic: Simplify netdev events handling.

After this earlier commit to simplify probing:

    commit 4bd9b0fffb193d2e288f67f81821af32df8d4349
    cnic, bnx2x, bnx2: Simplify cnic probing.

we can now reliably receive netdev events and we can simplify the handling
of these events.  We now remove the logic that tries to handle missed
NETDEV_REGISTER events.

This change will allow cleanup to be simplified in the next patch.  We can
now rely on the play back of netdev events during
unregister_netdevice_notifier() to cleanup the structures.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet/mlx4_core: Respond to operation request by firmware
Yevgeny Petrilin [Sun, 28 Jul 2013 15:54:21 +0000 (18:54 +0300)]
net/mlx4_core: Respond to operation request by firmware

This commit adds new firmware command and new firmware event.  The firmware
raises the MLX4_EVENT_TYPE_OP_REQUIRED event in order to signal the driver it
needs to perform an administrative operation throughout the MLX4_CMD_GET_OP_REQ
command. At the moment the supported operation is adding/removing multicast
entries which are used by the firmware for handling NCSI traffic in B0
steering mode.

Also, had to swap the order of mlx4_init_mcg_table() and
mlx4_init_eq_table() to make sure that driver will get events only after
resources are initialized to handle it.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet/mlx4_en: Fix BlueFlame race
Eugenia Emantayev [Thu, 25 Jul 2013 16:21:23 +0000 (19:21 +0300)]
net/mlx4_en: Fix BlueFlame race

Fix a race between BlueFlame flow and stamping in post send flow.
Example:
SW: Build WQE 0 on the TX buffer, except the ownership bit
SW: Set ownership for WQE 0 on the TX buffer
SW: Ring doorbell for WQE 0
SW: Build WQE 1 on the TX buffer, except the ownership bit
SW: Set ownership for WQE 1 on the TX buffer
HW: Read WQE 0 and then WQE 1, before doorbell was rung/BF was done for WQE 1
HW: Produce CQEs for WQE 0 and WQE 1
SW: Process the CQEs, and stamp WQE 0 and WQE 1 accordingly (on the TX buffer)
SW: Copy WQE 1 from the TX buffer to the BF register - ALREADY STAMPED!
HW: CQE error with index 0xFFFF  - the BF WQE's control segment is STAMPED,
so the BF index is 0xFFFF. Error: Invalid Opcode.
As a result QP enters the error state and no traffic can be sent.

Solution:
When stamping - do not stamp last completed wqe.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agopktgen: add needed include file
Stephen Rothwell [Mon, 29 Jul 2013 06:21:26 +0000 (16:21 +1000)]
pktgen: add needed include file

Fixes this on PowerPC (at least):

net/core/pktgen.c: In function 'fill_packet_ipv6':
net/core/pktgen.c:2906:3: error: implicit declaration of function 'csum_ipv6_magic' [-Werror=implicit-function-declaration]
   udph->check = ~csum_ipv6_magic(&iph->saddr, &iph->daddr, udplen, IPPROTO_UDP, 0);
   ^

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net...
David S. Miller [Sun, 28 Jul 2013 20:18:49 +0000 (13:18 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/jkirsher/net-next

Jeff Kirsher says:

====================
This series contains updates to e100 and e1000e.

The e100 patch from Andy simply updates the netif_printk() to use
%*ph to dump small buffers.

The changes to e1000e include a fix from Dean Nelson to resolve a
issue where a pci_clear_master() was accidentally dropped during a
conflict resolution. Wei Young provides 2 patches, one removes an
assignment of the default ring size because it was a duplicate. The
second changes the packet split receive structure to use
PS_PAGE_BUFFERS macro for the length so that problems won't occur
when the length is changed.

The remaining patches for e1000e are from Bruce Allan, where he
provides a number of fixes and updates for I218.  In addition, a
fix for 82583 which can disappear off the PCIe bus, to resolve this,
disable ASPM L1.  Bruce also provides a fix to a previous commit
(commit e60b22c5b7 e1000e: fix accessing to suspended device) so that
devices are only taken out of runtime power management for those
ethtool operations that must access device registers.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoipv4, ipv6: send igmpv3/mld packets with TC_PRIO_CONTROL
Hannes Frederic Sowa [Fri, 26 Jul 2013 15:05:16 +0000 (17:05 +0200)]
ipv4, ipv6: send igmpv3/mld packets with TC_PRIO_CONTROL

v2:
a) Also send ipv4 igmp messages with TC_PRIO_CONTROL

Cc: William Manley <william.manley@youview.com>
Cc: Lukas Tribus <luky-37@hotmail.com>
Acked-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoe1000e: fix I217/I218 PHY initialization flow
Bruce Allan [Sat, 29 Jun 2013 07:42:39 +0000 (07:42 +0000)]
e1000e: fix I217/I218 PHY initialization flow

The initialization of the PHY on I217/I218, while similar to 82579, must
also check to see if the MAC and PHY are in the same mode (PCIe vs. SMBus)
otherwise the PHY will be inaccessible by the MAC.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoe1000e: do not resume device from RPM suspend to read PHY status registers
Bruce Allan [Sat, 29 Jun 2013 07:42:25 +0000 (07:42 +0000)]
e1000e: do not resume device from RPM suspend to read PHY status registers

When the device is runtime suspended (e.g. when there is no link), do not
wake it from D3 to read the PHY status; just set the values to typical
power-on defaults as is done when runtime PM is not enabled and there is no
link.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoe1000e: enable support for new device IDs
Bruce Allan [Sat, 29 Jun 2013 01:15:16 +0000 (01:15 +0000)]
e1000e: enable support for new device IDs

The device IDs 0x15a0 and 0x15a1 are new SKUs that contain the same MAC as
I217 and same PHY as I218.

The device IDs 0x15a2 and 0x15a3 are the same as existing I218 SKUs.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoe1000e: ethtool unnecessarily takes device out of RPM suspend
Bruce Allan [Thu, 27 Jun 2013 02:44:44 +0000 (02:44 +0000)]
e1000e: ethtool unnecessarily takes device out of RPM suspend

A previous patch (commit e60b22c5b7 e1000e: fix accessing to suspended
device) added .begin and .complete ethtool driver callbacks so that the
device was resumed from Runtime Power Management (RPM) suspend state for
all ethtool operations.  This is overkill for operations which do not need
to access any registers in the device.  This patch makes it so that the
device is taken out of RPM suspend only for those ethtool operations that
must access device registers.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoe1000e: Tx hang on I218 when linked at 100Half and slow response at 10Mbps
Bruce Allan [Fri, 21 Jun 2013 09:07:13 +0000 (09:07 +0000)]
e1000e: Tx hang on I218 when linked at 100Half and slow response at 10Mbps

Tx hang is an unintended consequence of another workaround that is in the
EEPROM for an issue with the firmware at 10Mbps when K1 (a power mode of
the MAC-PHY interconnect) is enabled.  The issue is resolved by setting
appropriate Tx re-transmission timeouts in the PHY and associated K1 entry
times in the MAC to allow enough transmissions to occur without triggering
a Tx hang.  A similar change is needed when linked at 10Mbps to improve
latency.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoe1000e: low throughput using 4K jumbos on I218
Bruce Allan [Fri, 21 Jun 2013 09:07:07 +0000 (09:07 +0000)]
e1000e: low throughput using 4K jumbos on I218

Alter the packet buffer allocation accordingly.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoe1000e: iAMT connections drop on driver unload when jumbo frames enabled
Bruce Allan [Fri, 21 Jun 2013 09:07:02 +0000 (09:07 +0000)]
e1000e: iAMT connections drop on driver unload when jumbo frames enabled

The jumbo frame configuration in the MAC/PHY should be reverted on 82579
and newer parts when the interface is brought down (not just when the MTU
is changed back to standard frame size) otherwise iAMT connections (e.g.
SoL, IDE-R) will be dropped and cannot be re-acquired until the MTU is
changed again.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoe1000e: disable ASPM L1 on 82583
Bruce Allan [Fri, 21 Jun 2013 09:07:18 +0000 (09:07 +0000)]
e1000e: disable ASPM L1 on 82583

The 82583 can disappear off the PCIe bus.  This device is a modified 82574
which had the same problem which was fixed by disabling ASPM L1; disabling
it on 82583 fixes the issue on this device.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoe1000e: Use marco instead of digit for defining e1000_rx_desc_packet_split
Wei Yang [Sat, 25 May 2013 06:23:45 +0000 (06:23 +0000)]
e1000e: Use marco instead of digit for defining e1000_rx_desc_packet_split

In structure e1000_rx_desc_packet_split, the size of wb.upper.length is
defined by a digit. This may introduce some problem when the length is
changed.

This patch use the macro PS_PAGE_BUFFERS for the definition. And move the
definition to hw.h.

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>