GitHub/exynos8895/android_kernel_samsung_universal8895.git
14 years ago bonding: Fix deadlock in bonding driver resulting from internal locking when using...
Neil Horman [Wed, 13 Oct 2010 16:01:50 +0000 (16:01 +0000)]
bonding: Fix deadlock in bonding driver resulting from internal locking when using netpoll

The monitoring paths in the bonding driver take write locks that are shared by
the tx path.  If netconsole is in use, these paths can call printk which puts us
in the netpoll tx path, which, if netconsole is attached to the bonding driver,
results in deadlock (the xmit_lock guards are useless in netpoll_send_skb, as the
monitor paths in the bonding driver don't claim the xmit_lock, nor should they).
The solution is to use a per cpu flag internal to the driver to indicate when a
cpu is holding the lock in a path that might recurse into the tx path for the
driver via netconsole.  By checking this flag on transmit, we can defer the
sending of the netconsole frames until a later time using the retransmit feature
of netpoll_send_skb that is triggered on the return code NETDEV_TX_BUSY.  I've
tested this and am able to transmit via netconsole while causing failover
conditions on the bond slave links.
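
Illustration only, not the driver code: a minimal userspace sketch of the
deferral idea. A thread-local flag stands in for the per cpu flag, and every
name here (in_monitor_path, sketch_xmit, SKETCH_TX_BUSY) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for NETDEV_TX_OK / NETDEV_TX_BUSY. */
enum sketch_tx { SKETCH_TX_OK = 0, SKETCH_TX_BUSY = 1 };

/* Stand-in for the per cpu flag: one copy per thread. */
static _Thread_local bool in_monitor_path;

static int frames_sent;

/* Transmit path: back off if this "cpu" is already inside a locked path. */
static enum sketch_tx sketch_xmit(void)
{
	if (in_monitor_path)
		return SKETCH_TX_BUSY;	/* caller is expected to retry later */
	frames_sent++;
	return SKETCH_TX_OK;
}

/* Monitor path: the flag brackets the region where the lock would be held
 * and where a log message could recurse into the transmit path. */
static enum sketch_tx sketch_monitor_logs(void)
{
	enum sketch_tx rc;

	in_monitor_path = true;		/* lock would be taken here */
	rc = sketch_xmit();		/* simulated printk via netconsole */
	in_monitor_path = false;	/* lock would be released here */
	return rc;
}
```

In this sketch the recursive transmit is refused rather than deadlocking, and
a later call outside the monitor path goes through.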

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago bonding: Fix bonding drivers improper modification of netpoll structure
Neil Horman [Wed, 13 Oct 2010 16:01:49 +0000 (16:01 +0000)]
bonding: Fix bonding drivers improper modification of netpoll structure

The bonding driver currently modifies the netpoll structure in its xmit path
while sending frames from netpoll.  This is racy, as other cpus can access the
netpoll structure in parallel. Since the bonding driver points np->dev to a
slave device, other cpus can inadvertently attempt to send data directly to
slave devices, leading to improper locking with the bonding master, lost frames,
and deadlocks.  This patch fixes that up.

This patch also removes the real_dev pointer from the netpoll structure as that
data is really only used by bonding in the poll_controller, and we can emulate
its behavior by checking each slave for IS_UP.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago e1000e: Fix for offline diag test failure at first call
Carolyn Wyborny [Fri, 15 Oct 2010 17:35:31 +0000 (17:35 +0000)]
e1000e: Fix for offline diag test failure at first call

Move link test call to later in the offline sequence, move the
restore settings block to afterwards and add another reset to ensure
the hardware is in a known state afterwards.

Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Acked-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago igbvf: Remove unneeded pm_qos* calls
Greg Rose [Fri, 15 Oct 2010 17:26:47 +0000 (17:26 +0000)]
igbvf: Remove unneeded pm_qos* calls

Power Management Quality of Service is not supported or used by the VF
driver.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago igb: fix stats handling
Eric Dumazet [Fri, 15 Oct 2010 17:27:10 +0000 (17:27 +0000)]
igb: fix stats handling

There are currently some problems with igb.

- On 32bit arches, maintaining 64bit counters without proper
synchronization between writers and readers.

- Stats updated every two seconds, as reported by Jesper.
   (Jesper provided a patch for this)

- Potential problem between worker thread and ethtool -S

This patch uses u64_stats_sync and converts everything to be 64bit safe,
SMP safe, even on 32bit arches. It integrates Jesper's idea of providing
accurate stats at the time the user reads them.
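
Illustration only: a userspace sketch of the sequence-counter retry protocol
that this kind of synchronization relies on. It is not the kernel's
u64_stats_sync implementation; the struct and function names are hypothetical,
and the plain field accesses show the protocol only (a truly concurrent
version would also need race-free field accesses under the C memory model).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

struct sketch_stats {
	atomic_uint seq;	/* odd while the single writer is mid-update */
	uint64_t packets;
	uint64_t bytes;
};

/* Writer side: bump the sequence before and after touching the counters. */
static void sketch_update(struct sketch_stats *s, uint64_t len)
{
	atomic_fetch_add(&s->seq, 1);	/* sequence becomes odd */
	s->packets += 1;
	s->bytes += len;
	atomic_fetch_add(&s->seq, 1);	/* sequence becomes even again */
}

/* Reader side: retry if a write was in progress or completed meanwhile. */
static void sketch_read(struct sketch_stats *s, uint64_t *packets,
			uint64_t *bytes)
{
	unsigned int start;

	do {
		start = atomic_load(&s->seq);
		*packets = s->packets;
		*bytes = s->bytes;
	} while ((start & 1u) || atomic_load(&s->seq) != start);
}
```

The reader never blocks the writer; it only repeats its copy when the
sequence number shows the snapshot may be torn.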

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago netxen: mask correctable error
amit salecha [Mon, 18 Oct 2010 02:03:42 +0000 (02:03 +0000)]
netxen: mask correctable error

HW workaround:
Disable logging of correctable errors for some NX3031 based adapters.

Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago netxen: fix race in tx stop queue
Rajesh Borundia [Mon, 18 Oct 2010 02:03:41 +0000 (02:03 +0000)]
netxen: fix race in tx stop queue

There is a race between netif_stop_queue and the netif_stopped_queue
check. So check once again if buffers are available to avoid the race.
With the above logic we can also get rid of the tx lock in process_cmd_ring.
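
Illustration only: a userspace sketch of the "stop, then check again" pattern
described above. The ring structure and function names are hypothetical and
do not come from the netxen driver.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct sketch_ring {
	atomic_int free_slots;	/* buffers currently available */
	atomic_bool stopped;	/* stand-in for the stopped tx queue state */
};

/* Transmit side: returns true if the frame was queued. */
static bool sketch_xmit(struct sketch_ring *r, int needed)
{
	if (atomic_load(&r->free_slots) < needed) {
		atomic_store(&r->stopped, true);
		/* Check again: the completion side may have freed buffers
		 * between the first test and the stop. */
		if (atomic_load(&r->free_slots) < needed)
			return false;	/* really full, stay stopped */
		atomic_store(&r->stopped, false);
	}
	atomic_fetch_sub(&r->free_slots, needed);
	return true;
}

/* Completion side: release buffers and wake the queue if it was stopped. */
static void sketch_complete(struct sketch_ring *r, int freed)
{
	atomic_fetch_add(&r->free_slots, freed);
	if (atomic_load(&r->stopped))
		atomic_store(&r->stopped, false);
}
```

Without the second check, a completion that lands between the test and the
stop could leave the queue stopped with buffers available.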

Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago qlcnic: update ethtool stats
amit salecha [Mon, 18 Oct 2010 01:47:48 +0000 (01:47 +0000)]
qlcnic: update ethtool stats

Added statistics for NIC Partition supported adapters.
These statistics are maintained in the device.

Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago IPv4: route.c: Change checks against 0xffffffff to ipv4_is_lbcast()
Andy Walls [Sun, 17 Oct 2010 15:11:22 +0000 (15:11 +0000)]
IPv4: route.c: Change checks against 0xffffffff to ipv4_is_lbcast()

Change a few checks against the hardcoded broadcast address,
0xffffffff, to ipv4_is_lbcast().  Remove some existing checks
using ipv4_is_lbcast() that are now obviously superfluous.
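
Illustration only: a userspace analogue of replacing a hardcoded comparison
with a named helper. The helper name sketch_is_lbcast is hypothetical; the
address is assumed to be in network byte order.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

/* True when addr_be (network byte order) is the limited broadcast
 * address 255.255.255.255. */
static bool sketch_is_lbcast(uint32_t addr_be)
{
	return addr_be == htonl(0xffffffffu);
}
```

A call such as sketch_is_lbcast(daddr) states the intent directly, where
daddr == 0xffffffff leaves the reader to recognize the constant.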

Signed-off-by: Andy Walls <awalls@md.metrocast.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago Merge branch 'can/mcp251x-for-net-next' of git://git.pengutronix.de/git/mkl/linux-2.6
David S. Miller [Mon, 18 Oct 2010 14:11:44 +0000 (07:11 -0700)]
Merge branch 'can/mcp251x-for-net-next' of git://git.pengutronix.de/git/mkl/linux-2.6

14 years ago bnx2x: update version to 1.60.00-2
Dmitry Kravkov [Sun, 17 Oct 2010 23:08:55 +0000 (23:08 +0000)]
bnx2x: update version to 1.60.00-2

Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago bnx2x: remove unnecessary FUNC_FLG_RSS flag and related
Dmitry Kravkov [Sun, 17 Oct 2010 23:08:53 +0000 (23:08 +0000)]
bnx2x: remove unnecessary FUNC_FLG_RSS flag and related

As suggested by: Joe Perches <joe@perches.com>

Although RSS is meaningless when there is a single HW queue we
still need it enabled in order to have HW Rx hash generated.

Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Vladislav Zolotarov <vladz@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago bnx2x: Use correct FW constant for header padding
Dmitry Kravkov [Sun, 17 Oct 2010 23:09:30 +0000 (23:09 +0000)]
bnx2x: Use correct FW constant for header padding

The value of the constant is the same, but it's clearer to use the original
constant provided by HSI.

Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago bnx2x: do not deal with power if no capability
Dmitry Kravkov [Sun, 17 Oct 2010 23:10:02 +0000 (23:10 +0000)]
bnx2x: do not deal with power if no capability

Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Vladislav Zolotarov <vladz@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago bnx2x: remove redundant commands during error handling
Dmitry Kravkov [Sun, 17 Oct 2010 23:05:09 +0000 (23:05 +0000)]
bnx2x: remove redundant commands during error handling

Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago bnx2x: Optimized the branching in the bnx2x_rx_int()
Vladislav Zolotarov [Sun, 17 Oct 2010 23:02:20 +0000 (23:02 +0000)]
bnx2x: Optimized the branching in the bnx2x_rx_int()

Optimized the branching in the bnx2x_rx_int() based on the fact
that FP CQE will always have at least one of START or STOP flags set,
so if not both bits are set and START bit is not set,
then it's a STOP bit that is set.
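
Illustration only: a sketch of the branching argument, assuming at least one
of the two flags is always set. The bit values and names are hypothetical;
the real constants come from the firmware interface.

```c
#include <assert.h>

/* Hypothetical bit values for the START and STOP flags. */
#define SKETCH_START_FLG 0x1u
#define SKETCH_STOP_FLG  0x2u

enum sketch_kind { SKETCH_BOTH, SKETCH_START_ONLY, SKETCH_STOP_ONLY };

/* Precondition (from the commit text): at least one flag is set. */
static enum sketch_kind sketch_classify(unsigned int flags)
{
	unsigned int type = flags & (SKETCH_START_FLG | SKETCH_STOP_FLG);

	if (type == (SKETCH_START_FLG | SKETCH_STOP_FLG))
		return SKETCH_BOTH;
	if (type & SKETCH_START_FLG)
		return SKETCH_START_ONLY;
	/* Not both and not START: only STOP remains, no third test needed. */
	return SKETCH_STOP_ONLY;
}
```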

Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Vladislav Zolotarov <vladz@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago can: mcp251x: optimize 2515, rx int gets cleared automatically
Marc Kleine-Budde [Mon, 4 Oct 2010 10:09:31 +0000 (12:09 +0200)]
can: mcp251x: optimize 2515, rx int gets cleared automatically

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: Wolfgang Grandegger <wg@grandegger.com>
14 years ago can: mcp251x: define helper functions mcp251x_is_2510, mcp251x_is_2515
Marc Kleine-Budde [Thu, 23 Sep 2010 19:34:28 +0000 (21:34 +0200)]
can: mcp251x: define helper functions mcp251x_is_2510, mcp251x_is_2515

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: Wolfgang Grandegger <wg@grandegger.com>
14 years ago can: mcp251x: Don't use pdata->model for chip selection anymore
Marc Kleine-Budde [Mon, 18 Oct 2010 13:00:18 +0000 (15:00 +0200)]
can: mcp251x: Don't use pdata->model for chip selection anymore

Since commit e446630c960946b5c1762e4eadb618becef599e7, i.e. v2.6.35-rc1,
the mcp251x chip model can be selected via the modalias member in the
struct spi_board_info. The driver stores the actual model in the
struct mcp251x_platform_data.

From the driver point of view the platform_data should be read only.
Since all in-tree users of the mcp251x have already been converted to
the modalias method, this patch moves the "model" member from the
struct mcp251x_platform_data to the driver's private data structure.

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: Christian Pellegrin <chripell@fsfe.org>
Cc: Marc Zyngier <maz@misterjones.org>
14 years ago can: mcp251x: write intf only when needed
Marc Kleine-Budde [Tue, 28 Sep 2010 08:18:34 +0000 (10:18 +0200)]
can: mcp251x: write intf only when needed

This patch introduces a variable "clear_intf" that holds the bits that
should be cleared. Only read-modify-write the register if "clear_intf"
is set.
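
Illustration only: a sketch of accumulating the bits to clear and touching
the register only when there is something to clear. The chip model, the
bus_writes counter and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct sketch_chip {
	uint8_t intf;			/* simulated interrupt flag register */
	unsigned int bus_writes;	/* counts simulated bus transfers */
};

/* Simulated read-modify-write: change only the bits selected by mask. */
static void sketch_modify_bits(struct sketch_chip *c, uint8_t mask,
			       uint8_t val)
{
	c->intf = (uint8_t)((c->intf & ~mask) | (val & mask));
	c->bus_writes++;
}

/* Handler: collect the bits to clear, then write at most once. */
static void sketch_handle_irq(struct sketch_chip *c, uint8_t handled_mask)
{
	uint8_t clear_intf = 0;

	clear_intf |= c->intf & handled_mask;
	if (clear_intf)
		sketch_modify_bits(c, clear_intf, 0x00);
}
```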

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: Wolfgang Grandegger <wg@grandegger.com>
14 years ago can: mcp251x: read-modify-write eflag only when needed
Sascha Hauer [Tue, 28 Sep 2010 08:00:47 +0000 (10:00 +0200)]
can: mcp251x: read-modify-write eflag only when needed

Use read-modify-write instead of a simple write to change the register
contents, to close the existing race window between the original manual
read and write.

Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: Wolfgang Grandegger <wg@grandegger.com>
14 years ago can: mcp251x: allow to read two registers in one spi transfer
Sascha Hauer [Tue, 28 Sep 2010 07:53:35 +0000 (09:53 +0200)]
can: mcp251x: allow to read two registers in one spi transfer

This patch is based on work done earlier by David Jander.

Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Acked-by: David Jander <david@protonic.nl>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: Wolfgang Grandegger <wg@grandegger.com>
14 years ago can: mcp251x: increase rx_errors on overflow, not only rx_over_errors
Sascha Hauer [Thu, 30 Sep 2010 07:46:00 +0000 (09:46 +0200)]
can: mcp251x: increase rx_errors on overflow, not only rx_over_errors

Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: Wolfgang Grandegger <wg@grandegger.com>
14 years ago can: mcp251x: fix NOHZ local_softirq_pending 08 warning
Marc Kleine-Budde [Mon, 4 Oct 2010 08:50:51 +0000 (10:50 +0200)]
can: mcp251x: fix NOHZ local_softirq_pending 08 warning

This patch replaces netif_rx() with netif_rx_ni(), which has to be used
from the threaded interrupt, i.e. process context.

Thanks to Christian Pellegrin for pointing at the right fix:
481a8199142c050b72bff8a1956a49fd0a75bbe0 by Oliver Hartkopp.

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: Wolfgang Grandegger <wg@grandegger.com>
14 years ago ns83820: spin_lock_irq() => spin_lock()
Dan Carpenter [Wed, 13 Oct 2010 09:18:53 +0000 (09:18 +0000)]
ns83820: spin_lock_irq() => spin_lock()

This is essentially cosmetic.  At this point the IRQs are already
disabled because we called spin_lock_irq(&dev->rx_info.lock).

The real bug here was fixed back in 2006 in 3a10ccebe: "[PATCH] lock
validator: fix ns83820.c irq-flags bug".  Prior to that patch, it was
a "spin_lock_irq is not nestable" type bug.  The 2006 patch changes the
unlock to not re-enable IRQs, which eliminates the potential deadlock.

But this bit was missed.  We should change the lock function as well so
it balances nicely.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago tipc: Simplify bearer shutdown logic
Allan Stephens [Thu, 14 Oct 2010 13:58:26 +0000 (13:58 +0000)]
tipc: Simplify bearer shutdown logic

Optimize processing in TIPC's bearer shutdown code, including:

1. Remove an unnecessary check to see if TIPC bearers can exist.
2. Don't release spinlocks before calling a media-specific disabling
routine, since the routine can't sleep.
3. Make bearer_disable() operate directly on a struct bearer, instead
of needlessly taking a name and then mapping that to the struct.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Reviewed-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago tipc: Kill tipc_get_mode() completely.
David S. Miller [Mon, 18 Oct 2010 08:06:20 +0000 (01:06 -0700)]
tipc: Kill tipc_get_mode() completely.

It's completely unused and exporting a static symbol
makes no sense and breaks the build.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago tg3: Update version to 3.115
Matt Carlson [Thu, 14 Oct 2010 10:37:45 +0000 (10:37 +0000)]
tg3: Update version to 3.115

This patch updates the tg3 version to 3.115.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago tg3: Report invalid link from tg3_get_settings()
Matt Carlson [Thu, 14 Oct 2010 10:37:44 +0000 (10:37 +0000)]
tg3: Report invalid link from tg3_get_settings()

Currently the tg3 driver leaves the speed and duplex fields
uninitialized in tg3_get_settings() if the device is not up.  This can
lead to some strange deductions in certain versions of ethtool.  When
the device is up and the link is down, the driver reports SPEED_INVALID
and DUPLEX_INVALID for these fields.  This patch makes the presentation
consistent by returning SPEED_INVALID and DUPLEX_INVALID when the device
has not been brought up as well.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago tg3: Don't allocate jumbo ring for 5780 class devs
Matt Carlson [Thu, 14 Oct 2010 10:37:43 +0000 (10:37 +0000)]
tg3: Don't allocate jumbo ring for 5780 class devs

The 5714, 5715, and 5780 devices do not have a separate rx jumbo
producer ring.  This patch changes the code so that resources are not
allocated for it.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago tg3: Cleanup tg3_alloc_rx_skb()
Matt Carlson [Thu, 14 Oct 2010 10:37:42 +0000 (10:37 +0000)]
tg3: Cleanup tg3_alloc_rx_skb()

src_map is no longer used in tg3_alloc_rx_skb().  Remove it.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago tg3: Add EEE support
Matt Carlson [Thu, 14 Oct 2010 10:37:41 +0000 (10:37 +0000)]
tg3: Add EEE support

This patch adds Energy Efficient Ethernet (EEE) support for the 5718
device ID and the 57765 B0 asic revision.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago tg3: Add clause 45 register accessor methods
Matt Carlson [Thu, 14 Oct 2010 10:37:40 +0000 (10:37 +0000)]
tg3: Add clause 45 register accessor methods

This patch adds clause 45 register access methods.  They will be used in
the following patch.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago tg3: Disable unused transmit rings
Matt Carlson [Thu, 14 Oct 2010 10:37:39 +0000 (10:37 +0000)]
tg3: Disable unused transmit rings

This patch allows the driver to disable the additional transmit rings
available on the 5717 and 5719 devices.  This is not strictly necessary,
but is done anyways for correctness.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago tg3: Add support for selfboot format 1 v6
Matt Carlson [Thu, 14 Oct 2010 10:37:38 +0000 (10:37 +0000)]
tg3: Add support for selfboot format 1 v6

5718 B0 and 5719 devices will use a new selfboot firmware format.  This
patch adds code to detect the new format so that bootcode versions get
reported correctly.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago fib_hash: RCU conversion phase 2
Eric Dumazet [Thu, 14 Oct 2010 20:56:39 +0000 (20:56 +0000)]
fib_hash: RCU conversion phase 2

Get rid of fib_hash_lock rwlock.

The fn_zone hash table resize is the noticeable part of this patch.

I added a seqlock per fn_zone, so that readers can restart their lookup
in the (very rare) case a writer expanded the hash table.

Add rcu heads in fib_alias and fib_node, use call_rcu() to defer their
freeing, and use appropriate _rcu list manipulations.

Stress test (160.000.000 udp frames sent, IP route cache disabled to
mimic DDOS attack, FIB_HASH)

Before:
real 0m41.191s
user 0m13.137s
sys 8m55.241s

After:
real 0m38.091s
user 0m13.189s
sys 7m53.018s

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago fib_hash: RCU conversion phase 1
Eric Dumazet [Thu, 14 Oct 2010 20:53:34 +0000 (20:53 +0000)]
fib_hash: RCU conversion phase 1

First step for RCU conversion of fib_hash :

struct fn_zone are created and never deleted.

Very classic conversion, using rcu_assign_pointer(), rcu_dereference()
and rtnl_dereference() verbs.

__rcu markers on fz_next and fn_zone_list

They are created under RTNL, so we don't need fib_hash_lock anymore in
fn_new_zone().

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago fib_hash: embed initial hash table in fn_zone
Eric Dumazet [Thu, 14 Oct 2010 20:53:04 +0000 (20:53 +0000)]
fib_hash: embed initial hash table in fn_zone

While looking for false sharing problems, I noticed
sizeof(struct fn_zone) was small (28 bytes) and possibly sharing a cache
line with an often written kernel structure.

Most of the time, fn_zone uses its initial hash table of 16 slots.

We can avoid the false sharing problem by embedding this initial hash
table in fn_zone itself, so that sizeof(fn_zone) > L1_CACHE_BYTES

We did a similar optimization in commit a6501e080c (Reduce memory needs
and speedup lookups)

Add a fz_revorder field to speed up fn_hash() a bit.
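
Illustration only: a sketch of embedding the initial table in the structure
so the structure grows past one cache line. The 64-byte line size and all
names are assumptions; this is not the real struct fn_zone layout.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_CACHE_BYTES	64	/* assumed cache line size */
#define SKETCH_EMBEDDED_SLOTS	16	/* initial table size from the text */

struct sketch_node;

struct sketch_zone {
	struct sketch_node **hash;	/* initial_hash until the table grows */
	unsigned int divisor;		/* current number of slots */
	struct sketch_node *initial_hash[SKETCH_EMBEDDED_SLOTS];
};

/* The embedded table pushes the structure past one cache line. */
_Static_assert(sizeof(struct sketch_zone) > SKETCH_CACHE_BYTES,
	       "zone should be larger than one cache line");

static void sketch_zone_init(struct sketch_zone *z)
{
	size_t i;

	for (i = 0; i < SKETCH_EMBEDDED_SLOTS; i++)
		z->initial_hash[i] = NULL;
	z->divisor = SKETCH_EMBEDDED_SLOTS;
	z->hash = z->initial_hash;	/* no separate allocation needed */
}
```

A side effect in this sketch is that the common case needs no second
allocation for the table.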

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago drivers/net/pch_gbe: Use DEFINE_PCI_DEVICE_TABLE
Joe Perches [Thu, 14 Oct 2010 09:55:50 +0000 (09:55 +0000)]
drivers/net/pch_gbe: Use DEFINE_PCI_DEVICE_TABLE

Use the standard macro to put this table in __devinitconst.

Compiled, untested.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago netns: reorder fields in struct net
Eric Dumazet [Thu, 14 Oct 2010 05:56:18 +0000 (05:56 +0000)]
netns: reorder fields in struct net

In a network bench, I noticed an unfortunate false sharing between
'loopback_dev' and 'count' fields in "struct net".

'count' is written each time a socket is created or destroyed, while
loopback_dev might be often read in routing code.

Move loopback_dev in a read mostly section of "struct net"

Note: struct netns_xfrm is cache line aligned on SMP.
(It contains a "struct dst_ops")
Move it at the end to avoid holes, and reduce sizeof(struct net) by 128
bytes on ia32.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago tcp: use correct counters in CA_CWR state too
Ilpo Järvinen [Thu, 14 Oct 2010 01:52:09 +0000 (01:52 +0000)]
tcp: use correct counters in CA_CWR state too

As CWR is stronger than CA_Disorder state, we can miscount
SACK/Reno failure into other timeouts. Not a bad problem as
it can happen only due to ECN, FRTO detecting spurious RTO
or xmit error which are the only callers of tcp_enter_cwr.
And even then losses and RTO must still follow thereafter
to actually end up into the relevant code paths.

Compile tested.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago tcp: sack lost marking fixes
Ilpo Järvinen [Thu, 14 Oct 2010 01:42:30 +0000 (01:42 +0000)]
tcp: sack lost marking fixes

When only fast rexmit should be done, tcp_mark_head_lost marks
L too far. Also, sacked_upto below 1 is a perfectly valid number;
the packets == 0 case then needs to be trapped elsewhere.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago stmmac: remove ifdef NETIF_F_TSO from stmmac_ethtool.c
Giuseppe Cavallaro [Sun, 17 Oct 2010 20:43:56 +0000 (13:43 -0700)]
stmmac: remove ifdef NETIF_F_TSO from stmmac_ethtool.c

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Reported-by: Armando Visconti <armando.visconti@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago niu: introduce temp variables to avoid sparse warnings when swapping in-situ
Harvey Harrison [Wed, 13 Oct 2010 18:59:13 +0000 (18:59 +0000)]
niu: introduce temp variables to avoid sparse warnings when swapping in-situ

Suppress a large block of warnings like:
drivers/net/niu.c:7094:38: warning: incorrect type in assignment (different base types)
drivers/net/niu.c:7094:38:    expected restricted __be32 [usertype] ip4src
drivers/net/niu.c:7094:38:    got unsigned long long
drivers/net/niu.c:7104:17: warning: cast from restricted __be32
...

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago net: move MII outside of NET_ETHERNET, fix kconfig warning
Randy Dunlap [Wed, 13 Oct 2010 15:18:59 +0000 (15:18 +0000)]
net: move MII outside of NET_ETHERNET, fix kconfig warning

We have USB, PCMCIA, and gigabit ethernet drivers that select
MII even though NET_ETHERNET is not enabled, so make MII not
be dependent on NET_ETHERNET.  It is still dependent on NET
and NETDEVICES.

Fixes kconfig unmet dependency warning (shortened, was very long string):

warning: (ARM_AT91_ETHER && NETDEVICES && NET_ETHERNET && ARM && ARCH_AT91RM9200 || ARM_KS8695_ETHER && NETDEVICES && NET_ETHERNET && ARM && ARCH_KS8695 || ... || IP1000 && NETDEVICES && NETDEV_1000 && PCI && EXPERIMENTAL || HAMACHI && NETDEVICES && NETDEV_1000 && PCI || R8169 && NETDEVICES && NETDEV_1000 && PCI || SIS190 && NETDEVICES && NETDEV_1000 && PCI || VIA_VELOCITY && NETDEVICES && NETDEV_1000 && PCI || ATL1 && NETDEVICES && NETDEV_1000 && PCI || ATL1E && NETDEVICES && NETDEV_1000 && PCI && EXPERIMENTAL || ATL1C && NETDEVICES && NETDEV_1000 && PCI && EXPERIMENTAL || JME && NETDEVICES && NETDEV_1000 && PCI || STMMAC_ETH && NETDEV_1000 && NETDEVICES && HAS_IOMEM || USB_PEGASUS && NETDEVICES && USB && NET || USB_RTL8150 && NETDEVICES && USB && NET && EXPERIMENTAL || USB_USBNET && NETDEVICES && USB && NET || PCMCIA_SMC91C92 && NETDEVICES && NET_PCMCIA && PCMCIA) selects MII which has unmet direct dependencies (NETDEVICES && NET_ETHERNET)

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Jeff Garzik <jgarzik@pobox.com> [2006-NOV-30]
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago infiniband: fix mlx4 kconfig dependency warning
Randy Dunlap [Wed, 13 Oct 2010 15:12:53 +0000 (15:12 +0000)]
infiniband: fix mlx4 kconfig dependency warning

Fix kconfig dependency warning to satisfy dependencies:

warning: (MLX4_EN && NETDEVICES && NETDEV_10000 && PCI && INET || MLX4_INFINIBAND && INFINIBAND) selects MLX4_CORE which has unmet direct dependencies (NETDEVICES && NETDEV_10000 && PCI)

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago stmmac: make function tables const
stephen hemminger [Wed, 13 Oct 2010 14:51:25 +0000 (14:51 +0000)]
stmmac: make function tables const

These tables only contain function pointers.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Gustavo F. Padovan <padovan@profusion.mobi>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago stmmac: make ethtool functions local
stephen hemminger [Wed, 13 Oct 2010 14:50:31 +0000 (14:50 +0000)]
stmmac: make ethtool functions local

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Gustavo F. Padovan <padovan@profusion.mobi>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago tipc: cleanup function namespace
stephen hemminger [Wed, 13 Oct 2010 13:20:35 +0000 (13:20 +0000)]
tipc: cleanup function namespace

Do some cleanups of TIPC based on make namespacecheck
  1. Don't export unused symbols
  2. Eliminate dead code
  3. Make functions and variables local
  4. Rename buf_acquire to tipc_buf_acquire since it is used in several files

Compile tested only.
This may break out-of-tree kernel modules that depend on TIPC routines.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago via-velocity: forced 1000 Mbps mode support.
françois romieu [Wed, 13 Oct 2010 09:26:05 +0000 (09:26 +0000)]
via-velocity: forced 1000 Mbps mode support.

Full duplex only. Half duplex 1000 Mbps is not supported.

Signed-off-by: David Lv <DavidLv@viatech.com.cn>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Tested-by: Seguier Regis <rseguier@e-teleport.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago fib: avoid false sharing on fib_table_hash
Eric Dumazet [Wed, 13 Oct 2010 08:22:03 +0000 (08:22 +0000)]
fib: avoid false sharing on fib_table_hash

While doing profile analysis, I found fib_hash_table was sometimes in a
cache line shared by a possibly often written kernel structure.

(CONFIG_IP_ROUTE_MULTIPATH || !CONFIG_IPV6_MULTIPLE_TABLES)

It's hard to detect because it is not easily reproducible.

Make sure we allocate a full cache line to keep this shared in all cpus
caches.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago fib_trie: use fls() instead of open coded loop
Eric Dumazet [Wed, 13 Oct 2010 06:56:11 +0000 (06:56 +0000)]
fib_trie: use fls() instead of open coded loop

fib_table_lookup() can use fls() to speed up an open coded loop.

Noticed while doing a profile analysis.
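
Illustration only: a userspace comparison of an open coded "find last set
bit" loop with a version built on a compiler builtin. Both functions are
hypothetical sketches, not the code in fib_table_lookup(); __builtin_clz is
a GCC/Clang builtin and unsigned int is assumed to be 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* Open coded loop: 1-based position of the highest set bit, 0 for x == 0. */
static int sketch_fls_loop(uint32_t x)
{
	int r = 0;

	while (x) {
		r++;
		x >>= 1;
	}
	return r;
}

/* Same result in one step. __builtin_clz(0) is undefined, so guard it. */
static int sketch_fls(uint32_t x)
{
	return x ? 32 - __builtin_clz(x) : 0;
}
```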

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago fib: remove a useless synchronize_rcu() call
Eric Dumazet [Wed, 13 Oct 2010 04:43:04 +0000 (04:43 +0000)]
fib: remove a useless synchronize_rcu() call

fib_nl_delrule() calls synchronize_rcu() for no apparent reason,
while rtnl is held.

I suspect it was done to avoid an atomic_inc_not_zero() in
fib_rules_lookup(), which commit 7fa7cb7109d07 added anyway.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago fib6: use FIB_LOOKUP_NOREF in fib6_rule_lookup()
Eric Dumazet [Wed, 13 Oct 2010 02:45:40 +0000 (02:45 +0000)]
fib6: use FIB_LOOKUP_NOREF in fib6_rule_lookup()

Avoid two atomic ops on found rule in fib6_rule_lookup()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago sundance: Add initial ethtool stats support
Denis Kirjanov [Wed, 13 Oct 2010 00:56:09 +0000 (00:56 +0000)]
sundance: Add initial ethtool stats support

Add ethtool stats support.

Signed-off-by: Denis Kirjanov <dkirjanov@kernel.org>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago pch_gbe: fix if condition in set_settings()
Dan Carpenter [Tue, 12 Oct 2010 23:36:19 +0000 (23:36 +0000)]
pch_gbe: fix if condition in set_settings()

There were no curly braces in this if condition so it always enabled full
duplex.

And ecmd->speed is an unsigned short so it is never equal to -1.  The
effect is that mii_ethtool_sset() fails with -EINVAL and an error is
printed to dmesg.
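
Illustration only: a sketch of why an unsigned short never compares equal
to -1, assuming int is wider than short (true on mainstream platforms). The
function names are hypothetical and this is not the pch_gbe code.

```c
#include <assert.h>
#include <stdbool.h>

/* What "speed == -1" does: the unsigned short is promoted to int first,
 * giving a value in 0..65535, which can never equal -1. */
static bool sketch_equals_minus_one(unsigned short speed)
{
	int promoted = speed;	/* the implicit promotion, made explicit */

	return promoted == -1;
}

/* Comparing against the same 16-bit pattern does match. */
static bool sketch_equals_all_ones(unsigned short speed)
{
	return speed == (unsigned short)-1;
}
```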

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago dnet: mark methods static and annotate for correct endianness
Harvey Harrison [Tue, 12 Oct 2010 22:20:34 +0000 (22:20 +0000)]
dnet: mark methods static and annotate for correct endianness

There don't appear to be bugs with the endianness handling here, just get the
annotations right to keep sparse happy.

Suppresses the following sparse warnings:
drivers/net/dnet.c:30:5: warning: symbol 'dnet_readw_mac' was not declared. Should it be static?
drivers/net/dnet.c:49:6: warning: symbol 'dnet_writew_mac' was not declared. Should it be static?
drivers/net/dnet.c:364:5: warning: symbol 'dnet_phy_marvell_fixup' was not declared. Should it be static?
drivers/net/dnet.c:66:13: warning: incorrect type in assignment (different base types)
drivers/net/dnet.c:66:13:    expected unsigned short [unsigned] [usertype] tmp
drivers/net/dnet.c:66:13:    got restricted __be16 [usertype] <noident>
drivers/net/dnet.c:68:13: warning: incorrect type in assignment (different base types)
drivers/net/dnet.c:68:13:    expected unsigned short [unsigned] [usertype] tmp
drivers/net/dnet.c:68:13:    got restricted __be16 [usertype] <noident>
drivers/net/dnet.c:70:13: warning: incorrect type in assignment (different base types)
drivers/net/dnet.c:70:13:    expected unsigned short [unsigned] [usertype] tmp
drivers/net/dnet.c:70:13:    got restricted __be16 [usertype] <noident>
drivers/net/dnet.c:92:27: warning: cast to restricted __be16
drivers/net/dnet.c:94:33: warning: cast to restricted __be16
drivers/net/dnet.c:96:33: warning: cast to restricted __be16

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago cxgb4vf: make single bit signed bitfields unsigned
Harvey Harrison [Tue, 12 Oct 2010 21:52:26 +0000 (21:52 +0000)]
cxgb4vf: make single bit signed bitfields unsigned

Single bit signed bitfields don't make a lot of sense, noticed by sparse:
drivers/net/cxgb4vf/t4vf_common.h:135:31: error: dubious one-bit signed bitfield
drivers/net/cxgb4vf/t4vf_common.h:136:36: error: dubious one-bit signed bitfield
drivers/net/cxgb4vf/t4vf_common.h:137:36: error: dubious one-bit signed bitfield
drivers/net/cxgb4vf/t4vf_common.h:138:36: error: dubious one-bit signed bitfield
drivers/net/cxgb4vf/t4vf_common.h:139:36: error: dubious one-bit signed bitfield
drivers/net/cxgb4vf/t4vf_common.h:140:31: error: dubious one-bit signed bitfield
drivers/net/cxgb4vf/t4vf_common.h:141:31: error: dubious one-bit signed bitfield
drivers/net/cxgb4vf/t4vf_common.h:142:35: error: dubious one-bit signed bitfield
drivers/net/cxgb4vf/t4vf_common.h:143:35: error: dubious one-bit signed bitfield
drivers/net/cxgb4vf/t4vf_common.h:154:27: error: dubious one-bit signed bitfield
drivers/net/cxgb4vf/t4vf_common.h:155:26: error: dubious one-bit signed bitfield
drivers/net/cxgb4vf/t4vf_common.h:156:27: error: dubious one-bit signed bitfield
drivers/net/cxgb4vf/t4vf_common.h:157:26: error: dubious one-bit signed bitfield
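
As a rough illustration of why sparse flags these declarations: a one-bit
signed bitfield only has room for the sign bit, so it cannot hold the value 1.
The sketch below is not taken from t4vf_common.h; the struct and function
names are hypothetical, and the result of storing 1 into the signed field is
implementation-defined (GCC and Clang are assumed here).

```c
#include <assert.h>

/* Hypothetical flag structs, for illustration only. */
struct flag_signed_sketch {
	signed int up:1;	/* the single bit is the sign bit */
};

struct flag_unsigned_sketch {
	unsigned int up:1;	/* the single bit holds 0 or 1 */
};

/* Store a boolean in the signed field and read it back. */
static int store_signed(int v)
{
	struct flag_signed_sketch f = { 0 };

	assert(v == 0 || v == 1);
	f.up = v;		/* implementation-defined when v == 1 */
	return f.up;
}

/* Store a boolean in the unsigned field and read it back. */
static unsigned int store_unsigned(unsigned int v)
{
	struct flag_unsigned_sketch f = { 0 };

	assert(v <= 1);
	f.up = v;
	return f.up;
}
```

With the unsigned field a flag set to 1 reads back as 1; with the signed
field a later `== 1` comparison does not hold, which is the kind of surprise
the warning is meant to prevent.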

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet: allocate skbs on local node
Eric Dumazet [Mon, 11 Oct 2010 19:05:25 +0000 (19:05 +0000)]
net: allocate skbs on local node

commit b30973f877 (node-aware skb allocation) spread the bad habit of
allocating net driver skbs on a given memory node: the one closest to
the NIC hardware. This is wrong because as soon as we try to scale the
network stack, we need to use many cpus to handle traffic, and we hit
slub/slab management overhead on cross-node allocations/frees when these
cpus have to alloc/free skbs bound to a central node.

skbs allocated in the RX path are ephemeral; they have a very short
lifetime, and the extra cost of maintaining NUMA affinity is too high. What
appeared to be a nice idea four years ago is in fact a bad one.

In 2010, NIC hardware is multiqueue, or we use RPS to spread the load,
and two 10Gb NICs might deliver more than 28 million packets per second,
needing all the available cpus.

The cost of cross-node handling in the network and vm stacks outweighs the
small benefit the hardware had when doing its DMA transfer into its 'local'
memory node at RX time. Even trying to differentiate the two allocations
done for one skb (the sk_buff on the local node, the data part on the NIC
hardware node) is not enough to bring good performance.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agor8169: use 50% less ram for RX ring
Eric Dumazet [Mon, 11 Oct 2010 11:17:47 +0000 (11:17 +0000)]
r8169: use 50% less ram for RX ring

Using standard skb allocations in r8169 leads to order-3 allocations (if
PAGE_SIZE=4096), because NIC needs 16383 bytes, and skb overhead makes
this bigger than 16384 -> 32768 bytes per "skb"

Using kmalloc() reduces the memory requirements of one r8169 NIC
by 4 Mbytes (256 frames * 16 Kbytes). This is fine since a hardware bug
requires us to copy incoming frames, so we build a real skb when doing
this copy.
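
The arithmetic above can be checked with a small sketch. It assumes that an
allocation is rounded up to the next power of two, which is what the
16384 -> 32768 step implies; the helper names are hypothetical and the exact
skb overhead is not modelled, only the fact that it pushes the request past
16384 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round n up to the next power of two, a rough model of how a large
 * allocation ends up occupying a whole power-of-two block. */
static size_t round_up_pow2(size_t n)
{
	size_t p = 1;

	assert(n >= 1 && n <= SIZE_MAX / 2 + 1);
	while (p < n)
		p <<= 1;
	return p;
}

/* Bytes saved over a whole RX ring when each buffer shrinks from
 * old_size to new_size. */
static size_t ring_savings(size_t entries, size_t old_size, size_t new_size)
{
	assert(old_size >= new_size);
	return entries * (old_size - new_size);
}
```

A bare 16383-byte buffer fits in 16384 bytes, while anything larger than
16384 lands in a 32768-byte block; over 256 ring entries the difference is
256 * 16384 bytes, i.e. 4 Mbytes.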

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoixgbe: DCB: remove DCB check config
John Fastabend [Fri, 15 Oct 2010 16:27:38 +0000 (09:27 -0700)]
ixgbe: DCB: remove DCB check config

Remove the DCB check config step from DCB configuration. We
continue to configure DCB even if the check fails, so don't
even bother to check.  Plus, user space (lldpad) checks
this before programming the hw anyway.

The worst case is that we program some values into the hw that
don't make total sense, resulting in incorrect bandwidth
allocation.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoigb: add check for fiber/serdes devices to igb_set_spd_dplx;
Carolyn Wyborny [Tue, 12 Oct 2010 22:27:02 +0000 (22:27 +0000)]
igb: add check for fiber/serdes devices to igb_set_spd_dplx;

Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoixgbe: declare functions as static
Emil Tantilov [Tue, 12 Oct 2010 22:20:59 +0000 (22:20 +0000)]
ixgbe: declare functions as static

Following patch fixes warnings reported by `make namespacecheck`

Reported by Stephen Hemminger

CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoixgbe: remove unused functions
Emil Tantilov [Tue, 12 Oct 2010 22:20:34 +0000 (22:20 +0000)]
ixgbe: remove unused functions

Remove functions that are declared, but not used in the driver.
This patch fixes warnings reported by `make namespacecheck`

Reported by Stephen Hemminger

CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agocnic: Add support for 57712 device
Michael Chan [Wed, 13 Oct 2010 14:06:51 +0000 (14:06 +0000)]
cnic: Add support for 57712 device

Add new interrupt ack functions and other hardware interface logic to
support the new device.

Update version to 2.2.6.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agocnic: Decouple uio close from cnic shutdown
Michael Chan [Wed, 13 Oct 2010 14:06:50 +0000 (14:06 +0000)]
cnic: Decouple uio close from cnic shutdown

During cnic shutdown, the original driver code requires userspace to
close the uio device within a few seconds.  This doesn't always happen
as the userapp may be hung or otherwise take a long time to close.  The
system may crash when this happens.

We fix the problem by decoupling the uio structures from the cnic
structures during cnic shutdown.  We do not unregister the uio device
until the cnic driver is unloaded.  This eliminates the unreliable wait
loop for uio to close.

All uio structures are kept in a linked list.  If the device is shut down
and later brought back up again, the uio structure will be found in the
linked list and coupled back to the cnic structures.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agocnic: Add cnic_uio_dev struct
Michael Chan [Wed, 13 Oct 2010 14:06:49 +0000 (14:06 +0000)]
cnic: Add cnic_uio_dev struct

and put all uio related structures and ring buffers in it.  This allows
uio operations to be done independent of the cnic device structures.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agocnic: Add cnic_free_uio()
Michael Chan [Wed, 13 Oct 2010 14:06:48 +0000 (14:06 +0000)]
cnic: Add cnic_free_uio()

to free all UIO related structures.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agocnic: Defer iscsi connection cleanup
Michael Chan [Wed, 13 Oct 2010 14:06:47 +0000 (14:06 +0000)]
cnic: Defer iscsi connection cleanup

The bnx2x devices require a 2 second quiet time before sending the last
RAMROD command to destroy a connection.  This sleep wait adds up to a
long delay when iscsid is serially destroying multiple connections.

Create a workqueue to perform the final connection cleanup in the
background.  This significantly speeds up the process, as the wait
time can be spent in parallel for multiple connections.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agocnic: Add cnic_bnx2x_destroy_ramrod()
Michael Chan [Wed, 13 Oct 2010 14:06:46 +0000 (14:06 +0000)]
cnic: Add cnic_bnx2x_destroy_ramrod()

Refactoring code for the next patch to defer connection clean up.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agocnic: Convert ctx_flags to bit fields
Michael Chan [Wed, 13 Oct 2010 14:06:45 +0000 (14:06 +0000)]
cnic: Convert ctx_flags to bit fields

so that we can add additional bit definitions without requiring a spinlock.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agocnic: Add common cnic_request_irq()
Michael Chan [Wed, 13 Oct 2010 14:06:44 +0000 (14:06 +0000)]
cnic: Add common cnic_request_irq()

to reduce some duplicate code.  Also, use tasklet_kill() in
cnic_free_irq() to wait for the cnic_irq_task to complete.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoDocumentation: Update Phonet doc for Pipe controller changes
Kumar Sanghvi [Tue, 12 Oct 2010 20:17:25 +0000 (20:17 +0000)]
Documentation: Update Phonet doc for Pipe controller changes

Updates to Phonet doc for Pipe controller 'connect' socket
implementation and changes related to socket options.

Signed-off-by: Kumar Sanghvi <kumar.sanghvi@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoPhonet: 'connect' socket implementation for Pipe controller
Kumar Sanghvi [Tue, 12 Oct 2010 20:14:43 +0000 (20:14 +0000)]
Phonet: 'connect' socket implementation for Pipe controller

Based on a suggestion by Rémi Denis-Courmont, this patch implements
the 'connect' socket call for the Pipe controller logic.
The patch does the following:
- Removes setsockopts for PNPIPE_CREATE and PNPIPE_DESTROY
- Adds setsockopt for setting the Pipe handle value
- Implements connect socket call
- Updates the Pipe controller logic

User-space should now follow the sequence below with the Pipe controller:
- socket
- bind
- setsockopt for PNPIPE_PIPE_HANDLE
- connect
- setsockopt for PNPIPE_ENCAP_IP
- setsockopt for PNPIPE_ENABLE

GPRS/3G data has been tested and works fine with this.

Signed-off-by: Kumar Sanghvi <kumar.sanghvi@stericsson.com>
Acked-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agotipc: clean out all instances of #if 0'd unused code
Paul Gortmaker [Tue, 12 Oct 2010 14:25:58 +0000 (14:25 +0000)]
tipc: clean out all instances of #if 0'd unused code

Remove all instances of legacy or not-yet-implemented code
that is currently living within an #if 0 ... #endif block.
In the rare event that some of it is needed in the future,
it can still be dragged out of history, but there is no need
for it to sit in mainline.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agos390: ctcm_mpc: Fix build after netdev refcount changes.
David S. Miller [Wed, 13 Oct 2010 16:11:26 +0000 (09:11 -0700)]
s390: ctcm_mpc: Fix build after netdev refcount changes.

Reported-by: Sachin Sant <sachinp@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet: percpu net_device refcount
Eric Dumazet [Mon, 11 Oct 2010 10:22:12 +0000 (10:22 +0000)]
net: percpu net_device refcount

We tried very hard to remove all possible dev_hold()/dev_put() pairs in
network stack, using RCU conversions.

There is still an unavoidable device refcount change for every dst we
create/destroy, and this can slow down some workloads (routers or some
app servers, mmap af_packet)

We can switch to a percpu refcount implementation, now that the dynamic
per_cpu infrastructure is mature. On a 64-cpu machine, this consumes 256 bytes
per device.
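
The idea can be sketched in user space as follows. This is a minimal,
single-threaded model and not the kernel implementation: struct dev_sketch,
the *_sketch helpers and the explicit cpu argument are all hypothetical. It
only shows that each hold/put touches one cpu-local slot and that the real
count is the sum over all slots.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 64	/* matches the 64-cpu example in the text */

/* Hypothetical device with one counter slot per cpu. */
struct dev_sketch {
	int32_t pcpu_refcnt[NR_CPUS_SKETCH];	/* 64 * 4 = 256 bytes */
};

/* Take a reference: only this cpu's slot is written. */
static void dev_hold_sketch(struct dev_sketch *dev, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	dev->pcpu_refcnt[cpu]++;
}

/* Drop a reference: may run on a different cpu than the hold, so an
 * individual slot can legitimately go negative. */
static void dev_put_sketch(struct dev_sketch *dev, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	dev->pcpu_refcnt[cpu]--;
}

/* Slow path: the true reference count is the sum over all cpus. */
static int dev_refcnt_sketch(const struct dev_sketch *dev)
{
	int sum = 0;
	int cpu;

	for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		sum += dev->pcpu_refcnt[cpu];
	return sum;
}
```

The trade-off is that reading the count becomes a loop over all cpus, which
is acceptable when the count is read rarely compared with how often it is
changed.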

On x86, dev_hold(dev) code :

before
        lock    incl 0x280(%ebx)
after:
        movl    0x260(%ebx),%eax
        incl    fs:(%eax)

Stress bench :

(Sending 160.000.000 UDP frames,
IP route cache disabled, dual E5540 @2.53GHz,
32bit kernel, FIB_TRIE)

Before:

real    1m1.662s
user    0m14.373s
sys     12m55.960s

After:

real    0m51.179s
user    0m15.329s
sys     10m15.942s

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agobnx2x: Fixing a typo: added a missing RSS enablement
Dmitry Kravkov [Tue, 12 Oct 2010 09:02:21 +0000 (09:02 +0000)]
bnx2x: Fixing a typo: added a missing RSS enablement

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Vladislav Zolotarov <vladz@broadcom.com>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoMerge branch 'dccp' of git://eden-feed.erg.abdn.ac.uk/net-next-2.6
David S. Miller [Tue, 12 Oct 2010 18:43:42 +0000 (11:43 -0700)]
Merge branch 'dccp' of git://eden-feed.erg.abdn.ac.uk/net-next-2.6

14 years agodccp: cosmetics - warning format
Gerrit Renker [Mon, 11 Oct 2010 18:44:42 +0000 (20:44 +0200)]
dccp: cosmetics - warning format

This omits the redundant "DCCP:" in warning messages, since DCCP_WARN() already
echoes the function name, avoiding messages like

   kernel: [10988.766503] dccp_close: DCCP: ABORT -- 209 bytes unread

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
14 years agodccp: schedule an Ack when receiving timestamps
Gerrit Renker [Mon, 11 Oct 2010 18:41:13 +0000 (20:41 +0200)]
dccp: schedule an Ack when receiving timestamps

This schedules an Ack when receiving a timestamp, exploiting the
existing inet_csk_schedule_ack() function, saving one case in the
`dccp_ack_pending()' function.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
14 years agodccp: generalise data-loss condition
Ivo Calado [Mon, 11 Oct 2010 18:40:04 +0000 (20:40 +0200)]
dccp: generalise data-loss condition

This patch generalises the task of determining data loss from RFC 4340, 7.7.1.

Let S_A, S_B be sequence numbers such that S_B is "after" S_A, and let
N_B be the NDP count of packet S_B. Then, using modulo-2^48 arithmetic,
 D = S_B - S_A - 1  is an upper bound of the number of lost data packets,
 D - N_B            is an approximation of the number of lost data packets
                    (there are cases where this is not exact).

The patch implements this as
 dccp_loss_count(S_A, S_B, N_B) := max(S_B - S_A - 1 - N_B, 0)
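
A minimal user-space sketch of this formula is shown below. It is not the
kernel code: the names seqno_distance() and loss_count_sketch() are
hypothetical, and "after" is assumed to mean that the modulo-2^48 distance
from S_A to S_B lies between 1 and 2^47 - 1.

```c
#include <assert.h>
#include <stdint.h>

/* DCCP sequence numbers are 48 bits wide. */
#define SEQNO_MASK ((UINT64_C(1) << 48) - 1)

/* (s_b - s_a) modulo 2^48: how far s_b lies "after" s_a. */
static uint64_t seqno_distance(uint64_t s_a, uint64_t s_b)
{
	return (s_b - s_a) & SEQNO_MASK;
}

/* max(S_B - S_A - 1 - N_B, 0), for s_b strictly "after" s_a. */
static uint64_t loss_count_sketch(uint64_t s_a, uint64_t s_b, uint64_t n_b)
{
	uint64_t dist = seqno_distance(s_a, s_b);
	uint64_t d;

	assert(dist >= 1 && dist < (UINT64_C(1) << 47));
	d = dist - 1;			/* upper bound on lost data packets */
	return d > n_b ? d - n_b : 0;	/* subtract NDP count, clamp at 0 */
}
```

For example, with S_A = 10, S_B = 15 and N_B = 1 the upper bound is D = 4 and
the estimate is 3; the masking makes the same computation work when the
sequence numbers wrap around 2^48.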

Signed-off-by: Ivo Calado <ivocalado@embedded.ufcg.edu.br>
Signed-off-by: Erivaldo Xavier <desadoc@gmail.com>
Signed-off-by: Leandro Sales <leandroal@gmail.com>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
14 years agodccp: remove unused argument in CCID tx function
Gerrit Renker [Mon, 11 Oct 2010 18:37:38 +0000 (20:37 +0200)]
dccp: remove unused argument in CCID tx function

This removes the argument `more' from ccid_hc_tx_packet_sent, since it was
not used anywhere in the code.

(Btw, this argument was not even used in the original KAME code where the
 function initially came from; compare the variable moreToSend in the
 freebsd61-dccp-kame-28.08.2006.patch kept by Emmanuel Lochin.)

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
14 years agodccp: merge now-reduced connect_init() function
Gerrit Renker [Mon, 11 Oct 2010 18:36:33 +0000 (20:36 +0200)]
dccp: merge now-reduced connect_init() function

After moving the assignment of GAR/ISS from dccp_connect_init() to
dccp_transmit_skb(), the former function becomes very small, so that
a merger with dccp_connect() suggests itself.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
14 years agodccp: fix the adjustments to AWL and SWL
Gerrit Renker [Mon, 11 Oct 2010 18:35:40 +0000 (20:35 +0200)]
dccp: fix the adjustments to AWL and SWL

This fixes a problem and a potential loophole with regard to seqno/ackno
validity: currently the initial adjustments to AWL/SWL are only performed
once at the beginning of the connection, during the handshake.

Since the Sequence Window feature is always greater than Wmin=32 (7.5.2),
it is however necessary to perform these adjustments at least for the first
W/W' (variables as per 7.5.1) packets in the lifetime of a connection.

This requirement is complicated by the fact that W/W' can change at any time
during the lifetime of a connection.

Therefore it is better to perform that safety check each time SWL/AWL are
updated, as implemented by the patch.

A second problem solved by this patch is that the remote/local Sequence Window
feature values (which set the bounds for AWL/SWL/SWH) are undefined until the
feature negotiation has completed.

During the initial handshake we have more stringent sequence number protection;
the changes added by this patch ensure that {A,S}W{L,H} are within the correct
bounds at the instant that feature negotiation completes (since the SeqWin
feature activation handlers call dccp_update_gsr/gss()).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
14 years agobnx2: Enable AER on PCIE devices only
Michael Chan [Mon, 11 Oct 2010 23:12:28 +0000 (16:12 -0700)]
bnx2: Enable AER on PCIE devices only

This prevents an unnecessary error message.  pci_save_state() is also moved to
the end of ->probe() so that all PCI config, including AER state, will be
saved.

Update version to 2.0.18.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Reviewed-by: Benjamin Li <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agobnx2: Update firmware to 6.0.x.
Michael Chan [Mon, 11 Oct 2010 23:12:00 +0000 (16:12 -0700)]
bnx2: Update firmware to 6.0.x.

- Improved flow control and simplified interface
- Use hardware RSS indirection table instead of the slower firmware-
  based table
- Lower latency interrupt on 5709

Signed-off-by: Michael Chan <mchan@broadcom.com>
Reviewed-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoneigh: reorder struct neighbour fields
Eric Dumazet [Mon, 11 Oct 2010 12:20:54 +0000 (12:20 +0000)]
neigh: reorder struct neighbour fields

On Tuesday, 12 October 2010 at 00:02 +0200, Eric Dumazet wrote:
> Here is the followup patch.
>
> Thanks !
>

Oops, this was an old version; the up-to-date one also takes care of the "used"
field.

I guess it's time for some sleep, sorry again.

[PATCH net-next V2] neigh: reorder struct neighbour fields

(refcnt) and (ha_lock, ha, used, dev, output, ops, primary_key) should
be placed on separate cache lines.

refcnt can be written often, while the other fields are mostly read.
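
A hedged sketch of such a layout, using C11 alignment specifiers, is shown
below. It is not the real struct neighbour: the struct name, the simplified
field types, the subset of fields and the 64-byte cache line size are all
assumptions made for illustration.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE_SKETCH 64	/* assumed cache line size */

/* Hypothetical stand-in for struct neighbour with simplified fields. */
struct neigh_layout_sketch {
	/* frequently written */
	alignas(CACHE_LINE_SKETCH) int refcnt;

	/* mostly read: this group starts on its own cache line */
	alignas(CACHE_LINE_SKETCH) unsigned char ha[32];
	unsigned long used;
	void *dev;
};

/* The read-mostly group must begin exactly on a cache line boundary. */
static_assert(offsetof(struct neigh_layout_sketch, ha) % CACHE_LINE_SKETCH == 0,
	      "read-mostly fields must start on a cache line boundary");

/* Index of the cache line that holds a given byte offset. */
static size_t cache_line_of(size_t offset)
{
	return offset / CACHE_LINE_SKETCH;
}
```

With this layout a write to refcnt dirties only the first cache line, so cpus
that merely read ha, used or dev do not have their copy of the second line
invalidated.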

This gave me good results on a stress test:

before:

real    0m45.570s
user    0m15.525s
sys     9m56.669s

After:

real    0m41.841s
user    0m15.261s
sys     8m45.949s

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet dst: use a percpu_counter to track entries
Eric Dumazet [Fri, 8 Oct 2010 06:37:34 +0000 (06:37 +0000)]
net dst: use a percpu_counter to track entries

struct dst_ops tracks number of allocated dst in an atomic_t field,
subject to high cache line contention in stress workload.

Switch to a percpu_counter, to reduce the number of times we need to dirty a
central location. Place it on a separate cache line to avoid dirtying
read only fields.
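
The batching idea behind such a counter can be sketched as follows. This is a
single-threaded user-space model and not the kernel's percpu_counter: the
struct, the helper names, the cpu argument and the batch size of 32 are
hypothetical, and the locking that would protect the central count is left
out.

```c
#include <assert.h>

#define NR_CPUS_SKETCH 4
#define BATCH_SKETCH   32	/* hypothetical fold threshold */

struct pcpu_counter_sketch {
	long count;			/* central value, dirtied rarely */
	long local[NR_CPUS_SKETCH];	/* per-cpu deltas, dirtied often */
};

/* Add to this cpu's delta; fold it into the central count only once the
 * delta reaches the batch size in either direction. */
static void counter_add(struct pcpu_counter_sketch *c, int cpu, long amount)
{
	long v;

	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	v = c->local[cpu] + amount;
	if (v >= BATCH_SKETCH || v <= -BATCH_SKETCH) {
		c->count += v;
		c->local[cpu] = 0;
	} else {
		c->local[cpu] = v;
	}
}

/* Cheap, approximate read: ignores deltas that are not folded yet. */
static long counter_read_approx(const struct pcpu_counter_sketch *c)
{
	return c->count;
}

/* Exact read: central count plus every per-cpu delta. */
static long counter_sum(const struct pcpu_counter_sketch *c)
{
	long sum = c->count;
	int cpu;

	for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		sum += c->local[cpu];
	return sum;
}
```

In this model the central location is written once per 32 increments on a
given cpu instead of on every increment, at the price of an approximate
cheap read.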

Stress test :

(Sending 160.000.000 UDP frames,
IP route cache disabled, dual E5540 @2.53GHz,
32bit kernel, FIB_TRIE, SLUB/NUMA)

Before:

real    0m51.179s
user    0m15.329s
sys     10m15.942s

After:

real 0m45.570s
user 0m15.525s
sys 9m56.669s

With a small reordering of struct neighbour fields, subject of a
following patch, (to separate refcnt from other read mostly fields)

real 0m41.841s
user 0m15.261s
sys 8m45.949s

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoneigh: Protect neigh->ha[] with a seqlock
Eric Dumazet [Thu, 7 Oct 2010 10:44:07 +0000 (10:44 +0000)]
neigh: Protect neigh->ha[] with a seqlock

Add a seqlock in struct neighbour to protect neigh->ha[], and avoid
dirtying neighbour in stress situations (many different flows / dsts)

Dirtying takes place because of read_lock(&n->lock) and n->used writes.

Switching to a seqlock, and writing n->used only on jiffies changes
permits less dirtying.
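
For readers unfamiliar with the pattern, a sequence-counter read/write
protocol can be sketched in user space with C11 atomics as below. This is
not the kernel seqlock API: struct neigh_sketch and the ha_*() helpers are
hypothetical, a single writer is assumed, and the address length of 6 bytes
is chosen only for the example. The point is that readers write nothing
shared, so they do not dirty the structure.

```c
#include <assert.h>
#include <stdatomic.h>

#define HA_LEN 6	/* example address length */

struct neigh_sketch {
	atomic_uint seq;	/* even: stable, odd: write in progress */
	_Atomic unsigned char ha[HA_LEN];
};

static void ha_init(struct neigh_sketch *n)
{
	int i;

	atomic_init(&n->seq, 0);
	for (i = 0; i < HA_LEN; i++)
		atomic_init(&n->ha[i], 0);
}

/* Writer (single writer assumed): bump seq to odd, update, bump to even. */
static void ha_write(struct neigh_sketch *n, const unsigned char *src)
{
	unsigned int s = atomic_load_explicit(&n->seq, memory_order_relaxed);
	int i;

	assert((s & 1) == 0);	/* no other write may be in progress */
	atomic_store_explicit(&n->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	for (i = 0; i < HA_LEN; i++)
		atomic_store_explicit(&n->ha[i], src[i], memory_order_relaxed);
	atomic_store_explicit(&n->seq, s + 2, memory_order_release);
}

/* Reader: copy the data, retry if a write overlapped the copy. */
static void ha_read(struct neigh_sketch *n, unsigned char *dst)
{
	unsigned int s1, s2;
	int i;

	do {
		s1 = atomic_load_explicit(&n->seq, memory_order_acquire);
		for (i = 0; i < HA_LEN; i++)
			dst[i] = atomic_load_explicit(&n->ha[i],
						      memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		s2 = atomic_load_explicit(&n->seq, memory_order_relaxed);
	} while ((s1 & 1) || s1 != s2);
}
```

Compared with a reader/writer lock, the reader here never modifies the
sequence counter, which is why a read-mostly field such as the hardware
address suits this scheme.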

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoMerge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
David S. Miller [Mon, 11 Oct 2010 19:30:34 +0000 (12:30 -0700)]
Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6

Conflicts:
net/core/ethtool.c

14 years agonet: clear heap allocations for privileged ethtool actions
Kees Cook [Mon, 11 Oct 2010 19:23:25 +0000 (12:23 -0700)]
net: clear heap allocations for privileged ethtool actions

Several other ethtool functions allocate heap buffers that drivers may
leave (partly) uncleared. Some interfaces appear safe (eeprom, etc), in that the sizes
are well controlled. In some situations (e.g. unchecked error conditions),
the heap will remain unchanged in areas before copying back to userspace.
Note that these are less of an issue since these all require CAP_NET_ADMIN.

Cc: stable@kernel.org
Signed-off-by: Kees Cook <kees.cook@canonical.com>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoNET: pch, fix use after free
Jiri Slaby [Sun, 10 Oct 2010 23:26:56 +0000 (23:26 +0000)]
NET: pch, fix use after free

Stanse found that pch_gbe_xmit_frame uses skb after it is freed. Fix
that.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Masayuki Ohtake <masa-korg@dsn.okisemi.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoNET: wimax, fix use after free
Jiri Slaby [Sun, 10 Oct 2010 23:26:58 +0000 (23:26 +0000)]
NET: wimax, fix use after free

Stanse found that i2400m_rx frees skb, but still uses skb->len even
though it has skb_len defined. So use skb_len properly in the code.

Also define it as unsigned int rather than size_t to resolve
compilation warnings.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Inaky Perez-Gonzalez <inaky.perez-gonzalez@intel.com>
Cc: linux-wimax@intel.com
Acked-by: Inaky Perez-Gonzalez <inaky.perez-gonzalez@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoATM: iphase, remove sleep-inside-atomic
Jiri Slaby [Sun, 10 Oct 2010 23:26:57 +0000 (23:26 +0000)]
ATM: iphase, remove sleep-inside-atomic

Stanse found that ia_init_one locks a spinlock and inside of that it
calls ia_start which calls:
* request_irq
* tx_init which does kmalloc(GFP_KERNEL)

Both of them can thus sleep and result in a deadlock. I don't see a
reason to have a per-device spinlock there which is used only there
and inited right before the lock location. So remove it completely.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoATM: mpc, fix use after free
Jiri Slaby [Sun, 10 Oct 2010 22:46:34 +0000 (22:46 +0000)]
ATM: mpc, fix use after free

Stanse found that mpc_push frees skb and then it dereferences it. It
is a typo, new_skb should be dereferenced there.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoATM: solos-pci, remove use after free
Jiri Slaby [Sun, 10 Oct 2010 21:50:44 +0000 (21:50 +0000)]
ATM: solos-pci, remove use after free

Stanse found we do in console_show:
  kfree_skb(skb);
  return skb->len;
which is not good. Fix that by remembering the len and using it in the
function instead.
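
The shape of such a fix can be sketched in plain C. This is an illustration
only and not the solos-pci code: struct buf_sketch stands in for struct
sk_buff, and buf_alloc(), buf_free() and consume() are hypothetical helpers.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sk_buff. */
struct buf_sketch {
	size_t len;
	unsigned char *data;
};

static struct buf_sketch *buf_alloc(size_t len)
{
	struct buf_sketch *b = malloc(sizeof(*b));

	if (!b)
		return NULL;
	b->data = calloc(1, len ? len : 1);
	if (!b->data) {
		free(b);
		return NULL;
	}
	b->len = len;
	return b;
}

static void buf_free(struct buf_sketch *b)
{
	free(b->data);
	free(b);
}

/* Consume a buffer and report how many bytes it held. */
static size_t consume(struct buf_sketch *b)
{
	size_t len;

	assert(b != NULL);
	len = b->len;	/* remember the length while b is still valid */
	buf_free(b);
	return len;	/* no access to freed memory */
}
```

Copying the field into a local variable before the free is the whole fix;
the buggy form would read b->len after buf_free(b).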

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Chas Williams <chas@cmf.nrl.navy.mil>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoneigh: speedup neigh_hh_init()
Eric Dumazet [Mon, 11 Oct 2010 16:16:57 +0000 (09:16 -0700)]
neigh: speedup neigh_hh_init()

When a new dst is used to send a frame, neigh_resolve_output() tries to
associate a struct hh_cache to this dst, calling neigh_hh_init() with
the neigh rwlock write locked.

Most of the time, hh_cache is already known and linked into neighbour,
so we find it and increment its refcount.

This patch changes the logic so that we call neigh_hh_init() with
neighbour lock read locked only, so that fast path can be run in
parallel by concurrent cpus.

This brings part of the speedup we got with commit c7d4426a98a5f
(introduce DST_NOCACHE flag) for non cached dsts, even for cached ones,
removing one of the contention points that routers hit on multiqueue
enabled machines.

Further improvements would need to use a seqlock instead of an rwlock to
protect neigh->ha[], to not dirty neigh too often and remove two atomic
ops.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet/fec: carrier off initially to avoid root mount failure
Oskar Schirmer [Thu, 7 Oct 2010 02:30:30 +0000 (02:30 +0000)]
net/fec: carrier off initially to avoid root mount failure

With hardware that is slow in negotiation, the system froze
while trying to mount root on NFS at boot time.

The link state had not been initialised, so the network stack
tried to start transmission right away. This caused instant
retries, as the driver merely reported busy while the link was down,
rendering the system unusable.

Notify carrier off initially to prevent transmission until
phylib reports link up.

Signed-off-by: Oskar Schirmer <oskar@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoirda: Fix heap memory corruption in iriap.c
Samuel Ortiz [Tue, 5 Oct 2010 23:03:12 +0000 (01:03 +0200)]
irda: Fix heap memory corruption in iriap.c

While parsing the GetValuebyClass command frame, we could potentially write
past the skb->data pointer.

Cc: stable@kernel.org
Reported-by: Ilja Van Sprundel <ivansprundel@ioactive.com>
Signed-off-by: Samuel Ortiz <samuel@sortiz.org>