Don Skidmore [Tue, 29 Jun 2010 18:30:59 +0000 (18:30 +0000)]
ixgbe: add 1g PHY support for 82599
Add support for 1G SFP+ PHY's to 82599.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Wed, 30 Jun 2010 05:06:28 +0000 (05:06 +0000)]
sfc: Add support for RX flow hash control
Allow ethtool to query the number of RX rings, the fields used in RX
flow hashing and the hash indirection table.
Allow ethtool to update the RX flow hash indirection table.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Wed, 30 Jun 2010 05:05:23 +0000 (05:05 +0000)]
ethtool: Add support for control of RX flow hash indirection
Many NICs use an indirection table to map an RX flow hash value to one
of an arbitrary number of queues (not necessarily a power of 2). It
can be useful to remove some queues from this indirection table so
that they are only used for flows that are specifically filtered
there. It may also be useful to weight the mapping to account for
user processes with the same CPU-affinity as the RX interrupts.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
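The indirection idea in the commit above can be sketched in user-space C. This is only an illustration, not the ethtool or driver code: the table size and the names demo_fill_indir() and demo_rx_queue() are hypothetical. Each queue gets a share of table entries proportional to its weight, and a weight of 0 removes the queue from the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_INDIR_SIZE 128	/* hypothetical table size */

/*
 * Fill the indirection table so each queue owns a share of the entries
 * proportional to its weight. A weight of 0 removes the queue from the
 * table, leaving it for explicitly filtered flows.
 */
static void demo_fill_indir(uint32_t table[DEMO_INDIR_SIZE],
			    const unsigned int *weight, size_t n_queues)
{
	unsigned int total = 0, acc = 0;
	size_t q, i = 0, end;

	for (q = 0; q < n_queues; q++)
		total += weight[q];
	assert(total > 0);	/* at least one queue must stay in the table */

	for (q = 0; q < n_queues; q++) {
		acc += weight[q];
		/* Queue q owns entries up to its cumulative share. */
		end = (size_t)acc * DEMO_INDIR_SIZE / total;
		for (; i < end; i++)
			table[i] = (uint32_t)q;
	}
}

/* Map a flow hash to a queue; the queue count need not be a power of 2. */
static uint32_t demo_rx_queue(const uint32_t table[DEMO_INDIR_SIZE],
			      uint32_t hash)
{
	return table[hash % DEMO_INDIR_SIZE];
}
```

With weights {1, 0, 3}, queue 0 owns a quarter of the entries, queue 1 owns none, and queue 2 owns the rest.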
Ben Hutchings [Wed, 30 Jun 2010 02:47:40 +0000 (02:47 +0000)]
vmxnet3: Remove incorrect implementation of ethtool_ops::get_flags()
Only some netdev feature flags correspond directly to ethtool feature
flags. ethtool_op_get_flags() does the right thing.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Bhavesh Davda <bhavesh@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Wed, 30 Jun 2010 02:46:56 +0000 (02:46 +0000)]
netdev: Make ethtool_ops::set_flags() return -EINVAL for unsupported flags
The documented error code for attempts to set unsupported flags (or
to clear flags that cannot be disabled) is EINVAL, not EOPNOTSUPP.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Wed, 30 Jun 2010 02:44:32 +0000 (02:44 +0000)]
ethtool: Change ethtool_op_set_flags to validate flags
ethtool_op_set_flags() does not check for unsupported flags, and has
no way of doing so. This means it is not suitable for use as a
default implementation of ethtool_ops::set_flags.
Add a 'supported' parameter specifying the flags that the driver and
hardware support, validate the requested flags against this, and
change all current callers to pass this parameter.
Change some other trivial implementations of ethtool_ops::set_flags to
call ethtool_op_set_flags().
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Reviewed-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
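A minimal sketch of the validation described above, assuming a 'supported' mask is passed in by the caller. The flag names, struct demo_dev and demo_set_flags() are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits, for illustration only. */
#define DEMO_FLAG_LRO		(1u << 0)
#define DEMO_FLAG_NTUPLE	(1u << 1)
#define DEMO_FLAG_RXHASH	(1u << 2)

struct demo_dev {
	uint32_t features;
};

/*
 * Reject any requested flag outside 'supported' with -EINVAL;
 * otherwise update only the bits covered by the supported mask.
 */
static int demo_set_flags(struct demo_dev *dev, uint32_t data,
			  uint32_t supported)
{
	assert(dev != NULL);
	if (data & ~supported)
		return -EINVAL;

	dev->features = (dev->features & ~supported) | data;
	return 0;
}
```

A driver that supports only LRO and RX hashing would pass DEMO_FLAG_LRO | DEMO_FLAG_RXHASH as the mask, so a request for n-tuple filtering fails without touching the device state.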
Casey Leedom [Tue, 29 Jun 2010 12:54:12 +0000 (12:54 +0000)]
cxgb4vf: Use correct shift factor for extracting the SGE DMA Ingress Padding Boundary
Use correct shift factor for extracting the SGE DMA Ingress Padding
Boundary. Was accidentally using the register field's shift which was
close enough (4 instead of the proper value of 5) that it actually
sort of worked for various packet sizes ...
Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
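The mix-up described above can be illustrated as follows. This sketch assumes the register field encodes log2 of the padding boundary minus 5, and that the field sits at bit 4 of the register; the 3-bit field width and all DEMO_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAD_FIELD_SHIFT	4	/* bit position of the field in the register */
#define DEMO_PAD_FIELD_MASK	0x7u	/* hypothetical 3-bit field width */
#define DEMO_PAD_BOUNDARY_SHIFT	5	/* boundary = 1 << (field + 5) */

/* Correct decode: scale the field by the boundary shift factor. */
static unsigned int demo_pad_boundary(uint32_t sge_control)
{
	uint32_t field = (sge_control >> DEMO_PAD_FIELD_SHIFT) &
			 DEMO_PAD_FIELD_MASK;

	assert(field + DEMO_PAD_BOUNDARY_SHIFT < 32);
	return 1u << (field + DEMO_PAD_BOUNDARY_SHIFT);
}

/* Buggy decode: reuses the field's bit position as the scale factor. */
static unsigned int demo_pad_boundary_buggy(uint32_t sge_control)
{
	uint32_t field = (sge_control >> DEMO_PAD_FIELD_SHIFT) &
			 DEMO_PAD_FIELD_MASK;

	return 1u << (field + DEMO_PAD_FIELD_SHIFT);
}
```

Under these assumptions the buggy variant always reports half the real boundary, which is why it could appear to work for some packet sizes.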
Casey Leedom [Tue, 29 Jun 2010 12:53:39 +0000 (12:53 +0000)]
cxgb4vf: Remove obsolete comment about the lack of a TX Timer Callback
Remove obsolete comment about the lack of a TX Timer Callback -- which
we now _do_ have ...
Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Changli Gao [Tue, 29 Jun 2010 04:39:37 +0000 (04:39 +0000)]
fragment: add fast path for in-order fragments
add fast path for in-order fragments
As fragments are sent in order by most OSes, such as Windows, Darwin and
FreeBSD, it is likely that new fragments belong at the end of the inet_frag_queue.
In the fast path, we check if the skb at the end of the inet_frag_queue is the
prev we expect.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
include/net/inet_frag.h | 1 +
net/ipv4/ip_fragment.c | 12 ++++++++++++
net/ipv6/reassembly.c | 11 +++++++++++
3 files changed, 24 insertions(+)
Signed-off-by: David S. Miller <davem@davemloft.net>
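The fast path described above can be sketched with a plain singly linked list. The types and demo_find_prev() are hypothetical simplifications, not the inet_frag code: the tail is checked first, and the full walk from the head happens only when the new fragment does not belong at the end.

```c
#include <assert.h>
#include <stddef.h>

struct demo_frag {
	unsigned int offset;		/* byte offset within the datagram */
	struct demo_frag *next;
};

struct demo_frag_queue {
	struct demo_frag *head, *tail;	/* list is sorted by offset */
};

/*
 * Return the fragment after which a new fragment at 'offset' should be
 * linked (NULL means "insert at the head"). *fast_path reports whether
 * the tail check alone was enough.
 */
static struct demo_frag *demo_find_prev(const struct demo_frag_queue *q,
					unsigned int offset, int *fast_path)
{
	struct demo_frag *prev = q->tail, *next;

	assert(fast_path != NULL);

	/* Fast path: empty queue, or the new fragment goes after the tail. */
	if (!prev || prev->offset < offset) {
		*fast_path = 1;
		return prev;
	}

	/* Slow path: walk from the head to find the insertion point. */
	*fast_path = 0;
	prev = NULL;
	for (next = q->head; next; next = next->next) {
		if (next->offset >= offset)
			break;
		prev = next;
	}
	return prev;
}
```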
Eric Dumazet [Wed, 30 Jun 2010 20:31:19 +0000 (13:31 -0700)]
snmp: 64bit ipstats_mib for all arches
/proc/net/snmp and /proc/net/netstat expose SNMP counters.
Width of these counters is either 32 or 64 bits, depending on the size
of "unsigned long" in kernel.
This means user programs parsing these files must already be prepared to
deal with 64bit values, regardless of whether the program is 32 or 64 bit.
This patch introduces 64bit snmp values for IPSTAT mib, where some
counters can wrap pretty fast if they are 32bit wide.
# netstat -s|egrep "InOctets|OutOctets"
    InOctets: 244068329096
    OutOctets: 244069348848
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sathya Perla [Tue, 29 Jun 2010 00:11:17 +0000 (00:11 +0000)]
be2net: memory barrier fixes on IBM p7 platform
The IBM p7 architecture seems to reorder memory accesses more
aggressively than previous ppc64 architectures. This requires memory
barriers to ensure that rx/tx doorbells are pressed only after
memory to be DMAed is written.
Signed-off-by: Sathya Perla <sathyap@serverengines.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Wed, 30 Jun 2010 20:12:01 +0000 (13:12 -0700)]
cpmac: use resource_size()
The original code is off by one because we should start counting at
zero. So the size of the resource is end - start + 1. I switched it to
use resource_size() to do the calculation.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
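The off-by-one above comes from the resource range being inclusive at both ends. A small sketch with a hypothetical struct demo_resource standing in for the kernel's struct resource:

```c
#include <assert.h>

struct demo_resource {
	unsigned long start;
	unsigned long end;	/* inclusive: last valid address */
};

/* Size of an inclusive range: end - start + 1. */
static unsigned long demo_resource_size(const struct demo_resource *res)
{
	assert(res->end >= res->start);
	return res->end - res->start + 1;
}
```

A region spanning 0x1000 through 0x1fff is 0x1000 bytes long; computing end - start alone would give 0xfff.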
Changli Gao [Tue, 29 Jun 2010 23:07:09 +0000 (23:07 +0000)]
act_nat: use stack variable
Structure tc_nat isn't too big for the stack, so we can put it on the stack.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
net/sched/act_nat.c | 31 ++++++++++---------------------
1 file changed, 10 insertions(+), 21 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
Changli Gao [Tue, 29 Jun 2010 22:54:58 +0000 (22:54 +0000)]
act_mirred: combine duplicate code
tcf_bstats is updated in any case, so we can do it earlier to reduce the size of
the code.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
----
net/sched/act_mirred.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
Kulikov Vasiliy [Wed, 30 Jun 2010 06:08:15 +0000 (06:08 +0000)]
net/neighbour.h: fix typo
'Shoul' must be 'should'.
Signed-off-by: Kulikov Vasiliy <segooon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Anton Vorontsov [Wed, 30 Jun 2010 06:39:15 +0000 (06:39 +0000)]
gianfar: Implement workaround for eTSEC-A002 erratum
MPC8313ECE says:
"If the controller receives a 1- or 2-byte frame (such as an illegal
runt packet or a packet with RX_ER asserted) before GRS is asserted
and does not receive any other frames, the controller may fail to set
GRSC even when the receive logic is completely idle. Any subsequent
receive frame that is larger than two bytes will reset the state so
the graceful stop can complete. A MAC receiver (Rx) reset will also
reset the state."
This patch implements the proposed workaround:
"If IEVENT[GRSC] is still not set after the timeout, read the eTSEC
register at offset 0xD1C. If bits 7-14 are the same as bits 23-30,
the eTSEC Rx is assumed to be idle and the Rx can be safely reset.
If the register fields are not equal, wait for another timeout
period and check again."
Signed-off-by: Anton Vorontsov <avorontsov@mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
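The idle check quoted in the workaround above can be sketched as a pure function on the value read from offset 0xD1C. This assumes the erratum text uses PowerPC bit numbering, where bit 0 is the most significant bit of the 32-bit register; demo_rx_is_idle() is a hypothetical name, and the driver's actual comparison may be written differently.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumption: bit 0 is the MSB (PowerPC convention), so "bits 7-14" are
 * LSB-numbered bits 24..17 and "bits 23-30" are LSB-numbered bits 8..1.
 */
static int demo_rx_is_idle(uint32_t reg_d1c)
{
	uint32_t upper = (reg_d1c >> 17) & 0xffu;	/* bits 7-14 */
	uint32_t lower = (reg_d1c >> 1) & 0xffu;	/* bits 23-30 */

	assert(upper <= 0xffu && lower <= 0xffu);
	return upper == lower;
}
```

If the function returns 0, the workaround says to wait for another timeout period and check again before resetting the receiver.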
Anton Vorontsov [Wed, 30 Jun 2010 06:39:13 +0000 (06:39 +0000)]
gianfar: Implement workaround for eTSEC76 erratum
MPC8313ECE says:
"For TOE=1 huge or jumbo frames, the data required to generate the
checksum may exceed the 2500-byte threshold beyond which the controller
constrains itself to one memory fetch every 256 eTSEC system clocks.
This throttling threshold is supposed to trigger only when the
controller has sufficient data to keep transmit active for the duration
of the memory fetches. The state machine handling this threshold,
however, fails to take large TOE frames into account. As a result,
TOE=1 frames larger than 2500 bytes often see excess delays before start
of transmission."
This patch implements the workaround as suggested by the errata
document, i.e.:
"Limit TOE=1 frames to less than 2500 bytes to avoid excess delays due to
memory throttling.
When using packets larger than 2700 bytes, it is recommended to turn TOE
off."
To be sure, we limit the TOE frames to 2500 bytes, and do software
checksumming instead.
Signed-off-by: Anton Vorontsov <avorontsov@mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Anton Vorontsov [Wed, 30 Jun 2010 06:39:12 +0000 (06:39 +0000)]
gianfar: Implement workaround for eTSEC74 erratum
MPC8313ECE says:
"If MACCFG2[Huge Frame]=0 and the Ethernet controller receives frames
which are larger than MAXFRM, the controller truncates the frames to
length MAXFRM and marks RxBD[TR]=1 to indicate the error. The controller
also erroneously marks RxBD[TR]=1 if the received frame length is MAXFRM
or MAXFRM-1, even though those frames are not truncated.
No truncation or truncation error occurs if MACCFG2[Huge Frame]=1."
There are two options to work around the issue:
"1. Set MACCFG2[Huge Frame]=1, so no truncation occurs for invalid large
frames. Software can determine if a frame is larger than MAXFRM by
reading RxBD[LG] or RxBD[Data Length].
2. Set MAXFRM to 1538 (0x602) instead of the default 1536 (0x600), so
normal-length frames are not marked as truncated. Software can examine
RxBD[Data Length] to determine if the frame was larger than MAXFRM-2."
This patch implements the first workaround option by setting HUGEFRAME
bit, and gfar_clean_rx_ring() already checks the RxBD[Data Length].
Signed-off-by: Anton Vorontsov <avorontsov@mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sebastian Andrzej Siewior [Wed, 30 Jun 2010 17:39:19 +0000 (10:39 -0700)]
net/core: use ntohs for skb->protocol
This is only noticed by people that are not doing everything correctly in
the first place.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Sat, 26 Jun 2010 11:42:55 +0000 (11:42 +0000)]
ipv6: Use interface max_desync_factor instead of static default
max_desync_factor can be configured per-interface, but nothing is
using the value.
Reported-by: Piotr Lewandowski <piotr.lewandowski@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Sat, 26 Jun 2010 11:37:47 +0000 (11:37 +0000)]
ipv6: Clamp reported valid_lft to a minimum of 0
Since addresses are only revalidated every 2 minutes, the reported
valid_lft can underflow shortly before the address is deleted.
Clamp it to a minimum of 0, as for prefered_lft.
Reported-by: Piotr Lewandowski <piotr.lewandowski@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
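The underflow described above is ordinary unsigned wrap-around. A minimal sketch of the clamp, with the hypothetical helper demo_remaining_lft() standing in for the kernel's lifetime calculation:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Remaining lifetime in seconds, clamped to a minimum of 0. Without the
 * comparison, lft - elapsed would wrap to a huge value once elapsed
 * exceeds lft.
 */
static uint32_t demo_remaining_lft(uint32_t lft, uint32_t elapsed)
{
	return lft > elapsed ? lft - elapsed : 0;
}
```

For example, with a lifetime of 100 seconds and 150 seconds elapsed, the unclamped subtraction would yield 4294967246 rather than 0.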
Nicolas Kaiser [Sat, 26 Jun 2010 06:58:54 +0000 (06:58 +0000)]
usb: pegasus: fixed coding style issues
Fixed brace, static initialization, comment, whitespace and spacing
coding style issues.
Signed-off-by: Nicolas Kaiser <nikai@nikai.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Tue, 29 Jun 2010 15:26:56 +0000 (15:26 +0000)]
3c59x: Use fine-grained locks for MII and windowed register access
This avoids scheduling in atomic context and also means that IRQs
will only be deferred for relatively short periods of time.
Previously discussed in:
http://article.gmane.org/gmane.linux.network/155024
Reported-by: Arne Nordmark <nordmark@mech.kth.se>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bruce Allan [Tue, 29 Jun 2010 18:13:13 +0000 (18:13 +0000)]
e1000e: disable EEE support by default
Based on community feedback, EEE should be disabled by default until the
IEEE802.3az specification has been finalized.
Cc: bhutchings@solarflare.com
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bruce Allan [Tue, 29 Jun 2010 18:12:52 +0000 (18:12 +0000)]
e1000e: remove EEE module parameter
As requested by Dave Miller. A follow-on set of patches will allow for
ethtool to enable/disable the feature instead.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bruce Allan [Tue, 29 Jun 2010 18:12:30 +0000 (18:12 +0000)]
e1000e: suppress compile warnings on certain archs
Commit 84f4ee902ad3ee964b7b3a13d5b7cf9c086e9916 causes compile warnings on
architectures that have unsigned long long's that are not 64-bit, e.g.
ia64.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dean Nelson [Tue, 29 Jun 2010 18:12:05 +0000 (18:12 +0000)]
e1000e: don't inadvertently re-set INTX_DISABLE
Should e1000_test_msi() fail to see an msi interrupt, it attempts to
fallback to legacy INTx interrupts. But an error in the code may prevent
this from happening correctly.
Before calling e1000_test_msi_interrupt(), e1000_test_msi() disables SERR
by clearing the SERR bit from the just read PCI_COMMAND bits as it writes
them back out.
Upon return from calling e1000_test_msi_interrupt(), it re-enables SERR
by writing out the version of PCI_COMMAND it had previously read.
The problem with this is that e1000_test_msi_interrupt() calls
pci_disable_msi(), which eventually ends up in pci_intx(). And because
pci_intx() was called with enable set to 1, the INTX_DISABLE bit gets
cleared from PCI_COMMAND, which is what we want. But when we get back to
e1000_test_msi(), the INTX_DISABLE bit gets inadvertently re-set because
of the attempt by e1000_test_msi() to re-enable SERR.
The solution is to have e1000_test_msi() re-read the PCI_COMMAND bits as
part of its attempt to re-enable SERR.
During debugging/testing of this issue I found that not all the systems
I ran on had the SERR bit set to begin with. And on some of the systems
the same could be said for the INTX_DISABLE bit. Needless to say these
latter systems didn't have a problem falling back to legacy INTx
interrupts with the code as is.
Signed-off-by: Dean Nelson <dnelson@redhat.com>
CC: stable@kernel.org
Tested-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
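The stale write-back described above can be simulated on a plain 16-bit value. The bit values match the standard PCI command register layout (SERR enable is 0x100, INTx disable is 0x400), but the demo_ functions are hypothetical and only model the sequence of reads and writes, not the driver.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PCI_COMMAND_SERR		0x0100u
#define DEMO_PCI_COMMAND_INTX_DISABLE	0x0400u

/* Stands in for the MSI test, which ends up clearing INTX_DISABLE. */
static uint16_t demo_msi_test_side_effect(uint16_t cmd)
{
	return cmd & (uint16_t)~DEMO_PCI_COMMAND_INTX_DISABLE;
}

/* Buggy flow: writes back the value read before the MSI test. */
static uint16_t demo_test_msi_buggy(uint16_t cmd)
{
	uint16_t saved = cmd;

	cmd = saved & (uint16_t)~DEMO_PCI_COMMAND_SERR;	/* disable SERR */
	cmd = demo_msi_test_side_effect(cmd);
	cmd = saved;		/* stale value: INTX_DISABLE is set again */
	return cmd;
}

/* Fixed flow: works from the current value and restores only SERR. */
static uint16_t demo_test_msi_fixed(uint16_t cmd)
{
	uint16_t saved = cmd;

	cmd = saved & (uint16_t)~DEMO_PCI_COMMAND_SERR;	/* disable SERR */
	cmd = demo_msi_test_side_effect(cmd);
	if (saved & DEMO_PCI_COMMAND_SERR)
		cmd |= DEMO_PCI_COMMAND_SERR;	/* re-read, then set SERR only */
	return cmd;
}
```

Starting from a register with both SERR and INTX_DISABLE set, the buggy flow ends with INTX_DISABLE still set, so legacy interrupts stay masked; the fixed flow ends with only SERR set.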
Nicolas Kaiser [Sun, 27 Jun 2010 11:44:52 +0000 (11:44 +0000)]
drivers/net/Makefile: conditionally descend to wireless
Don't descend to wireless unless it is actually used.
Signed-off-by: Nicolas Kaiser <nikai@nikai.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Kaiser [Sun, 27 Jun 2010 00:00:25 +0000 (00:00 +0000)]
net/Makefile: conditionally descend to wireless and ieee802154
Don't descend to wireless and ieee802154 unless they are actually used.
Signed-off-by: Nicolas Kaiser <nikai@nikai.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rajesh K Borundia [Tue, 29 Jun 2010 08:01:20 +0000 (08:01 +0000)]
qlcnic: Add support for configuring eswitch and npars
The following changes are made:
1. Obtain capabilities of the NIC partition.
2. Configure tx bandwidth of a particular NIC partition.
3. Configure the eswitch for port mirroring, MAC learning and
promiscuous mode.
Signed-off-by: Rajesh K Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Anirban Chakraborty [Tue, 29 Jun 2010 07:52:12 +0000 (07:52 +0000)]
qlcnic: Remove obsolete code
Current driver uses FW API version 2 and thus code corresponding to FW API
version 1 has become obsolete. Clean up this from the driver.
Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Choi, David [Mon, 28 Jun 2010 15:23:41 +0000 (15:23 +0000)]
micrel phy driver - updated(1)
Hello all:
This patch fixes what Ben mentioned, namely duplicated ids.
From: David J. Choi <david.choi@micrel.com>
Body of the explanation: This patch makes the following changes:
-support the interrupt from phy devices from Micrel Inc.
-support more phy devices, ks8737, ks8721, ks8041, ks8051 from Micrel.
-remove vsc8201 because this device was used only for internal testing at Micrel.
Signed-off-by: David J. Choi <david.choi@micrel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stanislaw Gruszka [Sun, 27 Jun 2010 23:31:34 +0000 (23:31 +0000)]
qlcnic: fail when try to setup unsupported features
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stanislaw Gruszka [Sun, 27 Jun 2010 23:33:29 +0000 (23:33 +0000)]
netxen: fail when try to setup unsupported features
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stanislaw Gruszka [Sun, 27 Jun 2010 23:28:11 +0000 (23:28 +0000)]
bnx2x: fail when try to setup unsupported features
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stanislaw Gruszka [Sun, 27 Jun 2010 23:29:42 +0000 (23:29 +0000)]
vmxnet3: fail when try to setup unsupported features
Return EOPNOTSUPP in ethtool_ops->set_flags.
Fix coding style while at it.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stanislaw Gruszka [Sun, 27 Jun 2010 23:26:23 +0000 (23:26 +0000)]
e1000e: fail when try to setup unsupported features
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sjur Braendeland [Tue, 29 Jun 2010 07:08:21 +0000 (00:08 -0700)]
caif-driver: Add CAIF-SPI Protocol driver.
This patch introduces the CAIF SPI Protocol Driver for
CAIF Link Layer.
This driver implements a platform driver to accommodate a
platform specific SPI device. A general platform driver is not
possible as there is no SPI slave-side kernel API defined.
A sample CAIF SPI Platform device can be found in
.../Documentation/networking/caif/spi_porting.txt
Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sjur Braendeland [Sat, 26 Jun 2010 11:31:28 +0000 (11:31 +0000)]
caif: Kconfig and Makefile fixes
Use "depends on" instead of "if" in Kconfig files.
Fixed CAIF debug flag, and removed unnecessary clean-* options.
Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Casey Leedom [Fri, 25 Jun 2010 12:15:33 +0000 (12:15 +0000)]
cxgb4vf: Stitch new T4 PCI-E SR-IOV Virtual Function driver into the build
Stitch new T4 PCI-E SR-IOV Virtual Function driver into the build.
Signed-off-by: Casey Leedom
Signed-off-by: David S. Miller <davem@davemloft.net>
Casey Leedom [Fri, 25 Jun 2010 12:14:57 +0000 (12:14 +0000)]
cxgb4vf: Add new Makefile for T4 PCI-E SR-IOV Virtual Function driver cxgb4vf
Add new Makefile for T4 PCI-E SR-IOV Virtual Function driver "cxgb4vf".
Signed-off-by: Casey Leedom
Signed-off-by: David S. Miller <davem@davemloft.net>
Casey Leedom [Fri, 25 Jun 2010 12:14:15 +0000 (12:14 +0000)]
cxgb4vf: Add main T4 PCI-E SR-IOV Virtual Function driver for cxgb4vf
Add main T4 PCI-E SR-IOV Virtual Function driver for "cxgb4vf".
Signed-off-by: Casey Leedom
Signed-off-by: David S. Miller <davem@davemloft.net>
Casey Leedom [Fri, 25 Jun 2010 12:13:28 +0000 (12:13 +0000)]
cxgb4vf: Add T4 Virtual Function Scatter-Gather Engine DMA code
Add T4 Virtual Function Scatter-Gather Engine DMA code.
Signed-off-by: Casey Leedom
Signed-off-by: David S. Miller <davem@davemloft.net>
Casey Leedom [Fri, 25 Jun 2010 12:12:54 +0000 (12:12 +0000)]
cxgb4vf: Add core T4 PCI-E SR-IOV Virtual Function hardware definitions and device communication code
Add core T4 PCI-E SR-IOV Virtual Function hardware definitions and device
communication code.
Signed-off-by: Casey Leedom
Signed-off-by: David S. Miller <davem@davemloft.net>
Casey Leedom [Fri, 25 Jun 2010 12:11:46 +0000 (12:11 +0000)]
cxgb4vf: Add code to provision T4 PCI-E SR-IOV Virtual Functions with hardware resources
Add code to provision T4 PCI-E SR-IOV Virtual Functions with hardware
resources.
Signed-off-by: Casey Leedom
Signed-off-by: David S. Miller <davem@davemloft.net>
Casey Leedom [Fri, 25 Jun 2010 12:11:05 +0000 (12:11 +0000)]
cxgb4vf: Add new macros and definitions for hardware constants
Add new macros and definitions for hardware constants.
Signed-off-by: Casey Leedom
Signed-off-by: David S. Miller <davem@davemloft.net>
Casey Leedom [Fri, 25 Jun 2010 12:10:32 +0000 (12:10 +0000)]
cxgb4vf: update to latest T4 firmware API file
Update to latest T4 firmware API file.
Signed-off-by: Casey Leedom
Signed-off-by: David S. Miller <davem@davemloft.net>
Casey Leedom [Fri, 25 Jun 2010 12:09:38 +0000 (12:09 +0000)]
cxgb4vf: small changes to message processing structures/macros
Split cpl_tx_pkt_lso into core message structure and encapsulated message,
make RSPD_LEN macro match other response descriptor macros.
Signed-off-by: Casey Leedom
Signed-off-by: David S. Miller <davem@davemloft.net>
David Daney [Thu, 24 Jun 2010 09:14:48 +0000 (09:14 +0000)]
netdev: mdio-octeon: Fix section mismatch errors.
We started getting:
WARNING: vmlinux.o(.data+0x20bd0): Section mismatch in reference from
the variable octeon_mdiobus_driver to the function
.init.text:octeon_mdiobus_probe()
This fixes it.
Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Daney [Thu, 24 Jun 2010 09:14:47 +0000 (09:14 +0000)]
netdev: octeon_mgmt: Fix section mismatch errors.
We started getting:
WARNING: drivers/net/built-in.o(.data+0x10f0): Section mismatch in
reference from the variable octeon_mgmt_driver to the function
.init.text:octeon_mgmt_probe()
This fixes it.
Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Changli Gao [Thu, 24 Jun 2010 16:25:12 +0000 (16:25 +0000)]
act_mirred: don't clone skb when skb isn't shared
don't clone skb when skb isn't shared
When the tcf_action is TC_ACT_STOLEN, and the skb isn't shared, we don't need
to clone a new skb. As the skb will be freed after this function returns, we
can use it freely once we get a reference to it.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
include/net/sch_generic.h | 11 +++++++++--
net/sched/act_mirred.c | 6 +++---
2 files changed, 12 insertions(+), 5 deletions(-)
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 24 Jun 2010 01:00:22 +0000 (01:00 +0000)]
tcp: tso_fragment() might avoid GFP_ATOMIC
We can pass a gfp argument to tso_fragment() and avoid GFP_ATOMIC
allocations sometimes.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 24 Jun 2010 00:55:06 +0000 (00:55 +0000)]
vlan: 64 bit rx counters
Use u64_stats_sync infrastructure to implement 64bit rx stats.
(tx stats are addressed later)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 24 Jun 2010 00:54:21 +0000 (00:54 +0000)]
macvlan: 64 bit rx counters
Use u64_stats_sync infrastructure to implement 64bit stats.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 24 Jun 2010 00:54:06 +0000 (00:54 +0000)]
net: u64_stats_fetch_begin_bh() and u64_stats_fetch_retry_bh()
- Must disable preemption in case of 32bit UP in u64_stats_fetch_begin()
and u64_stats_fetch_retry()
- Add new u64_stats_fetch_begin_bh() and u64_stats_fetch_retry_bh() for
network usage, disabling BH on 32bit UP only.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
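The fetch_begin/fetch_retry pattern named above is a sequence-counter read loop. The following is a user-space analogue using C11 atomics, not the kernel's u64_stats_sync code: it assumes a single writer, stores the 64-bit counter as two 32-bit halves to mimic a 32-bit machine, and uses sequentially consistent atomics throughout instead of the kernel's preemption or BH disabling. All demo_ names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

struct demo_stats {
	atomic_uint seq;		/* even: stable, odd: update in progress */
	_Atomic uint32_t lo, hi;	/* 64-bit counter kept as two halves */
};

/* Writer side (single writer assumed): bump seq around the update. */
static void demo_stats_add(struct demo_stats *s, uint64_t delta)
{
	uint64_t v;

	assert(s != NULL);
	v = ((uint64_t)atomic_load(&s->hi) << 32) | atomic_load(&s->lo);
	v += delta;

	atomic_fetch_add(&s->seq, 1);		/* begin: seq becomes odd */
	atomic_store(&s->lo, (uint32_t)v);
	atomic_store(&s->hi, (uint32_t)(v >> 32));
	atomic_fetch_add(&s->seq, 1);		/* end: seq becomes even */
}

/* Reader side: retry if an update was in progress or completed meanwhile. */
static uint64_t demo_stats_read(struct demo_stats *s)
{
	unsigned int start;
	uint64_t v;

	do {
		start = atomic_load(&s->seq);
		v = ((uint64_t)atomic_load(&s->hi) << 32) |
		    atomic_load(&s->lo);
	} while ((start & 1u) || atomic_load(&s->seq) != start);

	return v;
}
```

The reader never blocks the writer; it simply repeats the read until it observes the same even sequence number before and after, which guarantees the two halves belong to the same update.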
Eric Dumazet [Thu, 24 Jun 2010 00:52:37 +0000 (00:52 +0000)]
net: use this_cpu_ptr()
use this_cpu_ptr(p) instead of per_cpu_ptr(p, smp_processor_id())
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 24 Jun 2010 00:04:38 +0000 (00:04 +0000)]
net: u64_stats_sync improvements
- Add a comment about interrupts:
6) If counter might be written by an interrupt, readers should block
interrupts.
- Fix a typo in sample of use.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Wed, 23 Jun 2010 13:54:31 +0000 (13:54 +0000)]
3c59x: Specify window explicitly for access to windowed registers
Currently much of the code assumes that a specific window has been
selected, while a few functions save and restore the window. This
makes it impossible to introduce fine-grained locking.
Make those assumptions explicit by introducing wrapper functions
to set the window and read/write a register. Use these everywhere
except vortex_interrupt(), vortex_start_xmit() and vortex_rx().
These set the window just once, or not at all in the case of
vortex_rx() as it should always be called from vortex_interrupt().
Cache the current window in struct vortex_private to avoid
unnecessary hardware writes.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Tested-by: Arne Nordmark <nordmark@mech.kth.se> [against 2.6.32]
Signed-off-by: David S. Miller <davem@davemloft.net>
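A rough sketch of the wrapper idea described above, with a simulated register file in place of real I/O. struct demo_nic and the demo_window_* helpers are hypothetical; the point is that every access names its window explicitly, and the cached window avoids redundant select writes.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NUM_WINDOWS	8
#define DEMO_WINDOW_REGS	16

struct demo_nic {
	int window;			/* cached window, -1 if unknown */
	unsigned int hw_writes;		/* counts simulated select commands */
	uint16_t regs[DEMO_NUM_WINDOWS][DEMO_WINDOW_REGS];
};

/* Select a window, skipping the hardware write if it is already active. */
static void demo_window_set(struct demo_nic *nic, int window)
{
	assert(window >= 0 && window < DEMO_NUM_WINDOWS);
	if (window != nic->window) {
		nic->hw_writes++;	/* stands in for the select command */
		nic->window = window;
	}
}

static uint16_t demo_window_read16(struct demo_nic *nic, int window, int reg)
{
	assert(reg >= 0 && reg < DEMO_WINDOW_REGS);
	demo_window_set(nic, window);
	return nic->regs[nic->window][reg];
}

static void demo_window_write16(struct demo_nic *nic, uint16_t value,
				int window, int reg)
{
	assert(reg >= 0 && reg < DEMO_WINDOW_REGS);
	demo_window_set(nic, window);
	nic->regs[nic->window][reg] = value;
}
```

Because callers no longer depend on whichever window happened to be selected earlier, a lock can later be taken inside these helpers without auditing every call site.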
Amerigo Wang [Mon, 21 Jun 2010 22:50:17 +0000 (22:50 +0000)]
mlx4: add dynamic LRO disable support
This patch adds dynamic LRO disable support for mlx4 net driver.
It also fixes a bug of mlx4, which checks NETIF_F_LRO flag in rx
path without rtnl lock.
(I don't have an mlx4 card, so I only compile-tested this. Anyone who wants
to test this is more than welcome.)
This is based on Neil's initial work too, and heavily modified based
on Stanislaw's suggestions.
Signed-off-by: WANG Cong <amwang@redhat.com>
Signed-off-by: Neil Horman <nhorman@redhat.com>
Acked-by: Neil Horman <nhorman@redhat.com>
Reviewed-by: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Mason [Thu, 24 Jun 2010 18:45:10 +0000 (18:45 +0000)]
s2io: add dynamic LRO disable support
This patch adds dynamic LRO disable support for s2io net driver,
enables LRO by default, increases the driver version number, and
corrects the name of the LRO modparm.
This is mostly Wang's patch based on Neil's initial work, heavily
modified based on Ramkrishna's suggestions. This has been tested on
a Neterion Xframe adapter and verified via adapter LRO statistics.
Signed-off-by: Jon Mason <jon.mason@exar.com>
Signed-off-by: WANG Cong <amwang@redhat.com>
Signed-off-by: Neil Horman <nhorman@redhat.com>
Acked-by: Neil Horman <nhorman@redhat.com>
Reviewed-by: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Ramkrishna Vepa <Ramkrishna.Vepa@exar.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Mon, 21 Jun 2010 11:48:45 +0000 (11:48 +0000)]
syncookies: add support for ECN
Allows use of ECN when syncookies are in effect by encoding ecn_ok
into the syn-ack tcp timestamp.
While at it, remove an unneeded #ifdef CONFIG_SYN_COOKIES.
With CONFIG_SYN_COOKIES=n, want_cookie is ifdef'd to 0 and gcc
removes the "if (0)".
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Mon, 21 Jun 2010 11:48:44 +0000 (11:48 +0000)]
syncookies: do not store rcv_wscale in tcp timestamp
As pointed out by Fernando Gont there is no need to encode rcv_wscale
into the cookie.
We did not use the restored rcv_wscale anyway; it is recomputed
via tcp_select_initial_window().
Thus we can save 4 bits in the ts option space by removing rcv_wscale.
In case window scaling was not supported, we set the (invalid) wscale
value 0xf.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
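The two syncookie commits above store TCP options in the low bits of the timestamp. The sketch below shows one way such an encoding can work. The bit layout is an assumption made for illustration (4 bits of send window scale with 0xf meaning "no window scaling", then one bit each for SACK and ECN), and it omits any adjustment needed to keep the timestamp monotonic; struct demo_opts and the demo_ts_* functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_TSBITS	6
#define DEMO_TSMASK	((1u << DEMO_TSBITS) - 1)

struct demo_opts {
	uint8_t wscale_ok;	/* peer supports window scaling */
	uint8_t snd_wscale;	/* valid only if wscale_ok, at most 14 */
	uint8_t sack_ok;
	uint8_t ecn_ok;
};

/* Replace the low DEMO_TSBITS bits of the timestamp with the options. */
static uint32_t demo_ts_encode(uint32_t ts, const struct demo_opts *o)
{
	uint32_t options;

	assert(!o->wscale_ok || o->snd_wscale <= 14);
	options = o->wscale_ok ? o->snd_wscale : 0xfu;	/* 0xf: invalid wscale */
	options |= (uint32_t)(o->sack_ok != 0) << 4;
	options |= (uint32_t)(o->ecn_ok != 0) << 5;

	return (ts & ~DEMO_TSMASK) | options;
}

/* Recover the options from the echoed timestamp. */
static void demo_ts_decode(uint32_t tsecr, struct demo_opts *o)
{
	o->sack_ok = (tsecr >> 4) & 1u;
	o->ecn_ok = (tsecr >> 5) & 1u;

	if ((tsecr & 0xfu) == 0xfu) {
		o->wscale_ok = 0;
		o->snd_wscale = 0;
	} else {
		o->wscale_ok = 1;
		o->snd_wscale = tsecr & 0xfu;
	}
}
```

The value 0xf is safe as a "not supported" marker because the largest valid TCP window scale is 14.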
Eric Dumazet [Wed, 23 Jun 2010 02:51:25 +0000 (02:51 +0000)]
ipv6: remove ipv6_statistics
commit 9261e5370112 (ipv6: making ip and icmp statistics per/namespace)
forgot to remove ipv6_statistics variable.
commit bc417d99bf27 (ipv6: remove stale MIB definitions) took care of
icmpv6_statistics & icmpv6msg_statistics
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Denis V. Lunev <den@openvz.org>
CC: Alexey Dobriyan <adobriyan@gmail.com>
CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 22 Jun 2010 20:58:41 +0000 (20:58 +0000)]
snmp: add align parameter to snmp_mib_init()
In preparation for 64bit snmp counters for some mibs,
add an 'align' parameter to snmp_mib_init(), instead
of assuming mibs only contain 'unsigned long' fields.
Callers can use __alignof__(type) to provide correct
alignment.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
CC: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 22 Jun 2010 12:44:11 +0000 (12:44 +0000)]
loopback: use u64_stats_sync infrastructure
Commit 6b10de38f0ef (loopback: Implement 64bit stats on 32bit arches)
introduced 64bit stats in loopback driver, using a private seqcount and
private helpers.
David suggested to introduce a generic infrastructure, added in (net:
Introduce u64_stats_sync infrastructure)
This patch reimplements loopback 64bit stats using the u64_stats_sync
infrastructure.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 22 Jun 2010 07:43:15 +0000 (07:43 +0000)]
arp: RCU change in arp_solicit()
Avoid two atomic ops in arp_solicit()
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Tue, 22 Jun 2010 01:14:35 +0000 (01:14 +0000)]
dccp: make implementation of Syn-RTT symmetric
This patch is thanks to Andre Noll who reported the issue and helped testing.
The Syn-RTT sampled during the initial handshake currently only works for
the client sending the DCCP-Request. TFRC penalizes the absence of an RTT
sample with a very slow initial speed (1 packet per second), which delays
slow-start significantly, resulting in sluggish performance.
This patch mirrors the "Syn RTT" principle by adding a timestamp also onto
the DCCP-Response, producing an RTT sample when the (Data)Ack completing
the handshake arrives.
Also changed the documentation to 'TFRC' since Syn RTTs are also used by CCID-4.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Tue, 22 Jun 2010 01:14:34 +0000 (01:14 +0000)]
dccp: remove unused function argument
This removes an unused 'sk' argument from several option-inserting functions.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Benard [Fri, 18 Jun 2010 04:19:54 +0000 (04:19 +0000)]
net/fec: clean suspend/resume
Commit 59d4289b83b11379d867e2f7146904b19cc96404 converted fec to dev_pm_ops but
didn't update the suspend/resume functions, leading to the following warning
when CONFIG_PM is set: "initialization from incompatible pointer type".
This patch also fixes some indentation and style issues around the CONFIG_PM area.
Signed-off-by: Eric Bénard <eric@eukrea.com>
Cc: netdev@vger.kernel.org
Cc: davem@davemloft.net
Cc: amit.kucheria@canonical.com
Cc: s.hauer@pengutronix.de
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Divy Le Ray [Mon, 21 Jun 2010 15:54:53 +0000 (15:54 +0000)]
cxgb3: request 7.10 firmware
The driver requests FW 7.10
Bump up driver version.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Divy Le Ray [Mon, 21 Jun 2010 15:54:48 +0000 (15:54 +0000)]
cxgb3: update FW to 7.10
Update FW to 7.10
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Mon, 21 Jun 2010 12:29:14 +0000 (12:29 +0000)]
net/core/pktgen.c: Use pr_<level>
Add pr_fmt(fmt) KBUILD_MODNAME ": " fmt
Remove "pktgen: " from formats
Convert printks to pr_<level>
Added func_enter() for debugging
Moved version to end of string at module_init
Coalesced long formats
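A rough userspace sketch of how the pr_fmt() prefix works. The pr_fmt definition is the one quoted above; toy_pr_info and toy_report_threads are hypothetical stand-ins that format into a buffer instead of the kernel log, and KBUILD_MODNAME is defined by hand here because there is no kernel build system.

```c
#include <stddef.h>
#include <stdio.h>

#define KBUILD_MODNAME "pktgen"   /* normally supplied by the kernel build */
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

/*
 * Hypothetical stand-in for pr_info(): the prefix is glued onto the
 * format string at compile time by string literal concatenation.
 * ##__VA_ARGS__ is a GNU extension (accepted by gcc and clang).
 */
#define toy_pr_info(buf, size, fmt, ...) \
    snprintf(buf, size, pr_fmt(fmt), ##__VA_ARGS__)

/* The message itself no longer needs to spell out "pktgen: ". */
int toy_report_threads(char *buf, size_t size, int nthreads)
{
    return toy_pr_info(buf, size, "started %d threads\n", nthreads);
}
```

Because the prefix lives in one macro, every message in the file gets it without repeating the module name in each format string.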
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hagen Paul Pfeifer [Sat, 19 Jun 2010 17:05:36 +0000 (17:05 +0000)]
net: optimize Berkeley Packet Filter (BPF) processing
Gcc is currently unable to optimize the switch statement in
sk_run_filter() because the case labels are OR'd opcode values that do not
form a compact sequence. This patch replaces the OR'd labels with sequentially
ordered case labels. The sk_chk_filter() function is modified to replace the
original OPCODES with an ordered but equivalent form. gcc is now able to
transform the switch statement in sk_run_filter() into a jump table of
complexity O(1).
Until this patch gcc generated a sequence of conditional branches (O(n)) with
a .text segment size of 567 bytes (arch x86_64):
7ff: 8b 06 mov (%rsi),%eax
801: 66 83 f8 35 cmp $0x35,%ax
805: 0f 84 d0 02 00 00 je adb <sk_run_filter+0x31d>
80b: 0f 87 07 01 00 00 ja 918 <sk_run_filter+0x15a>
811: 66 83 f8 15 cmp $0x15,%ax
815: 0f 84 c5 02 00 00 je ae0 <sk_run_filter+0x322>
81b: 77 73 ja 890 <sk_run_filter+0xd2>
81d: 66 83 f8 04 cmp $0x4,%ax
821: 0f 84 17 02 00 00 je a3e <sk_run_filter+0x280>
827: 77 29 ja 852 <sk_run_filter+0x94>
829: 66 83 f8 01 cmp $0x1,%ax
[...]
With the modification the compiler translates the switch statement into
the following jump table fragment:
7ff: 66 83 3e 2c cmpw $0x2c,(%rsi)
803: 0f 87 1f 02 00 00 ja a28 <sk_run_filter+0x26a>
809: 0f b7 06 movzwl (%rsi),%eax
80c: ff 24 c5 00 00 00 00 jmpq *0x0(,%rax,8)
813: 44 89 e3 mov %r12d,%ebx
816: e9 43 03 00 00 jmpq b5e <sk_run_filter+0x3a0>
81b: 41 89 dc mov %ebx,%r12d
81e: e9 3b 03 00 00 jmpq b5e <sk_run_filter+0x3a0>
Furthermore, I reordered the instructions to reduce cache line misses by
moving the most common instructions to the start.
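The translation step can be illustrated with a toy interpreter. Everything below is a hypothetical sketch (a five-instruction subset with made-up TOY_* names), not the kernel code: a checker rewrites the sparse OR'd opcodes into consecutive values once, and the run loop then switches on the consecutive values, which a compiler is free to lower to a jump table.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy subset of classic-BPF-style opcodes: class | operation | source. */
#define TOY_ALU 0x04
#define TOY_RET 0x06
#define TOY_ADD 0x00
#define TOY_SUB 0x10
#define TOY_MUL 0x20
#define TOY_K   0x00
#define TOY_A   0x10

/* Consecutive replacements; 0 is reserved so unknown opcodes map to it. */
enum toy_dense {
    TOY_S_INVALID, TOY_S_ADD_K, TOY_S_SUB_K, TOY_S_MUL_K,
    TOY_S_RET_K, TOY_S_RET_A
};

struct toy_insn {
    uint16_t code;
    uint32_t k;
};

static const uint8_t toy_dense_of[256] = {
    [TOY_ALU | TOY_ADD | TOY_K] = TOY_S_ADD_K,   /* 0x04 */
    [TOY_ALU | TOY_SUB | TOY_K] = TOY_S_SUB_K,   /* 0x14 */
    [TOY_ALU | TOY_MUL | TOY_K] = TOY_S_MUL_K,   /* 0x24 */
    [TOY_RET | TOY_K]           = TOY_S_RET_K,   /* 0x06 */
    [TOY_RET | TOY_A]           = TOY_S_RET_A,   /* 0x16 */
};

/*
 * Checker: validate the program once and rewrite every opcode in place.
 * Returns 0 on success, -1 if the program is rejected (it may then be
 * partially rewritten and must not be run).
 */
int toy_check(struct toy_insn *prog, size_t len)
{
    size_t i;

    if (len == 0)
        return -1;
    for (i = 0; i < len; i++) {
        uint16_t code = prog[i].code;

        if (code > 0xff || toy_dense_of[code] == TOY_S_INVALID)
            return -1;
        prog[i].code = toy_dense_of[code];
    }
    /* The last instruction must be a return. */
    if (prog[len - 1].code != TOY_S_RET_K && prog[len - 1].code != TOY_S_RET_A)
        return -1;
    return 0;
}

/* Interpreter: case labels are now 1, 2, 3, ... without gaps. */
uint32_t toy_run(const struct toy_insn *prog, size_t len, uint32_t acc)
{
    size_t i;

    for (i = 0; i < len; i++) {
        switch (prog[i].code) {
        case TOY_S_ADD_K: acc += prog[i].k; break;
        case TOY_S_SUB_K: acc -= prog[i].k; break;
        case TOY_S_MUL_K: acc *= prog[i].k; break;
        case TOY_S_RET_K: return prog[i].k;
        case TOY_S_RET_A: return acc;
        default:          return 0;
        }
    }
    return 0;
}
```

The cost of the translation is paid once per filter in the checker, while the saving applies to every instruction of every packet that runs through the interpreter.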
Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Fri, 25 Jun 2010 07:06:29 +0000 (07:06 +0000)]
sfc: Log clearer error messages for hardware monitor
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Fri, 25 Jun 2010 07:05:56 +0000 (07:05 +0000)]
sfc: Use Toeplitz IPv4 hash for RSS and hash insertion
Insertion of the Falcon hash is unreliable.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Fri, 25 Jun 2010 07:05:43 +0000 (07:05 +0000)]
sfc: Move siena_nic_data::ipv6_rss_key to efx_nic::rx_hash_key
We will use this hash key for Toeplitz IPv4 hashing too.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Fri, 25 Jun 2010 07:05:33 +0000 (07:05 +0000)]
sfc: Fix reading of inserted hash
The hash appears immediately before the packet data, not at the
beginning of the buffer. This means we can easily use negative offsets
from the start of packet data, so adjust the data and length at the
top of __efx_rx_packet() instead of wherever we consume the hash.
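A small sketch of the "negative offset" idea, under assumptions that are not taken from the driver: a 16-byte prefix in front of the packet, a 32-bit big-endian hash in its last four bytes, and hypothetical toy_* names.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_HASH_PREFIX_LEN 16   /* hypothetical prefix size */

/*
 * Skip the prefix once, at the top of the receive path. Returns a
 * pointer to the start of packet data and shrinks *len accordingly,
 * or NULL if the buffer is too short to contain the prefix.
 */
const uint8_t *toy_skip_prefix(const uint8_t *buf, size_t *len)
{
    if (*len < TOY_HASH_PREFIX_LEN)
        return NULL;
    *len -= TOY_HASH_PREFIX_LEN;
    return buf + TOY_HASH_PREFIX_LEN;
}

/*
 * Read the hash that sits immediately before the packet data.
 * "data" must be a pointer returned by toy_skip_prefix(), so the four
 * bytes in front of it are known to be valid.
 */
uint32_t toy_rx_hash(const uint8_t *data)
{
    return ((uint32_t)data[-4] << 24) | ((uint32_t)data[-3] << 16) |
           ((uint32_t)data[-2] << 8)  |  (uint32_t)data[-1];
}
```

Once the data pointer and length have been adjusted up front, later consumers only ever see the start of packet data and can still reach the hash without knowing the prefix size.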
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasanthy Kolluri [Thu, 24 Jun 2010 10:52:26 +0000 (10:52 +0000)]
enic: Clean ups
1) Update copyright
2) Fix hardware queue descriptor field size CQ_ENET_RQ_DESC_FCOE_SOF_BITS
3) Include rtnetlink.h instead of if_link.h
4) Selectively flush writes to interrupt mask register
5) Use pci_enable_device_mem
6) Remove unused variables and header files
7) Fix size mismatch between memory alloc and free operations of a variable
8) Check for non null arguments to vic_provinfo_alloc
Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com>
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasanthy Kolluri [Thu, 24 Jun 2010 10:52:08 +0000 (10:52 +0000)]
enic: Bug Fix: Handle surprise hardware removals
Handle surprise hardware removals gracefully during devcmd issue and init,
cleanup of queues.
Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com>
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasanthy Kolluri [Thu, 24 Jun 2010 10:51:59 +0000 (10:51 +0000)]
enic: Feature Add: Add loopback capability to enic devices
Hardware has the loopback capability to queue the packets transmitted from
a device to the receive queue of the same device. enic now supports the
loopback capability.
Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com>
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasanthy Kolluri [Thu, 24 Jun 2010 10:51:51 +0000 (10:51 +0000)]
enic: Use receive queue buffer blocks of 32/64 entries
Change the receive queue buffer allocations into blocks of 32 entries when
ring size is less than 64, otherwise use 64 entries per block.
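The block sizing rule can be written down as a short sketch. The names are hypothetical and the round-up of the block count is an assumption; only the 32/64 thresholds come from the description above.

```c
#define TOY_BLK_MIN_ENTRIES  32u
#define TOY_BLK_DFLT_ENTRIES 64u

/* Entries per buffer block: 32 for rings smaller than 64, else 64. */
unsigned int toy_blk_entries(unsigned int ring_size)
{
    return ring_size < TOY_BLK_DFLT_ENTRIES ? TOY_BLK_MIN_ENTRIES
                                            : TOY_BLK_DFLT_ENTRIES;
}

/* Number of blocks needed to cover the whole ring (rounded up). */
unsigned int toy_blks_needed(unsigned int ring_size)
{
    unsigned int per_blk = toy_blk_entries(ring_size);

    return (ring_size + per_blk - 1) / per_blk;
}
```

For example, a ring of 48 entries would use two blocks of 32, while a ring of 4096 entries would use 64 blocks of 64.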
Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com>
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasanthy Kolluri [Thu, 24 Jun 2010 10:51:43 +0000 (10:51 +0000)]
enic: Add new firmware devcmds
Add new firmware devcmds - CMD_PROXY_BY_BDF, CMD_PACKET_FILTER_ALL,
CMD_ENABLE_WAIT.
Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com>
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasanthy Kolluri [Thu, 24 Jun 2010 10:50:56 +0000 (10:50 +0000)]
enic: Use (netdev|dev|pr)_<level> macro helpers for logging
Replace all printk routines with the (netdev|dev|pr)_<level> macros that
provide verbose logs.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com>
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasanthy Kolluri [Thu, 24 Jun 2010 10:50:12 +0000 (10:50 +0000)]
enic: Clean up: Add wrapper routines for firmware devcmd calls
Add wrapper routines that issue devcmds to firmware and ensure that a
devcmd lock is held for each devcmd call.
Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com>
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasanthy Kolluri [Thu, 24 Jun 2010 10:50:00 +0000 (10:50 +0000)]
enic: Use a lighter reset operation for enic devices
The port profile information for a dynamic enic device is set by the upper
layers, that are oblivious to the device reset operation. We do not want a
reset operation erase the network state of a dynamic enic device as there
is no way to set up the port profile information again. Hence a lighter
reset operation called hang reset is used. Hang reset, unlike soft reset
does not reset the network state and resets the host side state only.
Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com>
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasanthy Kolluri [Thu, 24 Jun 2010 10:49:51 +0000 (10:49 +0000)]
enic: Bug Fix: Change hardware ingress vlan rewrite mode
The current ingress vlan rewrite mode setting lets the hardware strip off
the tag control information of a packet received on native vlan. As a
result, the priority bits are also lost. The fix is to change the ingress
vlan rewrite mode setting such that the complete tag control information is
retained for packets that belong to native vlan.
Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com>
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasanthy Kolluri [Thu, 24 Jun 2010 10:49:25 +0000 (10:49 +0000)]
enic: Feature Add: Replace LRO with GRO
enic now uses the GRO mechanism instead of LRO to pass skbs to upper
layers.
Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com>
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Thu, 24 Jun 2010 14:58:42 +0000 (14:58 +0000)]
cnic: Update version to 2.1.3.
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Thu, 24 Jun 2010 14:58:41 +0000 (14:58 +0000)]
cnic: Further unify kcq handling code.
This eliminates some of the duplicate code for the various devices
that require the same basic kcq handling.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Thu, 24 Jun 2010 14:58:40 +0000 (14:58 +0000)]
cnic: Restructure kcq processing.
By doing more work in the common function cnic_get_kcqes(), and
making full use of the kcq_info structure.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Thu, 24 Jun 2010 14:58:39 +0000 (14:58 +0000)]
cnic: Unify kcq allocation for all devices.
By creating a common data structure kcq_info for all devices, the kcq
(kernel completion queue) for all devices can be allocated by common
code.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Thu, 24 Jun 2010 14:58:38 +0000 (14:58 +0000)]
cnic: Unify IRQ code for all hardware types.
By creating a common cnic_doirq().
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Thu, 24 Jun 2010 14:58:37 +0000 (14:58 +0000)]
cnic: Fine-tune CID memory space calculation.
The current code makes assumptions about the CID (context ID) memory
space and starting CID that may not be always correct when firmware
changes. In particular, BNX2_ISCSI_START_CID may not always be fixed.
We now calculate cp->max_cid_space and cp->iscsi_start_cid dynamically
instead of using fixed constants. The unused cp->max_iscsi_conn is also
eliminated.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Wed, 23 Jun 2010 11:31:28 +0000 (11:31 +0000)]
sfc: Record hardware RX hash on each skb where possible
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Wed, 23 Jun 2010 11:30:35 +0000 (11:30 +0000)]
sfc: Disable setting feature flags that are not implemented
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Wed, 23 Jun 2010 11:30:26 +0000 (11:30 +0000)]
sfc: Replace EFX_DRIVER_NAME with KBUILD_MODNAME
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Wed, 23 Jun 2010 11:30:07 +0000 (11:30 +0000)]
sfc: Implement message level control
Replace EFX_ERR() with netif_err(), EFX_INFO() with netif_info(),
EFX_LOG() with netif_dbg() and EFX_TRACE() and EFX_REGDUMP() with
netif_vdbg().
Replace EFX_ERR_RL(), EFX_INFO_RL() and EFX_LOG_RL() using explicit
calls to net_ratelimit().
Implement the ethtool operations to get and set message level flags,
and add a 'debug' module parameter for the initial value.
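Message level control boils down to a per-device bitmask that each log call is tested against. The following is a userspace sketch with hypothetical toy_* names and flag values; it only illustrates the gating and is not the kernel's netif_err() implementation.

```c
#include <stdint.h>
#include <stdio.h>

/* Toy message classes, one bit each. */
enum { TOY_MSG_DRV = 0x1, TOY_MSG_PROBE = 0x2, TOY_MSG_LINK = 0x4 };

struct toy_nic {
    const char *name;
    uint32_t msg_enable;   /* initial value from a 'debug' style parameter,
                              changeable later at run time */
};

/*
 * Hypothetical stand-in for a netif_err() style macro: prints only if
 * the message class is enabled for this device. Evaluates to 1 if the
 * message was emitted and 0 if it was filtered out.
 */
#define toy_netif_err(nic, type, ...)                          \
    (((nic)->msg_enable & TOY_MSG_##type)                      \
        ? (fprintf(stderr, "%s: ", (nic)->name),               \
           fprintf(stderr, __VA_ARGS__), 1)                    \
        : 0)

int toy_link_down(struct toy_nic *nic)
{
    return toy_netif_err(nic, LINK, "link down\n");
}

int toy_probe_failed(struct toy_nic *nic, int rc)
{
    return toy_netif_err(nic, PROBE, "probe failed: %d\n", rc);
}
```

Changing msg_enable at run time immediately changes which classes of message are printed, which is the behaviour the ethtool get/set operations expose.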
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Wed, 23 Jun 2010 11:29:24 +0000 (11:29 +0000)]
sfc: Log MTD errors using partition name, not just net device name
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Mon, 21 Jun 2010 03:06:53 +0000 (03:06 +0000)]
sfc: Implement ethtool register dump operation
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Konstantin Khorenko [Fri, 25 Jun 2010 04:54:58 +0000 (21:54 -0700)]
tcp: do not send reset to already closed sockets
I've found that tcp_close() can be called for an already closed
socket, but still sends a reset in this case (tcp_send_active_reset()),
which seems to be incorrect. Moreover, the reset packet is sent
with a different source port, as the original port number has already been
cleared on the socket. Besides that, incrementing the stat counter for
LINUX_MIB_TCPABORTONCLOSE also does not look correct in this case.
Initially this issue was found on a 2.6.18-x RHEL5 kernel, but the same
seems to be true for the current mainstream kernel (checked on
2.6.35-rc3). Please correct me if I missed something.
How that happens:
1) the server receives a packet for socket in TCP_CLOSE_WAIT state
that triggers a tcp_reset():
Call Trace:
<IRQ> [<ffffffff8025b9b9>] tcp_reset+0x12f/0x1e8
[<ffffffff80046125>] tcp_rcv_state_process+0x1c0/0xa08
[<ffffffff8003eb22>] tcp_v4_do_rcv+0x310/0x37a
[<ffffffff80028bea>] tcp_v4_rcv+0x74d/0xb43
[<ffffffff8024ef4c>] ip_local_deliver_finish+0x0/0x259
[<ffffffff80037131>] ip_local_deliver+0x200/0x2f4
[<ffffffff8003843c>] ip_rcv+0x64c/0x69f
[<ffffffff80021d89>] netif_receive_skb+0x4c4/0x4fa
[<ffffffff80032eca>] process_backlog+0x90/0xec
[<ffffffff8000cc50>] net_rx_action+0xbb/0x1f1
[<ffffffff80012d3a>] __do_softirq+0xf5/0x1ce
[<ffffffff8001147a>] handle_IRQ_event+0x56/0xb0
[<ffffffff8006334c>] call_softirq+0x1c/0x28
[<ffffffff80070476>] do_softirq+0x2c/0x85
[<ffffffff80070441>] do_IRQ+0x149/0x152
[<ffffffff80062665>] ret_from_intr+0x0/0xa
<EOI> [<ffffffff80008a2e>] __handle_mm_fault+0x6cd/0x1303
[<ffffffff80008903>] __handle_mm_fault+0x5a2/0x1303
[<ffffffff80033a9d>] cache_free_debugcheck+0x21f/0x22e
[<ffffffff8006a263>] do_page_fault+0x49a/0x7dc
[<ffffffff80066487>] thread_return+0x89/0x174
[<ffffffff800c5aee>] audit_syscall_exit+0x341/0x35c
[<ffffffff80062e39>] error_exit+0x0/0x84
tcp_rcv_state_process()
... // (sk_state == TCP_CLOSE_WAIT here)
...
    /* step 2: check RST bit */
    if(th->rst) {
        tcp_reset(sk);
        goto discard;
    }
...
---------------------------------
tcp_rcv_state_process
 tcp_reset
  tcp_done
   tcp_set_state(sk, TCP_CLOSE);
    inet_put_port
     __inet_put_port
      inet_sk(sk)->num = 0;
   sk->sk_shutdown = SHUTDOWN_MASK;
2) After that the process (socket owner) tries to write something to
that socket and "inet_autobind" sets a _new_ (which differs from
the original!) port number for the socket:
Call Trace:
[<ffffffff80255a12>] inet_bind_hash+0x33/0x5f
[<ffffffff80257180>] inet_csk_get_port+0x216/0x268
[<ffffffff8026bcc9>] inet_autobind+0x22/0x8f
[<ffffffff80049140>] inet_sendmsg+0x27/0x57
[<ffffffff8003a9d9>] do_sock_write+0xae/0xea
[<ffffffff80226ac7>] sock_writev+0xdc/0xf6
[<ffffffff800680c7>] _spin_lock_irqsave+0x9/0xe
[<ffffffff8001fb49>] __pollwait+0x0/0xdd
[<ffffffff8008d533>] default_wake_function+0x0/0xe
[<ffffffff800a4f10>] autoremove_wake_function+0x0/0x2e
[<ffffffff800f0b49>] do_readv_writev+0x163/0x274
[<ffffffff80066538>] thread_return+0x13a/0x174
[<ffffffff800145d8>] tcp_poll+0x0/0x1c9
[<ffffffff800c56d3>] audit_syscall_entry+0x180/0x1b3
[<ffffffff800f0dd0>] sys_writev+0x49/0xe4
[<ffffffff800622dd>] tracesys+0xd5/0xe0
3) sendmsg fails at last with -EPIPE (=> 'write' returns -EPIPE in userspace):
F: tcp_sendmsg1 -EPIPE: sk=ffff81000bda00d0, sport=49847, old_state=7, new_state=7, sk_err=0, sk_shutdown=3
Call Trace:
[<ffffffff80027557>] tcp_sendmsg+0xcb/0xe87
[<ffffffff80033300>] release_sock+0x10/0xae
[<ffffffff8016f20f>] vgacon_cursor+0x0/0x1a7
[<ffffffff8026bd32>] inet_autobind+0x8b/0x8f
[<ffffffff8003a9d9>] do_sock_write+0xae/0xea
[<ffffffff80226ac7>] sock_writev+0xdc/0xf6
[<ffffffff800680c7>] _spin_lock_irqsave+0x9/0xe
[<ffffffff8001fb49>] __pollwait+0x0/0xdd
[<ffffffff8008d533>] default_wake_function+0x0/0xe
[<ffffffff800a4f10>] autoremove_wake_function+0x0/0x2e
[<ffffffff800f0b49>] do_readv_writev+0x163/0x274
[<ffffffff80066538>] thread_return+0x13a/0x174
[<ffffffff800145d8>] tcp_poll+0x0/0x1c9
[<ffffffff800c56d3>] audit_syscall_entry+0x180/0x1b3
[<ffffffff800f0dd0>] sys_writev+0x49/0xe4
[<ffffffff800622dd>] tracesys+0xd5/0xe0
tcp_sendmsg()
...
    /* Wait for a connection to finish. */
    if ((1 << sk->sk_state) & ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)) {
        int old_state = sk->sk_state;
        if ((err = sk_stream_wait_connect(sk, &timeo)) != 0) {
            if (f_d && (err == -EPIPE)) {
                printk("F: tcp_sendmsg1 -EPIPE: sk=%p, sport=%u, old_state=%d, new_state=%d, "
                       "sk_err=%d, sk_shutdown=%d\n",
                       sk, ntohs(inet_sk(sk)->sport), old_state, sk->sk_state,
                       sk->sk_err, sk->sk_shutdown);
                dump_stack();
            }
            goto out_err;
        }
    }
...
4) Then the process (socket owner) understands that it's time to close
that socket and does that (and thus triggers sending reset packet):
Call Trace:
...
[<ffffffff80032077>] dev_queue_xmit+0x343/0x3d6
[<ffffffff80034698>] ip_output+0x351/0x384
[<ffffffff80251ae9>] dst_output+0x0/0xe
[<ffffffff80036ec6>] ip_queue_xmit+0x567/0x5d2
[<ffffffff80095700>] vprintk+0x21/0x33
[<ffffffff800070f0>] check_poison_obj+0x2e/0x206
[<ffffffff80013587>] poison_obj+0x36/0x45
[<ffffffff8025dea6>] tcp_send_active_reset+0x15/0x14d
[<ffffffff80023481>] dbg_redzone1+0x1c/0x25
[<ffffffff8025dea6>] tcp_send_active_reset+0x15/0x14d
[<ffffffff8000ca94>] cache_alloc_debugcheck_after+0x189/0x1c8
[<ffffffff80023405>] tcp_transmit_skb+0x764/0x786
[<ffffffff8025df8a>] tcp_send_active_reset+0xf9/0x14d
[<ffffffff80258ff1>] tcp_close+0x39a/0x960
[<ffffffff8026be12>] inet_release+0x69/0x80
[<ffffffff80059b31>] sock_release+0x4f/0xcf
[<ffffffff80059d4c>] sock_close+0x2c/0x30
[<ffffffff800133c9>] __fput+0xac/0x197
[<ffffffff800252bc>] filp_close+0x59/0x61
[<ffffffff8001eff6>] sys_close+0x85/0xc7
[<ffffffff800622dd>] tracesys+0xd5/0xe0
So, in brief:
* a received packet for a socket in TCP_CLOSE_WAIT state triggers
  tcp_reset(), which clears inet_sk(sk)->num and puts the socket into
  TCP_CLOSE state
* an attempt to write to that socket forces inet_autobind() to get a
  new port (but the write itself fails with -EPIPE)
* tcp_close() called for a socket in TCP_CLOSE state sends an active
  reset via the socket with the newly allocated port
This adds an additional check in tcp_close() for already closed
sockets. We do not want to send anything to closed sockets.
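The sequence above can be replayed with a toy model. This is a sketch under stated assumptions, not the kernel code: the toy_* names are hypothetical, the wait in sk_stream_wait_connect() is elided so any other state fails straight away, and unread data is used as the stand-in reason for tcp_close() to send a reset. Only the state numbers follow the Linux TCP_* values (7 is TCP_CLOSE, matching old_state=7 in the log).

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Numeric values follow the Linux TCP_* state numbering. */
enum { TOY_ESTABLISHED = 1, TOY_CLOSE = 7, TOY_CLOSE_WAIT = 8 };

struct toy_sock {
    int state;
    uint16_t port;       /* 0 means "no local port bound" */
    bool data_unread;    /* hypothetical reason to abort with a reset */
    int resets_sent;
};

/* Step 1: an incoming RST closes the socket and releases its port. */
void toy_reset(struct toy_sock *sk)
{
    sk->state = TOY_CLOSE;
    sk->port = 0;
}

/*
 * Steps 2 and 3: a write first autobinds a fresh port, then fails
 * because the socket is neither ESTABLISHED nor CLOSE_WAIT.
 */
int toy_write(struct toy_sock *sk, uint16_t fresh_port)
{
    if (sk->port == 0)
        sk->port = fresh_port;
    if ((1 << sk->state) & ~((1 << TOY_ESTABLISHED) | (1 << TOY_CLOSE_WAIT)))
        return -EPIPE;
    return 0;
}

/*
 * Step 4: close. With check_closed set, an already closed socket
 * sends nothing, which models the additional check.
 */
void toy_close(struct toy_sock *sk, bool check_closed)
{
    if (check_closed && sk->state == TOY_CLOSE)
        return;
    if (sk->data_unread)
        sk->resets_sent++;   /* would leave from the freshly bound port */
    sk->state = TOY_CLOSE;
}
```

Running the four steps with check_closed false reproduces the stray reset from the new port; with check_closed true the closed socket stays silent.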
Signed-off-by: Konstantin Khorenko <khorenko@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>