GitHub/exynos8895/android_kernel_samsung_universal8895.git
14 years ago  rtnetlink: Link address family API
Thomas Graf [Tue, 16 Nov 2010 04:30:14 +0000 (04:30 +0000)]
rtnetlink: Link address family API

Each net_device contains address family specific data such as
per device settings and statistics. We already expose this data
via procfs/sysfs and partially netlink.

The netlink method requires the requester to send one RTM_GETLINK
request for each address family it wishes to receive data of
and then merge this data itself.

This patch implements a new API which combines all address family
specific link data in a new netlink attribute IFLA_AF_SPEC.
IFLA_AF_SPEC contains a sequence of nested attributes, one for each
address family which in turn defines the structure of its own
attribute. Example:

   [IFLA_AF_SPEC] = {
       [AF_INET] = {
           [IFLA_INET_CONF] = ...,
       },
       [AF_INET6] = {
           [IFLA_INET6_FLAGS] = ...,
           [IFLA_INET6_CONF] = ...,
       }
   }

The API also allows for address families to implement a function
which parses the IFLA_AF_SPEC attribute sent by userspace to
implement address family specific link options.
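
A minimal userspace sketch of the nesting described above, assuming a
simplified length/type attribute header. Every name here (sketch_attr,
sk_put, sk_nest_begin, sk_nest_end) and every type value is hypothetical
and chosen for illustration; this is not the kernel's netlink API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified attribute header: total length (header + payload) and a type. */
struct sketch_attr {
    uint16_t len;
    uint16_t type;
};

/* Attributes are padded to 4-byte boundaries in this sketch. */
#define SK_ALIGN(n) (((size_t)(n) + 3) & ~(size_t)3)

/* Type values chosen for illustration only. */
enum { SK_AF_SPEC = 100, SK_AF_INET = 2, SK_INET_CONF = 1 };

struct sketch_buf {
    unsigned char data[256];
    size_t used;
};

/* Append one attribute; returns 0 on success, -1 if the buffer is full. */
static int sk_put(struct sketch_buf *b, uint16_t type, const void *p,
                  uint16_t plen)
{
    struct sketch_attr hdr = { (uint16_t)(sizeof(hdr) + plen), type };
    size_t need = SK_ALIGN(sizeof(hdr) + plen);

    if (b->used + need > sizeof(b->data))
        return -1;
    memset(b->data + b->used, 0, need);
    memcpy(b->data + b->used, &hdr, sizeof(hdr));
    if (plen)
        memcpy(b->data + b->used + sizeof(hdr), p, plen);
    b->used += need;
    return 0;
}

/* Open a nested attribute; returns the offset of its header. */
static size_t sk_nest_begin(struct sketch_buf *b, uint16_t type)
{
    size_t off = b->used;

    return sk_put(b, type, NULL, 0) == 0 ? off : (size_t)-1;
}

/* Close a nest: patch its length to cover everything appended since begin. */
static void sk_nest_end(struct sketch_buf *b, size_t off)
{
    struct sketch_attr hdr;

    memcpy(&hdr, b->data + off, sizeof(hdr));
    hdr.len = (uint16_t)(b->used - off);
    memcpy(b->data + off, &hdr, sizeof(hdr));
}

/*
 * Build [AF_SPEC] = { [AF_INET] = { [INET_CONF] = conf_value } }.
 * Error handling is omitted because the 256-byte buffer is large enough.
 */
static size_t sk_build_example(struct sketch_buf *b, uint32_t conf_value)
{
    size_t spec, inet;

    b->used = 0;
    spec = sk_nest_begin(b, SK_AF_SPEC);
    inet = sk_nest_begin(b, SK_AF_INET);
    sk_put(b, SK_INET_CONF, &conf_value, sizeof(conf_value));
    sk_nest_end(b, inet);
    sk_nest_end(b, spec);
    return b->used;
}
```

The outer container only records a length and a type; each address family
decides what goes inside its own nest, which is the property the commit
message relies on.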

Signed-off-by: Thomas Graf <tgraf@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago  clarify documentation for net.ipv4.igmp_max_memberships
Jeremy Eder [Mon, 15 Nov 2010 05:41:31 +0000 (05:41 +0000)]
clarify documentation for net.ipv4.igmp_max_memberships

This patch helps clarify documentation for
net.ipv4.igmp_max_memberships by providing a formula for
calculating the maximum number of multicast groups that can be
subscribed to, plus defining the theoretical limit.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: Jeremy Eder <jeder@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago  drivers/isdn/hisax: Add printf format/argument verification and fix fallout
Joe Perches [Wed, 10 Nov 2010 18:54:58 +0000 (18:54 +0000)]
drivers/isdn/hisax: Add printf format/argument verification and fix fallout

Add __attribute__((format... to several functions
Make formats and arguments match.
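
For illustration, a sketch of the GCC/Clang format attribute on a
hypothetical helper (sketch_log is not a hisax function). The attribute
lets the compiler check each call's arguments against its format string.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Argument 3 is the printf-style format; the variadic arguments to be
 * checked against it start at argument 4.
 */
static int sketch_log(char *buf, size_t size, const char *fmt, ...)
    __attribute__((format(printf, 3, 4)));

static int sketch_log(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    return n;
}
```

With format warnings enabled, a call that passes an int where the format
says %s is flagged at compile time, which is the kind of fallout this
patch fixes.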

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago  SELinux: return -ECONNREFUSED from ip_postroute to signal fatal error
Eric Paris [Tue, 16 Nov 2010 11:52:57 +0000 (11:52 +0000)]
SELinux: return -ECONNREFUSED from ip_postroute to signal fatal error

The SELinux netfilter hooks just return NF_DROP if they drop a packet.  We
want to signal that a drop in this hook is a permanent fatal error and is not
transient.  If we do this the error will be passed back up the stack in some
places and applications will get a faster indication that something went
wrong.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago  network: tcp_connect should return certain errors up the stack
Eric Paris [Tue, 16 Nov 2010 11:52:49 +0000 (11:52 +0000)]
network: tcp_connect should return certain errors up the stack

The current tcp_connect code completely ignores errors from sending an skb.
This makes sense in many situations (like -ENOBUFS) but I want to be able to
immediately fail connections if they are denied by the SELinux netfilter hook.
Netfilter does not normally return ECONNREFUSED when it drops a packet so we
respect that error code as a final and fatal error that can not be recovered.
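
The policy described above can be sketched as a small filter, assuming a
hypothetical helper name (sk_connect_result is not a kernel function):

```c
#include <errno.h>

/* Decide whether a transmit error should abort the connection attempt. */
static int sk_connect_result(int xmit_err)
{
    /* Only the explicit "refused" error is treated as final. */
    if (xmit_err == -ECONNREFUSED)
        return xmit_err;
    /* Anything else (for example -ENOBUFS) is left to retransmission. */
    return 0;
}
```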

Based-on-patch-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago  netfilter: allow hooks to pass error code back up the stack
Eric Paris [Tue, 16 Nov 2010 11:52:38 +0000 (11:52 +0000)]
netfilter: allow hooks to pass error code back up the stack

SELinux would like to pass certain fatal errors back up the stack.  This patch
implements the generic netfilter support for this functionality.
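
One way such support could work is to pack an errno into the unused upper
bits of a drop verdict. The sketch below assumes a 16-bit verdict field
and a fallback of -EPERM for a plain drop; the names and the bit layout
are hypothetical, not copied from the patch.

```c
#include <errno.h>

#define SK_NF_DROP        0u        /* verdict value for "drop" in this sketch */
#define SK_VERDICT_MASK   0xffffu   /* low bits carry the verdict */
#define SK_VERDICT_SHIFT  16        /* high bits carry the errno */

/* Pack a negative errno (e.g. -ECONNREFUSED) into a drop verdict. */
static unsigned int sk_drop_err(int neg_errno)
{
    return ((unsigned int)(-neg_errno) << SK_VERDICT_SHIFT) | SK_NF_DROP;
}

/* Map a verdict to the value the hook caller would hand back up the stack. */
static int sk_drop_result(unsigned int verdict)
{
    int err = (int)(verdict >> SK_VERDICT_SHIFT);

    if ((verdict & SK_VERDICT_MASK) != SK_NF_DROP)
        return 0;                   /* not a drop: nothing to report here */
    return err ? -err : -EPERM;     /* plain drop falls back to -EPERM */
}
```

Hooks that never set the upper bits keep their existing behaviour under
this encoding, so only hooks that opt in change what callers see.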

Based-on-patch-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago  net/atm: Remove unnecessary casts of netdev_priv
Joe Perches [Mon, 15 Nov 2010 11:12:33 +0000 (11:12 +0000)]
net/atm: Remove unnecessary casts of netdev_priv
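
A sketch of why such casts are unnecessary, using stand-in types (the
real net_device and netdev_priv() live in the kernel; everything prefixed
sk_ is hypothetical). A function returning void * can be assigned to any
object pointer type in C without a cast.

```c
/* Stand-ins for this sketch only. */
struct sk_net_device { void *priv; };
struct sk_lane_priv { int mtu; };

static void *sk_netdev_priv(const struct sk_net_device *dev)
{
    return dev->priv;
}

static int sk_read_mtu(const struct sk_net_device *dev)
{
    /* No cast needed: void * converts implicitly to struct sk_lane_priv *. */
    struct sk_lane_priv *priv = sk_netdev_priv(dev);

    return priv->mtu;
}
```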

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago  drivers/net: Remove unnecessary casts of netdev_priv
Joe Perches [Mon, 15 Nov 2010 11:12:31 +0000 (11:12 +0000)]
drivers/net: Remove unnecessary casts of netdev_priv

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago  drivers/net/vxge: Remove unnecessary casts of netdev_priv
Joe Perches [Mon, 15 Nov 2010 11:12:30 +0000 (11:12 +0000)]
drivers/net/vxge: Remove unnecessary casts of netdev_priv

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Jon Mason <jon.mason@exar.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago  drivers/net/usb: Remove unnecessary casts of netdev_priv
Joe Perches [Mon, 15 Nov 2010 11:12:29 +0000 (11:12 +0000)]
drivers/net/usb: Remove unnecessary casts of netdev_priv

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago  drivers/net/qlge: Remove unnecessary casts of netdev_priv
Joe Perches [Mon, 15 Nov 2010 11:12:28 +0000 (11:12 +0000)]
drivers/net/qlge: Remove unnecessary casts of netdev_priv

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago  drivers/net/qla3xxx.c: Remove unnecessary casts of netdev_priv
Joe Perches [Mon, 15 Nov 2010 11:12:27 +0000 (11:12 +0000)]
drivers/net/qla3xxx.c: Remove unnecessary casts of netdev_priv

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago  drivers/net/pcmcia: Remove unnecessary casts of netdev_priv
Joe Perches [Mon, 15 Nov 2010 11:12:26 +0000 (11:12 +0000)]
drivers/net/pcmcia: Remove unnecessary casts of netdev_priv

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago  drivers/net/bonding: Remove unnecessary casts of netdev_priv
Joe Perches [Mon, 15 Nov 2010 11:12:25 +0000 (11:12 +0000)]
drivers/net/bonding: Remove unnecessary casts of netdev_priv

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago  drivers/isdn/i4l: Remove unnecessary casts of netdev_priv
Joe Perches [Mon, 15 Nov 2010 11:12:24 +0000 (11:12 +0000)]
drivers/isdn/i4l: Remove unnecessary casts of netdev_priv

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago  Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/jkirsher/net...
David S. Miller [Wed, 17 Nov 2010 17:56:04 +0000 (09:56 -0800)]
Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/jkirsher/net-next-2.6

14 years ago  e1000e: add netpoll support for MSI/MSI-X IRQ modes
Dongdong Deng [Wed, 17 Nov 2010 03:50:15 +0000 (19:50 -0800)]
e1000e: add netpoll support for MSI/MSI-X IRQ modes

With CONFIG_PCI_MSI enabled, e1000e can work in MSI/MSI-X IRQ mode,
but the netpoll controller did not handle those IRQ modes on e1000e.

This patch adds handling of the MSI/MSI-X IRQ modes to the netpoll controller
so that netconsole can work with those IRQ modes.

Signed-off-by: Dongdong Deng <dongdong.deng@windriver.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
14 years ago  e1000e: 82574 intermittently fails to initialize with manageability f/w
Bruce Allan [Wed, 17 Nov 2010 03:50:14 +0000 (19:50 -0800)]
e1000e: 82574 intermittently fails to initialize with manageability f/w

The driver can fail initializing the hardware when manageability firmware
is performing concurrent MDIO operations because the hardware semaphore
scheme to prevent concurrent operations between software and firmware is
incorrect for 82574/82583.  Instead of using the SWSM register, the driver
should be using the EXTCNF_CTRL register.  A software mutex is also added
to prevent simultaneous software threads from performing similar concurrent
accesses.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
14 years ago  e1000e: 82571 SerDes link handle null code word from partner
Bruce Allan [Wed, 17 Nov 2010 03:50:13 +0000 (19:50 -0800)]
e1000e: 82571 SerDes link handle null code word from partner

SerDes Link detection on certain 82571 mezzanine cards can fail when the
link is forced, the link partner does not recognize forced link and the
link partner sends null code words.  Detect the null code words and return
to auto-negotiation state which causes the link partner to begin responding
with valid code words.  Within a reasonable interval the link will finally
settle as forced by both partners.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
14 years ago  Remove extra struct page member from the buffer info structure
Greg Rose [Wed, 17 Nov 2010 03:41:36 +0000 (19:41 -0800)]
Remove extra struct page member from the buffer info structure declaration.

Reported-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
14 years ago  igbvf: Remove some dead code in igbvf
Julian Stecklina [Wed, 17 Nov 2010 03:41:36 +0000 (19:41 -0800)]
igbvf: Remove some dead code in igbvf

Removed unused variable in igbvf.

Signed-off-by: Julian Stecklina <jsteckli@os.inf.tu-dresden.de>
Acked-by: Greg Rose <greg.v.rose@intel.com>
Tested-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
14 years ago  igbvf: Update version and Copyright
Greg Rose [Wed, 17 Nov 2010 03:41:35 +0000 (19:41 -0800)]
igbvf: Update version and Copyright

Update version string and copyright notice

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
14 years ago  ixgbevf: Fix Oops
Greg Rose [Wed, 17 Nov 2010 03:27:19 +0000 (19:27 -0800)]
ixgbevf: Fix Oops

The driver is calling netif_carrier_off and netif_tx_stop_all_queues
before the netdevice is registered which causes an Oops.  Move call
to netif_carrier_off after the netdevice is registered and remove
call to netif_tx_stop_all_queues because there aren't any TX
queues yet.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
14 years ago  ixgbe: refactor ixgbe_alloc_queues()
Eric Dumazet [Wed, 17 Nov 2010 03:27:18 +0000 (19:27 -0800)]
ixgbe: refactor ixgbe_alloc_queues()

I noticed the ring variable was initialized before allocations, and that
memory node management was a bit ugly. We also leak memory if a ring
allocation fails.
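
The leak mentioned above is the classic partial-allocation case. A
minimal sketch of an unwind path follows, with hypothetical names and an
injectable allocator; this is not the driver's code.

```c
#include <stdlib.h>

struct sk_ring { int index; };

/* Allocator hook so a failure can be injected; pass calloc for normal use. */
typedef void *(*sk_alloc_fn)(size_t n, size_t size);

/*
 * Allocate `count` rings into `rings`. On any failure, free the rings
 * allocated so far and clear their slots so nothing is leaked.
 */
static int sk_alloc_rings(struct sk_ring **rings, int count, sk_alloc_fn alloc)
{
    int i;

    for (i = 0; i < count; i++) {
        rings[i] = alloc(1, sizeof(**rings));
        if (!rings[i])
            goto err_unwind;
        rings[i]->index = i;
    }
    return 0;

err_unwind:
    while (--i >= 0) {
        free(rings[i]);
        rings[i] = NULL;
    }
    return -1;
}

/* Allocator for trying the failure path: fails on the third call. */
static int sk_calls;

static void *sk_alloc_fail_third(size_t n, size_t size)
{
    return ++sk_calls == 3 ? NULL : calloc(n, size);
}
```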

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
14 years ago  ixgbe: add support for x540 MAC
Don Skidmore [Wed, 17 Nov 2010 03:27:17 +0000 (19:27 -0800)]
ixgbe: add support for x540 MAC

This patch adds support for the x540 MAC which is the next MAC in the
82598/82599 line.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
14 years ago  ixgbe: add MAC and PHY support for x540
Don Skidmore [Wed, 17 Nov 2010 03:27:16 +0000 (19:27 -0800)]
ixgbe: add MAC and PHY support for x540

Adds the new x540.c file and Aquantia 1202 PHY for X540 support.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
14 years ago  ixgbe: make silicon specific functions generic
Don Skidmore [Wed, 17 Nov 2010 03:27:15 +0000 (19:27 -0800)]
ixgbe: make silicon specific functions generic

The new X540 MAC type shares much of its functionality with
some silicon-specific functions.  To reduce duplicate code,
make these functions generic.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
14 years ago  ixgbe: make sure FCoE DDP user buffers are really released by the HW
Yi Zou [Wed, 17 Nov 2010 03:27:14 +0000 (19:27 -0800)]
ixgbe: make sure FCoE DDP user buffers are really released by the HW

When the DDP context is invalidated, the HW may not be done
with the user buffer right away. In that case, we poll the FCBUFF register
to check whether the buffer valid bit has been cleared; if not, we wait up to
100us, the maximum guaranteed by the HW.
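
A sketch of that check-then-wait step, assuming callbacks in place of
real register access and a hypothetical bit position for the valid flag.
The final re-read is an addition of this sketch rather than something
the commit message describes.

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_BUF_VALID (1u << 3)   /* hypothetical position of the valid bit */

/* Callbacks stand in for register reads and microsecond delays. */
struct sk_hw {
    uint32_t (*read_fcbuff)(void *ctx);
    void (*delay_us)(void *ctx, unsigned int us);
    void *ctx;
};

/* Returns true once the HW has released the buffer; waits at most 100us. */
static bool sk_ddp_buffer_released(const struct sk_hw *hw)
{
    if (!(hw->read_fcbuff(hw->ctx) & SK_BUF_VALID))
        return true;
    hw->delay_us(hw->ctx, 100);
    return !(hw->read_fcbuff(hw->ctx) & SK_BUF_VALID);
}

/* Fake hardware for trying the sketch: the valid bit clears after a delay. */
struct sk_fake {
    uint32_t reg;
    unsigned int waited_us;
};

static uint32_t sk_fake_read(void *ctx)
{
    return ((struct sk_fake *)ctx)->reg;
}

static void sk_fake_delay(void *ctx, unsigned int us)
{
    struct sk_fake *f = ctx;

    f->waited_us += us;
    f->reg &= ~SK_BUF_VALID;
}
```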

Signed-off-by: Yi Zou <yi.zou@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
14 years ago  ixgbe: invalidate FCoE DDP context when no error status is available
Yi Zou [Wed, 17 Nov 2010 03:27:13 +0000 (19:27 -0800)]
ixgbe: invalidate FCoE DDP context when no error status is available

The hw automatically invalidates the context if DDP is successful or an
error is detected. In case no error status is available from the hw,
initializing the per-context error status to 1 allows the DDP context to
still be invalidated via the upper layer call to ddp_put().

Signed-off-by: Yi Zou <yi.zou@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
14 years ago  ixgbe: avoid doing FCoE DDP when adapter is DOWN or RESETTING
Yi Zou [Wed, 17 Nov 2010 03:27:13 +0000 (19:27 -0800)]
ixgbe: avoid doing FCoE DDP when adapter is DOWN or RESETTING

There is no point in allowing incoming DDP requests from the upper layer stack if
the adapter is going down or being reset.

Signed-off-by: Yi Zou <yi.zou@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
14 years ago  ixgbe: rework Tx hang detection to fix reoccurring false Tx hangs
John Fastabend [Wed, 17 Nov 2010 03:27:12 +0000 (19:27 -0800)]
ixgbe: rework Tx hang detection to fix reoccurring false Tx hangs

The Tx hang logic has been known to detect false hangs when
the device is receiving pause frames or has delayed processing
for some other reason.

This patch makes the logic more robust and resolves these
known issues. The old logic checked whether the device
was paused by querying the HW, and aborted the hang check
if the device was currently paused. This check was
racy because the device could have been in the pause state
at any time up to this check. The other job of the
hang logic is to verify that the Tx ring is still advancing;
the old logic checked the EOP timestamp. This is not
sufficient to determine that the ring is not advancing; it
only implies that the ring may be moving slowly.

Here we add logic to track the number of completed Tx
descriptors and use the adapter stats to check if any
pause frames have been received since the previous Tx
hang check. This way we avoid racing with the HW
register and do not detect false hangs if the ring is
advancing slowly.
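
The counter-based check can be sketched as follows. The structure, the
field names, and the choice to require two stalled checks in a row before
reporting a hang are assumptions of this sketch, not details taken from
the patch.

```c
#include <stdbool.h>
#include <stdint.h>

struct sk_tx_ring {
    uint64_t completed;      /* Tx descriptors cleaned so far */
    uint64_t completed_old;  /* value seen at the previous check */
    uint64_t pause_rx;       /* pause frames counted in adapter stats */
    uint64_t pause_rx_old;   /* value seen at the previous check */
    bool armed;              /* one check has already seen no progress */
};

/*
 * Report a hang only when work is pending and two consecutive checks saw
 * neither completed descriptors nor newly received pause frames.
 */
static bool sk_check_tx_hang(struct sk_tx_ring *r, bool work_pending)
{
    bool progressed = r->completed != r->completed_old;
    bool paused = r->pause_rx != r->pause_rx_old;

    r->completed_old = r->completed;
    r->pause_rx_old = r->pause_rx;

    if (!work_pending || progressed || paused) {
        r->armed = false;
        return false;
    }
    if (!r->armed) {
        r->armed = true;     /* first stalled interval: allow one more */
        return false;
    }
    return true;
}
```

Comparing saved counters avoids sampling a live pause-state register, so
a pause that began and ended between two checks is still noticed.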

This patch is primarily the work of Jesse Brandeburg. I
cleaned it up some and fixed the PFC checking.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
14 years ago  ixgbe: Resolve null function pointer accesses on 82598 w/ multi-speed fiber
Alexander Duyck [Wed, 17 Nov 2010 03:27:11 +0000 (19:27 -0800)]
ixgbe: Resolve null function pointer accesses on 82598 w/ multi-speed fiber

This change resolves some null function pointer accesses on 82598 when a
multi-speed fiber module is inserted into the adapter.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
14 years ago  ixgbe: populate the ring->q_vector pointer during ring mapping
Alexander Duyck [Wed, 17 Nov 2010 03:27:10 +0000 (19:27 -0800)]
ixgbe: populate the ring->q_vector pointer during ring mapping

The q_vector back pointer was not being set in the rings so it would not
have been possible to determine the parent q_vector of the ring.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
14 years ago  ixgbe: cleanup ixgbe_map_rings_to_vectors
Alexander Duyck [Wed, 17 Nov 2010 03:27:09 +0000 (19:27 -0800)]
ixgbe: cleanup ixgbe_map_rings_to_vectors

This change cleans up some of the items in ixgbe_map_rings_to_vectors.
Specifically it merges the two for loops and drops the unnecessary vectors
parameter.

It also moves the vector names into the q_vectors themselves.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
14 years ago  ixgbe: simplify math and improve stack use of ixgbe_set_itr functions
Alexander Duyck [Wed, 17 Nov 2010 03:27:08 +0000 (19:27 -0800)]
ixgbe: simplify math and improve stack use of ixgbe_set_itr functions

This change is meant to improve the stack utilization and simplify the math
used in ixgbe_set_itr_msix.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
14 years ago  ixgbe: cleanup unclear references to reg_idx
Alexander Duyck [Wed, 17 Nov 2010 03:27:07 +0000 (19:27 -0800)]
ixgbe: cleanup unclear references to reg_idx

There are a number of places where we use the variable j to contain the
register index of the ring.  Instead of using such a non-descriptive
variable name it is better that we name it reg_idx so that it is clear what
the variable contains.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
14 years ago  ixgbe: cleanup unnecessary return value in ixgbe_cache_ring_rss
Alexander Duyck [Wed, 17 Nov 2010 03:27:06 +0000 (19:27 -0800)]
ixgbe: cleanup unnecessary return value in ixgbe_cache_ring_rss

This change is just to cleanup some confusing logic in ixgbe_cache_ring_rss
which can be simplified by adding a conditional with return to the start of
the call.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
14 years ago  ixgbe: Cleanup DCB logic, whitespace, and comments in ixgbe_ethtool.c
Alexander Duyck [Wed, 17 Nov 2010 03:27:05 +0000 (19:27 -0800)]
ixgbe: Cleanup DCB logic, whitespace, and comments in ixgbe_ethtool.c

This change addresses a few whitespace issues in DCB #ifdefs, adds a comment
calling out the DCB specific registers, and nests an if statement inline
with a number of if statements related to flow control.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
14 years ago  ixgbe: add WOL support for backplane adapters
Alexander Duyck [Wed, 17 Nov 2010 03:27:05 +0000 (19:27 -0800)]
ixgbe: add WOL support for backplane adapters

This change adds support for certain 82599 based Mezzanine adapters.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
14 years ago  ixgbe: cleanup ixgbe_set_tx_csum ethtool flags configuration
Alexander Duyck [Wed, 17 Nov 2010 03:27:04 +0000 (19:27 -0800)]
ixgbe: cleanup ixgbe_set_tx_csum ethtool flags configuration

This change makes it so that we always disable SCTP regardless of mac type
since we shouldn't need to check mac type before disabling a feature that
isn't supported on a given piece of hardware.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
14 years ago  ixgbe: change mac_type if statements to switch statements
Alexander Duyck [Wed, 17 Nov 2010 03:27:03 +0000 (19:27 -0800)]
ixgbe: change mac_type if statements to switch statements

This change replaces a number of if/elseif/else statements with switch
statements to support the addition of future devices to the ixgbe driver.
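
The shape of that conversion, sketched with hypothetical enum and
function names (the feature being tested is invented for illustration):

```c
enum sk_mac_type { SK_MAC_82598, SK_MAC_82599, SK_MAC_X540 };

/* Hypothetical per-MAC property, used only to show the control flow. */
static int sk_supports_feature(enum sk_mac_type mac)
{
    switch (mac) {
    case SK_MAC_82598:
        return 0;
    case SK_MAC_82599:
    case SK_MAC_X540:   /* supporting a new device is one more case label */
        return 1;
    default:
        return 0;
    }
}
```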

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
14 years ago  ixgbe: cleanup use of ixgbe_rsc_count and RSC_CB
Alexander Duyck [Wed, 17 Nov 2010 03:27:02 +0000 (19:27 -0800)]
ixgbe: cleanup use of ixgbe_rsc_count and RSC_CB

This change cleans up the use of rsc_count and changes it to a boolean since
the actual numerical value is used nowhere in the Rx cleanup path.  I am
also moving the skb count into the RSC_CB path since it is much easier to
track it there than when it is passed as a parameter to various function
calls.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
14 years ago  ixgbe: cleanup ATR filter setup function
Alexander Duyck [Wed, 17 Nov 2010 03:27:01 +0000 (19:27 -0800)]
ixgbe: cleanup ATR filter setup function

This change cleans up the ixgbe_atr filter setup function so that it uses
fewer items from the stack.  Since the code is only applicable to IPv4 w/
TCP it makes sense to just use the pointers based on the headers themselves
instead of copying them to temp variables and then writing those to the
filters.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
14 years ago  ixgbe: cleanup ixgbe_clean_rx_irq
Alexander Duyck [Wed, 17 Nov 2010 03:27:00 +0000 (19:27 -0800)]
ixgbe: cleanup ixgbe_clean_rx_irq

The code for ixgbe_clean_rx_irq was much more tangled up than it needed to
be in terms of logic statements and unused variables.  This change
untangles much of that and drops several unused variables such as cleaned
which was being returned but never checked.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
14 years ago  ixgbe: change vector numbering so that queues end up on correct CPUs
Alexander Duyck [Wed, 17 Nov 2010 03:26:59 +0000 (19:26 -0800)]
ixgbe: change vector numbering so that queues end up on correct CPUs

This changes the numbering scheme slightly. Previously the ordering was
coming out like this:
Rx-2
Rx-1
Rx-0
TxRx-0
Which would drop two queues on CPU 0. This change makes it so that the
ordering is like this:
Rx-3
Rx-2
Rx-1
TxRx-0
This means that each CPU will have its own Rx queue, and only CPU 0 will
have the Tx queue.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
14 years ago  ixgbe: reorder Tx cleanup so that if adapter will reset we don't rearm
Alexander Duyck [Wed, 17 Nov 2010 03:26:58 +0000 (19:26 -0800)]
ixgbe: reorder Tx cleanup so that if adapter will reset we don't rearm

The code as it existed could re-arm the queues when it was requesting a HW
reset due to a TX hang. Instead of doing that this change makes it so that
we will just exit if the hardware is believed to be hung.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
14 years ago  ixgbe: Disable RSC when ITR setting is too high to allow RSC
Alexander Duyck [Wed, 17 Nov 2010 03:26:57 +0000 (19:26 -0800)]
ixgbe: Disable RSC when ITR setting is too high to allow RSC

RSC will flush its descriptors every time the interrupt throttle timer
expires.  In addition there are known issues with RSC when the rx-usecs
value is set too low.  As such we are forced to clear the RSC_ENABLED bit
and reset the adapter when the rx-usecs value is set too low.

However we do not need to clear the NETIF_F_LRO flag because it is used to
indicate that the user wants to leave the LRO feature enabled, and in fact
with this change we will now re-enable RSC as soon as the rx-usecs value is
increased and the flag is still set.
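
The separation between user intent and hardware state might look like
the sketch below. The threshold value and all names are hypothetical;
only the idea that the LRO request survives while RSC is toggled comes
from the text above.

```c
#include <stdbool.h>

#define SK_MIN_RSC_USECS 24u   /* hypothetical threshold for this sketch */

struct sk_adapter {
    bool lro_requested;   /* user intent; never cleared by the ITR logic */
    bool rsc_enabled;     /* what the hardware is actually doing */
};

/* Apply a new rx-usecs value; returns true if the adapter needs a reset. */
static bool sk_set_rx_usecs(struct sk_adapter *a, unsigned int rx_usecs)
{
    bool want_rsc = a->lro_requested && rx_usecs >= SK_MIN_RSC_USECS;
    bool changed = want_rsc != a->rsc_enabled;

    a->rsc_enabled = want_rsc;
    return changed;
}
```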

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
14 years ago  ixgbe: cleanup race conditions in link setup
Alexander Duyck [Wed, 17 Nov 2010 03:26:57 +0000 (19:26 -0800)]
ixgbe: cleanup race conditions in link setup

This change makes it so that we perform link setup with interrupts
disabled. If the SFP has not been detected previously we will schedule the
SFP detection task to run in order to detect link.  By doing this we avoid
the possibility of interrupts firing in the middle of our link setup during
ixgbe_up_complete.

In addition this change makes it so that the multi-speed fiber setup and SFP
setup are not mutually exclusive.  This addresses issues seen in which a
link would only come up at 1G on some multi-speed fiber modules.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
14 years ago  ixgbe: add a state flags to ring
Alexander Duyck [Wed, 17 Nov 2010 03:26:56 +0000 (19:26 -0800)]
ixgbe: add a state flags to ring

This change adds a set of state flags to the rings that allow them to
independently function allowing for features like RSC, packet split, and
TX hang detection to be done per ring instead of for the entire device.

This is accomplished by re-purposing the flow director reinit_state member
and making it a global state instead since a long for a single bit flag is
a bit wasteful.
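
A sketch of per-ring state bits packed into one word. The flag names and
helpers are hypothetical, and kernel code would more likely use atomic
bit operations than the plain ones shown here.

```c
#include <stdbool.h>

/* Hypothetical flag names; the patch defines its own set. */
enum sk_ring_state {
    SK_RING_RSC_ENABLED = 1u << 0,
    SK_RING_PS_ENABLED  = 1u << 1,
    SK_RING_HANG_ARMED  = 1u << 2,
};

struct sk_ring {
    unsigned long state;   /* one word holds every per-ring flag */
};

static void sk_ring_set(struct sk_ring *r, enum sk_ring_state f)
{
    r->state |= (unsigned long)f;
}

static void sk_ring_clear(struct sk_ring *r, enum sk_ring_state f)
{
    r->state &= ~(unsigned long)f;
}

static bool sk_ring_test(const struct sk_ring *r, enum sk_ring_state f)
{
    return (r->state & (unsigned long)f) != 0;
}
```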

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
14 years ago  ixgbe: move CPU variable from ring into q_vector, add ring->q_vector
Alexander Duyck [Wed, 17 Nov 2010 03:26:55 +0000 (19:26 -0800)]
ixgbe: move CPU variable from ring into q_vector, add ring->q_vector

This is the start of work to sort out what belongs in the rings and what
belongs in the q_vector. Items like the CPU variable make much more
sense in the q_vector since the CPU is a per-interrupt thing rather than a
per ring thing.
I also added a back-pointer from the ring to the q_vector.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
14 years ago  ixgbe: move adapter into pci_dev driver data instead of netdev
Alexander Duyck [Wed, 17 Nov 2010 03:26:54 +0000 (19:26 -0800)]
ixgbe: move adapter into pci_dev driver data instead of netdev

This change moves an adapter pointer into the private portion of the
pci_dev instead of a pointer to the netdev. The reason for this change is
because in most cases we just want the adapter anyway. In addition as we
start moving toward multiple netdevs per port we may want to move the
adapter pointer out of the netdevs entirely.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
14 years ago  ixgbe: remove residual code left over from earlier combining of TXDCTL
Alexander Duyck [Wed, 17 Nov 2010 03:26:53 +0000 (19:26 -0800)]
ixgbe: remove residual code left over from earlier combining of TXDCTL

Missed some code that was left floating around in the DCB configuration
for the TXDCTL register.  As a result the register was being messed with in
two different spots when we only needed to do the change once.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
14 years ago  ixgbe: move ixgbe_clear_interrupt_scheme to before pci_save_state
Alexander Duyck [Wed, 17 Nov 2010 03:26:52 +0000 (19:26 -0800)]
ixgbe: move ixgbe_clear_interrupt_scheme to before pci_save_state

The main reason for this change is to keep the suspend/resume logic matched
up. The clear_interrupt_scheme function will disable MSI-X which will
affect the PCIe configuration space. Therefore we will want to do it before
we save state to avoid having the interrupt state restored by
pci_restore_state, and then trying to re-enable MSI/MSI-X interrupts via
ixgbe_setup_interrupt_scheme.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
14 years ago  ixgbe: add a netdev pointer to the ring structure
Alexander Duyck [Wed, 17 Nov 2010 03:26:51 +0000 (19:26 -0800)]
ixgbe: add a netdev pointer to the ring structure

This change places a netdev pointer directly into the ring structure. This
way we can avoid having to determine which netdev we are supposed to be
using and can just access the one on the ring directly.
As a result of this change further collapse of the code is possible by
dropping the adapter from ixgbe_alloc_rx_buffers, and the netdev pointer
from ixgbe_xmit_frame_ring_adv and ixgbe_maybe_stop_tx.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
14 years ago  ixgbe: combine some stats into a union to allow for Tx/Rx stats overlap
Alexander Duyck [Wed, 17 Nov 2010 03:26:50 +0000 (19:26 -0800)]
ixgbe: combine some stats into a union to allow for Tx/Rx stats overlap

This change moves some of the RX and TX stats into separate structures and
then places those structures in a union in order to help reduce the size of
the ring structure.
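
A sketch of the layout idea, with hypothetical field names: a ring is
either Tx or Rx, so the two sets of counters can share storage.

```c
#include <stdint.h>

struct sk_tx_stats {
    uint64_t restart_queue;
    uint64_t tx_busy;
};

struct sk_rx_stats {
    uint64_t non_eop_descs;
    uint64_t alloc_failed;
    uint64_t csum_err;
};

struct sk_ring {
    int is_tx;                   /* selects which union member is live */
    union {
        struct sk_tx_stats tx;
        struct sk_rx_stats rx;
    } u;                         /* sized for the larger member only */
};
```

The union costs the size of its larger member instead of the sum of
both, which is where the saving in the ring structure comes from.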

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
14 years ago  ixgbe: move device pointer into the ring structure
Alexander Duyck [Wed, 17 Nov 2010 03:26:49 +0000 (19:26 -0800)]
ixgbe: move device pointer into the ring structure

This change is meant to simplify DMA map/unmap by providing a device
pointer. As a result the adapter pointer can be dropped from many of
the calls.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
14 years ago  ixgbe: drop ring->head, make ring->tail a pointer instead of offset
Alexander Duyck [Wed, 17 Nov 2010 03:26:49 +0000 (19:26 -0800)]
ixgbe: drop ring->head, make ring->tail a pointer instead of offset

This change drops ring->head since it is not used in any hot-path and can
easily be determined using IXGBE_[RT]DH(ring->reg_idx).

It also changes ring->tail into a true pointer so we can avoid unnecessary
pointer math to find the location of the tail.

In addition I also dropped the setting of head and tail in
ixgbe_clean_[rx|tx]_ring. The only location that should be setting the head
and tail values is ixgbe_configure_[rx|tx]_ring and that is only while the
queue is disabled.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
14 years ago  ixgbe: cleanup ixgbe_alloc_rx_buffers
Alexander Duyck [Wed, 17 Nov 2010 03:26:48 +0000 (19:26 -0800)]
ixgbe: cleanup ixgbe_alloc_rx_buffers

This change re-orders alloc_rx_buffers to make better use of the packet
split enabled flag.  The new setup should require less branching, since
the code now needs fewer if statements: we are either handling packet
split or we aren't.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
14 years agoixgbe: move GSO segments and byte count processing into ixgbe_tx_map
Alexander Duyck [Wed, 17 Nov 2010 03:26:47 +0000 (19:26 -0800)]
ixgbe: move GSO segments and byte count processing into ixgbe_tx_map

This change simplifies the work being done by the TX interrupt handler and
pushes it into the tx_map call. This allows for fewer cache misses since
the TX cleanup now accesses almost none of the skb members.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
14 years agoixgbe: remove unnecessary re-init of adapter on Rx-csum change
Alexander Duyck [Wed, 17 Nov 2010 03:26:46 +0000 (19:26 -0800)]
ixgbe: remove unnecessary re-init of adapter on Rx-csum change

There is no need to reset the adapter when changing the Rx checksum
settings. Since the only change is a software flag we can disable it
without needing to reset the entire adapter.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
14 years agoixgbe: DCB: credit max only needs to be gt TSO size for 82598
John Fastabend [Wed, 17 Nov 2010 03:26:45 +0000 (19:26 -0800)]
ixgbe: DCB: credit max only needs to be gt TSO size for 82598

The maximum credits per traffic class only needs to be greater
than the TSO size for 82598 devices. The 82599 devices do not
have this requirement so only do this test for 82598 devices.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
14 years agoixgbe: DCB set PFC high and low water marks per data sheet specs
John Fastabend [Wed, 17 Nov 2010 03:26:44 +0000 (19:26 -0800)]
ixgbe: DCB set PFC high and low water marks per data sheet specs

Currently the high and low water marks for PFC are being set
conservatively for jumbo frames. This means the RX buffers
are being underutilized in the default 1500 MTU. This patch
fixes this so that the water marks are set as described in
the data sheet considering the MTU size.

The equation used is,

RTT * 1.44 + MTU * 1.44 + MTU

Where RTT is the round trip time and MTU is the max frame size
in KB. To avoid floating point arithmetic, FC_HIGH_WATER is
defined as

((((RTT + MTU) * 144) + 99) / 100) + MTU

This changes how the hardware field fc.low_water and
fc.high_water are used. With this change they are no longer
storing the actual low water and high water markers but are
storing the required head room in the buffer. This simplifies
the logic and we do not need to account for the size of the
buffer when setting the thresholds.
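
As an illustration of the integer-only head room formula, here is a small
sketch. The function name and the sample values are assumptions made for this
example; only the arithmetic comes from the message. Adding 99 before dividing
by 100 rounds the 1.44 scaling up to the next whole KB:

```c
/*
 * Head room in KB: ceil((RTT + MTU) * 1.44) + MTU, using integers only.
 * Hypothetical helper name; the message defines this as FC_HIGH_WATER.
 */
static unsigned int demo_fc_high_water_kb(unsigned int rtt_kb,
					  unsigned int mtu_kb)
{
	return ((((rtt_kb + mtu_kb) * 144) + 99) / 100) + mtu_kb;
}
```

For example, with RTT = 2 and MTU = 2 (both in KB, chosen arbitrarily), the
exact value is 4 * 1.44 + 2 = 7.76, and the integer form rounds this up to 8.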

Testing with iperf and 16 threads showed a slight uptick in
throughput over a single traffic class (0.1-0.2 Gbps) and a reduction
in pause frames. Without the patch, a 30 second run would show
~10-15 pause frames being transmitted; with the patch, ~2-5 are
seen. Tests were run back to back with 82599.

Note RXPBSIZE is in KB, and the low and high water mark fields are
also in KB. However, the FCRT* registers have 32B granularity and the
value is shifted left by 5 into the register, so

(((rx_pbsize - water_mark) * 1024) / 32) << 5

is the most explicit conversion. Here we simplify it to

(rx_pbsize - water_mark) * 32 << 5 = (rx_pbsize - water_mark) << 10
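
The equivalence of the two forms can be checked with a short sketch. The
function names are hypothetical; both take sizes in KB, as described above:

```c
#include <assert.h>
#include <stdint.h>

/* Explicit form: KB -> bytes -> 32B units, then shifted into the field. */
static uint32_t demo_fcrt_explicit(uint32_t rx_pbsize_kb,
				   uint32_t water_mark_kb)
{
	assert(rx_pbsize_kb >= water_mark_kb);	/* avoid unsigned wrap */
	return (((rx_pbsize_kb - water_mark_kb) * 1024) / 32) << 5;
}

/* Simplified form: 1024 / 32 = 32 = 2^5, and 2^5 << 5 = 2^10. */
static uint32_t demo_fcrt_simplified(uint32_t rx_pbsize_kb,
				     uint32_t water_mark_kb)
{
	assert(rx_pbsize_kb >= water_mark_kb);
	return (rx_pbsize_kb - water_mark_kb) << 10;
}
```

Both functions return the same value for any input where the difference does
not overflow 32 bits after the shift.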

This patch updates the PFC thresholds and legacy FC thresholds.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
14 years agoixgbevf: Update Version String and Copyright Notice
Greg Rose [Wed, 17 Nov 2010 03:26:43 +0000 (19:26 -0800)]
ixgbevf: Update Version String and Copyright Notice

Update version string and copyright notice.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
14 years agoixgbe: delay rx_ring freeing
Eric Dumazet [Wed, 17 Nov 2010 03:26:42 +0000 (19:26 -0800)]
ixgbe: delay rx_ring freeing

"cat /proc/net/dev" uses RCU protection only.

It's quite possible we call a driver get_stats() method while the device
is being dismantled and is freeing its data structures.

So get_stats() methods must be very careful not to access driver private
data without appropriate locking.

In the ixgbe case, we access rx_ring pointers. These pointers are freed in
ixgbe_clear_interrupt_scheme() and set to NULL; this can trigger a NULL
dereference in ixgbe_get_stats64().

A possible fix is to use RCU locking in ixgbe_get_stats64() and defer
rx_ring freeing until after a grace period in ixgbe_clear_interrupt_scheme().

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: Tantilov, Emil S <emil.s.tantilov@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
14 years agompc52xx: cleanup locking
Eric Dumazet [Wed, 3 Nov 2010 05:56:38 +0000 (05:56 +0000)]
mpc52xx: cleanup locking

commit 1e4e0767ecb1 (Fix locking on fec_mpc52xx driver) assumed IRQs are
enabled when an IRQ handler is called.

This is no longer the case (IRQF_DISABLED is deprecated), so we can use
regular spin_lock(); there is no need for spin_lock_irqsave().

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Jean-Michel Hautbois <jhautbois@gmail.com>
Cc: Asier Llano <a.llano@ziv.es>
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet: reorder struct sock fields
Eric Dumazet [Tue, 16 Nov 2010 05:56:04 +0000 (05:56 +0000)]
net: reorder struct sock fields

Right now, fields in struct sock are not optimally ordered, because each
path (RX softirq, TX completion, RX user,  TX user) has to touch fields
that are contained in many different cache lines.

The really critical thing is to shrink the number of cache lines that are
used at RX softirq time: the CPU handling softirqs for a device can receive
many frames per second for many sockets. If the load is too big, we can drop
frames at the NIC level. RPS or multiqueue cards can help, but it is better
to reduce latency if possible.

This patch starts with UDP protocol, then additional patches will try to
reduce latencies of other ones as well.

At RX softirq time, fields of interest for UDP protocol are :
(not counting ones in inet struct for the lookup)

Read/Written:
sk_refcnt   (atomic increment/decrement)
sk_rmem_alloc & sk_backlog.len (to check if there is room in queues)
sk_receive_queue
sk_backlog (if socket locked by user program)
sk_rxhash
sk_forward_alloc
sk_drops

Read only:
sk_rcvbuf (sk_rcvqueues_full())
sk_filter
sk_wq
sk_policy[0]
sk_flags

Additional notes :

- sk_backlog has one hole on 64bit arches. We can fill it to save 8
bytes.
- sk_backlog is used only if the RX softirq handler finds the socket while
locked by user.
- sk_rxhash is written only once per flow.
- sk_drops is written only if queues are full

Final layout :

[1] One section grouping all read/write fields, but placing rxhash and
sk_backlog at the end of this section.

[2] One section grouping all read fields in RX handler
   (sk_filter, sk_rcvbuf, sk_wq)

[3] Section used by other paths

I'll post a patch on its own to put sk_refcnt at the end of struct
sock_common so that it shares the same cache line as section [1]

New offsets on 64bit arch :

sizeof(struct sock)=0x268
offsetof(struct sock, sk_refcnt)  =0x10
offsetof(struct sock, sk_lock)    =0x48
offsetof(struct sock, sk_receive_queue)=0x68
offsetof(struct sock, sk_backlog)=0x80
offsetof(struct sock, sk_rmem_alloc)=0x80
offsetof(struct sock, sk_forward_alloc)=0x98
offsetof(struct sock, sk_rxhash)=0x9c
offsetof(struct sock, sk_rcvbuf)=0xa4
offsetof(struct sock, sk_drops) =0xa0
offsetof(struct sock, sk_filter)=0xa8
offsetof(struct sock, sk_wq)=0xb0
offsetof(struct sock, sk_policy)=0xd0
offsetof(struct sock, sk_flags) =0xe0

Instead of :

sizeof(struct sock)=0x270
offsetof(struct sock, sk_refcnt)  =0x10
offsetof(struct sock, sk_lock)    =0x50
offsetof(struct sock, sk_receive_queue)=0xc0
offsetof(struct sock, sk_backlog)=0x70
offsetof(struct sock, sk_rmem_alloc)=0xac
offsetof(struct sock, sk_forward_alloc)=0x10c
offsetof(struct sock, sk_rxhash)=0x128
offsetof(struct sock, sk_rcvbuf)=0x4c
offsetof(struct sock, sk_drops) =0x16c
offsetof(struct sock, sk_filter)=0x198
offsetof(struct sock, sk_wq)=0x88
offsetof(struct sock, sk_policy)=0x98
offsetof(struct sock, sk_flags) =0x130
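
To make the effect of the two layouts concrete, the sketch below counts how
many distinct cache lines the listed offsets start in. It assumes 64-byte
cache lines (not stated in this message), uses every offset listed above
including sk_lock, and looks only at where each field starts, ignoring fields
that span more than one line. The function and array names are hypothetical:

```c
#include <stddef.h>

#define DEMO_CACHE_LINE_BYTES 64	/* assumption: 64-byte cache lines */

/* Offsets copied from the "new" and "old" lists in this message. */
static const size_t demo_new_offsets[] = {
	0x10, 0x48, 0x68, 0x80, 0x80, 0x98, 0x9c,
	0xa4, 0xa0, 0xa8, 0xb0, 0xd0, 0xe0,
};

static const size_t demo_old_offsets[] = {
	0x10, 0x50, 0xc0, 0x70, 0xac, 0x10c, 0x128,
	0x4c, 0x16c, 0x198, 0x88, 0x98, 0x130,
};

/* Count the distinct cache lines in which the given offsets start. */
static unsigned int demo_count_cache_lines(const size_t *offsets, size_t n)
{
	unsigned long long seen = 0;	/* bit i set: line i already counted */
	unsigned int count = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		size_t line = offsets[i] / DEMO_CACHE_LINE_BYTES;

		/* Only lines 0..63 fit in the bitmap; enough for this struct. */
		if (line < 64 && !(seen & (1ULL << line))) {
			seen |= 1ULL << line;
			count++;
		}
	}
	return count;
}
```

Under these assumptions, the listed fields start in 4 distinct cache lines
with the new layout, compared with 7 in the old one.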

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoudp: use atomic_inc_not_zero_hint
Eric Dumazet [Mon, 15 Nov 2010 19:58:26 +0000 (19:58 +0000)]
udp: use atomic_inc_not_zero_hint

UDP sockets refcount is usually 2, unless an incoming frame is going to
be queued in receive or backlog queue.

Using atomic_inc_not_zero_hint() reduces latency, because the
processor issues fewer memory transactions.
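
The idea can be sketched in userspace with C11 atomics. This is not the
kernel implementation; the function name is hypothetical and the sketch only
illustrates the technique. Instead of first reading the counter and then
issuing a compare-and-swap, the first compare-and-swap is attempted directly
with the expected value (the hint), so a correct hint needs a single atomic
operation:

```c
#include <stdatomic.h>

/*
 * Increment *v unless it is zero. 'hint' is the value the caller expects
 * to find. Returns 1 if the increment happened, 0 if *v was zero.
 */
static int demo_inc_not_zero_hint(atomic_int *v, int hint)
{
	/* A zero hint is useless as a guess, so fall back to reading. */
	int c = hint ? hint : atomic_load(v);

	while (c != 0) {
		/* On failure, c is updated to the value actually observed. */
		if (atomic_compare_exchange_strong(v, &c, c + 1))
			return 1;
	}
	return 0;
}
```

For a UDP socket whose refcount is usually 2, a caller would pass 2 as the
hint. A wrong hint only costs one extra iteration, since the failed
compare-and-swap returns the current value.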

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agovlan: remove ndo_select_queue() logic
Eric Dumazet [Thu, 11 Nov 2010 09:42:45 +0000 (09:42 +0000)]
vlan: remove ndo_select_queue() logic

Now that vlans are lockless, we don't need special ndo_select_queue() logic.
dev_pick_tx() will do the multiqueue stuff on the real device transmit.

Suggested-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agovlan: lockless transmit path
Eric Dumazet [Wed, 10 Nov 2010 23:42:00 +0000 (23:42 +0000)]
vlan: lockless transmit path

vlan is a stacked device, like tunnels. We should use the lockless
mechanism we are using in tunnels and loopback.

This patch completely removes locking in TX path.

tx stat counters are added into existing percpu stat structure, renamed
from vlan_rx_stats to vlan_pcpu_stats.

Note : this partially reverts commit 2e59af3dcbdf (vlan: multiqueue vlan
device)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agomacvlan: lockless tx path
Eric Dumazet [Wed, 10 Nov 2010 21:14:04 +0000 (21:14 +0000)]
macvlan: lockless tx path

macvlan is a stacked device, like tunnels. We should use the lockless
mechanism we are using in tunnels and loopback.

This patch completely removes locking in TX path.

tx stat counters are added into existing percpu stat structure, renamed
from rx_stats to pcpu_stats.

Note : this reverts commit 2c11455321f37 (macvlan: add multiqueue
capability)

Note : rx_errors converted to a 32bit counter, like tx_dropped, since
they don't need a 64bit range.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Ben Greear <greearb@candelatech.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agopacket: Enhance AF_PACKET implementation to not require high order contiguous memory...
Neil Horman [Tue, 16 Nov 2010 18:26:47 +0000 (10:26 -0800)]
packet: Enhance AF_PACKET implementation to not require high order contiguous memory allocation (v4)

Version 4 of this patch.

Change notes:
1) Removed extra memset.  Didn't think kcalloc added a GFP_ZERO the way kzalloc did :)

Summary:
It was shown to me recently that systems under high load were driven very deep
into swap when tcpdump was run.  This happened because the
AF_PACKET protocol has a SET_RINGBUFFER socket option that allows the user space
application to specify how many entries an AF_PACKET socket will have and how
large each entry will be.  It seems the default setting for tcpdump is to set
the ring buffer to 32 entries of 64 KB each, which implies 32 order 5
allocations.  That's difficult under good circumstances, and horrid under memory
pressure.

I thought it would be good to make that a bit more usable.  I was going to do a
simple conversion of the ring buffer from contiguous pages to iovecs, but
unfortunately, the metadata which AF_PACKET places in these buffers can easily
span a page boundary, and given that these buffers get mapped into user space,
and the data layout doesn't easily allow for a change to padding between frames
to avoid that, a simple iovec change is just going to break user space ABI
consistency.

So instead I've added a three-tiered mechanism to the af_packet set_ring
socket option.  It attempts to allocate memory in the following order:

1) Using __get_free_pages with GFP_NORETRY set, so as to fail quickly without
digging into swap

2) Using vmalloc

3) Using __get_free_pages with GFP_NORETRY clear, causing us to try as hard as
needed to get the memory

The effect is that we don't disturb the system as much when we're under load,
while still being able to conduct tcpdumps effectively.
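
The tiered fallback described above can be sketched in userspace as a chain
of allocators tried in order. This is only an illustration of the control
flow: the function-pointer type, the function names, and the stand-in
allocators are all hypothetical, and malloc() takes the place of the kernel
allocators named in the list:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator signature used for every tier. */
typedef void *(*demo_alloc_fn)(size_t size);

/*
 * Try each tier in order and return the first non-NULL result.
 * *tier_used is set to the 1-based tier that succeeded, or 0 if none did.
 */
static void *demo_alloc_tiered(const demo_alloc_fn *tiers, size_t ntiers,
			       size_t size, int *tier_used)
{
	size_t i;

	for (i = 0; i < ntiers; i++) {
		void *p = tiers[i](size);

		if (p) {
			*tier_used = (int)i + 1;
			return p;
		}
	}
	*tier_used = 0;
	return NULL;
}

/* Stand-in for a "fail quickly" attempt that finds no memory. */
static void *demo_fast_fail(size_t size)
{
	(void)size;
	return NULL;
}

/* Stand-in for the later tiers; plain malloc() in this sketch. */
static void *demo_fallback(size_t size)
{
	return malloc(size);
}
```

Reporting which tier succeeded matters in a design like this because memory
obtained from different allocators generally has to be released through the
matching free routine.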

Tested successfully by me.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Maciej Żenczykowski <zenczykowski@gmail.com>
Reported-by: Maciej Żenczykowski <zenczykowski@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agodrivers/isdn/mISDN: Use printf extension %pV
Joe Perches [Tue, 9 Nov 2010 14:35:16 +0000 (14:35 +0000)]
drivers/isdn/mISDN: Use printf extension %pV

Using %pV reduces the number of printk calls and
eliminates any possible message interleaving from
other printk calls.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonetlink: let nlmsg and nla functions take pointer-to-const args
Jan Engelhardt [Tue, 16 Nov 2010 17:52:32 +0000 (09:52 -0800)]
netlink: let nlmsg and nla functions take pointer-to-const args

The changed functions do not modify the NL messages and/or attributes
at all. They should use const (similar to strchr), so that callers
which have a const nlmsg/nlattr around can make use of them without
casting.

While at it, constify a data array.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoipv6: fix missing in6_ifa_put in addrconf
John Fastabend [Mon, 15 Nov 2010 20:29:21 +0000 (20:29 +0000)]
ipv6: fix missing in6_ifa_put in addrconf

Fix ref count bug introduced by

commit 2de795707294972f6c34bae9de713e502c431296
Author: Lorenzo Colitti <lorenzo@google.com>
Date:   Wed Oct 27 18:16:49 2010 +0000

ipv6: addrconf: don't remove address state on ifdown if the address
is being kept

Fix logic so that addrconf_ifdown() decrements the inet6_ifaddr
refcnt correctly with in6_ifa_put().

Reported-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wirel...
David S. Miller [Tue, 16 Nov 2010 17:17:12 +0000 (09:17 -0800)]
Merge branch 'master' of git://git./linux/kernel/git/linville/wireless-next-2.6

14 years agonet: Export netif_get_vlan_features().
David S. Miller [Tue, 16 Nov 2010 04:15:03 +0000 (20:15 -0800)]
net: Export netif_get_vlan_features().

ERROR: "netif_get_vlan_features" [drivers/net/xen-netfront.ko] undefined!

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoenic: Fix build warnings
Vasanthy Kolluri [Mon, 15 Nov 2010 08:09:55 +0000 (08:09 +0000)]
enic: Fix build warnings

Fix data type of argument passed to pci_alloc_consistent and pci_free_consistent routines.

Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com>
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: David Wang <dwang2@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agohso: Fix unused variable warning
Alan Cox [Mon, 15 Nov 2010 07:30:42 +0000 (07:30 +0000)]
hso: Fix unused variable warning

Fallout from the TIOCGICOUNT work

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agobridge: add RCU annotations to bridge port lookup
Eric Dumazet [Mon, 15 Nov 2010 06:38:14 +0000 (06:38 +0000)]
bridge: add RCU annotations to bridge port lookup

br_port_get() is renamed to br_port_get_rtnl() to make clear that RTNL is held.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agobridge: fix RCU races with bridge port
stephen hemminger [Mon, 15 Nov 2010 06:38:13 +0000 (06:38 +0000)]
bridge: fix RCU races with bridge port

The macro br_port_exists() is not enough protection when only
RCU is being used. There is a tiny race where another CPU has cleared the
port handler hook, but the bridge port flag might still be set.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonetdev: add rcu annotations to receive handler hook
stephen hemminger [Mon, 15 Nov 2010 06:38:12 +0000 (06:38 +0000)]
netdev: add rcu annotations to receive handler hook

Suggested by Eric's bridge RCU changes.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agobridge: add proper RCU annotation to should_route_hook
Eric Dumazet [Mon, 15 Nov 2010 06:38:11 +0000 (06:38 +0000)]
bridge: add proper RCU annotation to should_route_hook

Add br_should_route_hook_t typedef, this is the only way we can
get a clean RCU implementation for function pointer.

Move route_hook to location where it is used.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agobridge: add RCU annotation to bridge multicast table
Eric Dumazet [Mon, 15 Nov 2010 06:38:10 +0000 (06:38 +0000)]
bridge: add RCU annotation to bridge multicast table

Add modern __rcu annotations to bridge multicast table.
Use newer hlist macros to avoid direct access to hlist internals.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet/ipv6/mcast.c: Remove unnecessary semicolons
Joe Perches [Sun, 14 Nov 2010 17:05:00 +0000 (17:05 +0000)]
net/ipv6/mcast.c: Remove unnecessary semicolons

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoinclude/net/caif/cfctrl.h: Remove unnecessary semicolons
Joe Perches [Sun, 14 Nov 2010 17:04:58 +0000 (17:04 +0000)]
include/net/caif/cfctrl.h: Remove unnecessary semicolons

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoinclude/linux/if_macvlan.h: Remove unnecessary semicolons
Joe Perches [Sun, 14 Nov 2010 17:04:57 +0000 (17:04 +0000)]
include/linux/if_macvlan.h: Remove unnecessary semicolons

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agodrivers/net/cnic.c: Remove unnecessary semicolons
Joe Perches [Sun, 14 Nov 2010 17:04:37 +0000 (17:04 +0000)]
drivers/net/cnic.c: Remove unnecessary semicolons

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agodrivers/net/ixgbe: Remove unnecessary semicolons
Joe Perches [Sun, 14 Nov 2010 17:04:33 +0000 (17:04 +0000)]
drivers/net/ixgbe: Remove unnecessary semicolons

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agodrivers/net/e1000e: Remove unnecessary semicolons
Joe Perches [Sun, 14 Nov 2010 17:04:32 +0000 (17:04 +0000)]
drivers/net/e1000e: Remove unnecessary semicolons

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agodrivers/net/bnx2x: Remove unnecessary semicolons
Joe Perches [Sun, 14 Nov 2010 17:04:31 +0000 (17:04 +0000)]
drivers/net/bnx2x: Remove unnecessary semicolons

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agodrivers/isdn: Remove unnecessary semicolons
Joe Perches [Sun, 14 Nov 2010 17:04:26 +0000 (17:04 +0000)]
drivers/isdn: Remove unnecessary semicolons

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoMerge branch 'dccp' of git://eden-feed.erg.abdn.ac.uk/net-next-2.6
David S. Miller [Mon, 15 Nov 2010 18:59:49 +0000 (10:59 -0800)]
Merge branch 'dccp' of git://eden-feed.erg.abdn.ac.uk/net-next-2.6

14 years agonet: Simplify RX queue allocation
Tom Herbert [Tue, 9 Nov 2010 10:47:38 +0000 (10:47 +0000)]
net: Simplify RX queue allocation

This patch moves RX queue allocation to alloc_netdev_mq and freeing of
the queues to free_netdev (symmetric to TX queue allocation).  Each
kobject RX queue takes a reference to the queue's device so that the
device can't be freed before all the kobjects have been released; this
obviates the need for reference counts specific to RX queues.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet: Move TX queue allocation to alloc_netdev_mq
Tom Herbert [Tue, 9 Nov 2010 10:47:30 +0000 (10:47 +0000)]
net: Move TX queue allocation to alloc_netdev_mq

TX queues are now allocated in alloc_netdev_mq and freed in
free_netdev.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoxfrm: use gre key as flow upper protocol info
Timo Teräs [Wed, 3 Nov 2010 04:41:38 +0000 (04:41 +0000)]
xfrm: use gre key as flow upper protocol info

The GRE Key field is intended to be used for identifying an individual
traffic flow within a tunnel. It is useful to let XFRM policy selectors
match on it, so that different GRE tunnels can have different
policies.

Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agovlan: Fix build warning in vlandev_seq_show()
David S. Miller [Mon, 15 Nov 2010 18:37:30 +0000 (10:37 -0800)]
vlan: Fix build warning in vlandev_seq_show()

net/8021q/vlanproc.c: In function 'vlandev_seq_show':
net/8021q/vlanproc.c:283:20: warning: unused variable 'fmt'

Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agocarl9170: use generic sign_extend32
Christian Lamparter [Fri, 29 Oct 2010 21:11:23 +0000 (23:11 +0200)]
carl9170: use generic sign_extend32

This patch replaces the handcrafted
sign extension cruft with a generic
bitop function.

Signed-off-by: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
14 years agobitops: Provide generic sign_extend32 function
Andreas Herrmann [Mon, 30 Aug 2010 19:04:01 +0000 (19:04 +0000)]
bitops: Provide generic sign_extend32 function

This patch moves code out from wireless drivers where two different
functions are defined in three code locations for the same purpose and
provides a common function to sign extend a 32-bit value.
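
A minimal userspace sketch of such a helper follows. The name
demo_sign_extend32 and the choice of passing the sign bit position as the
second argument are assumptions for this example. It also assumes that right
shifts of negative values are arithmetic, which holds for GCC and Clang but
is implementation-defined in ISO C:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sign extend 'value', treating bit 'index' (0..31) as the sign bit.
 * Bits above 'index' in the input are discarded.
 */
static int32_t demo_sign_extend32(uint32_t value, int index)
{
	int shift;

	assert(index >= 0 && index < 32);
	shift = 31 - index;
	/* Move the sign bit to bit 31, then shift back arithmetically. */
	return (int32_t)(value << shift) >> shift;
}
```

For instance, an 8-bit field holding 0xFF with sign bit 7 extends to -1,
while 0x7F stays 127.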

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
14 years agowl1251: use wl12xx_platform_data to pass data
Grazvydas Ignotas [Wed, 3 Nov 2010 22:13:49 +0000 (00:13 +0200)]
wl1251: use wl12xx_platform_data to pass data

Make use of the newly added method to pass platform data for wl1251 too.
This allows eliminating some redundant code.

Cc: Ohad Ben-Cohen <ohad@wizery.com>
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Acked-by: Kalle Valo <kvalo@adurom.com>
Acked-by: Luciano Coelho <luciano.coelho@nokia.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
14 years agowl1251: add runtime PM support for SDIO
Grazvydas Ignotas [Mon, 8 Nov 2010 13:29:36 +0000 (15:29 +0200)]
wl1251: add runtime PM support for SDIO

Add runtime PM support, similar to how it's done for wl1271.
This allows powering down the card when the driver is loaded but the
network is not in use.

Cc: Ohad Ben-Cohen <ohad@wizery.com>
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Acked-by: Kalle Valo <kvalo@adurom.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>