GitHub/LineageOS/android_kernel_motorola_exynos9610.git
14 years ago3c523: Remove unnecessary memset of netdev private data
Tobias Klauser [Thu, 6 May 2010 05:39:46 +0000 (05:39 +0000)]
3c523: Remove unnecessary memset of netdev private data

The memory for the private data is allocated using kzalloc in
alloc_etherdev (or alloc_netdev_mq respectively) so there is no need to
set it to 0 again.
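
The redundancy can be illustrated in user space. This is a minimal sketch, assuming calloc() as a stand-in for the kernel's kzalloc(); struct demo_priv and demo_alloc_priv() are hypothetical names and do not come from either driver:

```c
#include <stdlib.h>

/* Hypothetical private data; real drivers define their own layout. */
struct demo_priv {
	int irq;
	unsigned long rx_packets;
	unsigned char mac[6];
};

/*
 * Allocate zeroed private data. calloc() plays the role of kzalloc()
 * here, so a following memset(priv, 0, sizeof(*priv)) would only
 * rewrite bytes that are already zero.
 */
static struct demo_priv *demo_alloc_priv(void)
{
	return calloc(1, sizeof(struct demo_priv));
}
```

Every field of the returned structure already reads as zero, which is why the extra memset() can go.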

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago3c507: Remove unnecessary memset of netdev private data
Tobias Klauser [Thu, 6 May 2010 05:39:11 +0000 (05:39 +0000)]
3c507: Remove unnecessary memset of netdev private data

The memory for the private data is allocated using kzalloc in
alloc_etherdev (or alloc_netdev_mq respectively) so there is no need to
set it to 0 again.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agorps: Various optimizations
Eric Dumazet [Fri, 7 May 2010 05:07:48 +0000 (22:07 -0700)]
rps: Various optimizations

Introduce ____napi_schedule() helper for callers in irq disabled
contexts. rps_trigger_softirq() becomes a leaf function.

Use container_of() in process_backlog() instead of accessing per_cpu
address.

Use a custom inlined version of __napi_complete() in process_backlog()
to avoid one locked instruction :

 only current cpu owns and manipulates this napi,
 and NAPI_STATE_SCHED is the only possible flag set on backlog.
 we can use a plain write instead of clear_bit(),
 and we don't need an smp_mb() memory barrier, since RPS is on,
 backlog is protected by a spinlock.
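
The container_of() step can be sketched in user space. This is an illustration only: demo_container_of, struct demo_napi and struct demo_softnet are hypothetical stand-ins, and the layout is assumed rather than copied from the kernel's per-cpu softnet data:

```c
#include <stddef.h>

/* User-space equivalent of the kernel's container_of() macro. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for the napi structure handed to the poll routine. */
struct demo_napi {
	int state;
};

/* Hypothetical per-cpu structure that embeds its backlog napi. */
struct demo_softnet {
	int cpu;
	struct demo_napi backlog;
};

/*
 * Recover the enclosing structure from the embedded member, with no
 * per-cpu lookup: plain pointer arithmetic on the address we were given.
 */
static struct demo_softnet *demo_softnet_of(struct demo_napi *napi)
{
	return demo_container_of(napi, struct demo_softnet, backlog);
}
```

Because the poll routine is already handed the napi pointer, the enclosing structure is one subtraction away.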

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agomicroblaze: Kill NET_SKB_PAD and NET_IP_ALIGN overrides.
David S. Miller [Fri, 7 May 2010 05:01:53 +0000 (22:01 -0700)]
microblaze: Kill NET_SKB_PAD and NET_IP_ALIGN overrides.

NET_IP_ALIGN defaults to 2, no need to override.

NET_SKB_PAD is now 64, which is much larger than microblaze's
L1_CACHE_SIZE so no need to override that either.

Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet: Increase NET_SKB_PAD to 64 bytes
Eric Dumazet [Fri, 7 May 2010 04:58:51 +0000 (21:58 -0700)]
net: Increase NET_SKB_PAD to 64 bytes

eth_type_trans() and get_rps_cpu() currently touch two 64-byte cache
lines in the packet to compute rxhash.

Increasing NET_SKB_PAD from 32 to 64 reduces that to a single cache
line, and makes RPS faster.

NET_IP_ALIGN(2) + ethernet_header(14) + IP_header(20/40) + ports(8)
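
The sum above can be checked with a small user-space calculation. This is a sketch under two assumptions that the message does not state: the buffer head is cache-line aligned, and packet data starts at head + NET_SKB_PAD + NET_IP_ALIGN. All DEMO_ names are hypothetical:

```c
#include <stddef.h>

#define DEMO_CACHE_LINE   64
#define DEMO_NET_IP_ALIGN 2
#define DEMO_ETH_HLEN     14
#define DEMO_PORTS_LEN    8

/*
 * Count the cache lines spanned from the first byte of the ethernet
 * header to the last byte of the ports, for a given pad and IP header
 * length (20 for IPv4, 40 for IPv6).
 */
static size_t demo_lines_touched(size_t skb_pad, size_t ip_hlen)
{
	size_t first = skb_pad + DEMO_NET_IP_ALIGN;
	size_t last = first + DEMO_ETH_HLEN + ip_hlen + DEMO_PORTS_LEN - 1;

	return last / DEMO_CACHE_LINE - first / DEMO_CACHE_LINE + 1;
}
```

With a pad of 32 the IPv4 headers end at offset 75 and straddle two lines; with a pad of 64 they occupy offsets 66 to 107, and even the IPv6 case ends at offset 127, the last byte of the second line.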

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoipv6: udp: make short packet logging consistent with ipv4
Bjørn Mork [Thu, 6 May 2010 03:44:35 +0000 (03:44 +0000)]
ipv6: udp: make short packet logging consistent with ipv4

Adding addresses and ports to the short packet log message,
like ipv4/udp.c does it, makes these messages a lot more useful:

[  822.182450] UDPv6: short packet: From [2001:db8:ffb4:3::1]:47839 23715/178 to [2001:db8:ffb4:3:5054:ff:feff:200]:1234

This requires us to drop logging in case pskb_may_pull() fails,
which also is consistent with ipv4/udp.c

Signed-off-by: Bjørn Mork <bjorn@mork.no>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet: emaclite: Use resource_size
Tobias Klauser [Wed, 5 May 2010 22:12:20 +0000 (22:12 +0000)]
net: emaclite: Use resource_size

Use the resource_size function instead of manually calculating the
resource size.  This reduces the chance of introducing off-by-one
errors.
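
The off-by-one risk comes from the resource range being inclusive at both ends. A minimal user-space sketch of the calculation, with struct demo_resource and demo_resource_size() as hypothetical stand-ins for the kernel types:

```c
#include <stdint.h>

/* Hypothetical stand-in for a resource: both bounds are inclusive. */
struct demo_resource {
	uint64_t start;
	uint64_t end;
};

/* Size of an inclusive range; forgetting the "+ 1" is the classic bug. */
static uint64_t demo_resource_size(const struct demo_resource *res)
{
	return res->end - res->start + 1;
}
```

A region from 0x1000 to 0x1fff is 0x1000 bytes long, not 0xfff.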

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoe1000e: Reset 82577/82578 PHY before first PHY register read
Bruce Allan [Wed, 5 May 2010 22:00:27 +0000 (22:00 +0000)]
e1000e: Reset 82577/82578 PHY before first PHY register read

Reset the PHY before first accessing it.  Doing so ensures that the PHY is
in a known good state before we read/write PHY registers.  This fixes a
driver probe failure.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoe1000e: reset MAC-PHY interconnect on 82577/82578 during Sx->S0
Bruce Allan [Wed, 5 May 2010 22:00:06 +0000 (22:00 +0000)]
e1000e: reset MAC-PHY interconnect on 82577/82578 during Sx->S0

During Sx->S0 transitions, the interconnect between the MAC and PHY on
82577/82578 can remain in SMBus mode instead of transitioning to the
PCIe-like mode required during normal operation.  Toggling the LANPHYPC
Value bit essentially resets the interconnect forcing it to the correct
mode.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonetpoll: Use 'bool' for netpoll_rx() return type.
David S. Miller [Thu, 6 May 2010 08:20:10 +0000 (01:20 -0700)]
netpoll: Use 'bool' for netpoll_rx() return type.

Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agobonding: make bonding support netpoll
WANG Cong [Thu, 6 May 2010 07:48:51 +0000 (00:48 -0700)]
bonding: make bonding support netpoll

Based on Andy's work, but I modified a lot.

Similar to the patch for bridge, this patch does:

1) implement the 2 methods to support netpoll for bonding;

2) modify netpoll during forwarding packets via bonding;

3) disable netpoll support of bonding when a device without netpoll
   support is added to bonding;

4) enable netpoll support when all underlying devices support netpoll.

Cc: Andy Gospodarek <gospo@redhat.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: WANG Cong <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agobridge: make bridge support netpoll
WANG Cong [Thu, 6 May 2010 07:48:24 +0000 (00:48 -0700)]
bridge: make bridge support netpoll

Based on the previous patch, make bridge support netpoll by:

1) implement the 2 methods to support netpoll for bridge;

2) modify netpoll during forwarding packets via bridge;

3) disable netpoll support of bridge when a device without netpoll
   support is added to bridge;

4) enable netpoll support when all underlying devices support netpoll.

Cc: David Miller <davem@davemloft.net>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Stephen Hemminger <shemminger@linux-foundation.org>
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: WANG Cong <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonetpoll: add generic support for bridge and bonding devices
WANG Cong [Thu, 6 May 2010 07:47:21 +0000 (00:47 -0700)]
netpoll: add generic support for bridge and bonding devices

This whole patchset is for adding netpoll support to bridge and bonding
devices. I already tested it for bridge, bonding, bridge over bonding,
and bonding over bridge. It looks fine now.

To make bridge and bonding support netpoll, we need to adjust
some netpoll generic code. This patch does the following things:

1) introduce two new priv_flags for struct net_device:
   IFF_IN_NETPOLL, which indicates that we are processing a netpoll;
   IFF_DISABLE_NETPOLL, which is used to disable netpoll support for a
   device at run-time;

2) introduce one new method for netdev_ops:
   ->ndo_netpoll_cleanup() is used to clean up netpoll when a device is
     removed.

3) introduce netpoll_poll_dev() which takes a struct net_device * parameter;
   export netpoll_send_skb() and netpoll_poll_dev() which will be used later;

4) hide a pointer to struct netpoll in struct netpoll_info, ditto.

5) introduce ->real_dev for struct netpoll.

6) introduce a new status NETDEV_BONDING_DESLAVE, which is used to disable
   netconsole before releasing a slave, to avoid deadlocks.

Cc: David Miller <davem@davemloft.net>
Cc: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: WANG Cong <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoixgbevf: Cache PF ack bit in interrupt
Greg Rose [Wed, 5 May 2010 19:57:49 +0000 (19:57 +0000)]
ixgbevf: Cache PF ack bit in interrupt

When the PF acks a message from the VF the VF gets an interrupt.  It
must cache the ack bit so that polling SW will not miss the ack.  Also
avoid reading the message buffer on acks because that also will clear
the ack bit.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoixgbe: Streamline MC filter setup for VFs
Greg Rose [Wed, 5 May 2010 19:57:30 +0000 (19:57 +0000)]
ixgbe: Streamline MC filter setup for VFs

The driver was calling the set Rx mode function for every multicast
filter set by the VF.  When starting many VMs where each might have
multiple VLAN interfaces this would result in the function being
called hundreds or even thousands of times.  This is unnecessary
for the case of the imperfect filters used in the MTA and has been
streamlined to be more efficient.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoixgbe: Remove unneeded register writes in VF VLAN setup
Greg Rose [Wed, 5 May 2010 19:57:10 +0000 (19:57 +0000)]
ixgbe: Remove unneeded register writes in VF VLAN setup

The driver is unnecessarily writing values to VLAN control registers.
These writes are already done elsewhere and are superfluous here.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoMerge branch 'vhost' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
David S. Miller [Thu, 6 May 2010 07:26:49 +0000 (00:26 -0700)]
Merge branch 'vhost' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

14 years agoigb: reduce cache misses on tx cleanup
Nick Nunley [Tue, 4 May 2010 21:58:07 +0000 (21:58 +0000)]
igb: reduce cache misses on tx cleanup

This patch reduces the number of skb cache misses in the
clean_tx_irq path, and results in an overall increase
in tx packet throughput.

Signed-off-by: Nicholas Nunley <nicholasx.d.nunley@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoks8851: companion eeprom access through ethtool
Sebastien Jan [Wed, 5 May 2010 08:45:54 +0000 (08:45 +0000)]
ks8851: companion eeprom access through ethtool

Accessing ks8851 companion eeprom permits modifying the ks8851 stored
MAC address.

Example of how to change the MAC address using ethtool, here setting it
to 01:23:45:67:89:AB:
$ echo "0:AB8976452301" | xxd -r > mac.bin
$ sudo ethtool -E eth0 magic 0x8870 offset 2 < mac.bin

Signed-off-by: Sebastien Jan <s-jan@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoks8851: Low level functions for read/write to companion eeprom
Sebastien Jan [Wed, 5 May 2010 08:45:53 +0000 (08:45 +0000)]
ks8851: Low level functions for read/write to companion eeprom

Low-level functions provide 16-bit word read and write access
to the ks8851 companion eeprom.

Signed-off-by: Sebastien Jan <s-jan@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoks8851: Add caching of CCR register
Sebastien Jan [Wed, 5 May 2010 08:45:52 +0000 (08:45 +0000)]
ks8851: Add caching of CCR register

CCR register contains information on companion eeprom availability.

Signed-off-by: Sebastien Jan <s-jan@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoforcedeth: Account for consumed budget in napi poll
Tom Herbert [Wed, 5 May 2010 18:15:21 +0000 (18:15 +0000)]
forcedeth: Account for consumed budget in napi poll

Repeated calls to nv_rx_process in the napi poll routine do not account
for the portion of the budget that has been consumed in previous calls.
Fix by subtracting the number of packets already processed.
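
The accounting can be modelled in user space. This is a sketch, not the forcedeth code: demo_rx_process() is a hypothetical stand-in for nv_rx_process() that handles at most DEMO_CHUNK packets per call, and the pending counter stands in for the receive ring:

```c
#define DEMO_CHUNK 16

/* Process up to limit packets, at most DEMO_CHUNK per call. */
static int demo_rx_process(int *pending, int limit)
{
	int done = limit < DEMO_CHUNK ? limit : DEMO_CHUNK;

	if (done > *pending)
		done = *pending;
	*pending -= done;
	return done;
}

/*
 * Poll loop: each call is offered only the unconsumed part of the
 * budget, so the total work can never exceed it.
 */
static int demo_poll(int *pending, int budget)
{
	int work = 0;
	int n;

	while (work < budget &&
	       (n = demo_rx_process(pending, budget - work)) > 0)
		work += n;
	return work;
}
```

Passing the full budget to every call instead of budget - work would let the loop process more packets than the napi core allowed.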

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonetdev: octeon_mgmt: Remove some gratuitous blank lines.
David Daney [Wed, 5 May 2010 13:03:13 +0000 (13:03 +0000)]
netdev: octeon_mgmt: Remove some gratuitous blank lines.

Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonetdev: octeon_mgmt: Try not to drop TX packets when stopping the queue.
David Daney [Wed, 5 May 2010 13:03:12 +0000 (13:03 +0000)]
netdev: octeon_mgmt: Try not to drop TX packets when stopping the queue.

Stop the queue when we add the packet that fills it, instead of dropping
the next packet.
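
A minimal model of that policy, assuming a simple counter-based ring; struct demo_txq and demo_xmit() are hypothetical and not taken from the octeon_mgmt driver:

```c
/* Hypothetical transmit queue: used slots, capacity, stopped flag. */
struct demo_txq {
	unsigned int used;
	unsigned int size;
	int stopped;
};

/*
 * Queue one packet. The queue is stopped by the packet that takes the
 * last free slot, so a well-behaved caller never reaches the full case.
 * Returns 0 on success, -1 if the ring was already full.
 */
static int demo_xmit(struct demo_txq *q)
{
	if (q->used == q->size)
		return -1;
	q->used++;
	if (q->used == q->size)
		q->stopped = 1;
	return 0;
}
```

The stack stops handing over packets once the flag is set, so nothing has to be dropped on the next transmit attempt.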

Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonetdev: octeon_mgmt: Free TX skbufs in a timely manner.
David Daney [Wed, 5 May 2010 13:03:11 +0000 (13:03 +0000)]
netdev: octeon_mgmt: Free TX skbufs in a timely manner.

We also reduce the high water mark to 1 so skbufs are not stranded for
long periods of time.  Since we are cleaning after each packet, no
need to do it in the transmit path.

Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonetdev: octeon_mgmt: Fix race manipulating irq bits.
David Daney [Wed, 5 May 2010 13:03:10 +0000 (13:03 +0000)]
netdev: octeon_mgmt: Fix race manipulating irq bits.

Don't re-read the interrupt status register, clear the exact bits we
will be testing.

Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonetdev: octeon_mgmt: Fix race condition freeing TX buffers.
David Daney [Wed, 5 May 2010 13:03:09 +0000 (13:03 +0000)]
netdev: octeon_mgmt: Fix race condition freeing TX buffers.

Under heavy load the TX cleanup tasklet and xmit threads would race
and try to free too many buffers.

Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonetdev: octeon_mgmt: Use proper MAC addresses.
David Daney [Wed, 5 May 2010 13:03:08 +0000 (13:03 +0000)]
netdev: octeon_mgmt: Use proper MAC addresses.

The original implementation incorrectly uses netdev->dev_addrs.

Use netdev->uc instead.  Also use netdev_for_each_uc_addr to iterate
over the addresses.  Fix comment.

Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoixgbe: Add support for VF MAC and VLAN configuration
Greg Rose [Tue, 4 May 2010 22:12:06 +0000 (22:12 +0000)]
ixgbe: Add support for VF MAC and VLAN configuration

Add support for the "ip link set" and "ip link show" commands that allow
configuration of the virtual functions' MAC and port VLAN via user space
command line.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoixgbe: Add boolean parameter to ixgbe_set_vmolr
Greg Rose [Tue, 4 May 2010 22:11:46 +0000 (22:11 +0000)]
ixgbe: Add boolean parameter to ixgbe_set_vmolr

Add a boolean parameter to ixgbe_set_vmolr so that the caller can
specify whether the pool should accept untagged packets.  Required
for a follow-on patch to enable administrative configuration of port
VLAN for virtual functions.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoe1000/e1000e: implement a simple interrupt moderation
Jesse Brandeburg [Tue, 4 May 2010 22:26:03 +0000 (22:26 +0000)]
e1000/e1000e: implement a simple interrupt moderation

Back before e1000-7.3.20, the e1000 driver had a simple algorithm that
managed interrupt moderation.  The driver was updated in 7.3.20 to
have the new "adaptive" interrupt moderation but we have customer
requests to redeploy the old way as an option.  This patch adds the
old functionality back.  The new functionality can be enabled via
module parameter or at runtime via ethtool.
Module parameter: (InterruptThrottleRate=4) to use this new
moderation method.
Ethtool method: ethtool -C ethX rx-usecs 4

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoe1000e: increase rx fifo size to 36K on 82574 and 82583
Alexander Duyck [Tue, 4 May 2010 22:25:42 +0000 (22:25 +0000)]
e1000e: increase rx fifo size to 36K on 82574 and 82583

This change increases the RX fifo size to 36K for standard frames and
decreases the TX fifo size to 4K.  The reason for this change is that on
slower systems the RX is much more likely to backfill and need space than
the TX is.  As long as the TX fifo is twice the size of the MTU we should
have more than enough TX fifo.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoe1000e: Save irq into netdev structure
Tom Herbert [Wed, 5 May 2010 14:03:32 +0000 (14:03 +0000)]
e1000e: Save irq into netdev structure

Set netdev->irq to pdev->irq.  This should be consistent with other
drivers.

Signed-off-by: Tom Herbert <therbert@google.com>
Acked-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoe1000e: Remove unnecessary log message
Tom Herbert [Wed, 5 May 2010 14:03:11 +0000 (14:03 +0000)]
e1000e: Remove unnecessary log message

Remove e_info message printed whenever TSO is enabled or disabled.
This is not very useful and just clutters dmesg.

Signed-off-by: Tom Herbert <therbert@google.com>
Acked-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoe1000e: reduce writes of RX producer ptr
Tom Herbert [Wed, 5 May 2010 14:02:49 +0000 (14:02 +0000)]
e1000e: reduce writes of RX producer ptr

Reduce number of writes to RX producer pointer.   When alloc'ing RX
buffers, only write the RX producer pointer once every
E1000_RX_BUFFER_WRITE (16) buffers created.
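
The effect of the batching can be sketched as a count of register writes. This is a model only: demo_tail_writes() is hypothetical, the allocation itself is elided, and the handling of a final partial batch is not described in the message, so none is modelled here:

```c
#define DEMO_RX_BUFFER_WRITE 16	/* mirrors E1000_RX_BUFFER_WRITE (16) */

/*
 * Return how many times the producer pointer would be written while
 * nbuffers receive buffers are set up, writing once per full batch.
 */
static unsigned int demo_tail_writes(unsigned int nbuffers)
{
	unsigned int writes = 0;

	for (unsigned int i = 1; i <= nbuffers; i++) {
		/* ... allocate and map buffer i here ... */
		if (i % DEMO_RX_BUFFER_WRITE == 0)
			writes++;	/* stands in for the register write */
	}
	return writes;
}
```

Refilling 256 buffers costs 16 writes instead of 256.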

Signed-off-by: Tom Herbert <therbert@google.com>
Acked-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoe1000e: save skb counts in TX to avoid cache misses
Tom Herbert [Wed, 5 May 2010 14:02:27 +0000 (14:02 +0000)]
e1000e: save skb counts in TX to avoid cache misses

In e1000_tx_map, precompute the number of segments and the bytecount,
which are derived from fields in skb; these are stored in buffer_info.
When cleaning tx in e1000_clean_tx_irq, use the values in the associated
buffer_info for statistics counting.  This eliminates cache misses
on skb fields.

Signed-off-by: Tom Herbert <therbert@google.com>
Acked-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agobnx2x: Fix check to get RX hash
Tom Herbert [Wed, 5 May 2010 17:57:16 +0000 (17:57 +0000)]
bnx2x: Fix check to get RX hash

The flag used in the check to get rxhash out of the descriptor is the
incorrect one.  Fix to use the proper features flag.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoMerge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville...
David S. Miller [Wed, 5 May 2010 22:09:05 +0000 (15:09 -0700)]
Merge branch 'for-davem' of git://git./linux/kernel/git/linville/wireless-next-2.6

14 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wirel...
John W. Linville [Wed, 5 May 2010 20:14:16 +0000 (16:14 -0400)]
Merge branch 'master' of git://git./linux/kernel/git/linville/wireless-next-2.6 into for-davem

Conflicts:
drivers/net/wireless/libertas_tf/cmd.c
drivers/net/wireless/libertas_tf/main.c

14 years agomac80211: use fixed channel in ibss join when appropriate
Johannes Berg [Wed, 5 May 2010 13:33:55 +0000 (15:33 +0200)]
mac80211: use fixed channel in ibss join when appropriate

"mac80211: improve IBSS scanning" was missing a hunk.
This adds that hunk as originally intended.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
14 years agortl8187: use SET_IEEE80211_PERM_ADDR
John W. Linville [Tue, 4 May 2010 19:48:48 +0000 (15:48 -0400)]
rtl8187: use SET_IEEE80211_PERM_ADDR

Signed-off-by: John W. Linville <linville@tuxdriver.com>
Acked-by: Hin-Tak Leung <htl10@users.sourceforge.net>
14 years agonet: __alloc_skb() speedup
Eric Dumazet [Wed, 5 May 2010 08:07:37 +0000 (01:07 -0700)]
net: __alloc_skb() speedup

With following patch I can reach maximum rate of my pktgen+udpsink
simulator :
- 'old' machine : dual quad core E5450  @3.00GHz
- 64 UDP rx flows (only differ by destination port)
- RPS enabled, NIC interrupts serviced on cpu0
- rps dispatched on 7 other cores. (~130.000 IPI per second)
- SLAB allocator (faster than SLUB in this workload)
- tg3 NIC
- 1.080.000 pps without a single drop at NIC level.

Idea is to add two prefetchw() calls in __alloc_skb(), one to prefetch
first sk_buff cache line, the second to prefetch the shinfo part.

Also using one memset() to initialize all skb_shared_info fields instead
of one by one to reduce number of instructions, using long word moves.

All skb_shared_info fields before 'dataref' are cleared in
__alloc_skb().
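
The single-memset idea can be sketched in user space. struct demo_shinfo is a hypothetical, much smaller layout than the real skb_shared_info; only the technique (clear everything that precedes 'dataref' in one call) follows the description above:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: everything before 'dataref' is cleared in bulk. */
struct demo_shinfo {
	unsigned short nr_frags;
	unsigned short gso_size;
	void *frag_list;
	int dataref;		/* initialized separately */
};

static void demo_init_shinfo(struct demo_shinfo *s)
{
	/* One memset() replaces a store per field; padding is cleared too. */
	memset(s, 0, offsetof(struct demo_shinfo, dataref));
	s->dataref = 1;
}
```

Ordering the fields so that all of the zero-initialized ones come first is what makes the single call possible.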

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agopppoe: remove unnecessary checks in pppoe_flush_dev
Jiri Pirko [Wed, 5 May 2010 07:56:33 +0000 (00:56 -0700)]
pppoe: remove unnecessary checks in pppoe_flush_dev

pernet memory is guaranteed to exist when notifiers are called.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agortl8180: use SET_IEEE80211_PERM_ADDR
John W. Linville [Tue, 4 May 2010 19:46:15 +0000 (15:46 -0400)]
rtl8180: use SET_IEEE80211_PERM_ADDR

Signed-off-by: John W. Linville <linville@tuxdriver.com>
14 years agort2x00: Register frame length in TX entry descriptor instead of L2PAD.
Gertjan van Wingerde [Mon, 3 May 2010 20:43:05 +0000 (22:43 +0200)]
rt2x00: Register frame length in TX entry descriptor instead of L2PAD.

And use it consistently in the chipset drivers.
Preparation for further clean ups.

Signed-off-by: Gertjan van Wingerde <gwingerde@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
14 years agort2x00: Fix HT40+/HT40- setting in rt2800.
Gertjan van Wingerde [Mon, 3 May 2010 20:43:04 +0000 (22:43 +0200)]
rt2x00: Fix HT40+/HT40- setting in rt2800.

Inspection of the Ralink vendor driver shows that the TX_BAND_CFG register
and BBP register 3 are about HT40- indication, not about HT40+ indication.
Invert the meaning of these fields in the code.

Signed-off-by: Gertjan van Wingerde <gwingerde@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
14 years agort2x00: Enable RT30xx by default.
Gertjan van Wingerde [Mon, 3 May 2010 20:43:03 +0000 (22:43 +0200)]
rt2x00: Enable RT30xx by default.

Now that RT30xx support is at the same level as RT28xx support we can enable
these devices by default.

Signed-off-by: Gertjan van Wingerde <gwingerde@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
14 years agort2x00: Remove rt2x00pci.h include from rt2800lib.
Gertjan van Wingerde [Mon, 3 May 2010 20:43:02 +0000 (22:43 +0200)]
rt2x00: Remove rt2x00pci.h include from rt2800lib.

PCI specific code was removed quite some time ago.

Signed-off-by: Gertjan van Wingerde <gwingerde@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
14 years agoiwlwifi: recalculate average tpt if not current
Reinette Chatre [Mon, 3 May 2010 17:55:07 +0000 (10:55 -0700)]
iwlwifi: recalculate average tpt if not current

We currently have this check as a BUG_ON, which is being hit by people.
Previously it was handled as an error, with a recalculation if the value
was not current; restore that code.

The BUG_ON was introduced by:
commit 3110bef78cb4282c58245bc8fd6d95d9ccb19749
Author: Guy Cohen <guy.cohen@intel.com>
Date:   Tue Sep 9 10:54:54 2008 +0800

    iwlwifi: Added support for 3 antennas

... the portion adding the BUG_ON is reverted since we are encountering the error
and BUG_ON was created with assumption that error is not encountered.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
14 years agoforcedeth: Kill NAPI config options.
David S. Miller [Tue, 4 May 2010 06:33:05 +0000 (23:33 -0700)]
forcedeth: Kill NAPI config options.

All distributions enable it, therefore no significant body of users
are even testing the driver with it disabled.  And making NAPI
configurable is heavily discouraged anyways.

I left the MSI-X interrupt enabling thing in an "#if 0" block
so hopefully someone can debug that and it can get re-enabled.
Probably it was just one of the NVIDIA chipset MSI errata that
we handle these days in the PCI quirks (see drivers/pci/quirks.c
and stuff like nvenet_msi_disable()).

Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoforcedeth: GRO support
Tom Herbert [Mon, 3 May 2010 19:08:45 +0000 (19:08 +0000)]
forcedeth: GRO support

Add GRO support to forcedeth.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet: skb_free_datagram_locked() fix
Eric Dumazet [Tue, 4 May 2010 06:18:14 +0000 (23:18 -0700)]
net: skb_free_datagram_locked() fix

Commit 4b0b72f7dd617b ( net: speedup udp receive path )
introduced a bug in skb_free_datagram_locked().

We should not skb_orphan() the skb if we don't have the guarantee that we
are the last skb user; this might happen with concurrent MSG_PEEK users.

To keep the socket locked for the smallest period of time, we split the
consume_skb() logic and inline it in skb_free_datagram_locked().

Reported-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoMerge branch 'net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vxy/lksctp-dev
David S. Miller [Mon, 3 May 2010 23:20:44 +0000 (16:20 -0700)]
Merge branch 'net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vxy/lksctp-dev

Add missing linux/vmalloc.h include to net/sctp/probe.c

Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet: rcu fixes
Eric Dumazet [Mon, 3 May 2010 10:50:14 +0000 (10:50 +0000)]
net: rcu fixes

Add hlist_for_each_entry_rcu_bh() and
hlist_for_each_entry_continue_rcu_bh() macros, and use them in
ipv6_get_ifaddr(), if6_get_first() and if6_get_next() to fix lockdep
warnings.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agophy/micrel: Add module device ID table for autoloading.
David S. Miller [Mon, 3 May 2010 22:48:29 +0000 (15:48 -0700)]
phy/micrel: Add module device ID table for autoloading.

Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoMerge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
David S. Miller [Mon, 3 May 2010 22:45:52 +0000 (15:45 -0700)]
Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6

14 years agodrivers/net/phy: micrel phy driver
David J. Choi [Thu, 29 Apr 2010 06:12:41 +0000 (06:12 +0000)]
drivers/net/phy: micrel phy driver

This is the first version of phy driver from Micrel Inc.

Signed-off-by: David J. Choi <david.choi@micrel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agounix/garbage: kill copy of the skb queue walker
Ilpo Järvinen [Mon, 3 May 2010 03:22:18 +0000 (03:22 +0000)]
unix/garbage: kill copy of the skb queue walker

Worse yet, it seems that its arguments were in reverse order. Also
remove one related helper which seems hardly worth keeping.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agomacvtap: add ioctl to modify vnet header size
Michael S. Tsirkin [Thu, 29 Apr 2010 10:50:48 +0000 (13:50 +0300)]
macvtap: add ioctl to modify vnet header size

This adds TUNSETVNETHDRSZ/TUNGETVNETHDRSZ support
to macvtap.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: David S. Miller <davem@davemloft.net>
14 years agodm9601: fix phy/eeprom write routine
Peter Korsgaard [Mon, 3 May 2010 10:01:26 +0000 (10:01 +0000)]
dm9601: fix phy/eeprom write routine

Use correct bit positions in DM_SHARED_CTRL register for writes.

Michael Planes recently encountered a 'KY-RS9600 USB-LAN converter', which
came with a driver CD containing a Linux driver. This driver turns out to
be a copy of dm9601.c with symbols renamed and my copyright stripped.
That aside, it did contain 1 functional change in dm_write_shared_word(),
and after checking the datasheet the original value was indeed wrong
(read versus write bits).

On Michael's HW, this change bumps receive speed from ~30KB/s to ~900KB/s.
On other devices the difference is less spectacular, but still significant
(~30%).

Reported-by: Michael Planes <michael.planes@free.fr>
CC: stable@kernel.org
Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoRevert "ixgbe: disable MSI-X by default on certain Cisco adapters"
David S. Miller [Mon, 3 May 2010 22:18:22 +0000 (15:18 -0700)]
Revert "ixgbe: disable MSI-X by default on certain Cisco adapters"

This reverts commit d5ffd75a27fade39ba5df3b07290c5a2c297b9bd.

As requested by Jeff Kirsher.

Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoppp_generic: handle non-linear skbs when passing them to pppd
Simon Arlott [Mon, 3 May 2010 10:20:27 +0000 (10:20 +0000)]
ppp_generic: handle non-linear skbs when passing them to pppd

Frequently when using PPPoE with an interface MTU greater than 1500,
the skb is likely to be non-linear. If the skb needs to be passed to
pppd then the skb data must be read correctly.

The previous commit fixes an issue with accidentally sending skbs
to pppd based on an invalid read of the protocol type. When that
error occurred pppd was reading invalid skb data too.

Signed-off-by: Simon Arlott <simon@fire.lp0.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoppp_generic: pull 2 bytes so that PPP_PROTO(skb) is valid
Simon Arlott [Mon, 3 May 2010 10:19:33 +0000 (10:19 +0000)]
ppp_generic: pull 2 bytes so that PPP_PROTO(skb) is valid

In ppp_input(), PPP_PROTO(skb) may refer to invalid data in the skb.

If this happens and (proto >= 0xc000 || proto == PPP_CCPFRAG) then
the packet is passed directly to pppd.

This occurs frequently when using PPPoE with an interface MTU
greater than 1500 because the skb is more likely to be non-linear.

The next 2 bytes need to be pulled in ppp_input(). The pull of 2
bytes in ppp_receive_frame() has been removed as it is no longer
required.

Signed-off-by: Simon Arlott <simon@fire.lp0.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoiwmc3200wifi: fix busted iwm_debugfs_init definition
John W. Linville [Mon, 3 May 2010 20:12:39 +0000 (16:12 -0400)]
iwmc3200wifi: fix busted iwm_debugfs_init definition

Looks like we missed removing the return statement in the non-CONFIG_IWM_DEBUG
dummy implementation of iwm_debugfs_init...

Signed-off-by: John W. Linville <linville@tuxdriver.com>
14 years agoMerge branch 'wireless-next-2.6' of git://git.kernel.org/pub/scm/linux/kernel/git...
John W. Linville [Mon, 3 May 2010 18:53:49 +0000 (14:53 -0400)]
Merge branch 'wireless-next-2.6' of git://git./linux/kernel/git/iwlwifi/iwlwifi-2.6

14 years agowireless: rt2x00: rt2800usb: be in sync with latest windows drivers.
Xose Vazquez Perez [Mon, 3 May 2010 11:11:38 +0000 (13:11 +0200)]
wireless: rt2x00: rt2800usb: be in sync with latest windows drivers.

0x07d1,0x3c17 D-Link Wireless N 150 USB Adapter DWA-125
0x1b75,0x3071 Ovislink Airlive WN-301USB
0x1d4d,0x0011 Pegatron Ralink RT3072 802.11b/g/n Wireless Lan USB Device
0x083a,0xf511 Arcadyan 802.11 USB Wireless LAN Card
0x13d3,0x3322 AzureWave 802.11 n/g/b USB Wireless LAN Card

Signed-off-by: Xose Vazquez Perez <xose.vazquez@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
14 years agomac80211_hwsim: fix double-scan detection
Johannes Berg [Mon, 3 May 2010 07:21:14 +0000 (09:21 +0200)]
mac80211_hwsim: fix double-scan detection

Currently, hwsim will always detect a double scan
after the first one has finished ...

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
14 years agomac80211: improve IBSS scanning
Johannes Berg [Mon, 3 May 2010 06:49:48 +0000 (08:49 +0200)]
mac80211: improve IBSS scanning

When IBSS is fixed to a frequency, it can still
scan to try to find the right BSSID. This makes
sense if the BSSID isn't also fixed, but it need
not scan all channels -- just one is sufficient.
Make it do that by moving the scan setup code to
ieee80211_request_internal_scan() and include
a channel variable setting.

Note that this can be further improved to start
the IBSS right away if both frequency and BSSID
are fixed.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
14 years agomac80211: allow controlling aggregation manually
Johannes Berg [Sat, 1 May 2010 16:53:51 +0000 (18:53 +0200)]
mac80211: allow controlling aggregation manually

This allows enabling TX and disabling both TX and
RX aggregation sessions manually in debugfs. It is
very useful for debugging session initiation and
teardown problems since with this you don't have
to force a lot of traffic to get aggregation and
thus have less data to analyse.

Also, to debug mac80211 code itself, make hwsim
"support" aggregation sessions. It will still just
transfer the frame, but go through the setup and
teardown handshakes.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
14 years agoorinoco_usb: implement fw download
David Kilroy [Sat, 1 May 2010 13:05:43 +0000 (14:05 +0100)]
orinoco_usb: implement fw download

This involves some refactoring of the common fw download code to
substitute ezusb versions of various functions.

Note that WPA-enabled firmwares (9.xx series) will not work fully with
orinoco_usb yet.

Signed-off-by: David Kilroy <kilroyd@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
14 years agoorinoco_usb: avoid in_atomic
David Kilroy [Sat, 1 May 2010 13:05:42 +0000 (14:05 +0100)]
orinoco_usb: avoid in_atomic

We expect to be either in process context or soft interrupt context. So
use in_softirq instead.

Signed-off-by: David Kilroy <kilroyd@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
14 years agoorinoco: add orinoco_usb driver
David Kilroy [Sat, 1 May 2010 13:05:41 +0000 (14:05 +0100)]
orinoco: add orinoco_usb driver

This driver uses the core orinoco modules for the bulk of
the functionality. The low level hermes routines (for local bus
cards) are replaced, the driver supplies its own ndo_start_xmit
function, and locking is done with the _bh variant.

Some recent functionality is not available to the USB cards yet
(firmware loading and WPA).

Out-of-tree driver originally written by Manuel Estrada Sainz.

Thanks to Mark Davis for supplying hardware to test the updates.

Signed-off-by: David Kilroy <kilroyd@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
14 years agoorinoco: encapsulate driver locking
David Kilroy [Sat, 1 May 2010 13:05:40 +0000 (14:05 +0100)]
orinoco: encapsulate driver locking

Local bus and USB drivers will need to do locking differently.

The original orinoco_usb patches had a boolean variable controlling
whether spin_lock_bh was used, or irq based locking. This version
provides wrappers for the lock functions and the drivers specify the
function pointers needed.

This will introduce a performance penalty, but I'm not expecting it to
be noticeable.

Signed-off-by: David Kilroy <kilroyd@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
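
The wrapper approach described in the locking commit above can be sketched in userspace as a table of function pointers chosen by each bus driver. This is only an illustration under assumed names: `struct model_lock_ops`, `struct model_priv` and the depth counters stand in for the real lock state and are not taken from the orinoco sources.

```c
struct model_priv;

/* Hypothetical per-driver table of lock operations. */
struct model_lock_ops {
	void (*lock)(struct model_priv *priv);
	void (*unlock)(struct model_priv *priv);
};

struct model_priv {
	const struct model_lock_ops *ops;
	int irq_depth;	/* stands in for irq-based locking state */
	int bh_depth;	/* stands in for _bh locking state */
};

static void lock_irq(struct model_priv *p)   { p->irq_depth++; }
static void unlock_irq(struct model_priv *p) { p->irq_depth--; }
static void lock_bh(struct model_priv *p)    { p->bh_depth++; }
static void unlock_bh(struct model_priv *p)  { p->bh_depth--; }

/* Each kind of driver supplies the variant it needs. */
static const struct model_lock_ops local_bus_lock_ops = { lock_irq, unlock_irq };
static const struct model_lock_ops usb_lock_ops = { lock_bh, unlock_bh };

/* Common code only ever calls the wrappers, at the cost of one
 * indirect call per lock or unlock. */
static void model_lock(struct model_priv *p)   { p->ops->lock(p); }
static void model_unlock(struct model_priv *p) { p->ops->unlock(p); }
```

The indirect call is the performance penalty the commit message mentions; in exchange, the common code no longer needs a boolean to pick a locking style.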
14 years agoorinoco: allow driver to specify netdev_ops
David Kilroy [Sat, 1 May 2010 13:05:39 +0000 (14:05 +0100)]
orinoco: allow driver to specify netdev_ops

Allow the main drivers to specify a custom version of the net_device_ops
structure. This is required by orinoco_usb to supply a separate transmit
function.

Export existing net_device_ops callbacks so that the drivers can reuse
some of the existing code.

Signed-off-by: David Kilroy <kilroyd@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
14 years agoorinoco: add hermes_ops
David Kilroy [Sat, 1 May 2010 13:05:38 +0000 (14:05 +0100)]
orinoco: add hermes_ops

Pave the way for introducing USB alternative functions.

Force callers to dereference ops instead of providing wrappers.

Signed-off-by: David Kilroy <kilroyd@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
14 years agomac80211: fix ieee80211_find_sta[_by_hw]
Johannes Berg [Fri, 30 Apr 2010 11:48:36 +0000 (13:48 +0200)]
mac80211: fix ieee80211_find_sta[_by_hw]

Both of these functions can currently return
a station pointer that, to the driver, is
invalid (in IBSS mode only) because adding
the station failed. Check for that, and also
make ieee80211_find_sta() properly use the
per interface station search.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
14 years agob43legacy: Added get_survey callback in order to get channel noise
John W. Linville [Thu, 29 Apr 2010 19:56:25 +0000 (15:56 -0400)]
b43legacy: Added get_survey callback in order to get channel noise

Signed-off-by: John W. Linville <linville@tuxdriver.com>
14 years agob43: Added get_survey callback in order to get channel noise
John W. Linville [Thu, 29 Apr 2010 19:56:06 +0000 (15:56 -0400)]
b43: Added get_survey callback in order to get channel noise

Signed-off-by: John W. Linville <linville@tuxdriver.com>
14 years agort2x00: remove now unused noise field from struct rxdone_entry_desc
John W. Linville [Wed, 28 Apr 2010 21:00:52 +0000 (17:00 -0400)]
rt2x00: remove now unused noise field from struct rxdone_entry_desc

Signed-off-by: John W. Linville <linville@tuxdriver.com>
Acked-by: Gertjan van Wingerde <gwingerde@gmail.com>
14 years agoiwmc3200wifi: cleanup unneeded debugfs error handling
John W. Linville [Mon, 3 May 2010 18:46:05 +0000 (14:46 -0400)]
iwmc3200wifi: cleanup unneeded debugfs error handling

"iwl: cleanup: remove unneeded error handling" missed the one in
if_sdio_debugfs_init().

I don't think we even need to check -ENODEV ourselves because if
DEBUG_FS is not compiled in, all the debugfs utility functions will
become no-op.

Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Acked-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
14 years agotun: add ioctl to modify vnet header size
Michael S. Tsirkin [Wed, 17 Mar 2010 15:45:01 +0000 (17:45 +0200)]
tun: add ioctl to modify vnet header size

virtio added mergeable buffers mode where 2 bytes of extra info is put
after vnet header but before actual data (tun does not need this data).
In hindsight, it would have been better to add the new info *before* the
packet: as it is, users need a lot of tricky code to skip the extra 2
bytes in the middle of the iovec, and in fact applications seem to get
it wrong, and only work with specific iovec layout.  The fact we might
need to split iovec also means we might in theory overflow iovec max
size.

This patch adds a simpler way for applications to handle this,
and future proofs the interface against further extensions,
by making the size of the virtio net header configurable
from userspace. As a result, tun driver will simply
skip the extra 2 bytes on both input and output.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: David S. Miller <davem@davemloft.net>
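
As a rough userspace sketch of how an application might use the new interface: the ioctl is exposed as `TUNSETVNETHDRSZ` in `<linux/if_tun.h>` and takes a pointer to an `int` holding the header size. The helper name `open_tap_mrg_rxbuf()` and the choice of tap mode are assumptions made for this example, and running it needs a kernel with this patch plus permission to open `/dev/net/tun`.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/if.h>
#include <linux/if_tun.h>
#include <linux/virtio_net.h>

/* Open a tap device that carries a vnet header and ask tun to use the
 * larger mergeable-buffers header, so the extra 2 bytes are skipped by
 * the driver rather than by the application.
 * Returns the file descriptor, or -1 on any failure. */
static int open_tap_mrg_rxbuf(const char *name)
{
	struct ifreq ifr;
	int hdr_sz = sizeof(struct virtio_net_hdr_mrg_rxbuf);
	int fd = open("/dev/net/tun", O_RDWR);

	if (fd < 0)
		return -1;

	memset(&ifr, 0, sizeof(ifr));
	ifr.ifr_flags = IFF_TAP | IFF_NO_PI | IFF_VNET_HDR;
	strncpy(ifr.ifr_name, name, IFNAMSIZ - 1);

	if (ioctl(fd, TUNSETIFF, &ifr) < 0 ||
	    ioctl(fd, TUNSETVNETHDRSZ, &hdr_sz) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

A caller would then read and write frames prefixed with a `struct virtio_net_hdr_mrg_rxbuf` in one contiguous iovec, instead of splitting the iovec around the extra 2 bytes.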
14 years agonet: Use explicit "unsigned int" instead of plain "unsigned" in netdevice.h
David S. Miller [Mon, 3 May 2010 05:27:59 +0000 (22:27 -0700)]
net: Use explicit "unsigned int" instead of plain "unsigned" in netdevice.h

Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet: fix softnet_stat
Changli Gao [Sun, 2 May 2010 05:42:16 +0000 (05:42 +0000)]
net: fix softnet_stat

Per cpu variable softnet_data.total was shared between IRQ and SoftIRQ context
without any protection. And enqueue_to_backlog should update the netdev_rx_stat
of the target CPU.

This patch renames softnet_data.total to softnet_data.processed: the number of
packets processed in upper levels (IP stacks).

softnet_stat data is moved into softnet_data.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
 include/linux/netdevice.h |   17 +++++++----------
 net/core/dev.c            |   26 ++++++++++++--------------
 net/sched/sch_generic.c   |    2 +-
 3 files changed, 20 insertions(+), 25 deletions(-)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoMerge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
David S. Miller [Mon, 3 May 2010 04:43:40 +0000 (21:43 -0700)]
Merge branch 'master' of /linux/kernel/git/davem/net-2.6

14 years agonet: fix compile error due to double return type in SOCK_DEBUG
Jan Engelhardt [Sun, 2 May 2010 20:42:39 +0000 (13:42 -0700)]
net: fix compile error due to double return type in SOCK_DEBUG

Fix this one:
include/net/sock.h: error: two or more data types in declaration specifiers

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet: Inline skb_pull() in eth_type_trans().
David S. Miller [Sun, 2 May 2010 09:21:44 +0000 (02:21 -0700)]
net: Inline skb_pull() in eth_type_trans().

In commit 6be8ac2f ("[NET]: uninline skb_pull, de-bloats a lot")
we uninlined skb_pull.

But in some critical paths it makes sense to inline this thing
and it helps performance significantly.

Create an skb_pull_inline() so that we can do this in a way that
serves also as annotation.

Based upon a patch by Eric Dumazet.

Signed-off-by: David S. Miller <davem@davemloft.net>
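
The pattern from the skb_pull() commit above, an inline helper for hot paths plus an out-of-line function that wraps it, can be sketched in userspace as follows. `struct pkt`, `pkt_pull_inline()` and `pkt_pull()` are hypothetical stand-ins for the kernel's sk_buff and its helpers.

```c
#include <stddef.h>

/* Hypothetical stand-in for the part of an skb that a pull touches. */
struct pkt {
	unsigned char *data;
	unsigned int len;
};

/* Inline variant for critical paths: the body is expanded at the call
 * site, and the name documents that the inlining is deliberate. */
static inline unsigned char *pkt_pull_inline(struct pkt *p, unsigned int len)
{
	if (len > p->len)
		return NULL;	/* not enough data: leave the packet untouched */
	p->len -= len;
	p->data += len;
	return p->data;
}

/* Out-of-line variant for everyone else: a single shared copy of the
 * code, which keeps the overall text size down. */
unsigned char *pkt_pull(struct pkt *p, unsigned int len)
{
	return pkt_pull_inline(p, len);
}
```

Most callers keep using the out-of-line function, while a path such as the Ethernet header strip can opt in to the inline version explicitly.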
14 years agonet/usb: initiate sync sequence in sierra_net.c driver
Elina Pasheva [Wed, 28 Apr 2010 13:28:24 +0000 (13:28 +0000)]
net/usb: initiate sync sequence in sierra_net.c driver

The following patch adds the initiation of the sync sequence to
"sierra_net_bind()". If this step is omitted, the modem will never sync up
with the host and it will not be possible to establish a data connection.

Signed-off-by: Elina Pasheva <epasheva@sierrawireless.com>
Signed-off-by: Rory Filer <rfiler@sierrawireless.com>
Tested-by: Elina Pasheva <epasheva@sierrawireless.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet: sock_def_readable() and friends RCU conversion
Eric Dumazet [Thu, 29 Apr 2010 11:01:49 +0000 (11:01 +0000)]
net: sock_def_readable() and friends RCU conversion

sk_callback_lock rwlock actually protects sk->sk_sleep pointer, so we
need two atomic operations (and associated dirtying) per incoming
packet.

RCU conversion is pretty much needed :

1) Add a new structure, called "struct socket_wq" to hold all fields
that will need rcu_read_lock() protection (currently: a
wait_queue_head_t and a struct fasync_struct pointer).

[Future patch will add a list anchor for wakeup coalescing]

2) Attach one such structure to each "struct socket" created in
sock_alloc_inode().

3) Respect RCU grace period when freeing a "struct socket_wq"

4) Replace the sk_sleep pointer in "struct sock" with sk_wq, a pointer
to "struct socket_wq"

5) Change sk_sleep() function to use new sk->sk_wq instead of
sk->sk_sleep

6) Change sk_has_sleeper() to wq_has_sleeper() that must be used inside
a rcu_read_lock() section.

7) Change all sk_has_sleeper() callers to :
  - Use rcu_read_lock() instead of read_lock(&sk->sk_callback_lock)
  - Use wq_has_sleeper() to wake up tasks if needed.
  - Use rcu_read_unlock() instead of read_unlock(&sk->sk_callback_lock)

8) sock_wake_async() is modified to use rcu protection as well.

9) Exceptions :
  macvtap, drivers/net/tun.c, af_unix use integrated "struct socket_wq"
instead of dynamically allocated ones. They don't need rcu freeing.

Some cleanups or followups are probably needed, (possible
sk_callback_lock conversion to a spinlock for example...).

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agosctp: Tag messages that can be Nagle delayed at creation.
Vlad Yasevich [Sat, 1 May 2010 02:41:10 +0000 (22:41 -0400)]
sctp: Tag messages that can be Nagle delayed at creation.

When we create the sctp_datamsg and fragment the user data,
we know exactly if we are sending full segments or not and
how they might be bundled.  During this time, we can mark
messages as Nagle capable or not.  This makes the check at
transmit time much simpler.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
14 years agosctp: Optimize computation of highest new tsn in SACK.
Vlad Yasevich [Sat, 1 May 2010 02:41:10 +0000 (22:41 -0400)]
sctp: Optimize computation of highest new tsn in SACK.

Right now, if the highest tsn in the SACK doesn't change, we'll
end up scanning the transmitted lists on the transports twice:
once for locating the highest _new_ tsn, and once for actually
tagging chunks as acked.  This is a waste, since we can record
the highest _new_ tsn at the same time as tagging chunks.  Long
ago this was not possible because we would try to mark chunks
as missing at the same time as tagging them acked and this approach
didn't work.  Now that the two steps are separate, we can re-use
the old approach.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
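
The single-pass idea in the SACK commit above can be sketched as follows: while tagging chunks as acked, the same loop also tracks the highest newly acked TSN. This is a simplified userspace model; `struct model_chunk`, the `covered[]` array (standing in for "this SACK covers chunk i") and `tag_acked()` are assumptions made for the example, not the kernel's data structures.

```c
#include <stddef.h>
#include <stdint.h>

struct model_chunk {
	uint32_t tsn;
	int acked;
};

/* Serial-number comparison, so the result stays correct across TSN
 * wraparound. */
static int tsn_lt(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* One pass over the transmitted chunks: tag each newly covered chunk as
 * acked and update *highest_new as we go. The caller seeds *highest_new
 * with the previous value. Returns the number of newly acked chunks. */
static size_t tag_acked(struct model_chunk *c, size_t n,
			const int *covered, uint32_t *highest_new)
{
	size_t newly = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!covered[i] || c[i].acked)
			continue;
		c[i].acked = 1;
		newly++;
		if (tsn_lt(*highest_new, c[i].tsn))
			*highest_new = c[i].tsn;
	}
	return newly;
}
```

Marking chunks as missing would be a separate step that runs afterwards and uses the recorded value, which is what makes the combined loop safe in this model.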
14 years agosctp: correctly mark missing chunks in fast recovery
Vlad Yasevich [Sat, 1 May 2010 02:41:10 +0000 (22:41 -0400)]
sctp: correctly mark missing chunks in fast recovery

According to RFC 4960 Section 7.2.4:
  If an endpoint is in Fast
   Recovery and a SACK arrives that advances the Cumulative TSN Ack
   Point, the miss indications are incremented for all TSNs reported
   missing in the SACK.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
14 years agosctp: rwnd_press should be cumulative
Vlad Yasevich [Sat, 1 May 2010 02:41:10 +0000 (22:41 -0400)]
sctp: rwnd_press should be cumulative

rwnd_press tracks the pressure on the receive window.  Every
time the receive buffer overflows, we truncate the receive
window and then grow it back.  However, if we don't track
the cumulative pressure, it's possible to reach a situation
when receive buffer is empty, but rwnd stays truncated.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
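
A small worked model may help show why the pressure has to accumulate. The accounting below is deliberately simplified and is an assumption for illustration only (in particular, the rule of giving all pressure back once the buffer drains is not the kernel's exact recovery rule); `struct model_win` and the function names are hypothetical.

```c
struct model_win {
	unsigned int rwnd;	/* advertised receive window */
	unsigned int press;	/* window taken away under pressure */
	unsigned int buffered;	/* data waiting for the application */
};

/* Data arrives; if the receive buffer overflowed, truncate the window
 * to 0 and remember how much was taken away. */
static void model_receive(struct model_win *w, unsigned int len,
			  int over, int cumulative)
{
	if (len > w->rwnd)
		len = w->rwnd;
	w->rwnd -= len;
	w->buffered += len;
	if (over) {
		if (cumulative)
			w->press += w->rwnd;	/* add to earlier pressure */
		else
			w->press = w->rwnd;	/* earlier pressure is forgotten */
		w->rwnd = 0;
	}
}

/* The application reads data; once the buffer is empty, the window that
 * was taken away is given back. */
static void model_consume(struct model_win *w, unsigned int len)
{
	if (len > w->buffered)
		len = w->buffered;
	w->buffered -= len;
	w->rwnd += len;
	if (w->buffered == 0) {
		w->rwnd += w->press;
		w->press = 0;
	}
}

/* Two overflows in a row, then the buffer is drained completely.
 * Returns the final window for a 1000-byte starting window. */
static unsigned int model_run(int cumulative)
{
	struct model_win w = { 1000, 0, 0 };

	model_receive(&w, 300, 1, cumulative);
	model_consume(&w, 100);
	model_receive(&w, 50, 1, cumulative);
	model_consume(&w, 250);
	return w.rwnd;
}
```

In this model, `model_run(1)` ends with the full 1000-byte window, while `model_run(0)` ends at 300 with an empty buffer, which is the "rwnd stays truncated" situation described above.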
14 years agosctp: fast recovery algorithm is per association.
Vlad Yasevich [Sat, 1 May 2010 02:41:10 +0000 (22:41 -0400)]
sctp: fast recovery algorithm is per association.

SCTP fast recovery algorithm really applies per association
and impacts all transports.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
14 years agosctp: update transport initializations
Vlad Yasevich [Sat, 1 May 2010 02:41:10 +0000 (22:41 -0400)]
sctp: update transport initializations

Right now, sctp transports are not fully initialized and when
adding any new fields, they have to be explicitly initialized.
This is prone to mistakes.  So we switch to calling kzalloc()
which makes things much simpler.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
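
The same idea can be shown in userspace with calloc(), which plays the role that kzalloc() plays in the kernel. `struct model_transport`, its fields and the default value are hypothetical and only illustrate the pattern.

```c
#include <stdlib.h>

/* Hypothetical transport structure; new_field represents a member that
 * was added later and that nobody remembered to initialize by hand. */
struct model_transport {
	int rto;
	int cwnd;
	int new_field;
};

/* Allocate a zeroed transport. Only members with non-zero defaults need
 * explicit setup; everything else, including members added in future,
 * starts out as 0. Returns NULL on allocation failure. */
static struct model_transport *model_transport_new(void)
{
	struct model_transport *t = calloc(1, sizeof(*t));

	if (!t)
		return NULL;
	t->rto = 3000;	/* illustrative non-zero default */
	return t;
}
```

With a plain malloc() the value of `new_field` would be indeterminate until someone added an explicit assignment, which is the class of mistake the commit removes.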
14 years agosctp: Save some room in the sctp_transport by using bitfields
Vlad Yasevich [Sat, 1 May 2010 02:41:09 +0000 (22:41 -0400)]
sctp: Save some room in the sctp_transport by using bitfields

Saves some room in the sctp_transport structure.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
14 years agosctp: Do not force T3 timer on fast retransmissions.
Vlad Yasevich [Sat, 1 May 2010 02:41:09 +0000 (22:41 -0400)]
sctp: Do not force T3 timer on fast retransmissions.

We don't need to force the T3 timer any more and it's
actually wrong to do as it causes too long of a delay.
The timer will be started if one is not running, but if
one is running, we leave it alone.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
14 years agosctp: remove 'resent' bit from the chunk
Vlad Yasevich [Sat, 1 May 2010 02:41:09 +0000 (22:41 -0400)]
sctp: remove 'resent' bit from the chunk

The 'resent' bit is used to make sure that we don't update
rto estimate based on retransmitted chunks.  However, we already
have the 'rto_pending' bit that we test when need to update rto,
so 'resent' bit is just extra.  Additionally, we currently have
a bug in that we always set a 'resent' bit and thus rto estimate
is only updated by Heartbeats.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
14 years agosctp: Make sure we always return valid retransmit path
Vlad Yasevich [Sat, 1 May 2010 02:41:09 +0000 (22:41 -0400)]
sctp: Make sure we always return valid retransmit path

commit 4951feda0c60d1ef681f1a270afdd617924ab041
    sctp: Do no select unconfirmed transports for retransmissions

added code to make sure that we do not select unconfirmed paths
for data transmission.  This caused a problem when there are only
2 paths, 1 unconfirmed and 1 unreachable.  In that case, the next
retransmit path returned is NULL and that causes a kernel crash.

The solution is to only change retransmit paths if we found one to use.

Reported-by: Frank Schuster <frank.schuster01@web.de>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
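
The rule "only change retransmit paths if we found one to use" can be sketched as below. This is a simplified userspace model with hypothetical names (`struct model_path`, `pick_retran_path()`); the real selection walks the transport list in a different order, but the fallback is the point being illustrated.

```c
#include <stddef.h>

struct model_path {
	int active;	/* path is reachable */
	int confirmed;	/* path address has been confirmed */
};

/* Look for another path that is both active and confirmed. If none is
 * found, keep the current path instead of returning NULL. */
static struct model_path *pick_retran_path(struct model_path *paths, size_t n,
					    struct model_path *current)
{
	size_t i;

	for (i = 0; i < n; i++) {
		struct model_path *p = &paths[i];

		if (p == current)
			continue;
		if (p->active && p->confirmed)
			return p;
	}
	return current;	/* nothing better: leave the retransmit path alone */
}
```

With one unconfirmed and one unreachable path, the loop finds no candidate, and the caller still gets a valid pointer back.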

14 years agosctp: cleanup: remove duplicate assignment
Dan Carpenter [Sat, 1 May 2010 02:41:09 +0000 (22:41 -0400)]
sctp: cleanup: remove duplicate assignment

This assignment isn't needed because we did it earlier already.

Also another reason to delete the assignment is because it triggers a
Smatch warning about checking for NULL pointers after a dereference.

Reported-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
14 years agosctp: implement sctp association probing module
Wei Yongjun [Sat, 1 May 2010 02:41:09 +0000 (22:41 -0400)]
sctp: implement sctp association probing module

This patch implements an sctp association probing module, which
will be called sctp_probe.

This module allows for capturing the changes to SCTP association
state in response to incoming packets. It is used for debugging
SCTP congestion control algorithms.

Usage:
  $ modprobe sctp_probe [full=n] [port=n] [bufsize=n]
  $ cat /proc/net/sctpprobe

  The output format is:
    TIME     ASSOC     LPORT RPORT MTU    RWND  UNACK <REMOTE-ADDR   STATE  CWND   SSTHRESH  INFLIGHT  PARTIAL_BYTES_ACKED MTU> ...

  The output will be like this:
    9.226086 c4064c48  9000  8000  1500    53352     1 *192.168.0.19  1     4380    54784     1252        0     1500
    9.287195 c4064c48  9000  8000  1500    45144     5 *192.168.0.19  1     5880    54784     6500        0     1500
    9.289130 c4064c48  9000  8000  1500    42724     5 *192.168.0.19  1     7380    54784     6500        0     1500
    9.620332 c4064c48  9000  8000  1500    48284     4 *192.168.0.19  1     8880    54784     5200        0     1500
    ......

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>