YOSHIFUJI Hideaki [Sat, 20 Mar 2010 23:11:12 +0000 (16:11 -0700)]
ipv6: Reduce timer events for addrconf_verify().
This patch reduces timer events while keeping accuracy by rounding
our timer and/or batching several address validations in addrconf_verify().
addrconf_verify() is called at earliest timeout among interface addresses'
timeouts, but at maximum ADDR_CHECK_FREQUENCY (120 secs).
In most cases, all of timeouts of interface addresses are long enough
(e.g. several hours or days vs 2 minutes), this timer is usually called
every ADDR_CHECK_FREQUENCY, and it is okay to be lazy.
(Note this timer could be eliminated if all code paths which modifies
variables related to timeouts call us manually, but it is another story.)
However, in other least but important cases, we try keeping accuracy.
When the real interface address timeout is coming, and the timeout
is just before the rounded timeout, we accept some error.
When a timeout has been reached, we also try batching other several
events in very near future.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Wed, 17 Mar 2010 20:31:17 +0000 (20:31 +0000)]
IPv6: addrconf cleanup addrconf_verify
The variable regen_advance is only used in the privacy case.
Move it to simplify code and eliminate ifdef's
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Hemminger [Sat, 20 Mar 2010 23:09:01 +0000 (16:09 -0700)]
addrconf: checkpatch fixes
Fix some of the checkpatch complaints.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Hemminger [Sat, 20 Mar 2010 23:08:18 +0000 (16:08 -0700)]
IPv6: addrconf cleanups
Some minor stuff, reformat comments and add whitespace for clarity
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Wed, 17 Mar 2010 20:31:13 +0000 (20:31 +0000)]
ipv6: convert idev_list to list macros
Convert to list macro's for the list of addresses per interface
in IPv6.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Wed, 17 Mar 2010 20:31:12 +0000 (20:31 +0000)]
ipv6: user better hash for addrconf
The existing hash function has a couple of issues:
* it is hardwired to 16 for IN6_ADDR_HSIZE
* limited to 256 and callers using int
* use jhash2 rather than some old BSD algorithm
No need for random seed since this is local only (based on assigned
addresses) table.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Wed, 17 Mar 2010 20:31:11 +0000 (20:31 +0000)]
IPv6: convert addrconf hash list to RCU
Convert from reader/writer lock to RCU and spinlock for addrconf
hash list.
Adds an additional helper macro for hlist_for_each_entry_continue_rcu
to handle the continue case.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Wed, 17 Mar 2010 20:31:10 +0000 (20:31 +0000)]
ipv6: convert addrconf list to hlist
Using hash list macros, simplifies code and helps later RCU.
This patch includes some initialization that is not strictly necessary,
since an empty hlist node/list is all zero; and list is in BSS
and node is allocated with kzalloc.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Wed, 17 Mar 2010 20:31:09 +0000 (20:31 +0000)]
ipv6: convert temporary address list to list macros
Use list macros instead of open coded linked list.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 20 Mar 2010 22:24:29 +0000 (15:24 -0700)]
Merge branch 'master' of /linux/kernel/git/davem/net-2.6
David S. Miller [Sat, 20 Mar 2010 21:41:01 +0000 (14:41 -0700)]
Merge branch 'vhost' of git://git./linux/kernel/git/mst/vhost
Pablo Neira Ayuso [Tue, 16 Mar 2010 13:30:21 +0000 (13:30 +0000)]
netfilter: ctnetlink: fix reliable event delivery if message building fails
This patch fixes a bug that allows to lose events when reliable
event delivery mode is used, ie. if NETLINK_BROADCAST_SEND_ERROR
and NETLINK_RECV_NO_ENOBUFS socket options are set.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso [Thu, 18 Mar 2010 14:24:42 +0000 (14:24 +0000)]
netlink: fix NETLINK_RECV_NO_ENOBUFS in netlink_set_err()
Currently, ENOBUFS errors are reported to the socket via
netlink_set_err() even if NETLINK_RECV_NO_ENOBUFS is set. However,
that should not happen. This fixes this problem and it changes the
prototype of netlink_set_err() to return the number of sockets that
have set the NETLINK_RECV_NO_ENOBUFS socket option. This return
value is used in the next patch in these bugfix series.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steven J. Magnani [Tue, 16 Mar 2010 05:22:44 +0000 (05:22 +0000)]
NET_DMA: free skbs periodically
Under NET_DMA, data transfer can grind to a halt when userland issues a
large read on a socket with a high RCVLOWAT (i.e., 512 KB for both).
This appears to be because the NET_DMA design queues up lots of memcpy
operations, but doesn't issue or wait for them (and thus free the
associated skbs) until it is time for tcp_recvmesg() to return.
The socket hangs when its TCP window goes to zero before enough data is
available to satisfy the read.
Periodically issue asynchronous memcpy operations, and free skbs for ones
that have completed, to prevent sockets from going into zero-window mode.
Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso [Tue, 16 Mar 2010 13:30:44 +0000 (13:30 +0000)]
netlink: fix unaligned access in nla_get_be64()
This patch fixes a unaligned access in nla_get_be64() that was
introduced by myself in
a17c859849402315613a0015ac8fbf101acf0cc1.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lennart Schulte [Wed, 17 Mar 2010 02:16:29 +0000 (02:16 +0000)]
tcp: Fix tcp_mark_head_lost() with packets == 0
A packet is marked as lost in case packets == 0, although nothing should be done.
This results in a too early retransmitted packet during recovery in some cases.
This small patch fixes this issue by returning immediately.
Signed-off-by: Lennart Schulte <lennart.schulte@nets.rwth-aachen.de>
Signed-off-by: Arnd Hannemann <hannemann@nets.rwth-aachen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Wed, 17 Mar 2010 06:04:14 +0000 (06:04 +0000)]
net: ipmr/ip6mr: fix potential out-of-bounds vif_table access
mfc_parent of cache entries is used to index into the vif_table and is
initialised from mfcctl->mfcc_parent. This can take values of to 2^16-1,
while the vif_table has only MAXVIFS (32) entries. The same problem
affects ip6mr.
Refuse invalid values to fix a potential out-of-bounds access. Unlike
the other validity checks, this is checked in ipmr_mfc_add() instead of
the setsockopt handler since its unused in the delete path and might be
uninitialized.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yegor Yefremov [Sat, 20 Mar 2010 05:43:29 +0000 (22:43 -0700)]
KS8695: update ksp->next_rx_desc_read at the end of rx loop
There is no need to adjust the next rx descriptor after each packet,
so do it only once at the end of the routine.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Yegor Yefremov <yegorslists@googlemail.com>
Carolyn Wyborny [Fri, 19 Mar 2010 06:07:48 +0000 (06:07 +0000)]
igb: Add support for 82576 ET2 Quad Port Server Adapter
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Greg Rose [Fri, 19 Mar 2010 03:00:31 +0000 (03:00 +0000)]
ixgbevf: Message formatting cleanups
Clean up some text output formatting.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Greg Rose [Fri, 19 Mar 2010 03:00:12 +0000 (03:00 +0000)]
ixgbevf: Shorten up delay timer for watchdog task
The recovery from PF reset works better when you shorten up the delay
until the watchdog task executes.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Greg Rose [Fri, 19 Mar 2010 02:59:52 +0000 (02:59 +0000)]
ixgbevf: Fix VF Stats accounting after reset
The counters in the 82599 Virtual Function are not clear on read. They
accumulate to the maximum value and then roll over. They are also not
cleared when the VF executes a soft reset, so it is possible they are
non-zero when the driver loads and starts. This has all been accounted
for in the code that keeps the stats up to date but there is one case
that is not. When the PF driver is reset the counters in the VF are
all reset to zero. This adds an additional accounting overhead into
the VF driver when the PF is reset under its feet. This patch adds
additional counters that are used by the VF driver to accumulate and
save stats after a PF reset has been detected. Prior to this patch
displaying the stats in the VF after the PF has reset would show
bogus data.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mallikarjuna R Chilakala [Fri, 19 Mar 2010 04:41:33 +0000 (04:41 +0000)]
ixgbe: Set IXGBE_RSC_CB(skb)->DMA field to zero after unmapping the address
As per Simon Horman's feedback set IXGBE_RSC_CB(skb)->dma to zero
after unmapping HWRSC DMA address to avoid double freeing.
Signed-off-by: Mallikarjuna R Chilakala <mallikarjuna.chilakala@intel.com>
Acked-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasu Dev [Fri, 19 Mar 2010 04:33:10 +0000 (04:33 +0000)]
ixgbe: fix for real_num_tx_queues update issue
Currently netdev_features_change is called before fcoe tx queues
setup is done, so this patch moves calling of netdev_features_change
after tx queues setup is done in ixgbe_init_interrupt_scheme, so
that real_num_tx_queues is updated correctly on each fcoe enable
or disable.
This allows additional fcoe queues updated correctly in vlan driver
for their correct queue selection.
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Thu, 18 Mar 2010 11:27:32 +0000 (11:27 +0000)]
TCP: check min TTL on received ICMP packets
This adds RFC5082 checks for TTL on received ICMP packets.
It adds some security against spoofed ICMP packets
disrupting GTSM protected sessions.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Thu, 18 Mar 2010 23:00:22 +0000 (23:00 +0000)]
ipv6: Remove redundant dst NULL check in ip6_dst_check
As the only path leading to ip6_dst_check makes an indirect call
through dst->ops, dst cannot be NULL in ip6_dst_check.
This patch removes this check in case it misleads people who
come across this code.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Timo Teräs [Thu, 18 Mar 2010 23:20:20 +0000 (23:20 +0000)]
ipv4: check rt_genid in dst_check
Xfrm_dst keeps a reference to ipv4 rtable entries on each
cached bundle. The only way to renew xfrm_dst when the underlying
route has changed, is to implement dst_check for this. This is
what ipv6 side does too.
The problems started after
87c1e12b5eeb7b30b4b41291bef8e0b41fc3dde9
("ipsec: Fix bogus bundle flowi") which fixed a bug causing xfrm_dst
to not get reused, until that all lookups always generated new
xfrm_dst with new route reference and path mtu worked. But after the
fix, the old routes started to get reused even after they were expired
causing pmtu to break (well it would occationally work if the rtable
gc had run recently and marked the route obsolete causing dst_check to
get called).
Signed-off-by: Timo Teras <timo.teras@iki.fi>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steve Glendinning [Fri, 19 Mar 2010 05:18:41 +0000 (22:18 -0700)]
smsc95xx: Fix tx checksum offload for small packets
TX checksum offload does not work properly when transmitting
UDP packets with 0, 1 or 2 bytes of data. This patch works
around the problem by calculating checksums for these packets
in the driver.
Signed-off-by: Steve Glendinning <steve.glendinning@smsc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mallikarjuna R Chilakala [Thu, 18 Mar 2010 15:16:56 +0000 (15:16 +0000)]
ixgbe: Fix 82599 KX4 Wake on LAN issue after an improper system shutdown
Advanced Power Management is disabled for 82599 KX4 connections by
clearing GRC.APME bit, causing it to not wake the system from an
improper system shutdown. By default GRC.APME is enabled and software
is not supposed to clear these settings during adapter probe.
Signed-off-by: Mallikarjuna R Chilakala <mallikarjuna.chilakala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mallikarjuna R Chilakala [Thu, 18 Mar 2010 14:34:52 +0000 (14:34 +0000)]
ixgbe: Fix 82599 multispeed fiber link issues due to Tx laser flapping
Fix 82599 link issues during driver load and unload test using multi-speed
10G & 1G fiber modules. When connected back to back sometime 82599 multispeed
fiber modules would link at 1G speed instead of 10G highest speed, due to a
race condition in autotry process involving Tx laser flapping. Move autotry
autoneg-37 tx laser flapping process from multispeed module init setup
to driver unload. This will alert the link partner to restart its
autotry process when it tries to establish the link with the link partner
Signed-off-by: Mallikarjuna R Chilakala <mallikarjuna.chilakala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasanthy Kolluri [Thu, 18 Mar 2010 16:20:04 +0000 (16:20 +0000)]
enic: Clean up: Change driver description; Fix tab space; Update MAINTAINERS
1) Change enic driver description to "Cisco VIC Ethernet NIC Driver"
2) Fix tab space
3) Update MAINTAINERS list
Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com>
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasanthy Kolluri [Thu, 18 Mar 2010 16:19:59 +0000 (16:19 +0000)]
enic: Clean up: Add wrapper functions
Add wrapper functions vnic_dev_notify_setcmd and vnic_dev_notify_unsetcmd
for firmware notify commands.
Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com>
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasanthy Kolluri [Thu, 18 Mar 2010 16:19:54 +0000 (16:19 +0000)]
enic: Do not advertise NETIF_F_HW_VLAN_RX
Hardware does not honor vlan filters from the host and so the driver does
not need to advertise this.
Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com>
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasanthy Kolluri [Thu, 18 Mar 2010 16:19:49 +0000 (16:19 +0000)]
enic: Bug Fix: Fix timeout for hardware Tx and Rx queue disable operations
The timeout for hardware Tx and Rx queue disable operations is increased to
work-around an erratum for "unnamed" chipset where a DMA completion may take
upto 10ms. We have to wait atleast this long for hardware to signal that Tx
and Rx queues are quiesced.
Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com>
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasanthy Kolluri [Thu, 18 Mar 2010 16:19:44 +0000 (16:19 +0000)]
enic: Bug Fix: Fix hardware descriptor reads
The last bit written to a completion descriptor by hardware is the color
bit. Driver must read all other descriptor fields only after reading the
color bit to avoid reading stale descriptor fields. There is a rmb() after
reading the color bit to avoid any compiler/cpu reordering of the reads.
The color bit is the generation bit that toggles each pass through the
completion descriptor ring.
Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com>
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 19 Mar 2010 04:18:19 +0000 (21:18 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/linville/wireless-2.6
Eric Dumazet [Fri, 19 Mar 2010 04:16:45 +0000 (21:16 -0700)]
net: Potential null skb->dev dereference
When doing "ifenslave -d bond0 eth0", there is chance to get NULL
dereference in netif_receive_skb(), because dev->master suddenly becomes
NULL after we tested it.
We should use ACCESS_ONCE() to avoid this (or rcu_dereference())
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guo-Fu Tseng [Wed, 17 Mar 2010 00:09:31 +0000 (00:09 +0000)]
jme: Advance driver version number
Advance driver version number after some bug fix.
Signed-off-by: Guo-Fu Tseng <cooldavid@cooldavid.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guo-Fu Tseng [Wed, 17 Mar 2010 00:09:30 +0000 (00:09 +0000)]
jme: Protect vlgrp structure by pause RX actions.
Temporary stop the RX IRQ, and disable (sync) tasklet or napi.
And restore it after finished the vlgrp pointer assignment.
Signed-off-by: Guo-Fu Tseng <cooldavid@cooldavid.org>
Cc: stable@kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Guo-Fu Tseng [Wed, 17 Mar 2010 00:09:29 +0000 (00:09 +0000)]
jme: Fix VLAN memory leak
Fix memory leak while receiving 8021q tagged packet which is not
registered by user.
Signed-off-by: Guo-Fu Tseng <cooldavid@cooldavid.org>
Cc: stable@kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Abraham Arce [Tue, 16 Mar 2010 12:24:54 +0000 (12:24 +0000)]
KS8851: Avoid NULL pointer in set rx mode
Kernel NULL pointer dereference when setting mode for IFF_MULTICAST.
Tested on SDP OMAP4430 board.
ks8851 spi1.0: message enable is 0
ks8851 spi1.0: revision 0, MAC f2:f4:2f:56:37:de, IRQ 194
Unable to handle kernel NULL pointer dereference at virtual address
00000000
pgd =
c0004000
[
00000000] *pgd=
00000000
Internal error: Oops: 5 [#1] PREEMPT SMP
last sysfs file:
Modules linked in:
CPU: 0 Not tainted (
2.6.34-rc1-01039-g38d7ed1-dirty #3)
PC is at ks8851_set_rx_mode+0x88/0x124
LR is at bitrev32+0x24/0x2c
<snip>
Backtrace:
[<
c01bfbd8>] ? (ks8851_set_rx_mode+0x0/0x124)
[<
c01d4164>] (__dev_set_rx_mode+0x0/0x90)
[<
c01dc460>] (dev_mc_add+0x0/0x78)
[<
c021f0bc>] (igmp_group_added+0x0/0x64)
[<
c021f174>] (ip_mc_inc_group+0x0/0x150)
[<
c021f3b8>] (ip_mc_up+0x0/0x64)
[<
c0219eb0>] (inetdev_event+0x0/0x3d4)
[<
c0066818>] (notifier_call_chain+0x0/0x78)
[<
c00668b8>] (__raw_notifier_call_chain+0x0/0x24)
[<
c00668dc>] (raw_notifier_call_chain+0x0/0x28)
[<
c01d7484>] (call_netdevice_notifiers+0x0/0x24)
[<
c01d7780>] (__dev_notify_flags+0x0/0x68)
[<
c01d77e8>] (dev_change_flags+0x0/0x4c)
[<
c001f0bc>] (ip_auto_config+0x0/0xf1c)
[<
c0028490>] (do_one_initcall+0x0/0x1bc)
[<
c00084dc>] (kernel_init+0x0/0x234)
Code:
e15130bc e1833012 e14130bc e5943000 (
e5934000)
---[ end trace
ed0fb00a94142792 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Signed-off-by: Abraham Arce <x0066660@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexandra Kossovsky [Fri, 19 Mar 2010 03:29:24 +0000 (20:29 -0700)]
tcp: Fix OOB POLLIN avoidance.
From: Alexandra.Kossovsky@oktetlabs.ru
Fixes kernel bugzilla #15541
Signed-off-by: David S. Miller <davem@davemloft.net>
Huang Weiyi [Fri, 19 Mar 2010 03:24:51 +0000 (20:24 -0700)]
net: remove unused #include <linux/version.h>
Remove unused #include <linux/version.h>('s) in
drivers/net/ksz884x.c
Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 10 Mar 2010 10:30:19 +0000 (10:30 +0000)]
net: forbid underlaying devices to change its type
It's not desired for underlaying devices to change type. At the time,
there is for example possible to have bond with changed type from
Ethernet to Infiniband as a port of a bridge. This patch fixes this.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 10 Mar 2010 10:29:35 +0000 (10:29 +0000)]
bonding: check return value of nofitier when changing type
This patch adds the possibility to refuse the bonding type change for
other subsystems (such as for example bridge, vlan, etc.)
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 10 Mar 2010 10:28:56 +0000 (10:28 +0000)]
net: rename notifier defines for netdev type change
Since generally there could be more netdevices changing type other
than bonding, making this event type name "bonding-unrelated"
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Fri, 19 Mar 2010 00:45:44 +0000 (17:45 -0700)]
rps: Fixed build with CONFIG_SMP not enabled.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rafael J. Wysocki [Thu, 18 Mar 2010 06:12:24 +0000 (23:12 -0700)]
Net / e1000e: Fix build issue introduced by runtime PM patch
The recent PCI runtime PM patch broke build for CONFIG_PM_RUNTIME
and CONFIG_PM_SLEEP undefined. Fix that by moving the PM callbacks
under suitable #ifdefs.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tilman Schmidt [Wed, 17 Mar 2010 21:22:07 +0000 (14:22 -0700)]
gigaset: fix build failure
Update the dummy LL interface to the LL interface change
introduced by commit
daab433c03c15fd642c71c94eb51bdd3f32602c8.
This fixes the build failure occurring after that commit when
enabling ISDN_DRV_GIGASET but neither ISDN_I4L nor ISDN_CAPI.
Impact: bugfix
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael S. Tsirkin [Wed, 17 Mar 2010 14:06:11 +0000 (16:06 +0200)]
vhost: fix error handling in vring ioctls
Stanse found a locking problem in vhost_set_vring:
several returns from VHOST_SET_VRING_KICK, VHOST_SET_VRING_CALL,
VHOST_SET_VRING_ERR with the vq->mutex held.
Fix these up.
Reported-by: Jiri Slaby <jirislaby@gmail.com>
Acked-by: Laurent Chavey <chavey@google.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Michael S. Tsirkin [Mon, 8 Mar 2010 21:24:22 +0000 (23:24 +0200)]
vhost: fix interrupt mitigation with raw sockets
A thinko in code means we never trigger interrupt
mitigation. Fix this.
Reported-by: Juan Quintela <quintela@redhat.com>
Reported-by: Unai Uribarri <unai.uribarri@optenet.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
David S. Miller [Wed, 17 Mar 2010 06:36:24 +0000 (23:36 -0700)]
e1000e: Fix build with CONFIG_PM disabled.
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Wed, 17 Mar 2010 04:24:32 +0000 (21:24 -0700)]
drivers/net/e100.c: Use pr_<level> and netif_<level>
Convert DPRINTK, commonly used for debugging, to netif_<level>
Remove #define PFX
Use #define pr_fmt
Consistently use no periods for non-sentence logging messages
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Gunthorpe [Tue, 9 Mar 2010 09:17:42 +0000 (09:17 +0000)]
NET: Support clause 45 MDIO commands at the MDIO bus level
IEEE 802.3ae clause 45 specifies a somewhat modified MDIO protocol
for use by 10GIGE phys. The main change is a 21 bit address split into
a 5 bit device ID and a 16 bit register offset. The definition is designed
so that normal and extended devices can run on the same MDIO bus.
Extend mdio-bitbang to do the new protocol. At the MDIO bus level the
protocol is requested by or'ing MII_ADDR_C45 into the register offset.
Make phy_read/phy_write/etc pass a full 32 bit register offset.
This does not attempt to make the phy layer support C45 style PHYs, just
to provide the MDIO bus support.
Tested against a Broadcom 10GE phy with ID 0x206034, and several
Broadcom 10/100/1000 Phys in normal mode.
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rafael J. Wysocki [Sun, 14 Mar 2010 14:35:17 +0000 (14:35 +0000)]
e1000e / PCI / PM: Add basic runtime PM support (rev. 4)
Use the PCI runtime power management framework to add basic PCI
runtime PM support to the e1000e driver. Namely, make the driver
suspend the device when the link is off and set it up for generating
a wakeup event after the link has been detected again. [This
feature is disabled until the user space enables it with the help of
the /sys/devices/.../power/contol device attribute.]
Based on a patch from Matthew Garrett.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rafael J. Wysocki [Sun, 14 Mar 2010 14:33:51 +0000 (14:33 +0000)]
r8169 / PCI / PM: Add simplified runtime PM support (rev. 3)
Use the PCI runtime power management framework to add basic PCI
runtime PM support to the r8169 driver. Namely, make the driver
suspend the device when the link is not present and set it up for
generating a wakeup event after the link has been detected again.
[This feature is disabled until the user space enables it with the
help of the /sys/devices/.../power/contol device attribute.]
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Mon, 1 Mar 2010 05:09:14 +0000 (05:09 +0000)]
net: convert multiple drivers to use netdev_for_each_mc_addr, part7
In mlx4, using char * to store mc address in private structure instead.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Sat, 27 Feb 2010 14:43:51 +0000 (14:43 +0000)]
drivers/net/ks*: Use netdev_<level>, netif_<level> and pr_<level>
I'm not sure this is correct.
It changes logging macros from:
dev_<level>(&ks->spidev->dev,
to
netdev_<level>(ks->netdev,
Comments?
Use netdev_<level>
Use netif_<level>
Use pr_<level>
Add #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
Add missing line to message in ks8851_remove
Change kmalloc/memset(,0) to kzalloc
Remove ks_<level> macros
Consolidation code into set_media_state
Signed-off-by: David S. Miller <davem@davemloft.net>
Neil Horman [Mon, 15 Mar 2010 07:58:45 +0000 (07:58 +0000)]
tipc: Allow retransmission of cloned buffers
Forward port commit
fc477e160af086f6e30c3d4fdf5f5c000d29beb5
from git://tipc.cslab.ericsson.net/pub/git/people/allan/tipc.git
Origional commit message:
Allow retransmission of cloned buffers
This patch fixes an issue with TIPC's message retransmission logic
that prevented retransmission of clone sk_buffs. Originally intended
as a means of avoiding wasted work in retransmitting messages that
were still on the driver's outbound queue, it also prevented TIPC
from retransmitting messages through other means -- such as the
secondary bearer of the broadcast link, or another interface in a
set of bonded interfaces. This fix removes existing checks for
cloned sk_buffs that prevented such retransmission.
Origionally-Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Neil Horman [Mon, 15 Mar 2010 08:02:24 +0000 (08:02 +0000)]
tipc: Increase frequency of load distribution over broadcast link
Forward port commit
29eb572941501c40ac6e62dbc5043bf9ee76ee56
from git://tipc.cslab.ericsson.net/pub/git/people/allan/tipc.git
Origional commit message:
Increase frequency of load distribution over broadcast link
This patch enhances the behavior of TIPC's broadcast link so that it
alternates between redundant bearers (if available) after every
message sent, rather than after every 10 messages. This change helps
to speed up delivery of retransmitted messages by ensuring that
they are not sent repeatedly over a bearer that is no longer working,
but not yet recognized as failed.
Tested by myself in the latest net-2.6 tree using the tipc sanity test suite
Origionally-signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
bcast.c | 35 ++++++++++++++---------------------
1 file changed, 14 insertions(+), 21 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
Jan Engelhardt [Thu, 11 Mar 2010 09:57:29 +0000 (09:57 +0000)]
net: core: add IFLA_STATS64 support
`ip -s link` shows interface counters truncated to 32 bit. This is
because interface statistics are transported only in 32-bit quantity
to userspace. This commit adds a new IFLA_STATS64 attribute that
exports them in full 64 bit.
References: http://lkml.indiana.edu/hypermail/linux/kernel/0307.3/0215.html
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jan Engelhardt [Thu, 11 Mar 2010 09:57:28 +0000 (09:57 +0000)]
net: tcp: make veno selectable as default congestion module
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jan Engelhardt [Thu, 11 Mar 2010 09:57:27 +0000 (09:57 +0000)]
net: tcp: make hybla selectable as default congestion module
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 9 Mar 2010 20:03:38 +0000 (20:03 +0000)]
net: remove rcu locking from fib_rules_event()
We hold RTNL at this point and dont use RCU variants of list traversals,
we dont need rcu_read_lock()/rcu_read_unlock()
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Tue, 2 Mar 2010 13:32:09 +0000 (13:32 +0000)]
bridge: per-cpu packet statistics (v3)
The shared packet statistics are a potential source of slow down
on bridged traffic. Convert to per-cpu array, but only keep those
statistics which change per-packet.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Tue, 16 Mar 2010 08:03:29 +0000 (08:03 +0000)]
rps: Receive Packet Steering
This patch implements software receive side packet steering (RPS). RPS
distributes the load of received packet processing across multiple CPUs.
Problem statement: Protocol processing done in the NAPI context for received
packets is serialized per device queue and becomes a bottleneck under high
packet load. This substantially limits pps that can be achieved on a single
queue NIC and provides no scaling with multiple cores.
This solution queues packets early on in the receive path on the backlog queues
of other CPUs. This allows protocol processing (e.g. IP and TCP) to be
performed on packets in parallel. For each device (or each receive queue in
a multi-queue device) a mask of CPUs is set to indicate the CPUs that can
process packets. A CPU is selected on a per packet basis by hashing contents
of the packet header (e.g. the TCP or UDP 4-tuple) and using the result to index
into the CPU mask. The IPI mechanism is used to raise networking receive
softirqs between CPUs. This effectively emulates in software what a multi-queue
NIC can provide, but is generic requiring no device support.
Many devices now provide a hash over the 4-tuple on a per packet basis
(e.g. the Toeplitz hash). This patch allow drivers to set the HW reported hash
in an skb field, and that value in turn is used to index into the RPS maps.
Using the HW generated hash can avoid cache misses on the packet when
steering it to a remote CPU.
The CPU mask is set on a per device and per queue basis in the sysfs variable
/sys/class/net/<device>/queues/rx-<n>/rps_cpus. This is a set of canonical
bit maps for receive queues in the device (numbered by <n>). If a device
does not support multi-queue, a single variable is used for the device (rx-0).
Generally, we have found this technique increases pps capabilities of a single
queue device with good CPU utilization. Optimal settings for the CPU mask
seem to depend on architectures and cache hierarcy. Below are some results
running 500 instances of netperf TCP_RR test with 1 byte req. and resp.
Results show cumulative transaction rate and system CPU utilization.
e1000e on 8 core Intel
Without RPS: 108K tps at 33% CPU
With RPS: 311K tps at 64% CPU
forcedeth on 16 core AMD
Without RPS: 156K tps at 15% CPU
With RPS: 404K tps at 49% CPU
bnx2x on 16 core AMD
Without RPS 567K tps at 61% CPU (4 HW RX queues)
Without RPS 738K tps at 96% CPU (8 HW RX queues)
With RPS: 854K tps at 76% CPU (4 HW RX queues)
Caveats:
- The benefits of this patch are dependent on architecture and cache hierarchy.
Tuning the masks to get best performance is probably necessary.
- This patch adds overhead in the path for processing a single packet. In
a lightly loaded server this overhead may eliminate the advantages of
increased parallelism, and possibly cause some relative performance degradation.
We have found that masks that are cache aware (share same caches with
the interrupting CPU) mitigate much of this.
- The RPS masks can be changed dynamically, however whenever the mask is changed
this introduces the possibility of generating out of order packets. It's
probably best not change the masks too frequently.
Signed-off-by: Tom Herbert <therbert@google.com>
include/linux/netdevice.h | 32 ++++-
include/linux/skbuff.h | 3 +
net/core/dev.c | 335 +++++++++++++++++++++++++++++++++++++--------
net/core/net-sysfs.c | 225 ++++++++++++++++++++++++++++++-
net/core/skbuff.c | 2 +
5 files changed, 538 insertions(+), 59 deletions(-)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tina Yang [Thu, 11 Mar 2010 13:50:07 +0000 (13:50 +0000)]
RDS: Enable per-cpu workqueue threads
Create per-cpu workqueue threads instead of a single
krdsd thread. This is a step towards better scalability.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Grover [Thu, 11 Mar 2010 13:50:06 +0000 (13:50 +0000)]
RDS: Do not call set_page_dirty() with irqs off
set_page_dirty() unconditionally re-enables interrupts, so
if we call it with irqs off, they will be on after the call,
and that's bad. This patch moves the call after we've re-enabled
interrupts in send_drop_to(), so it's safe.
Also, add BUG_ONs to let us know if we ever do call set_page_dirty
with interrupts off.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sherman Pun [Thu, 11 Mar 2010 13:50:05 +0000 (13:50 +0000)]
RDS: Properly unmap when getting a remote access error
If the RDMA op has aborted with a remote access error,
in addition to what we already do (tell userspace it has
completed with an error) also unmap it and put() the rm.
Otherwise, hangs may occur on arches that track maps and
will not exit without proper cleanup.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Grover [Thu, 11 Mar 2010 13:50:04 +0000 (13:50 +0000)]
RDS: only put sockets that have seen congestion on the poll_waitq
rds_poll_waitq's listeners will be awoken if we receive a congestion
notification. Bad performance may result because *all* polled sockets
contend for this single lock. However, it should not be necessary to
wake pollers when a congestion update arrives if they have never
experienced congestion, and not putting these on the waitq will
hopefully greatly reduce contention.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tina Yang [Thu, 11 Mar 2010 13:50:03 +0000 (13:50 +0000)]
RDS: Fix locking in rds_send_drop_to()
It seems rds_send_drop_to() called
__rds_rdma_send_complete(rs, rm, RDS_RDMA_CANCELED)
with only rds_sock lock, but not rds_message lock. It raced with
other threads that is attempting to modify the rds_message as well,
such as from within rds_rdma_send_complete().
Signed-off-by: Tina Yang <tina.yang@oracle.com>
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Grover [Thu, 11 Mar 2010 13:50:02 +0000 (13:50 +0000)]
RDS: Turn down alarming reconnect messages
RDS's error messages when a connection goes down are a little
extreme. A connection may go down, and it will be re-established,
and everything is fine. This patch links these messages through
rdsdebug(), instead of to printk directly.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Grover [Thu, 11 Mar 2010 13:50:01 +0000 (13:50 +0000)]
RDS: Workaround for in-use MRs on close causing crash
if a machine is shut down without closing sockets properly, and
freeing all MRs, then a BUG_ON will bring it down. This patch
changes these to WARN_ONs -- leaking MRs is not fatal (although
not ideal, and there is more work to do here for a proper fix.)
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tina Yang [Thu, 11 Mar 2010 13:50:00 +0000 (13:50 +0000)]
RDS: Fix send locking issue
Fix a deadlock between rds_rdma_send_complete() and
rds_send_remove_from_sock() when rds socket lock and
rds message lock are acquired out-of-order.
Signed-off-by: Tina Yang <Tina.Yang@oracle.com>
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Grover [Thu, 11 Mar 2010 13:49:59 +0000 (13:49 +0000)]
RDS: Fix congestion issues for loopback
We have two kinds of loopback: software (via loop transport)
and hardware (via IB). sw is used for 127.0.0.1, and doesn't
support rdma ops. hw is used for sends to local device IPs,
and supports rdma. Both are used in different cases.
For both of these, when there is a congestion map update, we
want to call rds_cong_map_updated() but not actually send
anything -- since loopback local and foreign congestion maps
point to the same spot, they're already in sync.
The old code never called sw loop's xmit_cong_map(),so
rds_cong_map_updated() wasn't being called for it. sw loop
ports would not work right with the congestion monitor.
Fixing that meant that hw loopback now would send congestion maps
to itself. This is also undesirable (racy), so we check for this
case in the ib-specific xmit code.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Grover [Thu, 11 Mar 2010 13:49:58 +0000 (13:49 +0000)]
RDS/TCP: Wait to wake thread when write space available
Instead of waking the send thread whenever any send space is available,
wait until it is at least half empty. This is modeled on how
sock_def_write_space() does it, and may help to minimize context
switches.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Grover [Thu, 11 Mar 2010 13:49:57 +0000 (13:49 +0000)]
RDS: update copy_to_user state in tcp transport
Other transports use rds_page_copy_user, which updates our
s_copy_to_user counter. TCP doesn't, so it needs to explicity
call rds_stats_add().
Reported-by: Richard Frank <richard.frank@oracle.com>
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Grover [Thu, 11 Mar 2010 13:49:56 +0000 (13:49 +0000)]
RDS: sendmsg() should check sndtimeo, not rcvtimeo
Most likely cut n paste error - sendmsg() was checking sock_rcvtimeo.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Grover [Thu, 11 Mar 2010 13:49:55 +0000 (13:49 +0000)]
RDS: Do not BUG() on error returned from ib_post_send
BUGging on a runtime error code should be avoided. This
patch also eliminates all other BUG()s that have no real
reason to exist.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 16 Mar 2010 21:37:47 +0000 (14:37 -0700)]
bridge: Make first arg to deliver_clone const.
Otherwise we get a warning from the call in br_forward().
Signed-off-by: David S. Miller <davem@davemloft.net>
YOSHIFUJI Hideaki / 吉藤英明 [Mon, 15 Mar 2010 21:51:18 +0000 (21:51 +0000)]
bridge br_multicast: Don't refer to BR_INPUT_SKB_CB(skb)->mrouters_only without IGMP snooping.
Without CONFIG_BRIDGE_IGMP_SNOOPING,
BR_INPUT_SKB_CB(skb)->mrouters_only is not appropriately
initialized, so we can see garbage.
A clear option to fix this is to set it even without that
config, but we cannot optimize out the branch.
Let's introduce a macro that returns value of mrouters_only
and let it return 0 without CONFIG_BRIDGE_IGMP_SNOOPING.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vitaliy Gusev [Tue, 16 Mar 2010 01:07:51 +0000 (01:07 +0000)]
route: Fix caught BUG_ON during rt_secret_rebuild_oneshot()
route: Fix caught BUG_ON during rt_secret_rebuild_oneshot()
Call rt_secret_rebuild can cause BUG_ON(timer_pending(&net->ipv4.rt_secret_timer)) in
add_timer as there is not any synchronization for call rt_secret_rebuild_oneshot()
for the same net namespace.
Also this issue affects to rt_secret_reschedule().
Thus use mod_timer enstead.
Signed-off-by: Vitaliy Gusev <vgusev@openvz.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
YOSHIFUJI Hideaki / 吉藤英明 [Mon, 15 Mar 2010 19:26:56 +0000 (19:26 +0000)]
bridge br_multicast: Fix skb leakage in error path.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
YOSHIFUJI Hideaki / 吉藤英明 [Mon, 15 Mar 2010 19:27:00 +0000 (19:27 +0000)]
bridge br_multicast: Fix handling of Max Response Code in IGMPv3 message.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Slaby [Tue, 16 Mar 2010 05:29:54 +0000 (05:29 +0000)]
NET: netpoll, fix potential NULL ptr dereference
Stanse found that one error path in netpoll_setup dereferences npinfo
even though it is NULL. Avoid that by adding new label and go to that
instead.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Daniel Borkmann <danborkmann@googlemail.com>
Cc: David S. Miller <davem@davemloft.net>
Acked-by: chavey@google.com
Acked-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Neil Horman [Tue, 16 Mar 2010 08:14:33 +0000 (08:14 +0000)]
tipc: fix lockdep warning on address assignment
So in the forward porting of various tipc packages, I was constantly
getting this lockdep warning everytime I used tipc-config to set a network
address for the protocol:
[ INFO: possible circular locking dependency detected ]
2.6.33 #1
tipc-config/1326 is trying to acquire lock:
(ref_table_lock){+.-...}, at: [<
ffffffffa0315148>] tipc_ref_discard+0x53/0xd4 [tipc]
but task is already holding lock:
(&(&entry->lock)->rlock#2){+.-...}, at: [<
ffffffffa03150d5>] tipc_ref_lock+0x43/0x63 [tipc]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&(&entry->lock)->rlock#2){+.-...}:
[<
ffffffff8107b508>] __lock_acquire+0xb67/0xd0f
[<
ffffffff8107b78c>] lock_acquire+0xdc/0x102
[<
ffffffff8145471e>] _raw_spin_lock_bh+0x3b/0x6e
[<
ffffffffa03152b1>] tipc_ref_acquire+0xe8/0x11b [tipc]
[<
ffffffffa031433f>] tipc_createport_raw+0x78/0x1b9 [tipc]
[<
ffffffffa031450b>] tipc_createport+0x8b/0x125 [tipc]
[<
ffffffffa030f221>] tipc_subscr_start+0xce/0x126 [tipc]
[<
ffffffffa0308fb2>] process_signal_queue+0x47/0x7d [tipc]
[<
ffffffff81053e0c>] tasklet_action+0x8c/0xf4
[<
ffffffff81054bd8>] __do_softirq+0xf8/0x1cd
[<
ffffffff8100aadc>] call_softirq+0x1c/0x30
[<
ffffffff810549f4>] _local_bh_enable_ip+0xb8/0xd7
[<
ffffffff81054a21>] local_bh_enable_ip+0xe/0x10
[<
ffffffff81454d31>] _raw_spin_unlock_bh+0x34/0x39
[<
ffffffffa0308eb8>] spin_unlock_bh.clone.0+0x15/0x17 [tipc]
[<
ffffffffa0308f47>] tipc_k_signal+0x8d/0xb1 [tipc]
[<
ffffffffa0308dd9>] tipc_core_start+0x8a/0xad [tipc]
[<
ffffffffa01b1087>] 0xffffffffa01b1087
[<
ffffffff8100207d>] do_one_initcall+0x72/0x18a
[<
ffffffff810872fb>] sys_init_module+0xd8/0x23a
[<
ffffffff81009b42>] system_call_fastpath+0x16/0x1b
-> #0 (ref_table_lock){+.-...}:
[<
ffffffff8107b3b2>] __lock_acquire+0xa11/0xd0f
[<
ffffffff8107b78c>] lock_acquire+0xdc/0x102
[<
ffffffff81454836>] _raw_write_lock_bh+0x3b/0x6e
[<
ffffffffa0315148>] tipc_ref_discard+0x53/0xd4 [tipc]
[<
ffffffffa03141ee>] tipc_deleteport+0x40/0x119 [tipc]
[<
ffffffffa0316e35>] release+0xeb/0x137 [tipc]
[<
ffffffff8139dbf4>] sock_release+0x1f/0x6f
[<
ffffffff8139dc6b>] sock_close+0x27/0x2b
[<
ffffffff811116f6>] __fput+0x12a/0x1df
[<
ffffffff811117c5>] fput+0x1a/0x1c
[<
ffffffff8110e49b>] filp_close+0x68/0x72
[<
ffffffff8110e552>] sys_close+0xad/0xe7
[<
ffffffff81009b42>] system_call_fastpath+0x16/0x1b
Finally decided I should fix this. Its a straightforward inversion,
tipc_ref_acquire takes two locks in this order:
ref_table_lock
entry->lock
while tipc_deleteport takes them in this order:
entry->lock (via tipc_port_lock())
ref_table_lock (via tipc_ref_discard())
when the same entry is referenced, we get the above warning. The fix is equally
straightforward. Theres no real relation between the entry->lock and the
ref_table_lock (they just are needed at the same time), so move the entry->lock
aquisition in tipc_ref_acquire down, after we unlock ref_table_lock (this is
safe since the ref_table_lock guards changes to the reference table, and we've
already claimed a slot there. I've tested the below fix and confirmed that it
clears up the lockdep issue
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
James Chapman [Tue, 16 Mar 2010 06:29:20 +0000 (06:29 +0000)]
l2tp: Fix UDP socket reference count bugs in the pppol2tp driver
This patch fixes UDP socket refcnt bugs in the pppol2tp driver.
A bug can cause a kernel stack trace when a tunnel socket is closed.
A way to reproduce the issue is to prepare the UDP socket for L2TP (by
opening a tunnel pppol2tp socket) and then close it before any L2TP
sessions are added to it. The sequence is
Create UDP socket
Create tunnel pppol2tp socket to prepare UDP socket for L2TP
pppol2tp_connect: session_id=0, peer_session_id=0
L2TP SCCRP control frame received (tunnel_id==0)
pppol2tp_recv_core: sock_hold()
pppol2tp_recv_core: sock_put
L2TP ZLB control frame received (tunnel_id=nnn)
pppol2tp_recv_core: sock_hold()
pppol2tp_recv_core: sock_put
Close tunnel management socket
pppol2tp_release: session_id=0, peer_session_id=0
Close UDP socket
udp_lib_close: BUG
The addition of sock_hold() in pppol2tp_connect() solves the problem.
For data frames, two sock_put() calls were added to plug a refcnt leak
per received data frame. The ref that is grabbed at the top of
pppol2tp_recv_core() must always be released, but this wasn't done for
accepted data frames or data frames discarded because of bad UDP
checksums. This leak meant that any UDP socket that had passed L2TP
data traffic (i.e. L2TP data frames, not just L2TP control frames)
using pppol2tp would not be released by the kernel.
WARNING: at include/net/sock.h:435 udp_lib_unhash+0x117/0x120()
Pid: 1086, comm: openl2tpd Not tainted 2.6.33-rc1 #8
Call Trace:
[<
c119e9b7>] ? udp_lib_unhash+0x117/0x120
[<
c101b871>] ? warn_slowpath_common+0x71/0xd0
[<
c119e9b7>] ? udp_lib_unhash+0x117/0x120
[<
c101b8e3>] ? warn_slowpath_null+0x13/0x20
[<
c119e9b7>] ? udp_lib_unhash+0x117/0x120
[<
c11598a7>] ? sk_common_release+0x17/0x90
[<
c11a5e33>] ? inet_release+0x33/0x60
[<
c11577b0>] ? sock_release+0x10/0x60
[<
c115780f>] ? sock_close+0xf/0x30
[<
c106e542>] ? __fput+0x52/0x150
[<
c106b68e>] ? filp_close+0x3e/0x70
[<
c101d2e2>] ? put_files_struct+0x62/0xb0
[<
c101eaf7>] ? do_exit+0x5e7/0x650
[<
c1081623>] ? mntput_no_expire+0x13/0x70
[<
c106b68e>] ? filp_close+0x3e/0x70
[<
c101eb8a>] ? do_group_exit+0x2a/0x70
[<
c101ebe1>] ? sys_exit_group+0x11/0x20
[<
c10029b0>] ? sysenter_do_call+0x12/0x26
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steve Glendinning [Tue, 16 Mar 2010 09:03:06 +0000 (09:03 +0000)]
smsc95xx: wait for PHY to complete reset during init
This patch ensures the PHY correctly completes its reset before
setting register values.
Signed-off-by: Steve Glendinning <steve.glendinning@smsc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
James Chapman [Tue, 16 Mar 2010 06:46:31 +0000 (06:46 +0000)]
l2tp: Fix oops in pppol2tp_xmit
When transmitting L2TP frames, we derive the outgoing interface's UDP
checksum hardware assist capabilities from the tunnel dst dev. This
can sometimes be NULL, especially when routing protocols are used and
routing changes occur. This patch just checks for NULL dst or dev
pointers when checking for netdev hardware assist features.
BUG: unable to handle kernel NULL pointer dereference at
0000000c
IP: [<
f89d074c>] pppol2tp_xmit+0x341/0x4da [pppol2tp]
*pde =
00000000
Oops: 0000 [#1] SMP
last sysfs file: /sys/class/net/lo/operstate
Modules linked in: pppol2tp pppox ppp_generic slhc ipv6 dummy loop snd_hda_codec_atihdmi snd_hda_intel snd_hda_codec snd_pcm snd_timer snd soundcore snd_page_alloc evdev psmouse serio_raw processor button i2c_piix4 i2c_core ati_agp agpgart pcspkr ext3 jbd mbcache sd_mod ide_pci_generic atiixp ide_core ahci ata_generic floppy ehci_hcd ohci_hcd libata e1000e scsi_mod usbcore nls_base thermal fan thermal_sys [last unloaded: scsi_wait_scan]
Pid: 0, comm: swapper Not tainted (2.6.32.8 #1)
EIP: 0060:[<
f89d074c>] EFLAGS:
00010297 CPU: 3
EIP is at pppol2tp_xmit+0x341/0x4da [pppol2tp]
EAX:
00000000 EBX:
f64d1680 ECX:
000005b9 EDX:
00000000
ESI:
f6b91850 EDI:
f64d16ac EBP:
f6a0c4c0 ESP:
f70a9cac
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process swapper (pid: 0, ti=
f70a8000 task=
f70a31c0 task.ti=
f70a8000)
Stack:
000005a9 000005b9 f734c400 f66652c0 f7352e00 f67dc800 00000000 f6b91800
<0>
000005a3 f70ef6c4 f67dcda9 000005a3 f89b192e 00000246 000005a3 f64d1680
<0>
f63633e0 f6363320 f64d1680 f65a7320 f65a7364 f65856c0 f64d1680 f679f02f
Call Trace:
[<
f89b192e>] ? ppp_push+0x459/0x50e [ppp_generic]
[<
f89b217f>] ? ppp_xmit_process+0x3b6/0x430 [ppp_generic]
[<
f89b2306>] ? ppp_start_xmit+0x10d/0x120 [ppp_generic]
[<
c11c15cb>] ? dev_hard_start_xmit+0x21f/0x2b2
[<
c11d0947>] ? sch_direct_xmit+0x48/0x10e
[<
c11c19a0>] ? dev_queue_xmit+0x263/0x3a6
[<
c11e2a9f>] ? ip_finish_output+0x1f7/0x221
[<
c11df682>] ? ip_forward_finish+0x2e/0x30
[<
c11de645>] ? ip_rcv_finish+0x295/0x2a9
[<
c11c0b19>] ? netif_receive_skb+0x3e9/0x404
[<
f814b791>] ? e1000_clean_rx_irq+0x253/0x2fc [e1000e]
[<
f814cb7a>] ? e1000_clean+0x63/0x1fc [e1000e]
[<
c1047eff>] ? sched_clock_local+0x15/0x11b
[<
c11c1095>] ? net_rx_action+0x96/0x195
[<
c1035750>] ? __do_softirq+0xaa/0x151
[<
c1035828>] ? do_softirq+0x31/0x3c
[<
c10358fe>] ? irq_exit+0x26/0x58
[<
c1004b21>] ? do_IRQ+0x78/0x89
[<
c1003729>] ? common_interrupt+0x29/0x30
[<
c101ac28>] ? native_safe_halt+0x2/0x3
[<
c1008c54>] ? default_idle+0x55/0x75
[<
c1009045>] ? c1e_idle+0xd2/0xd5
[<
c100233c>] ? cpu_idle+0x46/0x62
Code: 8d 45 08 f0 ff 45 08 89 6b 08 c7 43 68 7e fb 9c f8 8a 45 24 83 e0 0c 3c 04 75 09 80 63 64 f3 e9 b4 00 00 00 8b 43 18 8b 4c 24 04 <8b> 40 0c 8d 79 11 f6 40 44 0e 8a 43 64 75 51 6a 00 8b 4c 24 08
EIP: [<
f89d074c>] pppol2tp_xmit+0x341/0x4da [pppol2tp] SS:ESP 0068:
f70a9cac
CR2:
000000000000000c
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steve Glendinning [Tue, 16 Mar 2010 08:46:46 +0000 (08:46 +0000)]
smsc75xx: SMSC LAN75xx USB gigabit ethernet adapter driver
This patch adds a driver for SMSC's LAN7500 family of USB 2.0
to gigabit ethernet adapters. It's loosely based on the smsc95xx
driver but the device registers for LAN7500 are completely different.
Signed-off-by: Steve Glendinning <steve.glendinning@smsc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Atsushi Nemoto [Tue, 16 Mar 2010 05:27:40 +0000 (05:27 +0000)]
ne: Do not use slashes in irq name string
This patch fixes following warning introduced by commit
12bac0d9f4dbf3445a0319beee848d15fa32775e ("proc: warn on non-existing
proc entries"):
WARNING: at /work/mips-linux/make/linux/fs/proc/generic.c:316 __xlate_proc_name+0xe0/0xe8()
name 'RBHMA4X00/RTL8019'
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Slaby [Tue, 16 Mar 2010 04:53:50 +0000 (04:53 +0000)]
NET: ksz884x, fix lock imbalance
Stanse found that one error path (when alloc_skb fails) in netdev_tx
omits to unlock hw_priv->hwlock. Fix that.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Tristram Ha <Tristram.Ha@micrel.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tilman Schmidt [Tue, 16 Mar 2010 07:04:01 +0000 (07:04 +0000)]
gigaset: correct range checking off by one error
Correct a potential array overrun due to an off by one error in the
range check on the CAPI CONNECT_REQ CIPValue parameter.
Found and reported by Dan Carpenter using smatch.
Impact: bugfix
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adel Gadllah [Sun, 14 Mar 2010 18:16:25 +0000 (19:16 +0100)]
iwlwifi: Silence tfds_in_queue message
Commit
a239a8b47cc0e5e6d7416a89f340beac06d5edaa introduced a
noisy message, that fills up the log very fast.
The error seems not to be fatal (the connection is stable and
performance is ok), so make it IWL_DEBUG_TX rather than IWL_ERR.
Signed-off-by: Adel Gadllah <adel.gadllah@gmail.com>
Cc: stable@kernel.org
Acked-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Felix Fietkau [Fri, 12 Mar 2010 03:02:43 +0000 (04:02 +0100)]
ath9k: fix BUG_ON triggered by PAE frames
When I initially stumbled upon sequence number problems with PAE frames
in ath9k, I submitted a patch to remove all special cases for PAE
frames and let them go through the normal transmit path.
Out of concern about crypto incompatibility issues, this change was
merged instead:
commit
6c8afef551fef87a3bf24f8a74c69a7f2f72fc82
Author: Sujith <Sujith.Manoharan@atheros.com>
Date: Tue Feb 9 10:07:00 2010 +0530
ath9k: Fix sequence numbers for PAE frames
After a lot of testing, I'm able to reliably trigger a driver crash on
rekeying with current versions with this change in place.
It seems that the driver does not support sending out regular MPDUs with
the same TID while an A-MPDU session is active.
This leads to duplicate entries in the TID Tx buffer, which hits the
following BUG_ON in ath_tx_addto_baw():
index = ATH_BA_INDEX(tid->seq_start, bf->bf_seqno);
cindex = (tid->baw_head + index) & (ATH_TID_MAX_BUFS - 1);
BUG_ON(tid->tx_buf[cindex] != NULL);
I believe until we actually have a reproducible case of an
incompatibility with another AP using no PAE special cases, we should
simply get rid of this mess.
This patch completely fixes my crash issues in STA mode and makes it
stay connected without throughput drops or connectivity issues even
when the AP is configured to a very short group rekey interval.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Cc: stable@kernel.org
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Grazvydas Ignotas [Thu, 11 Mar 2010 15:45:26 +0000 (17:45 +0200)]
wl1251: fix potential crash
In case debugfs does not init for some reason (or is disabled
on older kernels) driver does not allocate stats.fw_stats
structure, but tries to clear it later and trips on a NULL
pointer:
Unable to handle kernel NULL pointer dereference at virtual address
00000000
PC is at __memzero+0x24/0x80
Backtrace:
[<
bf0ddb88>] (wl1251_debugfs_reset+0x0/0x30 [wl1251])
[<
bf0d6a2c>] (wl1251_op_stop+0x0/0x12c [wl1251])
[<
bf0bc228>] (ieee80211_stop_device+0x0/0x74 [mac80211])
[<
bf0b0d10>] (ieee80211_stop+0x0/0x4ac [mac80211])
[<
c02deeac>] (dev_close+0x0/0xb4)
[<
c02deac0>] (dev_change_flags+0x0/0x184)
[<
c031f478>] (devinet_ioctl+0x0/0x704)
[<
c0320720>] (inet_ioctl+0x0/0x100)
Add a NULL pointer check to fix this.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Acked-by: Kalle Valo <kalle.valo@iki.fi>
Cc: stable@kernel.org
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Michael Braun [Tue, 16 Mar 2010 07:26:22 +0000 (00:26 -0700)]
bridge: Fix br_forward crash in promiscuous mode
From: Michael Braun <michael-dev@fami-braun.de>
bridge: Fix br_forward crash in promiscuous mode
It's a linux-next kernel from 2010-03-12 on an x86 system and it
OOPs in the bridge module in br_pass_frame_up (called by
br_handle_frame_finish) because brdev cannot be dereferenced (its set to
a non-null value).
Adding some BUG_ON statements revealed that
BR_INPUT_SKB_CB(skb)->brdev == br-dev
(as set in br_handle_frame_finish first)
only holds until br_forward is called.
The next call to br_pass_frame_up then fails.
Digging deeper it seems that br_forward either frees the skb or passes
it to NF_HOOK which will in turn take care of freeing the skb. The
same is holds for br_pass_frame_ip. So it seems as if two independent
skb allocations are required. As far as I can see, commit
b33084be192ee1e347d98bb5c9e38a53d98d35e2 ("bridge: Avoid unnecessary
clone on forward path") removed skb duplication and so likely causes
this crash. This crash does not happen on 2.6.33.
I've therefore modified br_forward the same way br_flood has been
modified so that the skb is not freed if skb0 is going to be used
and I can confirm that the attached patch resolves the issue for me.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Tue, 16 Mar 2010 03:38:25 +0000 (20:38 -0700)]
bridge: Move NULL mdb check into br_mdb_ip_get
Since all callers of br_mdb_ip_get need to check whether the
hash table is NULL, this patch moves the check into the function.
This fixes the two callers (query/leave handler) that didn't
check it.
Reported-by: Michael Braun <michael-dev@fami-braun.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lars Ellenberg [Tue, 16 Mar 2010 02:09:28 +0000 (19:09 -0700)]
ISDN: Add PCI ID for HFC-2S/4S Beronet Card PCIe
A few subdevice IDs seem to have been dropped when hfc_multi was
included upstream, just compare the list at
http://www.openvox.cn/viewvc/misdn/trunk/hfc_multi.c?revision=75&view=annotate#l175
with the IDs in drivers/isdn/hardware/mISDN/hfcmulti.c
Added PCIe 2 Port card and LED settings (same as PCI)
Do not use <linux/pci_ids.h> /KKe
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Karsten Keil <keil@b1-systems.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 15 Mar 2010 23:23:54 +0000 (16:23 -0700)]
Merge branch 'master' of /home/davem/src/GIT/linux-2.6/