Eric Dumazet [Wed, 11 Jul 2012 05:50:31 +0000 (05:50 +0000)]
tcp: TCP Small Queues
This introduces TSQ (TCP Small Queues).
The goal of TSQ is to reduce the number of TCP packets in xmit queues (qdisc &
device queues), to reduce RTT and cwnd bias, part of the bufferbloat
problem.
sk->sk_wmem_alloc is not allowed to grow above a given limit,
allowing no more than ~128KB [1] per tcp socket in qdisc/dev layers at a
given time.
TSO packets are sized/capped to half the limit, so that we have two
TSO packets in flight, allowing better bandwidth use.
As a side effect, setting the limit to 40000 automatically reduces the
standard gso max limit (65536) to 40000/2 : It can help to reduce
latencies of high prio packets, having smaller TSO packets.
This means we divert sock_wfree() to a tcp_wfree() handler, to
queue/send following frames when skb_orphan() [2] is called for the
already queued skbs.
Results on my dev machines (tg3/ixgbe nics) are really impressive,
using standard pfifo_fast, and with or without TSO/GSO.
Without reduction of nominal bandwidth, we have reduction of buffering
per bulk sender :
< 1ms on Gbit (instead of 50ms with TSO)
< 8ms on 100Mbit (instead of 132 ms)
I no longer have 4 MBytes backlogged in qdisc by a single netperf
session, and both side socket autotuning no longer use 4 Mbytes.
As the skb destructor cannot restart xmit itself (as the qdisc lock might be
taken at this point), we delegate the work to a tasklet. We use one
tasklet per cpu for performance reasons.
If the tasklet finds a socket owned by the user, it sets the TSQ_OWNED flag.
This flag is tested in a new protocol method called from release_sock(),
to eventually send new segments.
[1] New /proc/sys/net/ipv4/tcp_limit_output_bytes tunable
[2] skb_orphan() is usually called at TX completion time,
but some drivers call it in their start_xmit() handler.
These drivers should at least use BQL, or else a single TCP
session can still fill the whole NIC TX ring, since TSQ will
have no effect.
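The throttling rule described above can be sketched in plain userspace C. This is only an illustrative model, not the kernel code: struct tsq_sock, tsq_try_xmit(), tsq_tx_complete() and tsq_max_tso_size() are hypothetical names, and the 128KB constant simply mirrors the default quoted in this message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant socket state (not struct sock). */
struct tsq_sock {
    size_t wmem_alloc;  /* bytes currently sitting in qdisc/device queues */
    bool   throttled;   /* sender is waiting for a TX completion */
};

/* Default quoted in the commit message: ~128KB per socket. */
#define TSQ_LIMIT_OUTPUT_BYTES (128u * 1024u)

/* TSO packets are capped to half the limit so that two can be in flight. */
static size_t tsq_max_tso_size(size_t limit, size_t gso_max)
{
    size_t half = limit / 2;

    return half < gso_max ? half : gso_max;
}

/* Returns true if the packet was queued, false if the socket was throttled. */
static bool tsq_try_xmit(struct tsq_sock *sk, size_t truesize, size_t limit)
{
    if (sk->wmem_alloc >= limit) {
        sk->throttled = true;
        return false;
    }
    sk->wmem_alloc += truesize;
    return true;
}

/* Models the destructor path: a TX completion releases the bytes and
 * reports whether a throttled sender should be restarted. */
static bool tsq_tx_complete(struct tsq_sock *sk, size_t truesize)
{
    sk->wmem_alloc -= truesize;
    if (sk->throttled) {
        sk->throttled = false;
        return true;
    }
    return false;
}
```

With a limit of 40000 the model caps TSO packets at 20000 bytes, matching the 40000/2 figure given above. In the real patch the restart is deferred to a tasklet rather than done directly from the destructor.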
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Dave Taht <dave.taht@bufferbloat.net>
Cc: Tom Herbert <therbert@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Thu, 12 Jul 2012 00:18:04 +0000 (17:18 -0700)]
tcp: Fix out of bounds access to tcpm_vals
The recent patch "tcp: Maintain dynamic metrics in local cache." introduced
an out of bounds access due to what appears to be a typo. I believe this
change should resolve the issue by replacing the access to RTAX_CWND with
TCP_METRIC_CWND.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 11 Jul 2012 09:39:24 +0000 (02:39 -0700)]
ipv6: Move ipv6 twsk accessors outside of CONFIG_IPV6 ifdefs.
Fixes build when ipv6 is disabled.
Reported-by: Fengguang Wu <wfg@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Li RongQing [Mon, 9 Jul 2012 23:56:12 +0000 (23:56 +0000)]
bridge: fix endian
mld->mld_maxdelay is in network byte order, so we should use ntohs, not htons.
CC: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Li RongQing [Mon, 9 Jul 2012 22:02:42 +0000 (22:02 +0000)]
qlge: fix endian issue
commit 6d29b1ef introduces a bug: ntohs is __be16_to_cpu,
not cpu_to_be16.
We always use htons on IP_OFFSET and IP_MF, then compare
with the network packet.
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Li RongQing [Mon, 9 Jul 2012 20:56:06 +0000 (20:56 +0000)]
ksz884x: fix Endian
ETH_P_IP is host endian and skb->protocol is big endian, so when
comparing them, using htons on skb->protocol is wrong.
Also fix two code style issues: fix indentation and remove
unnecessary parentheses.
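A minimal userspace sketch of the conventional comparison, assuming a POSIX htons() and a hypothetical helper is_ipv4() whose argument plays the role of skb->protocol:

```c
#include <arpa/inet.h>  /* htons() */
#include <stdbool.h>
#include <stdint.h>

#define ETH_P_IP 0x0800  /* IPv4 EtherType, a host-endian constant */

/* wire_proto models skb->protocol: a 16-bit value in network byte order. */
static bool is_ipv4(uint16_t wire_proto)
{
    /* Convert the host-endian constant; leave the wire value untouched. */
    return wire_proto == htons(ETH_P_IP);
}
```

Because a 16-bit byte swap is its own inverse, swapping either side gives the same numeric result. Converting the constant keeps the endianness annotations consistent and lets the compiler fold the swap at build time.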
CC: Tristram Ha <Tristram.Ha@micrel.com>
CC: Ben Hutchings <bhutchings@solarflare.com>
CC: Joe Perches <joe@perches.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 11 Jul 2012 08:28:36 +0000 (01:28 -0700)]
Merge branch 'davem-next.r8169' of git://violet.fr.zoreil.com/romieu/linux
David S. Miller [Wed, 11 Jul 2012 06:56:33 +0000 (23:56 -0700)]
Merge git://git./linux/kernel/git/davem/net
Conflicts:
net/batman-adv/bridge_loop_avoidance.c
net/batman-adv/bridge_loop_avoidance.h
net/batman-adv/soft-interface.c
net/mac80211/mlme.c
With merge help from Antonio Quartulli (batman-adv) and
Stephen Rothwell (drivers/net/usb/qmi_wwan.c).
The net/mac80211/mlme.c conflict seemed easy enough, accounting for a
conversion to some new tracing macros.
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Tue, 10 Jul 2012 10:04:40 +0000 (10:04 +0000)]
bnx2: Fix bug in bnx2_free_tx_skbs().
In rare cases, bnx2_free_tx_skbs() can unmap the wrong DMA address
when it gets to the last entry of the tx ring. We were not using
the proper macro to skip the last entry when advancing the tx index.
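The idea of skipping the last ring entry can be sketched as follows. The geometry here is an assumption for illustration (256 descriptors per page, with the final one acting as a link to the next page), and ring_next() is a hypothetical helper, not the driver's macro.

```c
#include <stdint.h>

/* Hypothetical ring geometry: each page holds 256 descriptors and the
 * last one links to the next page instead of describing a packet. */
#define RING_DESC_PER_PAGE 256u
#define RING_LAST_USABLE   (RING_DESC_PER_PAGE - 2u)

/* Advance a free-running ring index, stepping over the link entry. */
static uint16_t ring_next(uint16_t idx)
{
    if ((idx & (RING_DESC_PER_PAGE - 1u)) == RING_LAST_USABLE)
        return (uint16_t)(idx + 2u);  /* skip the link descriptor */
    return (uint16_t)(idx + 1u);
}
```

A plain idx + 1 at the last usable slot would land on the link entry, and treating that entry as a packet descriptor is the kind of mistake that leads to unmapping the wrong address.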
Reported-by: Zongyun Lai <zlai@vmware.com>
Reviewed-by: Jeffrey Huang <huangjw@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 10 Jul 2012 10:03:41 +0000 (10:03 +0000)]
IPoIB: fix skb truesize underestimatiom
Or Gerlitz reported triggering of WARN_ON_ONCE(delta < len); in
skb_try_coalesce().
This warning tracks drivers that incorrectly set skb->truesize.
IPoIB indeed allocates a full page to store a fragment, but only
accounts in skb->truesize the used part of the page (frame length).
This patch fixes skb truesize underestimation, and
also fixes a performance issue, because RX skbs do not have enough tailroom
to allow IP and TCP stacks to pull their header in skb linear part
without an expensive call to pskb_expand_head().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Erez Shitrit <erezsh@mellanox.com>
Cc: Shlomo Pongartz <shlomop@mellanox.com>
Cc: Roland Dreier <roland@purestorage.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amir Hanania [Mon, 9 Jul 2012 20:47:19 +0000 (20:47 +0000)]
net: Fix memory leak - vlan_info struct
In the driver reload test there is a memory leak.
The structure vlan_info was not freed when the driver was removed.
It was not released because nr_vids is still one after the last vlan was removed.
nr_vids is one because vlan zero is added to the interface when the interface
is being set up, but vlan zero is not deleted at unregister.
Fix: delete vlan zero when we unregister the device.
Signed-off-by: Amir Hanania <amir.hanania@intel.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 11 Jul 2012 06:31:37 +0000 (23:31 -0700)]
Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge
Included changes:
- fix a bug generated by the wrong interaction between the GW feature and the
Bridge Loop Avoidance
Jitendra Kalsaria [Tue, 10 Jul 2012 14:57:39 +0000 (14:57 +0000)]
qlge: Bumped driver version to 1.00.00.31
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jitendra Kalsaria [Tue, 10 Jul 2012 14:57:38 +0000 (14:57 +0000)]
qlge: Refactoring of ethtool stats.
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jitendra Kalsaria [Tue, 10 Jul 2012 14:57:37 +0000 (14:57 +0000)]
qlge: Moving low level frame error to ethtool statistics.
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jitendra Kalsaria [Tue, 10 Jul 2012 14:57:36 +0000 (14:57 +0000)]
qlge: Fixed double pci free upon tx_ring->q allocation failure.
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jitendra Kalsaria [Tue, 10 Jul 2012 14:57:35 +0000 (14:57 +0000)]
qlge: Added missing case statement to ethtool get_strings.
Missing case was causing ethtool self test to print garbage
value in extra info section.
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jitendra Kalsaria [Tue, 10 Jul 2012 14:57:34 +0000 (14:57 +0000)]
qlge: Clean up ethtool set WOL routine.
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jitendra Kalsaria [Tue, 10 Jul 2012 14:57:33 +0000 (14:57 +0000)]
qlge: Fix ethtool WOL calls to operate only on devices that support WOL.
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jitendra Kalsaria [Tue, 10 Jul 2012 14:57:32 +0000 (14:57 +0000)]
qlge: Cleanup atomic queue threshold check.
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jitendra Kalsaria [Tue, 10 Jul 2012 14:57:31 +0000 (14:57 +0000)]
qlge: Fix TX queue stoppage due to full condition.
TX queue was being stopped at beginning of send path instead
of at the end when last descriptor is used.
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rob Herring [Mon, 9 Jul 2012 14:16:10 +0000 (14:16 +0000)]
net: calxedaxgmac: enable rx cut-thru mode
Enabling RX cut-thru mode yields better performance as received frames
start getting written to memory before a whole frame is received.
Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rob Herring [Mon, 9 Jul 2012 14:16:09 +0000 (14:16 +0000)]
net: calxedaxgmac: set outstanding AXI bus transactions to 8
Increase the number of outstanding read and write AXI transactions from 1
to 8 for better performance.
Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rob Herring [Mon, 9 Jul 2012 14:16:08 +0000 (14:16 +0000)]
net: calxedaxgmac: fix hang on rx refill
Fix intermittent hangs in xgmac_rx_refill. If a ring buffer entry already
had an skb allocated, then xgmac_rx_refill would get stuck in a loop. This
can happen on a rx error when we just leave the skb allocated to the entry.
[ 7884.510000] INFO: rcu_preempt detected stall on CPU 0 (t=727315 jiffies)
[ 7884.510000] [<c0010a59>] (unwind_backtrace+0x1/0x98) from [<c006fd93>] (__rcu_pending+0x11b/0x2c4)
[ 7884.510000] [<c006fd93>] (__rcu_pending+0x11b/0x2c4) from [<c0070b95>] (rcu_check_callbacks+0xed/0x1a8)
[ 7884.510000] [<c0070b95>] (rcu_check_callbacks+0xed/0x1a8) from [<c0036abb>] (update_process_times+0x2b/0x48)
[ 7884.510000] [<c0036abb>] (update_process_times+0x2b/0x48) from [<c004e8fd>] (tick_sched_timer+0x51/0x94)
[ 7884.510000] [<c004e8fd>] (tick_sched_timer+0x51/0x94) from [<c0045527>] (__run_hrtimer+0x4f/0x1e8)
[ 7884.510000] [<c0045527>] (__run_hrtimer+0x4f/0x1e8) from [<c0046003>] (hrtimer_interrupt+0xd7/0x1e4)
[ 7884.510000] [<c0046003>] (hrtimer_interrupt+0xd7/0x1e4) from [<c00101d3>] (twd_handler+0x17/0x24)
[ 7884.510000] [<c00101d3>] (twd_handler+0x17/0x24) from [<c006be39>] (handle_percpu_devid_irq+0x59/0x114)
[ 7884.510000] [<c006be39>] (handle_percpu_devid_irq+0x59/0x114) from [<c0069aab>] (generic_handle_irq+0x17/0x2c)
[ 7884.510000] [<c0069aab>] (generic_handle_irq+0x17/0x2c) from [<c000cc8d>] (handle_IRQ+0x35/0x7c)
[ 7884.510000] [<c000cc8d>] (handle_IRQ+0x35/0x7c) from [<c033b153>] (__irq_svc+0x33/0xb8)
[ 7884.510000] [<c033b153>] (__irq_svc+0x33/0xb8) from [<c0244b06>] (xgmac_rx_refill+0x3a/0x140)
[ 7884.510000] [<c0244b06>] (xgmac_rx_refill+0x3a/0x140) from [<c02458ed>] (xgmac_poll+0x265/0x3bc)
[ 7884.510000] [<c02458ed>] (xgmac_poll+0x265/0x3bc) from [<c029fcbf>] (net_rx_action+0xc3/0x200)
[ 7884.510000] [<c029fcbf>] (net_rx_action+0xc3/0x200) from [<c0030cab>] (__do_softirq+0xa3/0x1bc)
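The failure mode can be illustrated with a simplified refill loop. This is a sketch under assumptions, not the driver's code: struct rx_ring, rx_refill() and fake_alloc() are hypothetical, and the ring is reduced to an array of buffer pointers.

```c
#include <stddef.h>

#define RX_RING_SIZE 8u  /* hypothetical ring size, a power of two */

struct rx_ring {
    void *skb[RX_RING_SIZE];  /* NULL means the slot needs a buffer */
    unsigned int head;        /* next slot to refill */
    unsigned int tail;        /* next slot the hardware will complete */
};

/* Stand-in allocator for the sketch: hands out one static buffer. */
static void *fake_alloc(void)
{
    static char buf[64];

    return buf;
}

/* Refill the slots between head and tail; returns how many buffers were
 * newly attached. Slots that still hold a buffer (for example after an
 * RX error) are kept, but head is advanced past them as well. */
static unsigned int rx_refill(struct rx_ring *r, void *(*alloc)(void))
{
    unsigned int filled = 0;

    while (((r->head + 1u) & (RX_RING_SIZE - 1u)) != r->tail) {
        unsigned int entry = r->head;

        if (r->skb[entry] == NULL) {
            void *buf = alloc();

            if (buf == NULL)
                break;  /* out of memory: retry on the next poll */
            r->skb[entry] = buf;
            filled++;
        }
        /* Advance unconditionally. Skipping this step for an
         * already-populated slot makes the loop spin forever. */
        r->head = (entry + 1u) & (RX_RING_SIZE - 1u);
    }
    return filled;
}
```

If the head update lived only inside the allocation branch, a slot that kept its skb after an RX error would never be passed, which matches the stall reported in the trace above.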
Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rob Herring [Mon, 9 Jul 2012 14:16:07 +0000 (14:16 +0000)]
net: calxedaxgmac: fix net timeout recovery
Fix net tx watchdog timeout recovery. The descriptor ring was reset,
but the DMA engine was not reset to the beginning of the ring.
Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Mason [Mon, 9 Jul 2012 14:09:35 +0000 (14:09 +0000)]
ll_temac: remove unnecessary setting of skb->dev
skb->dev is being unnecessarily set by the driver on packet receive.
eth_type_trans already sets skb->dev to the proper value and it is not
referenced anywhere else in the driver, thus making its setting unnecessary.
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Mason [Mon, 9 Jul 2012 14:09:34 +0000 (14:09 +0000)]
sunhme: remove unnecessary setting of skb->dev
skb->dev is being unnecessarily set during ring init and skb alloc in rx. It is
already being set to the proper value when eth_type_trans is called on packet
receive, and the skb->dev is not referenced anywhere else in the code.
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Mason [Mon, 9 Jul 2012 14:09:33 +0000 (14:09 +0000)]
sungem: remove unnecessary setting of skb->dev
skb->dev is being unnecessarily set by the driver's skb alloc routine (which is
called in init and during rx). It is already being set to the proper value when
eth_type_trans is called on packet receive, and the skb->dev is not referenced
anywhere else in the code.
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Mason [Mon, 9 Jul 2012 14:09:32 +0000 (14:09 +0000)]
sunbmac: remove unnecessary setting of skb->dev
skb->dev is being unnecessarily set during ring init and skb alloc in rx. It is
already being set to the proper value when eth_type_trans is called on packet
receive, and the skb->dev is not referenced anywhere else in the code.
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Mason [Mon, 9 Jul 2012 14:09:31 +0000 (14:09 +0000)]
qlge: remove unnecessary setting of skb->dev
skb->dev is being unnecessarily set by the driver on packet receive.
eth_type_trans already sets skb->dev to the proper value and it is not
referenced anywhere else in the driver, thus making its setting unnecessary.
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Cc: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
Cc: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Cc: Ron Mercer <ron.mercer@qlogic.com>
Cc: linux-driver@qlogic.com
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Mason [Mon, 9 Jul 2012 14:09:30 +0000 (14:09 +0000)]
qlcnic: remove unnecessary setting of skb->dev
skb->dev is being unnecessarily set before calling eth_type_trans.
eth_type_trans already sets skb->dev to the proper value, thus making this
unnecessary.
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Cc: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
Cc: Sony Chacko <sony.chacko@qlogic.com>
Cc: linux-driver@qlogic.com
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Mason [Mon, 9 Jul 2012 14:09:29 +0000 (14:09 +0000)]
ksz884x: remove unnecessary setting of skb->dev
skb->dev is being unnecessarily set during ring init. It is already being set
to the proper value when eth_type_trans is called on packet receive, and the
skb->dev is not referenced anywhere else in the code.
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Mason [Mon, 9 Jul 2012 14:09:28 +0000 (14:09 +0000)]
lantiq_etop: remove unnecessary setting of skb->dev
skb->dev is being unnecessarily set before calling eth_type_trans.
eth_type_trans already sets skb->dev to the proper value, thus making this
unnecessary.
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Mason [Mon, 9 Jul 2012 14:09:27 +0000 (14:09 +0000)]
netxen: remove unnecessary setting of skb->dev
skb->dev is being unnecessarily set by the driver on packet receive.
eth_type_trans already sets skb->dev to the proper value and it is not
referenced anywhere else in the driver, thus making its setting unnecessary.
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Cc: Sony Chacko <sony.chacko@qlogic.com>
Cc: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Mason [Mon, 9 Jul 2012 14:09:26 +0000 (14:09 +0000)]
enic: remove unnecessary setting of skb->dev
skb->dev is being unnecessarily set after calling eth_type_trans.
eth_type_trans already sets skb->dev to the proper value, thus making this
unnecessary.
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Cc: Christian Benvenuti <benve@cisco.com>
Cc: Roopa Prabhu <roprabhu@cisco.com>
Cc: Neel Patel <neepatel@cisco.com>
Cc: Nishank Trivedi <nistrive@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Mason [Mon, 9 Jul 2012 14:09:25 +0000 (14:09 +0000)]
lance: remove unnecessary setting of skb->dev
skb->dev is being unnecessarily set during ring init. It is already being set
to the proper value when eth_type_trans is called on packet receive, and the
skb->dev is not referenced anywhere else in the code.
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Mason [Mon, 9 Jul 2012 14:07:57 +0000 (14:07 +0000)]
vxge/s2io: remove dead URLs
URLs to neterion.com and s2io.com no longer resolve. Remove all references to
these URLs in the driver source and documentation.
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 10 Jul 2012 19:05:57 +0000 (19:05 +0000)]
ipv6: optimize ipv6 addresses compares
On 64 bit arches having efficient unaligned accesses (eg x86_64) we can
use long words to reduce number of instructions for free.
Joe Perches suggested to change ipv6_masked_addr_cmp() to return a bool
instead of 'int', to make sure ipv6_masked_addr_cmp() cannot be used
in a sorting function.
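A portable sketch of the word-wise comparison, assuming a hypothetical struct addr128 in place of struct in6_addr. The kernel version relies on efficient unaligned loads; this sketch uses memcpy() instead so that it stays well defined on any platform.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct in6_addr: 16 raw address bytes. */
struct addr128 {
    uint8_t bytes[16];
};

/* Load 8 bytes as one 64-bit word; memcpy keeps this alignment-safe. */
static uint64_t load64(const uint8_t *p)
{
    uint64_t v;

    memcpy(&v, p, sizeof(v));
    return v;
}

/* Equality test using two 64-bit compares instead of four 32-bit ones. */
static bool addr128_equal(const struct addr128 *a, const struct addr128 *b)
{
    uint64_t lo = load64(a->bytes) ^ load64(b->bytes);
    uint64_t hi = load64(a->bytes + 8) ^ load64(b->bytes + 8);

    return (lo | hi) == 0;
}

/* Returns true if a and b differ under mask m. The bool return type
 * signals that the result carries no ordering information. */
static bool addr128_masked_differs(const struct addr128 *a,
                                   const struct addr128 *m,
                                   const struct addr128 *b)
{
    uint64_t lo = (load64(a->bytes) ^ load64(b->bytes)) & load64(m->bytes);
    uint64_t hi = (load64(a->bytes + 8) ^ load64(b->bytes + 8)) &
                  load64(m->bytes + 8);

    return (lo | hi) != 0;
}
```

Since the masked comparison only answers "same or different", returning bool makes it harder to pass it by mistake to code that expects a three-way comparator.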
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Tue, 10 Jul 2012 10:56:59 +0000 (10:56 +0000)]
drivers/net/ethernet: Fix non-kernel-doc comments with kernel-doc start markers
Convert doxygen (or similar) formatted comments to kernel-doc or
unformatted comment. Delete a few that are content-free.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Tue, 10 Jul 2012 10:56:00 +0000 (10:56 +0000)]
drivers/net/ethernet: Fix (nearly-)kernel-doc comments for various functions
Fix incorrect start markers, wrapped summary lines, missing section
breaks, incorrect separators, and some name mismatches. Delete
a few that are content-free.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Tue, 10 Jul 2012 10:55:35 +0000 (10:55 +0000)]
net: Fix non-kernel-doc comments with kernel-doc start marker
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Tue, 10 Jul 2012 10:55:09 +0000 (10:55 +0000)]
net: Fix (nearly-)kernel-doc comments for various functions
Fix incorrect start markers, wrapped summary lines, missing section
breaks, incorrect separators, and some name mismatches.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Tue, 10 Jul 2012 10:54:38 +0000 (10:54 +0000)]
net: Properly define functions with no parameters
Defining a function with no parameters as 'T foo()' is the deprecated
K&R style, and is not strictly equivalent to defining it as 'T foo(void)'.
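The difference can be shown with two toy definitions; get_old() and get_new() are hypothetical names used only for illustration.

```c
/* Old-style definition: the function takes no parameters, but the
 * definition provides no prototype, so before C23 a call such as
 * get_old(1, 2) need not be diagnosed by the compiler. */
int get_old()
{
    return 1;
}

/* Prototype-style definition: calls that pass arguments are rejected
 * at compile time. */
int get_new(void)
{
    return 2;
}
```

Both functions behave identically when called correctly; the (void) form only adds compile-time checking of call sites.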
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 11 Jul 2012 05:53:57 +0000 (22:53 -0700)]
Merge branch 'metrics_restructure'
This patch series works towards the goal of minimizing the amount
of things that can change in an ipv4 route.
In a regime where the routing cache is removed, route changes will
lead to cloning in the FIB tables or similar.
The largest trigger of route metrics writes, TCP, now has its own
cache of dynamic metric state. The timewait timestamps are stored
there now as well.
As a result of that, pre-cowing metrics is no longer necessary,
and therefore FLOWI_FLAG_PRECOW_METRICS is removed.
Redirect and PMTU handling is moved back into the ipv4 routes. I'm
sorry for all the headaches trying to do this in the inetpeer has
caused, it was the wrong approach for sure.
Since metrics become read-only for ipv4 we no longer need the inetpeer
hung off of the ipv4 routes either. So those disappear too.
Also, timewait sockets no longer need to hold onto an inetpeer either.
After this series, we still have some details to resolve wrt. PMTU and
redirects for a route-cache-less system:
1) With just the plain route cache removal, PMTU will continue to
work mostly fine. This is because of how the local route users
call down into the PMTU update code with the route they already
hold.
However, if we wish to cache pre-computed routes in fib_info
nexthops (which we want for performance), then we need to add
route cloning for PMTU events.
2) Redirects require more work. First, redirects must be changed to
be handled like PMTU, wherein we call down into the sockets and
other entities, and then they call back into the routing code with
the route they were using.
So we'll be adding an ->update_nexthop() method alongside
->update_pmtu().
And then, like for PMTU, we'll need cloning support once we start
caching routes in the fib_info nexthops.
But that's it, we can completely pull the trigger and remove the
routing cache with minimal disruptions.
As it is, this patch series alone helps a lot of things. For one,
routing cache entry creation should be a lot faster, because we no
longer do inetpeer lookups (even to check if an entry exists).
This patch series also opens the door for non-DST_HOST ipv4 routes,
because nothing fundamentally cares about rt->rt_dst any more. It
can be removed with the base routing cache removal patch. In fact,
that was the primary goal of this patch series.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 10 Jul 2012 14:26:01 +0000 (07:26 -0700)]
ipv4: Remove inetpeer from routes.
No longer used.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 10 Jul 2012 14:08:18 +0000 (07:08 -0700)]
ipv4: Calling ->cow_metrics() now is a bug.
Nothing ever writes to ipv4 metrics any longer.
PMTU is stored in rt->rt_pmtu.
Dynamic TCP metrics are stored in a special TCP metrics cache,
completely outside of the routes.
Therefore ->cow_metrics() can simply be nothing more than a WARN_ON
trigger so we can catch anyone who tries to add new writes to
ipv4 route metrics.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 10 Jul 2012 14:03:43 +0000 (07:03 -0700)]
ipv4: Kill dst_copy_metrics() call from ipv4_blackhole_route().
Blackhole routes have a COW metrics operation that returns NULL
always, therefore this dst_copy_metrics() call did absolutely
nothing.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 10 Jul 2012 14:02:09 +0000 (07:02 -0700)]
ipv4: Enforce max MTU metric at route insertion time.
Rather than at every struct rtable creation.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 10 Jul 2012 13:58:42 +0000 (06:58 -0700)]
ipv4: Maintain redirect and PMTU info in struct rtable again.
Maintaining this in the inetpeer entries was not the right way to do
this at all.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 10 Jul 2012 12:06:14 +0000 (05:06 -0700)]
rtnetlink: Remove ts/tsage args to rtnl_put_cacheinfo().
Nobody provides non-zero values any longer.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 10 Jul 2012 11:01:57 +0000 (04:01 -0700)]
inet: Kill FLOWI_FLAG_PRECOW_METRICS.
No longer needed. TCP writes metrics, but now in its own special
cache that does not dirty the route metrics. Therefore there is no
longer any reason to pre-cow metrics in this way.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 10 Jul 2012 10:58:16 +0000 (03:58 -0700)]
inet: Minimize use of cached route inetpeer.
Only use it in the absolutely required cases:
1) COW'ing metrics
2) ipv4 PMTU
3) ipv4 redirects
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 10 Jul 2012 10:32:59 +0000 (03:32 -0700)]
inet: Remove ->get_peer() method.
No longer used.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 10 Jul 2012 10:27:56 +0000 (03:27 -0700)]
tcp: Remove tw->tw_peer
No longer used.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 10 Jul 2012 10:14:24 +0000 (03:14 -0700)]
tcp: Move timestamps from inetpeer to metrics cache.
With help from Lin Ming.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 10 Jul 2012 07:53:48 +0000 (00:53 -0700)]
net: Kill set_dst_metric_rtt().
No longer used.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 10 Jul 2012 07:52:56 +0000 (00:52 -0700)]
net: Don't report route RTT metric value in cache dumps.
We don't maintain it dynamically any longer, so reporting it would
be extremely misleading. Report zero instead.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 10 Jul 2012 07:49:14 +0000 (00:49 -0700)]
tcp: Maintain dynamic metrics in local cache.
Maintain a local hash table of TCP dynamic metrics blobs.
Computed TCP metrics are no longer maintained in the route metrics.
The table uses RCU and an extremely simple hash so that it has low
latency and low overhead. A simple hash is legitimate because we only
make metrics blobs for fully established connections.
The default hash table sizes, metric timeouts, and
the hash chain length limit certainly could use some tweaking. But
the basic design seems sound.
With help from Eric Dumazet and Joe Perches.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 9 Jul 2012 23:19:30 +0000 (16:19 -0700)]
tcp: Abstract back handling peer aliveness test into helper function.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 9 Jul 2012 23:07:30 +0000 (16:07 -0700)]
tcp: Move dynamnic metrics handling into seperate file.
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg [Tue, 10 Jul 2012 06:18:44 +0000 (06:18 +0000)]
etherdevice: introduce eth_broadcast_addr
A lot of code has either the memset or an inefficient copy
from a static array that contains the all-ones broadcast
address. Introduce eth_broadcast_addr() to fill an address
with all ones, making the code clearer and allowing us to
get rid of some constant arrays.
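A minimal userspace sketch of such a helper, assuming a 6-octet address. The name eth_broadcast_addr_sketch() is chosen here to avoid suggesting that this is the kernel's exact definition.

```c
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6  /* octets in an Ethernet address */

/* Fill addr with the all-ones broadcast address ff:ff:ff:ff:ff:ff. */
static inline void eth_broadcast_addr_sketch(uint8_t *addr)
{
    memset(addr, 0xff, ETH_ALEN);
}
```

Callers that previously copied from a static all-ones array can call the helper instead, so the constant array is no longer needed.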
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 11 Jul 2012 01:05:28 +0000 (18:05 -0700)]
ipv4: Fix crashes in fib_rules_tclass().
All paths assume, when CONFIG_IP_MULTIPLE_TABLES is enabled, that any
successful call to fib_lookup() will initialize the fib_result->r
value to something.
We violated that expectation in the new fib_lookup() fast path.
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hayes Wang [Tue, 10 Jul 2012 06:47:05 +0000 (08:47 +0200)]
r8169: fix argument in rtl_hw_init_8168g.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
David S. Miller [Mon, 9 Jul 2012 23:09:47 +0000 (16:09 -0700)]
Merge branch 'davem-next.r8169' of git://violet.fr.zoreil.com/romieu/linux
Francois Romieu (4):
r8169: mdio_ops signature change.
r8169: csi_ops signature change.
r8169: ephy, eri and efuse functions signature changes.
r8169: abstract out loop conditions.
Hayes Wang (2):
r8169: add RTL8106E support.
r8169: support RTL8168G
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 5 Jul 2012 11:45:13 +0000 (11:45 +0000)]
gianfar: fix potential sk_wmem_alloc imbalance
commit db83d136d7f753 (gianfar: Fix missing sock reference when
processing TX time stamps) added a potential sk_wmem_alloc imbalance.
If the new skb has a different truesize than the old one, we can get a
negative sk_wmem_alloc once the new skb is orphaned at TX completion.
Now that we no longer early orphan skbs in dev_hard_start_xmit(), this
probably can lead to fatal bugs.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Manfred Rudigier <manfred.rudigier@omicron.at>
Cc: Claudiu Manoil <claudiu.manoil@freescale.com>
Cc: Jiajun Wu <b06378@freescale.com>
Cc: Andy Fleming <afleming@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julia Lawall [Sun, 8 Jul 2012 01:37:43 +0000 (01:37 +0000)]
drivers/net/ethernet/broadcom/cnic.c: remove invalid reference to list iterator variable
If list_for_each_entry, etc complete a traversal of the list, the iterator
variable ends up pointing to an address at an offset from the list head,
and not a meaningful structure. Thus this value should not be used after
the end of the iterator. There does not seem to be a meaningful value to
provide to netdev_warn. Replace with pr_warn, since pr_err is used
elsewhere.
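The pitfall can be reproduced with a small userspace list. The list helpers, struct dev_entry and find_dev() below are simplified, hypothetical stand-ins for the kernel's list.h, written only to show why the iterator is meaningless once the loop has run to completion.

```c
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct dev_entry {
    int id;
    struct list_head node;
};

static void list_init(struct list_head *h)
{
    h->next = h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/* Returns the entry with the given id, or NULL if there is none. */
static struct dev_entry *find_dev(struct list_head *head, int id)
{
    struct list_head *pos;

    for (pos = head->next; pos != head; pos = pos->next) {
        struct dev_entry *e = container_of(pos, struct dev_entry, node);

        if (e->id == id)
            return e;
    }
    /* Here pos == head. container_of(pos, ...) would yield an address
     * at an offset from the list head, not a real dev_entry, so report
     * "not found" instead of using it. */
    return NULL;
}
```

Returning the match from inside the loop (or tracking it in a separate pointer) means nothing after the loop ever touches the exhausted iterator.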
This problem was found using Coccinelle (http://coccinelle.lip6.fr/).
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julia Lawall [Sun, 8 Jul 2012 01:37:39 +0000 (01:37 +0000)]
net/rxrpc/ar-peer.c: remove invalid reference to list iterator variable
If list_for_each_entry, etc complete a traversal of the list, the iterator
variable ends up pointing to an address at an offset from the list head,
and not a meaningful structure. Thus this value should not be used after
the end of the iterator. This seems to be a copy-paste bug from a previous
debugging message, and so the meaningless value is just deleted.
This problem was found using Coccinelle (http://coccinelle.lip6.fr/).
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julia Lawall [Sun, 8 Jul 2012 01:37:38 +0000 (01:37 +0000)]
drivers/isdn/mISDN/stack.c: remove invalid reference to list iterator variable
If list_for_each_entry, etc complete a traversal of the list, the iterator
variable ends up pointing to an address at an offset from the list head,
and not a meaningful structure. Thus this value should not be used after
the end of the iterator. The dereferences are just deleted from the
debugging statement.
This problem was found using Coccinelle (http://coccinelle.lip6.fr/).
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Timur Tabi [Mon, 9 Jul 2012 21:57:36 +0000 (16:57 -0500)]
net/fsl_pq_mdio: use spin_event_timeout() to poll the indicator register
Macro spin_event_timeout() was designed for simple polling of hardware
registers with a timeout, so use it when we poll the MIIMIND register.
This allows us to return an error code instead of polling indefinitely.
Note that PHY_INIT_TIMEOUT is a count of loop iterations, so we can't use
it for spin_event_timeout(), which asks for microseconds.
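The difference between a loop-count limit and a time limit can be sketched in userspace. This is not the kernel macro: poll_timeout(), now_us(), struct fake_reg and reg_idle() are hypothetical, and the clock source is POSIX clock_gettime().

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Monotonic time in microseconds. */
static uint64_t now_us(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

/* Poll cond(arg) until it is true or timeout_us microseconds elapse.
 * Returns the last value of the condition, so false means timeout. */
static bool poll_timeout(bool (*cond)(void *), void *arg, uint64_t timeout_us)
{
    uint64_t start = now_us();
    bool ok;

    while (!(ok = cond(arg))) {
        if (now_us() - start >= timeout_us) {
            ok = cond(arg);  /* one last check after the deadline */
            break;
        }
    }
    return ok;
}

/* Hypothetical register model: the busy bit clears after a few reads. */
struct fake_reg {
    unsigned int reads_until_idle;
};

static bool reg_idle(void *arg)
{
    struct fake_reg *r = arg;

    if (r->reads_until_idle == 0)
        return true;
    r->reads_until_idle--;
    return false;
}
```

A caller can turn a false return into an error code instead of spinning forever. Because the bound is expressed in microseconds, it does not depend on how fast the CPU executes each loop iteration, which is why an iteration count is not a suitable timeout value.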
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sun, 8 Jul 2012 21:45:10 +0000 (21:45 +0000)]
net: cgroup: fix out of bounds accesses
dev->priomap is allocated by extend_netdev_table() called from
update_netdev_tables().
And this is only called if write_priomap() is called.
But if write_priomap() is not called, it seems we can have out of bounds
accesses in cgrp_destroy(), read_priomap() & skb_update_prio().
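A guarded lookup of the kind this calls for might look like the sketch below. struct prio_map and prio_lookup() are hypothetical stand-ins, not the kernel structures; the point is only that both "table never allocated" and "table too short" must be handled.

```c
#include <stddef.h>
#include <stdint.h>

struct prio_map {		/* hypothetical per-device table */
	uint32_t len;		/* number of valid entries in prio[] */
	uint32_t prio[4];
};

/* Returns the mapped priority, or 'fallback' when the table was never
 * allocated or is too short for this index. */
static uint32_t prio_lookup(const struct prio_map *map, uint32_t idx,
			    uint32_t fallback)
{
	if (map == NULL || idx >= map->len)
		return fallback;
	return map->prio[idx];
}
```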
With help from Gao Feng
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Gao feng <gaofeng@cn.fujitsu.com>
Acked-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Mon, 9 Jul 2012 10:52:43 +0000 (10:52 +0000)]
bonding: debugfs and network namespaces are incompatible
The bonding debugfs support has been broken in the presence of network
namespaces since it was added. The debugfs support does not handle
multiple bonding devices with the same name in different network
namespaces.
I haven't had any bug reports, and I'm not interested in getting any.
Disable the debugfs support when network namespaces are enabled.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Mon, 9 Jul 2012 10:51:45 +0000 (10:51 +0000)]
bonding: Manage /proc/net/bonding/ entries from the netdev events
It was recently reported that moving a bonding device between network
namespaces causes warnings from /proc. It turns out after the move we
were trying to add and to remove the /proc/net/bonding entries from the
wrong network namespace.
Move the bonding /proc registration code into the NETDEV_REGISTER and
NETDEV_UNREGISTER events where the proc registration and unregistration
will always happen at the right time.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Emeric Vigier [Mon, 9 Jul 2012 21:44:45 +0000 (17:44 -0400)]
smsc95xx: support ethtool get_regs
Inspired by implementation in smsc911x.c and smsc9420.c
Tested on ARM/pandaboard running android
Signed-off-by: Emeric Vigier <emeric.vigier@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Devendra Naga [Sun, 8 Jul 2012 05:57:57 +0000 (05:57 +0000)]
r6040: use module_pci_driver macro
The documentation for module_pci_driver says that it can be used
when the module's init and exit functions do nothing but call
pci_register_driver and pci_unregister_driver.
Use it for RDC's r6040 driver, whose init and exit paths do exactly
that. This also removes a small amount of code.
Signed-off-by: Devendra Naga <devendra.aaru@gmail.com>
Acked-by: Florian Fainelli <florian@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 9 Jul 2012 06:02:24 +0000 (06:02 +0000)]
bnx2x: populate skb->l4_rxhash
l4_rxhash is set on skb when rxhash is obtained from canonical 4-tuple
over transport ports/addresses.
We can set skb->l4_rxhash for all incoming TCP packets on bnx2x for
free, as the cqe status contains hash type information.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Eilon Greenstein <eilong@broadcom.com>
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hayes Wang [Mon, 2 Jul 2012 09:23:22 +0000 (17:23 +0800)]
r8169: support RTL8168G
For RTL8111G, the settings of phy and firmware are replaced with
ocp functions. r8168g_mdio_{write / read} redirects the relative
settings to suitable ocp functions. A per-device variable is needed
to evaluate the real address of ocp functions.
rtl_writephy(tp, 0x1f, xxxx) is dedicated to keeping said variable
up-to-date.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Francois Romieu [Fri, 6 Jul 2012 12:19:23 +0000 (14:19 +0200)]
r8169: abstract out loop conditions.
Twelve functions can fail silently. Now they have a chance to complain.
Macro and pasting abuse has been kept at a level where tags and
friends should not be hurt.
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Francois Romieu [Fri, 6 Jul 2012 20:40:38 +0000 (22:40 +0200)]
r8169: ephy, eri and efuse functions signature changes.
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Francois Romieu [Fri, 6 Jul 2012 11:37:00 +0000 (13:37 +0200)]
r8169: csi_ops signature change.
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Francois Romieu [Fri, 6 Jul 2012 18:19:42 +0000 (20:19 +0200)]
r8169: mdio_ops signature change.
Further changes need more context down in the call stack.
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Hayes Wang [Mon, 2 Jul 2012 09:23:21 +0000 (17:23 +0800)]
r8169: add RTL8106E support.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Deepak Sikri [Sun, 8 Jul 2012 21:14:46 +0000 (21:14 +0000)]
stmmac: Fix for higher mtu size handling
For higher MTU sizes that require a buffer size greater than 8192,
the buffers are sent or received using either multiple DMA descriptors
or a single descriptor with the multi-buffer handling option.
It was observed during tests that the driver was missing data
packets during normal ping operations when the data buffers in use
required jumbo frame handling.
Memory barriers are added between the steps that prepare the DMA
descriptors in the jumbo frame handling path, to ensure that all
instructions issued before enabling the DMA are complete.
Signed-off-by: Deepak Sikri <deepak.sikri@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Deepak Sikri [Sun, 8 Jul 2012 21:14:45 +0000 (21:14 +0000)]
stmmac: Fix for nfs hang on multiple reboot
It was observed that NFS hangs across multiple reboots. The status of
the receive descriptors showed that all descriptors were owned by the
CPU, and none were assigned to the DMA.
The DMA status register also confirmed that the Rx buffer was
unavailable.
This patch fixes the problem by adding memory barriers to
ensure that all instructions before enabling the Rx or Tx DMA have
completed, including the proper setting of the ownership bit in the DMA
descriptors.
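The ordering requirement can be sketched with C11 atomics. This is a userspace analogue only: the descriptor layout, DESC_OWN and desc_give_to_dma() are hypothetical, and a C11 release fence is not a substitute for the barriers a driver needs against a real DMA engine.

```c
#include <stdatomic.h>
#include <stdint.h>

#define DESC_OWN (1u << 31)	/* hypothetical "owned by DMA" bit */

struct dma_desc {		/* hypothetical, simplified descriptor */
	uint32_t buf_addr;
	uint32_t buf_len;
	_Atomic uint32_t status;
};

/* Fill in the descriptor, then hand it over. The fence keeps the field
 * writes from being reordered after the ownership store, so whoever
 * observes DESC_OWN also observes a fully prepared descriptor. */
static void desc_give_to_dma(struct dma_desc *d, uint32_t addr, uint32_t len)
{
	d->buf_addr = addr;
	d->buf_len = len;
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&d->status, DESC_OWN, memory_order_relaxed);
}
```

Without the fence, the ownership bit could become visible before the buffer fields, which matches the kind of symptom described above.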
Signed-off-by: Deepak Sikri <deepak.sikri@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville [Mon, 9 Jul 2012 19:09:08 +0000 (15:09 -0400)]
Merge branch 'master' of git://git./linux/kernel/git/linville/wireless into for-davem
Emmanuel Grumbach [Wed, 4 Jul 2012 11:59:08 +0000 (13:59 +0200)]
iwlegacy: don't mess up the SCD when removing a key
When we remove a key, we put a key index which was supposed
to tell the fw that we are actually removing the key. But
instead the fw took that index as a valid index and messed
up the SRAM of the device.
This memory corruption on the device mangled the data of
the SCD. The impact on the user is that SCD queue 2 got
stuck after having removed keys.
Reported-by: Paul Bolle <pebolle@tiscali.nl>
Cc: stable@vger.kernel.org
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Stanislaw Gruszka [Wed, 4 Jul 2012 11:20:20 +0000 (13:20 +0200)]
iwlegacy: always monitor for stuck queues
This is iwlegacy version of:
commit 342bbf3fee2fa9a18147e74b2e3c4229a4564912
Author: Johannes Berg <johannes.berg@intel.com>
Date: Sun Mar 4 08:50:46 2012 -0800
iwlwifi: always monitor for stuck queues
If we only monitor while associated, the following
can happen:
- we're associated, and the queue stuck check
runs, setting the queue "touch" time to X
- we disassociate, stopping the monitoring,
which leaves the time set to X
- almost 2s later, we associate, and enqueue
a frame
- before the frame is transmitted, we monitor
for stuck queues, and find the time set to
X, although it is now later than X + 2000ms,
so we decide that the queue is stuck and
erroneously restart the device
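The scenario can be modelled with a small sketch. struct txq and txq_check_stuck() are hypothetical and much simpler than the driver; the 2000 ms window is the one from the text.

```c
#include <stdbool.h>
#include <stdint.h>

#define STUCK_TIMEOUT_MS 2000u	/* the 2000 ms window from the text */

struct txq {			/* hypothetical, simplified queue state */
	unsigned int pending;	/* frames enqueued but not yet sent */
	uint64_t touch_ms;	/* last time the queue was seen healthy */
};

/* One run of the periodic check; returns true if the queue looks stuck. */
static bool txq_check_stuck(struct txq *q, uint64_t now_ms)
{
	if (q->pending == 0) {
		q->touch_ms = now_ms;	/* idle queue: refresh the stamp */
		return false;
	}
	return now_ms - q->touch_ms > STUCK_TIMEOUT_MS;
}
```

If the check keeps running while the queue is idle, the stamp stays fresh and the first frame after association is not misjudged. If the check is suspended, the stamp stays at X and a frame enqueued more than 2000 ms later looks stuck immediately.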
Cc: stable@vger.kernel.org
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Stanislaw Gruszka [Wed, 4 Jul 2012 11:10:02 +0000 (13:10 +0200)]
rt2x00usb: fix indexes ordering on RX queue kick
On rt2x00_dmastart() we increase index specified by Q_INDEX and on
rt2x00_dmadone() we increase index specified by Q_INDEX_DONE. So entries
between Q_INDEX_DONE and Q_INDEX are those we currently process in the
hardware. Entries between Q_INDEX and Q_INDEX_DONE are those we can
submit to the hardware.
Fix rt2x00usb_kick_queue() accordingly, as we need to submit RX
entries that are not being processed by the hardware. The old code
worked only for an empty queue and was broken otherwise.
Note that for TX queues the index ordering is ok. We need to kick entries
that have a filled skb but were not submitted to the hardware, i.e.
those starting from Q_INDEX_DONE that have the ENTRY_DATA_PENDING bit set.
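The two ranges can be illustrated with a small ring walk. RING_SIZE, ring_for_each() and ring_span() are hypothetical, and the wrap rule (equal indexes visit the whole ring) is an assumption chosen to match the "worked only for empty queue" remark.

```c
#define RING_SIZE 8u	/* hypothetical ring length */

typedef void (*entry_fn)(unsigned int idx, void *ctx);

/* Visit ring slots from 'start' up to (not including) 'end', wrapping at
 * RING_SIZE. When start == end the whole ring is visited. */
static void ring_for_each(unsigned int start, unsigned int end,
			  entry_fn fn, void *ctx)
{
	unsigned int i;

	if (start < end) {
		for (i = start; i < end; i++)
			fn(i, ctx);
	} else {
		for (i = start; i < RING_SIZE; i++)
			fn(i, ctx);
		for (i = 0; i < end; i++)
			fn(i, ctx);
	}
}

static void count_entry(unsigned int idx, void *ctx)
{
	(void)idx;
	++*(unsigned int *)ctx;
}

/* Number of slots visited between two indexes. */
static unsigned int ring_span(unsigned int start, unsigned int end)
{
	unsigned int n = 0;

	ring_for_each(start, end, count_entry, &n);
	return n;
}
```

With Q_INDEX_DONE at 2 and Q_INDEX at 5, ring_span(2, 5) covers the 3 slots held by the hardware and ring_span(5, 2) covers the 5 slots free to submit. Only when both indexes are equal do the two orders visit the same slots.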
From a practical standpoint this fixes RX queue stall, usually reproducible
in AP mode, like for example reported here:
https://bugzilla.redhat.com/show_bug.cgi?id=828824
Reported-and-tested-by: Franco Miceli <fmiceli@plan.ceibal.edu.uy>
Reported-and-tested-by: Tom Horsley <horsley1953@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Bing Zhao [Tue, 3 Jul 2012 22:53:13 +0000 (15:53 -0700)]
mwifiex: fix Coverity SCAN CID 709078: Resource leak (RESOURCE_LEAK)
> *. CID 709078: Resource leak (RESOURCE_LEAK)
> - drivers/net/wireless/mwifiex/cfg80211.c, line: 935
> Assigning: "bss_cfg" = storage returned from "kzalloc(132UL, 208U)"
> - but was not free
> drivers/net/wireless/mwifiex/cfg80211.c:935
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Eliad Peller [Mon, 2 Jul 2012 11:42:03 +0000 (14:42 +0300)]
mac80211: destroy assoc_data correctly if assoc fails
If association failed due to internal error (e.g. no
supported rates IE), we call ieee80211_destroy_assoc_data()
with assoc=true, while we actually reject the association.
This results in the BSSID not being zeroed out.
After passing assoc=false, we no longer have to call
sta_info_destroy_addr() explicitly. While on it, move
the "associated" message after the assoc_success check.
Cc: stable@vger.kernel.org [3.4+]
Signed-off-by: Eliad Peller <eliad@wizery.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Sasha Levin [Sat, 30 Jun 2012 09:56:47 +0000 (11:56 +0200)]
NFC: Prevent NULL deref when getting socket name
llcp_sock_getname can be called without a device attached to the nfc_llcp_sock.
This would lead to the following BUG:
[ 362.341807] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 362.341815] IP: [<ffffffff836258e5>] llcp_sock_getname+0x75/0xc0
[ 362.341818] PGD 31b35067 PUD 30631067 PMD 0
[ 362.341821] Oops: 0000 [#627] PREEMPT SMP DEBUG_PAGEALLOC
[ 362.341826] CPU 3
[ 362.341827] Pid: 7816, comm: trinity-child55 Tainted: G D W 3.5.0-rc4-next-20120628-sasha-00005-g9f23eb7 #479
[ 362.341831] RIP: 0010:[<ffffffff836258e5>] [<ffffffff836258e5>] llcp_sock_getname+0x75/0xc0
[ 362.341832] RSP: 0018:ffff8800304fde88 EFLAGS: 00010286
[ 362.341834] RAX: 0000000000000000 RBX: ffff880033cb8000 RCX: 0000000000000001
[ 362.341835] RDX: ffff8800304fdec4 RSI: ffff8800304fdec8 RDI: ffff8800304fdeda
[ 362.341836] RBP: ffff8800304fdea8 R08: 7ebcebcb772b7ffb R09: 5fbfcb9c35bdfd53
[ 362.341838] R10: 4220020c54326244 R11: 0000000000000246 R12: ffff8800304fdec8
[ 362.341839] R13: ffff8800304fdec4 R14: ffff8800304fdec8 R15: 0000000000000044
[ 362.341841] FS: 00007effa376e700(0000) GS:ffff880035a00000(0000) knlGS:0000000000000000
[ 362.341843] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 362.341844] CR2: 0000000000000000 CR3: 0000000030438000 CR4: 00000000000406e0
[ 362.341851] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 362.341856] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 362.341858] Process trinity-child55 (pid: 7816, threadinfo ffff8800304fc000, task ffff880031270000)
[ 362.341858] Stack:
[ 362.341862] ffff8800304fdea8 ffff880035156780 0000000000000000 0000000000001000
[ 362.341865] ffff8800304fdf78 ffffffff83183b40 00000000304fdec8 0000006000000000
[ 362.341868] ffff8800304f0027 ffffffff83729649 ffff8800304fdee8 ffff8800304fdf48
[ 362.341869] Call Trace:
[ 362.341874] [<ffffffff83183b40>] sys_getpeername+0xa0/0x110
[ 362.341877] [<ffffffff83729649>] ? _raw_spin_unlock_irq+0x59/0x80
[ 362.341882] [<ffffffff810f342b>] ? do_setitimer+0x23b/0x290
[ 362.341886] [<ffffffff81985ede>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 362.341889] [<ffffffff8372a539>] system_call_fastpath+0x16/0x1b
[ 362.341921] Code: 84 00 00 00 00 00 b8 b3 ff ff ff 48 85 db 74 54 66 41 c7 04 24 27 00 49 8d 7c 24 12 41 c7 45 00 60 00 00 00 48 8b 83 28 05 00 00 <8b> 00 41 89 44 24 04 0f b6 83 41 05 00 00 41 88 44 24 10 0f b6
[ 362.341924] RIP [<ffffffff836258e5>] llcp_sock_getname+0x75/0xc0
[ 362.341925] RSP <ffff8800304fde88>
[ 362.341926] CR2: 0000000000000000
[ 362.341928] ---[ end trace 6d450e935ee18bf3 ]---
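A guard against this kind of dereference might look like the sketch below. The stub types, sock_getname() and the choice of -EBADFD are assumptions for illustration; they are not taken from the patch itself.

```c
#include <errno.h>
#include <stddef.h>

struct nfc_dev_stub { int idx; };			/* hypothetical */
struct llcp_sock_stub { struct nfc_dev_stub *dev; };	/* hypothetical */

/* Refuse to report a name for a socket that has no device attached,
 * instead of dereferencing a NULL device pointer. */
static int sock_getname(const struct llcp_sock_stub *s, int *dev_idx)
{
	if (s == NULL || s->dev == NULL)
		return -EBADFD;
	*dev_idx = s->dev->idx;
	return 0;
}
```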
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Thomas Huehn [Fri, 29 Jun 2012 13:26:27 +0000 (06:26 -0700)]
mac80211: correct size the argument to kzalloc in minstrel_ht
msp has type struct minstrel_ht_sta_priv not struct minstrel_ht_sta.
(This incorporates the fixup originally posted as "mac80211: fix kzalloc
memory corruption introduced in minstrel_ht". -- JWL)
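One common way to avoid this class of bug is to size the allocation from the pointer rather than from a type name. The sketch below uses calloc() in userspace; struct sta, struct sta_priv and alloc_priv() are hypothetical and only mimic the "wrapper larger than inner struct" relationship.

```c
#include <stdlib.h>

struct sta { int rates[4]; };	/* hypothetical inner struct */

struct sta_priv {		/* hypothetical wrapper, larger than struct sta */
	struct sta ht;
	int legacy[8];
};

static struct sta_priv *alloc_priv(void)
{
	struct sta_priv *msp;

	/* sizeof(*msp) follows the pointer's own type, so the size cannot
	 * silently refer to a smaller, similarly named struct. */
	msp = calloc(1, sizeof(*msp));
	return msp;
}
```

Allocating sizeof(struct sta) here instead would leave legacy[] outside the allocation, and writes to it would corrupt adjacent memory.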
Reported-by: Fengguang Wu <wfg@linux.intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Thomas Huehn <thomas@net.t-labs.tu-berlin.de>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
David S. Miller [Mon, 9 Jul 2012 09:47:59 +0000 (02:47 -0700)]
Merge branch 'master' of git://1984.lsi.us.es/nf
Pablo Neira Ayuso says:
====================
* One to get the timeout special parameter for the SET target working again
(the regression was introduced while trying to fix another bug in 3.4) from
Jozsef Kadlecsik.
* One crash fix for when containers and nf_conntrack are used, reported by
Hans Schillstrom, from myself.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso [Thu, 5 Jul 2012 13:42:10 +0000 (15:42 +0200)]
netfilter: nf_ct_ecache: fix crash with multiple containers, one shutting down
Hans reports that he's still hitting:
BUG: unable to handle kernel NULL pointer dereference at 000000000000027c
IP: [<ffffffff813615db>] netlink_has_listeners+0xb/0x60
PGD 0
Oops: 0000 [#3] PREEMPT SMP
CPU 0
It happens when adding a number of containers that do:
nfct_query(h, NFCT_Q_CREATE, ct);
and most likely one namespace shuts down.
This problem was supposed to be fixed by:
70e9942 netfilter: nf_conntrack: make event callback registration per-netns
Still, it was missing one rcu_access_pointer to check whether the callback
is set or not.
Reported-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jozsef Kadlecsik [Fri, 29 Jun 2012 09:42:28 +0000 (09:42 +0000)]
netfilter: ipset: timeout fixing bug broke SET target special timeout value
The patch "127f559 netfilter: ipset: fix timeout value overflow bug"
broke the SET target when no timeout was specified.
Reported-by: Jean-Philippe Menil <jean-philippe.menil@univ-nantes.fr>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Michael Chan [Thu, 5 Jul 2012 14:21:55 +0000 (14:21 +0000)]
cnic: Don't use netdev->base_addr
commit c0357e975afdbbedab5c662d19bef865f02adc17
bnx2: stop using net_device.{base_addr, irq}.
removed netdev->base_addr so we need to update cnic to get the MMIO
base address from pci_resource_start(). Otherwise, mmap of the uio
device will fail.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Thu, 5 Jul 2012 01:13:33 +0000 (01:13 +0000)]
net: qmi_wwan: add ZTE MF60
Adding a device with limited QMI support. It does not support
normal QMI_WDS commands for connection management. Instead,
sending a QMI_CTL SET_INSTANCE_ID command is required to
enable the network interface:
01 0f 00 00 00 00 00 00 20 00 04 00 01 01 00 00
A number of QMI_DMS and QMI_NAS commands are also supported
for optional device management.
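The 16 bytes above can be picked apart with a short parser. The field layout used here (marker byte, little-endian length, service at offset 4, message ID at offset 8, TLV length at offset 10) is an assumption based on commonly described QMUX framing, not something stated in this commit; struct qmi_ctl_info and qmi_ctl_parse() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

struct qmi_ctl_info {
	uint16_t qmux_len;	/* length field: frame length minus the marker byte */
	uint8_t service;	/* 0x00 is assumed to mean QMI_CTL */
	uint16_t msg_id;
	uint16_t tlv_bytes;	/* total size of the TLVs that follow */
};

static uint16_t le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns 0 on success, -1 if the buffer does not match the assumed
 * QMI_CTL frame layout. */
static int qmi_ctl_parse(const uint8_t *buf, size_t len,
			 struct qmi_ctl_info *out)
{
	if (len < 12 || buf[0] != 0x01)
		return -1;
	out->qmux_len = le16(buf + 1);
	if ((size_t)out->qmux_len + 1 != len)
		return -1;
	out->service = buf[4];
	if (out->service != 0x00)
		return -1;
	out->msg_id = le16(buf + 8);
	out->tlv_bytes = le16(buf + 10);
	if ((size_t)out->tlv_bytes + 12 != len)
		return -1;
	return 0;
}
```

Under these assumptions the frame above decodes to a length field of 15, message ID 0x0020 and 4 bytes of TLV data (type 0x01, length 1, value 0x00).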
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gao feng [Wed, 4 Jul 2012 23:28:40 +0000 (23:28 +0000)]
cgroup: fix panic in netprio_cgroup
We set max_prioidx to the index of the first zero bit of prioidx_map in
get_prioidx().
So when we delete a low-index netprio cgroup and then add a new
netprio cgroup, max_prioidx is set to that low index.
When we then set net_prio.ifpriomap for a high-index cgroup,
write_priomap() calls update_netdev_tables() to allocate memory of
size sizeof(struct netprio_map) + sizeof(u32) * (max_prioidx + 1),
so the array that map->priomap points to has max_prioidx + 1 entries,
which is fewer than we actually need.
Fix this by adding a check in get_prioidx(): only set max_prioidx when
it is lower than the new prioidx.
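The effect of the check can be shown with a small model. note_prioidx() and priomap_len() are hypothetical helpers, not the kernel functions; only the "never lower the maximum" rule comes from the text.

```c
static unsigned int max_prioidx;	/* highest index handed out so far */

/* Record a newly allocated index; only ever raise the maximum. */
static void note_prioidx(unsigned int prioidx)
{
	if (max_prioidx < prioidx)
		max_prioidx = prioidx;
}

/* Entries a per-device table needs so that every index is in bounds. */
static unsigned int priomap_len(void)
{
	return max_prioidx + 1;
}
```

Allocating indexes 1, 2 and 3, freeing 1 and then reusing it leaves max_prioidx at 3, so the table still has room for index 3. Without the comparison, the reuse would drop max_prioidx to 1 and the table would be too short.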
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Wed, 4 Jul 2012 22:27:18 +0000 (22:27 +0000)]
small cleanup in ax25_addr_parse()
The comments were wrong here because "AX25_MAX_DIGIS" is 8 but the
comments say 6. Also I've changed the "7" to "AX25_ADDR_LEN".
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Li RongQing [Wed, 4 Jul 2012 16:05:42 +0000 (16:05 +0000)]
be2net: Fix Endian
ETH_P_IP is host endian and skb->protocol is big endian. When
comparing them, we should convert ETH_P_IP from host endian
to big endian with htons(), not ntohs().
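A userspace sketch of the comparison. read_ethertype() and is_ipv4() are hypothetical helpers, and ETH_P_IP is defined locally with its usual value 0x0800.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETH_P_IP 0x0800	/* IPv4 EtherType, in host byte order */

/* Copy the two EtherType bytes as they appear on the wire, leaving them
 * in network (big endian) order. */
static uint16_t read_ethertype(const unsigned char *wire)
{
	uint16_t proto_be;

	memcpy(&proto_be, wire, sizeof(proto_be));
	return proto_be;
}

/* Convert the host-order constant to network order before comparing. */
static bool is_ipv4(uint16_t proto_be)
{
	return proto_be == htons(ETH_P_IP);
}
```

For 16-bit values htons() and ntohs() perform the same byte swap, so the generated code is typically unchanged; using htons() states the direction of the conversion correctly.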
CC: Somnath Kotur <somnath.kotur@emulex.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Somnath Kotur <somnath.kotur@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Daney [Wed, 4 Jul 2012 12:06:16 +0000 (12:06 +0000)]
netdev/phy: Fixup lockdep warnings in mdio-mux.c
With lockdep enabled we get:
=============================================
[ INFO: possible recursive locking detected ]
3.4.4-Cavium-Octeon+ #313 Not tainted
---------------------------------------------
kworker/u:1/36 is trying to acquire lock:
(&bus->mdio_lock){+.+...}, at: [<ffffffff813da7e8>] mdio_mux_read+0x38/0xa0
but task is already holding lock:
(&bus->mdio_lock){+.+...}, at: [<ffffffff813d79e4>] mdiobus_read+0x44/0x88
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&bus->mdio_lock);
lock(&bus->mdio_lock);
*** DEADLOCK ***
May be due to missing lock nesting notation
.
.
.
This is a false positive. Since we are indeed using 'nested' locking,
we need to use mutex_lock_nested().
Now in theory we can stack multiple MDIO multiplexers, but that would
require passing the nesting level (which is difficult to know) to
mutex_lock_nested(). Instead we assume the simple case of a single
level of nesting. Since these are only warning messages, it isn't so
important to solve the general case.
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>