Paolo Valente [Wed, 10 Jul 2013 13:46:08 +0000 (15:46 +0200)]
pkt_sched: sch_qfq: improve efficiency of make_eligible
In make_eligible, a mask is used to decide which groups must become eligible:
the i-th group becomes eligible only if the i-th bit of the mask (from the
right) is set. The mask is computed by left-shifting a 1 by a given number of
places, and decrementing the result. The shift is performed on a ULL to avoid
problems in case the number of places to shift is higher than 31. On a 32-bit
machine, this is more costly than working on a UL. This patch replaces such a
costly operation with two cheaper branches.
The trick is based on the following fact: in case of a shift of at least 32
places, the resulting mask has at least the 32 least significant bits set,
whereas the total number of groups is lower than 32. As a consequence, in this
case it is enough to just set the 32 least significant bits of the mask with a
cheaper ~0UL. In the other case, the shift can be safely performed on a UL.
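The mask computation described above can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the code from sch_qfq.c: the function names eligible_mask() and eligible_mask_ull() are hypothetical, and the sketch only shows that the branchy UL version agrees with the ULL shift on the 32 low bits that matter when there are fewer than 32 groups.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (name not taken from the kernel source): build the
 * eligibility mask for a given number of shift places without touching a
 * 64-bit type. If places exceeds 31, every group bit must be set anyway,
 * so ~0UL is enough; otherwise the shift is safe on an unsigned long.
 */
static unsigned long eligible_mask(unsigned int places)
{
    if (places > 31)
        return ~0UL;              /* all groups become eligible */
    return (1UL << places) - 1;   /* low `places` bits set */
}

/* Reference version using the wider (and, on 32-bit, costlier) ULL shift. */
static uint64_t eligible_mask_ull(unsigned int places)
{
    assert(places < 64);          /* shifting by 64 or more is undefined */
    return (1ULL << places) - 1;
}
```

Comparing the two helpers on their low 32 bits for every shift count from 0 to 63 is a quick way to convince oneself that the cheaper variant selects the same groups.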
Reported-by: David S. Miller <davem@davemloft.net>
Reported-by: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Thu, 11 Jul 2013 00:05:06 +0000 (17:05 -0700)]
gso: Update tunnel segmentation to support Tx checksum offload
This change makes it so that the GRE and VXLAN tunnels can make use of Tx
checksum offload support provided by some drivers via the hw_enc_features.
Without this fix enabling GSO means sacrificing Tx checksum offload and
this actually leads to a performance regression as shown below:
            Utilization
            Send
Throughput  local        GSO
10^6bits/s  % S          state
6276.51     8.39         enabled
7123.52     8.42         disabled
To resolve this it was necessary to address two items. First
netif_skb_features needed to be updated so that it would correctly handle
the Trans Ether Bridging protocol without impacting the need to check for
Q-in-Q tagging. To do this it was necessary to update harmonize_features
so that it used skb_network_protocol instead of just using the outer
protocol.
Second it was necessary to update the GRE and UDP tunnel segmentation
offloads so that they would reset the encapsulation bit and inner header
offsets after the offload was complete.
As a result of this change I have seen the following results on an interface
with Tx checksum enabled for encapsulated frames:
            Utilization
            Send
Throughput  local        GSO
10^6bits/s  % S          state
7123.52     8.42         disabled
8321.75     5.43         enabled
v2: Instead of replacing references to skb->protocol with
skb_network_protocol just replace the protocol reference in
harmonize_features to allow for double VLAN tag checks.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Camelia Groza [Thu, 11 Jul 2013 06:55:51 +0000 (09:55 +0300)]
inet: fix spacing in assignment
Found using checkpatch.pl
Signed-off-by: Camelia Groza <camelia.groza@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
dingtianhong [Thu, 11 Jul 2013 11:04:06 +0000 (19:04 +0800)]
ifb: fix oops when loading the ifb failed
If __rtnl_link_register() fails when loading ifb, the init code takes
the wrong path and oopses, so fix it just like dummy.
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
dingtianhong [Thu, 11 Jul 2013 11:04:02 +0000 (19:04 +0800)]
dummy: fix oops when loading the dummy failed
We rename the dummy in modprobe.conf like this:
install dummy0 /sbin/modprobe -o dummy0 --ignore-install dummy
install dummy1 /sbin/modprobe -o dummy1 --ignore-install dummy
We got an oops when we ran the commands:
modprobe dummy0
modprobe dummy1
------------[ cut here ]------------
[ 3302.187584] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 3302.195411] IP: [<ffffffff813fe62a>] __rtnl_link_unregister+0x9a/0xd0
[ 3302.201844] PGD 85c94a067 PUD 8517bd067 PMD 0
[ 3302.206305] Oops: 0002 [#1] SMP
[ 3302.299737] task: ffff88105ccea300 ti: ffff880eba4a0000 task.ti: ffff880eba4a0000
[ 3302.307186] RIP: 0010:[<ffffffff813fe62a>] [<ffffffff813fe62a>] __rtnl_link_unregister+0x9a/0xd0
[ 3302.316044] RSP: 0018:ffff880eba4a1dd8 EFLAGS: 00010246
[ 3302.321332] RAX: 0000000000000000 RBX: ffffffff81a9d738 RCX: 0000000000000002
[ 3302.328436] RDX: 0000000000000000 RSI: ffffffffa04d602c RDI: ffff880eba4a1dd8
[ 3302.335541] RBP: ffff880eba4a1e18 R08: dead000000200200 R09: dead000000100100
[ 3302.342644] R10: 0000000000000080 R11: 0000000000000003 R12: ffffffff81a9d788
[ 3302.349748] R13: ffffffffa04d7020 R14: ffffffff81a9d670 R15: ffff880eba4a1dd8
[ 3302.364910] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3302.370630] CR2: 0000000000000008 CR3: 000000085e15e000 CR4: 00000000000427e0
[ 3302.377734] DR0: 0000000000000003 DR1: 00000000000000b0 DR2: 0000000000000001
[ 3302.384838] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 3302.391940] Stack:
[ 3302.393944] ffff880eba4a1dd8 ffff880eba4a1dd8 ffff880eba4a1e18 ffffffffa04d70c0
[ 3302.401350] 00000000ffffffef ffffffffa01a8000 0000000000000000 ffffffff816111c8
[ 3302.408758] ffff880eba4a1e48 ffffffffa01a80be ffff880eba4a1e48 ffffffffa04d70c0
[ 3302.416164] Call Trace:
[ 3302.418605] [<ffffffffa01a8000>] ? 0xffffffffa01a7fff
[ 3302.423727] [<ffffffffa01a80be>] dummy_init_module+0xbe/0x1000 [dummy0]
[ 3302.430405] [<ffffffffa01a8000>] ? 0xffffffffa01a7fff
[ 3302.435535] [<ffffffff81000322>] do_one_initcall+0x152/0x1b0
[ 3302.441263] [<ffffffff810ab24b>] do_init_module+0x7b/0x200
[ 3302.446824] [<ffffffff810ad3d2>] load_module+0x4e2/0x530
[ 3302.452215] [<ffffffff8127ae40>] ? ddebug_dyndbg_boot_param_cb+0x60/0x60
[ 3302.458979] [<ffffffff810ad5f1>] SyS_init_module+0xd1/0x130
[ 3302.464627] [<ffffffff814b9652>] system_call_fastpath+0x16/0x1b
[ 3302.490090] RIP [<ffffffff813fe62a>] __rtnl_link_unregister+0x9a/0xd0
[ 3302.496607] RSP <ffff880eba4a1dd8>
[ 3302.500084] CR2: 0000000000000008
[ 3302.503466] ---[ end trace 8342d49cd49f78ed ]---
The reason is that when loading dummy, if __rtnl_link_register() fails,
init_module should return immediately and avoid taking the wrong path.
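The error-path shape described here can be sketched in userspace C. This is a hedged illustration, not the dummy driver source: demo_link_register(), demo_link_unregister(), demo_init_one() and the counters are hypothetical stand-ins, and the only point shown is that a failed registration skips both device creation and the unregister call.

```c
#include <assert.h>

/* Hypothetical stand-ins for the real registration calls (illustration only). */
static int demo_register_result;   /* value the fake register call returns */
static int demo_unregister_calls;  /* how often the undo step ran */
static int demo_devices_created;   /* how many devices were set up */

static int demo_link_register(void) { return demo_register_result; }
static void demo_link_unregister(void) { demo_unregister_calls++; }
static int demo_init_one(void) { demo_devices_created++; return 0; }

/* Module-init shape: bail out early when registration itself failed. */
static int demo_init_module(int numdevices)
{
    int err;
    int i;

    assert(numdevices >= 0);

    err = demo_link_register();
    if (err < 0)
        goto out;                  /* nothing was registered, nothing to undo */

    for (i = 0; i < numdevices && !err; i++)
        err = demo_init_one();
    if (err < 0)
        demo_link_unregister();    /* undo only a registration that succeeded */
out:
    return err;
}
```

With the early exit in place, a negative return from the register stand-in never reaches the unregister stand-in, which is the path the trace above shows faulting in __rtnl_link_unregister.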
Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Helmut Schaa [Thu, 11 Jul 2013 11:57:34 +0000 (13:57 +0200)]
drivers: net: phy: at803x: Add missing mdio device id
at803x supports Atheros 8030, 8031 and 8035 PHYs. 8031 was missing from
the mdio device id table.
Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hannes Frederic Sowa [Thu, 11 Jul 2013 10:43:42 +0000 (12:43 +0200)]
ipv6: fix route selection if kernel is not compiled with CONFIG_IPV6_ROUTER_PREF
This is a follow-up patch to 3630d40067a21d4dfbadc6002bb469ce26ac5d52
("ipv6: rt6_check_neigh should successfully verify neigh if no NUD
information are available").
Since the removal of rt->n in rt6_info we can end up with a dst ==
NULL in rt6_check_neigh. In case the kernel is not compiled with
CONFIG_IPV6_ROUTER_PREF we should also select a route with unknown
NUD state but we must not avoid doing round robin selection on routes
with the same target. So introduce and pass down a boolean ``do_rr'' to
indicate when we should update rt->rr_ptr. As soon as no route is valid
we do backtracking and do a lookup on a higher level in the fib trie.
v2:
a) Improved rt6_check_neigh logic (no need to create neighbour there)
and documented return values.
v3:
a) Introduce enum rt6_nud_state to get rid of the magic numbers
(thanks to David Miller).
b) Update and shorten commit message a bit to actually reflect
the source.
Reported-by: Pierre Emeriaud <petrus.lt@gmail.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Kravkov [Thu, 11 Jul 2013 12:48:21 +0000 (15:48 +0300)]
bnx2x: fix tunneling CSUM calculation
Since commit c957d09ffda417f6c8e3d1f10e2b05228607d6d7
"bnx2x: Remove sparse and coccinelle warnings"
the driver has provided a wrong partial csum to the HW in tunneling
scenarios.
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Alexander Duyck <alexander.h.duyck@intel.com>
CC: Pravin Shelar <pshelar@nicira.com>
CC: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Tested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Maarten Lankhorst [Thu, 11 Jul 2013 13:53:21 +0000 (15:53 +0200)]
alx: fix lockdep annotation
Move spin_lock_init to be called before the spinlocks are used, preventing a lockdep splat.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pravin B Shelar [Thu, 11 Jul 2013 18:38:06 +0000 (11:38 -0700)]
vxlan: Fix kernel crash on rmmod.
The vxlan module exit path unregisters the vxlan net and then unregisters
the rtnl ops, which triggers vxlan_dellink() from __rtnl_kill_links().
vxlan_dellink() deletes the vxlan-dev from vxlan_list, whose
list-head lives in the vxlan-net-struct, but that is already gone due to
net-unregister. That is how we get the following crash.
This commit fixes the crash by fixing the module exit path.
BUG: unable to handle kernel paging request at ffff8804102c8000
IP: [<ffffffff812cc5e9>] __list_del_entry+0x29/0xd0
PGD 2972067 PUD 83e019067 PMD 83df97067 PTE 80000004102c8060
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
Modules linked in: ---
CPU: 19 PID: 6712 Comm: rmmod Tainted: GF 3.10.0+ #95
Hardware name: Dell Inc. PowerEdge R620/0KCKR5, BIOS 1.4.8 10/25/2012
task: ffff88080c47c580 ti: ffff88080ac50000 task.ti: ffff88080ac50000
RIP: 0010:[<ffffffff812cc5e9>] [<ffffffff812cc5e9>] __list_del_entry+0x29/0xd0
RSP: 0018:ffff88080ac51e08 EFLAGS: 00010206
RAX: ffff8804102c8000 RBX: ffff88040f0d4b10 RCX: dead000000200200
RDX: ffff8804102c8000 RSI: ffff88080ac51e58 RDI: ffff88040f0d4b10
RBP: ffff88080ac51e08 R08: 0000000000000001 R09: 2222222222222222
R10: 2222222222222222 R11: 2222222222222222 R12: ffff88080ac51e58
R13: ffffffffa07b8840 R14: ffffffff81ae48c0 R15: ffff88080ac51e58
FS: 00007f9ef105c700(0000) GS:ffff88082a800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff8804102c8000 CR3: 00000008227e5000 CR4: 00000000000407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
ffff88080ac51e28 ffffffff812cc6a1 2222222222222222 ffff88040f0d4000
ffff88080ac51e48 ffffffffa07b3311 ffff88040f0d4000 ffffffff81ae49c8
ffff88080ac51e98 ffffffff81492fc2 ffff88080ac51e58 ffff88080ac51e58
Call Trace:
[<ffffffff812cc6a1>] list_del+0x11/0x40
[<ffffffffa07b3311>] vxlan_dellink+0x51/0x70 [vxlan]
[<ffffffff81492fc2>] __rtnl_link_unregister+0xa2/0xb0
[<ffffffff8149448e>] rtnl_link_unregister+0x1e/0x30
[<ffffffffa07b7b7c>] vxlan_cleanup_module+0x1c/0x2f [vxlan]
[<ffffffff810c9b31>] SyS_delete_module+0x1d1/0x2c0
[<ffffffff812b8a0e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff81582f42>] system_call_fastpath+0x16/0x1b
Code: eb 9f 55 48 8b 17 48 b9 00 01 10 00 00 00 ad de 48 8b 47 08 48 89
e5 48 39 ca 74 29 48 b9 00 02 20 00 00 00 ad de 48 39 c8 74 7a <4c> 8b
00 4c 39 c7 75 53 4c 8b 42 08 4c 39 c7 75 2b 48 89 42 08
RIP [<ffffffff812cc5e9>] __list_del_entry+0x29/0xd0
RSP <ffff88080ac51e08>
CR2: ffff8804102c8000
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sasha Levin [Thu, 11 Jul 2013 17:16:54 +0000 (13:16 -0400)]
9p: fix off by one causing access violations and memory corruption
p9_release_pages() would attempt to dereference one value past the end of
pages[]. This would cause the following crashes:
[ 6293.171817] BUG: unable to handle kernel paging request at ffff8807c96f3000
[ 6293.174146] IP: [<ffffffff8412793b>] p9_release_pages+0x3b/0x60
[ 6293.176447] PGD 79c5067 PUD 82c1e3067 PMD 82c197067 PTE 80000007c96f3060
[ 6293.180060] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 6293.180060] Modules linked in:
[ 6293.180060] CPU: 62 PID: 174043 Comm: modprobe Tainted: G W 3.10.0-next-20130710-sasha #3954
[ 6293.180060] task: ffff8807b803b000 ti: ffff880787dde000 task.ti: ffff880787dde000
[ 6293.180060] RIP: 0010:[<ffffffff8412793b>] [<ffffffff8412793b>] p9_release_pages+0x3b/0x60
[ 6293.214316] RSP: 0000:ffff880787ddfc28 EFLAGS: 00010202
[ 6293.214316] RAX: 0000000000000001 RBX: ffff8807c96f2ff8 RCX: 0000000000000000
[ 6293.222017] RDX: ffff8807b803b000 RSI: 0000000000000001 RDI: ffffea001c7e3d40
[ 6293.222017] RBP: ffff880787ddfc48 R08: 0000000000000000 R09: 0000000000000000
[ 6293.222017] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000001
[ 6293.222017] R13: 0000000000000001 R14: ffff8807cc50c070 R15: ffff8807cc50c070
[ 6293.222017] FS: 00007f572641d700(0000) GS:ffff8807f3600000(0000) knlGS:0000000000000000
[ 6293.256784] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 6293.256784] CR2: ffff8807c96f3000 CR3: 00000007c8e81000 CR4: 00000000000006e0
[ 6293.256784] Stack:
[ 6293.256784] ffff880787ddfcc8 ffff880787ddfcc8 0000000000000000 ffff880787ddfcc8
[ 6293.256784] ffff880787ddfd48 ffffffff84128be8 ffff880700000002 0000000000000001
[ 6293.256784] ffff8807b803b000 ffff880787ddfce0 0000100000000000 0000000000000000
[ 6293.256784] Call Trace:
[ 6293.256784] [<ffffffff84128be8>] p9_virtio_zc_request+0x598/0x630
[ 6293.256784] [<ffffffff8115c610>] ? wake_up_bit+0x40/0x40
[ 6293.256784] [<ffffffff841209b1>] p9_client_zc_rpc+0x111/0x3a0
[ 6293.256784] [<ffffffff81174b78>] ? sched_clock_cpu+0x108/0x120
[ 6293.256784] [<ffffffff84122a21>] p9_client_read+0xe1/0x2c0
[ 6293.256784] [<ffffffff81708a90>] v9fs_file_read+0x90/0xc0
[ 6293.256784] [<ffffffff812bd073>] vfs_read+0xc3/0x130
[ 6293.256784] [<ffffffff811a78bd>] ? trace_hardirqs_on+0xd/0x10
[ 6293.256784] [<ffffffff812bd5a2>] SyS_read+0x62/0xa0
[ 6293.256784] [<ffffffff841a1a00>] tracesys+0xdd/0xe2
[ 6293.256784] Code: 66 90 48 89 fb 41 89 f5 48 8b 3f 48 85 ff 74 29 85 f6 74 25 45 31 e4 66 0f 1f 84 00 00 00 00 00 e8 eb 14 12 fd 41 ff c4 49 63 c4 <48> 8b 3c c3 48 85 ff 74 05 45 39 e5 75 e7 48 83 c4 08 5b 41 5c
[ 6293.256784] RIP [<ffffffff8412793b>] p9_release_pages+0x3b/0x60
[ 6293.256784] RSP <ffff880787ddfc28>
[ 6293.256784] CR2: ffff8807c96f3000
[ 6293.256784] ---[ end trace 50822ee72cd360fc ]---
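The class of bug described above, reading pages[i] one slot past the end of the array, can be avoided by checking the index against the count before every dereference. The following is a minimal userspace sketch of that bounded-loop shape. It is an assumption-laden illustration, not the 9p source: struct demo_page, demo_put_page() and release_pages_bounded() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted page (illustration only). */
struct demo_page {
    int refs;
};

static void demo_put_page(struct demo_page *p)
{
    p->refs--;
}

/*
 * Release up to nr_pages entries. The index is compared against nr_pages
 * before pages[i] is read, so the slot just past the array is never
 * touched. A loop that loads pages[i] first and checks the count second
 * would read one element too far on its final iteration.
 */
static void release_pages_bounded(struct demo_page **pages, int nr_pages)
{
    int i;

    assert(nr_pages >= 0);
    for (i = 0; i < nr_pages; i++)
        if (pages[i] != NULL)
            demo_put_page(pages[i]);
}
```

In the trace above the faulting address in CR2 is exactly RBX + 8 with RAX equal to 1, which matches a read of the element immediately after a one-entry array that ends on a page boundary.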
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geert Uytterhoeven [Wed, 10 Jul 2013 21:03:34 +0000 (23:03 +0200)]
sh_eth: SH_ETH should depend on HAS_DMA
If NO_DMA=y:
drivers/built-in.o: In function `sh_eth_free_dma_buffer':
drivers/net/ethernet/renesas/sh_eth.c:1103: undefined reference to `dma_free_coherent'
drivers/net/ethernet/renesas/sh_eth.c:1110: undefined reference to `dma_free_coherent'
drivers/built-in.o: In function `sh_eth_ring_init':
drivers/net/ethernet/renesas/sh_eth.c:1065: undefined reference to `dma_alloc_coherent'
drivers/net/ethernet/renesas/sh_eth.c:1086: undefined reference to `dma_free_coherent'
drivers/built-in.o: In function `sh_eth_ring_format':
drivers/net/ethernet/renesas/sh_eth.c:988: undefined reference to `dma_map_single'
drivers/built-in.o: In function `sh_eth_txfree':
drivers/net/ethernet/renesas/sh_eth.c:1220: undefined reference to `dma_unmap_single'
drivers/built-in.o: In function `sh_eth_rx':
drivers/net/ethernet/renesas/sh_eth.c:1323: undefined reference to `dma_map_single'
drivers/built-in.o: In function `sh_eth_start_xmit':
drivers/net/ethernet/renesas/sh_eth.c:1954: undefined reference to `dma_map_single'
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hannes Frederic Sowa [Wed, 10 Jul 2013 21:00:57 +0000 (23:00 +0200)]
ipv6: in case of link failure remove route directly instead of letting it expire
We could end up expiring a route which is part of an ecmp route set. Doing
so would invalidate the rt->rt6i_nsiblings calculations and could provoke
the following panic:
[ 80.144667] ------------[ cut here ]------------
[ 80.145172] kernel BUG at net/ipv6/ip6_fib.c:733!
[ 80.145172] invalid opcode: 0000 [#1] SMP
[ 80.145172] Modules linked in: 8021q nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables
+snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc snd_timer virtio_balloon snd soundcore i2c_piix4 i2c_core virtio_net virtio_blk
[ 80.145172] CPU: 1 PID: 786 Comm: ping6 Not tainted 3.10.0+ #118
[ 80.145172] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 80.145172] task: ffff880117fa0000 ti: ffff880118770000 task.ti: ffff880118770000
[ 80.145172] RIP: 0010:[<ffffffff815f3b5d>] [<ffffffff815f3b5d>] fib6_add+0x75d/0x830
[ 80.145172] RSP: 0018:ffff880118771798 EFLAGS: 00010202
[ 80.145172] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88011350e480
[ 80.145172] RDX: ffff88011350e238 RSI: 0000000000000004 RDI: ffff88011350f738
[ 80.145172] RBP: ffff880118771848 R08: ffff880117903280 R09: 0000000000000001
[ 80.145172] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88011350f680
[ 80.145172] R13: ffff880117903280 R14: ffff880118771890 R15: ffff88011350ef90
[ 80.145172] FS: 00007f02b5127740(0000) GS:ffff88011fd00000(0000) knlGS:0000000000000000
[ 80.145172] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 80.145172] CR2: 00007f981322a000 CR3: 00000001181b1000 CR4: 00000000000006e0
[ 80.145172] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 80.145172] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 80.145172] Stack:
[ 80.145172] 0000000000000001 ffff880100000000 ffff880100000000 ffff880117903280
[ 80.145172] 0000000000000000 ffff880119a4cf00 0000000000000400 00000000000007fa
[ 80.145172] 0000000000000000 0000000000000000 0000000000000000 ffff88011350f680
[ 80.145172] Call Trace:
[ 80.145172] [<ffffffff815eeceb>] ? rt6_bind_peer+0x4b/0x90
[ 80.145172] [<ffffffff815ed985>] __ip6_ins_rt+0x45/0x70
[ 80.145172] [<ffffffff815eee35>] ip6_ins_rt+0x35/0x40
[ 80.145172] [<ffffffff815ef1e4>] ip6_pol_route.isra.44+0x3a4/0x4b0
[ 80.145172] [<ffffffff815ef34a>] ip6_pol_route_output+0x2a/0x30
[ 80.145172] [<ffffffff81616077>] fib6_rule_action+0xd7/0x210
[ 80.145172] [<ffffffff815ef320>] ? ip6_pol_route_input+0x30/0x30
[ 80.145172] [<ffffffff81553026>] fib_rules_lookup+0xc6/0x140
[ 80.145172] [<ffffffff81616374>] fib6_rule_lookup+0x44/0x80
[ 80.145172] [<ffffffff815ef320>] ? ip6_pol_route_input+0x30/0x30
[ 80.145172] [<ffffffff815edea3>] ip6_route_output+0x73/0xb0
[ 80.145172] [<ffffffff815dfdf3>] ip6_dst_lookup_tail+0x2c3/0x2e0
[ 80.145172] [<ffffffff813007b1>] ? list_del+0x11/0x40
[ 80.145172] [<ffffffff81082a4c>] ? remove_wait_queue+0x3c/0x50
[ 80.145172] [<ffffffff815dfe4d>] ip6_dst_lookup_flow+0x3d/0xa0
[ 80.145172] [<ffffffff815fda77>] rawv6_sendmsg+0x267/0xc20
[ 80.145172] [<ffffffff815a8a83>] inet_sendmsg+0x63/0xb0
[ 80.145172] [<ffffffff8128eb93>] ? selinux_socket_sendmsg+0x23/0x30
[ 80.145172] [<ffffffff815218d6>] sock_sendmsg+0xa6/0xd0
[ 80.145172] [<ffffffff81524a68>] SYSC_sendto+0x128/0x180
[ 80.145172] [<ffffffff8109825c>] ? update_curr+0xec/0x170
[ 80.145172] [<ffffffff81041d09>] ? kvm_clock_get_cycles+0x9/0x10
[ 80.145172] [<ffffffff810afd1e>] ? __getnstimeofday+0x3e/0xd0
[ 80.145172] [<ffffffff8152509e>] SyS_sendto+0xe/0x10
[ 80.145172] [<ffffffff8164efd9>] system_call_fastpath+0x16/0x1b
[ 80.145172] Code: fe ff ff 41 f6 45 2a 06 0f 85 ca fe ff ff 49 8b 7e 08 4c 89 ee e8 94 ef ff ff e9 b9 fe ff ff 48 8b 82 28 05 00 00 e9 01 ff ff ff <0f> 0b 49 8b 54 24 30 0d 00 00 40 00 89 83 14 01 00 00 48 89 53
[ 80.145172] RIP [<ffffffff815f3b5d>] fib6_add+0x75d/0x830
[ 80.145172] RSP <ffff880118771798>
[ 80.387413] ---[ end trace 02f20b7a8b81ed95 ]---
[ 80.390154] Kernel panic - not syncing: Fatal exception in interrupt
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Wed, 10 Jul 2013 05:43:28 +0000 (13:43 +0800)]
macvtap: correctly linearize skb when zerocopy is used
Userspace may produce vectors greater than MAX_SKB_FRAGS. When we try to
linearize parts of the skb to let the rest of the iov fit in
the frags, we need to count copylen into linear when calling macvtap_alloc_skb()
instead of partly counting it into data_len, since the latter breaks
zerocopy_sg_from_iovec(): its inner counter assumes nr_frags is
zero at the beginning. This causes nr_frags to be increased wrongly without
setting the correct frags.
This bug was introduced by b92946e2919134ebe2a4083e4302236295ea2a73
(macvtap: zerocopy: validate vectors before building skb).
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Wed, 10 Jul 2013 05:43:27 +0000 (13:43 +0800)]
tuntap: correctly linearize skb when zerocopy is used
Userspace may produce vectors greater than MAX_SKB_FRAGS. When we try to
linearize parts of the skb to let the rest of the iov fit in
the frags, we need to count copylen into linear when calling tun_alloc_skb()
instead of partly counting it into data_len, since the latter breaks
zerocopy_sg_from_iovec(): its inner counter assumes nr_frags is
zero at the beginning. This causes nr_frags to be increased wrongly without
setting the correct frags.
This bug was introduced by 0690899b4d4501b3505be069b9a687e68ccbe15b
(tun: experimental zero copy tx support)
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
dingtianhong [Wed, 10 Jul 2013 04:04:02 +0000 (12:04 +0800)]
ifb: fix rcu_sched self-detected stalls
In commit 16b0dc29c1af9df341428f4c49ada4f626258082
(dummy: fix rcu_sched self-detected stalls)
Eric Dumazet fixed the problem in dummy, but ifb has the
same problem as the dummy module.
Trying to "modprobe ifb numifbs=30000" triggers:
INFO: rcu_sched self-detected stall on CPU
After this splat, RTNL is locked and reboot is needed.
We must call cond_resched() to avoid this, even holding RTNL.
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
hayeswang [Mon, 8 Jul 2013 09:09:01 +0000 (17:09 +0800)]
r8169: add a new chip for RTL8411
Add a new chip for RTL8411 series.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 11 Jul 2013 00:10:40 +0000 (17:10 -0700)]
Merge "net: finish renaming lls to busy poll"
Eliezer Tamir says:
====================
Here are three patches that complete the rename of lls to busy-poll
1. rename include/net/ll_poll.h to include/net/busy_poll.h
2. Rename ndo_ll_poll to ndo_busy_poll.
Rename sk_mark_ll to sk_mark_napi_id.
Rename skb_mark_ll to skb_mark_napi_id.
Correct all users of these functions.
Update comments and defines in include/net/busy_poll.h
3. Rename LL_SO to BUSY_POLL_SO
Rename sysctl_net_ll_{read,poll} to sysctl_busy_{read,poll}
Fix up users of these variables.
Fix documentation for sysctl.
v2 fixed forgetting the ndo changes in v1
v3 is a resend with -M
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eliezer Tamir [Wed, 10 Jul 2013 14:13:36 +0000 (17:13 +0300)]
net: rename busy poll socket op and globals
Rename LL_SO to BUSY_POLL_SO
Rename sysctl_net_ll_{read,poll} to sysctl_busy_{read,poll}
Fix up users of these variables.
Fix documentation for sysctl.
A patch for the socket.7 man page will follow separately,
because of limitations of my mail setup.
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eliezer Tamir [Wed, 10 Jul 2013 14:13:26 +0000 (17:13 +0300)]
net: rename ll methods to busy-poll
Rename ndo_ll_poll to ndo_busy_poll.
Rename sk_mark_ll to sk_mark_napi_id.
Rename skb_mark_ll to skb_mark_napi_id.
Correct all users of these functions.
Update comments and defines in include/net/busy_poll.h
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eliezer Tamir [Wed, 10 Jul 2013 14:13:17 +0000 (17:13 +0300)]
net: rename include/net/ll_poll.h to include/net/busy_poll.h
Rename the file and correct all the places where it is included.
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 10 Jul 2013 18:16:00 +0000 (11:16 -0700)]
Merge tag 'mmc-updates-for-3.11-rc1' of git://git./linux/kernel/git/cjb/mmc
Pull MMC updates from Chris Ball:
"MMC highlights for 3.11:
Core:
- Add support for eMMC 5.1 devices
- Add MMC_CAP_AGGRESSIVE_PM capability for aggressive power
management of eMMC/SD between requests, using runtime PM
- Add an ioctl to perform the eMMC 4.5 Sanitize command. Sample code
at:
git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc-utils.git
Drivers:
- dw_mmc: Add support for Rockchip's Cortex-A9 SoCs
- dw_mmc: Add support for Altera SoCFPGAs
- sdhci-esdhc-imx: Add support for 8-bit bus width, non-removable
cards
- sdhci-bcm-kona: New driver for Broadcom Kona (281xx) SoCs
- sdhi/tmio: Add DT DMA support"
* tag 'mmc-updates-for-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc: (87 commits)
mmc: bcm281xx SDHCI driver
mmc: sdhci: add card_event callback to sdhci
mmc: core: Fixup Oops for SDIO shutdown
mmc: sdhci-pci: add another device id
mmc: esdhc: Fix bug when writing to SDHCI_HOST_CONTROL register
mmc: esdhc: Add support for 8-bit bus width and non-removable card
mmc: core: production year for eMMC 4.41 and later
mmc: omap: remove unnecessary #if 0's
mmc: sdhci: fix ctrl_2 on super-speed selection
mmc: dw_mmc-pltfm: add Rockchip variant
mmc: dw_mmc-pltfm: move probe and remove below dt match table
mmc: dw_mmc-pltfm: remove static from dw_mci_pltfm_remove
mmc: sdhci-acpi: add support for eMMC hardware reset for HID 80860F14
mmc: sdhci-pci: add support for eMMC hardware reset for BYT eMMC.
mmc: dw_mmc: Add support DW SD/MMC driver on SOCFPGA
mmc: sdhci: fix caps2 for HS200
sdhci-pxav3: Fix runtime PM initialization
mmc: core: Add DT-bindings for MMC_CAP2_FULL_PWR_CYCLE
mmc: core: Invent MMC_CAP2_FULL_PWR_CYCLE
mmc: core: Enable power_off_notify for eMMC shutdown sequence
...
Linus Torvalds [Wed, 10 Jul 2013 18:14:56 +0000 (11:14 -0700)]
Merge tag 'for-3.11-rc1' of git://gitorious.org/linux-pwm/linux-pwm
Pull pwm changes from Thierry Reding:
"A new driver supports driving PWM signals using the TPU unit found on
various Renesas SoCs. Furthermore support is added for the NXP
PCA9685 LED controller. Another big chunk is the sysfs interface
which has been in the works for quite some time.
The remaining patches are a random assortment of cleanups and fixes"
* tag 'for-3.11-rc1' of git://gitorious.org/linux-pwm/linux-pwm:
pwm: pwm-tiehrpwm: Use clk_enable/disable instead clk_prepare/unprepare.
pwm: pca9685: Fix wrong argument to set MODE1_SLEEP bit
pwm: renesas-tpu: Add MODULE_ALIAS to make module auto loading work
pwm: renesas-tpu: fix return value check in tpu_probe()
pwm: Add Renesas TPU PWM driver
pwm: Add sysfs interface
pwm: Fill in missing .owner fields
pwm: add pca9685 driver
pwm: atmel-tcb: prepare clk before calling enable
pwm: devm: alloc correct pointer size
pwm: mxs: Let device core handle pinctrl
MAINTAINERS: Update PWM subsystem entry
Linus Torvalds [Wed, 10 Jul 2013 18:13:00 +0000 (11:13 -0700)]
Merge tag 'for-v3.11' of git://git.infradead.org/battery-2.6
Pull battery subsystem update from Anton Vorontsov:
"Nothing exciting this time, just assorted fixes and cleanups"
* tag 'for-v3.11' of git://git.infradead.org/battery-2.6: (25 commits)
charger-manager: Fix regulator_get() return check
charger-manager: Fix a bug when it unregisters notifier block of extcon
tps65090-charger: Add dt node to power_supply
sbs-battery: Add dt to power_supply struct
power_supply: Add of_node_put to fix refcount
power_supply: Move of_node out of the #ifdef CONFIG_OF
power/reset: Make the vexpress driver optional on arm and arm64
charger-manager: Add missing newlines, fix a couple of typos, add pr_fmt
tps65090-charger: Fix AC detect
MAINTAINERS: Update email address for Anton Vorontsov
charger-manager: Ensure event is not used as format string
power_supply: Replace strict_strtoul() with kstrtoul()
generic-adc-battery: Fix checking if none of the channels are supported
power: Use platform_{get,set}_drvdata()
pm2301_charger: Return error if create_singlethread_workqueue fails
pm2301_charger: Fix NULL pointer dereference
lp8727_charger: Support the device tree feature
twl4030_charger: Remove unnecessary platform_set_drvdata()
rx51_battery: Remove unnecessary platform_set_drvdata()
jz4740-battery: Remove unnecessary platform_set_drvdata()
...
Linus Torvalds [Wed, 10 Jul 2013 18:10:27 +0000 (11:10 -0700)]
Merge tag 'mfd-3.11-1' of git://git./linux/kernel/git/sameo/mfd-next
Pull MFD update from Samuel Ortiz:
"For the 3.11 merge we only have one new MFD driver for the Kontron
PLD.
But we also have:
- Support for the TPS659038 PMIC from the palmas driver.
- Intel's Coleto Creek and Avoton SoCs support from the lpc_ich
driver.
- RTL8411B support from the rtsx driver.
- More DT support for the Arizona, max8998, twl4030-power and the
ti_am335x_tsadc drivers.
- The SSBI driver move under MFD.
- A conversion to the devm_* API for most of the MFD drivers.
- The twl4030-power got split from twl-core into its own module.
- A major ti_am335x_adc cleanup, leading to a proper DT support.
- Our regular arizona and wm* updates and cleanups from the Wolfson
folks.
- A better error handling and initialization, and a regulator
subdevice addition for the 88pm80x driver.
- A bulk platform_set_drvdata() call removal that's no longer needed
since commit 0998d0631001 ("device-core: Ensure drvdata = NULL when
no driver is bound")"
* tag 'mfd-3.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-next: (102 commits)
mfd: sec: Provide max_register to regmap
mfd: wm8994: Remove duplicate check for active JACKDET
MAINTAINERS: Add include directory to MFD file patterns
mfd: sec: Remove fields not used since regmap conversion
watchdog: Kontron PLD watchdog timer driver
mfd: max8998: Add support for Device Tree
regulator: max8998: Use arrays for specifying voltages in platform data
mfd: max8998: Add irq domain support
regulator: palmas: Add TPS659038 support
mfd: Kontron PLD mfd driver
mfd: palmas: Add TPS659038 PMIC support
mfd: palmas: Add SMPS10_BOOST feature
mfd: palmas: Check if irq is valid
mfd: lpc_ich: iTCO_wdt patch for Intel Coleto Creek DeviceIDs
mfd: twl-core: Change TWL6025 references to TWL6032
mfd: davinci_voicecodec: Fix build breakage
mfd: vexpress: Make the driver optional for arm and arm64
mfd: htc-egpio: Use devm_ioremap_nocache() instead of ioremap_nocache()
mfd: davinci_voicecodec: Convert to use devm_* APIs
mfd: twl4030-power: Fix relocking on error
...
Linus Torvalds [Wed, 10 Jul 2013 18:04:38 +0000 (11:04 -0700)]
Merge branch 'hwmon-for-linus' of git://git./linux/kernel/git/jdelvare/staging
Pull hwmon update from Jean Delvare.
* 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
hwmon: (lm63) Drop redundant safety on cache lifetime
hwmon: (lm90) Drop redundant safety on cache lifetime
Linus Torvalds [Wed, 10 Jul 2013 18:03:58 +0000 (11:03 -0700)]
Merge tag 'regulator-v3.11-2' of git://git./linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"Fixes for the merge window
A set of small fixes for issues noticed during the merge window, all
very much non-invasive"
* tag 'regulator-v3.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
MAINTAINERS: Update git repository
regulator: max8997: Fix a trivial typo in documentation
regulator: s5m8767: Fix a trivial typo in documentation
regulator: s2mps11: Convert ramp rate to uV/us and set default ramp rate
regulator: s5m8767: Update s5m8767-regulator bindings document
Linus Torvalds [Wed, 10 Jul 2013 18:02:58 +0000 (11:02 -0700)]
Merge tag 'firewire-updates' of git://git./linux/kernel/git/ieee1394/linux1394
Pull firewire updates from Stefan Richter:
"Make struct ieee1394_device_id.driver_data actually available to 1394
protocol drivers. This is especially useful to 1394 audio drivers for
model-specific parameters and methods"
* tag 'firewire-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
firewire: remove support of fw_driver.driver.probe and .remove methods
firewire: introduce fw_driver.probe and .remove methods
Linus Torvalds [Wed, 10 Jul 2013 17:17:01 +0000 (10:17 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fix from Ingo Molnar:
"irq-tracing fixlet"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tracing: Add irq_enter/exit() in smp_trace_reschedule_interrupt()
Linus Torvalds [Wed, 10 Jul 2013 17:16:07 +0000 (10:16 -0700)]
Merge branch 'next' of git://git.monstr.eu/linux-2.6-microblaze
Pull microblaze update from Michal Simek:
"This Microblaze merge window is quite minimal.
I have also added to my branch one xilinx systemace sparse fix because
I haven't got any reply from the block maintainer."
* 'next' of git://git.monstr.eu/linux-2.6-microblaze:
xilinx systemace: Fix sparse warnings
microblaze: Move __NR_syscalls from uapi
microblaze: Enable KGDB in defconfig
microblaze: Don't mark arch_kgdb_ops as const.
Linus Torvalds [Wed, 10 Jul 2013 17:14:35 +0000 (10:14 -0700)]
Merge tag 'metag-fixes-for-v3.11-1' of git://git./linux/kernel/git/jhogan/metag
Pull arch/metag fixes from James Hogan:
"This is just a single fix to fix bad UDP checksums sometimes being
generated to IP addresses *.*.255.255"
* tag 'metag-fixes-for-v3.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag:
metag: checksum.h: fix carry in csum_tcpudp_nofold
Linus Torvalds [Wed, 10 Jul 2013 17:12:58 +0000 (10:12 -0700)]
Merge tag 'blackfin-for-linus' of git://git./linux/kernel/git/realmz6/blackfin-linux
Pull blackfin updates from Steven Miao:
"blackfin updates for Linux 3.11"
* tag 'blackfin-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/realmz6/blackfin-linux:
smp: refine bf561 smpboot code
bf609: stmmac: fix build after stmmac_mdio_bus_data changed
bf609: add cpu revision 0.1
bf609: rename bfin6xx_spi to bfin_spi3
kgdb: blackfin: include irq_regs.h in kgdb.c
Linus Torvalds [Wed, 10 Jul 2013 17:11:26 +0000 (10:11 -0700)]
Merge tag 'arc-v3.11-rc1-part2' of git://git./linux/kernel/git/vgupta/arc
Pull second set of ARC architecture updates from Vineet Gupta:
"Couple of Platform updates (Device Tree files primarily) given that
the corresponding drivers (net/ethernet/arc/*, irqctl/irq-tb10x.c)
have now been merged into your tree.
Ideally these shd have been part of same submissions, oh well..."
* tag 'arc-v3.11-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: [TB10x] Updates for irqchip driver
ARC: [plat-arcfpga] Enable arc_emac for ARCAngle4 Board
Linus Torvalds [Wed, 10 Jul 2013 17:10:02 +0000 (10:10 -0700)]
Merge branch 'parisc-for-3.11' of git://git./linux/kernel/git/deller/parisc-linux
Pull parisc updates from Helge Deller:
"The PA-RISC updates for v3.11 include a gcc miscompilation fix,
gzip-compressed vmlinuz support, a fix in the PCI code for ATI FireGL
support on c8000 machines, a fix to prevent %sr1 from being
clobbered and a few smaller optimizations and documentation updates"
* 'parisc-for-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Fix gcc miscompilation in pa_memcpy()
parisc: Ensure volatile space register %sr1 is not clobbered
parisc: optimize mtsp(0,sr) inline assembly
parisc: switch to gzip-compressed vmlinuz kernel
parisc: document the shadow registers
parisc: more capabilities info in /proc/cpuinfo
parisc: fix LMMIO mismatch between PAT length and MASK register
Linus Torvalds [Wed, 10 Jul 2013 17:09:04 +0000 (10:09 -0700)]
Merge tag 'please-pull-fix-ia64-warnings' of git://git./linux/kernel/git/aegl/linux
Pull ia64 warning fix from Tony Luck:
"Add some casts to avoid warnings from efi_runtime_services_t members"
* tag 'please-pull-fix-ia64-warnings' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
[IA64] sim: Add casts to avoid assignment warnings
Michal Simek [Thu, 7 Feb 2013 16:28:20 +0000 (17:28 +0100)]
xilinx systemace: Fix sparse warnings
Fix sysace sparse warnings.
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Michal Simek [Mon, 8 Jul 2013 07:50:24 +0000 (09:50 +0200)]
microblaze: Move __NR_syscalls from uapi
The reason is that other applications like strace
think that every __NR_xx is syscall.
Also __NR_syscalls is not used by user applications/libs.
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Linus Torvalds [Wed, 10 Jul 2013 01:24:39 +0000 (18:24 -0700)]
Merge git://git./linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
"This is a re-do of the net-next pull request for the current merge
window. The only difference from the one I made the other day is that
this has Eliezer's interface renames and the timeout handling changes
made based upon your feedback, as well as a few bug fixes that have
trickled in.
Highlights:
1) Low latency device polling, eliminating the cost of interrupt
handling and context switches. Allows direct polling of a network
device from socket operations, such as recvmsg() and poll().
Currently ixgbe, mlx4, and bnx2x support this feature.
Full high level description, performance numbers, and design in
commit 0a4db187a999 ("Merge branch 'll_poll'")
From Eliezer Tamir.
2) With the routing cache removed, ip_check_mc_rcu() gets exercised
more than ever before in the case where we have lots of multicast
addresses. Use a hash table instead of a simple linked list, from
Eric Dumazet.
3) Add driver for Atheros CQA98xx 802.11ac wireless devices, from
Bartosz Markowski, Janusz Dziedzic, Kalle Valo, Marek Kwaczynski,
Marek Puzyniak, Michal Kazior, and Sujith Manoharan.
4) Support reporting the TUN device persist flag to userspace, from
Pavel Emelyanov.
5) Allow controlling network device VF link state using netlink, from
Rony Efraim.
6) Support GRE tunneling in openvswitch, from Pravin B Shelar.
7) Adjust SOCK_MIN_RCVBUF and SOCK_MIN_SNDBUF for modern times, from
Daniel Borkmann and Eric Dumazet.
8) Allow controlling of TCP quickack behavior on a per-route basis,
from Cong Wang.
9) Several bug fixes and improvements to vxlan from Stephen
Hemminger, Pravin B Shelar, and Mike Rapoport. In particular,
support receiving on multiple UDP ports.
10) Major cleanups, particularly in the area of debugging and cookie
lifetime handling, to the SCTP protocol code. From Daniel
Borkmann.
11) Allow packets to cross network namespaces when traversing tunnel
devices. From Nicolas Dichtel.
12) Allow monitoring netlink traffic via AF_PACKET sockets, in a
manner akin to how we monitor real network traffic via ptype_all.
From Daniel Borkmann.
13) Several bug fixes and improvements for the new alx device driver,
from Johannes Berg.
14) Fix scalability issues in the netem packet scheduler's time queue,
by using an rbtree. From Eric Dumazet.
15) Several bug fixes in TCP loss recovery handling, from Yuchung
Cheng.
16) Add support for GSO segmentation of MPLS packets, from Simon
Horman.
17) Make network notifiers have a real data type for the opaque
pointer that's passed into them. Use this to properly handle
network device flag changes in arp_netdev_event(). From Jiri
Pirko and Timo Teräs.
18) Convert several drivers over to module_pci_driver(), from Peter
Huewe.
19) tcp_fixup_rcvbuf() can loop 500 times over loopback, just use a
O(1) calculation instead. From Eric Dumazet.
20) Support setting of explicit tunnel peer addresses in ipv6, just
like ipv4. From Nicolas Dichtel.
21) Protect x86 BPF JIT against spraying attacks, from Eric Dumazet.
22) Prevent a single high rate flow from overrunning an individual cpu
during RX packet processing via selective flow shedding. From
Willem de Bruijn.
23) Don't use spinlocks in TCP md5 signing fast paths, from Eric
Dumazet.
24) Don't just drop GSO packets which are above the TBF scheduler's
burst limit, chop them up so they are in-bounds instead. Also
from Eric Dumazet.
25) VLAN offloads are missed when configured on top of a bridge, fix
from Vlad Yasevich.
26) Support IPV6 in ping sockets. From Lorenzo Colitti.
27) Receive flow steering targets should be updated at poll() time
too, from David Majnemer.
28) Fix several corner case regressions in PMTU/redirect handling due
to the routing cache removal, from Timo Teräs.
29) We have to be mindful of ipv4 mapped ipv6 sockets in
udp_v6_push_pending_frames(). From Hannes Frederic Sowa.
30) Fix L2TP sequence number handling bugs, from James Chapman."
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1214 commits)
drivers/net: caif: fix wrong rtnl_is_locked() usage
drivers/net: enic: release rtnl_lock on error-path
vhost-net: fix use-after-free in vhost_net_flush
net: mv643xx_eth: do not use port number as platform device id
net: sctp: confirm route during forward progress
virtio_net: fix race in RX VQ processing
virtio: support unlocked queue poll
net/cadence/macb: fix bug/typo in extracting gem_irq_read_clear bit
Documentation: Fix references to defunct linux-net@vger.kernel.org
net/fs: change busy poll time accounting
net: rename low latency sockets functions to busy poll
bridge: fix some kernel warning in multicast timer
sfc: Fix memory leak when discarding scattered packets
sit: fix tunnel update via netlink
dt:net:stmmac: Add dt specific phy reset callback support.
dt:net:stmmac: Add support to dwmac version 3.610 and 3.710
dt:net:stmmac: Allocate platform data only if its NULL.
net:stmmac: fix memleak in the open method
ipv6: rt6_check_neigh should successfully verify neigh if no NUD information are available
net: ipv6: fix wrong ping_v6_sendmsg return value
...
Linus Torvalds [Tue, 9 Jul 2013 23:04:31 +0000 (16:04 -0700)]
Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux
Pull drm updates from Dave Airlie:
"Okay this is the big one, I was stalled on the fbdev pull req as I
stupidly let fbdev guys merge a patch I required to fix a warning with
some patches I had, they ended up merging the patch from the wrong
place, but the warning should be fixed. In future I'll just take the
patch myself!
Outside drm:
There are some snd changes for the HDMI audio interactions on haswell,
they've been acked for inclusion via my tree. This relies on the
wound/wait tree from Ingo which is already merged.
Major changes:
AMD finally released the dynamic power management code for all their
GPUs from r600->present day, this is great, off by default for now but
also a huge amount of code, in fact it is most of this pull request.
Since it landed there has been a lot of community testing and Alex has
sent a lot of fixes for any bugs found so far. I suspect radeon might
now be the biggest kernel driver ever :-P p.s. radeon.dpm=1 to enable
dynamic powermanagement for anyone.
New drivers:
Renesas r-car display unit.
Other highlights:
- core: GEM CMA prime support, use new w/w mutexes for TTM
reservations, cursor hotspot, doc updates
- dvo chips: chrontel 7010B support
- i915: Haswell (fbc, ips, vecs, watermarks, audio powerwell),
Valleyview (enabled by default, rc6), lots of pll reworking, 30bpp
support (this time for sure)
- nouveau: async buffer object deletion, context/register init
updates, kernel vp2 engine support, GF117 support, GK110 accel
support (with external nvidia ucode), context cleanups.
- exynos: memory leak fixes, Add S3C64XX SoC series support, device
tree updates, common clock framework support,
- qxl: cursor hotspot support, multi-monitor support, suspend/resume
support
- mgag200: hw cursor support, g200 mode limiting
- shmobile: prime support
- tegra: fixes mostly
I've been banging on this quite a lot due to the size of it, and it
seems to be okay on everything I've tested it on."
* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (811 commits)
drm/radeon/dpm: implement vblank_too_short callback for si
drm/radeon/dpm: implement vblank_too_short callback for cayman
drm/radeon/dpm: implement vblank_too_short callback for btc
drm/radeon/dpm: implement vblank_too_short callback for evergreen
drm/radeon/dpm: implement vblank_too_short callback for 7xx
drm/radeon/dpm: add checks against vblank time
drm/radeon/dpm: add helper to calculate vblank time
drm/radeon: remove stray line in old pm code
drm/radeon/dpm: fix display_gap programming on rv7xx
drm/nvc0/gr: fix gpc firmware regression
drm/nouveau: fix minor thinko causing bo moves to not be async on kepler
drm/radeon/dpm: implement force performance level for TN
drm/radeon/dpm: implement force performance level for ON/LN
drm/radeon/dpm: implement force performance level for SI
drm/radeon/dpm: implement force performance level for cayman
drm/radeon/dpm: implement force performance levels for 7xx/eg/btc
drm/radeon/dpm: add infrastructure to force performance levels
drm/radeon: fix surface setup on r1xx
drm/radeon: add support for 3d perf states on older asics
drm/radeon: set default clocks for SI when DPM is disabled
...
Linus Torvalds [Tue, 9 Jul 2013 22:51:32 +0000 (15:51 -0700)]
Merge tag 'fbdev-for-3.11' of git://git./linux/kernel/git/plagnioj/linux-fbdev
Pull fbdev update from Jean-Christophe PLAGNIOL-VILLARD:
"Various fbdev changes for 3.11
- xilinxfb updates
- Small cleanups and fixes to multiple drivers
- OMAP display subsystem bug updates
- imxfb dt support"
* tag 'fbdev-for-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/plagnioj/linux-fbdev: (95 commits)
video: imxfb: Add DT support
video: i740fb: Make i740fb_init static
fb: make fp_get_options name argument const
video: mmp: fix graphics/video layer enable/mask swap issue
video: mmp: fix memcpy wrong size for mmp_addr issue
radeon: use pdev->pm_cap instead of pci_find_capability(..,PCI_CAP_ID_PM)
aty128fb: use pdev->pm_cap instead of pci_find_capability(..,PCI_CAP_ID_PM)
video: of_display_timing.h: Declare 'display_timing'
fbdev: bfin-lq035q1-fb: Use dev_pm_ops
fbmem: return -EFAULT on copy_to_user() failure
OMAPDSS: DPI: Fix wrong pixel clock limit
video: replace strict_strtoul() with kstrtoul()
uvesafb: Correct/simplify warning message
fb: fix atyfb unused data warnings
fb: fix atyfb build warning
video: imxfb: Make local symbols static
video: udlfb: Make local symbol static
video: udlfb: Use NULL instead of 0
video: smscufx: Use NULL instead of 0
video: remove unnecessary platform_set_drvdata()
...
Linus Torvalds [Tue, 9 Jul 2013 20:33:36 +0000 (13:33 -0700)]
Merge branch 'akpm' (updates from Andrew Morton)
Merge second patch-bomb from Andrew Morton:
- misc fixes
- audit stuff
- fanotify/inotify/dnotify things
- most of the rest of MM. The new cache shrinker code from Glauber and
Dave Chinner probably isn't quite stabilized yet.
- ptrace
- ipc
- partitions
- reboot cleanups
- add LZ4 decompressor, use it for kernel compression
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits)
lib/scatterlist: error handling in __sg_alloc_table()
scsi_debug: fix do_device_access() with wrap around range
crypto: talitos: use sg_pcopy_to_buffer()
lib/scatterlist: introduce sg_pcopy_from_buffer() and sg_pcopy_to_buffer()
lib/scatterlist: factor out sg_miter_get_next_page() from sg_miter_next()
crypto: add lz4 Cryptographic API
lib: add lz4 compressor module
arm: add support for LZ4-compressed kernel
lib: add support for LZ4-compressed kernel
decompressor: add LZ4 decompressor module
lib: add weak clz/ctz functions
reboot: move arch/x86 reboot= handling to generic kernel
reboot: arm: change reboot_mode to use enum reboot_mode
reboot: arm: prepare reboot_mode for moving to generic kernel code
reboot: arm: remove unused restart_mode fields from some arm subarchs
reboot: unicore32: prepare reboot_mode for moving to generic kernel code
reboot: x86: prepare reboot_mode for moving to generic kernel code
reboot: checkpatch.pl the new kernel/reboot.c file
reboot: move shutdown/reboot related functions to kernel/reboot.c
reboot: remove -stable friendly PF_THREAD_BOUND define
...
Helge Deller [Thu, 4 Jul 2013 20:34:11 +0000 (22:34 +0200)]
parisc: Fix gcc miscompilation in pa_memcpy()
When running the LTP testsuite one may hit this kernel BUG() with the
write06 testcase:
kernel BUG at mm/filemap.c:2023!
CPU: 1 PID: 8614 Comm: writev01 Not tainted 3.10.0-rc7-64bit-c3000+ #6
IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000401e6e84 00000000401e6e88
IIR: 03ffe01f ISR: 0000000010340000 IOR: 000001fbe0380820
CPU: 1 CR30: 00000000bef80000 CR31: ffffffffffffffff
ORIG_R28: 00000000bdc192c0
IAOQ[0]: iov_iter_advance+0x3c/0xc0
IAOQ[1]: iov_iter_advance+0x40/0xc0
RP(r2): generic_file_buffered_write+0x204/0x3f0
Backtrace:
[<00000000401e764c>] generic_file_buffered_write+0x204/0x3f0
[<00000000401eab24>] __generic_file_aio_write+0x244/0x448
[<00000000401eadc0>] generic_file_aio_write+0x98/0x150
[<000000004024f460>] do_sync_readv_writev+0xc0/0x130
[<000000004025037c>] compat_do_readv_writev+0x12c/0x340
[<00000000402505f8>] compat_writev+0x68/0xa0
[<0000000040251d88>] compat_SyS_writev+0x98/0xf8
Reason for this crash is a gcc miscompilation in the fault handlers of
pa_memcpy() which return the fault address instead of the copied bytes.
Since this seems to be a generic problem with gcc-4.7.x (and below), it's
better to simplify the fault handlers in pa_memcpy to avoid this problem.
Here is a simple reproducer for the problem:
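#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/uio.h>

#define DATA_FILE "testfile"	/* assumed name; not given in the original */

int main(int argc, char **argv)
{
	int fd, nbytes;
	struct iovec wr_iovec[] = {
		{ "TEST STRING ", 32 },
		{ (char *)0x40005000, 32 } };	/* random memory */

	fd = open(DATA_FILE, O_RDWR | O_CREAT, 0666);
	nbytes = writev(fd, wr_iovec, 2);
	printf("return value = %d, errno %d (%s)\n",
	       nbytes, errno, strerror(errno));
	return 0;
}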
In addition, John David Anglin wrote:
There is no gcc PR as pa_memcpy is not legitimate C code. There is an
implicit assumption that certain variables will contain correct values
when an exception occurs and the code randomly jumps to one of the
exception blocks. There is no guarantee of this. If a PR was filed, it
would likely be marked as invalid.
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # 3.8+
Signed-off-by: Helge Deller <deller@gmx.de>
John David Anglin [Sat, 29 Jun 2013 20:42:12 +0000 (16:42 -0400)]
parisc: Ensure volatile space register %sr1 is not clobbered
I still see the occasional random segv on rp3440. Looking at one of
these (a code 15), it appeared the problem must be with the cache
handling of anonymous pages. Reviewing this, I noticed that the space
register %sr1 might be being clobbered when we flush an anonymous page.
Register %sr1 is used for TLB purges in a couple of places. These
purges are needed on PA8800 and PA8900 processors to ensure cache
consistency of flushed cache lines.
The solution here is simply to move the %sr1 load into the TLB lock
region needed to ensure that one purge executes at a time on SMP
systems. This was already the case for one use. After a few days of
operation, I haven't had a random segv on my rp3440.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # 3.10
Signed-off-by: Helge Deller <deller@gmx.de>
Helge Deller [Sat, 29 Jun 2013 20:08:03 +0000 (22:08 +0200)]
parisc: optimize mtsp(0,sr) inline assembly
If the value which should be moved into a space register is zero, we can
optimize the inline assembly to become "mtsp %r0,%srX".
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: <stable@vger.kernel.org> # 3.10
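The compile-time selection described in the mtsp commit above can be illustrated portably. This is only a sketch, assuming GCC or Clang (for __builtin_constant_p); mtsp_template and uses_r0 are hypothetical names, and the macro yields the instruction text that would be chosen instead of emitting PA-RISC inline assembly, so it builds on any machine.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical stand-in for the real mtsp() macro: instead of emitting
 * PA-RISC inline assembly, it yields the instruction template that would
 * be selected. A compile-time constant zero needs no scratch register,
 * because %r0 always reads as zero on PA-RISC.
 */
#define mtsp_template(val) \
	((__builtin_constant_p(val) && (val) == 0) \
		? "mtsp %r0,%srX" \
		: "mtsp %rN,%srX")

/* Returns 1 when the %r0 form was chosen for the given template. */
static int uses_r0(const char *insn)
{
	assert(insn != NULL);
	return strcmp(insn, "mtsp %r0,%srX") == 0;
}
```

Only a literal zero takes the short form; a value that merely happens to be zero at run time still goes through a general register.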
Helge Deller [Sat, 29 Jun 2013 11:31:24 +0000 (13:31 +0200)]
parisc: switch to gzip-compressed vmlinuz kernel
The latest PA-RISC Boot Loader (palo) allows loading of gzip compressed
vmlinuz kernels. So let's now switch to build a vmlinuz file when we
build a palo boot image.
PALO version 1.9 (or higher) is required for this which is available at
git://git.kernel.org/pub/scm/linux/kernel/git/deller/palo.git
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: <stable@vger.kernel.org> # 3.10
Helge Deller [Sat, 29 Jun 2013 11:24:16 +0000 (13:24 +0200)]
parisc: document the shadow registers
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: <stable@vger.kernel.org> # 3.10
Helge Deller [Fri, 21 Jun 2013 21:32:44 +0000 (23:32 +0200)]
parisc: more capabilities info in /proc/cpuinfo
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: <stable@vger.kernel.org> # 3.10
Helge Deller [Tue, 18 Jun 2013 21:21:25 +0000 (23:21 +0200)]
parisc: fix LMMIO mismatch between PAT length and MASK register
The LMMIO length reported by PAT and the length given by the LBA MASK
register are not consistent. This leads e.g. to a not-working ATI FireGL
card with the radeon DRM driver since the memory can't be mapped.
Fix this by correctly adjusting the resource sizes.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: <stable@vger.kernel.org> # 3.10
Konstantin Khlebnikov [Mon, 8 Jul 2013 07:23:04 +0000 (11:23 +0400)]
drivers/net: caif: fix wrong rtnl_is_locked() usage
rtnl_is_locked() doesn't check who holds this lock, it just tells that it's
locked right now. If caif::ldisc_close really can be called under rtnl_lock
then it should release net device in other context because there is no way
to grab rtnl_lock without deadlock.
This patch adds work which releases these devices. Also this patch fixes calling
dev_close/unregister_netdevice without rtnl_lock from caif_ser_exit().
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Dmitry Tarnyagin <dmitry.tarnyagin@lockless.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
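The distinction drawn in the caif commit above, between "the lock is held by someone" and "the lock is held by me", can be shown with a toy model. This is a single-threaded sketch under stated assumptions: struct toy_lock and all toy_* helpers are hypothetical and are not the kernel's rtnl implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock that records its owner; ids are small positive integers. */
struct toy_lock {
	bool locked;
	int owner;	/* valid only while locked */
};

static void toy_lock_acquire(struct toy_lock *l, int self)
{
	assert(!l->locked);	/* single-threaded model: never contended */
	l->locked = true;
	l->owner = self;
}

static void toy_lock_release(struct toy_lock *l)
{
	l->locked = false;
}

/* Analogue of rtnl_is_locked(): true if anyone at all holds the lock. */
static bool toy_is_locked(const struct toy_lock *l)
{
	return l->locked;
}

/* What the caller actually wanted to know: do *I* hold the lock? */
static bool toy_held_by(const struct toy_lock *l, int self)
{
	return l->locked && l->owner == self;
}
```

A caller that skips taking the lock because toy_is_locked() returned true would run unprotected whenever a different context is the holder, which is the misuse the commit removes.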
Konstantin Khlebnikov [Mon, 8 Jul 2013 07:22:51 +0000 (11:22 +0400)]
drivers/net: enic: release rtnl_lock on error-path
enic_change_mtu_work() must call rtnl_unlock() on all exiting paths.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Christian Benvenuti <benve@cisco.com>
Cc: Roopa Prabhu <roprabhu@cisco.com>
Cc: Neel Patel <neepatel@cisco.com>
Cc: Nishank Trivedi <nistrive@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
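The rule stated in the enic commit above (unlock on every exit path) is commonly implemented with a single unlock label. A minimal sketch, assuming a toy nesting counter in place of rtnl_lock()/rtnl_unlock(); change_mtu_work and its two failure flags are hypothetical and do not mirror the driver's real steps.

```c
#include <assert.h>

/* Toy stand-in for rtnl_lock()/rtnl_unlock(): a nesting counter. */
static int lock_depth;

static void toy_lock(void)   { lock_depth++; }
static void toy_unlock(void) { assert(lock_depth > 0); lock_depth--; }

/*
 * Hypothetical work function shaped like an MTU-change handler. Every
 * early exit jumps to the single unlock label, so the lock is balanced.
 */
static int change_mtu_work(int stop_fails, int open_fails)
{
	int err = 0;

	toy_lock();

	if (stop_fails) {
		err = -1;
		goto out_unlock;	/* not "return err;" with the lock held */
	}
	if (open_fails) {
		err = -2;
		goto out_unlock;
	}

out_unlock:
	toy_unlock();
	return err;
}
```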
Michael S. Tsirkin [Sun, 7 Jul 2013 11:26:53 +0000 (14:26 +0300)]
vhost-net: fix use-after-free in vhost_net_flush
vhost_net_ubuf_put_and_wait has a confusing name:
it will actually also free its argument.
Thus since commit 1280c27f8e29acf4af2da914e80ec27c3dbd5c01
"vhost-net: flush outstanding DMAs on memory change",
vhost_net_flush tries to use the argument after passing it
to vhost_net_ubuf_put_and_wait, which results
in a use after free.
To fix, don't free the argument in vhost_net_ubuf_put_and_wait, and
add a new API for callers that want to free ubufs.
Acked-by: Asias He <asias@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
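The API split described in the vhost-net commit above can be sketched in userspace. This is an illustration only: struct toy_ubufs and the toy_* functions are hypothetical simplifications, not the vhost code, and the "wait" is reduced to a comment.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for the vhost ubuf reference object. */
struct toy_ubufs {
	int refcount;
};

static struct toy_ubufs *toy_ubufs_alloc(void)
{
	struct toy_ubufs *u = malloc(sizeof(*u));

	assert(u != NULL);
	u->refcount = 1;
	return u;
}

/* Drops the caller's reference and "waits"; does NOT free the object. */
static void toy_put_and_wait(struct toy_ubufs *u)
{
	u->refcount--;
	/* the real code would sleep here until the count reaches zero */
}

/* Separate entry point for callers that are done with the object. */
static void toy_put_wait_and_free(struct toy_ubufs *u)
{
	toy_put_and_wait(u);
	free(u);
}

/* Flush-style caller: keeps using the object after waiting. */
static int toy_flush(struct toy_ubufs *u)
{
	toy_put_and_wait(u);
	u->refcount = 1;	/* safe: the object is still allocated */
	return u->refcount;
}
```

Keeping the freeing variant separate makes ownership visible at the call site: a flush path calls the non-freeing function, and only teardown paths call the one that frees.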
Linus Torvalds [Tue, 9 Jul 2013 19:55:13 +0000 (12:55 -0700)]
Merge tag 'for-linus-3.11-merge-window-part-1' of git://git./linux/kernel/git/ericvh/v9fs
Pull 9p update from Eric Van Hensbergen:
"Grab bag of little fixes and enhancements:
- optional security enhancements
- fix path coverage in MAINTAINERS
- switch to using most used protocol and transport as default
- clean up buffer dumps in trace code
Held off on RDMA patches as they need to be cleaned up a bit, but will
try to get them cleaned, checked, and pushed by mid-week"
* tag 'for-linus-3.11-merge-window-part-1' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
9p: Add rest of 9p files to MAINTAINERS entry
9p: trace: use %*ph to dump buffer
net/9p: Handle error in zero copy request correctly for 9p2000.u
net/9p: Use virtio transpart as the default transport
net/9p: Make 9P2000.L the default protocol for 9p file system
Jonas Gorski [Sun, 7 Jul 2013 22:44:55 +0000 (00:44 +0200)]
net: mv643xx_eth: do not use port number as platform device id
The port number is only local to the ethernet block, not global, so
there can be two ethernet blocks both using the same port, like
kirkwood with both using port 0.
Fix this by using the array index offset for the allocated platform
devices as the id.
Signed-off-by: Jonas Gorski <jogo@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
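The id collision described in the mv643xx_eth commit above can be reproduced with a small model. A sketch under stated assumptions: struct toy_port and the helper names are hypothetical, and the two-block layout mimics the kirkwood case mentioned in the message rather than the driver's actual data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one port of an ethernet block. */
struct toy_port {
	int block;		/* which ethernet block the port belongs to */
	int port_number;	/* local to its block, so not unique */
};

/* Old scheme: the block-local port number is used as the device id. */
static int id_from_port_number(const struct toy_port *ports, size_t i)
{
	return ports[i].port_number;
}

/* New scheme: the index into the array of allocated devices is the id. */
static int id_from_index(const struct toy_port *ports, size_t i)
{
	(void)ports;
	return (int)i;
}

/* Returns 1 if any two of the n ports end up with the same id. */
static int ids_collide(const struct toy_port *ports, size_t n,
		       int (*id_of)(const struct toy_port *, size_t))
{
	assert(ports != NULL);
	for (size_t a = 0; a < n; a++)
		for (size_t b = a + 1; b < n; b++)
			if (id_of(ports, a) == id_of(ports, b))
				return 1;
	return 0;
}
```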
Daniel Borkmann [Tue, 9 Jul 2013 14:17:04 +0000 (16:17 +0200)]
net: sctp: confirm route during forward progress
This fix has been proposed originally by Vlad Yasevich. He says:
When SCTP makes forward progress (receives a SACK that acks new chunks,
renegs, or answers 0-window probes) or when HB-ACK arrives, mark
the route as confirmed so we don't unnecessarily send NUD probes.
Having a simple SCTP client/server that exchange data chunks every 1sec,
without this patch ARP requests are sent periodically every 40-60sec.
With this fix applied, an ARP request is only done once right at the
"session" beginning. Also, when clearing the related ARP cache entry
manually during the session, a new request is correctly done. I have
only "backported" this to net-next and tested that it works, so full
credit goes to Vlad.
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 9 Jul 2013 19:45:43 +0000 (12:45 -0700)]
virtio_net: fix race in RX VQ processing
Michael S. Tsirkin says:
====================
Jason Wang reported a race in RX VQ processing:
virtqueue_enable_cb is called outside napi lock,
violating virtio serialization rules.
The race has been there from day 1, but it got especially nasty in 3.0
when commit a5c262c5fd83ece01bd649fb08416c501d4c59d7
"virtio_ring: support event idx feature"
added more dependency on vq state.
Please review, and consider for 3.11 and stable.
Changes from v1:
- Added Jason's Tested-by tag
- minor coding style fix
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael S. Tsirkin [Tue, 9 Jul 2013 05:13:04 +0000 (08:13 +0300)]
virtio_net: fix race in RX VQ processing
virtio net called virtqueue_enable_cb on RX path after napi_complete, so
with NAPI_STATE_SCHED clear - outside the implicit napi lock.
This violates the requirement to synchronize virtqueue_enable_cb wrt
virtqueue_add_buf. In particular, used event can move backwards,
causing us to lose interrupts.
In a debug build, this can trigger panic within START_USE.
Jason Wang reports that he can trigger the races artificially,
by adding udelay() in virtqueue_enable_cb() after virtio_mb().
However, we must call napi_complete to clear NAPI_STATE_SCHED before
polling the virtqueue for used buffers, otherwise napi_schedule_prep in
a callback will fail, causing us to lose RX events.
To fix, call virtqueue_enable_cb_prepare with NAPI_STATE_SCHED
set (under napi lock), later call virtqueue_poll with
NAPI_STATE_SCHED clear (outside the lock).
Reported-by: Jason Wang <jasowang@redhat.com>
Tested-by: Jason Wang <jasowang@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
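The two-step ordering described in the virtio_net commit above (prepare under the napi lock, poll outside it) can be modelled with two ring indices. This is a toy, single-threaded sketch: struct toy_vq and the toy_* functions are hypothetical, the device's concurrent progress is simulated by a parameter, and nothing here is the real virtio ring code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy virtqueue: just the two indices that matter for this race. */
struct toy_vq {
	unsigned short used_idx;	/* advanced by the "device" */
	unsigned short last_used_idx;	/* advanced by the driver */
	bool cb_enabled;
};

/*
 * Step 1, done while still serialized with other virtqueue operations
 * (NAPI_STATE_SCHED set): re-enable callbacks and return an opaque
 * snapshot of the driver's position.
 */
static unsigned toy_enable_cb_prepare(struct toy_vq *vq)
{
	vq->cb_enabled = true;
	return vq->last_used_idx;
}

/*
 * Step 2, safe without the lock because it only compares the snapshot
 * with the device-written index: true if buffers arrived meanwhile.
 */
static bool toy_poll(const struct toy_vq *vq, unsigned snapshot)
{
	return (unsigned short)snapshot != vq->used_idx;
}

/* Tail of a poll routine; returns true if it must reschedule itself. */
static bool toy_finish_poll(struct toy_vq *vq, bool *sched,
			    int arrivals_after_prepare)
{
	unsigned snap;

	assert(*sched);				/* we hold the "napi lock" */
	snap = toy_enable_cb_prepare(vq);	/* under the lock */
	*sched = false;				/* models napi_complete() */
	vq->used_idx = (unsigned short)(vq->used_idx + arrivals_after_prepare);
	return toy_poll(vq, snap);		/* outside the lock */
}
```

The point of the split is that the only work left after the scheduled flag is cleared is a read-only comparison, so nothing that mutates driver-side queue state runs unserialized.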
Michael S. Tsirkin [Tue, 9 Jul 2013 10:19:18 +0000 (13:19 +0300)]
virtio: support unlocked queue poll
This adds a way to check ring empty state after enable_cb outside any
locks. Will be used by virtio_net.
Note: there's room for more optimization: caller is likely to have a
memory barrier already, which means we might be able to get rid of a
barrier here. Deferring this optimization until we do some
benchmarking.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jongsung Kim [Tue, 9 Jul 2013 08:36:00 +0000 (17:36 +0900)]
net/cadence/macb: fix bug/typo in extracting gem_irq_read_clear bit
Signed-off-by: Jongsung Kim <neidhard.kim@lge.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geert Uytterhoeven [Tue, 9 Jul 2013 08:22:31 +0000 (10:22 +0200)]
Documentation: Fix references to defunct linux-net@vger.kernel.org
linux-net@vger.kernel.org was replaced by netdev@oss.sgi.com, which was in
turn replaced by netdev@vger.kernel.org.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Tue, 9 Jul 2013 19:39:10 +0000 (12:39 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/sage/ceph-client
Pull Ceph updates from Sage Weil:
"There is some follow-on RBD cleanup after the last window's code drop,
a series from Yan fixing multi-mds behavior in cephfs, and then a
sprinkling of bug fixes all around. Some warnings, sleeping while
atomic, a null dereference, and cleanups"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (36 commits)
libceph: fix invalid unsigned->signed conversion for timespec encoding
libceph: call r_unsafe_callback when unsafe reply is received
ceph: fix race between cap issue and revoke
ceph: fix cap revoke race
ceph: fix pending vmtruncate race
ceph: avoid accessing invalid memory
libceph: Fix NULL pointer dereference in auth client code
ceph: Reconstruct the func ceph_reserve_caps.
ceph: Free mdsc if alloc mdsc->mdsmap failed.
ceph: remove sb_start/end_write in ceph_aio_write.
ceph: avoid meaningless calling ceph_caps_revoking if sync_mode == WB_SYNC_ALL.
ceph: fix sleeping function called from invalid context.
ceph: move inode to proper flushing list when auth MDS changes
rbd: fix a couple warnings
ceph: clear migrate seq when MDS restarts
ceph: check migrate seq before changing auth cap
ceph: fix race between page writeback and truncate
ceph: reset iov_len when discarding cap release messages
ceph: fix cap release race
libceph: fix truncate size calculation
...
Linus Torvalds [Tue, 9 Jul 2013 19:33:09 +0000 (12:33 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/mason/linux-btrfs
Pull btrfs update from Chris Mason:
"These are the usual mixture of bugs, cleanups and performance fixes.
Miao has some really nice tuning of our crc code as well as our
transaction commits.
Josef is peeling off more and more problems related to early enospc,
and has a number of important bug fixes in here too"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (81 commits)
Btrfs: wait ordered range before doing direct io
Btrfs: only do the tree_mod_log_free_eb if this is our last ref
Btrfs: hold the tree mod lock in __tree_mod_log_rewind
Btrfs: make backref walking code handle skinny metadata
Btrfs: fix crash regarding to ulist_add_merge
Btrfs: fix several potential problems in copy_nocow_pages_for_inode
Btrfs: cleanup the code of copy_nocow_pages_for_inode()
Btrfs: fix oops when recovering the file data by scrub function
Btrfs: make the chunk allocator completely tree lockless
Btrfs: cleanup orphaned root orphan item
Btrfs: fix wrong mirror number tuning
Btrfs: cleanup redundant code in btrfs_submit_direct()
Btrfs: remove btrfs_sector_sum structure
Btrfs: check if we can nocow if we don't have data space
Btrfs: stop using try_to_writeback_inodes_sb_nr to flush delalloc
Btrfs: use a percpu to keep track of possibly pinned bytes
Btrfs: check for actual acls rather than just xattrs when caching no acl
Btrfs: move btrfs_truncate_page to btrfs_cont_expand instead of btrfs_truncate
Btrfs: optimize reada_for_balance
Btrfs: optimize read_block_for_search
...
Eliezer Tamir [Tue, 9 Jul 2013 10:09:21 +0000 (13:09 +0300)]
net/fs: change busy poll time accounting
Suggested by Linus:
Changed time accounting for busy-poll:
- Make it microsecond based.
- Use unsigned longs.
- Revert back to use time_after instead of time_in_range.
Reorder poll/select busy loop conditions:
- Clear busy_flag after one time we can't busy-poll.
- Only init busy_end if we actually are going to busy-poll.
Added one more missing need_resched() test.
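The accounting described above can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: usec_after(), busy_deadline() and busy_loop_timed_out() are hypothetical names, and the comparison merely imitates the wrap-safe idea behind time_after() for unsigned long microsecond counters.

```c
#include <stdbool.h>

/* Wrap-safe "a is after b" for free-running unsigned long counters.
 * Hypothetical helper modelled on the idea behind time_after():
 * unsigned subtraction wraps, so "after" means b - a lands in the
 * upper half of the unsigned range. */
static bool usec_after(unsigned long a, unsigned long b)
{
	return (b - a) > (~0UL >> 1);
}

/* End of the busy-poll window, in microseconds. The sum may wrap;
 * usec_after() copes with that. */
static unsigned long busy_deadline(unsigned long now_us, unsigned long budget_us)
{
	return now_us + budget_us;
}

/* True once the current time has passed the deadline. */
static bool busy_loop_timed_out(unsigned long now_us, unsigned long end_us)
{
	return usec_after(now_us, end_us);
}
```

A single "after" comparison needs only the end of the window, which is why the deadline can be computed lazily, just before the loop actually starts polling.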
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Tue, 9 Jul 2013 19:29:12 +0000 (12:29 -0700)]
Merge tag 'for-linus-v3.11-rc1' of git://oss.sgi.com/xfs/xfs
Pull xfs update from Ben Myers:
"This includes several bugfixes, part of the work for project quotas
and group quotas to be used together, performance improvements for
inode creation/deletion, buffer readahead, and bulkstat,
implementation of the inode change count, an inode create transaction,
and the removal of a bunch of dead code.
There are also some duplicate commits that you already have from the
3.10-rc series.
- part of the work to allow project quotas and group quotas to be
used together
- inode change count
- inode create transaction
- block queue plugging in buffer readahead and bulkstat
- ordered log vector support
- removal of dead code in and around xfs_sync_inode_grab,
xfs_ialloc_get_rec, XFS_MOUNT_RETERR, XFS_ALLOCFREE_LOG_RES,
XFS_DIROP_LOG_RES, xfs_chash, ctl_table, and
xfs_growfs_data_private
- don't keep silent if sunit/swidth can not be changed via mount
- fix a leak of remote symlink blocks into the filesystem when xattrs
are used on symlinks
- fix for fiemap to return FIEMAP_EXTENT_UNKNOWN flag on delay extents
- part of a fix for xfs_fsr
- disable speculative preallocation with small files
- performance improvements for inode creates and deletes"
* tag 'for-linus-v3.11-rc1' of git://oss.sgi.com/xfs/xfs: (61 commits)
xfs: Remove incore use of XFS_OQUOTA_ENFD and XFS_OQUOTA_CHKD
xfs: Change xfs_dquot_acct to be a 2-dimensional array
xfs: Code cleanup and removal of some typedef usage
xfs: Replace macro XFS_DQ_TO_QIP with a function
xfs: Replace macro XFS_DQUOT_TREE with a function
xfs: Define a new function xfs_is_quota_inode()
xfs: implement inode change count
xfs: Use inode create transaction
xfs: Inode create item recovery
xfs: Inode create transaction reservations
xfs: Inode create log items
xfs: Introduce an ordered buffer item
xfs: Introduce ordered log vector support
xfs: xfs_ifree doesn't need to modify the inode buffer
xfs: don't do IO when creating an new inode
xfs: don't use speculative prealloc for small files
xfs: plug directory buffer readahead
xfs: add pluging for bulkstat readahead
xfs: Remove dead function prototype xfs_sync_inode_grab()
xfs: Remove the left function variable from xfs_ialloc_get_rec()
...
Josh Durgin [Fri, 28 Jun 2013 20:13:16 +0000 (13:13 -0700)]
libceph: fix invalid unsigned->signed conversion for timespec encoding
__kernel_time_t is a long, which cannot hold a U32_MAX on 32-bit
architectures. Just drop this check as it has limited value.
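Why such a check misfires can be shown with a small model. The exact form of the dropped check is an assumption here (a comparison of tv_sec against U32_MAX converted to the time type), and both function names are hypothetical. Fixed-width types stand in for a 32-bit and a 64-bit long.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of "tv_sec > (time type)U32_MAX" where the time type is a
 * 32-bit signed long (ILP32). Assumed form of the check. */
static bool check_trips_on_32bit(int32_t tv_sec)
{
	/* Converting 0xffffffff to a 32-bit signed type yields -1 on the
	 * usual two's-complement compilers (implementation-defined in ISO C). */
	int32_t limit = (int32_t)UINT32_MAX;

	return tv_sec > limit;	/* true for every non-negative tv_sec */
}

/* The same check where the time type is a 64-bit signed long (LP64). */
static bool check_trips_on_64bit(int64_t tv_sec)
{
	int64_t limit = (int64_t)UINT32_MAX;	/* 4294967295, representable */

	return tv_sec > limit;
}
```

With the 32-bit type every valid timestamp exceeds the converted limit, so a check of this shape fires unconditionally. With the 64-bit type it fires only for values that really do not fit in 32 bits.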
This fixes a crash like:
[ 957.905812] kernel BUG at /srv/autobuild-ceph/gitbuilder.git/build/include/linux/ceph/decode.h:164!
[ 957.914849] Internal error: Oops - BUG: 0 [#1] SMP ARM
[ 957.919978] Modules linked in: rbd libceph libcrc32c ipmi_devintf ipmi_si ipmi_msghandler nfsd nfs_acl auth_rpcgss nfs fscache lockd sunrpc
[ 957.932547] CPU: 1 Tainted: G W (3.9.0-ceph-19bb6a83-highbank #1)
[ 957.939881] PC is at ceph_osdc_build_request+0x8c/0x4f8 [libceph]
[ 957.945967] LR is at 0xec520904
[ 957.949103] pc : [<bf13e76c>] lr : [<ec520904>] psr: 20000153
[ 957.949103] sp : ec753df8 ip : 00000001 fp : ec53e100
[ 957.960571] r10: ebef25c0 r9 : ec5fa400 r8 : ecbcc000
[ 957.965788] r7 : 00000000 r6 : 00000000 r5 : ffffffff r4 : 00000020
[ 957.972307] r3 : 51cc8143 r2 : ec520900 r1 : ec753e58 r0 : ec520908
[ 957.978827] Flags: nzCv IRQs on FIQs off Mode SVC_32 ISA ARM Segment user
[ 957.986039] Control: 10c5387d Table: 2c59c04a DAC: 00000015
[ 957.991777] Process rbd (pid: 2138, stack limit = 0xec752238)
[ 957.997514] Stack: (0xec753df8 to 0xec754000)
[ 958.001864] 3de0: 00000001 00000001
[ 958.010032] 3e00: 00000001 bf139744 ecbcc000 ec55a0a0 00000024 00000000 ebef25c0 fffffffe
[ 958.018204] 3e20: ffffffff 00000000 00000000 00000001 ec5fa400 ebef25c0 ec53e100 bf166b68
[ 958.026377] 3e40: 00000000 0000220f fffffffe ffffffff ec753e58 bf13ff24 51cc8143 05b25ed2
[ 958.034548] 3e60: 00000001 00000000 00000000 bf1688d4 00000001 00000000 00000000 00000000
[ 958.042720] 3e80: 00000001 00000060 ec5fa400 ed53d200 ed439600 ed439300 00000001 00000060
[ 958.050888] 3ea0: ec5fa400 ed53d200 00000000 bf16a320 00000000 ec53e100 00000040 ec753eb8
[ 958.059059] 3ec0: ec51df00 ed53d7c0 ed53d200 ed53d7c0 00000000 ed53d7c0 ec5fa400 bf16ed70
[ 958.067230] 3ee0: 00000000 00000060 00000002 ed53d200 00000000 bf16acf4 ed53d7c0 ec752000
[ 958.075402] 3f00: ed980e50 e954f5d8 00000000 00000060 ed53d240 ed53d258 ec753f80 c04f44a8
[ 958.083574] 3f20: edb7910c ec664700 01ade920 c02e4c44 00000060 c016b3dc ec51de40 01adfb84
[ 958.091745] 3f40: 00000060 ec752000 ec753f80 ec752000 00000060 c0108444 00000007 ec51de48
[ 958.099914] 3f60: ed0eb8c0 00000000 00000000 ec51de40 01adfb84 00000001 00000060 c0108858
[ 958.108085] 3f80: 00000000 00000000 51cc8143 00000060 01adfb84 00000007 00000004 c000dd68
[ 958.116257] 3fa0: 00000000 c000dbc0 00000060 01adfb84 00000007 01adfb84 00000060 01adfb80
[ 958.124429] 3fc0: 00000060 01adfb84 00000007 00000004 beded1a8 00000000 01adf2f0 01ade920
[ 958.132599] 3fe0: 00000000 beded180 b6811324 b6811334 800f0010 00000007 2e7f5821 2e7f5c21
[ 958.140815] [<bf13e76c>] (ceph_osdc_build_request+0x8c/0x4f8 [libceph]) from [<bf166b68>] (rbd_osd_req_format_write+0x50/0x7c [rbd])
[ 958.152739] [<bf166b68>] (rbd_osd_req_format_write+0x50/0x7c [rbd]) from [<bf1688d4>] (rbd_dev_header_watch_sync+0xe0/0x204 [rbd])
[ 958.164486] [<bf1688d4>] (rbd_dev_header_watch_sync+0xe0/0x204 [rbd]) from [<bf16a320>] (rbd_dev_image_probe+0x23c/0x850 [rbd])
[ 958.175967] [<bf16a320>] (rbd_dev_image_probe+0x23c/0x850 [rbd]) from [<bf16acf4>] (rbd_add+0x3c0/0x918 [rbd])
[ 958.185975] [<bf16acf4>] (rbd_add+0x3c0/0x918 [rbd]) from [<c02e4c44>] (bus_attr_store+0x20/0x2c)
[ 958.194850] [<c02e4c44>] (bus_attr_store+0x20/0x2c) from [<c016b3dc>] (sysfs_write_file+0x168/0x198)
[ 958.203984] [<c016b3dc>] (sysfs_write_file+0x168/0x198) from [<c0108444>] (vfs_write+0x9c/0x170)
[ 958.212768] [<c0108444>] (vfs_write+0x9c/0x170) from [<c0108858>] (sys_write+0x3c/0x70)
[ 958.220768] [<c0108858>] (sys_write+0x3c/0x70) from [<c000dbc0>] (ret_fast_syscall+0x0/0x30)
[ 958.229199] Code: e59d1058 e5913000 e3530000 ba000114 (e7f001f2)
CC: stable@vger.kernel.org # 3.4+
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Linus Torvalds [Tue, 9 Jul 2013 19:09:43 +0000 (12:09 -0700)]
Merge tag 'nfs-for-3.11-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
"Feature highlights include:
- Add basic client support for NFSv4.2
- Add basic client support for Labeled NFS (selinux for NFSv4.2)
- Fix the use of credentials in NFSv4.1 stateful operations, and add
support for NFSv4.1 state protection.
Bugfix highlights:
- Fix another NFSv4 open state recovery race
- Fix an NFSv4.1 back channel session regression
- Various rpc_pipefs races
- Fix another issue with NFSv3 auth negotiation
Please note that Labeled NFS does require some additional support from
the security subsystem. The relevant changesets have all been
reviewed and acked by James Morris."
* tag 'nfs-for-3.11-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (54 commits)
NFS: Set NFS_CS_MIGRATION for NFSv4 mounts
NFSv4.1 Refactor nfs4_init_session and nfs4_init_channel_attrs
nfs: have NFSv3 try server-specified auth flavors in turn
nfs: have nfs_mount fake up a auth_flavs list when the server didn't provide it
nfs: move server_authlist into nfs_try_mount_request
nfs: refactor "need_mount" code out of nfs_try_mount
SUNRPC: PipeFS MOUNT notification optimization for dying clients
SUNRPC: split client creation routine into setup and registration
SUNRPC: fix races on PipeFS UMOUNT notifications
SUNRPC: fix races on PipeFS MOUNT notifications
NFSv4.1 use pnfs_device maxcount for the objectlayout gdia_maxcount
NFSv4.1 use pnfs_device maxcount for the blocklayout gdia_maxcount
NFSv4.1 Fix gdia_maxcount calculation to fit in ca_maxresponsesize
NFS: Improve legacy idmapping fallback
NFSv4.1 end back channel session draining
NFS: Apply v4.1 capabilities to v4.2
NFSv4.1: Clean up layout segment comparison helper names
NFSv4.1: layout segment comparison helpers should take 'const' parameters
NFSv4: Move the DNS resolver into the NFSv4 module
rpc_pipefs: only set rpc_dentry_ops if d_op isn't already set
...
Linus Torvalds [Tue, 9 Jul 2013 19:08:43 +0000 (12:08 -0700)]
Merge branch 'for_linus' of git://git./linux/kernel/git/jack/linux-fs
Pull ext3 fix and quota cleanup from Jan Kara:
"A fix of ext3 error reporting from fsync and a quota cleanup"
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
quota: Convert use of typedef ctl_table to struct ctl_table
ext3: Fix fsync error handling after filesystem abort.
Linus Torvalds [Tue, 9 Jul 2013 18:26:44 +0000 (11:26 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs
Pull third set of VFS updates from Al Viro:
"Misc stuff all over the place. There will be one more pile in a
couple of days"
This is an "evil merge" that also uses the new d_count helper in
fs/configfs/dir.c, missed by commit 84d08fa888e7 ("helper for reading
->d_count")
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
ncpfs: fix error return code in ncp_parse_options()
locks: move file_lock_list to a set of percpu hlist_heads and convert file_lock_lock to an lglock
seq_file: add seq_list_*_percpu helpers
f2fs: fix readdir incorrectness
mode_t whack-a-mole...
lustre: kill the pointless wrapper
helper for reading ->d_count
Dan Carpenter [Mon, 8 Jul 2013 23:01:58 +0000 (16:01 -0700)]
lib/scatterlist: error handling in __sg_alloc_table()
I was reviewing code which I suspected might allocate a zero size SG
table. That will cause memory corruption. Also we can't return before
doing the memset or we could end up using uninitialized memory in the
cleanup path.
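The ordering issue can be sketched with a userspace stand-in. struct demo_table and both functions are hypothetical and only illustrate the pattern: zero the object first, then reject a zero-sized request, so that any early return leaves a state the cleanup path can handle. The error codes are likewise assumptions.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a table that a cleanup routine inspects. */
struct demo_table {
	void *entries;
	unsigned int nents;
};

/* Cleanup that trusts the table fields, as a free routine typically does. */
static void demo_table_free(struct demo_table *t)
{
	free(t->entries);	/* safe only if entries is NULL or valid */
	memset(t, 0, sizeof(*t));
}

static int demo_table_alloc(struct demo_table *t, unsigned int nents)
{
	/* Zero first, so every early return leaves a freeable state. */
	memset(t, 0, sizeof(*t));

	/* Reject an empty table before doing any size arithmetic on it. */
	if (nents == 0)
		return -EINVAL;

	t->entries = calloc(nents, sizeof(long));
	if (!t->entries)
		return -ENOMEM;
	t->nents = nents;
	return 0;
}
```

If the zero-size check came before the memset, a caller that runs the cleanup routine after a failed allocation would hand whatever garbage was in the table to free().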
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Maxim Levitsky <maximlevitsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Mon, 8 Jul 2013 23:01:57 +0000 (16:01 -0700)]
scsi_debug: fix do_device_access() with wrap around range
do_device_access() is a function that abstracts copying SG list from/to
ramdisk storage (fake_storep).
It must deal with ranges exceeding the actual fake_storep size, because
such ranges are valid if virtual_gb is set greater than zero, and they
should be treated as if fake_storep were repeatedly mirrored up to the
virtual size.
Unfortunately, it can't deal with a range which wraps around the end of
fake_storep. A wrap-around range is copied by two
sg_copy_{from,to}_buffer() calls, but sg_copy_{from,to}_buffer() can't
start copying from/to the middle of an SG list, therefore the second call
can't copy correctly.
This fixes it by using sg_pcopy_{from,to}_buffer() that can copy from/to
the middle of SG list.
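The wrap-around case can be illustrated with a flat-buffer model. wrap_read() is a hypothetical helper, not the scsi_debug code, and it assumes the request is no longer than the store, so that at most one wrap occurs. The destination here is a plain array, where the commit deals with an SG list.

```c
#include <stddef.h>
#include <string.h>

/* Copy len bytes starting at offset start of a store that repeats every
 * store_size bytes. Assumes len <= store_size (at most one wrap). */
static void wrap_read(const unsigned char *store, size_t store_size,
		      size_t start, unsigned char *dst, size_t len)
{
	size_t off = start % store_size;
	size_t first = len;
	size_t rest = 0;

	if (off + len > store_size) {
		first = store_size - off;	/* bytes up to the end of the store */
		rest = len - first;		/* bytes taken from the start again */
	}

	memcpy(dst, store + off, first);
	/* The second copy must begin 'first' bytes into the destination.
	 * That offset plays the role of the skip argument of the pcopy
	 * variants. */
	memcpy(dst + first, store, rest);
}
```

With a flat destination the offset is just pointer arithmetic. With an SG list it has to be passed to the copy routine, which is what the extra skip argument provides.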
This also simplifies the assignment of sdb->resid in
fill_from_dev_buffer(), because fill_from_dev_buffer() is now called only
once per command execution cycle. It is therefore no longer necessary to
take care to decrease sdb->resid if fill_from_dev_buffer() is called more
than once.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Horia Geanta <horia.geanta@freescale.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Mon, 8 Jul 2013 23:01:55 +0000 (16:01 -0700)]
crypto: talitos: use sg_pcopy_to_buffer()
Use sg_pcopy_to_buffer(), which is better than the function previously
used because it doesn't do kmap/kunmap for skipped pages.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Horia Geanta <horia.geanta@freescale.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Mon, 8 Jul 2013 23:01:54 +0000 (16:01 -0700)]
lib/scatterlist: introduce sg_pcopy_from_buffer() and sg_pcopy_to_buffer()
The only difference between sg_pcopy_{from,to}_buffer() and
sg_copy_{from,to}_buffer() is an additional argument that specifies the
number of bytes to skip in the SG list before copying.
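The effect of the extra argument can be shown with a userspace model in which an array of memory chunks stands in for the SG list. struct demo_seg and demo_pcopy_to_buffer() are hypothetical names. This is a sketch of the skip semantics, not the kernel implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one SG list entry: a chunk of memory. */
struct demo_seg {
	const unsigned char *data;
	size_t len;
};

/* Copy up to buflen bytes out of the segments into buf, after skipping
 * the first 'skip' bytes of the list. Returns the number of bytes copied. */
static size_t demo_pcopy_to_buffer(const struct demo_seg *segs, size_t nsegs,
				   unsigned char *buf, size_t buflen, size_t skip)
{
	size_t copied = 0;

	for (size_t i = 0; i < nsegs && copied < buflen; i++) {
		size_t len = segs[i].len;
		size_t off;

		if (skip >= len) {	/* segment lies wholly in the skipped part */
			skip -= len;
			continue;
		}
		off = skip;		/* partial skip inside this segment */
		skip = 0;
		len -= off;
		if (len > buflen - copied)
			len = buflen - copied;
		memcpy(buf + copied, segs[i].data + off, len);
		copied += len;
	}
	return copied;
}
```

With skip set to 0 the function behaves like a plain copy from the head of the list. Segments that fall entirely inside the skipped range are never read.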
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Horia Geanta <horia.geanta@freescale.com>
Cc: Imre Deak <imre.deak@intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Mon, 8 Jul 2013 23:01:52 +0000 (16:01 -0700)]
lib/scatterlist: factor out sg_miter_get_next_page() from sg_miter_next()
This patchset introduces sg_pcopy_from_buffer() and sg_pcopy_to_buffer(),
which copy data between a linear buffer and an SG list.
The only difference between sg_pcopy_{from,to}_buffer() and
sg_copy_{from,to}_buffer() is an additional argument that specifies the
number of bytes to skip in the SG list before copying.
The main reason for introducing these functions is to fix a problem in
the scsi_debug module. There is also a local function in the
crypto/talitos module that can be replaced by sg_pcopy_to_buffer().
This patch:
sg_miter_get_next_page() advances the page iterator to the next page if
necessary, and will be used to implement the variants of
sg_copy_{from,to}_buffer() later.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Cc: Horia Geanta <horia.geanta@freescale.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Chanho Min [Mon, 8 Jul 2013 23:01:51 +0000 (16:01 -0700)]
crypto: add lz4 Cryptographic API
Add support for the lz4 and lz4hc compression algorithms using the
lib/lz4/* codebase.
[akpm@linux-foundation.org: fix warnings]
Signed-off-by: Chanho Min <chanho.min@lge.com>
Cc: "Darrick J. Wong" <djwong@us.ibm.com>
Cc: Bob Pearson <rpearson@systemfabricworks.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Herbert Xu <herbert@gondor.hengli.com.au>
Cc: Yann Collet <yann.collet.73@gmail.com>
Cc: Kyungsik Lee <kyungsik.lee@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Chanho Min [Mon, 8 Jul 2013 23:01:49 +0000 (16:01 -0700)]
lib: add lz4 compressor module
This patchset adds support for LZ4 compression and for using it through
the crypto API.
As shown below, the compressed data is slightly larger than with LZO, but
compression is faster when unaligned memory access is enabled. lz4
compression and decompression can also be used through the crypto API,
which will be useful for other potential users of lz4 compression.
lz4 Compression Benchmark:
Compiler: ARM gcc 4.6.4
ARMv7, 1 GHz based board
Kernel: linux 3.4
Uncompressed data Size: 101 MB
        Compressed Size   Compression Speed
LZO     72.1MB            32.1MB/s, 33.0MB/s(UA)
LZ4     75.1MB            30.4MB/s, 35.9MB/s(UA)
LZ4HC   59.8MB            2.4MB/s, 2.5MB/s(UA)
- UA: Unaligned memory Access support
- Latest patch set for LZO applied
This patch:
Add support for LZ4 compression in the Linux Kernel. LZ4 Compression APIs
for kernel are based on LZ4 implementation by Yann Collet and were changed
for kernel coding style.
LZ4 homepage : http://fastcompression.blogspot.com/p/lz4.html
LZ4 source repository : http://code.google.com/p/lz4/
svn revision : r90
Two APIs are added:
lz4_compress() provides basic lz4 compression, whereas lz4hc_compress()
provides high compression: it runs slower but achieves a higher
compression ratio. Both require pre-allocated working memory of the
defined size, and the destination buffer must be allocated with the size
given by lz4_compressbound.
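Sizing the destination buffer might look like the following userspace sketch. The bound formula is an assumption modelled on the LZ4 reference implementation (input size + input size / 255 + 16) and should be checked against the kernel header. demo_lz4_compressbound() and demo_alloc_dst() are hypothetical names.

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumed worst-case bound, modelled on the LZ4 reference formula
 * isize + isize/255 + 16. Not taken from the patch itself. */
static size_t demo_lz4_compressbound(size_t isize)
{
	return isize + (isize / 255) + 16;
}

/* Allocate a destination buffer large enough for incompressible input.
 * The caller owns the returned memory and must free() it. */
static unsigned char *demo_alloc_dst(size_t src_len, size_t *dst_len)
{
	*dst_len = demo_lz4_compressbound(src_len);
	return malloc(*dst_len);
}
```

The bound is larger than the input because incompressible data still carries framing overhead, so a destination buffer sized like the source would be too small in the worst case.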
[akpm@linux-foundation.org: make lz4_compresshcctx() static]
Signed-off-by: Chanho Min <chanho.min@lge.com>
Cc: "Darrick J. Wong" <djwong@us.ibm.com>
Cc: Bob Pearson <rpearson@systemfabricworks.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Herbert Xu <herbert@gondor.hengli.com.au>
Cc: Yann Collet <yann.collet.73@gmail.com>
Cc: Kyungsik Lee <kyungsik.lee@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kyungsik Lee [Mon, 8 Jul 2013 23:01:48 +0000 (16:01 -0700)]
arm: add support for LZ4-compressed kernel
Integrate the LZ4 decompression code into the arm pre-boot code.
Signed-off-by: Kyungsik Lee <kyungsik.lee@lge.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Florian Fainelli <florian@openwrt.org>
Cc: Yann Collet <yann.collet.73@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kyungsik Lee [Mon, 8 Jul 2013 23:01:46 +0000 (16:01 -0700)]
lib: add support for LZ4-compressed kernel
Add support for extracting LZ4-compressed kernel images, as well as
LZ4-compressed ramdisk images in the kernel boot process.
Signed-off-by: Kyungsik Lee <kyungsik.lee@lge.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Florian Fainelli <florian@openwrt.org>
Cc: Yann Collet <yann.collet.73@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kyungsik Lee [Mon, 8 Jul 2013 23:01:45 +0000 (16:01 -0700)]
decompressor: add LZ4 decompressor module
Add support for LZ4 decompression in the Linux Kernel. LZ4 Decompression
APIs for kernel are based on LZ4 implementation by Yann Collet.
Benchmark Results (PATCH v3)
Compiler: Linaro ARM gcc 4.6.2
1. ARMv7, 1.5GHz based board
   Kernel: linux 3.4
   Uncompressed Kernel Size: 14MB
         Compressed Size   Decompression Speed
   LZO   6.7MB             20.1MB/s, 25.2MB/s(UA)
   LZ4   7.3MB             29.1MB/s, 45.6MB/s(UA)
2. ARMv7, 1.7GHz based board
   Kernel: linux 3.7
   Uncompressed Kernel Size: 14MB
         Compressed Size   Decompression Speed
   LZO   6.0MB             34.1MB/s, 52.2MB/s(UA)
   LZ4   6.5MB             86.7MB/s
- UA: Unaligned memory Access support
- Latest patch set for LZO applied
This patch set is for adding support for LZ4-compressed Kernel. LZ4 is a
very fast lossless compression algorithm and it also features an extremely
fast decoder [1].
However, we already have five decompressors, which raises the question
of where we stop adding new ones. This issue has been discussed and a
conclusion was reached [2].
Russell King said that we should have:
- one decompressor which is the fastest
- one decompressor for the highest compression ratio
- one popular decompressor (eg conventional gzip)
If we have a replacement one for one of these, then it should do exactly
that: replace it.
The benchmark shows an 8% increase in image size versus a 66% increase
in decompression speed compared to LZO (which has been known as the
fastest decompressor in the kernel). Therefore the "fast but may not be
small" compression title has clearly been taken by LZ4 [3].
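The two percentages can be rechecked with a one-line helper. percent_increase() is a hypothetical name, and pairing the figures with the 1.7GHz board rows (6.0MB to 6.5MB, and 52.2MB/s with unaligned access to 86.7MB/s) is an assumption based on which rows reproduce the quoted numbers.

```c
/* Relative increase of 'after' over 'before', in percent. */
static double percent_increase(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

Under that assumption, percent_increase(6.0, 6.5) comes to roughly 8.3 and percent_increase(52.2, 86.7) to roughly 66.1, in line with the 8% and 66% quoted above.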
[1] http://code.google.com/p/lz4/
[2] http://thread.gmane.org/gmane.linux.kbuild.devel/9157
[3] http://thread.gmane.org/gmane.linux.kbuild.devel/9347
LZ4 homepage: http://fastcompression.blogspot.com/p/lz4.html
LZ4 source repository: http://code.google.com/p/lz4/
Signed-off-by: Kyungsik Lee <kyungsik.lee@lge.com>
Signed-off-by: Yann Collet <yann.collet.73@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Florian Fainelli <florian@openwrt.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Chanho Min [Mon, 8 Jul 2013 23:01:43 +0000 (16:01 -0700)]
lib: add weak clz/ctz functions
Some architectures need __c[lt]z[sd]i2() for __builtin_c[lt]z[ll], and
their absence causes a build failure. They can be implemented using
fls()/__ffs() and overridden at link time by arch-specific versions,
which may not be implemented yet.
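A userspace sketch of the idea follows. demo_fls() and demo_ffs0() are portable stand-ins for fls() and __ffs(), and the demo_ names are hypothetical. The weak attribute is a GCC/Clang extension that lets a strong definition elsewhere replace these at link time. The patch's actual function bodies may differ.

```c
#include <stdint.h>

/* Find last set bit, 1-based; returns 0 for x == 0. Models fls(). */
static int demo_fls(uint32_t x)
{
	int r = 0;

	while (x) {
		r++;
		x >>= 1;
	}
	return r;
}

/* Index of the lowest set bit, 0-based. Models __ffs().
 * The caller must pass a non-zero value. */
static int demo_ffs0(uint32_t x)
{
	int r = 0;

	while (!(x & 1u)) {
		r++;
		x >>= 1;
	}
	return r;
}

/* Count leading zeros: 32 minus the 1-based position of the top bit. */
__attribute__((weak)) int demo_clzsi2(uint32_t val)
{
	return 32 - demo_fls(val);
}

/* Count trailing zeros: the index of the lowest set bit. */
__attribute__((weak)) int demo_ctzsi2(uint32_t val)
{
	return demo_ffs0(val);
}
```

Marking the generic versions weak means an architecture that later gains an optimized implementation needs no change to the generic file: the linker simply prefers the strong symbol.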
This is required by "lib: add lz4 compressor module".
Reference: https://lkml.org/lkml/2013/4/18/603
Signed-off-by: Chanho Min <chanho.min@lge.com>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: "Darrick J. Wong" <djwong@us.ibm.com>
Cc: Bob Pearson <rpearson@systemfabricworks.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Herbert Xu <herbert@gondor.hengli.com.au>
Cc: Yann Collet <yann.collet.73@gmail.com>
Cc: Kyungsik Lee <kyungsik.lee@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Robin Holt [Mon, 8 Jul 2013 23:01:42 +0000 (16:01 -0700)]
reboot: move arch/x86 reboot= handling to generic kernel
Merge together the unicore32, arm, and x86 reboot= command line
parameter handling.
Signed-off-by: Robin Holt <holt@sgi.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Russ Anderson <rja@sgi.com>
Cc: Robin Holt <holt@sgi.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Guan Xuetao <gxt@mprc.pku.edu.cn>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Robin Holt [Mon, 8 Jul 2013 23:01:40 +0000 (16:01 -0700)]
reboot: arm: change reboot_mode to use enum reboot_mode
Preparing to move the parsing of reboot= to generic kernel code forces
the change in reboot_mode handling to use the enum.
[akpm@linux-foundation.org: fix arch/arm/mach-socfpga/socfpga.c]
Signed-off-by: Robin Holt <holt@sgi.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Russ Anderson <rja@sgi.com>
Cc: Robin Holt <holt@sgi.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Robin Holt [Mon, 8 Jul 2013 23:01:39 +0000 (16:01 -0700)]
reboot: arm: prepare reboot_mode for moving to generic kernel code
Prepare for moving the parsing of reboot= to the generic kernel code
by making reboot_mode into a more generic form.
Signed-off-by: Robin Holt <holt@sgi.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Russ Anderson <rja@sgi.com>
Cc: Robin Holt <holt@sgi.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Robin Holt [Mon, 8 Jul 2013 23:01:38 +0000 (16:01 -0700)]
reboot: arm: remove unused restart_mode fields from some arm subarchs
These restart_mode fields are not used at all. Remove them to make
moving the reboot= cmdline options to the general kernel easier.
Signed-off-by: Robin Holt <holt@sgi.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Russ Anderson <rja@sgi.com>
Cc: Robin Holt <holt@sgi.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Robin Holt [Mon, 8 Jul 2013 23:01:36 +0000 (16:01 -0700)]
reboot: unicore32: prepare reboot_mode for moving to generic kernel code
Prepare for moving the parsing of reboot= to the generic kernel code
by making reboot_mode into a more generic form.
Signed-off-by: Robin Holt <holt@sgi.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Russ Anderson <rja@sgi.com>
Cc: Robin Holt <holt@sgi.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: H. Peter Anvin <hpa@zytor.com>
Acked-by: Guan Xuetao <gxt@mprc.pku.edu.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Robin Holt [Mon, 8 Jul 2013 23:01:35 +0000 (16:01 -0700)]
reboot: x86: prepare reboot_mode for moving to generic kernel code
Prepare for moving the parsing of reboot= to the generic kernel code
by making reboot_mode into a more generic form.
Signed-off-by: Robin Holt <holt@sgi.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Miguel Boton <mboton.lkml@gmail.com>
Cc: Russ Anderson <rja@sgi.com>
Cc: Robin Holt <holt@sgi.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Robin Holt [Mon, 8 Jul 2013 23:01:34 +0000 (16:01 -0700)]
reboot: checkpatch.pl the new kernel/reboot.c file
Get the new file to pass scripts/checkpatch.pl
Signed-off-by: Robin Holt <holt@sgi.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Russ Anderson <rja@sgi.com>
Cc: Robin Holt <holt@sgi.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Robin Holt [Mon, 8 Jul 2013 23:01:32 +0000 (16:01 -0700)]
reboot: move shutdown/reboot related functions to kernel/reboot.c
This patch is preparatory. It moves the reboot-related syscall and
associated functions from kernel/sys.c to kernel/reboot.c.
Signed-off-by: Robin Holt <holt@sgi.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Russ Anderson <rja@sgi.com>
Cc: Robin Holt <holt@sgi.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Robin Holt [Mon, 8 Jul 2013 23:01:31 +0000 (16:01 -0700)]
reboot: remove -stable friendly PF_THREAD_BOUND define
Remove the prior patch's #define for easier backporting to the stable
releases.
Signed-off-by: Robin Holt <holt@sgi.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Russ Anderson <rja@sgi.com>
Cc: Robin Holt <holt@sgi.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Philippe De Muyter [Mon, 8 Jul 2013 23:01:30 +0000 (16:01 -0700)]
partitions/msdos: enumerate also AIX LVM partitions
Graft AIX partition enumeration into partitions/msdos.c.
There is already AIX disk detection logic in msdos.c. When an AIX disk
has been found, and if configured to, call the AIX partition recognizer.
This avoids removing the AIX disk protection from msdos.c, avoids code
duplication, and ensures that AIX partition enumeration is called before
plain msdos partition enumeration.
Signed-off-by: Philippe De Muyter <phdm@macqel.be>
Cc: Karel Zak <kzak@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Philippe De Muyter [Mon, 8 Jul 2013 23:01:29 +0000 (16:01 -0700)]
partitions: add aix lvm partition support files
Add partitions/aix.h and partitions/aix.c.
AIX LVM allows creating "logical volumes" which are made of multiple
slices of multiple disks. The new code allows access only to the
"logical volumes" which are made of one slice on the probed disk, a
slice being a contiguous disk area. The code also detects "logical
volumes" made of multiple slices on the probed disk, but cannot
describe them to the partition layer, because the generic partition
layer code does not support that. When such non-contiguous "logical
volumes" are detected, a diagnostic message is printed.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Philippe De Muyter <phdm@macqel.be>
Cc: Karel Zak <kzak@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Philippe De Muyter [Mon, 8 Jul 2013 23:01:28 +0000 (16:01 -0700)]
partitions/msdos.c: end-of-line whitespace and semicolon cleanup
Signed-off-by: Philippe De Muyter <phdm@macqel.be>
Cc: Karel Zak <kzak@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dan Carpenter [Mon, 8 Jul 2013 23:01:27 +0000 (16:01 -0700)]
mwave: fix info leak in mwave_ioctl()
Smatch complains that on 64 bit systems, there is a hole in the
MW_ABILITIES struct between ->component_count and ->component_list[].
It leaks stack information from the mwave_ioctl() function.
I've added a memset() to initialize the struct to zero.
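The kind of hole described above, and the memset() remedy, can be sketched as follows. struct demo_abilities is a hypothetical layout chosen so that padding appears on typical 64-bit ABIs. It is not the real MW_ABILITIES definition, and the function names are illustrative.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: a small field followed by a more strictly aligned
 * one, so the compiler inserts padding between them on typical 64-bit ABIs. */
struct demo_abilities {
	unsigned short component_count;
	unsigned long component_list[7];
};

/* Size of the hole between the two members, in bytes. */
static size_t demo_hole_size(void)
{
	return offsetof(struct demo_abilities, component_list)
	       - (offsetof(struct demo_abilities, component_count)
		  + sizeof(unsigned short));
}

/* Fill a reply the safe way: zero the whole object, padding included,
 * before assigning members, then copy it out as raw bytes. */
static void demo_fill_reply(unsigned char *out)
{
	struct demo_abilities a;

	memset(&a, 0, sizeof(a));
	a.component_count = 7;
	memcpy(out, &a, sizeof(a));
}
```

Assigning every member individually would not help, because member stores never write the padding bytes. Only clearing the whole object ensures that no stale stack contents are copied out with it.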
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Greg KH <greg@kroah.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Manfred Spraul [Mon, 8 Jul 2013 23:01:26 +0000 (16:01 -0700)]
ipc/sem.c: rename try_atomic_semop() to perform_atomic_semop(), docu update
Cleanup: Some minor points that I noticed while writing the previous
patches
1) The name try_atomic_semop() is misleading: The function performs the
operation (if it is possible).
2) Some documentation updates.
No real code change, a rename and documentation changes.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Manfred Spraul [Mon, 8 Jul 2013 23:01:25 +0000 (16:01 -0700)]
ipc/sem.c: replace shared sem_otime with per-semaphore value
sem_otime contains the time of the last semaphore operation that
completed successfully. Every operation updates this value, thus access
from multiple cpus can cause thrashing.
Therefore the patch replaces the variable with a per-semaphore variable.
The per-array sem_otime is only calculated when required.
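One way to derive the per-array value on demand is to take the most recent of the per-semaphore times. That this is the rule used is an assumption. struct demo_sem and demo_get_semotime() are hypothetical names for a userspace sketch.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical per-semaphore state: each semaphore records its own
 * last-operation time, so CPUs working on different semaphores do not
 * all write to one shared variable. */
struct demo_sem {
	int semval;
	time_t sem_otime;
};

/* Derive the array-wide value only when asked for it: the most recent
 * per-semaphore time, or 0 if no operation has completed yet. */
static time_t demo_get_semotime(const struct demo_sem *sems, size_t nsems)
{
	time_t latest = 0;

	for (size_t i = 0; i < nsems; i++) {
		if (sems[i].sem_otime > latest)
			latest = sems[i].sem_otime;
	}
	return latest;
}
```

The scan costs time proportional to the number of semaphores, but it runs only on the rare query path. The frequent update path touches just the semaphore being operated on.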
No performance improvement on a single-socket i3 - only important for
larger systems.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
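A minimal userspace sketch of the idea, assuming the per-array value is derived as the most recent of the per-semaphore timestamps. struct sem_otime_demo and array_otime_demo() are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical per-semaphore state: each semaphore keeps its own timestamp. */
struct sem_otime_demo {
    int    semval;
    time_t sem_otime;   /* time of the last successful operation on it */
};

/*
 * Compute the per-array value on demand: the latest per-semaphore
 * timestamp. Returns 0 for an empty array.
 */
static time_t array_otime_demo(const struct sem_otime_demo *sems, size_t nsems)
{
    time_t latest = 0;

    for (size_t i = 0; i < nsems; i++) {
        if (sems[i].sem_otime > latest)
            latest = sems[i].sem_otime;
    }
    return latest;
}
```

Writers touch only the timestamp of the semaphore they operate on, so the cost of combining the values is paid by the rare reader instead of by every operation.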
Manfred Spraul [Mon, 8 Jul 2013 23:01:24 +0000 (16:01 -0700)]
ipc/sem.c: always use only one queue for alter operations
There are two places that can contain alter operations:
- the global queue: sma->pending_alter
- the per-semaphore queues: sma->sem_base[].pending_alter.
Since one of the queues must be processed first, this causes an odd
prioritization of the wakeups: complex operations have priority over
simple ops.
The patch restores the behavior of linux <=3.0.9: The longest waiting
operation has the highest priority.
This is done by using only one queue:
- if there are complex ops, then sma->pending_alter is used.
- otherwise, the per-semaphore queues are used.
As a side effect, do_smart_update_queue() becomes much simpler: no more
goto logic.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
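The queue-selection rule stated above can be sketched as follows. This is a simplified illustration, not the kernel code: the *_demo types, the complex_count field and alter_queue_for() are assumptions introduced here.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified queue and semaphore types. */
struct queue_demo {
    int nwaiters;
};

struct sem_slot_demo {
    struct queue_demo pending_alter;     /* per-semaphore queue */
};

struct sem_array_demo {
    struct queue_demo     pending_alter; /* global queue */
    int                   complex_count; /* assumed: sleeping complex ops */
    struct sem_slot_demo *sem_base;
    size_t                nsems;
};

/*
 * Pick the single queue that holds alter operations right now:
 * the global one while any complex op is pending, otherwise the
 * queue of the semaphore being operated on.
 */
static struct queue_demo *alter_queue_for(struct sem_array_demo *sma,
                                          size_t semnum)
{
    if (sma->complex_count > 0)
        return &sma->pending_alter;
    return &sma->sem_base[semnum].pending_alter;
}
```

Because all alter operations live in exactly one place at any time, scanning that queue in order wakes the longest waiter first.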
Manfred Spraul [Mon, 8 Jul 2013 23:01:23 +0000 (16:01 -0700)]
ipc/sem: separate wait-for-zero and alter tasks into separate queues
Introduce separate queues for operations that do not modify the
semaphore values. Advantages:
- Simpler logic in check_restart().
- Faster update_queue(): Right now, all wait-for-zero operations are
always tested, even if the semaphore value is not 0.
- wait-for-zero gets priority again, as in linux <=3.0.9
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
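The update_queue() saving listed above can be illustrated with a small sketch. The struct and its counters are hypothetical; the only point is that a dedicated wait-for-zero queue can be skipped entirely while the semaphore value is nonzero.

```c
#include <stddef.h>

/* Hypothetical per-semaphore state with two separate waiter queues. */
struct sem_queues_demo {
    int    semval;
    size_t n_wait_zero;   /* sleepers waiting for semval == 0 */
    size_t n_alter;       /* sleepers that want to change semval */
};

/*
 * Number of wait-for-zero sleepers that have to be examined after an
 * update. With a separate queue, none are touched unless the value
 * actually reached zero.
 */
static size_t wait_zero_to_scan(const struct sem_queues_demo *s)
{
    return s->semval == 0 ? s->n_wait_zero : 0;
}
```

With a single mixed queue, every wait-for-zero entry would be visited on each pass regardless of semval.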
Manfred Spraul [Mon, 8 Jul 2013 23:01:22 +0000 (16:01 -0700)]
ipc/sem.c: cacheline align the semaphore structures
As now each semaphore has its own spinlock and parallel operations are
possible, give each semaphore its own cacheline.
On an i3 laptop, this gives up to 28% better performance:
#semscale 10 | grep "interleave 2"
- before:
Cpus 1, interleave 2 delay 0:
36109234 in 10 secs
Cpus 2, interleave 2 delay 0:
55276317 in 10 secs
Cpus 3, interleave 2 delay 0:
62411025 in 10 secs
Cpus 4, interleave 2 delay 0:
81963928 in 10 secs
- after:
Cpus 1, interleave 2 delay 0:
35527306 in 10 secs
Cpus 2, interleave 2 delay 0:
70922909 in 10 secs <<< + 28%
Cpus 3, interleave 2 delay 0:
80518538 in 10 secs
Cpus 4, interleave 2 delay 0:
89115148 in 10 secs <<< + 8.7%
i3, with 2 cores and with hyperthreading enabled. Interleave 2 is used in
order to use the full cores first. HT partially hides the delay from cacheline
thrashing, thus the improvement is "only" 8.7% if 4 threads are running.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
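A userspace sketch of what giving each semaphore its own cacheline means, using C11 alignas. The 64-byte line size, struct sem_padded_demo and same_cacheline() are assumptions for illustration; the kernel uses its own alignment annotations.

```c
#include <stdalign.h>
#include <stdint.h>

#define DEMO_CACHELINE 64   /* assumed line size; varies by CPU */

/* Hypothetical semaphore: alignment pads sizeof() up to a full line. */
struct sem_padded_demo {
    alignas(DEMO_CACHELINE) int semval;
    int lock_word;          /* stand-in for the per-semaphore spinlock */
};

/* True if both addresses fall into the same cacheline. */
static int same_cacheline(const void *a, const void *b)
{
    return (uintptr_t)a / DEMO_CACHELINE == (uintptr_t)b / DEMO_CACHELINE;
}
```

In an array of struct sem_padded_demo, neighbouring elements never share a line, so two CPUs spinning on different semaphores do not invalidate each other's cache.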
Manfred Spraul [Mon, 8 Jul 2013 23:01:20 +0000 (16:01 -0700)]
ipc/util.c, ipc_rcu_alloc: cacheline align allocation
Enforce that ipc_rcu_alloc returns a cacheline aligned pointer on SMP.
Rationale:
The SysV sem code tries to move the main spinlock into a separate
cacheline (____cacheline_aligned_in_smp). This works only if
ipc_rcu_alloc returns cacheline aligned pointers. vmalloc and kmalloc
return cacheline aligned pointers, but the implementation of ipc_rcu_alloc
breaks that.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
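One way an allocator that prepends a bookkeeping header can still hand out aligned pointers is to round the header space up to a full cacheline. The sketch below is a userspace illustration under that assumption; struct hdr_demo, demo_alloc() and demo_free() are hypothetical, and the 64-byte line size is assumed.

```c
#include <stdint.h>
#include <stdlib.h>

#define DEMO_CACHELINE 64u   /* assumed line size */

/* Round n up to the next multiple of the cacheline size. */
#define DEMO_ROUND_UP(n) \
    (((n) + DEMO_CACHELINE - 1) & ~(size_t)(DEMO_CACHELINE - 1))

/* Hypothetical bookkeeping header stored in front of the payload. */
struct hdr_demo {
    int refcount;
};

/* Header space padded to a whole line so the payload stays aligned. */
#define DEMO_HDR_SPACE DEMO_ROUND_UP(sizeof(struct hdr_demo))

static void *demo_alloc(size_t size)
{
    /* aligned_alloc() wants a size that is a multiple of the alignment. */
    size_t total = DEMO_ROUND_UP(DEMO_HDR_SPACE + size);
    unsigned char *base = aligned_alloc(DEMO_CACHELINE, total);

    if (!base)
        return NULL;
    ((struct hdr_demo *)base)->refcount = 1;
    return base + DEMO_HDR_SPACE;   /* aligned: base and offset both are */
}

static void demo_free(void *payload)
{
    if (payload)
        free((unsigned char *)payload - DEMO_HDR_SPACE);
}
```

If the header were left at its natural size, the payload would start a few bytes into a line and any alignment annotations on its members would no longer hold.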
Davidlohr Bueso [Mon, 8 Jul 2013 23:01:19 +0000 (16:01 -0700)]
ipc: remove unused functions
We can now drop the msg_lock and msg_lock_check functions along with a
bogus comment introduced previously in semctl_down.
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Davidlohr Bueso [Mon, 8 Jul 2013 23:01:18 +0000 (16:01 -0700)]
ipc,msg: shorten critical region in msgrcv
do_msgrcv() is the last msg queue function that abuses the ipc lock. Take
it only when needed, i.e. when actually updating msq.
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Rik van Riel <riel@redhat.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Davidlohr Bueso [Mon, 8 Jul 2013 23:01:17 +0000 (16:01 -0700)]
ipc,msg: shorten critical region in msgsnd
do_msgsnd() is another function that does too many things with the ipc
object lock acquired. Take it only when needed, i.e. when actually updating
msq.
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>