Vineeth Remanan Pillai [Tue, 7 Feb 2017 18:59:01 +0000 (18:59 +0000)]
xen-netfront: Rework the fix for Rx stall during OOM and network stress
The commit
90c311b0eeea ("xen-netfront: Fix Rx stall during network
stress and OOM") caused the refill timer to be triggerred almost on
all invocations of xennet_alloc_rx_buffers for certain workloads.
This reworks the fix by reverting to the old behaviour and taking into
consideration the skb allocation failure. Refill timer is now triggered
on insufficient requests or skb allocation failure.
Signed-off-by: Vineeth Remanan Pillai <vineethp@amazon.com>
Fixes:
90c311b0eeea (xen-netfront: Fix Rx stall during network stress and OOM)
Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 9 Feb 2017 03:05:26 +0000 (19:05 -0800)]
net: phy: Fix PHY module checks and NULL deref in phy_attach_direct()
The Generic PHY drivers gets assigned after we checked that the current
PHY driver is NULL, so we need to check a few things before we can
safely dereference d->driver. This would be causing a NULL deference to
occur when a system binds to the Generic PHY driver. Update
phy_attach_direct() to do the following:
- grab the driver module reference after we have assigned the Generic
PHY drivers accordingly, and remember we came from the generic PHY
path
- update the error path to clean up the module reference in case the
Generic PHY probe function fails
- split the error path involving phy_detacht() to avoid double free/put
since phy_detach() does all the clean up
- finally, have phy_detach() drop the module reference count before we
call device_release_driver() for the Generic PHY driver case
Fixes:
cafe8df8b9bc ("net: phy: Fix lack of reference count on PHY driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thanneeru Srinivasulu [Wed, 8 Feb 2017 12:39:00 +0000 (18:09 +0530)]
net: thunderx: Fix PHY autoneg for SGMII QLM mode
This patch fixes the case where there is no phydev attached
to a LMAC in DT due to non-existance of a PHY driver or due
to usage of non-stanadard PHY which doesn't support autoneg.
Changes dependeds on firmware to send correct info w.r.t
PHY and autoneg capability.
This patch also covers a case where a 10G/40G interface is used
as a 1G with convertors with Cortina PHY in between.
Signed-off-by: Thanneeru Srinivasulu <tsrinivasulu@cavium.com>
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Wed, 8 Feb 2017 07:10:13 +0000 (23:10 -0800)]
net: dsa: Do not destroy invalid network devices
dsa_slave_create() can fail, and dsa_user_port_unapply() will properly check
for the network device not being NULL before attempting to destroy it. We were
not setting the slave network device as NULL if dsa_slave_create() failed, so
we would later on be calling dsa_slave_destroy() on a now free'd and
unitialized network device, causing crashes in dsa_slave_destroy().
Fixes:
83c0afaec7b7 ("net: dsa: Add new binding implementation")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Tue, 7 Feb 2017 20:59:46 +0000 (12:59 -0800)]
ping: fix a null pointer dereference
Andrey reported a kernel crash:
general protection fault: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in:
CPU: 2 PID: 3880 Comm: syz-executor1 Not tainted 4.10.0-rc6+ #124
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task:
ffff880060048040 task.stack:
ffff880069be8000
RIP: 0010:ping_v4_push_pending_frames net/ipv4/ping.c:647 [inline]
RIP: 0010:ping_v4_sendmsg+0x1acd/0x23f0 net/ipv4/ping.c:837
RSP: 0018:
ffff880069bef8b8 EFLAGS:
00010206
RAX:
dffffc0000000000 RBX:
ffff880069befb90 RCX:
0000000000000000
RDX:
0000000000000018 RSI:
ffff880069befa30 RDI:
00000000000000c2
RBP:
ffff880069befbb8 R08:
0000000000000008 R09:
0000000000000000
R10:
0000000000000002 R11:
0000000000000000 R12:
ffff880069befab0
R13:
ffff88006c624a80 R14:
ffff880069befa70 R15:
0000000000000000
FS:
00007f6f7c716700(0000) GS:
ffff88006de00000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
00000000004a6f28 CR3:
000000003a134000 CR4:
00000000000006e0
Call Trace:
inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
sock_sendmsg_nosec net/socket.c:635 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:645
SYSC_sendto+0x660/0x810 net/socket.c:1687
SyS_sendto+0x40/0x50 net/socket.c:1655
entry_SYSCALL_64_fastpath+0x1f/0xc2
This is because we miss a check for NULL pointer for skb_peek() when
the queue is empty. Other places already have the same check.
Fixes:
c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind")
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 8 Feb 2017 18:56:38 +0000 (13:56 -0500)]
Merge branch 'net-header-length-truncation'
Willem de Bruijn says:
====================
net: Fixes for header length truncation
Packets should not enter the stack with truncated link layer headers
and link layer headers should always be stored in the skb linear
segment.
Patch 1 ensures the first for PF_PACKET sockets
Patch 2 ensures the second for PF_PACKET GSO sockets without tx_ring
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Willem de Bruijn [Tue, 7 Feb 2017 20:57:21 +0000 (15:57 -0500)]
packet: round up linear to header len
Link layer protocols may unconditionally pull headers, as Ethernet
does in eth_type_trans. Ensure that the entire link layer header
always lies in the skb linear segment. tpacket_snd has such a check.
Extend this to packet_snd.
Variable length link layer headers complicate the computation
somewhat. Here skb->len may be smaller than dev->hard_header_len.
Round up the linear length to be at least as long as the smallest of
the two.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Willem de Bruijn [Tue, 7 Feb 2017 20:57:20 +0000 (15:57 -0500)]
net: introduce device min_header_len
The stack must not pass packets to device drivers that are shorter
than the minimum link layer header length.
Previously, packet sockets would drop packets smaller than or equal
to dev->hard_header_len, but this has false positives. Zero length
payload is used over Ethernet. Other link layer protocols support
variable length headers. Support for validation of these protocols
removed the min length check for all protocols.
Introduce an explicit dev->min_header_len parameter and drop all
packets below this value. Initially, set it to non-zero only for
Ethernet and loopback. Other protocols can follow in a patch to
net-next.
Fixes:
9ed988cd5915 ("packet: validate variable length ll headers")
Reported-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Wed, 8 Feb 2017 18:02:13 +0000 (10:02 -0800)]
sit: fix a double free on error path
Dmitry reported a double free in sit_init_net():
kernel BUG at mm/percpu.c:689!
invalid opcode: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 15692 Comm: syz-executor1 Not tainted 4.10.0-rc6-next-
20170206 #1
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 01/01/2011
task:
ffff8801c9cc27c0 task.stack:
ffff88017d1d8000
RIP: 0010:pcpu_free_area+0x68b/0x810 mm/percpu.c:689
RSP: 0018:
ffff88017d1df488 EFLAGS:
00010046
RAX:
0000000000010000 RBX:
00000000000007c0 RCX:
ffffc90002829000
RDX:
0000000000010000 RSI:
ffffffff81940efb RDI:
ffff8801db841d94
RBP:
ffff88017d1df590 R08:
dffffc0000000000 R09:
1ffffffff0bb3bdd
R10:
dffffc0000000000 R11:
00000000000135dd R12:
ffff8801db841d80
R13:
0000000000038e40 R14:
00000000000007c0 R15:
00000000000007c0
FS:
00007f6ea608f700(0000) GS:
ffff8801dbe00000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
000000002000aff8 CR3:
00000001c8d44000 CR4:
00000000001426f0
DR0:
0000000020000000 DR1:
0000000020000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000ffff0ff0 DR7:
0000000000000600
Call Trace:
free_percpu+0x212/0x520 mm/percpu.c:1264
ipip6_dev_free+0x43/0x60 net/ipv6/sit.c:1335
sit_init_net+0x3cb/0xa10 net/ipv6/sit.c:1831
ops_init+0x10a/0x530 net/core/net_namespace.c:115
setup_net+0x2ed/0x690 net/core/net_namespace.c:291
copy_net_ns+0x26c/0x530 net/core/net_namespace.c:396
create_new_namespaces+0x409/0x860 kernel/nsproxy.c:106
unshare_nsproxy_namespaces+0xae/0x1e0 kernel/nsproxy.c:205
SYSC_unshare kernel/fork.c:2281 [inline]
SyS_unshare+0x64e/0xfc0 kernel/fork.c:2231
entry_SYSCALL_64_fastpath+0x1f/0xc2
This is because when tunnel->dst_cache init fails, we free dev->tstats
once in ipip6_tunnel_init() and twice in sit_init_net(). This looks
redundant but its ndo_uinit() does not seem enough to clean up everything
here. So avoid this by setting dev->tstats to NULL after the first free,
at least for -net.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Wed, 8 Feb 2017 17:29:00 +0000 (09:29 -0800)]
lwtunnel: valid encap attr check should return 0 when lwtunnel is disabled
An error was reported upgrading to 4.9.8:
root@Typhoon:~# ip route add default table 210 nexthop dev eth0 via 10.68.64.1
weight 1 nexthop dev eth0 via 10.68.64.2 weight 1
RTNETLINK answers: Operation not supported
The problem occurs when CONFIG_LWTUNNEL is not enabled and a multipath
route is submitted.
The point of lwtunnel_valid_encap_type_attr is catch modules that
need to be loaded before any references are taken with rntl held. With
CONFIG_LWTUNNEL disabled, there will be no modules to load so the
lwtunnel_valid_encap_type_attr stub should just return 0.
Fixes:
9ed59592e3e3 ("lwtunnel: fix autoload of lwt modules")
Reported-by: pupilla@libero.it
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcus Huewe [Mon, 6 Feb 2017 17:34:56 +0000 (18:34 +0100)]
ipv6: addrconf: fix generation of new temporary addresses
Under some circumstances it is possible that no new temporary addresses
will be generated.
For instance, addrconf_prefix_rcv_add_addr() indirectly calls
ipv6_create_tempaddr(), which creates a tentative temporary address and
starts dad. Next, addrconf_prefix_rcv_add_addr() indirectly calls
addrconf_verify_rtnl(). Now, assume that the previously created temporary
address has the least preferred lifetime among all existing addresses and
is still tentative (that is, dad is still running). Hence, the next run of
addrconf_verify_rtnl() is performed when the preferred lifetime of the
temporary address ends. If dad succeeds before the next run, the temporary
address becomes deprecated during the next run, but no new temporary
address is generated.
In order to fix this, schedule the next addrconf_verify_rtnl() run slightly
before the temporary address becomes deprecated, if dad succeeded.
Signed-off-by: Marcus Huewe <suse-tux@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Tue, 7 Feb 2017 20:10:57 +0000 (12:10 -0800)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Load correct firmware in rtl8192ce wireless driver, from Jurij
Smakov.
2) Fix leak of tx_ring and tx_cq due to overwriting in mlx4 driver,
from Martin KaFai Lau.
3) Need to reference count PHY driver module when it is attached, from
Mao Wenan.
4) Don't do zero length vzalloc() in ethtool register dump, from
Stanislaw Gruszka.
5) Defer net_disable_timestamp() to a workqueue to get out of locking
issues, from Eric Dumazet.
6) We cannot drop the SKB dst when IP options refer to them, fix also
from Eric Dumazet.
7) Incorrect packet header offset calculations in ip6_gre, again from
Eric Dumazet.
8) Missing tcp_v6_restore_cb() causes use-after-free, from Eric too.
9) tcp_splice_read() can get into an infinite loop with URG, and hey
it's from Eric once more.
10) vnet_hdr_sz can change asynchronously, so read it once during
decision making in macvtap and tun, from Willem de Bruijn.
11) Can't use kernel stack for DMA transfers in USB networking drivers,
from Ben Hutchings.
12) Handle csum errors properly in UDP by calling the proper destructor,
from Eric Dumazet.
13) For non-deterministic softirq run when scheduling NAPI from a
workqueue in mlx4, from Benjamin Poirier.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (28 commits)
sctp: check af before verify address in sctp_addr_id2transport
sctp: avoid BUG_ON on sctp_wait_for_sndbuf
mlx4: Invoke softirqs after napi_reschedule
udp: properly cope with csum errors
catc: Use heap buffer for memory size test
catc: Combine failure cleanup code in catc_probe()
rtl8150: Use heap buffers for all register access
pegasus: Use heap buffers for all register access
macvtap: read vnet_hdr_size once
tun: read vnet_hdr_sz once
tcp: avoid infinite loop in tcp_splice_read()
hns: avoid stack overflow with CONFIG_KASAN
ipv6: Fix IPv6 packet loss in scenarios involving roaming + snooping switches
ipv6: tcp: add a missing tcp_v6_restore_cb()
nl80211: Fix mesh HT operation check
mac80211: Fix adding of mesh vendor IEs
mac80211: Allocate a sync skcipher explicitly for FILS AEAD
mac80211: Fix FILS AEAD protection in Association Request frame
ip6_gre: fix ip6gre_err() invalid reads
netlabel: out of bound access in cipso_v4_validate()
...
Hugh Dickins [Tue, 7 Feb 2017 19:11:16 +0000 (11:11 -0800)]
mm: fix KPF_SWAPCACHE in /proc/kpageflags
Commit
6326fec1122c ("mm: Use owner_priv bit for PageSwapCache, valid
when PageSwapBacked") aliased PG_swapcache to PG_owner_priv_1 (and
depending on PageSwapBacked being true).
As a result, the KPF_SWAPCACHE bit in '/proc/kpageflags' should now be
synthesized, instead of being shown on unrelated pages which just happen
to have PG_owner_priv_1 set.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Xin Long [Tue, 7 Feb 2017 12:56:08 +0000 (20:56 +0800)]
sctp: check af before verify address in sctp_addr_id2transport
Commit
6f29a1306131 ("sctp: sctp_addr_id2transport should verify the
addr before looking up assoc") invoked sctp_verify_addr to verify the
addr.
But it didn't check af variable beforehand, once users pass an address
with family = 0 through sockopt, sctp_get_af_specific will return NULL
and NULL pointer dereference will be caused by af->sockaddr_len.
This patch is to fix it by returning NULL if af variable is NULL.
Fixes:
6f29a1306131 ("sctp: sctp_addr_id2transport should verify the addr before looking up assoc")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vineet Gupta [Tue, 7 Feb 2017 17:44:58 +0000 (09:44 -0800)]
ARC: [arcompact] brown paper bag bug in unaligned access delay slot fixup
Reported-by: Jo-Philipp Wich <jo@mein.io>
Fixes:
9aed02feae57bf7 ("ARC: [arcompact] handle unaligned access delay slot")
Cc: linux-kernel@vger.kernel.org
Cc: linux-snps-arc@lists.infradead.org
Cc: stable@vger.kernel.org
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Marcelo Ricardo Leitner [Mon, 6 Feb 2017 20:10:31 +0000 (18:10 -0200)]
sctp: avoid BUG_ON on sctp_wait_for_sndbuf
Alexander Popov reported that an application may trigger a BUG_ON in
sctp_wait_for_sndbuf if the socket tx buffer is full, a thread is
waiting on it to queue more data and meanwhile another thread peels off
the association being used by the first thread.
This patch replaces the BUG_ON call with a proper error handling. It
will return -EPIPE to the original sendmsg call, similarly to what would
have been done if the association wasn't found in the first place.
Acked-by: Alexander Popov <alex.popov@linux.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Benjamin Poirier [Mon, 6 Feb 2017 18:14:31 +0000 (10:14 -0800)]
mlx4: Invoke softirqs after napi_reschedule
mlx4 may schedule napi from a workqueue. Afterwards, softirqs are not run
in a deterministic time frame and the following message may be logged:
NOHZ: local_softirq_pending 08
The problem is the same as what was described in commit
ec13ee80145c
("virtio_net: invoke softirqs after __napi_schedule") and this patch
applies the same fix to mlx4.
Fixes:
07841f9d94c1 ("net/mlx4_en: Schedule napi when RX buffers allocation fails")
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sun, 5 Feb 2017 17:25:24 +0000 (09:25 -0800)]
udp: properly cope with csum errors
Dmitry reported that UDP sockets being destroyed would trigger the
WARN_ON(atomic_read(&sk->sk_rmem_alloc)); in inet_sock_destruct()
It turns out we do not properly destroy skb(s) that have wrong UDP
checksum.
Thanks again to syzkaller team.
Fixes :
7c13f97ffde6 ("udp: do fwd memory scheduling on dequeue")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 7 Feb 2017 15:07:03 +0000 (10:07 -0500)]
Merge branch 'net-Fix-on-stack-USB-buffers'
Ben Hutchings says:
====================
net: Fix on-stack USB buffers
Allocating USB buffers on the stack is not portable, and no longer
works on x86_64 (with VMAP_STACK enabled as per default). This
series fixes all the instances I could find where USB networking
drivers do that.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Sat, 4 Feb 2017 16:57:04 +0000 (16:57 +0000)]
catc: Use heap buffer for memory size test
Allocating USB buffers on the stack is not portable, and no longer
works on x86_64 (with VMAP_STACK enabled as per default).
Fixes:
1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Sat, 4 Feb 2017 16:56:56 +0000 (16:56 +0000)]
catc: Combine failure cleanup code in catc_probe()
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Sat, 4 Feb 2017 16:56:32 +0000 (16:56 +0000)]
rtl8150: Use heap buffers for all register access
Allocating USB buffers on the stack is not portable, and no longer
works on x86_64 (with VMAP_STACK enabled as per default).
Fixes:
1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Sat, 4 Feb 2017 16:56:03 +0000 (16:56 +0000)]
pegasus: Use heap buffers for all register access
Allocating USB buffers on the stack is not portable, and no longer
works on x86_64 (with VMAP_STACK enabled as per default).
Fixes:
1da177e4c3f4 ("Linux-2.6.12-rc2")
References: https://bugs.debian.org/852556
Reported-by: Lisandro Damián Nicanor Pérez Meyer <lisandro@debian.org>
Tested-by: Lisandro Damián Nicanor Pérez Meyer <lisandro@debian.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 7 Feb 2017 03:41:27 +0000 (22:41 -0500)]
Merge branch 'read-vnet_hdr_sz-once'
Willem de Bruijn says:
====================
read vnet_hdr_sz once
Tuntap devices allow concurrent use and update of field vnet_hdr_sz.
Read the field once to avoid TOCTOU.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Willem de Bruijn [Fri, 3 Feb 2017 23:20:49 +0000 (18:20 -0500)]
macvtap: read vnet_hdr_size once
When IFF_VNET_HDR is enabled, a virtio_net header must precede data.
Data length is verified to be greater than or equal to expected header
length tun->vnet_hdr_sz before copying.
Macvtap functions read the value once, but unless READ_ONCE is used,
the compiler may ignore this and read multiple times. Enforce a single
read and locally cached value to avoid updates between test and use.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Willem de Bruijn [Fri, 3 Feb 2017 23:20:48 +0000 (18:20 -0500)]
tun: read vnet_hdr_sz once
When IFF_VNET_HDR is enabled, a virtio_net header must precede data.
Data length is verified to be greater than or equal to expected header
length tun->vnet_hdr_sz before copying.
Read this value once and cache locally, as it can be updated between
the test and use (TOCTOU).
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
CC: Eric Dumazet <edumazet@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 3 Feb 2017 22:59:38 +0000 (14:59 -0800)]
tcp: avoid infinite loop in tcp_splice_read()
Splicing from TCP socket is vulnerable when a packet with URG flag is
received and stored into receive queue.
__tcp_splice_read() returns 0, and sk_wait_data() immediately
returns since there is the problematic skb in queue.
This is a nice way to burn cpu (aka infinite loop) and trigger
soft lockups.
Again, this gem was found by syzkaller tool.
Fixes:
9c55e01c0cc8 ("[TCP]: Splice receive support.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Tue, 7 Feb 2017 03:36:04 +0000 (19:36 -0800)]
Merge branch 'libnvdimm-fixes' of git://git./linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
"None of these are showstoppers for 4.10 and could wait for 4.11 merge
window, but they are low enough risk for this late in the cycle and
the fixes have waiting users . They have received a build success
notification from the 0day robot, pass the latest ndctl unit tests,
and appeared in next:
- Fix a crash that can result when SIGINT is sent to a process that
is awaiting completion of an address range scrub command. We were
not properly cleaning up the workqueue after
wait_event_interruptible().
- Fix a memory hotplug failure condition that results from not
reserving enough space out of persistent memory for the memmap. By
default we align to 2M allocations that the memory hotplug code
assumes, but if the administrator specifies a non-default
4K-alignment then we can fail to correctly size the reservation.
- A one line fix to improve the predictability of libnvdimm block
device names. A common operation is to reconfigure /dev/pmem0 into
a different mode. For example, a reconfiguration might set a new
mode that reserves some of the capacity for a struct page memmap
array. It surprises users if the device name changes to
"/dev/pmem0.1" after the mode change and then back to /dev/pmem0
after a reboot.
- Add 'const' to some function pointer tables"
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
libnvdimm, pfn: fix memmap reservation size versus 4K alignment
acpi, nfit: fix acpi_nfit_flush_probe() crash
libnvdimm, namespace: do not delete namespace-id 0
nvdimm: constify device_type structures
Linus Torvalds [Mon, 6 Feb 2017 23:11:04 +0000 (15:11 -0800)]
Merge tag 'pm-4.10-rc8' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These add a quirk to intel_pstate to work around a firmware setting
that leads to frequency scaling issues (discovered recently) on some
Intel Kaby Lake processors, fix up the recently added brcmstb-avs
cpufreq driver and avoid false-positive warnings from the runtime PM
framework triggered by recent changes in i915.
Specifics:
- Add an intel_pstate driver quirk to work around a firmware setting
that leads to frequency scaling issues on desktop Intel Kaby Lake
processors in some configurations if the hardware-managed P-states
(HWP) feature is in use (Srinivas Pandruvada)
- Fix up the recently added brcmstb-avs cpufreq driver: fix a bug
related to system suspend and change the sysfs interface to match
the user space expectations (Markus Mayer)
- Modify the runtime PM framework to avoid false-positive warnings
from the might_sleep_if() assertions in it (Rafael Wysocki)"
* tag 'pm-4.10-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / runtime: Avoid false-positive warnings from might_sleep_if()
cpufreq: intel_pstate: Disable energy efficiency optimization
cpufreq: brcmstb-avs-cpufreq: properly retrieve P-state upon suspend
cpufreq: brcmstb-avs-cpufreq: extend sysfs entry brcm_avs_pmap
Linus Torvalds [Mon, 6 Feb 2017 22:42:34 +0000 (14:42 -0800)]
Merge tag 'dm-4.10-fixes' of git://git./linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- a fix for a race in .request_fn request-based DM request handling vs
DM device destruction
- an RCU fix for dm-crypt's kernel keyring support that was included in
4.10-rc1
- a -Wbool-operation warning fix for DM multipath
* tag 'dm-4.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm crypt: replace RCU read-side section with rwsem
dm rq: cope with DM device destruction while in dm_old_request_fn()
dm mpath: cleanup -Wbool-operation warning in choose_pgpath()
Linus Torvalds [Mon, 6 Feb 2017 22:37:55 +0000 (14:37 -0800)]
Merge tag 'media/v4.10-3' of git://git./linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
"A few documentation fixes at CEC (with got promoted from staging for
4.10), and one fix on its core."
* tag 'media/v4.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
[media] cec: fix wrong last_la determination
[media] cec-intro.rst: mention the v4l-utils package and CEC utilities
[media] cec rst: remove "This API is not yet finalized" notice
Linus Torvalds [Mon, 6 Feb 2017 22:16:23 +0000 (14:16 -0800)]
Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
- use-after-free in algif_aead
- modular aesni regression when pcbc is modular but absent
- bug causing IO page faults in ccp
- double list add in ccp
- NULL pointer dereference in qat (two patches)
- panic in chcr
- NULL pointer dereference in chcr
- out-of-bound access in chcr
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: chcr - Fix key length for RFC4106
crypto: algif_aead - Fix kernel panic on list_del
crypto: aesni - Fix failure when pcbc module is absent
crypto: ccp - Fix double add when creating new DMA command
crypto: ccp - Fix DMA operations when IOMMU is enabled
crypto: chcr - Check device is allocated before use
crypto: chcr - Fix panic on dma_unmap_sg
crypto: qat - zero esram only for DH85x devices
crypto: qat - fix bar discovery for c62x
Arnd Bergmann [Fri, 3 Feb 2017 16:35:46 +0000 (17:35 +0100)]
hns: avoid stack overflow with CONFIG_KASAN
The use of ACCESS_ONCE() looks like a micro-optimization to force gcc to use
an indexed load for the register address, but it has an absolutely detrimental
effect on builds with gcc-5 and CONFIG_KASAN=y, leading to a very likely
kernel stack overflow aside from very complex object code:
hisilicon/hns/hns_dsaf_gmac.c: In function 'hns_gmac_update_stats':
hisilicon/hns/hns_dsaf_gmac.c:419:1: error: the frame size of 2912 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
hisilicon/hns/hns_dsaf_ppe.c: In function 'hns_ppe_reset_common':
hisilicon/hns/hns_dsaf_ppe.c:390:1: error: the frame size of 1184 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
hisilicon/hns/hns_dsaf_ppe.c: In function 'hns_ppe_get_regs':
hisilicon/hns/hns_dsaf_ppe.c:621:1: error: the frame size of 3632 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
hisilicon/hns/hns_dsaf_rcb.c: In function 'hns_rcb_get_common_regs':
hisilicon/hns/hns_dsaf_rcb.c:970:1: error: the frame size of 2784 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
hisilicon/hns/hns_dsaf_gmac.c: In function 'hns_gmac_get_regs':
hisilicon/hns/hns_dsaf_gmac.c:641:1: error: the frame size of 5728 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
hisilicon/hns/hns_dsaf_rcb.c: In function 'hns_rcb_get_ring_regs':
hisilicon/hns/hns_dsaf_rcb.c:1021:1: error: the frame size of 2208 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
hisilicon/hns/hns_dsaf_main.c: In function 'hns_dsaf_comm_init':
hisilicon/hns/hns_dsaf_main.c:1209:1: error: the frame size of 1904 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
hisilicon/hns/hns_dsaf_xgmac.c: In function 'hns_xgmac_get_regs':
hisilicon/hns/hns_dsaf_xgmac.c:748:1: error: the frame size of 4704 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
hisilicon/hns/hns_dsaf_main.c: In function 'hns_dsaf_update_stats':
hisilicon/hns/hns_dsaf_main.c:2420:1: error: the frame size of 1088 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
hisilicon/hns/hns_dsaf_main.c: In function 'hns_dsaf_get_regs':
hisilicon/hns/hns_dsaf_main.c:2753:1: error: the frame size of 10768 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
This does not seem to happen any more with gcc-7, but removing the ACCESS_ONCE
seems safe anyway and it avoids a serious issue for some people. I have verified
that with gcc-5.3.1, the object code we get is better in the new version
both with and without CONFIG_KASAN, as we no longer allocate a 1344 byte
stack frame for hns_dsaf_get_regs() but otherwise have practically identical
object code.
With gcc-7.0.0, removing ACCESS_ONCE has no effect, the object code is already
good either way.
This patch is probably not urgent to get into 4.11 as only KASAN=y builds
with certain compilers are affected, but I still think it makes sense to
backport into older kernels.
Cc: stable@vger.kernel.org
Fixes:
511e6bc ("net: add Hisilicon Network Subsystem DSAF support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Lüssing [Fri, 3 Feb 2017 07:11:03 +0000 (08:11 +0100)]
ipv6: Fix IPv6 packet loss in scenarios involving roaming + snooping switches
When for instance a mobile Linux device roams from one access point to
another with both APs sharing the same broadcast domain and a
multicast snooping switch in between:
1) (c) <~~~> (AP1) <--[SSW]--> (AP2)
2) (AP1) <--[SSW]--> (AP2) <~~~> (c)
Then currently IPv6 multicast packets will get lost for (c) until an
MLD Querier sends its next query message. The packet loss occurs
because upon roaming the Linux host so far stayed silent regarding
MLD and the snooping switch will therefore be unaware of the
multicast topology change for a while.
This patch fixes this by always resending MLD reports when an interface
change happens, for instance from NO-CARRIER to CARRIER state.
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 6 Feb 2017 16:20:48 +0000 (11:20 -0500)]
Merge tag 'wireless-drivers-for-davem-2017-02-06' of git://git./linux/kernel/git/kvalo/wireless-drivers
Kalle Valo says:
====================
wireless-drivers fixes for 4.10
Only one important fix for rtlwifi which fixes a regression introduced
in 4.9 and which caused problems for many users.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 6 Feb 2017 15:55:08 +0000 (10:55 -0500)]
Merge tag 'mac80211-for-davem-2017-02-06' of git://git./linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
A few simple fixes:
* fix FILS AEAD cipher usage to use the correct AAD vectors
and to use synchronous algorithms
* fix using mesh HT operation data from userspace
* fix adding mesh vendor elements to beacons & plink frames
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 6 Feb 2017 04:23:22 +0000 (20:23 -0800)]
ipv6: tcp: add a missing tcp_v6_restore_cb()
Dmitry reported use-after-free in ip6_datagram_recv_specific_ctl()
A similar bug was fixed in commit
8ce48623f0cf ("ipv6: tcp: restore
IP6CB for pktoptions skbs"), but I missed another spot.
tcp_v6_syn_recv_sock() can indeed set np->pktoptions from ireq->pktopts
Fixes:
971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rafael J. Wysocki [Mon, 6 Feb 2017 13:52:10 +0000 (14:52 +0100)]
Merge branches 'pm-core-fixes' and 'pm-cpufreq-fixes'
* pm-core-fixes:
PM / runtime: Avoid false-positive warnings from might_sleep_if()
* pm-cpufreq-fixes:
cpufreq: intel_pstate: Disable energy efficiency optimization
cpufreq: brcmstb-avs-cpufreq: properly retrieve P-state upon suspend
cpufreq: brcmstb-avs-cpufreq: extend sysfs entry brcm_avs_pmap
Masashi Honma [Wed, 25 Jan 2017 23:56:13 +0000 (08:56 +0900)]
nl80211: Fix mesh HT operation check
A previous change to fix checks for NL80211_MESHCONF_HT_OPMODE
missed setting the flag when replacing FILL_IN_MESH_PARAM_IF_SET
with checking codes. This results in dropping the received HT
operation value when called by nl80211_update_mesh_config(). Fix
this by setting the flag properly.
Fixes:
9757235f451c ("nl80211: correct checks for NL80211_MESHCONF_HT_OPMODE value")
Signed-off-by: Masashi Honma <masashi.honma@gmail.com>
[rewrite commit message to use Fixes: line]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Thorsten Horstmann [Fri, 3 Feb 2017 13:38:29 +0000 (14:38 +0100)]
mac80211: Fix adding of mesh vendor IEs
The function ieee80211_ie_split_vendor doesn't return 0 on errors. Instead
it returns any offset < ielen when WLAN_EID_VENDOR_SPECIFIC is found. The
return value in mesh_add_vendor_ies must therefore be checked against
ifmsh->ie_len and not 0. Otherwise all ifmsh->ie starting with
WLAN_EID_VENDOR_SPECIFIC will be rejected.
Fixes:
082ebb0c258d ("mac80211: fix mesh beacon format")
Signed-off-by: Thorsten Horstmann <thorsten@defutech.de>
Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fit.fraunhofer.de>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
[sven@narfation.org: Add commit message]
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Jouni Malinen [Sat, 4 Feb 2017 16:08:42 +0000 (18:08 +0200)]
mac80211: Allocate a sync skcipher explicitly for FILS AEAD
The skcipher could have been of the async variant which may return from
skcipher_encrypt() with -EINPROGRESS after having queued the request.
The FILS AEAD implementation here does not have code for dealing with
that possibility, so allocate a sync cipher explicitly to avoid
potential issues with hardware accelerators.
This is based on the patch sent out by Ard.
Fixes:
39404feee691 ("mac80211: FILS AEAD protection for station mode association frames")
Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Jouni Malinen [Sat, 4 Feb 2017 11:59:22 +0000 (13:59 +0200)]
mac80211: Fix FILS AEAD protection in Association Request frame
Incorrect num_elem parameter value (1 vs. 5) was used in the
aes_siv_encrypt() call. This resulted in only the first one of the five
AAD vectors to SIV getting included in calculation. This does not
protect all the contents correctly and would not interoperate with a
standard compliant implementation.
Fix this by using the correct number. A matching fix is needed in the AP
side (hostapd) to get FILS authentication working properly.
Fixes:
39404feee691 ("mac80211: FILS AEAD protection for station mode association frames")
Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Linus Torvalds [Sun, 5 Feb 2017 23:10:58 +0000 (15:10 -0800)]
Linux 4.10-rc7
Eric Dumazet [Sun, 5 Feb 2017 07:18:55 +0000 (23:18 -0800)]
ip6_gre: fix ip6gre_err() invalid reads
Andrey Konovalov reported out of bound accesses in ip6gre_err()
If GRE flags contains GRE_KEY, the following expression
*(((__be32 *)p) + (grehlen / 4) - 1)
accesses data ~40 bytes after the expected point, since
grehlen includes the size of IPv6 headers.
Let's use a "struct gre_base_hdr *greh" pointer to make this
code more readable.
p[1] becomes greh->protocol.
grhlen is the GRE header length.
Fixes:
c12b395a4664 ("gre: Support GRE over IPv6")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 3 Feb 2017 08:03:26 +0000 (00:03 -0800)]
netlabel: out of bound access in cipso_v4_validate()
syzkaller found another out of bound access in ip_options_compile(),
or more exactly in cipso_v4_validate()
Fixes:
20e2a8648596 ("cipso: handle CIPSO options correctly when NetLabel is disabled")
Fixes:
446fda4f2682 ("[NetLabel]: CIPSOv4 engine")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sat, 4 Feb 2017 19:16:52 +0000 (11:16 -0800)]
ipv4: keep skb->dst around in presence of IP options
Andrey Konovalov got crashes in __ip_options_echo() when a NULL skb->dst
is accessed.
ipv4_pktinfo_prepare() should not drop the dst if (evil) IP options
are present.
We could refine the test to the presence of ts_needtime or srr,
but IP options are not often used, so let's be conservative.
Thanks to syzkaller team for finding this bug.
Fixes:
d826eb14ecef ("ipv4: PKTINFO doesnt need dst reference")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Williams [Sat, 4 Feb 2017 22:47:31 +0000 (14:47 -0800)]
libnvdimm, pfn: fix memmap reservation size versus 4K alignment
When vmemmap_populate() allocates space for the memmap it does so in 2MB
sized chunks. The libnvdimm-pfn driver incorrectly accounts for this
when the alignment of the device is set to 4K. When this happens we
trigger memory allocation failures in altmap_alloc_block_buf() and
trigger warnings of the form:
WARNING: CPU: 0 PID: 3376 at arch/x86/mm/init_64.c:656 arch_add_memory+0xe4/0xf0
[..]
Call Trace:
dump_stack+0x86/0xc3
__warn+0xcb/0xf0
warn_slowpath_null+0x1d/0x20
arch_add_memory+0xe4/0xf0
devm_memremap_pages+0x29b/0x4e0
Fixes:
315c562536c4 ("libnvdimm, pfn: add 'align' attribute, default to HPAGE_SIZE")
Cc: <stable@vger.kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Linus Torvalds [Sat, 4 Feb 2017 20:18:01 +0000 (12:18 -0800)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
- Prevent double activation of interrupt lines, which causes problems
on certain interrupt controllers
- Handle the fallout of the above because x86 (ab)uses the activation
function to reconfigure interrupts under the hood.
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/irq: Make irq activate operations symmetric
irqdomain: Avoid activating interrupts more than once
Linus Torvalds [Sat, 4 Feb 2017 20:07:54 +0000 (12:07 -0800)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull KVM fix from Radim Krčmář:
"Fix a regression that prevented migration between hosts with different
XSAVE features even if the missing features were not used by the guest
(for stable)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: do not save guest-unsupported XSAVE state
Linus Torvalds [Sat, 4 Feb 2017 18:44:15 +0000 (10:44 -0800)]
Merge tag 'char-misc-4.10-rc7' of git://git./linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are two bugfixes that resolve some reported issues. One in the
firmware loader, that should fix the much-reported problem of crashes
with it. The other is a hyperv fix for a reported regression.
Both have been in linux-next for a week or so with no reported issues"
* tag 'char-misc-4.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
Drivers: hv: vmbus: finally fix hv_need_to_signal_on_read()
firmware: fix NULL pointer dereference in __fw_load_abort()
Linus Torvalds [Sat, 4 Feb 2017 18:38:09 +0000 (10:38 -0800)]
Merge tag 'staging-4.10-rc7' of git://git./linux/kernel/git/gregkh/staging
Pull staging/IIO fixes from Greg KH:
"Here are a few small IIO and one staging driver fix for 4.10-rc7. They
fix some reported issues with the drivers.
All of them have been in linux-next for a week or so with no reported
issues"
* tag 'staging-4.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: greybus: timesync: validate platform state callback
iio: dht11: Use usleep_range instead of msleep for start signal
iio: adc: palmas_gpadc: retrieve a valid iio_dev in suspend/resume
iio: health: max30100: fixed parenthesis around FIFO count check
iio: health:
afe4404: retrieve a valid iio_dev in suspend/resume
iio: health:
afe4403: retrieve a valid iio_dev in suspend/resume
Linus Torvalds [Sat, 4 Feb 2017 18:35:55 +0000 (10:35 -0800)]
Merge tag 'usb-4.10-rc7' of git://git./linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some small USB fixes for some reported issues, and the usual
number of new device ids for 4.10-rc7.
All of these, except the last new device id, have been in linux-next
for a while with no reported issues"
* tag 'usb-4.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
USB: serial: pl2303: add ATEN device ID
usb: gadget: f_fs: Assorted buffer overflow checks.
USB: Add quirk for WORLDE easykey.25 MIDI keyboard
usb: musb: Fix external abort on non-linefetch for musb_irq_work()
usb: musb: Fix host mode error -71 regression
USB: serial: option: add device ID for HP lt2523 (Novatel E371)
USB: serial: qcserial: add Dell DW5570 QDL
Linus Torvalds [Sat, 4 Feb 2017 00:18:51 +0000 (16:18 -0800)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fix from James Bottomley:
"A single fix this time: a fix for a virtqueue removal bug which only
appears to affect S390, but which results in the queue hanging forever
thus causing the machine to fail shutdown"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: virtio_scsi: Reject commands when virtqueue is broken
Rafael J. Wysocki [Fri, 3 Feb 2017 23:44:36 +0000 (00:44 +0100)]
PM / runtime: Avoid false-positive warnings from might_sleep_if()
The might_sleep_if() assertions in __pm_runtime_idle(),
__pm_runtime_suspend() and __pm_runtime_resume() may generate
false-positive warnings in some situations. For example, that
happens if a nested pm_runtime_get_sync()/pm_runtime_put() pair
is executed with disabled interrupts within an outer
pm_runtime_get_sync()/pm_runtime_put() section for the same device.
[Generally, pm_runtime_get_sync() may sleep, so it should not be
called with disabled interrupts, but in this particular case the
previous pm_runtime_get_sync() guarantees that the device will not
be suspended, so the inner pm_runtime_get_sync() will return
immediately after incrementing the device's usage counter.]
That started to happen in the i915 driver in 4.10-rc, leading to
the following splat:
BUG: sleeping function called from invalid context at drivers/base/power/runtime.c:1032
in_atomic(): 1, irqs_disabled(): 0, pid: 1500, name: Xorg
1 lock held by Xorg/1500:
#0: (&dev->struct_mutex){+.+.+.}, at:
[<
ffffffffa0680c13>] i915_mutex_lock_interruptible+0x43/0x140 [i915]
CPU: 0 PID: 1500 Comm: Xorg Not tainted
Call Trace:
dump_stack+0x85/0xc2
___might_sleep+0x196/0x260
__might_sleep+0x53/0xb0
__pm_runtime_resume+0x7a/0x90
intel_runtime_pm_get+0x25/0x90 [i915]
aliasing_gtt_bind_vma+0xaa/0xf0 [i915]
i915_vma_bind+0xaf/0x1e0 [i915]
i915_gem_execbuffer_relocate_entry+0x513/0x6f0 [i915]
i915_gem_execbuffer_relocate_vma.isra.34+0x188/0x250 [i915]
? trace_hardirqs_on+0xd/0x10
? i915_gem_execbuffer_reserve_vma.isra.31+0x152/0x1f0 [i915]
? i915_gem_execbuffer_reserve.isra.32+0x372/0x3a0 [i915]
i915_gem_do_execbuffer.isra.38+0xa70/0x1a40 [i915]
? __might_fault+0x4e/0xb0
i915_gem_execbuffer2+0xc5/0x260 [i915]
? __might_fault+0x4e/0xb0
drm_ioctl+0x206/0x450 [drm]
? i915_gem_execbuffer+0x340/0x340 [i915]
? __fget+0x5/0x200
do_vfs_ioctl+0x91/0x6f0
? __fget+0x111/0x200
? __fget+0x5/0x200
SyS_ioctl+0x79/0x90
entry_SYSCALL_64_fastpath+0x23/0xc6
even though the code triggering it is correct.
Unfortunately, the might_sleep_if() assertions in question are
too coarse-grained to cover such cases correctly, so make them
a bit less sensitive in order to avoid the false-positives.
Reported-and-tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Linus Torvalds [Fri, 3 Feb 2017 23:43:30 +0000 (15:43 -0800)]
Merge tag 'for_linus' of git://git./linux/kernel/git/mst/vhost
Pull virtio/vhost fixes from Michael S. Tsirkin:
"Last minute fixes:
- ARM DMA fix revert
- vhost endian-ness fix
- MAINTAINERS: email address change for Amit"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
MAINTAINERS: update email address for Amit Shah
vhost: fix initialization for vq->is_le
Revert "vring: Force use of DMA API for ARM-based systems with legacy devices"
Linus Torvalds [Fri, 3 Feb 2017 23:38:53 +0000 (15:38 -0800)]
Merge tag 'vfio-v4.10-rc7' of git://github.com/awilliam/linux-vfio
Pull VFIO fix from Alex Williamson:
"Fix an error path in SPAPR IOMMU backend (Alexey Kardashevskiy)"
* tag 'vfio-v4.10-rc7' of git://github.com/awilliam/linux-vfio:
vfio/spapr: Fix missing mutex unlock when creating a window
Srinivas Pandruvada [Fri, 3 Feb 2017 22:18:39 +0000 (14:18 -0800)]
cpufreq: intel_pstate: Disable energy efficiency optimization
Some Kabylake desktop processors may not reach max turbo when running in
HWP mode, even if running under sustained 100% utilization.
This occurs when the HWP.EPP (Energy Performance Preference) is set to
"balance_power" (0x80) -- the default on most systems.
It occurs because the platform BIOS may erroneously enable an
energy-efficiency setting -- MSR_IA32_POWER_CTL BIT-EE, which is not
recommended to be enabled on this SKU.
On the failing systems, this BIOS issue was not discovered when the
desktop motherboard was tested with Windows, because the BIOS also
neglects to provide the ACPI/CPPC table, that Windows requires to enable
HWP, and so Windows runs in legacy P-state mode, where this setting has
no effect.
Linux' intel_pstate driver does not require ACPI/CPPC to enable HWP, and
so it runs in HWP mode, exposing this incorrect BIOS configuration.
There are several ways to address this problem.
First, Linux can also run in legacy P-state mode on this system.
As intel_pstate is how Linux enables HWP, booting with
"intel_pstate=disable"
will run in acpi-cpufreq/ondemand legacy p-state mode.
Or second, the "performance" governor can be used with intel_pstate,
which will modify HWP.EPP to 0.
Or third, starting in 4.10, the
/sys/devices/system/cpu/cpufreq/policy*/energy_performance_preference
attribute in can be updated from "balance_power" to "performance".
Or fourth, apply this patch, which fixes the erroneous setting of
MSR_IA32_POWER_CTL BIT_EE on this model, allowing the default
configuration to function as designed.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: 4.6+ <stable@vger.kernel.org> # 4.6+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Linus Torvalds [Fri, 3 Feb 2017 22:50:42 +0000 (14:50 -0800)]
Merge branch 'akpm' (patches from Andrew)
Merge fixes from Andrew Morton:
"8 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm, fs: check for fatal signals in do_generic_file_read()
fs: break out of iomap_file_buffered_write on fatal signals
base/memory, hotplug: fix a kernel oops in show_valid_zones()
mm/memory_hotplug.c: check start_pfn in test_pages_in_a_zone()
jump label: pass kbuild_cflags when checking for asm goto support
shmem: fix sleeping from atomic context
kasan: respect /proc/sys/kernel/traceoff_on_warning
zswap: disable changing params if init fails
Michal Hocko [Fri, 3 Feb 2017 21:13:29 +0000 (13:13 -0800)]
mm, fs: check for fatal signals in do_generic_file_read()
do_generic_file_read() can be told to perform a large request from
userspace. If the system is under OOM and the reading task is the OOM
victim then it has an access to memory reserves and finishing the full
request can lead to the full memory depletion which is dangerous. Make
sure we rather go with a short read and allow the killed task to
terminate.
Link: http://lkml.kernel.org/r/20170201092706.9966-3-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Fri, 3 Feb 2017 21:13:26 +0000 (13:13 -0800)]
fs: break out of iomap_file_buffered_write on fatal signals
Tetsuo has noticed that an OOM stress test which performs large write
requests can cause the full memory reserves depletion. He has tracked
this down to the following path
__alloc_pages_nodemask+0x436/0x4d0
alloc_pages_current+0x97/0x1b0
__page_cache_alloc+0x15d/0x1a0 mm/filemap.c:728
pagecache_get_page+0x5a/0x2b0 mm/filemap.c:1331
grab_cache_page_write_begin+0x23/0x40 mm/filemap.c:2773
iomap_write_begin+0x50/0xd0 fs/iomap.c:118
iomap_write_actor+0xb5/0x1a0 fs/iomap.c:190
? iomap_write_end+0x80/0x80 fs/iomap.c:150
iomap_apply+0xb3/0x130 fs/iomap.c:79
iomap_file_buffered_write+0x68/0xa0 fs/iomap.c:243
? iomap_write_end+0x80/0x80
xfs_file_buffered_aio_write+0x132/0x390 [xfs]
? remove_wait_queue+0x59/0x60
xfs_file_write_iter+0x90/0x130 [xfs]
__vfs_write+0xe5/0x140
vfs_write+0xc7/0x1f0
? syscall_trace_enter+0x1d0/0x380
SyS_write+0x58/0xc0
do_syscall_64+0x6c/0x200
entry_SYSCALL64_slow_path+0x25/0x25
the oom victim has access to all memory reserves to make a forward
progress to exit easier. But iomap_file_buffered_write and other
callers of iomap_apply loop to complete the full request. We need to
check for fatal signals and back off with a short write instead.
As the iomap_apply delegates all the work down to the actor we have to
hook into those. All callers that work with the page cache are calling
iomap_write_begin so we will check for signals there. dax_iomap_actor
has to handle the situation explicitly because it copies data to the
userspace directly. Other callers like iomap_page_mkwrite work on a
single page or iomap_fiemap_actor do not allocate memory based on the
given len.
Fixes:
68a9f5e7007c ("xfs: implement iomap based buffered write path")
Link: http://lkml.kernel.org/r/20170201092706.9966-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Toshi Kani [Fri, 3 Feb 2017 21:13:23 +0000 (13:13 -0800)]
base/memory, hotplug: fix a kernel oops in show_valid_zones()
Reading a sysfs "memoryN/valid_zones" file leads to the following oops
when the first page of a range is not backed by struct page.
show_valid_zones() assumes that 'start_pfn' is always valid for
page_zone().
BUG: unable to handle kernel paging request at
ffffea017a000000
IP: show_valid_zones+0x6f/0x160
This issue may happen on x86-64 systems with 64GiB or more memory since
their memory block size is bumped up to 2GiB. [1] An example of such
systems is desribed below. 0x3240000000 is only aligned by 1GiB and
this memory block starts from 0x3200000000, which is not backed by
struct page.
BIOS-e820: [mem 0x0000003240000000-0x000000603fffffff] usable
Since test_pages_in_a_zone() already checks holes, fix this issue by
extending this function to return 'valid_start' and 'valid_end' for a
given range. show_valid_zones() then proceeds with the valid range.
[1] 'Commit
bdee237c0343 ("x86: mm: Use 2GB memory block size on
large-memory x86-64 systems")'
Link: http://lkml.kernel.org/r/20170127222149.30893-3-toshi.kani@hpe.com
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Zhang Zhen <zhenzhang.zhang@huawei.com>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org> [4.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Toshi Kani [Fri, 3 Feb 2017 21:13:20 +0000 (13:13 -0800)]
mm/memory_hotplug.c: check start_pfn in test_pages_in_a_zone()
Patch series "fix a kernel oops when reading sysfs valid_zones", v2.
A sysfs memory file is created for each 2GiB memory block on x86-64 when
the system has 64GiB or more memory. [1] When the start address of a
memory block is not backed by struct page, i.e. a memory range is not
aligned by 2GiB, reading its 'valid_zones' attribute file leads to a
kernel oops. This issue was observed on multiple x86-64 systems with
more than 64GiB of memory. This patch-set fixes this issue.
Patch 1 first fixes an issue in test_pages_in_a_zone(), which does not
test the start section.
Patch 2 then fixes the kernel oops by extending test_pages_in_a_zone()
to return valid [start, end).
Note for stable kernels: The memory block size change was made by commit
bdee237c0343 ("x86: mm: Use 2GB memory block size on large-memory x86-64
systems"), which was accepted to 3.9. However, this patch-set depends
on (and fixes) the change to test_pages_in_a_zone() made by commit
5f0f2887f4de ("mm/memory_hotplug.c: check for missing sections in
test_pages_in_a_zone()"), which was accepted to 4.4.
So, I recommend that we backport it up to 4.4.
[1] 'Commit
bdee237c0343 ("x86: mm: Use 2GB memory block size on
large-memory x86-64 systems")'
This patch (of 2):
test_pages_in_a_zone() does not check 'start_pfn' when it is aligned by
section since 'sec_end_pfn' is set equal to 'pfn'. Since this function
is called for testing the range of a sysfs memory file, 'start_pfn' is
always aligned by section.
Fix it by properly setting 'sec_end_pfn' to the next section pfn.
Also make sure that this function returns 1 only when the range belongs
to a zone.
Link: http://lkml.kernel.org/r/20170127222149.30893-2-toshi.kani@hpe.com
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Andrew Banman <abanman@sgi.com>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Greg KH <greg@kroah.com>
Cc: <stable@vger.kernel.org> [4.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Lin [Fri, 3 Feb 2017 21:13:18 +0000 (13:13 -0800)]
jump label: pass kbuild_cflags when checking for asm goto support
Some versions of ARM GCC compiler such as Android toolchain throws in a
'-fpic' flag by default. This causes the gcc-goto check script to fail
although some config would have '-fno-pic' flag in the KBUILD_CFLAGS.
This patch passes the KBUILD_CFLAGS to the check script so that the
script does not rely on the default config from different compilers.
Link: http://lkml.kernel.org/r/20170120234329.78868-1-dtwlin@google.com
Signed-off-by: David Lin <dtwlin@google.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Michal Marek <mmarek@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Fri, 3 Feb 2017 21:13:15 +0000 (13:13 -0800)]
shmem: fix sleeping from atomic context
Syzkaller fuzzer managed to trigger this:
BUG: sleeping function called from invalid context at mm/shmem.c:852
in_atomic(): 1, irqs_disabled(): 0, pid: 529, name: khugepaged
3 locks held by khugepaged/529:
#0: (shrinker_rwsem){++++..}, at: [<
ffffffff818d7ef1>] shrink_slab.part.59+0x121/0xd30 mm/vmscan.c:451
#1: (&type->s_umount_key#29){++++..}, at: [<
ffffffff81a63630>] trylock_super+0x20/0x100 fs/super.c:392
#2: (&(&sbinfo->shrinklist_lock)->rlock){+.+.-.}, at: [<
ffffffff818fd83e>] spin_lock include/linux/spinlock.h:302 [inline]
#2: (&(&sbinfo->shrinklist_lock)->rlock){+.+.-.}, at: [<
ffffffff818fd83e>] shmem_unused_huge_shrink+0x28e/0x1490 mm/shmem.c:427
CPU: 2 PID: 529 Comm: khugepaged Not tainted 4.10.0-rc5+ #201
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
shmem_undo_range+0xb20/0x2710 mm/shmem.c:852
shmem_truncate_range+0x27/0xa0 mm/shmem.c:939
shmem_evict_inode+0x35f/0xca0 mm/shmem.c:1030
evict+0x46e/0x980 fs/inode.c:553
iput_final fs/inode.c:1515 [inline]
iput+0x589/0xb20 fs/inode.c:1542
shmem_unused_huge_shrink+0xbad/0x1490 mm/shmem.c:446
shmem_unused_huge_scan+0x10c/0x170 mm/shmem.c:512
super_cache_scan+0x376/0x450 fs/super.c:106
do_shrink_slab mm/vmscan.c:378 [inline]
shrink_slab.part.59+0x543/0xd30 mm/vmscan.c:481
shrink_slab mm/vmscan.c:2592 [inline]
shrink_node+0x2c7/0x870 mm/vmscan.c:2592
shrink_zones mm/vmscan.c:2734 [inline]
do_try_to_free_pages+0x369/0xc80 mm/vmscan.c:2776
try_to_free_pages+0x3c6/0x900 mm/vmscan.c:2982
__perform_reclaim mm/page_alloc.c:3301 [inline]
__alloc_pages_direct_reclaim mm/page_alloc.c:3322 [inline]
__alloc_pages_slowpath+0xa24/0x1c30 mm/page_alloc.c:3683
__alloc_pages_nodemask+0x544/0xae0 mm/page_alloc.c:3848
__alloc_pages include/linux/gfp.h:426 [inline]
__alloc_pages_node include/linux/gfp.h:439 [inline]
khugepaged_alloc_page+0xc2/0x1b0 mm/khugepaged.c:750
collapse_huge_page+0x182/0x1fe0 mm/khugepaged.c:955
khugepaged_scan_pmd+0xfdf/0x12a0 mm/khugepaged.c:1208
khugepaged_scan_mm_slot mm/khugepaged.c:1727 [inline]
khugepaged_do_scan mm/khugepaged.c:1808 [inline]
khugepaged+0xe9b/0x1590 mm/khugepaged.c:1853
kthread+0x326/0x3f0 kernel/kthread.c:227
ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430
The iput() from atomic context was a bad idea: if after igrab() somebody
else calls iput() and we left with the last inode reference, our iput()
would lead to inode eviction and therefore sleeping.
This patch should fix the situation.
Link: http://lkml.kernel.org/r/20170131093141.GA15899@node.shutemov.name
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Peter Zijlstra [Fri, 3 Feb 2017 21:13:12 +0000 (13:13 -0800)]
kasan: respect /proc/sys/kernel/traceoff_on_warning
After much waiting I finally reproduced a KASAN issue, only to find my
trace-buffer empty of useful information because it got spooled out :/
Make kasan_report honour the /proc/sys/kernel/traceoff_on_warning
interface.
Link: http://lkml.kernel.org/r/20170125164106.3514-1-aryabinin@virtuozzo.com
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dan Streetman [Fri, 3 Feb 2017 21:13:09 +0000 (13:13 -0800)]
zswap: disable changing params if init fails
Add zswap_init_failed bool that prevents changing any of the module
params, if init_zswap() fails, and set zswap_enabled to false. Change
'enabled' param to a callback, and check zswap_init_failed before
allowing any change to 'enabled', 'zpool', or 'compressor' params.
Any driver that is built-in to the kernel will not be unloaded if its
init function returns error, and its module params remain accessible for
users to change via sysfs. Since zswap uses param callbacks, which
assume that zswap has been initialized, changing the zswap params after
a failed initialization will result in WARNING due to the param
callbacks expecting a pool to already exist. This prevents that by
immediately exiting any of the param callbacks if initialization failed.
This was reported here:
https://marc.info/?l=linux-mm&m=
147004228125528&w=4
And fixes this WARNING:
[ 429.723476] WARNING: CPU: 0 PID: 5140 at mm/zswap.c:503 __zswap_pool_current+0x56/0x60
The warning is just noise, and not serious. However, when init fails,
zswap frees all its percpu dstmem pages and its kmem cache. The kmem
cache might be serious, if kmem_cache_alloc(NULL, gfp) has problems; but
the percpu dstmem pages are definitely a problem, as they're used as
temporary buffer for compressed pages before copying into place in the
zpool.
If the user does get zswap enabled after an init failure, then zswap
will likely Oops on the first page it tries to compress (or worse, start
corrupting memory).
Fixes:
90b0fc26d5db ("zswap: change zpool/compressor at runtime")
Link: http://lkml.kernel.org/r/20170124200259.16191-2-ddstreet@ieee.org
Signed-off-by: Dan Streetman <dan.streetman@canonical.com>
Reported-by: Marcin Miroslaw <marcin@mejor.pl>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 3 Feb 2017 21:46:38 +0000 (13:46 -0800)]
Merge tag 'regulator-fix-v4.10-rc6' of git://git./linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"Three changes here: two run of the mill driver specific fixes and a
change from Mark Rutland which reverts some new device specific ACPI
binding code which was added during the merge window as there are
concerns about this sending the wrong signal about usage of regulators
in ACPI systems"
* tag 'regulator-fix-v4.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: fixed: Revert support for ACPI interface
regulator: axp20x: AXP806: Fix dcdcb being set instead of dcdce
regulator: twl6030: fix range comparison, allowing vsel = 59
Amit Shah [Fri, 3 Feb 2017 11:18:14 +0000 (16:48 +0530)]
MAINTAINERS: update email address for Amit Shah
I'm leaving my job at Red Hat, this email address will stop working next week.
Update it to one that I will have access to later.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Halil Pasic [Mon, 30 Jan 2017 10:09:36 +0000 (11:09 +0100)]
vhost: fix initialization for vq->is_le
Currently, under certain circumstances vhost_init_is_le does just a part
of the initialization job, and depends on vhost_reset_is_le being called
too. For this reason vhost_vq_init_access used to call vhost_reset_is_le
when vq->private_data is NULL. This is not only counter intuitive, but
also real a problem because it breaks vhost_net. The bug was introduced to
vhost_net with commit
2751c9882b94 ("vhost: cross-endian support for
legacy devices"). The symptom is corruption of the vq's used.idx field
(virtio) after VHOST_NET_SET_BACKEND was issued as a part of the vhost
shutdown on a vq with pending descriptors.
Let us make sure the outcome of vhost_init_is_le never depend on the state
it is actually supposed to initialize, and fix virtio_net by removing the
reset from vhost_vq_init_access.
With the above, there is no reason for vhost_reset_is_le to do just half
of the job. Let us make vhost_reset_is_le reinitialize is_le.
Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reported-by: Michael A. Tebolt <miket@us.ibm.com>
Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Fixes: commit
2751c9882b94 ("vhost: cross-endian support for legacy devices")
Cc: <stable@vger.kernel.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Michael A. Tebolt <miket@us.ibm.com>
Michael S. Tsirkin [Fri, 3 Feb 2017 03:43:52 +0000 (05:43 +0200)]
Revert "vring: Force use of DMA API for ARM-based systems with legacy devices"
This reverts commit
c7070619f3408d9a0dffbed9149e6f00479cf43b.
This has been shown to regress on some ARM systems:
by forcing on DMA API usage for ARM systems, we have inadvertently
kicked open a hornets' nest in terms of cache-coherency. Namely that
unless the virtio device is explicitly described as capable of coherent
DMA by firmware, the DMA APIs on ARM and other DT-based platforms will
assume it is non-coherent. This turns out to cause a big problem for the
likes of QEMU and kvmtool, which generate virtio-mmio devices in their
guest DTs but neglect to add the often-overlooked "dma-coherent"
property; as a result, we end up with the guest making non-cacheable
accesses to the vring, the host doing so cacheably, both talking past
each other and things going horribly wrong.
We are working on a safer work-around.
Fixes:
c7070619f340 ("vring: Force use of DMA API for ARM-based systems with legacy devices")
Reported-by: Robin Murphy <robin.murphy@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Greg Kroah-Hartman [Fri, 3 Feb 2017 21:19:15 +0000 (22:19 +0100)]
Merge tag 'usb-serial-4.10-rc7' of git://git./linux/kernel/git/johan/usb-serial into usb-linus
Johan writes:
USB-serial fixes for v4.10-rc7
One more device ID for pl2303.
Signed-off-by: Johan Hovold <johan@kernel.org>
Eric Dumazet [Thu, 2 Feb 2017 18:31:35 +0000 (10:31 -0800)]
net: use a work queue to defer net_disable_timestamp() work
Dmitry reported a warning [1] showing that we were calling
net_disable_timestamp() -> static_key_slow_dec() from a non
process context.
Grabbing a mutex while holding a spinlock or rcu_read_lock()
is not allowed.
As Cong suggested, we now use a work queue.
It is possible netstamp_clear() exits while netstamp_needed_deferred
is not zero, but it is probably not worth trying to do better than that.
netstamp_needed_deferred atomic tracks the exact number of deferred
decrements.
[1]
[ INFO: suspicious RCU usage. ]
4.10.0-rc5+ #192 Not tainted
-------------------------------
./include/linux/rcupdate.h:561 Illegal context switch in RCU read-side
critical section!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 0
2 locks held by syz-executor14/23111:
#0: (sk_lock-AF_INET6){+.+.+.}, at: [<
ffffffff83a35c35>] lock_sock
include/net/sock.h:1454 [inline]
#0: (sk_lock-AF_INET6){+.+.+.}, at: [<
ffffffff83a35c35>]
rawv6_sendmsg+0x1e65/0x3ec0 net/ipv6/raw.c:919
#1: (rcu_read_lock){......}, at: [<
ffffffff83ae2678>] nf_hook
include/linux/netfilter.h:201 [inline]
#1: (rcu_read_lock){......}, at: [<
ffffffff83ae2678>]
__ip6_local_out+0x258/0x840 net/ipv6/output_core.c:160
stack backtrace:
CPU: 2 PID: 23111 Comm: syz-executor14 Not tainted 4.10.0-rc5+ #192
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:15 [inline]
dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
lockdep_rcu_suspicious+0x139/0x180 kernel/locking/lockdep.c:4452
rcu_preempt_sleep_check include/linux/rcupdate.h:560 [inline]
___might_sleep+0x560/0x650 kernel/sched/core.c:7748
__might_sleep+0x95/0x1a0 kernel/sched/core.c:7739
mutex_lock_nested+0x24f/0x1730 kernel/locking/mutex.c:752
atomic_dec_and_mutex_lock+0x119/0x160 kernel/locking/mutex.c:1060
__static_key_slow_dec+0x7a/0x1e0 kernel/jump_label.c:149
static_key_slow_dec+0x51/0x90 kernel/jump_label.c:174
net_disable_timestamp+0x3b/0x50 net/core/dev.c:1728
sock_disable_timestamp+0x98/0xc0 net/core/sock.c:403
__sk_destruct+0x27d/0x6b0 net/core/sock.c:1441
sk_destruct+0x47/0x80 net/core/sock.c:1460
__sk_free+0x57/0x230 net/core/sock.c:1468
sock_wfree+0xae/0x120 net/core/sock.c:1645
skb_release_head_state+0xfc/0x200 net/core/skbuff.c:655
skb_release_all+0x15/0x60 net/core/skbuff.c:668
__kfree_skb+0x15/0x20 net/core/skbuff.c:684
kfree_skb+0x16e/0x4c0 net/core/skbuff.c:705
inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304
inet_frag_put include/net/inet_frag.h:133 [inline]
nf_ct_frag6_gather+0x1106/0x3840
net/ipv6/netfilter/nf_conntrack_reasm.c:617
ipv6_defrag+0x1be/0x2b0 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68
nf_hook_entry_hookfn include/linux/netfilter.h:102 [inline]
nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310
nf_hook include/linux/netfilter.h:212 [inline]
__ip6_local_out+0x489/0x840 net/ipv6/output_core.c:160
ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170
ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722
ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742
rawv6_push_pending_frames net/ipv6/raw.c:613 [inline]
rawv6_sendmsg+0x2d1a/0x3ec0 net/ipv6/raw.c:927
inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
sock_sendmsg_nosec net/socket.c:635 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:645
sock_write_iter+0x326/0x600 net/socket.c:848
do_iter_readv_writev+0x2e3/0x5b0 fs/read_write.c:695
do_readv_writev+0x42c/0x9b0 fs/read_write.c:872
vfs_writev+0x87/0xc0 fs/read_write.c:911
do_writev+0x110/0x2c0 fs/read_write.c:944
SYSC_writev fs/read_write.c:1017 [inline]
SyS_writev+0x27/0x30 fs/read_write.c:1014
entry_SYSCALL_64_fastpath+0x1f/0xc2
RIP: 0033:0x445559
RSP: 002b:
00007f6f46fceb58 EFLAGS:
00000292 ORIG_RAX:
0000000000000014
RAX:
ffffffffffffffda RBX:
0000000000000005 RCX:
0000000000445559
RDX:
0000000000000001 RSI:
0000000020f1eff0 RDI:
0000000000000005
RBP:
00000000006e19c0 R08:
0000000000000000 R09:
0000000000000000
R10:
0000000000000000 R11:
0000000000000292 R12:
0000000000700000
R13:
0000000020f59000 R14:
0000000000000015 R15:
0000000000020400
BUG: sleeping function called from invalid context at
kernel/locking/mutex.c:752
in_atomic(): 1, irqs_disabled(): 0, pid: 23111, name: syz-executor14
INFO: lockdep is turned off.
CPU: 2 PID: 23111 Comm: syz-executor14 Not tainted 4.10.0-rc5+ #192
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:15 [inline]
dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
___might_sleep+0x47e/0x650 kernel/sched/core.c:7780
__might_sleep+0x95/0x1a0 kernel/sched/core.c:7739
mutex_lock_nested+0x24f/0x1730 kernel/locking/mutex.c:752
atomic_dec_and_mutex_lock+0x119/0x160 kernel/locking/mutex.c:1060
__static_key_slow_dec+0x7a/0x1e0 kernel/jump_label.c:149
static_key_slow_dec+0x51/0x90 kernel/jump_label.c:174
net_disable_timestamp+0x3b/0x50 net/core/dev.c:1728
sock_disable_timestamp+0x98/0xc0 net/core/sock.c:403
__sk_destruct+0x27d/0x6b0 net/core/sock.c:1441
sk_destruct+0x47/0x80 net/core/sock.c:1460
__sk_free+0x57/0x230 net/core/sock.c:1468
sock_wfree+0xae/0x120 net/core/sock.c:1645
skb_release_head_state+0xfc/0x200 net/core/skbuff.c:655
skb_release_all+0x15/0x60 net/core/skbuff.c:668
__kfree_skb+0x15/0x20 net/core/skbuff.c:684
kfree_skb+0x16e/0x4c0 net/core/skbuff.c:705
inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304
inet_frag_put include/net/inet_frag.h:133 [inline]
nf_ct_frag6_gather+0x1106/0x3840
net/ipv6/netfilter/nf_conntrack_reasm.c:617
ipv6_defrag+0x1be/0x2b0 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68
nf_hook_entry_hookfn include/linux/netfilter.h:102 [inline]
nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310
nf_hook include/linux/netfilter.h:212 [inline]
__ip6_local_out+0x489/0x840 net/ipv6/output_core.c:160
ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170
ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722
ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742
rawv6_push_pending_frames net/ipv6/raw.c:613 [inline]
rawv6_sendmsg+0x2d1a/0x3ec0 net/ipv6/raw.c:927
inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
sock_sendmsg_nosec net/socket.c:635 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:645
sock_write_iter+0x326/0x600 net/socket.c:848
do_iter_readv_writev+0x2e3/0x5b0 fs/read_write.c:695
do_readv_writev+0x42c/0x9b0 fs/read_write.c:872
vfs_writev+0x87/0xc0 fs/read_write.c:911
do_writev+0x110/0x2c0 fs/read_write.c:944
SYSC_writev fs/read_write.c:1017 [inline]
SyS_writev+0x27/0x30 fs/read_write.c:1014
entry_SYSCALL_64_fastpath+0x1f/0xc2
RIP: 0033:0x445559
Fixes:
b90e5794c5bd ("net: dont call jump_label_dec from irq context")
Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 3 Feb 2017 20:01:54 +0000 (12:01 -0800)]
Merge tag 'mmc-v4.10-rc6' of git://git./linux/kernel/git/ulfh/mmc
Pull MMC fix from Ulf Hansson:
"MMC host: sdhci: Avoid hang when receiving spurious CARD_INT
interrupts"
* tag 'mmc-v4.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci: Ignore unexpected CARD_INT interrupts
Dan Williams [Thu, 2 Feb 2017 18:31:00 +0000 (10:31 -0800)]
acpi, nfit: fix acpi_nfit_flush_probe() crash
We queue an on-stack work item to 'nfit_wq' and wait for it to complete
as part of a 'flush_probe' request. However, if the user cancels the
wait we need to make sure the item is flushed from the queue otherwise
we are leaving an out-of-scope stack address on the work list.
BUG: unable to handle kernel paging request at
ffffbcb3c72f7cd0
IP: [<
ffffffffa9413a7b>] __list_add+0x1b/0xb0
[..]
RIP: 0010:[<
ffffffffa9413a7b>] [<
ffffffffa9413a7b>] __list_add+0x1b/0xb0
RSP: 0018:
ffffbcb3c7ba7c00 EFLAGS:
00010046
[..]
Call Trace:
[<
ffffffffa90bb11a>] insert_work+0x3a/0xc0
[<
ffffffffa927fdda>] ? seq_open+0x5a/0xa0
[<
ffffffffa90bb30a>] __queue_work+0x16a/0x460
[<
ffffffffa90bbb08>] queue_work_on+0x38/0x40
[<
ffffffffc0cf2685>] acpi_nfit_flush_probe+0x95/0xc0 [nfit]
[<
ffffffffc0cf25d0>] ? nfit_visible+0x40/0x40 [nfit]
[<
ffffffffa9571495>] wait_probe_show+0x25/0x60
[<
ffffffffa9546b30>] dev_attr_show+0x20/0x50
Fixes:
7ae0fa439faf ("nfit, libnvdimm: async region scrub workqueue")
Cc: <stable@vger.kernel.org>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Linus Torvalds [Fri, 3 Feb 2017 19:32:25 +0000 (11:32 -0800)]
Merge tag 'drm-fixes-for-v4.10-rc7' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Another fixes pull for v4.10, it's a bit big due to the backport of
the VMA fixes for i915 that should fix the oops on shutdown problems
that you've worked around.
There are also two drm core connector registration fixes, a bunch of
nouveau regression fixes and two AMD fixes"
* tag 'drm-fixes-for-v4.10-rc7' of git://people.freedesktop.org/~airlied/linux:
drm/radeon: Fix vram_size/visible values in DRM_RADEON_GEM_INFO ioctl
drm/amdgpu/si: fix crash on headless asics
drm/i915: Track pinned vma in intel_plane_state
drm/atomic: Unconditionally call prepare_fb.
drm/atomic: Fix double free in drm_atomic_state_default_clear
drm/nouveau/kms/nv50: request vblank events for commits that send completion events
drm/nouveau/nv1a,nv1f/disp: fix memory clock rate retrieval
drm/nouveau/disp/gt215: Fix HDA ELD handling (thus, HDMI audio) on gt215
drm/nouveau/nouveau/led: prevent compiling the led-code if nouveau=y and leds=m
drm/nouveau/disp/mcp7x: disable dptmds workaround
drm/nouveau: prevent userspace from deleting client object
drm/nouveau/fence/g84-: protect against concurrent access to semaphore buffers
drm: Don't race connector registration
drm: prevent double-(un)registration for connectors
Linus Torvalds [Fri, 3 Feb 2017 19:10:06 +0000 (11:10 -0800)]
Merge tag 'powerpc-4.10-3' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"The main change is we're reverting the initial stack protector support
we merged this cycle. It turns out to not work on toolchains built
with libc support, and fixing it will be need to wait for another
release.
And the rest are all fairly minor:
- Some pasemi machines were not booting due to a missing error check
in prom_find_boot_cpu()
- In EEH we were checking a pointer rather than the bool it pointed
to
- The clang build was broken by a BUILD_BUG_ON() we added.
- The radix (Power9 only) version of map_kernel_page() was broken if
our memory size was a multiple of 2MB, which it generally isn't
Thanks to: Darren Stevens, Gavin Shan, Reza Arbab"
* tag 'powerpc-4.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/mm: Use the correct pointer when setting a 2MB pte
powerpc: Fix build failure with clang due to BUILD_BUG_ON()
powerpc: Revert the initial stack protector support
powerpc/eeh: Fix wrong flag passed to eeh_unfreeze_pe()
powerpc: Add missing error check to prom_find_boot_cpu()
Linus Torvalds [Fri, 3 Feb 2017 19:06:59 +0000 (11:06 -0800)]
Merge tag 'trace-v4.10-rc2-2' of git://git./linux/kernel/git/rostedt/linux-trace
Pull tracing fix from Steven Rostedt:
"Simple fix of s/static struct __init/static __init struct/"
* tag 'trace-v4.10-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing/kprobes: Fix __init annotation
Linus Torvalds [Fri, 3 Feb 2017 18:30:27 +0000 (10:30 -0800)]
Merge branch 'modversions' (modversions fixes for powerpc from Ard)
Merge kcrctab entry fixes from Ard Biesheuvel:
"This is a followup to [0] 'modversions: redefine kcrctab entries as
relative CRC pointers', but since relative CRC pointers do not work in
modules, and are actually only needed by powerpc with
CONFIG_RELOCATABLE=y, I have made it a Kconfig selectable feature
instead.
First it introduces the MODULE_REL_CRCS Kconfig symbol, and adds the
kbuild handling of it, i.e., modpost, genksyms and kallsyms.
Then it switches all architectures to 32-bit CRC entries in kcrctab,
where all architectures except powerpc with CONFIG_RELOCATABLE=y use
absolute ELF symbol references as before"
[0] http://marc.info/?l=linux-arch&m=
148493613415294&w=2
* emailed patches from Ard Biesheuvel:
module: unify absolute krctab definitions for 32-bit and 64-bit
modversions: treat symbol CRCs as 32 bit quantities
kbuild: modversions: add infrastructure for emitting relative CRCs
Ard Biesheuvel [Thu, 2 Feb 2017 18:05:26 +0000 (18:05 +0000)]
log2: make order_base_2() behave correctly on const input value zero
The function order_base_2() is defined (according to the comment block)
as returning zero on input zero, but subsequently passes the input into
roundup_pow_of_two(), which is explicitly undefined for input zero.
This has gone unnoticed until now, but optimization passes in GCC 7 may
produce constant folded function instances where a constant value of
zero is passed into order_base_2(), resulting in link errors against the
deliberately undefined '____ilog2_NaN'.
So update order_base_2() to adhere to its own documented interface.
[ See
http://marc.info/?l=linux-kernel&m=
147672952517795&w=2
and follow-up discussion for more background. The gcc "optimization
pass" is really just broken, but now the GCC trunk problem seems to
have escaped out of just specially built daily images, so we need to
work around it in mainline. - Linus ]
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Radim Krčmář [Wed, 1 Feb 2017 13:19:53 +0000 (14:19 +0100)]
KVM: x86: do not save guest-unsupported XSAVE state
Saving unsupported state prevents migration when the new host does not
support a XSAVE feature of the original host, even if the feature is not
exposed to the guest.
We've masked host features with guest-visible features before, with
4344ee981e21 ("KVM: x86: only copy XSAVE state for the supported
features") and dropped it when implementing XSAVES. Do it again.
Fixes:
df1daba7d1cb ("KVM: x86: support XSAVES usage in the host")
Cc: stable@vger.kernel.org
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Ard Biesheuvel [Fri, 3 Feb 2017 09:54:07 +0000 (09:54 +0000)]
module: unify absolute krctab definitions for 32-bit and 64-bit
The previous patch introduced a separate inline asm version of the
krcrctab declaration template for use with 64-bit architectures, which
cannot refer to ELF symbols using 32-bit quantities.
This declaration should be equivalent to the C one for 32-bit
architectures, but just in case - unify them in a separate patch, which
can simply be dropped if it turns out to break anything.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ard Biesheuvel [Fri, 3 Feb 2017 09:54:06 +0000 (09:54 +0000)]
modversions: treat symbol CRCs as 32 bit quantities
The modversion symbol CRCs are emitted as ELF symbols, which allows us
to easily populate the kcrctab sections by relying on the linker to
associate each kcrctab slot with the correct value.
This has a couple of downsides:
- Given that the CRCs are treated as memory addresses, we waste 4 bytes
for each CRC on 64 bit architectures,
- On architectures that support runtime relocation, a R_<arch>_RELATIVE
relocation entry is emitted for each CRC value, which identifies it
as a quantity that requires fixing up based on the actual runtime
load offset of the kernel. This results in corrupted CRCs unless we
explicitly undo the fixup (and this is currently being handled in the
core module code)
- Such runtime relocation entries take up 24 bytes of __init space
each, resulting in a x8 overhead in [uncompressed] kernel size for
CRCs.
Switching to explicit 32 bit values on 64 bit architectures fixes most
of these issues, given that 32 bit values are not treated as quantities
that require fixing up based on the actual runtime load offset. Note
that on some ELF64 architectures [such as PPC64], these 32-bit values
are still emitted as [absolute] runtime relocatable quantities, even if
the value resolves to a build time constant. Since relative relocations
are always resolved at build time, this patch enables MODULE_REL_CRCS on
powerpc when CONFIG_RELOCATABLE=y, which turns the absolute CRC
references into relative references into .rodata where the actual CRC
value is stored.
So redefine all CRC fields and variables as u32, and redefine the
__CRC_SYMBOL() macro for 64 bit builds to emit the CRC reference using
inline assembler (which is necessary since 64-bit C code cannot use
32-bit types to hold memory addresses, even if they are ultimately
resolved using values that do not exceed 0xffffffff). To avoid
potential problems with legacy 32-bit architectures using legacy
toolchains, the equivalent C definition of the kcrctab entry is retained
for 32-bit architectures.
Note that this mostly reverts commit
d4703aefdbc8 ("module: handle ppc64
relocating kcrctabs when CONFIG_RELOCATABLE=y")
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ard Biesheuvel [Fri, 3 Feb 2017 09:54:05 +0000 (09:54 +0000)]
kbuild: modversions: add infrastructure for emitting relative CRCs
This add the kbuild infrastructure that will allow architectures to emit
vmlinux symbol CRCs as 32-bit offsets to another location in the kernel
where the actual value is stored. This works around problems with CRCs
being mistaken for relocatable symbols on kernels that self relocate at
runtime (i.e., powerpc with CONFIG_RELOCATABLE=y)
For the kbuild side of things, this comes down to the following:
- introducing a Kconfig symbol MODULE_REL_CRCS
- adding a -R switch to genksyms to instruct it to emit the CRC symbols
as references into the .rodata section
- making modpost distinguish such references from absolute CRC symbols
by the section index (SHN_ABS)
- making kallsyms disregard non-absolute symbols with a __crc_ prefix
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stanislaw Gruszka [Thu, 2 Feb 2017 12:32:10 +0000 (13:32 +0100)]
ethtool: do not vzalloc(0) on registers dump
If ->get_regs_len() callback return 0, we allocate 0 bytes of memory,
what print ugly warning in dmesg, which can be found further below.
This happen on mac80211 devices where ieee80211_get_regs_len() just
return 0 and driver only fills ethtool_regs structure and actually
do not provide any dump. However I assume this can happen on other
drivers i.e. when for some devices driver provide regs dump and for
others do not. Hence preventing to to print warning in ethtool code
seems to be reasonable.
ethtool: vmalloc: allocation failure: 0 bytes, mode:0x24080c2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_ZERO)
<snip>
Call Trace:
[<
ffffffff813bde47>] dump_stack+0x63/0x8c
[<
ffffffff811b0a1f>] warn_alloc+0x13f/0x170
[<
ffffffff811f0476>] __vmalloc_node_range+0x1e6/0x2c0
[<
ffffffff811f0874>] vzalloc+0x54/0x60
[<
ffffffff8169986c>] dev_ethtool+0xb4c/0x1b30
[<
ffffffff816adbb1>] dev_ioctl+0x181/0x520
[<
ffffffff816714d2>] sock_do_ioctl+0x42/0x50
<snip>
Mem-Info:
active_anon:435809 inactive_anon:173951 isolated_anon:0
active_file:835822 inactive_file:196932 isolated_file:0
unevictable:0 dirty:8 writeback:0 unstable:0
slab_reclaimable:157732 slab_unreclaimable:10022
mapped:83042 shmem:306356 pagetables:9507 bounce:0
free:130041 free_pcp:1080 free_cma:0
Node 0 active_anon:1743236kB inactive_anon:695804kB active_file:3343288kB inactive_file:787728kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:332168kB dirty:32kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 1225424kB writeback_tmp:0kB unstable:0kB pages_scanned:0 all_unreclaimable? no
Node 0 DMA free:15900kB min:136kB low:168kB high:200kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15984kB managed:15900kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 3187 7643 7643
Node 0 DMA32 free:419732kB min:28124kB low:35152kB high:42180kB active_anon:541180kB inactive_anon:248988kB active_file:1466388kB inactive_file:389632kB unevictable:0kB writepending:0kB present:3370280kB managed:3290932kB mlocked:0kB slab_reclaimable:217184kB slab_unreclaimable:4180kB kernel_stack:160kB pagetables:984kB bounce:0kB free_pcp:2236kB local_pcp:660kB free_cma:0kB
lowmem_reserve[]: 0 0 4456 4456
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Lebrun [Thu, 2 Feb 2017 10:29:38 +0000 (11:29 +0100)]
ipv6: sr: remove cleanup flag and fix HMAC computation
In the latest version of the IPv6 Segment Routing IETF draft [1] the
cleanup flag is removed and the flags field length is shrunk from 16 bits
to 8 bits. As a consequence, the input of the HMAC computation is modified
in a non-backward compatible way by covering the whole octet of flags
instead of only the cleanup bit. As such, if an implementation compatible
with the latest draft computes the HMAC of an SRH who has other flags set
to 1, then the HMAC result would differ from the current implementation.
This patch carries those modifications to prevent conflict with other
implementations of IPv6 SR.
[1] https://tools.ietf.org/html/draft-ietf-6man-segment-routing-header-05
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ondrej Kozina [Tue, 31 Jan 2017 14:47:11 +0000 (15:47 +0100)]
dm crypt: replace RCU read-side section with rwsem
The lockdep splat below hints at a bug in RCU usage in dm-crypt that
was introduced with commit
c538f6ec9f56 ("dm crypt: add ability to use
keys from the kernel key retention service"). The kernel keyring
function user_key_payload() is in fact a wrapper for
rcu_dereference_protected() which must not be called with only
rcu_read_lock() section mark.
Unfortunately the kernel keyring subsystem doesn't currently provide
an interface that allows the use of an RCU read-side section. So for
now we must drop RCU in favour of rwsem until a proper function is
made available in the kernel keyring subsystem.
===============================
[ INFO: suspicious RCU usage. ]
4.10.0-rc5 #2 Not tainted
-------------------------------
./include/keys/user-type.h:53 suspicious rcu_dereference_protected() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
2 locks held by cryptsetup/6464:
#0: (&md->type_lock){+.+.+.}, at: [<
ffffffffa02472a2>] dm_lock_md_type+0x12/0x20 [dm_mod]
#1: (rcu_read_lock){......}, at: [<
ffffffffa02822f8>] crypt_set_key+0x1d8/0x4b0 [dm_crypt]
stack backtrace:
CPU: 1 PID: 6464 Comm: cryptsetup Not tainted 4.10.0-rc5 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014
Call Trace:
dump_stack+0x67/0x92
lockdep_rcu_suspicious+0xc5/0x100
crypt_set_key+0x351/0x4b0 [dm_crypt]
? crypt_set_key+0x1d8/0x4b0 [dm_crypt]
crypt_ctr+0x341/0xa53 [dm_crypt]
dm_table_add_target+0x147/0x330 [dm_mod]
table_load+0x111/0x350 [dm_mod]
? retrieve_status+0x1c0/0x1c0 [dm_mod]
ctl_ioctl+0x1f5/0x510 [dm_mod]
dm_ctl_ioctl+0xe/0x20 [dm_mod]
do_vfs_ioctl+0x8e/0x690
? ____fput+0x9/0x10
? task_work_run+0x7e/0xa0
? trace_hardirqs_on_caller+0x122/0x1b0
SyS_ioctl+0x3c/0x70
entry_SYSCALL_64_fastpath+0x18/0xad
RIP: 0033:0x7f392c9a4ec7
RSP: 002b:
00007ffef6383378 EFLAGS:
00000246 ORIG_RAX:
0000000000000010
RAX:
ffffffffffffffda RBX:
00007ffef63830a0 RCX:
00007f392c9a4ec7
RDX:
000000000124fcc0 RSI:
00000000c138fd09 RDI:
0000000000000005
RBP:
00007ffef6383090 R08:
00000000ffffffff R09:
00000000012482b0
R10:
2a28205d34383336 R11:
0000000000000246 R12:
00007f392d803a08
R13:
00007ffef63831e0 R14:
0000000000000000 R15:
00007f392d803a0b
Fixes:
c538f6ec9f56 ("dm crypt: add ability to use keys from the kernel key retention service")
Reported-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Ondrej Kozina <okozina@redhat.com>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Mike Snitzer [Wed, 25 Jan 2017 15:24:52 +0000 (16:24 +0100)]
dm rq: cope with DM device destruction while in dm_old_request_fn()
Fixes a crash in dm_table_find_target() due to a NULL struct dm_table
being passed from dm_old_request_fn() that races with DM device
destruction.
Reported-by: artem@flashgrid.io
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
Mike Snitzer [Fri, 6 Jan 2017 20:33:14 +0000 (15:33 -0500)]
dm mpath: cleanup -Wbool-operation warning in choose_pgpath()
Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Mark Brown [Fri, 3 Feb 2017 11:39:46 +0000 (12:39 +0100)]
Merge remote-tracking branches 'regulator/fix/fixed' and 'regulator/fix/twl6040' into regulator-linus
Harsh Jain [Fri, 27 Jan 2017 10:39:06 +0000 (16:09 +0530)]
crypto: chcr - Fix key length for RFC4106
Check keylen before copying salt to avoid wrap around of Integer.
Signed-off-by: Harsh Jain <harsh@chelsio.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Harsh Jain [Wed, 1 Feb 2017 15:40:28 +0000 (21:10 +0530)]
crypto: algif_aead - Fix kernel panic on list_del
Kernel panics when userspace program try to access AEAD interface.
Remove node from Linked List before freeing its memory.
Cc: <stable@vger.kernel.org>
Signed-off-by: Harsh Jain <harsh@chelsio.com>
Reviewed-by: Stephan Müller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Wed, 1 Feb 2017 14:17:39 +0000 (22:17 +0800)]
crypto: aesni - Fix failure when pcbc module is absent
When aesni is built as a module together with pcbc, the pcbc module
must be present for aesni to load. However, the pcbc module may not
be present for reasons such as its absence on initramfs. This patch
allows the aesni to function even if the pcbc module is enabled but
not present.
Reported-by: Arkadiusz Miśkiewicz <arekm@maven.pl>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Gary R Hook [Fri, 27 Jan 2017 23:09:04 +0000 (17:09 -0600)]
crypto: ccp - Fix double add when creating new DMA command
Eliminate a double-add by creating a new list to manage
command descriptors when created; move the descriptor to
the pending list when the command is submitted.
Cc: <stable@vger.kernel.org>
Signed-off-by: Gary R Hook <gary.hook@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Gary R Hook [Fri, 27 Jan 2017 21:28:45 +0000 (15:28 -0600)]
crypto: ccp - Fix DMA operations when IOMMU is enabled
An I/O page fault occurs when the IOMMU is enabled on a
system that supports the v5 CCP. DMA operations use a
Request ID value that does not match what is expected by
the IOMMU, resulting in the I/O page fault. Setting the
Request ID value to 0 corrects this issue.
Cc: <stable@vger.kernel.org>
Signed-off-by: Gary R Hook <gary.hook@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Harsh Jain [Tue, 24 Jan 2017 05:04:33 +0000 (10:34 +0530)]
crypto: chcr - Check device is allocated before use
Ensure dev is allocated for crypto uld context before using the device
for crypto operations.
Cc: <stable@vger.kernel.org>
Signed-off-by: Atul Gupta <atul.gupta@chelsio.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Harsh Jain [Tue, 24 Jan 2017 05:04:32 +0000 (10:34 +0530)]
crypto: chcr - Fix panic on dma_unmap_sg
Save DMA mapped sg list addresses to request context buffer.
Signed-off-by: Atul Gupta <atul.gupta@chelsio.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Mao Wenan [Wed, 1 Feb 2017 02:46:43 +0000 (18:46 -0800)]
net: phy: Fix lack of reference count on PHY driver
There is currently no reference count being held on the PHY driver,
which makes it possible to remove the PHY driver module while the PHY
state machine is running and polling the PHY. This could cause crashes
similar to this one to show up:
[ 43.361162] BUG: unable to handle kernel NULL pointer dereference at
0000000000000140
[ 43.361162] IP: phy_state_machine+0x32/0x490
[ 43.361162] PGD
59dc067
[ 43.361162] PUD 0
[ 43.361162]
[ 43.361162] Oops: 0000 [#1] SMP
[ 43.361162] Modules linked in: dsa_loop [last unloaded: broadcom]
[ 43.361162] CPU: 0 PID: 1299 Comm: kworker/0:3 Not tainted 4.10.0-rc5+ #415
[ 43.361162] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS Ubuntu-1.8.2-1ubuntu2 04/01/2014
[ 43.361162] Workqueue: events_power_efficient phy_state_machine
[ 43.361162] task:
ffff880006782b80 task.stack:
ffffc90000184000
[ 43.361162] RIP: 0010:phy_state_machine+0x32/0x490
[ 43.361162] RSP: 0018:
ffffc90000187e18 EFLAGS:
00000246
[ 43.361162] RAX:
0000000000000000 RBX:
ffff8800059e53c0 RCX:
ffff880006a15c60
[ 43.361162] RDX:
ffff880006782b80 RSI:
0000000000000000 RDI:
ffff8800059e5428
[ 43.361162] RBP:
ffffc90000187e48 R08:
ffff880006a15c40 R09:
0000000000000000
[ 43.361162] R10:
0000000000000000 R11:
0000000000000000 R12:
ffff8800059e5428
[ 43.361162] R13:
ffff8800059e5000 R14:
0000000000000000 R15:
ffff880006a15c40
[ 43.361162] FS:
0000000000000000(0000) GS:
ffff880006a00000(0000)
knlGS:
0000000000000000
[ 43.361162] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 43.361162] CR2:
0000000000000140 CR3:
0000000005979000 CR4:
00000000000006f0
[ 43.361162] Call Trace:
[ 43.361162] process_one_work+0x1b4/0x3e0
[ 43.361162] worker_thread+0x43/0x4d0
[ 43.361162] ? __schedule+0x17f/0x4e0
[ 43.361162] kthread+0xf7/0x130
[ 43.361162] ? process_one_work+0x3e0/0x3e0
[ 43.361162] ? kthread_create_on_node+0x40/0x40
[ 43.361162] ret_from_fork+0x29/0x40
[ 43.361162] Code: 56 41 55 41 54 4c 8d 67 68 53 4c 8d af 40 fc ff ff
48 89 fb 4c 89 e7 48 83 ec 08 e8 c9 9d 27 00 48 8b 83 60 ff ff ff 44 8b
73 98 <48> 8b 90 40 01 00 00 44 89 f0 48 85 d2 74 08 4c 89 ef ff d2 8b
Keep references on the PHY driver module right before we are going to
utilize it in phy_attach_direct(), and conversely when we don't use it
anymore in phy_detach().
Signed-off-by: Mao Wenan <maowenan@huawei.com>
[florian: rebase, rework commit message]
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 3 Feb 2017 02:27:05 +0000 (21:27 -0500)]
Merge branch 'mlx4-queue-reinit'
Martin KaFai Lau says:
====================
mlx4: Misc bug fixes after reinitializing queues
This patchset fixes misc bugs after reinitializing
queues (e.g. by ethtool -L).
v2:
* Add another fix to mem leak in tx_ring[t] and tx_cq[t]
* In mlx4_en_try_alloc_resources(),
move all xdp_prog logic after calling mlx4_en_alloc_resources()
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Wed, 1 Feb 2017 06:35:33 +0000 (22:35 -0800)]
mlx4: xdp_prog becomes inactive after ethtool '-L' or '-G'
After calling mlx4_en_try_alloc_resources (e.g. by changing the
number of rx-queues with ethtool -L), the existing xdp_prog becomes
inactive.
The bug is that the xdp_prog ptr has not been carried over from
the old rx-queues to the new rx-queues
Fixes:
47a38e155037 ("net/mlx4_en: add support for fast rx drop bpf program")
Cc: Brenden Blanco <bblanco@plumgrid.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Wed, 1 Feb 2017 06:35:32 +0000 (22:35 -0800)]
mlx4: Fix memory leak after mlx4_en_update_priv()
In mlx4_en_update_priv(), dst->tx_ring[t] and dst->tx_cq[t]
are over-written by src->tx_ring[t] and src->tx_cq[t] without
first calling kfree.
One of the reproducible code paths is by doing 'ethtool -L'.
The fix is to do the kfree in mlx4_en_free_resources().
Here is the kmemleak report:
unreferenced object 0xffff880841211800 (size 2048):
comm "ethtool", pid 3096, jiffies
4294716940 (age 528.353s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<
ffffffff81930718>] kmemleak_alloc+0x28/0x50
[<
ffffffff8120b213>] kmem_cache_alloc_trace+0x103/0x260
[<
ffffffff8170e0a8>] mlx4_en_try_alloc_resources+0x118/0x1a0
[<
ffffffff817065a9>] mlx4_en_set_ringparam+0x169/0x210
[<
ffffffff818040c5>] dev_ethtool+0xae5/0x2190
[<
ffffffff8181b898>] dev_ioctl+0x168/0x6f0
[<
ffffffff817d7a72>] sock_do_ioctl+0x42/0x50
[<
ffffffff817d819b>] sock_ioctl+0x21b/0x2d0
[<
ffffffff81247a73>] do_vfs_ioctl+0x93/0x6a0
[<
ffffffff812480f9>] SyS_ioctl+0x79/0x90
[<
ffffffff8193d7ea>] entry_SYSCALL_64_fastpath+0x18/0xad
[<
ffffffffffffffff>] 0xffffffffffffffff
unreferenced object 0xffff880841213000 (size 2048):
comm "ethtool", pid 3096, jiffies
4294716940 (age 528.353s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<
ffffffff81930718>] kmemleak_alloc+0x28/0x50
[<
ffffffff8120b213>] kmem_cache_alloc_trace+0x103/0x260
[<
ffffffff8170e0cb>] mlx4_en_try_alloc_resources+0x13b/0x1a0
[<
ffffffff817065a9>] mlx4_en_set_ringparam+0x169/0x210
[<
ffffffff818040c5>] dev_ethtool+0xae5/0x2190
[<
ffffffff8181b898>] dev_ioctl+0x168/0x6f0
[<
ffffffff817d7a72>] sock_do_ioctl+0x42/0x50
[<
ffffffff817d819b>] sock_ioctl+0x21b/0x2d0
[<
ffffffff81247a73>] do_vfs_ioctl+0x93/0x6a0
[<
ffffffff812480f9>] SyS_ioctl+0x79/0x90
[<
ffffffff8193d7ea>] entry_SYSCALL_64_fastpath+0x18/0xad
[<
ffffffffffffffff>] 0xffffffffffffffff
(gdb) list *mlx4_en_try_alloc_resources+0x118
0xffffffff8170e0a8 is in mlx4_en_try_alloc_resources (drivers/net/ethernet/mellanox/mlx4/en_netdev.c:2145).
2140 if (!dst->tx_ring_num[t])
2141 continue;
2142
2143 dst->tx_ring[t] = kzalloc(sizeof(struct mlx4_en_tx_ring *) *
2144 MAX_TX_RINGS, GFP_KERNEL);
2145 if (!dst->tx_ring[t])
2146 goto err_free_tx;
2147
2148 dst->tx_cq[t] = kzalloc(sizeof(struct mlx4_en_cq *) *
2149 MAX_TX_RINGS, GFP_KERNEL);
(gdb) list *mlx4_en_try_alloc_resources+0x13b
0xffffffff8170e0cb is in mlx4_en_try_alloc_resources (drivers/net/ethernet/mellanox/mlx4/en_netdev.c:2150).
2145 if (!dst->tx_ring[t])
2146 goto err_free_tx;
2147
2148 dst->tx_cq[t] = kzalloc(sizeof(struct mlx4_en_cq *) *
2149 MAX_TX_RINGS, GFP_KERNEL);
2150 if (!dst->tx_cq[t]) {
2151 kfree(dst->tx_ring[t]);
2152 goto err_free_tx;
2153 }
2154 }
Fixes:
ec25bc04ed8e ("net/mlx4_en: Add resilience in low memory systems")
Cc: Eugenia Emantayev <eugenia@mellanox.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>