Ido Schimmel [Mon, 10 Apr 2017 11:59:28 +0000 (14:59 +0300)]
bridge: netlink: register netdevice before executing changelink
Peter reported a kernel oops when executing the following command:
$ ip link add name test type bridge vlan_default_pvid 1
[13634.939408] BUG: unable to handle kernel NULL pointer dereference at
0000000000000190
[13634.939436] IP: __vlan_add+0x73/0x5f0
[...]
[13634.939783] Call Trace:
[13634.939791] ? pcpu_next_unpop+0x3b/0x50
[13634.939801] ? pcpu_alloc+0x3d2/0x680
[13634.939810] ? br_vlan_add+0x135/0x1b0
[13634.939820] ? __br_vlan_set_default_pvid.part.28+0x204/0x2b0
[13634.939834] ? br_changelink+0x120/0x4e0
[13634.939844] ? br_dev_newlink+0x50/0x70
[13634.939854] ? rtnl_newlink+0x5f5/0x8a0
[13634.939864] ? rtnl_newlink+0x176/0x8a0
[13634.939874] ? mem_cgroup_commit_charge+0x7c/0x4e0
[13634.939886] ? rtnetlink_rcv_msg+0xe1/0x220
[13634.939896] ? lookup_fast+0x52/0x370
[13634.939905] ? rtnl_newlink+0x8a0/0x8a0
[13634.939915] ? netlink_rcv_skb+0xa1/0xc0
[13634.939925] ? rtnetlink_rcv+0x24/0x30
[13634.939934] ? netlink_unicast+0x177/0x220
[13634.939944] ? netlink_sendmsg+0x2fe/0x3b0
[13634.939954] ? _copy_from_user+0x39/0x40
[13634.939964] ? sock_sendmsg+0x30/0x40
[13634.940159] ? ___sys_sendmsg+0x29d/0x2b0
[13634.940326] ? __alloc_pages_nodemask+0xdf/0x230
[13634.940478] ? mem_cgroup_commit_charge+0x7c/0x4e0
[13634.940592] ? mem_cgroup_try_charge+0x76/0x1a0
[13634.940701] ? __handle_mm_fault+0xdb9/0x10b0
[13634.940809] ? __sys_sendmsg+0x51/0x90
[13634.940917] ? entry_SYSCALL_64_fastpath+0x1e/0xad
The problem is that the bridge's VLAN group is created after setting the
default PVID, when registering the netdevice and executing its
ndo_init().
Fix this by changing the order of both operations, so that
br_changelink() is only processed after the netdevice is registered,
when the VLAN group is already initialized.
Fixes:
b6677449dff6 ("bridge: netlink: call br_changelink() during br_dev_newlink()")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Peter V. Saveliev <peter@svinota.eu>
Tested-by: Peter V. Saveliev <peter@svinota.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Mon, 10 Apr 2017 11:59:27 +0000 (14:59 +0300)]
bridge: implement missing ndo_uninit()
While the bridge driver implements an ndo_init(), it was missing a
symmetric ndo_uninit(), causing the different de-initialization
operations to be scattered around its dellink() and destructor().
Implement a symmetric ndo_uninit() and remove the overlapping operations
from its dellink() and destructor().
This is a prerequisite for the next patch, as it allows us to have a
proper cleanup upon changelink() failure during the bridge's newlink().
Fixes:
b6677449dff6 ("bridge: netlink: call br_changelink() during br_dev_newlink()")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg [Tue, 11 Apr 2017 10:10:58 +0000 (12:10 +0200)]
bpf: reference may_access_skb() from __bpf_prog_run()
It took me quite some time to figure out how this was linked,
so in order to save the next person the effort of finding it
add a comment in __bpf_prog_run() that indicates what exactly
determines that a program can access the ctx == skb.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sat, 8 Apr 2017 15:07:33 +0000 (08:07 -0700)]
tcp: clear saved_syn in tcp_disconnect()
In the (very unlikely) case a passive socket becomes a listener,
we do not want to duplicate its saved SYN headers.
This would lead to double frees, use after free, and please hackers and
various fuzzers
Tested:
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, IPPROTO_TCP, TCP_SAVE_SYN, [1], 4) = 0
+0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 5) = 0
+0 < S 0:0(0) win 32972 <mss 1460,nop,wscale 7>
+0 > S. 0:0(0) ack 1 <...>
+.1 < . 1:1(0) ack 1 win 257
+0 accept(3, ..., ...) = 4
+0 connect(4, AF_UNSPEC, ...) = 0
+0 close(3) = 0
+0 bind(4, ..., ...) = 0
+0 listen(4, 5) = 0
+0 < S 0:0(0) win 32972 <mss 1460,nop,wscale 7>
+0 > S. 0:0(0) ack 1 <...>
+.1 < . 1:1(0) ack 1 win 257
Fixes:
cd8ae85299d5 ("tcp: provide SYN headers for passive connections")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 8 Apr 2017 15:48:50 +0000 (08:48 -0700)]
Merge tag 'linux-can-fixes-for-4.12-
20170404' of git://git./linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2017-04-04
this is a pull request of two patches for net/master.
The first patch by Markus Marb fixes a register read access in the ifi driver.
The second patch by Geert Uytterhoeven for the rcar driver remove the printing
of a kernel virtual address.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Gao Feng [Thu, 6 Apr 2017 15:05:49 +0000 (23:05 +0800)]
net: tcp: Increase TCP_MIB_OUTRSTS even though fail to alloc skb
Because TCP_MIB_OUTRSTS is an important count, so always increase it
whatever send it successfully or not.
Now move the increment of TCP_MIB_OUTRSTS to the top of
tcp_send_active_reset to make sure it is increased always even though
fail to alloc skb.
Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 8 Apr 2017 15:29:05 +0000 (08:29 -0700)]
Merge branch 'l2tp-sockopt-errors'
Guillaume Nault says:
====================
l2tp: fix error handling of PPPoL2TP socket options
Fix pppol2tp_[gs]etsockopt() so that they don't ignore errors returned
by their helper functions.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault [Thu, 6 Apr 2017 16:31:21 +0000 (18:31 +0200)]
l2tp: don't mask errors in pppol2tp_getsockopt()
pppol2tp_getsockopt() doesn't take into account the error code returned
by pppol2tp_tunnel_getsockopt() or pppol2tp_session_getsockopt(). If
error occurs there, pppol2tp_getsockopt() continues unconditionally and
reports erroneous values.
Fixes:
fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault [Thu, 6 Apr 2017 16:31:20 +0000 (18:31 +0200)]
l2tp: don't mask errors in pppol2tp_setsockopt()
pppol2tp_setsockopt() unconditionally overwrites the error value
returned by pppol2tp_tunnel_setsockopt() or
pppol2tp_session_setsockopt(), thus hiding errors from userspace.
Fixes:
fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuchung Cheng [Fri, 7 Apr 2017 18:42:05 +0000 (11:42 -0700)]
tcp: restrict F-RTO to work-around broken middle-boxes
The recent extension of F-RTO
89fe18e44 ("tcp: extend F-RTO
to catch more spurious timeouts") interacts badly with certain
broken middle-boxes. These broken boxes modify and falsely raise
the receive window on the ACKs. During a timeout induced recovery,
F-RTO would send new data packets to probe if the timeout is false
or not. Since the receive window is falsely raised, the receiver
would silently drop these F-RTO packets. The recovery would take N
(exponentially backoff) timeouts to repair N packet losses. A TCP
performance killer.
Due to this unfortunate situation, this patch removes this extension
to revert F-RTO back to the RFC specification.
Fixes:
89fe18e44f7e ("tcp: extend F-RTO to catch more spurious timeouts")
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Thu, 6 Apr 2017 05:41:28 +0000 (13:41 +0800)]
team: call netdev_change_features out of team lock
Commit
f6988cb63a4e ("team: don't call netdev_change_features under
team->lock") fixed the issue calling netdev_change_features under
team->lock for team_compute_features.
But there are still two places where it calls netdev_change_features
under team->lock, team_port_add and team_port_del. It may cause a
dead lock when the slave port with LRO enabled is added.
This patch is to fix this dead lock by moving netdev_change_features
out of team_port_add and team_port_del, and call it after unlocking
the team lock.
Reported-by: Patrick Talbert <ptalbert@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Thu, 6 Apr 2017 05:10:52 +0000 (13:10 +0800)]
sctp: listen on the sock only when it's state is listening or closed
Now sctp doesn't check sock's state before listening on it. It could
even cause changing a sock with any state to become a listening sock
when doing sctp_listen.
This patch is to fix it by checking sock's state in sctp_listen, so
that it will listen on the sock with right state.
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Oliver Neukum [Wed, 5 Apr 2017 12:14:39 +0000 (14:14 +0200)]
usbnet: make sure no NULL pointer is passed through
Coverity reports:
** CID 751368: Null pointer dereferences (FORWARD_NULL)
/drivers/net/usb/usbnet.c: 1925 in __usbnet_read_cmd()
________________________________________________________________________________________________________
*** CID 751368: Null pointer dereferences (FORWARD_NULL)
/drivers/net/usb/usbnet.c: 1925 in __usbnet_read_cmd()
1919 EXPORT_SYMBOL(usbnet_link_change);
1920
1921 /*-------------------------------------------------------------------------*/
1922 static int __usbnet_read_cmd(struct usbnet *dev, u8 cmd, u8 reqtype,
1923 u16 value, u16 index, void *data, u16 size)
1924 {
>>> CID 751368: Null pointer dereferences (FORWARD_NULL)
>>> Assigning: "buf" = "NULL".
1925 void *buf = NULL;
1926 int err = -ENOMEM;
1927
1928 netdev_dbg(dev->net, "usbnet_read_cmd cmd=0x%02x reqtype=%02x"
1929 " value=0x%04x index=0x%04x size=%d\n",
1930 cmd, reqtype, value, index, size);
** CID 751370: Null pointer dereferences (FORWARD_NULL)
/drivers/net/usb/usbnet.c: 1952 in __usbnet_write_cmd()
________________________________________________________________________________________________________
*** CID 751370: Null pointer dereferences (FORWARD_NULL)
/drivers/net/usb/usbnet.c: 1952 in __usbnet_write_cmd()
1946 }
1947
1948 static int __usbnet_write_cmd(struct usbnet *dev, u8 cmd, u8 reqtype,
1949 u16 value, u16 index, const void *data,
1950 u16 size)
1951 {
>>> CID 751370: Null pointer dereferences (FORWARD_NULL)
>>> Assigning: "buf" = "NULL".
1952 void *buf = NULL;
1953 int err = -ENOMEM;
1954
1955 netdev_dbg(dev->net, "usbnet_write_cmd cmd=0x%02x reqtype=%02x"
1956 " value=0x%04x index=0x%04x size=%d\n",
1957 cmd, reqtype, value, index, size);
** CID
1325026: Null pointer dereferences (FORWARD_NULL)
/drivers/net/usb/ch9200.c: 143 in control_write()
It is valid to offer commands without a buffer, but then you need a size
of zero. This should actually be checked.
Signed-off-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Wed, 5 Apr 2017 01:51:30 +0000 (18:51 -0700)]
net_sched: check noop_qdisc before qdisc_hash_add()
Dmitry reported a crash when injecting faults in
attach_one_default_qdisc() and dev->qdisc is still
a noop_disc, the check before qdisc_hash_add() fails
to catch it because it tests NULL. We should test
against noop_qdisc since it is the default qdisc
at this point.
Fixes:
59cc1f61f09c ("net: sched: convert qdisc linked list to hashtable")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Larysch [Mon, 3 Apr 2017 14:46:09 +0000 (16:46 +0200)]
net: ipv4: fix multipath RTM_GETROUTE behavior when iif is given
inet_rtm_getroute synthesizes a skeletal ICMP skb, which is passed to
ip_route_input when iif is given. If a multipath route is present for
the designated destination, ip_multipath_icmp_hash ends up being called,
which uses the source/destination addresses within the skb to calculate
a hash. However, those are not set in the synthetic skb, causing it to
return an arbitrary and incorrect result.
Instead, use UDP, which gets no such special treatment.
Signed-off-by: Florian Larysch <fl@n621.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 6 Apr 2017 18:57:04 +0000 (11:57 -0700)]
Merge branch 'for-davem' of git://git./linux/kernel/git/viro/vfs
Linus Torvalds [Thu, 6 Apr 2017 03:17:38 +0000 (20:17 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Reject invalid updates to netfilter expectation policies, from Pablo
Neira Ayuso.
2) Fix memory leak in nfnl_cthelper, from Jeffy Chen.
3) Don't do stupid things if we get a neigh_probe() on a neigh entry
whose ops lack a solicit method. From Eric Dumazet.
4) Don't transmit packets in r8152 driver when the carrier is off, from
Hayes Wang.
5) Fix ipv6 packet type detection in aquantia driver, from Pavel
Belous.
6) Don't write uninitialized data into hw registers in bna driver, from
Arnd Bergmann.
7) Fix locking in ping_unhash(), from Eric Dumazet.
8) Make BPF verifier range checks able to understand certain sequences
emitted by LLVM, from Alexei Starovoitov.
9) Fix use after free in ipconfig, from Mark Rutland.
10) Fix refcount leak on force commit in openvswitch, from Jarno
Rajahalme.
11) Fix various overflow checks in AF_PACKET, from Andrey Konovalov.
12) Fix endianness bug in be2net driver, from Suresh Reddy.
13) Don't forget to wake TX queues when processing a timeout, from
Grygorii Strashko.
14) ARP header on-stack storage is wrong in flow dissector, from Simon
Horman.
15) Lost retransmit and reordering SNMP stats in TCP can be
underreported. From Yuchung Cheng.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (82 commits)
nfp: fix potential use after free on xdp prog
tcp: fix reordering SNMP under-counting
tcp: fix lost retransmit SNMP under-counting
sctp: get sock from transport in sctp_transport_update_pmtu
net: ethernet: ti: cpsw: fix race condition during open()
l2tp: fix PPP pseudo-wire auto-loading
bnx2x: fix spelling mistake in macros HW_INTERRUT_ASSERT_SET_*
l2tp: take reference on sessions being dumped
tcp: minimize false-positives on TCP/GRO check
sctp: check for dst and pathmtu update in sctp_packet_config
flow dissector: correct size of storage for ARP
net: ethernet: ti: cpsw: wake tx queues on ndo_tx_timeout
l2tp: take a reference on sessions used in genetlink handlers
l2tp: hold session while sending creation notifications
l2tp: fix duplicate session creation
l2tp: ensure session can't get removed during pppol2tp_session_ioctl()
l2tp: fix race in l2tp_recv_common()
sctp: use right in and out stream cnt
bpf: add various verifier test cases for self-tests
bpf, verifier: fix rejection of unaligned access checks for map_value_adj
...
Jakub Kicinski [Tue, 4 Apr 2017 22:56:55 +0000 (15:56 -0700)]
nfp: fix potential use after free on xdp prog
We should unregister the net_device first, before we give back
our reference on xdp_prog. Otherwise xdp_prog may be freed
before .ndo_stop() disabled the datapath. Found by code inspection.
Fixes:
ecd63a0217d5 ("nfp: add XDP support in the driver")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuchung Cheng [Tue, 4 Apr 2017 21:15:40 +0000 (14:15 -0700)]
tcp: fix reordering SNMP under-counting
Currently the reordering SNMP counters only increase if a connection
sees a higher degree then it has previously seen. It ignores if the
reordering degree is not greater than the default system threshold.
This significantly under-counts the number of reordering events
and falsely convey that reordering is rare on the network.
This patch properly and faithfully records the number of reordering
events detected by the TCP stack, just like the comment says "this
exciting event is worth to be remembered". Note that even so TCP
still under-estimate the actual reordering events because TCP
requires TS options or certain packet sequences to detect reordering
(i.e. ACKing never-retransmitted sequence in recovery or disordered
state).
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuchung Cheng [Tue, 4 Apr 2017 21:15:39 +0000 (14:15 -0700)]
tcp: fix lost retransmit SNMP under-counting
The lost retransmit SNMP stat is under-counting retransmission
that uses segment offloading. This patch fixes that so all
retransmission related SNMP counters are consistent.
Fixes:
10d3be569243 ("tcp-tso: do not split TSO packets at retransmit time")
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 5 Apr 2017 16:04:26 +0000 (09:04 -0700)]
Merge tag 'mfd-fixes-4.11' of git://git./linux/kernel/git/lee/mfd
Pull MFD bug fix from Lee Jones:
"Increase buffer size om cros-ec to allow for SPI messages"
* tag 'mfd-fixes-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd:
mfd: cros-ec: Fix host command buffer size
Linus Torvalds [Wed, 5 Apr 2017 15:37:28 +0000 (08:37 -0700)]
Merge tag 'kbuild-fixes-v4.11' of git://git./linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- hand-off primary maintainership of Kbuild
- fix build warnings
- fix build error when GCOV is enabled with old compiler
- fix HAVE_ASM_GOTO check when GCC plugin is enabled
* tag 'kbuild-fixes-v4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
gconfig: remove misleading parentheses around a condition
jump label: fix passing kbuild_cflags when checking for asm goto support
Kbuild: use cc-disable-warning consistently for maybe-uninitialized
kbuild: external module build warnings when KBUILD_OUTPUT set and W=1
MAINTAINERS: add Masahiro Yamada as a Kbuild maintainer
Xin Long [Tue, 4 Apr 2017 05:39:55 +0000 (13:39 +0800)]
sctp: get sock from transport in sctp_transport_update_pmtu
This patch is almost to revert commit
02f3d4ce9e81 ("sctp: Adjust PMTU
updates to accomodate route invalidation."). As t->asoc can't be NULL
in sctp_transport_update_pmtu, it could get sk from asoc, and no need
to pass sk into that function.
It is also to remove some duplicated codes from that function.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vic Yang [Fri, 24 Mar 2017 17:44:01 +0000 (18:44 +0100)]
mfd: cros-ec: Fix host command buffer size
For SPI, we can get up to 32 additional bytes for response preamble.
The current overhead (2 bytes) may cause problems when we try to receive
a big response. Update it to 32 bytes.
Without this fix we could see a kernel BUG when we receive a big response
from the Chrome EC when is connected via SPI.
Signed-off-by: Vic Yang <victoryang@google.com>
Tested-by: Enric Balletbo i Serra <enric.balletbo.collabora.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Linus Torvalds [Tue, 4 Apr 2017 18:40:20 +0000 (11:40 -0700)]
Merge tag 'gpio-v4.11-3' of git://git./linux/kernel/git/linusw/linux-gpio
Pull late GPIO fixes from Linus Walleij:
"Some late coming ACPI fixes for GPIO.
We're dealing with ACPI issues here. The first is related to wake IRQs
on Bay Trail/Cherry Trail CPUs which are common in laptops. The second
is about proper probe deferral when reading _CRS properties.
For my untrained eye it seems there was some quarrel between the BIOS
and the kernel about who is supposed to deal with wakeups from GPIO
lines"
* tag 'gpio-v4.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
ACPI / gpio: do not fall back to parsing _CRS when we get a deferral
gpio: acpi: Call enable_irq_wake for _IAE GpioInts with Wake set
David S. Miller [Tue, 4 Apr 2017 18:36:54 +0000 (11:36 -0700)]
Merge tag 'wireless-drivers-for-davem-2017-04-03' of git://git./linux/kernel/git/kvalo/wireless-drivers
Kalle Valo says:
====================
wireless-drivers fixes for 4.11
iwlwifi
* an RCU fix
* a fix for a potential out-of-bounds access crash
* a fix for IBSS which has been broken since DQA was enabled
rtlwifi
* fix scheduling while atomic regression
brcmfmac
* fix use-after-free bug found by KASAN
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Tue, 4 Apr 2017 18:16:52 +0000 (11:16 -0700)]
Merge tag 'nios2-v4.11-fix' of git://git./linux/kernel/git/lftan/nios2
Pull nios2 fix from Ley Foon Tan:
- nios2: reserve boot memory for device tree
* tag 'nios2-v4.11-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2:
nios2: reserve boot memory for device tree
Sekhar Nori [Mon, 3 Apr 2017 12:04:28 +0000 (17:34 +0530)]
net: ethernet: ti: cpsw: fix race condition during open()
TI's cpsw driver handles both OF and non-OF case for phy
connect. Unfortunately of_phy_connect() returns NULL on
error while phy_connect() returns ERR_PTR().
To handle this, cpsw_slave_open() overrides the return value
from phy_connect() to make it NULL or error.
This leaves a small window, where cpsw_adjust_link() may be
invoked for a slave while slave->phy pointer is temporarily
set to -ENODEV (or some other error) before it is finally set
to NULL.
_cpsw_adjust_link() only handles the NULL case, and an oops
results when ERR_PTR() is seen by it.
Note that cpsw_adjust_link() checks PHY status for each
slave whenever it is invoked. It can so happen that even
though phy_connect() for a given slave returns error,
_cpsw_adjust_link() is still called for that slave because
the link status of another slave changed.
Fix this by using a temporary pointer to store return value
of {of_}phy_connect() and do a one-time write to slave->phy.
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reported-by: Yan Liu <yan-liu@ti.com>
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Tue, 4 Apr 2017 17:12:15 +0000 (10:12 -0700)]
Merge tag 'drm-fixes-for-v4.11-rc6' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"This is just mostly stuff that missed rc5, from vmwgfx and msm
drivers"
* tag 'drm-fixes-for-v4.11-rc6' of git://people.freedesktop.org/~airlied/linux:
drm/msm: Make sure to detach the MMU during GPU cleanup
drm/msm/hdmi: redefinitions of macros not required
drm/msm/mdp5: Update SSPP_MAX value
drm/msm/dsi: Fix bug in dsi_mgr_phy_enable
drm/msm: Don't allow zero sized buffer objects
drm/msm: Fix wrong pointer check in a5xx_destroy
drm/msm: adreno: fix build error without debugfs
drm/vmwgfx: fix integer overflow in vmw_surface_define_ioctl()
drm/vmwgfx: Remove getparam error message
drm/ttm: Avoid calling drm_ht_remove from atomic context
drm/ttm, drm/vmwgfx: Relax permission checking when opening surfaces
drm/vmwgfx: avoid calling vzalloc with a 0 size in vmw_get_cap_3d_ioctl()
drm/vmwgfx: NULL pointer dereference in vmw_surface_define_ioctl()
drm/vmwgfx: Type-check lookups of fence objects
Guillaume Nault [Mon, 3 Apr 2017 11:23:15 +0000 (13:23 +0200)]
l2tp: fix PPP pseudo-wire auto-loading
PPP pseudo-wire type is 7 (11 is L2TP_PWTYPE_IP).
Fixes:
f1f39f911027 ("l2tp: auto load type modules")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Mon, 3 Apr 2017 10:19:10 +0000 (11:19 +0100)]
bnx2x: fix spelling mistake in macros HW_INTERRUT_ASSERT_SET_*
Trival fix, rename HW_INTERRUT_ASSERT_SET_* to HW_INTERRUPT_ASSERT_SET_*
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault [Mon, 3 Apr 2017 10:03:13 +0000 (12:03 +0200)]
l2tp: take reference on sessions being dumped
Take a reference on the sessions returned by l2tp_session_find_nth()
(and rename it l2tp_session_get_nth() to reflect this change), so that
caller is assured that the session isn't going to disappear while
processing it.
For procfs and debugfs handlers, the session is held in the .start()
callback and dropped in .show(). Given that pppol2tp_seq_session_show()
dereferences the associated PPPoL2TP socket and that
l2tp_dfs_seq_session_show() might call pppol2tp_show(), we also need to
call the session's .ref() callback to prevent the socket from going
away from under us.
Fixes:
fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
Fixes:
0ad6614048cf ("l2tp: Add debugfs files for dumping l2tp debug info")
Fixes:
309795f4bec2 ("l2tp: Add netlink control API for L2TP")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geert Uytterhoeven [Mon, 3 Apr 2017 10:11:26 +0000 (12:11 +0200)]
can: rcar_can: Do not print virtual addresses
During probe, the rcar_can driver prints:
rcar_can
e6e80000.can: device registered (regs @
e08bc000, IRQ76)
The "regs" value is a virtual address, exposing internal information,
hence stop printing it. The (useful) physical address is already
printed as part of the device name.
Fixes:
fd1159318e55e901 ("can: add Renesas R-Car CAN driver")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Markus Marb [Fri, 17 Mar 2017 22:14:47 +0000 (23:14 +0100)]
can: ifi: use correct register to read rx status
The incorrect offset was used when trying to read the RXSTCMD register.
Signed-off-by: Markus Marb <markus@marb.org>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Marcelo Ricardo Leitner [Sat, 1 Apr 2017 14:00:21 +0000 (11:00 -0300)]
tcp: minimize false-positives on TCP/GRO check
Markus Trippelsdorf reported that after commit
dcb17d22e1c2 ("tcp: warn
on bogus MSS and try to amend it") the kernel started logging the
warning for a NIC driver that doesn't even support GRO.
It was diagnosed that it was possibly caused on connections that were
using TCP Timestamps but some packets lacked the Timestamps option. As
we reduce rcv_mss when timestamps are used, the lack of them would cause
the packets to be bigger than expected, although this is a valid case.
As this warning is more as a hint, getting a clean-cut on the
threshold is probably not worth the execution time spent on it. This
patch thus alleviates the false-positives with 2 quick checks: by
accounting for the entire TCP option space and also checking against the
interface MTU if it's available.
These changes, specially the MTU one, might mask some real positives,
though if they are really happening, it's possible that sooner or later
it will be triggered anyway.
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Tue, 4 Apr 2017 00:56:32 +0000 (17:56 -0700)]
Merge tag 'xtensa-
20170403' of git://github.com/jcmvbkbc/linux-xtensa
Pull Xtensa fixes from Max Filippov:
- make __pa work with uncached KSEG addresses, it fixes DMA memory
mmapping and DMA debug
- fix torn stack dump output
- wire up statx syscall
* tag 'xtensa-
20170403' of git://github.com/jcmvbkbc/linux-xtensa:
xtensa: wire up statx system call
xtensa: fix stack dump output
xtensa: make __pa work with uncached KSEG addresses
Dave Airlie [Tue, 4 Apr 2017 00:13:40 +0000 (10:13 +1000)]
Merge branch 'msm-fixes-4.11-rc6' of git://people.freedesktop.org/~robclark/linux into drm-fixes
misc msm fixes.
* 'msm-fixes-4.11-rc6' of git://people.freedesktop.org/~robclark/linux:
drm/msm: Make sure to detach the MMU during GPU cleanup
drm/msm/hdmi: redefinitions of macros not required
drm/msm/mdp5: Update SSPP_MAX value
drm/msm/dsi: Fix bug in dsi_mgr_phy_enable
drm/msm: Don't allow zero sized buffer objects
drm/msm: Fix wrong pointer check in a5xx_destroy
drm/msm: adreno: fix build error without debugfs
Xin Long [Sat, 1 Apr 2017 09:15:59 +0000 (17:15 +0800)]
sctp: check for dst and pathmtu update in sctp_packet_config
This patch is to move sctp_transport_dst_check into sctp_packet_config
from sctp_packet_transmit and add pathmtu check in sctp_packet_config.
With this fix, sctp can update dst or pathmtu before appending chunks,
which can void dropping packets in sctp_packet_transmit when dst is
obsolete or dst's mtu is changed.
This patch is also to improve some other codes in sctp_packet_config.
It updates packet max_size with gso_max_size, checks for dst and
pathmtu, and appends ecne chunk only when packet is empty and asoc
is not NULL.
It makes sctp flush work better, as we only need to set up them once
for one flush schedule. It's also safe, since asoc is NULL only when
the packet is created by sctp_ootb_pkt_new in which it just gets the
new dst, no need to do more things for it other than set packet with
transport's pathmtu.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Horman [Mon, 3 Apr 2017 19:42:58 +0000 (15:42 -0400)]
flow dissector: correct size of storage for ARP
The last argument to __skb_header_pointer() should be a buffer large
enough to store struct arphdr. This can be a pointer to a struct arphdr
structure. The code was previously using a pointer to a pointer to
struct arphdr.
By my counting the storage available both before and after is 8 bytes on
x86_64.
Fixes:
55733350e5e8 ("flow disector: ARP support")
Reported-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jordan Crouse [Mon, 6 Feb 2017 17:39:29 +0000 (10:39 -0700)]
drm/msm: Make sure to detach the MMU during GPU cleanup
We should be detaching the MMU before destroying the address
space. To do this cleanly, the detach has to happen in
adreno_gpu_cleanup() because it needs access to structs
in adreno_gpu.c. Plus it is better symmetry to have
the attach and detach at the same code level.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Vinay Simha BN [Tue, 14 Mar 2017 05:25:56 +0000 (10:55 +0530)]
drm/msm/hdmi: redefinitions of macros not required
4 macros already defined in hdmi.h,
which is not required to redefine in hdmi_audio.c
Signed-off-by: Vinay Simha BN <simhavcs@gmail.com>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Archit Taneja [Fri, 17 Mar 2017 03:39:48 +0000 (09:09 +0530)]
drm/msm/mdp5: Update SSPP_MAX value
'SSPP_MAX + 1' is the max number of hwpipes that can be present on a
MDP5 platform. Recently, 2 new cursor hwpipes were added, which
caused overflows in arrays that used SSPP_MAX to represent the number
of elements. Update the SSPP_MAX value to incorporate the extra
hwpipes.
Signed-off-by: Archit Taneja <architt@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Archit Taneja [Thu, 16 Feb 2017 10:59:04 +0000 (16:29 +0530)]
drm/msm/dsi: Fix bug in dsi_mgr_phy_enable
A recent commit introduces a bug in dsi_mgr_phy_enable. In the non
dual DSI mode, we reset the mdsi (master DSI) PHY. This isn't right
since master and slave DSI exist only in dual DSI mode. For the normal
mode of operation, we should simply reset the PHY of the DSI device
(i.e. msm_dsi) corresponding to the current bridge.
Usage of the wrong DSI pointer also resulted in a static checker
warning. That too is resolved with this fix.
Fixes:
b62aa70a98c5 (drm/msm/dsi: Move PHY operations out of host)
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Archit Taneja <architt@codeaurora.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Jordan Crouse [Tue, 7 Mar 2017 17:02:51 +0000 (10:02 -0700)]
drm/msm: Don't allow zero sized buffer objects
Zero sized buffer objects tend to make various bits of the GEM
infrastructure complain:
WARNING: CPU: 1 PID: 2323 at drivers/gpu/drm/drm_mm.c:389 drm_mm_insert_node_generic+0x258/0x2f0
Modules linked in:
CPU: 1 PID: 2323 Comm: drm-api-test Tainted: G W
4.9.0-rc4-00906-g693af44 #213
Hardware name: Qualcomm Technologies, Inc. DB820c (DT)
task:
ffff8000d7353400 task.stack:
ffff8000d7720000
PC is at drm_mm_insert_node_generic+0x258/0x2f0
LR is at drm_vma_offset_add+0x4c/0x70
Zero sized buffers serve no appreciable value to the user so disallow
them at create time.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Jordan Crouse [Tue, 7 Mar 2017 16:50:27 +0000 (09:50 -0700)]
drm/msm: Fix wrong pointer check in a5xx_destroy
Instead of checking for a5xx_gpu->gpmu_iova during destroy we
accidently check a5xx_gpu->gpmu_bo.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Arnd Bergmann [Mon, 13 Mar 2017 16:43:48 +0000 (17:43 +0100)]
drm/msm: adreno: fix build error without debugfs
The newly added a5xx support fails to build when debugfs is diabled:
drivers/gpu/drm/msm/adreno/a5xx_gpu.c:849:4: error: 'struct msm_gpu_funcs' has no member named 'show'
drivers/gpu/drm/msm/adreno/a5xx_gpu.c:849:11: error: 'a5xx_show' undeclared here (not in a function); did you mean 'a5xx_irq'?
This adds a missing #ifdef.
Fixes:
b5f103ab98c7 ("drm/msm: gpu: Add A5XX target support")
Cc: stable@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Dave Airlie [Mon, 3 Apr 2017 19:45:27 +0000 (05:45 +1000)]
Merge branch 'vmwgfx-fixes-4.11' of git://people.freedesktop.org/~thomash/linux into drm-fixes
Set of vmwgfx fixes
* 'vmwgfx-fixes-4.11' of git://people.freedesktop.org/~thomash/linux:
drm/vmwgfx: fix integer overflow in vmw_surface_define_ioctl()
drm/vmwgfx: Remove getparam error message
drm/ttm: Avoid calling drm_ht_remove from atomic context
drm/ttm, drm/vmwgfx: Relax permission checking when opening surfaces
drm/vmwgfx: avoid calling vzalloc with a 0 size in vmw_get_cap_3d_ioctl()
drm/vmwgfx: NULL pointer dereference in vmw_surface_define_ioctl()
drm/vmwgfx: Type-check lookups of fence objects
Linus Torvalds [Mon, 3 Apr 2017 15:44:43 +0000 (08:44 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
"Four bug fixes, two of them for stable:
- avoid initrd corruptions in the kernel decompressor
- prevent inconsistent dumps if the boot CPU does not have address
zero
- fix the new pkey interface added with the merge window for 4.11
- a fix for a fix, another issue with user copy zero padding"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/uaccess: get_user() should zero on failure (again)
s390/pkey: Fix wrong handling of secure key with old MKVP
s390/smp: fix ipl from cpu with non-zero address
s390/decompressor: fix initrd corruption caused by bss clear
Linus Torvalds [Mon, 3 Apr 2017 15:36:24 +0000 (08:36 -0700)]
Merge branch 'ras-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull RAS fix from Thomas Gleixner:
"Prevent dmesg from being spammed when MCE logging is active"
* 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Don't print MCEs when mcelog is active
Tobias Klauser [Mon, 3 Apr 2017 03:08:04 +0000 (20:08 -0700)]
nios2: reserve boot memory for device tree
Make sure to reserve the boot memory for the flattened device tree.
Otherwise it might get overwritten, e.g. when initial_boot_params is
copied, leading to a corrupted FDT and a boot hang/crash:
bootconsole [early0] enabled
Early console on uart16650 initialized at 0xf8001600
OF: fdt: Error -11 processing FDT
Kernel panic - not syncing: setup_cpuinfo: No CPU found in devicetree!
---[ end Kernel panic - not syncing: setup_cpuinfo: No CPU found in devicetree!
Guenter Roeck says:
> I think I found the problem. In unflatten_and_copy_device_tree(), with added
> debug information:
>
> OF: fdt: initial_boot_params=
c861e400, dt=
c861f000 size=28874 (0x70ca)
>
> ... and then initial_boot_params is copied to dt, which results in corrupted
> fdt since the memory overlaps. Looks like the initial_boot_params memory
> is not reserved and (re-)allocated by early_init_dt_alloc_memory_arch().
Cc: stable@vger.kernel.org
Reported-by: Guenter Roeck <linux@roeck-us.net>
Reference: http://lkml.kernel.org/r/
20170226210338.GA19476@roeck-us.net
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Ley Foon Tan <ley.foon.tan@intel.com>
Grygorii Strashko [Fri, 31 Mar 2017 23:41:23 +0000 (18:41 -0500)]
net: ethernet: ti: cpsw: wake tx queues on ndo_tx_timeout
In case, if TX watchdog is fired some or all netdev TX queues will be
stopped and as part of recovery it is required not only to drain and
reinitailize CPSW TX channeles, but also wake up stoppted TX queues what
doesn't happen now and netdevice will stop transmiting data until
reopenned.
Hence, add netif_tx_wake_all_queues() call in .ndo_tx_timeout() to complete
recovery and restore TX path.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 3 Apr 2017 00:23:54 +0000 (17:23 -0700)]
Linux 4.11-rc5
Linus Torvalds [Sun, 2 Apr 2017 23:29:34 +0000 (16:29 -0700)]
Merge tag 'dmaengine-fix-4.11-rc5' of git://git.infradead.org/users/vkoul/slave-dma
Pull dmaengine fixes from Vinod Koul:
"A couple of minor fixes for 4.11:
- array bound fix for __get_unmap_pool()
- cyclic period splitting for bcm2835"
* tag 'dmaengine-fix-4.11-rc5' of git://git.infradead.org/users/vkoul/slave-dma:
dmaengine: Fix array index out of bounds warning in __get_unmap_pool()
dmaengine: bcm2835: Fix cyclic DMA period splitting
Linus Torvalds [Sun, 2 Apr 2017 16:27:02 +0000 (09:27 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"This update provides:
- prevent KASLR from randomizing EFI regions
- restrict the usage of -maccumulate-outgoing-args and document when
and why it is required.
- make the Global Physical Address calculation for UV4 systems work
correctly.
- address a copy->paste->forgot-edit problem in the MCE exception
table entries.
- assign a name to AMD MCA bank 3, so the sysfs file registration
works.
- add a missing include in the boot code"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot: Include missing header file
x86/mce/AMD: Give a name to MCA bank 3 when accessed with legacy MSRs
x86/build: Mostly disable '-maccumulate-outgoing-args'
x86/mm/KASLR: Exclude EFI region from KASLR VA space randomization
x86/mce: Fix copy/paste error in exception table entries
x86/platform/uv: Fix calculation of Global Physical Address
Linus Torvalds [Sun, 2 Apr 2017 16:25:10 +0000 (09:25 -0700)]
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull scheduler fixes from Thomas Gleixner:
"This update provides:
- make the scheduler clock switch to unstable mode smooth so the
timestamps stay at microseconds granularity instead of switching to
tick granularity.
- unbreak perf test tsc by taking the new offset into account which
was added in order to proveide better sched clock continuity
- switching sched clock to unstable mode runs all clock related
computations which affect the sched clock output itself from a work
queue. In case of preemption sched clock uses half updated data and
provides wrong timestamps. Keep the math in the protected context
and delegate only the static key switch to workqueue context.
- remove a duplicate header include"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/headers: Remove duplicate #include <linux/sched/debug.h> line
sched/clock: Fix broken stable to unstable transfer
sched/clock, x86/perf: Fix "perf test tsc"
sched/clock: Fix clear_sched_clock_stable() preempt wobbly
Linus Torvalds [Sun, 2 Apr 2017 16:23:31 +0000 (09:23 -0700)]
Merge branch 'efi-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull EFI fix from Thomas Gleixner:
"Downgrade the missing ESRT header printk to warning level and remove a
useless error printk which just generates noise for no value"
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/esrt: Cleanup bad memory map log messages
Linus Torvalds [Sun, 2 Apr 2017 16:22:03 +0000 (09:22 -0700)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
"Two small fixes for the new CLKEVT_OF infrastructure"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
vmlinux.lds: Add __clkevt_of_table to kernel
clockevents: Fix syntax error in clkevt-of macro
Linus Torvalds [Sun, 2 Apr 2017 16:20:34 +0000 (09:20 -0700)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"Two small fixlets:
- select a required Kconfig to make the MVEBU driver compile
- add the missing MIPS local GIC interrupts which prevent drivers to
probe successfully"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/mips-gic: Fix Local compare interrupt
irqchip/mvebu-odmi: Select GENERIC_MSI_IRQ_DOMAIN
Linus Torvalds [Sun, 2 Apr 2017 16:18:59 +0000 (09:18 -0700)]
Merge branch 'core-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull core fix from Thomas Gleixner:
"Prevent leaking kernel memory via /proc/$pid/syscall when the queried
task is not in a syscall"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
lib/syscall: Clear return values when no stack
Al Viro [Sat, 18 Feb 2017 01:16:34 +0000 (20:16 -0500)]
make skb_copy_datagram_msg() et.al. preserve ->msg_iter on error
Fixes the mess observed in e.g. rsync over a noisy link we'd been
seeing since last Summer. What happens is that we copy part of
a datagram before noticing a checksum mismatch. Datagram will be
resent, all right, but we want the next try go into the same place,
not after it...
All this family of primitives (copy/checksum and copy a datagram
into destination) is "all or nothing" sort of interface - either
we get 0 (meaning that copy had been successful) or we get an
error (and no way to tell how much had been copied before we ran
into whatever error it had been). Make all of them leave iterator
unadvanced in case of errors - all callers must be able to cope
with that (an error might've been caught before the iterator had
been advanced), it costs very little to arrange, it's safer for
callers and actually fixes at least one bug in said callers.
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 17 Feb 2017 23:42:24 +0000 (18:42 -0500)]
[iov_iter] new privimitive: iov_iter_revert()
opposite to iov_iter_advance(); the caller is responsible for never
using it to move back past the initial position.
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
David S. Miller [Sun, 2 Apr 2017 03:16:42 +0000 (20:16 -0700)]
Merge branch 'l2tp_session_find-fixes'
Guillaume Nault says:
====================
l2tp: fix usage of l2tp_session_find()
l2tp_session_find() doesn't take a reference on the session returned to
its caller. Virtually all l2tp_session_find() users are racy, either
because the session can disappear from under them or because they take
a reference too late. This leads to bugs like 'use after free' or
failure to notice duplicate session creations.
In some cases, taking a reference on the session is not enough. The
special callbacks .ref() and .deref() also have to be called in cases
where the PPP pseudo-wire uses the socket associated with the session.
Therefore, when looking up a session, we also have to pass a flag
indicating if the .ref() callback has to be called.
In the future, we probably could drop the .ref() and .deref() callbacks
entirely by protecting the .sock field of struct pppol2tp_session with
RCU, thus allowing it to be freed and set to NULL even if the L2TP
session is still alive.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault [Fri, 31 Mar 2017 11:02:30 +0000 (13:02 +0200)]
l2tp: take a reference on sessions used in genetlink handlers
Callers of l2tp_nl_session_find() need to hold a reference on the
returned session since there's no guarantee that it isn't going to
disappear from under them.
Relying on the fact that no l2tp netlink message may be processed
concurrently isn't enough: sessions can be deleted by other means
(e.g. by closing the PPPOL2TP socket of a ppp pseudowire).
l2tp_nl_cmd_session_delete() is a bit special: it runs a callback
function that may require a previous call to session->ref(). In
particular, for ppp pseudowires, the callback is l2tp_session_delete(),
which then calls pppol2tp_session_close() and dereferences the PPPOL2TP
socket. The socket might already be gone at the moment
l2tp_session_delete() calls session->ref(), so we need to take a
reference during the session lookup. So we need to pass the do_ref
variable down to l2tp_session_get() and l2tp_session_get_by_ifname().
Since all callers have to be updated, l2tp_session_find_by_ifname() and
l2tp_nl_session_find() are renamed to reflect their new behaviour.
Fixes:
309795f4bec2 ("l2tp: Add netlink control API for L2TP")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault [Fri, 31 Mar 2017 11:02:29 +0000 (13:02 +0200)]
l2tp: hold session while sending creation notifications
l2tp_session_find() doesn't take any reference on the returned session.
Therefore, the session may disappear while sending the notification.
Use l2tp_session_get() instead and decrement session's refcount once
the notification is sent.
Fixes:
33f72e6f0c67 ("l2tp : multicast notification to the registered listeners")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault [Fri, 31 Mar 2017 11:02:27 +0000 (13:02 +0200)]
l2tp: fix duplicate session creation
l2tp_session_create() relies on its caller for checking for duplicate
sessions. This is racy since a session can be concurrently inserted
after the caller's verification.
Fix this by letting l2tp_session_create() verify sessions uniqueness
upon insertion. Callers need to be adapted to check for
l2tp_session_create()'s return code instead of calling
l2tp_session_find().
pppol2tp_connect() is a bit special because it has to work on existing
sessions (if they're not connected) or to create a new session if none
is found. When acting on a preexisting session, a reference must be
held or it could go away on us. So we have to use l2tp_session_get()
instead of l2tp_session_find() and drop the reference before exiting.
Fixes:
d9e31d17ceba ("l2tp: Add L2TP ethernet pseudowire support")
Fixes:
fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault [Fri, 31 Mar 2017 11:02:26 +0000 (13:02 +0200)]
l2tp: ensure session can't get removed during pppol2tp_session_ioctl()
Holding a reference on session is required before calling
pppol2tp_session_ioctl(). The session could get freed while processing the
ioctl otherwise. Since pppol2tp_session_ioctl() uses the session's socket,
we also need to take a reference on it in l2tp_session_get().
Fixes:
fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault [Fri, 31 Mar 2017 11:02:25 +0000 (13:02 +0200)]
l2tp: fix race in l2tp_recv_common()
Taking a reference on sessions in l2tp_recv_common() is racy; this
has to be done by the callers.
To this end, a new function is required (l2tp_session_get()) to
atomically lookup a session and take a reference on it. Callers then
have to manually drop this reference.
Fixes:
fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Fri, 31 Mar 2017 09:57:28 +0000 (17:57 +0800)]
sctp: use right in and out stream cnt
Since sctp reconf was added in sctp, the real cnt of in/out stream
have not been c.sinit_max_instreams and c.sinit_num_ostreams any
more.
This patch is to replace them with stream->in/outcnt.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sun, 2 Apr 2017 03:11:35 +0000 (20:11 -0700)]
Merge branch 'parisc-4.11-3' of git://git./linux/kernel/git/deller/parisc-linux
Pull parisc fixes from Helge Deller:
"Al Viro reported that - in case of read faults - our copy_from_user()
implementation may claim to have copied more bytes than it actually
did. In order to fix this bug and because of the way how gcc optimizes
register usage for inline assembly in C code, we had to replace our
pa_memcpy() function with a pure assembler implementation.
While fixing the memcpy bug we noticed some other issues with our
get_user() and put_user() functions, e.g. nested faults may return
wrong data. This is now fixed by a common fixup handler for
get_user/put_user in the exception handler which additionally makes
generated code smaller and faster.
The third patch is a trivial one-line fix for a patch which went in
during 4.11-rc and which avoids stalled CPU warnings after power
shutdown (for parisc machines which can't plug power off themselves).
Due to the rewrite of pa_memcpy() into assembly this patch got bigger
than what I wanted to have sent at this stage.
Those patches have been running in production during the last few days
on our debian build servers without any further issues"
* 'parisc-4.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Avoid stalled CPU warnings after system shutdown
parisc: Clean up fixup routines for get_user()/put_user()
parisc: Fix access fault handling in pa_memcpy()
Linus Torvalds [Sun, 2 Apr 2017 03:07:31 +0000 (20:07 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Thirteen small fixes: The hopefully final effort to get the lpfc nvme
kconfig problems sorted, there's one important sg fix (user can induce
read after end of buffer) and one minor enhancement (adding an extra
PCI ID to qedi). The rest are a set of minor fixes, which mostly occur
as user visible in error legs or on specific devices"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: remove the duplicated checking for supporting clkscaling
scsi: lpfc: fix building without debugfs support
scsi: lpfc: Fix PT2PT PRLI reject
scsi: hpsa: fix volume offline state
scsi: libsas: fix ata xfer length
scsi: scsi_dh_alua: Warn if the first argument of alua_rtpg_queue() is NULL
scsi: scsi_dh_alua: Ensure that alua_activate() calls the completion function
scsi: scsi_dh_alua: Check scsi_device_get() return value
scsi: sg: check length passed to SG_NEXT_CMD_LEN
scsi: ufshcd-platform: remove the useless cast in ERR_PTR/IS_ERR
scsi: qedi: Add PCI device-ID for QL41xxx adapters.
scsi: aacraid: Fix potential null access
scsi: qla2xxx: Fix crash in qla2xxx_eh_abort on bad ptr
Linus Torvalds [Sun, 2 Apr 2017 02:45:05 +0000 (19:45 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"11 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
kasan: do not sanitize kexec purgatory
drivers/rapidio/devices/tsi721.c: make module parameter variable name unique
mm/hugetlb.c: don't call region_abort if region_chg fails
kasan: report only the first error by default
hugetlbfs: initialize shared policy as part of inode allocation
mm: fix section name for .data..ro_after_init
mm, hugetlb: use pte_present() instead of pmd_present() in follow_huge_pmd()
mm: workingset: fix premature shadow node shrinking with cgroups
mm: rmap: fix huge file mmap accounting in the memcg stats
mm: move mm_percpu_wq initialization earlier
mm: migrate: fix remove_migration_pte() for ksm pages
David S. Miller [Sat, 1 Apr 2017 20:31:15 +0000 (13:31 -0700)]
Merge tag 'mac80211-for-davem-2017-03-31' of git://git./linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
Two fixes:
* don't block netdev queues (indefinitely!) if mac80211
manages traffic queueing itself
* check wiphy registration before checking for ops
on resume, to avoid crash
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 1 Apr 2017 19:36:38 +0000 (12:36 -0700)]
Merge branch 'bpf-map_value_adj-reg-types-fixes'
Daniel Borkmann says:
====================
BPF fixes on map_value_adj reg types
This set adds two fixes for map_value_adj register type in the
verifier and user space tests along with them for the BPF self
test suite. For details, please see individual patches.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Fri, 31 Mar 2017 00:24:04 +0000 (02:24 +0200)]
bpf: add various verifier test cases for self-tests
Add a couple of test cases, for example, probing for xadd on a spilled
pointer to packet and map_value_adj register, various other map_value_adj
tests including the unaligned load/store, and trying out pointer arithmetic
on map_value_adj register itself. For the unaligned load/store, we need
to figure out whether the architecture has efficient unaligned access and
need to mark affected tests accordingly.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Fri, 31 Mar 2017 00:24:03 +0000 (02:24 +0200)]
bpf, verifier: fix rejection of unaligned access checks for map_value_adj
Currently, the verifier doesn't reject unaligned access for map_value_adj
register types. Commit
484611357c19 ("bpf: allow access into map value
arrays") added logic to check_ptr_alignment() extending it from PTR_TO_PACKET
to also PTR_TO_MAP_VALUE_ADJ, but for PTR_TO_MAP_VALUE_ADJ no enforcement
is in place, because reg->id for PTR_TO_MAP_VALUE_ADJ reg types is never
non-zero, meaning, we can cause BPF_H/_W/_DW-based unaligned access for
architectures not supporting efficient unaligned access, and thus worst
case could raise exceptions on some archs that are unable to correct the
unaligned access or perform a different memory access to the actual
requested one and such.
i) Unaligned load with !CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
on r0 (map_value_adj):
0: (bf) r2 = r10
1: (07) r2 += -8
2: (7a) *(u64 *)(r2 +0) = 0
3: (18) r1 = 0x42533a00
5: (85) call bpf_map_lookup_elem#1
6: (15) if r0 == 0x0 goto pc+11
R0=map_value(ks=8,vs=48,id=0),min_value=0,max_value=0 R10=fp
7: (61) r1 = *(u32 *)(r0 +0)
8: (35) if r1 >= 0xb goto pc+9
R0=map_value(ks=8,vs=48,id=0),min_value=0,max_value=0 R1=inv,min_value=0,max_value=10 R10=fp
9: (07) r0 += 3
10: (79) r7 = *(u64 *)(r0 +0)
R0=map_value_adj(ks=8,vs=48,id=0),min_value=3,max_value=3 R1=inv,min_value=0,max_value=10 R10=fp
11: (79) r7 = *(u64 *)(r0 +2)
R0=map_value_adj(ks=8,vs=48,id=0),min_value=3,max_value=3 R1=inv,min_value=0,max_value=10 R7=inv R10=fp
[...]
ii) Unaligned store with !CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
on r0 (map_value_adj):
0: (bf) r2 = r10
1: (07) r2 += -8
2: (7a) *(u64 *)(r2 +0) = 0
3: (18) r1 = 0x4df16a00
5: (85) call bpf_map_lookup_elem#1
6: (15) if r0 == 0x0 goto pc+19
R0=map_value(ks=8,vs=48,id=0),min_value=0,max_value=0 R10=fp
7: (07) r0 += 3
8: (7a) *(u64 *)(r0 +0) = 42
R0=map_value_adj(ks=8,vs=48,id=0),min_value=3,max_value=3 R10=fp
9: (7a) *(u64 *)(r0 +2) = 43
R0=map_value_adj(ks=8,vs=48,id=0),min_value=3,max_value=3 R10=fp
10: (7a) *(u64 *)(r0 -2) = 44
R0=map_value_adj(ks=8,vs=48,id=0),min_value=3,max_value=3 R10=fp
[...]
For the PTR_TO_PACKET type, reg->id is initially zero when skb->data
was fetched, it later receives a reg->id from env->id_gen generator
once another register with UNKNOWN_VALUE type was added to it via
check_packet_ptr_add(). The purpose of this reg->id is twofold: i) it
is used in find_good_pkt_pointers() for setting the allowed access
range for regs with PTR_TO_PACKET of same id once verifier matched
on data/data_end tests, and ii) for check_ptr_alignment() to determine
that when not having efficient unaligned access and register with
UNKNOWN_VALUE was added to PTR_TO_PACKET, that we're only allowed
to access the content bytewise due to unknown unalignment. reg->id
was never intended for PTR_TO_MAP_VALUE{,_ADJ} types and thus is
always zero, the only marking is in PTR_TO_MAP_VALUE_OR_NULL that
was added after
484611357c19 via
57a09bf0a416 ("bpf: Detect identical
PTR_TO_MAP_VALUE_OR_NULL registers"). Above tests will fail for
non-root environment due to prohibited pointer arithmetic.
The fix splits register-type specific checks into their own helper
instead of keeping them combined, so we don't run into a similar
issue in future once we extend check_ptr_alignment() further and
forget to add reg->type checks for some of the checks.
Fixes:
484611357c19 ("bpf: allow access into map value arrays")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Fri, 31 Mar 2017 00:24:02 +0000 (02:24 +0200)]
bpf, verifier: fix alu ops against map_value{, _adj} register types
While looking into map_value_adj, I noticed that alu operations
directly on the map_value() resp. map_value_adj() register (any
alu operation on a map_value() register will turn it into a
map_value_adj() typed register) are not sufficiently protected
against some of the operations. Two non-exhaustive examples are
provided that the verifier needs to reject:
i) BPF_AND on r0 (map_value_adj):
0: (bf) r2 = r10
1: (07) r2 += -8
2: (7a) *(u64 *)(r2 +0) = 0
3: (18) r1 = 0xbf842a00
5: (85) call bpf_map_lookup_elem#1
6: (15) if r0 == 0x0 goto pc+2
R0=map_value(ks=8,vs=48,id=0),min_value=0,max_value=0 R10=fp
7: (57) r0 &= 8
8: (7a) *(u64 *)(r0 +0) = 22
R0=map_value_adj(ks=8,vs=48,id=0),min_value=0,max_value=8 R10=fp
9: (95) exit
from 6 to 9: R0=inv,min_value=0,max_value=0 R10=fp
9: (95) exit
processed 10 insns
ii) BPF_ADD in 32 bit mode on r0 (map_value_adj):
0: (bf) r2 = r10
1: (07) r2 += -8
2: (7a) *(u64 *)(r2 +0) = 0
3: (18) r1 = 0xc24eee00
5: (85) call bpf_map_lookup_elem#1
6: (15) if r0 == 0x0 goto pc+2
R0=map_value(ks=8,vs=48,id=0),min_value=0,max_value=0 R10=fp
7: (04) (u32) r0 += (u32) 0
8: (7a) *(u64 *)(r0 +0) = 22
R0=map_value_adj(ks=8,vs=48,id=0),min_value=0,max_value=0 R10=fp
9: (95) exit
from 6 to 9: R0=inv,min_value=0,max_value=0 R10=fp
9: (95) exit
processed 10 insns
Issue is, while min_value / max_value boundaries for the access
are adjusted appropriately, we change the pointer value in a way
that cannot be sufficiently tracked anymore from its origin.
Operations like BPF_{AND,OR,DIV,MUL,etc} on a destination register
that is PTR_TO_MAP_VALUE{,_ADJ} was probably unintended, in fact,
all the test cases coming with
484611357c19 ("bpf: allow access
into map value arrays") perform BPF_ADD only on the destination
register that is PTR_TO_MAP_VALUE_ADJ.
Only for UNKNOWN_VALUE register types such operations make sense,
f.e. with unknown memory content fetched initially from a constant
offset from the map value memory into a register. That register is
then later tested against lower / upper bounds, so that the verifier
can then do the tracking of min_value / max_value, and properly
check once that UNKNOWN_VALUE register is added to the destination
register with type PTR_TO_MAP_VALUE{,_ADJ}. This is also what the
original use-case is solving. Note, tracking on what is being
added is done through adjust_reg_min_max_vals() and later access
to the map value enforced with these boundaries and the given offset
from the insn through check_map_access_adj().
Tests will fail for non-root environment due to prohibited pointer
arithmetic, in particular in check_alu_op(), we bail out on the
is_pointer_value() check on the dst_reg (which is false in root
case as we allow for pointer arithmetic via env->allow_ptr_leaks).
Similarly to PTR_TO_PACKET, one way to fix it is to restrict the
allowed operations on PTR_TO_MAP_VALUE{,_ADJ} registers to 64 bit
mode BPF_ADD. The test_verifier suite runs fine after the patch
and it also rejects mentioned test cases.
Fixes:
484611357c19 ("bpf: allow access into map value arrays")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
René Rebe [Tue, 28 Mar 2017 05:56:51 +0000 (07:56 +0200)]
r8152: The Microsoft Surface docks also use R8152 v2
Without this the generic cdc_ether grabs the device,
and does not really work.
Signed-off-by: René Rebe <rene@exactcode.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yi-Hung Wei [Thu, 30 Mar 2017 19:36:03 +0000 (12:36 -0700)]
openvswitch: Fix ovs_flow_key_update()
ovs_flow_key_update() is called when the flow key is invalid, and it is
used to update and revalidate the flow key. Commit
329f45bc4f19
("openvswitch: add mac_proto field to the flow key") introduces mac_proto
field to flow key and use it to determine whether the flow key is valid.
However, the commit does not update the code path in ovs_flow_key_update()
to revalidate the flow key which may cause BUG_ON() on execute_recirc().
This patch addresses the aforementioned issue.
Fixes:
329f45bc4f19 ("openvswitch: add mac_proto field to the flow key")
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Acked-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mark Brown [Thu, 30 Mar 2017 16:00:12 +0000 (17:00 +0100)]
net/faraday: Explicitly include linux/of.h and linux/property.h
This driver uses interfaces from linux/of.h and linux/property.h but
relies on implict inclusion of those headers which means that changes in
other headers could break the build, as happened in -next for arm today.
Add a explicit includes.
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daode Huang [Thu, 30 Mar 2017 15:37:41 +0000 (16:37 +0100)]
net: hns: Add ACPI support to check SFP present
The current code only supports DT to check SFP present.
This patch adds ACPI support as well.
Signed-off-by: Daode Huang <huangdaode@hisilicon.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sat, 1 Apr 2017 18:50:25 +0000 (11:50 -0700)]
Merge tag 'usb-4.11-rc5' of git://git./linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some small USB fixes for 4.11-rc5.
The usual xhci fixes are here, as well as a fix for yet-another-bug-
found-by-KASAN, those developers are doing great stuff here.
And there's a phy build warning fix that showed up in 4.11-rc1.
All of these have been in linux-next with no reported issues"
* tag 'usb-4.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: phy: isp1301: Fix build warning when CONFIG_OF is disabled
xhci: Manually give back cancelled URB if we can't queue it for cancel
xhci: Set URB actual length for stopped control transfers
xhci: plat: Register shutdown for xhci_plat
USB: fix linked-list corruption in rh_call_control()
Linus Torvalds [Sat, 1 Apr 2017 18:47:36 +0000 (11:47 -0700)]
Merge tag 'tty-4.11-rc5' of git://git./linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are some small fixes for some serial drivers and Kconfig help
text for 4.11-rc5. Nothing major here at all, a few things resolving
reported bugs in some random serial drivers.
I don't think these made the last linux-next due to me getting to them
yesterday, but I am not sure, they might have snuck in. The patches
only affect drivers that the maintainers of sent me these patches for,
so we should be safe here :)"
* tag 'tty-4.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty: pl011: fix earlycon work-around for QDF2400 erratum 44
serial: 8250_EXAR: fix duplicate Kconfig text and add missing help text
tty/serial: atmel: fix TX path in atmel_console_write()
tty/serial: atmel: fix race condition (TX+DMA)
serial: mxs-auart: Fix baudrate calculation
Linus Torvalds [Sat, 1 Apr 2017 18:22:05 +0000 (11:22 -0700)]
Merge tag 'acpi-4.11-rc5' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These fix two issues related to IOAPIC hotplug, an overzealous build
optimization that prevents the function graph tracer from working with
the ACPI subsystem correctly and an RCU synchronization issue in the
ACPI APEI code.
Specifics:
- drop the unconditional setting of the '-Os' gcc flag from the ACPI
Makefile to make the function graph tracer work correctly with the
ACPI subsystem (Josh Poimboeuf).
- add missing synchronize_rcu() to ghes_remove() which removes an
element from an RCU-protected list, but fails to synchronize it
properly afterward (James Morse).
- fix two problems related to IOAPIC hotplug, a local variable
initialization in setup_res() and the creation of platform device
objects for IO(x)APICs which are (a) unused and (b) leaked on
hot-removal (Joerg Roedel)"
* tag 'acpi-4.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: Fix incompatibility with mcount-based function graph tracing
ACPI / APEI: Add missing synchronize_rcu() on NOTIFY_SCI removal
ACPI: Do not create a platform_device for IOAPIC/IOxAPIC
ACPI: ioapic: Clear on-stack resource before using it
Linus Torvalds [Sat, 1 Apr 2017 18:17:48 +0000 (11:17 -0700)]
Merge tag 'pm-4.11-rc5' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix a cpufreq core issue with the initialization of the cpufreq
sysfs interface and a cpuidle powernv driver initialization issue.
Specifics:
- symbolic links from CPU directories to the corresponding cpufreq
policy directories in sysfs are not created during initialization
in some cases which confuses user space, so prevent that from
happening (Rafael Wysocki).
- the powernv cpuidle driver fails to pass a correct cpumaks to the
cpuidle core in some cases which causes subsequent failures to
occur, so fix it (Vaidyanathan Srinivasan)"
* tag 'pm-4.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpuidle: powernv: Pass correct drv->cpumask for registration
cpufreq: Fix creation of symbolic links to policy directories
Linus Torvalds [Sat, 1 Apr 2017 18:13:31 +0000 (11:13 -0700)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Two bugfixes from I2C, specifically the I2C mux section. Thanks to
peda for collecting them"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: mux: pca954x: Add missing pca9546 definition to chip_desc
Revert "i2c: mux: pca954x: Add ACPI support for pca954x"
Linus Torvalds [Sat, 1 Apr 2017 17:52:19 +0000 (10:52 -0700)]
Merge tag 'arc-4.11-rc5' of git://git./linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta:
"Accumulated fixes for ARC which I've been been sitting on for a while:
- reading clk from driver vs device tree [Vlad]
- fix support for UIO in VDK platform [Alexey]
- SLC busy bit reading workaround
- build warning with kprobes header reorg"
* tag 'arc-4.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: fix build warnings with !CONFIG_KPROBES
ARCv2: SLC: Make sure busy bit is set properly on SLC flushing
ARC: vdk: Fix support of UIO
ARCv2: make unimplemented vectors as no-ops rather than halt core
ARC: get rate from clk driver instead of reading device tree
ARC: [dts] add cpu nodes to ARCHS SMP device tree
ARC: [dts] add input clocks for cpu nodes
Linus Torvalds [Sat, 1 Apr 2017 17:43:37 +0000 (10:43 -0700)]
Merge tag 'nfsd-4.11-1' of git://linux-nfs.org/~bfields/linux
Pull nfsd fixes from Bruce Fields:
"The restriction of NFSv4 to TCP went overboard and also broke the
backchannel; fix.
Also some minor refinements to the nfsd version-setting interface that
we'd like to get fixed before release"
* tag 'nfsd-4.11-1' of git://linux-nfs.org/~bfields/linux:
svcrdma: set XPT_CONG_CTRL flag for bc xprt
NFSD: fix nfsd_reset_versions for NFSv4.
NFSD: fix nfsd_minorversion(.., NFSD_AVAIL)
NFSD: further refinement of content of /proc/fs/nfsd/versions
nfsd: map the ENOKEY to nfserr_perm for avoiding warning
SUNRPC/backchanel: set XPT_CONG_CTRL flag for bc xprt
Timur Tabi [Fri, 31 Mar 2017 22:05:02 +0000 (17:05 -0500)]
tty: pl011: fix earlycon work-around for QDF2400 erratum 44
The work-around for the Qualcomm Datacenter Technologies QDF2400
erratum 44 sets the "qdf2400_e44_present" global variable if the
work-around is needed. However, this check does not happen until after
earlycon is initialized, which means the work-around is not
used, and the console hangs as soon as it displays one character.
Fixes:
d8a4995bcea1 ("tty: pl011: Work around QDF2400 E44 stuck BUSY bit")
Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Linus Torvalds [Sat, 1 Apr 2017 00:58:48 +0000 (17:58 -0700)]
Merge branch 'for-linus-4.11' of git://git./linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
"We have three small fixes queued up in my for-linus-4.11 branch"
* 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix an integer overflow check
btrfs: Change qgroup_meta_rsv to 64bit
Btrfs: bring back repair during read
Mike Galbraith [Fri, 31 Mar 2017 22:12:12 +0000 (15:12 -0700)]
kasan: do not sanitize kexec purgatory
Fixes this:
kexec: Undefined symbol: __asan_load8_noabort
kexec-bzImage64: Loading purgatory failed
Link: http://lkml.kernel.org/r/1489672155.4458.7.camel@gmx.de
Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Fri, 31 Mar 2017 22:12:10 +0000 (15:12 -0700)]
drivers/rapidio/devices/tsi721.c: make module parameter variable name unique
kbuild test robot reported a non-static variable name collision between
a staging driver and a RapidIO driver, with a generic variable name of
'dbg_level'.
Both drivers should be changed so that they don't use this generic
public variable name. This patch fixes the RapidIO driver but does not
change the user interface (name) for the module parameter.
drivers/staging/built-in.o:(.bss+0x109d0): multiple definition of `dbg_level'
drivers/rapidio/built-in.o:(.bss+0x16c): first defined here
Link: http://lkml.kernel.org/r/ab527fc5-aa3c-4b07-5d48-eef5de703192@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Jérémy Lefaure <jeremy.lefaure@lse.epita.fr>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Kravetz [Fri, 31 Mar 2017 22:12:07 +0000 (15:12 -0700)]
mm/hugetlb.c: don't call region_abort if region_chg fails
Changes to hugetlbfs reservation maps is a two step process. The first
step is a call to region_chg to determine what needs to be changed, and
prepare that change. This should be followed by a call to call to
region_add to commit the change, or region_abort to abort the change.
The error path in hugetlb_reserve_pages called region_abort after a
failed call to region_chg. As a result, the adds_in_progress counter in
the reservation map is off by 1. This is caught by a VM_BUG_ON in
resv_map_release when the reservation map is freed.
syzkaller fuzzer (when using an injected kmalloc failure) found this
bug, that resulted in the following:
kernel BUG at mm/hugetlb.c:742!
Call Trace:
hugetlbfs_evict_inode+0x7b/0xa0 fs/hugetlbfs/inode.c:493
evict+0x481/0x920 fs/inode.c:553
iput_final fs/inode.c:1515 [inline]
iput+0x62b/0xa20 fs/inode.c:1542
hugetlb_file_setup+0x593/0x9f0 fs/hugetlbfs/inode.c:1306
newseg+0x422/0xd30 ipc/shm.c:575
ipcget_new ipc/util.c:285 [inline]
ipcget+0x21e/0x580 ipc/util.c:639
SYSC_shmget ipc/shm.c:673 [inline]
SyS_shmget+0x158/0x230 ipc/shm.c:657
entry_SYSCALL_64_fastpath+0x1f/0xc2
RIP: resv_map_release+0x265/0x330 mm/hugetlb.c:742
Link: http://lkml.kernel.org/r/1490821682-23228-1-git-send-email-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mark Rutland [Fri, 31 Mar 2017 22:12:04 +0000 (15:12 -0700)]
kasan: report only the first error by default
Disable kasan after the first report. There are several reasons for
this:
- Single bug quite often has multiple invalid memory accesses causing
storm in the dmesg.
- Write OOB access might corrupt metadata so the next report will print
bogus alloc/free stacktraces.
- Reports after the first easily could be not bugs by itself but just
side effects of the first one.
Given that multiple reports usually only do harm, it makes sense to
disable kasan after the first one. If user wants to see all the
reports, the boot-time parameter kasan_multi_shot must be used.
[aryabinin@virtuozzo.com: wrote changelog and doc, added missing include]
Link: http://lkml.kernel.org/r/20170323154416.30257-1-aryabinin@virtuozzo.com
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Kravetz [Fri, 31 Mar 2017 22:12:01 +0000 (15:12 -0700)]
hugetlbfs: initialize shared policy as part of inode allocation
Any time after inode allocation, destroy_inode can be called. The
hugetlbfs inode contains a shared_policy structure, and
mpol_free_shared_policy is unconditionally called as part of
hugetlbfs_destroy_inode. Initialize the policy as part of inode
allocation so that any quick (error path) calls to destroy_inode will be
handed an initialized policy.
syzkaller fuzzer found this bug, that resulted in the following:
BUG: KASAN: user-memory-access in atomic_inc
include/asm-generic/atomic-instrumented.h:87 [inline] at addr
000000131730bd7a
BUG: KASAN: user-memory-access in __lock_acquire+0x21a/0x3a80
kernel/locking/lockdep.c:3239 at addr
000000131730bd7a
Write of size 4 by task syz-executor6/14086
CPU: 3 PID: 14086 Comm: syz-executor6 Not tainted 4.11.0-rc3+ #364
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
atomic_inc include/asm-generic/atomic-instrumented.h:87 [inline]
__lock_acquire+0x21a/0x3a80 kernel/locking/lockdep.c:3239
lock_acquire+0x1ee/0x590 kernel/locking/lockdep.c:3762
__raw_write_lock include/linux/rwlock_api_smp.h:210 [inline]
_raw_write_lock+0x33/0x50 kernel/locking/spinlock.c:295
mpol_free_shared_policy+0x43/0xb0 mm/mempolicy.c:2536
hugetlbfs_destroy_inode+0xca/0x120 fs/hugetlbfs/inode.c:952
alloc_inode+0x10d/0x180 fs/inode.c:216
new_inode_pseudo+0x69/0x190 fs/inode.c:889
new_inode+0x1c/0x40 fs/inode.c:918
hugetlbfs_get_inode+0x40/0x420 fs/hugetlbfs/inode.c:734
hugetlb_file_setup+0x329/0x9f0 fs/hugetlbfs/inode.c:1282
newseg+0x422/0xd30 ipc/shm.c:575
ipcget_new ipc/util.c:285 [inline]
ipcget+0x21e/0x580 ipc/util.c:639
SYSC_shmget ipc/shm.c:673 [inline]
SyS_shmget+0x158/0x230 ipc/shm.c:657
entry_SYSCALL_64_fastpath+0x1f/0xc2
Analysis provided by Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Link: http://lkml.kernel.org/r/1490477850-7944-1-git-send-email-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kees Cook [Fri, 31 Mar 2017 22:11:58 +0000 (15:11 -0700)]
mm: fix section name for .data..ro_after_init
A section name for .data..ro_after_init was added by both:
commit
d07a980c1b8d ("s390: add proper __ro_after_init support")
and
commit
d7c19b066dcf ("mm: kmemleak: scan .data.ro_after_init")
The latter adds incorrect wrapping around the existing s390 section, and
came later. I'd prefer the s390 naming, so this moves the s390-specific
name up to the asm-generic/sections.h and renames the section as used by
kmemleak (and in the future, kernel/extable.c).
Link: http://lkml.kernel.org/r/20170327192213.GA129375@beast
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> [s390 parts]
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Eddie Kovsky <ewk@edkovsky.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Naoya Horiguchi [Fri, 31 Mar 2017 22:11:55 +0000 (15:11 -0700)]
mm, hugetlb: use pte_present() instead of pmd_present() in follow_huge_pmd()
I found the race condition which triggers the following bug when
move_pages() and soft offline are called on a single hugetlb page
concurrently.
Soft offlining page 0x119400 at 0x700000000000
BUG: unable to handle kernel paging request at
ffffea0011943820
IP: follow_huge_pmd+0x143/0x190
PGD
7ffd2067
PUD
7ffd1067
PMD 0
[61163.582052] Oops: 0000 [#1] SMP
Modules linked in: binfmt_misc ppdev virtio_balloon parport_pc pcspkr i2c_piix4 parport i2c_core acpi_cpufreq ip_tables xfs libcrc32c ata_generic pata_acpi virtio_blk 8139too crc32c_intel ata_piix serio_raw libata virtio_pci 8139cp virtio_ring virtio mii floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: cap_check]
CPU: 0 PID: 22573 Comm: iterate_numa_mo Tainted: P OE 4.11.0-rc2-mm1+ #2
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:follow_huge_pmd+0x143/0x190
RSP: 0018:
ffffc90004bdbcd0 EFLAGS:
00010202
RAX:
0000000465003e80 RBX:
ffffea0004e34d30 RCX:
00003ffffffff000
RDX:
0000000011943800 RSI:
0000000000080001 RDI:
0000000465003e80
RBP:
ffffc90004bdbd18 R08:
0000000000000000 R09:
ffff880138d34000
R10:
ffffea0004650000 R11:
0000000000c363b0 R12:
ffffea0011943800
R13:
ffff8801b8d34000 R14:
ffffea0000000000 R15:
000077ff80000000
FS:
00007fc977710740(0000) GS:
ffff88007dc00000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
ffffea0011943820 CR3:
000000007a746000 CR4:
00000000001406f0
Call Trace:
follow_page_mask+0x270/0x550
SYSC_move_pages+0x4ea/0x8f0
SyS_move_pages+0xe/0x10
do_syscall_64+0x67/0x180
entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x7fc976e03949
RSP: 002b:
00007ffe72221d88 EFLAGS:
00000246 ORIG_RAX:
0000000000000117
RAX:
ffffffffffffffda RBX:
0000000000000000 RCX:
00007fc976e03949
RDX:
0000000000c22390 RSI:
0000000000001400 RDI:
0000000000005827
RBP:
00007ffe72221e00 R08:
0000000000c2c3a0 R09:
0000000000000004
R10:
0000000000c363b0 R11:
0000000000000246 R12:
0000000000400650
R13:
00007ffe72221ee0 R14:
0000000000000000 R15:
0000000000000000
Code: 81 e4 ff ff 1f 00 48 21 c2 49 c1 ec 0c 48 c1 ea 0c 4c 01 e2 49 bc 00 00 00 00 00 ea ff ff 48 c1 e2 06 49 01 d4 f6 45 bc 04 74 90 <49> 8b 7c 24 20 40 f6 c7 01 75 2b 4c 89 e7 8b 47 1c 85 c0 7e 2a
RIP: follow_huge_pmd+0x143/0x190 RSP:
ffffc90004bdbcd0
CR2:
ffffea0011943820
---[ end trace
e4f81353a2d23232 ]---
Kernel panic - not syncing: Fatal exception
Kernel Offset: disabled
This bug is triggered when pmd_present() returns true for non-present
hugetlb, so fixing the present check in follow_huge_pmd() prevents it.
Using pmd_present() to determine present/non-present for hugetlb is not
correct, because pmd_present() checks multiple bits (not only
_PAGE_PRESENT) for historical reason and it can misjudge hugetlb state.
Fixes:
e66f17ff7177 ("mm/hugetlb: take page table lock in follow_huge_pmd()")
Link: http://lkml.kernel.org/r/1490149898-20231-1-git-send-email-n-horiguchi@ah.jp.nec.com
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: <stable@vger.kernel.org> [4.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Fri, 31 Mar 2017 22:11:52 +0000 (15:11 -0700)]
mm: workingset: fix premature shadow node shrinking with cgroups
Commit
0a6b76dd23fa ("mm: workingset: make shadow node shrinker memcg
aware") enabled cgroup-awareness in the shadow node shrinker, but forgot
to also enable cgroup-awareness in the list_lru the shadow nodes sit on.
Consequently, all shadow nodes are sitting on a global (per-NUMA node)
list, while the shrinker applies the limits according to the amount of
cache in the cgroup its shrinking. The result is excessive pressure on
the shadow nodes from cgroups that have very little cache.
Enable memcg-mode on the shadow node LRUs, such that per-cgroup limits
are applied to per-cgroup lists.
Fixes:
0a6b76dd23fa ("mm: workingset: make shadow node shrinker memcg aware")
Link: http://lkml.kernel.org/r/20170322005320.8165-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vladimir Davydov <vdavydov@tarantool.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org> [4.6+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Fri, 31 Mar 2017 22:11:50 +0000 (15:11 -0700)]
mm: rmap: fix huge file mmap accounting in the memcg stats
Huge pages are accounted as single units in the memcg's "file_mapped"
counter. Account the correct number of base pages, like we do in the
corresponding node counter.
Link: http://lkml.kernel.org/r/20170322005111.3156-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: <stable@vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Fri, 31 Mar 2017 22:11:47 +0000 (15:11 -0700)]
mm: move mm_percpu_wq initialization earlier
Yang Li has reported that drain_all_pages triggers a WARN_ON which means
that this function is called earlier than the mm_percpu_wq is
initialized on arm64 with CMA configured:
WARNING: CPU: 2 PID: 1 at mm/page_alloc.c:2423 drain_all_pages+0x244/0x25c
Modules linked in:
CPU: 2 PID: 1 Comm: swapper/0 Not tainted
4.11.0-rc1-next-20170310-00027-g64dfbc5 #127
Hardware name: Freescale Layerscape 2088A RDB Board (DT)
task:
ffffffc07c4a6d00 task.stack:
ffffffc07c4a8000
PC is at drain_all_pages+0x244/0x25c
LR is at start_isolate_page_range+0x14c/0x1f0
[...]
drain_all_pages+0x244/0x25c
start_isolate_page_range+0x14c/0x1f0
alloc_contig_range+0xec/0x354
cma_alloc+0x100/0x1fc
dma_alloc_from_contiguous+0x3c/0x44
atomic_pool_init+0x7c/0x208
arm64_dma_init+0x44/0x4c
do_one_initcall+0x38/0x128
kernel_init_freeable+0x1a0/0x240
kernel_init+0x10/0xfc
ret_from_fork+0x10/0x20
Fix this by moving the whole setup_vmstat which is an initcall right now
to init_mm_internals which will be called right after the WQ subsystem
is initialized.
Link: http://lkml.kernel.org/r/20170315164021.28532-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Yang Li <pku.leo@gmail.com>
Tested-by: Yang Li <pku.leo@gmail.com>
Tested-by: Xiaolong Ye <xiaolong.ye@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Naoya Horiguchi [Fri, 31 Mar 2017 22:11:44 +0000 (15:11 -0700)]
mm: migrate: fix remove_migration_pte() for ksm pages
I found that calling page migration for ksm pages causes the following
bug:
page:
ffffea0004d51180 count:2 mapcount:2 mapping:
ffff88013c785141 index:0x913
flags: 0x57ffffc0040068(uptodate|lru|active|swapbacked)
raw:
0057ffffc0040068 ffff88013c785141 0000000000000913 0000000200000001
raw:
ffffea0004d5f9e0 ffffea0004d53f60 0000000000000000 ffff88007d81b800
page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
page->mem_cgroup:
ffff88007d81b800
------------[ cut here ]------------
kernel BUG at /src/linux-dev/mm/rmap.c:1086!
invalid opcode: 0000 [#1] SMP
Modules linked in: ppdev parport_pc virtio_balloon i2c_piix4 pcspkr parport i2c_core acpi_cpufreq ip_tables xfs libcrc32c ata_generic pata_acpi ata_piix 8139too libata virtio_blk 8139cp crc32c_intel mii virtio_pci virtio_ring serio_raw virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CPU: 0 PID: 3162 Comm: bash Not tainted 4.11.0-rc2-mm1+ #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:do_page_add_anon_rmap+0x1ba/0x260
RSP: 0018:
ffffc90002473b30 EFLAGS:
00010282
RAX:
0000000000000021 RBX:
ffffea0004d51180 RCX:
0000000000000006
RDX:
0000000000000000 RSI:
0000000000000082 RDI:
ffff88007dc0dfe0
RBP:
ffffc90002473b58 R08:
00000000fffffffe R09:
00000000000001c1
R10:
0000000000000005 R11:
00000000000001c0 R12:
ffff880139ab3d80
R13:
0000000000000000 R14:
0000700000000200 R15:
0000160000000000
FS:
00007f5195f50740(0000) GS:
ffff88007dc00000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
00007fd450287000 CR3:
000000007a08e000 CR4:
00000000001406f0
Call Trace:
page_add_anon_rmap+0x18/0x20
remove_migration_pte+0x220/0x2c0
rmap_walk_ksm+0x143/0x220
rmap_walk+0x55/0x60
remove_migration_ptes+0x53/0x80
migrate_pages+0x8ed/0xb60
soft_offline_page+0x309/0x8d0
store_soft_offline_page+0xaf/0xf0
dev_attr_store+0x18/0x30
sysfs_kf_write+0x3a/0x50
kernfs_fop_write+0xff/0x180
__vfs_write+0x37/0x160
vfs_write+0xb2/0x1b0
SyS_write+0x55/0xc0
do_syscall_64+0x67/0x180
entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x7f51956339e0
RSP: 002b:
00007ffcfa0dffc8 EFLAGS:
00000246 ORIG_RAX:
0000000000000001
RAX:
ffffffffffffffda RBX:
000000000000000c RCX:
00007f51956339e0
RDX:
000000000000000c RSI:
00007f5195f53000 RDI:
0000000000000001
RBP:
00007f5195f53000 R08:
000000000000000a R09:
00007f5195f50740
R10:
000000000000000b R11:
0000000000000246 R12:
00007f5195907400
R13:
000000000000000c R14:
0000000000000001 R15:
0000000000000000
Code: fe ff ff 48 81 c2 00 02 00 00 48 89 55 d8 e8 2e c3 fd ff 48 8b 55 d8 e9 42 ff ff ff 48 c7 c6 e0 52 a1 81 48 89 df e8 46 ad fe ff <0f> 0b 48 83 e8 01 e9 7f fe ff ff 48 83 e8 01 e9 96 fe ff ff 48
RIP: do_page_add_anon_rmap+0x1ba/0x260 RSP:
ffffc90002473b30
---[ end trace
a679d00f4af2df48 ]---
Kernel panic - not syncing: Fatal exception
Kernel Offset: disabled
---[ end Kernel panic - not syncing: Fatal exception
The problem is in the following lines:
new = page - pvmw.page->index +
linear_page_index(vma, pvmw.address);
The 'new' is calculated with 'page' which is given by the caller as a
destination page and some offset adjustment for thp. But this doesn't
properly work for ksm pages because pvmw.page->index doesn't change for
each address but linear_page_index() changes, which means that 'new'
points to different pages for each addresses backed by the ksm page. As
a result, we try to set totally unrelated pages as destination pages,
and that causes kernel crash.
This patch fixes the miscalculation and makes ksm page migration work
fine.
Fixes:
3fe87967c536 ("mm: convert remove_migration_pte() to use page_vma_mapped_walk()")
Link: http://lkml.kernel.org/r/1489717683-29905-1-git-send-email-n-horiguchi@ah.jp.nec.com
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>