Marek Vasut [Wed, 18 May 2016 14:16:51 +0000 (16:16 +0200)]
configfs: Remove ppos increment in configfs_write_bin_file
The simple_write_to_buffer() already increments the @ppos on success,
see fs/libfs.c simple_write_to_buffer() comment:
"
On success, the number of bytes written is returned and the offset @ppos
advanced by this number, or negative value is returned on error.
"
If the configfs_write_bin_file() is invoked with @count smaller than the
total length of the written binary file, it will be invoked multiple times.
Since configfs_write_bin_file() increments @ppos on success, after calling
simple_write_to_buffer(), the @ppos is incremented twice.
Subsequent invocation of configfs_write_bin_file() will result in the next
piece of data being written to the offset twice as long as the length of
the previous write, thus creating buffer with "holes" in it.
The simple testcase using DTO follows:
$ mkdir /sys/kernel/config/device-tree/overlays/1
$ dd bs=1 if=foo.dtbo of=/sys/kernel/config/device-tree/overlays/1/dtbo
Without this patch, the testcase will result in twice as big buffer in the
kernel, which is then passed to the cfs_overlay_item_dtbo_write() .
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Pantelis Antoniou <pantelis.antoniou@konsulko.com>
Linus Torvalds [Wed, 29 Jun 2016 22:30:26 +0000 (15:30 -0700)]
Merge tag 'nfs-for-4.7-2' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client bugfixes from Anna Schumaker:
"Stable bugfixes:
- Fix _cancel_empty_pagelist
- Fix a double page unlock
- Make nfs_atomic_open() call d_drop() on all ->open_context() errors.
- Fix another OPEN_DOWNGRADE bug
Other bugfixes:
- Ensure we handle delegation errors in nfs4_proc_layoutget()
- Layout stateids start out as being invalid
- Add sparse lock annotations for pnfs_find_alloc_layout
- Handle bad delegation stateids in nfs4_layoutget_handle_exception
- Fix up O_DIRECT results
- Fix potential use after free of state in nfs4_do_reclaim.
- Mark the layout stateid invalid when all segments are removed
- Don't let readdirplus revalidate an inode that was marked as stale
- Fix potential race in nfs_fhget()
- Fix an unused variable warning"
* tag 'nfs-for-4.7-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
NFS: Fix another OPEN_DOWNGRADE bug
make nfs_atomic_open() call d_drop() on all ->open_context() errors.
NFS: Fix an unused variable warning
NFS: Fix potential race in nfs_fhget()
NFS: Don't let readdirplus revalidate an inode that was marked as stale
NFSv4.1/pnfs: Mark the layout stateid invalid when all segments are removed
NFS: Fix a double page unlock
pnfs_nfs: fix _cancel_empty_pagelist
nfs4: Fix potential use after free of state in nfs4_do_reclaim.
NFS: Fix up O_DIRECT results
NFS/pnfs: handle bad delegation stateids in nfs4_layoutget_handle_exception
NFSv4.1/pnfs: Add sparse lock annotations for pnfs_find_alloc_layout
NFSv4.1/pnfs: Layout stateids start out as being invalid
NFSv4.1/pnfs: Ensure we handle delegation errors in nfs4_proc_layoutget()
Linus Torvalds [Wed, 29 Jun 2016 22:18:47 +0000 (15:18 -0700)]
Merge branch 'stable-4.7' of git://git.infradead.org/users/pcmoore/audit
Pull audit fixes from Paul Moore:
"Two small patches to fix audit problems in 4.7-rcX: the first fixes a
potential kref leak, the second removes some header file noise.
The first is an important bug fix that really should go in before 4.7
is released, the second is not critical, but falls into the very-nice-
to-have category so I'm including in the pull request.
Both patches are straightforward, self-contained, and pass our
testsuite without problem"
* 'stable-4.7' of git://git.infradead.org/users/pcmoore/audit:
audit: move audit_get_tty to reduce scope and kabi changes
audit: move calcs after alloc and check when logging set loginuid
Linus Torvalds [Wed, 29 Jun 2016 18:50:42 +0000 (11:50 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
"I've been traveling so this accumulates more than week or so of bug
fixing. It perhaps looks a little worse than it really is.
1) Fix deadlock in ath10k driver, from Ben Greear.
2) Increase scan timeout in iwlwifi, from Luca Coelho.
3) Unbreak STP by properly reinjecting STP packets back into the
stack. Regression fix from Ido Schimmel.
4) Mediatek driver fixes (missing malloc failure checks, leaking of
scratch memory, wrong indexing when mapping TX buffers, etc.) from
John Crispin.
5) Fix endianness bug in icmpv6_err() handler, from Hannes Frederic
Sowa.
6) Fix hashing of flows in UDP in the ruseport case, from Xuemin Su.
7) Fix netlink notifications in ovs for tunnels, delete link messages
are never emitted because of how the device registry state is
handled. From Nicolas Dichtel.
8) Conntrack module leaks kmemcache on unload, from Florian Westphal.
9) Prevent endless jump loops in nft rules, from Liping Zhang and
Pablo Neira Ayuso.
10) Not early enough spinlock initialization in mlx4, from Eric
Dumazet.
11) Bind refcount leak in act_ipt, from Cong WANG.
12) Missing RCU locking in HTB scheduler, from Florian Westphal.
13) Several small MACSEC bug fixes from Sabrina Dubroca (missing RCU
barrier, using heap for SG and IV, and erroneous use of async flag
when allocating AEAD conext.)
14) RCU handling fix in TIPC, from Ying Xue.
15) Pass correct protocol down into ipv4_{update_pmtu,redirect}() in
SIT driver, from Simon Horman.
16) Socket timer deadlock fix in TIPC from Jon Paul Maloy.
17) Fix potential deadlock in team enslave, from Ido Schimmel.
18) Memory leak in KCM procfs handling, from Jiri Slaby.
19) ESN generation fix in ipv4 ESP, from Herbert Xu.
20) Fix GFP_KERNEL allocations with locks held in act_ife, from Cong
WANG.
21) Use after free in netem, from Eric Dumazet.
22) Uninitialized last assert time in multicast router code, from Tom
Goff.
23) Skip raw sockets in sock_diag destruction broadcast, from Willem
de Bruijn.
24) Fix link status reporting in thunderx, from Sunil Goutham.
25) Limit resegmentation of retransmit queue so that we do not
retransmit too large GSO frames. From Eric Dumazet.
26) Delay bpf program release after grace period, from Daniel
Borkmann"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (141 commits)
openvswitch: fix conntrack netlink event delivery
qed: Protect the doorbell BAR with the write barriers.
neigh: Explicitly declare RCU-bh read side critical section in neigh_xmit()
e1000e: keep VLAN interfaces functional after rxvlan off
cfg80211: fix proto in ieee80211_data_to_8023 for frames without LLC header
qlcnic: use the correct ring in qlcnic_83xx_process_rcv_ring_diag()
bpf, perf: delay release of BPF prog after grace period
net: bridge: fix vlan stats continue counter
tcp: do not send too big packets at retransmit time
ibmvnic: fix to use list_for_each_safe() when delete items
net: thunderx: Fix TL4 configuration for secondary Qsets
net: thunderx: Fix link status reporting
net/mlx5e: Reorganize ethtool statistics
net/mlx5e: Fix number of PFC counters reported to ethtool
net/mlx5e: Prevent adding the same vxlan port
net/mlx5e: Check for BlueFlame capability before allocating SQ uar
net/mlx5e: Change enum to better reflect usage
net/mlx5: Add ConnectX-5 PCIe 4.0 to list of supported devices
net/mlx5: Update command strings
net: marvell: Add separate config ANEG function for Marvell
88E1111
...
Linus Torvalds [Wed, 29 Jun 2016 18:48:05 +0000 (11:48 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
"Another two bug fixes for 4.7:
- The revert of patch which removed boot information for systems
using an intermediate boot kernel, e.g. the SLES12 grub setup.
- A fix for an incorrect inline assembly constraint that causes
broken code to be generated with gcc 4.8.5"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: fix test_fp_ctl inline assembly contraints
Revert "s390/kdump: Clear subchannel ID to signal non-CCW/SCSI IPL"
Linus Torvalds [Wed, 29 Jun 2016 17:05:44 +0000 (10:05 -0700)]
Merge tag 'pinctrl-v4.7-3' of git://git./linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
"Here are a bunch of fixes for pin control. Just drivers and a
MAINTAINERS fixup:
- Driver fixes for i.MX, single register, Tegra and BayTrail.
- MAINTAINERS entry for the documentation"
* tag 'pinctrl-v4.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: baytrail: Fix mingled clock pins
MAINTAINERS: belong Documentation/pinctrl.txt properly
pinctrl: tegra: Fix build dependency
gpio: tegra: Make lockdep class file-scoped
pinctrl: single: Fix missing flush of posted write for a wakeirq
pinctrl: imx: Do not treat a PIN without MUX register as an error
Linus Torvalds [Wed, 29 Jun 2016 17:04:42 +0000 (10:04 -0700)]
Merge branch 'for-4.7-fixes' of git://git./linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
"Three fix patches. Two are for cgroup / css init failure path. The
last one makes css_set_lock irq-safe as the deadline scheduler ends up
calling put_css_set() from irq context"
* 'for-4.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: Disable IRQs while holding css_set_lock
cgroup: set css->id to -1 during init
cgroup: remove redundant cleanup in css_create
David S. Miller [Wed, 29 Jun 2016 12:33:46 +0000 (08:33 -0400)]
Merge tag 'mac80211-for-davem-2016-06-29-v2' of git://git./linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
Just two small fixes
* fix mesh peer link counter, decrement wasn't always done at all
* fix ethertype (length) for packets without RFC 1042 or bridge
tunnel header
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Samuel Gauthier [Tue, 28 Jun 2016 15:22:26 +0000 (17:22 +0200)]
openvswitch: fix conntrack netlink event delivery
Only the first and last netlink message for a particular conntrack are
actually sent. The first message is sent through nf_conntrack_confirm when
the conntrack is committed. The last one is sent when the conntrack is
destroyed on timeout. The other conntrack state change messages are not
advertised.
When the conntrack subsystem is used from netfilter, nf_conntrack_confirm
is called for each packet, from the postrouting hook, which in turn calls
nf_ct_deliver_cached_events to send the state change netlink messages.
This commit fixes the problem by calling nf_ct_deliver_cached_events in the
non-commit case as well.
Fixes:
7f8a436eaa2c ("openvswitch: Add conntrack action")
CC: Joe Stringer <joestringer@nicira.com>
CC: Justin Pettit <jpettit@nicira.com>
CC: Andy Zhou <azhou@nicira.com>
CC: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Samuel Gauthier <samuel.gauthier@6wind.com>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sudarsana Reddy Kalluru [Tue, 28 Jun 2016 11:46:03 +0000 (07:46 -0400)]
qed: Protect the doorbell BAR with the write barriers.
SPQ doorbell is currently protected with the compilation barrier. Under the
stress scenarios, we may get into a state where (due to the weak ordering)
several ramrod doorbells were written to the BAR with an out-of-order
producer values. Need to change the barrier type to a write barrier to make
sure that the write buffer is flushed after each doorbell.
Signed-off-by: Sudarsana Reddy Kalluru <sudarsana.kalluru@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Barroso [Tue, 28 Jun 2016 08:16:43 +0000 (11:16 +0300)]
neigh: Explicitly declare RCU-bh read side critical section in neigh_xmit()
neigh_xmit() expects to be called inside an RCU-bh read side critical
section, and while one of its two current callers gets this right, the
other one doesn't.
More specifically, neigh_xmit() has two callers, mpls_forward() and
mpls_output(), and while both callers call neigh_xmit() under
rcu_read_lock(), this provides sufficient protection for neigh_xmit()
only in the case of mpls_forward(), as that is always called from
softirq context and therefore doesn't need explicit BH protection,
while mpls_output() can be called from process context with softirqs
enabled.
When mpls_output() is called from process context, with softirqs
enabled, we can be preempted by a softirq at any time, and RCU-bh
considers the completion of a softirq as signaling the end of any
pending read-side critical sections, so if we do get a softirq
while we are in the part of neigh_xmit() that expects to be run inside
an RCU-bh read side critical section, we can end up with an unexpected
RCU grace period running right in the middle of that critical section,
making things go boom.
This patch fixes this impedance mismatch in the callee, by making
neigh_xmit() always take rcu_read_{,un}lock_bh() around the code that
expects to be treated as an RCU-bh read side critical section, as this
seems a safer option than fixing it in the callers.
Fixes:
4fd3d7d9e868f ("neigh: Add helper function neigh_xmit")
Signed-off-by: David Barroso <dbarroso@fastly.com>
Signed-off-by: Lennert Buytenhek <lbuytenhek@fastly.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jarod Wilson [Wed, 29 Jun 2016 03:41:31 +0000 (20:41 -0700)]
e1000e: keep VLAN interfaces functional after rxvlan off
I've got a bug report about an e1000e interface, where a VLAN interface is
set up on top of it:
$ ip link add link ens1f0 name ens1f0.99 type vlan id 99
$ ip link set ens1f0 up
$ ip link set ens1f0.99 up
$ ip addr add 192.168.99.92 dev ens1f0.99
At this point, I can ping another host on vlan 99, ip 192.168.99.91.
However, if I do the following:
$ ethtool -K ens1f0 rxvlan off
Then no traffic passes on ens1f0.99. It comes back if I toggle rxvlan on
again. I'm not sure if this is actually intended behavior, or if there's a
lack of software VLAN stripping fallback, or what, but things continue to
work if I simply don't call e1000e_vlan_strip_disable() if there are
active VLANs (plagiarizing a function from the e1000 driver here) on the
interface.
Also slipped a related-ish fix to the kerneldoc text for
e1000e_vlan_strip_disable here...
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Felix Fietkau [Wed, 29 Jun 2016 08:36:39 +0000 (10:36 +0200)]
cfg80211: fix proto in ieee80211_data_to_8023 for frames without LLC header
The PDU length of incoming LLC frames is set to the total skb payload size
in __ieee80211_data_to_8023() of net/wireless/util.c which incorrectly
includes the length of the IEEE 802.11 header.
The resulting LLC frame header has a too large PDU length, causing the
llc_fixup_skb() function of net/llc/llc_input.c to reject the incoming
skb, effectively breaking STP.
Solve the problem by properly substracting the IEEE 802.11 frame header size
from the PDU length, allowing the LLC processor to pick up the incoming
control messages.
Special thanks to Gerry Rozema for tracking down the regression and proposing
a suitable patch.
Fixes:
2d1c304cb2d5 ("cfg80211: add function for 802.3 conversion with separate output buffer")
Cc: stable@vger.kernel.org
Reported-by: Gerry Rozema <gerryr@rozeware.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Dan Carpenter [Mon, 27 Jun 2016 20:50:29 +0000 (23:50 +0300)]
qlcnic: use the correct ring in qlcnic_83xx_process_rcv_ring_diag()
There is a static checker warning here "warn: mask and shift to zero"
and the code sets "ring" to zero every time. From looking at how
QLCNIC_FETCH_RING_ID() is used in qlcnic_83xx_process_rcv_ring() the
qlcnic_83xx_hndl() should be removed.
Fixes:
4be41e92f7c6 ('qlcnic: 83xx data path routines')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Mon, 27 Jun 2016 19:38:11 +0000 (21:38 +0200)]
bpf, perf: delay release of BPF prog after grace period
Commit
dead9f29ddcc ("perf: Fix race in BPF program unregister") moved
destruction of BPF program from free_event_rcu() callback to __free_event(),
which is problematic if used with tail calls: if prog A is attached as
trace event directly, but at the same time present in a tail call map used
by another trace event program elsewhere, then we need to delay destruction
via RCU grace period since it can still be in use by the program doing the
tail call (the prog first needs to be dropped from the tail call map, then
trace event with prog A attached destroyed, so we get immediate destruction).
Fixes:
dead9f29ddcc ("perf: Fix race in BPF program unregister")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Jann Horn <jann@thejh.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov [Mon, 27 Jun 2016 16:34:42 +0000 (18:34 +0200)]
net: bridge: fix vlan stats continue counter
I made a dumb off-by-one mistake when I added the vlan stats counter
dumping code. The increment should happen before the check, not after
otherwise we miss one entry when we continue dumping.
Fixes:
a60c090361ea ("bridge: netlink: export per-vlan stats")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 27 Jun 2016 15:38:50 +0000 (17:38 +0200)]
tcp: do not send too big packets at retransmit time
Arjun reported a bug in TCP stack and bisected it to a recent commit.
In case where we process SACK, we can coalesce multiple skbs
into fat ones (tcp_shift_skb_data()), to lower write queue
overhead, because we do not expect to retransmit these packets.
However, SACK reneging can happen, forcing the sender to retransmit
all these packets. If skb->len is above 64KB, we then send buggy
IP packets that could hang TSO engine on cxgb4.
Neal suggested to use tcp_tso_autosize() instead of tp->gso_segs
so that we cook packets of optimal size vs TCP/pacing.
Thanks to Arjun for reporting the bug and running the tests !
Fixes:
10d3be569243 ("tcp-tso: do not split TSO packets at retransmit time")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Arjun V <arjun@chelsio.com>
Tested-by: Arjun V <arjun@chelsio.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Mon, 27 Jun 2016 12:48:53 +0000 (20:48 +0800)]
ibmvnic: fix to use list_for_each_safe() when delete items
Since we will remove items off the list using list_del() we need
to use a safe version of the list_for_each() macro aptly named
list_for_each_safe().
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 29 Jun 2016 09:14:19 +0000 (05:14 -0400)]
Merge branch 'thunderx-fixes'
Sunil Goutham says:
====================
net: thunderx: Miscellaneous fixes
This 2 patch series fixes issues w.r.t physical link status
reporting and transmit datapath configuration for
secondary qsets.
Changes from v1:
Fixed lmac disable sequence for interfaces of type SGMII.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Goutham [Mon, 27 Jun 2016 10:00:03 +0000 (15:30 +0530)]
net: thunderx: Fix TL4 configuration for secondary Qsets
TL4 calculation for a given SQ of secondary Qsets is incorrect
and goes out of bounds and also for some SQ's TL4 chosen will
transmit data via a different BGX interface and not same as
primary Qset's interface.
This patch fixes this issue.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Goutham [Mon, 27 Jun 2016 10:00:02 +0000 (15:30 +0530)]
net: thunderx: Fix link status reporting
Check for SMU RX local/remote faults along with SPU LINK
status. Otherwise at times link is UP at our end but DOWN
at link partner's side. Also due to an issue in BGX it's
rarely seen that initialization doesn't happen properly
and SMU RX reports faults with everything fine at SPU.
This patch tries to reinitialize LMAC to fix it.
Also fixed LMAC disable sequence to properly bring down link.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Tao Wang <tao.wang@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 29 Jun 2016 08:28:59 +0000 (04:28 -0400)]
Merge branch 'mlx5-100G-fixes'
Saeed Mahameed says:
====================
Mellanox 100G mlx5 fixes#2 for 4.7-rc
The following series provides one-liners fixes for mlx5 driver plus one
medium patch to reorganize ethtool counters reporting.
Highlights:
- Added MODIFY_FLOW_TABLE to command strings table
- Add ConnectX-5 PCIe 4.0 to list of supported devices
- Rename ASYNC_EVENTS enum
- Enable BlueFlame only when supported by device
- Avoid adding same vxlan port twice
- Report the correct number of PFC counters
- Reorganize ethtool reported counters and remove duplications
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Gal Pressman [Mon, 27 Jun 2016 09:08:38 +0000 (12:08 +0300)]
net/mlx5e: Reorganize ethtool statistics
Categorize and reorganize ethtool statistics counters by renaming to
"rx_*" and "tx_*" and removing redundant and duplicated counters, this
way they are easier to grasp and more user friendly.
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gal Pressman [Mon, 27 Jun 2016 09:08:37 +0000 (12:08 +0300)]
net/mlx5e: Fix number of PFC counters reported to ethtool
Number of PFC counters used to count only number of priorities with PFC
enabled, but each priority has more than one counter, hence the need to
multiply it by the number of PFC counters per priority.
Fixes:
cf678570d5a1 ('net/mlx5e: Add per priority group to PPort counters')
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Matthew Finlay [Mon, 27 Jun 2016 09:08:36 +0000 (12:08 +0300)]
net/mlx5e: Prevent adding the same vxlan port
Do not allow the same vxlan udp port to be added to the device more than
once.
Fixes:
b3f63c3d5e2c ("net/mlx5e: Add netdev support for VXLAN tunneling")
Signed-off-by: Matthew Finlay <matt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gal Pressman [Mon, 27 Jun 2016 09:08:35 +0000 (12:08 +0300)]
net/mlx5e: Check for BlueFlame capability before allocating SQ uar
Previous to this patch mapping was always set to write combining without
checking whether BlueFlame is supported in the device.
Fixes:
0ba422410bbf ('net/mlx5: Fix global UAR mapping')
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eli Cohen [Mon, 27 Jun 2016 09:08:34 +0000 (12:08 +0300)]
net/mlx5e: Change enum to better reflect usage
Change MLX5E_STATE_ASYNC_EVENTS_ENABLE to
MLX5E_STATE_ASYNC_EVENTS_ENABLED since it represent a state and not an
operation.
Fixes:
acff797cd1874 ('net/mlx5: Extend mlx5_core to support ConnectX-4 Ethernet functionality')
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Majd Dibbiny [Mon, 27 Jun 2016 09:08:33 +0000 (12:08 +0300)]
net/mlx5: Add ConnectX-5 PCIe 4.0 to list of supported devices
Add the upcoming ConnectX-5 PCIe 4.0 device to the list of
supported devices by the mlx5 driver.
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eli Cohen [Mon, 27 Jun 2016 09:08:32 +0000 (12:08 +0300)]
net/mlx5: Update command strings
Add command string for MODIFY_FLOW_TABLE which is used by the driver.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Harini Katakam [Mon, 27 Jun 2016 07:39:59 +0000 (13:09 +0530)]
net: marvell: Add separate config ANEG function for Marvell
88E1111
Marvell
88E1111 currently uses the generic marvell config ANEG function.
This function has a sequence accessing Page 5 and Register 31,
both of which are not defined or reserved for this PHY.
Hence this patch adds a new config ANEG function for Marvell
88E1111
without these erroneous accesses.
Signed-off-by: Harini Katakam <harinik@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 29 Jun 2016 08:01:53 +0000 (04:01 -0400)]
Merge branch 'batman-adv-fixes'
Sven Eckelmann says:
====================
batman-adv: Fixes for Linux 4.7
Antonio currently seems to be occupied. This is currently rather unfortunate
because there are patches waiting in the batman-adv development repository
maint(enance) branch [1] since up to 6 weeks. I am now getting asked when
these patches will hit the distribution kernels and therefore decided to
submit these patches directly to netdev.
The patch from Simon works around the problem that warnings could be triggered
in the translation table code via packets using a VLAN not configured on the
target host. This warning was replaced with a rate limited info message.
Ben Hutchings found an superfluous batadv_softif_vlan_put in the error
handling code of the translation table while he backported the "batman-adv:
Fix reference counting of vlan object for tt_local_entry" patch to the stable
kernels. He noticed correctly that this batadv_softif_vlan_put should also
have been removed by the said patch.
The most requested fix at the moment is related to a double free in the
translation table code. It is a race condition which mostly happens on systems
with multiple cores and multiple network interface attached to batman-adv. Two
Freifunk communities which were haunted by weird crashes (with backtraces
reporting problems in other parts of the kernel) were kind enough to test this
patch. They reported that there systems are now running stable after applying
this patch.
An invalid memory access was detected in the batadv_icmp_packet_rr handling
code when receiving a skbuff with fragments. The last patch is fixing a memory
leak when the interface is removed via .dellink. The code to fix it was copied
from the code handling the legacy sysfs interface to remove netdevices from a
batman-adv netdevice.
There are still 28 patches in the development tree for v4.8 but I will leave
them to Antonio because these are cleanups and features and therefore for net-
next.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sven Eckelmann [Sun, 26 Jun 2016 09:16:13 +0000 (11:16 +0200)]
batman-adv: Clean up untagged vlan when destroying via rtnl-link
The untagged vlan object is only destroyed when the interface is removed
via the legacy sysfs interface. But it also has to be destroyed when the
standard rtnl-link interface is used.
Fixes:
5d2c05b21337 ("batman-adv: add per VLAN interface attribute framework")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Acked-by: Antonio Quartulli <a@unstable.cc>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sven Eckelmann [Sun, 26 Jun 2016 09:16:12 +0000 (11:16 +0200)]
batman-adv: Fix ICMP RR ethernet access after skb_linearize
The skb_linearize may reallocate the skb. This makes the calculated pointer
for ethhdr invalid. But it the pointer is used later to fill in the RR
field of the batadv_icmp_packet_rr packet.
Instead re-evaluate eth_hdr after the skb_linearize+skb_cow to fix the
pointer and avoid the invalid read.
Fixes:
da6b8c20a5b8 ("batman-adv: generalize batman-adv icmp packet handling")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Sun, 26 Jun 2016 09:16:11 +0000 (11:16 +0200)]
batman-adv: Fix double-put of vlan object
Each batadv_tt_local_entry hold a single reference to a
batadv_softif_vlan. In case a new entry cannot be added to the hash
table, the error path puts the reference, but the reference will also
now be dropped by batadv_tt_local_entry_release().
Fixes:
a33d970d0b54 ("batman-adv: Fix reference counting of vlan object for tt_local_entry")
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sven Eckelmann [Sun, 26 Jun 2016 09:16:10 +0000 (11:16 +0200)]
batman-adv: Fix use-after-free/double-free of tt_req_node
The tt_req_node is added and removed from a list inside a spinlock. But the
locking is sometimes removed even when the object is still referenced and
will be used later via this reference. For example batadv_send_tt_request
can create a new tt_req_node (including add to a list) and later
re-acquires the lock to remove it from the list and to free it. But at this
time another context could have already removed this tt_req_node from the
list and freed it.
CPU#0
batadv_batman_skb_recv from net_device 0
-> batadv_iv_ogm_receive
-> batadv_iv_ogm_process
-> batadv_iv_ogm_process_per_outif
-> batadv_tvlv_ogm_receive
-> batadv_tvlv_ogm_receive
-> batadv_tvlv_containers_process
-> batadv_tvlv_call_handler
-> batadv_tt_tvlv_ogm_handler_v1
-> batadv_tt_update_orig
-> batadv_send_tt_request
-> batadv_tt_req_node_new
spin_lock(...)
allocates new tt_req_node and adds it to list
spin_unlock(...)
return tt_req_node
CPU#1
batadv_batman_skb_recv from net_device 1
-> batadv_recv_unicast_tvlv
-> batadv_tvlv_containers_process
-> batadv_tvlv_call_handler
-> batadv_tt_tvlv_unicast_handler_v1
-> batadv_handle_tt_response
spin_lock(...)
tt_req_node gets removed from list and is freed
spin_unlock(...)
CPU#0
<- returned to batadv_send_tt_request
spin_lock(...)
tt_req_node gets removed from list and is freed
MEMORY CORRUPTION/SEGFAULT/...
spin_unlock(...)
This can only be solved via reference counting to allow multiple contexts
to handle the list manipulation while making sure that only the last
context holding a reference will free the object.
Fixes:
a73105b8d4c7 ("batman-adv: improved client announcement mechanism")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Tested-by: Martin Weinelt <martin@darmstadt.freifunk.net>
Tested-by: Amadeus Alfa <amadeus@chemnitz.freifunk.net>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Wunderlich [Sun, 26 Jun 2016 09:16:09 +0000 (11:16 +0200)]
batman-adv: replace WARN with rate limited output on non-existing VLAN
If a VLAN tagged frame is received and the corresponding VLAN is not
configured on the soft interface, it will splat a WARN on every packet
received. This is a quite annoying behaviour for some scenarios, e.g. if
bat0 is bridged with eth0, and there are arbitrary VLAN tagged frames
from Ethernet coming in without having any VLAN configuration on bat0.
The code should probably create vlan objects on the fly and
transparently transport these VLAN-tagged Ethernet frames, but until
this is done, at least the WARN splat should be replaced by a rate
limited output.
Fixes:
354136bcc3c4 ("batman-adv: fix kernel crash due to missing NULL checks")
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Fri, 24 Jun 2016 23:25:24 +0000 (16:25 -0700)]
net: phy: Manage fixed PHY address space using IDA
If we have a system which uses fixed PHY devices and calls
fixed_phy_register() then fixed_phy_unregister() we can exhaust the
number of fixed PHYs available after a while, since we keep incrementing
the variable phy_fixed_addr, but we never decrement it.
This patch fixes that by converting the fixed PHY allocation to using
IDA, which takes care of the allocation/dealloaction of the PHY
addresses for us.
Fixes:
a75951217472 ("net: phy: extend fixed driver with fixed_phy_register()")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Trond Myklebust [Sat, 25 Jun 2016 23:19:28 +0000 (19:19 -0400)]
NFS: Fix another OPEN_DOWNGRADE bug
Olga Kornievskaia reports that the following test fails to trigger
an OPEN_DOWNGRADE on the wire, and only triggers the final CLOSE.
fd0 = open(foo, RDRW) -- should be open on the wire for "both"
fd1 = open(foo, RDONLY) -- should be open on the wire for "read"
close(fd0) -- should trigger an open_downgrade
read(fd1)
close(fd1)
The issue is that we're missing a check for whether or not the current
state transitioned from an O_RDWR state as opposed to having transitioned
from a combination of O_RDONLY and O_WRONLY.
Reported-by: Olga Kornievskaia <aglo@umich.edu>
Fixes:
cd9288ffaea4 ("NFSv4: Fix another bug in the close/open_downgrade code")
Cc: stable@vger.kernel.org # 2.6.33+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Richard Guy Briggs [Tue, 28 Jun 2016 16:07:50 +0000 (12:07 -0400)]
audit: move audit_get_tty to reduce scope and kabi changes
The only users of audit_get_tty and audit_put_tty are internal to
audit, so move it out of include/linux/audit.h to kernel.h and create
a proper function rather than inlining it. This also reduces kABI
changes.
Suggested-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
[PM: line wrapped description]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Richard Guy Briggs [Tue, 28 Jun 2016 16:06:58 +0000 (12:06 -0400)]
audit: move calcs after alloc and check when logging set loginuid
Move the calculations of values after the allocation in case the
allocation fails. This avoids wasting effort in the rare case that it
fails, but more importantly saves us extra logic to release the tty
ref.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Linus Torvalds [Tue, 28 Jun 2016 19:11:31 +0000 (12:11 -0700)]
Merge branch 'for-4.7-fixes' of git://git./linux/kernel/git/tj/libata
Pull libata fixes from Tejun Heo:
"Two trivial fixes - one for a bug in the allocation failure path and
the other a compiler warning fix"
* 'for-4.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
ata: sata_mv: fix mis-conversion in mv_write_cached_reg()
ata: fix return value check in ahci_seattle_get_port_info()
Linus Torvalds [Tue, 28 Jun 2016 19:01:14 +0000 (12:01 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jikos/hid
Pull HID fix from Jiri Kosina:
"Regression fix for multitouch palm rejection from Allen Hung"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
HID: multitouch: enable palm rejection for Windows Precision Touchpad
Revert "HID: multitouch: enable palm rejection if device implements confidence usage"
Willem de Bruijn [Fri, 24 Jun 2016 20:02:35 +0000 (16:02 -0400)]
sock_diag: do not broadcast raw socket destruction
Diag intends to broadcast tcp_sk and udp_sk socket destruction.
Testing sk->sk_protocol for IPPROTO_TCP/IPPROTO_UDP alone is not
sufficient for this. Raw sockets can have the same type.
Add a test for sk->sk_type.
Fixes:
eb4cb008529c ("sock_diag: define destruction multicast groups")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Aaron Campbell [Fri, 24 Jun 2016 13:05:32 +0000 (10:05 -0300)]
connector: fix out-of-order cn_proc netlink message delivery
The proc connector messages include a sequence number, allowing userspace
programs to detect lost messages. However, performing this detection is
currently more difficult than necessary, since netlink messages can be
delivered to the application out-of-order. To fix this, leave pre-emption
disabled during cn_netlink_send(), and use GFP_NOWAIT.
The following was written as a test case. Building the kernel w/ make -j32
proved a reliable way to generate out-of-order cn_proc messages.
int
main(int argc, char *argv[])
{
static uint32_t last_seq[CPU_SETSIZE], seq;
int cpu, fd;
struct sockaddr_nl sa;
struct __attribute__((aligned(NLMSG_ALIGNTO))) {
struct nlmsghdr nl_hdr;
struct __attribute__((__packed__)) {
struct cn_msg cn_msg;
struct proc_event cn_proc;
};
} rmsg;
struct __attribute__((aligned(NLMSG_ALIGNTO))) {
struct nlmsghdr nl_hdr;
struct __attribute__((__packed__)) {
struct cn_msg cn_msg;
enum proc_cn_mcast_op cn_mcast;
};
} smsg;
fd = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_CONNECTOR);
if (fd < 0) {
perror("socket");
}
sa.nl_family = AF_NETLINK;
sa.nl_groups = CN_IDX_PROC;
sa.nl_pid = getpid();
if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
perror("bind");
}
memset(&smsg, 0, sizeof(smsg));
smsg.nl_hdr.nlmsg_len = sizeof(smsg);
smsg.nl_hdr.nlmsg_pid = getpid();
smsg.nl_hdr.nlmsg_type = NLMSG_DONE;
smsg.cn_msg.id.idx = CN_IDX_PROC;
smsg.cn_msg.id.val = CN_VAL_PROC;
smsg.cn_msg.len = sizeof(enum proc_cn_mcast_op);
smsg.cn_mcast = PROC_CN_MCAST_LISTEN;
if (send(fd, &smsg, sizeof(smsg), 0) != sizeof(smsg)) {
perror("send");
}
while (recv(fd, &rmsg, sizeof(rmsg), 0) == sizeof(rmsg)) {
cpu = rmsg.cn_proc.cpu;
if (cpu < 0) {
continue;
}
seq = rmsg.cn_msg.seq;
if ((last_seq[cpu] != 0) && (seq != last_seq[cpu] + 1)) {
printf("out-of-order seq=%d on cpu=%d\n", seq, cpu);
}
last_seq[cpu] = seq;
}
/* NOTREACHED */
perror("recv");
return -1;
}
Signed-off-by: Aaron Campbell <aaron@monkey.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
daniel [Fri, 24 Jun 2016 10:35:18 +0000 (12:35 +0200)]
Bridge: Fix ipv6 mc snooping if bridge has no ipv6 address
The bridge is falsly dropping ipv6 mulitcast packets if there is:
1. No ipv6 address assigned on the brigde.
2. No external mld querier present.
3. The internal querier enabled.
When the bridge fails to build mld queries, because it has no
ipv6 address, it slilently returns, but keeps the local querier enabled.
This specific case causes confusing packet loss.
Ipv6 multicast snooping can only work if:
a) An external querier is present
OR
b) The bridge has an ipv6 address an is capable of sending own queries
Otherwise it has to forward/flood the ipv6 multicast traffic,
because snooping cannot work.
This patch fixes the issue by adding a flag to the bridge struct that
indicates that there is currently no ipv6 address assinged to the bridge
and returns a false state for the local querier in
__br_multicast_querier_exists().
Special thanks to Linus Lüssing.
Fixes:
d1d81d4c3dd8 ("bridge: check return value of ipv6_dev_get_saddr()")
Signed-off-by: Daniel Danzberger <daniel@dd-wrt.com>
Acked-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allen Hung [Thu, 23 Jun 2016 08:31:30 +0000 (16:31 +0800)]
HID: multitouch: enable palm rejection for Windows Precision Touchpad
The usage Confidence is mandary to Windows Precision Touchpad devices. If
it is examined in input_mapping on a WIndows Precision Touchpad, a new add
quirk MT_QUIRK_CONFIDENCE desgned for such devices will be applied to the
device. A touch with the confidence bit is not set is determined as
invalid.
Tested on Dell XPS13 9343
Cc: stable@vger.kernel.org # v4.5+
Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Tested-by: Andy Lutomirski <luto@kernel.org> # XPS 13 9350, BIOS 1.4.3
Signed-off-by: Allen Hung <allen_hung@dell.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Allen Hung [Thu, 23 Jun 2016 08:31:29 +0000 (16:31 +0800)]
Revert "HID: multitouch: enable palm rejection if device implements confidence usage"
This reverts commit
25a84db15b3f ("HID: multitouch: enable palm rejection
if device implements confidence usage")
The commit enables palm rejection for Win8 Precision Touchpad devices but
the quirk MT_QUIRK_VALID_IS_CONFIDENCE it is using is not working very
properly. This quirk is originally designed for some WIn7 touchscreens. Use
of this for a Win8 Precision Touchpad will cause unexpected pointer jumping
problem.
Cc: stable@vger.kernel.org # v4.5+
Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Tested-by: Andy Lutomirski <luto@kernel.org> # XPS 13 9350, BIOS 1.4.3
Signed-off-by: Allen Hung <allen_hung@dell.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Jouni Malinen [Sun, 19 Jun 2016 20:51:02 +0000 (23:51 +0300)]
mac80211: Fix mesh estab_plinks counting in STA removal case
If a user space program (e.g., wpa_supplicant) deletes a STA entry that
is currently in NL80211_PLINK_ESTAB state, the number of established
plinks counter was not decremented and this could result in rejecting
new plink establishment before really hitting the real maximum plink
limit. For !user_mpm case, this decrementation is handled by
mesh_plink_deactive().
Fix this by decrementing estab_plinks on STA deletion
(mesh_sta_cleanup() gets called from there) so that the counter has a
correct value and the Beacon frame advertisement in Mesh Configuration
element shows the proper value for capability to accept additional
peers.
Cc: stable@vger.kernel.org
Signed-off-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Wang Sheng-Hui [Fri, 24 Jun 2016 00:52:11 +0000 (08:52 +0800)]
net/mlx5: use mlx5_buf_alloc_node instead of mlx5_buf_alloc in mlx5_wq_ll_create
Commit
311c7c71c9bb ("net/mlx5e: Allocate DMA coherent memory on
reader NUMA node") introduced mlx5_*_alloc_node() but missed changing
some calling and warn messages. This patch introduces 2 changes:
* Use mlx5_buf_alloc_node() instead of mlx5_buf_alloc() in
mlx5_wq_ll_create()
* Update the failure warn messages with _node postfix for
mlx5_*_alloc function names
Fixes:
311c7c71c9bb ("net/mlx5e: Allocate DMA coherent memory on reader NUMA node")
Signed-off-by: Wang Sheng-Hui <shhuiw@foxmail.com>
Acked-By: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 28 Jun 2016 08:22:25 +0000 (04:22 -0400)]
Merge branch 'bgmac-fixes'
Florian Fainelli says:
====================
net: bgmac: Random fixes
This patch series fixes a few issues spotted by code inspection and
actual testing.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 23 Jun 2016 21:25:33 +0000 (14:25 -0700)]
net: bgmac: Remove superflous netif_carrier_on()
bgmac_open() calls phy_start() to initialize the PHY state machine,
which will set the interface's carrier state accordingly, no need to
force that as this could be conflicting with the PHY state determined by
PHYLIB.
Fixes:
dd4544f05469 ("bgmac: driver for GBit MAC core on BCMA bus")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 23 Jun 2016 21:25:32 +0000 (14:25 -0700)]
net: bgmac: Start transmit queue in bgmac_open
The driver does not start the transmit queue in bgmac_open(). If the
queue was stopped prior to closing then re-opening the interface, we
would never be able to wake-up again.
Fixes:
dd4544f05469 ("bgmac: driver for GBit MAC core on BCMA bus")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 23 Jun 2016 21:23:12 +0000 (14:23 -0700)]
net: bgmac: Fix SOF bit checking
We are checking for the Start of Frame bit in the ctl1 word, while this
bit is set in the ctl0 word instead. Read the ctl0 word and update the
check to verify that.
Fixes:
9cde94506eac ("bgmac: implement scatter/gather support")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jay Vosburgh [Thu, 23 Jun 2016 21:20:51 +0000 (14:20 -0700)]
bonding: fix 802.3ad aggregator reselection
Since commit
7bb11dc9f59d ("bonding: unify all places where
actor-oper key needs to be updated."), the logic in bonding to handle
selection between multiple aggregators has not functioned.
This affects only configurations wherein the bonding slaves
connect to two discrete aggregators (e.g., two independent switches, each
with LACP enabled), thus creating two separate aggregation groups within a
single bond.
The cause is a change in
7bb11dc9f59d to no longer set
AD_PORT_BEGIN on a port after a link state change, which would cause the
port to be reselected for attachment to an aggregator as if were newly
added to the bond. We cannot restore the prior behavior, as it
contradicts IEEE 802.1AX 5.4.12, which requires ports that "become
inoperable" (lose carrier, setting port_enabled=false as per 802.1AX
5.4.7) to remain selected (i.e., assigned to the aggregator). As the port
now remains selected, the aggregator selection logic is not invoked.
A side effect of this change is that aggregators in bonding will
now contain ports that are link down. The aggregator selection logic
does not currently handle this situation correctly, causing incorrect
aggregator selection.
This patch makes two changes to repair the aggregator selection
logic in bonding to function as documented and within the confines of the
standard:
First, the aggregator selection and related logic now utilizes the
number of active ports per aggregator, not the number of selected ports
(as some selected ports may be down). The ad_select "bandwidth" and
"count" options only consider ports that are link up.
Second, on any carrier state change of any slave, the aggregator
selection logic is explicitly called to insure the correct aggregator is
active.
Reported-by: Veli-Matti Lintu <veli-matti.lintu@opinsys.fi>
Fixes:
7bb11dc9f59d ("bonding: unify all places where actor-oper key needs to be updated.")
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Goff [Thu, 23 Jun 2016 20:11:57 +0000 (16:11 -0400)]
ipmr/ip6mr: Initialize the last assert time of mfc entries.
This fixes wrong-interface signaling on 32-bit platforms for entries
created when jiffies > 2^31 + MFC_ASSERT_THRESH.
Signed-off-by: Tom Goff <thomas.goff@ll.mit.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin Schwidefsky [Mon, 27 Jun 2016 15:06:45 +0000 (17:06 +0200)]
s390: fix test_fp_ctl inline assembly contraints
The test_fp_ctl function is used to test if a given value is a valid
floating-point control. The inline assembly in test_fp_ctl uses an
incorrect constraint for the 'orig_fpc' variable. If the compiler
chooses the same register for 'fpc' and 'orig_fpc' the test_fp_ctl()
function always returns true. This allows user space to trigger
kernel oopses with invalid floating-point control values on the
signal stack.
This problem has been introduced with git commit
4725c86055f5bbdcdf
"s390: fix save and restore of the floating-point-control register"
Cc: stable@vger.kernel.org # v3.13+
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Michael Holzheu [Mon, 13 Jun 2016 15:03:48 +0000 (17:03 +0200)]
Revert "s390/kdump: Clear subchannel ID to signal non-CCW/SCSI IPL"
This reverts commit
852ffd0f4e23248b47531058e531066a988434b5.
There are use cases where an intermediate boot kernel (1) uses kexec
to boot the final production kernel (2). For this scenario we should
provide the original boot information to the production kernel (2).
Therefore clearing the boot information during kexec() should not
be done.
Cc: stable@vger.kernel.org # v3.17+
Reported-by: Steffen Maier <maier@linux.vnet.ibm.com>
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Linus Torvalds [Tue, 28 Jun 2016 03:43:00 +0000 (20:43 -0700)]
Merge tag 'for-v4.7-rc' of git://git./linux/kernel/git/sre/linux-power-supply
Pull power supply fixes from Sebastian Reichel.
* tag 'for-v4.7-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply:
power_supply: tps65217-charger: Fix NULL deref during property export
power_supply: power_supply_read_temp only if use_cnt > 0
Linus Torvalds [Tue, 28 Jun 2016 03:34:43 +0000 (20:34 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: vmmouse - remove port reservation
Input: elantech - add more IC body types to the list
Input: wacom_w8001 - ignore invalid pen data packets
Input: wacom_w8001 - w8001_MAX_LENGTH should be 13
Input: xpad - fix oops when attaching an unknown Xbox One gamepad
MAINTAINERS: add Pali Rohár as reviewer of ALPS PS/2 touchpad driver
Input: add HDMI CEC specific keycodes
Input: add BUS_CEC type
Input: xpad - fix rumble on Xbox One controllers with 2015 firmware
Linus Torvalds [Mon, 27 Jun 2016 20:38:58 +0000 (13:38 -0700)]
Merge branch 'rc-fixes' of git://git./linux/kernel/git/mmarek/kbuild
Pull kbuild regression fix from Michal Marek:
"The problem is that commit
9c8fa9bc08f6 ("kbuild: fix if_change and
friends to consider argument order") fixed a potential missed rebuild,
but this results in unnnecessary rebuilds with the packaging targets.
Which is still more correct than the previous logic, but also very
annoying"
* 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
kbuild: Initialize exported variables
Linus Torvalds [Mon, 27 Jun 2016 18:23:44 +0000 (11:23 -0700)]
Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"Various small cifs/smb3 fixes, include some for stable, and some from
the recent SMB3 test event"
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
File names with trailing period or space need special case conversion
Fix reconnect to not defer smb3 session reconnect long after socket reconnect
cifs: check hash calculating succeeded
cifs: dynamic allocation of ntlmssp blob
cifs: use CIFS_MAX_DOMAINNAME_LEN when converting the domain name
cifs: stuff the fl_owner into "pid" field in the lock request
Linus Torvalds [Mon, 27 Jun 2016 17:59:53 +0000 (10:59 -0700)]
Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
"This fixes the following issues:
- Missing length check for user-space GETALG request
- Bogus memmove length in ux500 driver
- Incorrect priority setting for vmx driver
- Incorrect ABI selection for vmx driver"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: user - re-add size check for CRYPTO_MSG_GETALG
crypto: ux500 - memmove the right size
crypto: vmx - Increase priority of aes-cbc cipher
crypto: vmx - Fix ABI detection
Stefan Hajnoczi [Thu, 23 Jun 2016 15:28:58 +0000 (16:28 +0100)]
vsock: make listener child lock ordering explicit
There are several places where the listener and pending or accept queue
child sockets are accessed at the same time. Lockdep is unhappy that
two locks from the same class are held.
Tell lockdep that it is safe and document the lock ordering.
Originally Claudio Imbrenda <imbrenda@linux.vnet.ibm.com> sent a similar
patch asking whether this is safe. I have audited the code and also
covered the vsock_pending_work() function.
Suggested-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Thu, 23 Jun 2016 13:25:09 +0000 (15:25 +0200)]
ipv6: enforce egress device match in per table nexthop lookups
with the commit
8c14586fc320 ("net: ipv6: Use passed in table for
nexthop lookups"), net hop lookup is first performed on route creation
in the passed-in table.
However device match is not enforced in table lookup, so the found
route can be later discarded due to egress device mismatch and no
global lookup will be performed.
This cause the following to fail:
ip link add dummy1 type dummy
ip link add dummy2 type dummy
ip link set dummy1 up
ip link set dummy2 up
ip route add 2001:db8:8086::/48 dev dummy1 metric 20
ip route add 2001:db8:d34d::/64 via 2001:db8:8086::2 dev dummy1 metric 20
ip route add 2001:db8:8086::/48 dev dummy2 metric 21
ip route add 2001:db8:d34d::/64 via 2001:db8:8086::2 dev dummy2 metric 21
RTNETLINK answers: No route to host
This change fixes the issue enforcing device lookup in
ip6_nh_lookup_table()
v1->v2: updated commit message title
Fixes:
8c14586fc320 ("net: ipv6: Use passed in table for nexthop lookups")
Reported-and-tested-by: Beniamino Galvani <bgalvani@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 27 Jun 2016 14:05:55 +0000 (10:05 -0400)]
Merge tag 'linux-can-fixes-for-4.7-
20160623' of git://git./linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2016-06-23
this is a pull request of 3 patches for the upcoming linux-4.7 release.
The first two patches are by Oliver Hartkopp fixing oopes in the generic CAN
device netlink handling. Jimmy Assarsson's patch for the kvaser_usb driver adds
support for more devices by adding their USB product ids.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Mon, 20 Jun 2016 17:14:36 +0000 (13:14 -0400)]
make nfs_atomic_open() call d_drop() on all ->open_context() errors.
In "NFSv4: Move dentry instantiation into the NFSv4-specific atomic open code"
unconditional d_drop() after the ->open_context() had been removed. It had
been correct for success cases (there ->open_context() itself had been doing
dcache manipulations), but not for error ones. Only one of those (ENOENT)
got a compensatory d_drop() added in that commit, but in fact it should've
been done for all errors. As it is, the case of O_CREAT non-exclusive open
on a hashed negative dentry racing with e.g. symlink creation from another
client ended up with ->open_context() getting an error and proceeding to
call nfs_lookup(). On a hashed dentry, which would've instantly triggered
BUG_ON() in d_materialise_unique() (or, these days, its equivalent in
d_splice_alias()).
Cc: stable@vger.kernel.org # v3.10+
Tested-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Jeremy Linton [Wed, 22 Jun 2016 17:40:50 +0000 (12:40 -0500)]
net: smsc911x: Fix bug where PHY interrupts are overwritten by 0
By default, mdiobus_alloc() sets the PHYs to polling mode, but a
pointer size memcpy means that a couple IRQs end up being overwritten
with a value of 0. This means that PHY_POLL is disabled and results
in unpredictable behavior depending on the PHY's location on the
MDIO bus. Remove that memcpy and the now unused phy_irq member to
force the SMSC911x PHYs into polling mode 100% of the time.
Fixes:
e7f4dc3536a4 ("mdio: Move allocation of interrupts into core")
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 27 Jun 2016 00:52:03 +0000 (17:52 -0700)]
Linux 4.7-rc5
Linus Torvalds [Sun, 26 Jun 2016 17:08:49 +0000 (10:08 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two straightforward fixes.
One is a concurrency issue only affecting SAS connected SATA drives,
but which could hang the storage subsystem if it triggers (because the
outstanding command count on error never goes back to zero) and the
other is a NO_TAG fallout from the switch to hostwide tags which
causes the system to crash on module insertion (we've checked
carefully and only the 53c700 family of drivers is vulnerable to this
issue)"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
53c700: fix BUG on untagged commands
scsi: fix race between simultaneous decrements of ->host_failed
Linus Torvalds [Sat, 25 Jun 2016 15:53:38 +0000 (08:53 -0700)]
Merge branch 'for-linus-4.7-part2' of git://git./linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes part 2 from Chris Mason:
"This has one patch from Omar to bring iterate_shared back to btrfs.
We have a tree of work we queue up for directory items and it doesn't
lend itself well to shared access. While we're cleaning it up, Omar
has changed things to use an exclusive lock when there are delayed
items"
* 'for-linus-4.7-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix ->iterate_shared() by upgrading i_rwsem for delayed nodes
Linus Torvalds [Sat, 25 Jun 2016 15:42:31 +0000 (08:42 -0700)]
Merge branch 'for-linus-4.7' of git://git./linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
"I have a two part pull this time because one of the patches Dave
Sterba collected needed to be against v4.7-rc2 or higher (we used
rc4). I try to make my for-linus-xx branch testable on top of the
last major so we can hand fixes to people on the list more easily, so
I've split this pull in two.
This first part has some fixes and two performance improvements that
we've been testing for some time.
Josef's two performance fixes are most notable. The transid tracking
patch makes a big improvement on pretty much every workload"
* 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: Force stripesize to the value of sectorsize
btrfs: fix disk_i_size update bug when fallocate() fails
Btrfs: fix error handling in map_private_extent_buffer
Btrfs: fix error return code in btrfs_init_test_fs()
Btrfs: don't do nocow check unless we have to
btrfs: fix deadlock in delayed_ref_async_start
Btrfs: track transid for delayed ref flushing
Linus Torvalds [Sat, 25 Jun 2016 13:55:48 +0000 (06:55 -0700)]
Merge tag 'sound-4.7-rc5' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Again pretty calm weeks: we've had only a few trivial / stable
HD-audio fixes in addition to a possible race fix for snd-dummy driver
spotted by syzkaller"
* tag 'sound-4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: dummy: Fix a use-after-free at closing
ALSA: hda / realtek - add two more Thinkpad IDs (5050,5053) for tpt460 fixup
ALSA: hda - Fix the headset mic jack detection on Dell machine
ALSA: hda/tegra: iomem fixups for sparse warnings
ALSA: hdac_regmap - fix the register access for runtime PM
Linus Torvalds [Sat, 25 Jun 2016 13:49:32 +0000 (06:49 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 kprobe fix from Thomas Gleixner:
"A single fix clearing the TF bit when a fault is single stepped"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
kprobes/x86: Clear TF bit in fault on single-stepping
Linus Torvalds [Sat, 25 Jun 2016 13:38:42 +0000 (06:38 -0700)]
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull scheduler fixes from Thomas Gleixner:
"A couple of scheduler fixes:
- force watchdog reset while processing sysrq-w
- fix a deadlock when enabling trace events in the scheduler
- fixes to the throttled next buddy logic
- fixes for the average accounting (missing serialization and
underflow handling)
- allow kernel threads for fallback to online but not active cpus"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/core: Allow kthreads to fall back to online && !active cpus
sched/fair: Do not announce throttled next buddy in dequeue_task_fair()
sched/fair: Initialize throttle_count for new task-groups lazily
sched/fair: Fix cfs_rq avg tracking underflow
kernel/sysrq, watchdog, sched/core: Reset watchdog on all CPUs while processing sysrq-w
sched/debug: Fix deadlock when enabling sched events
sched/fair: Fix post_init_entity_util_avg() serialization
Omar Sandoval [Fri, 20 May 2016 20:50:33 +0000 (13:50 -0700)]
Btrfs: fix ->iterate_shared() by upgrading i_rwsem for delayed nodes
Commit
fe742fd4f90f ("Revert "btrfs: switch to ->iterate_shared()"")
backed out the conversion to ->iterate_shared() for Btrfs because the
delayed inode handling in btrfs_real_readdir() is racy. However, we can
still do readdir in parallel if there are no delayed nodes.
This is a temporary fix which upgrades the shared inode lock to an
exclusive lock only when we have delayed items until we come up with a
more complete solution. While we're here, rename the
btrfs_{get,put}_delayed_items functions to make it very clear that
they're just for readdir.
Tested with xfstests and by doing a parallel kernel build:
while make tinyconfig && make -j4 && git clean dqfx; do
:
done
along with a bunch of parallel finds in another shell:
while true; do
for ((i=0; i<4; i++)); do
find . >/dev/null &
done
wait
done
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Linus Torvalds [Sat, 25 Jun 2016 13:14:44 +0000 (06:14 -0700)]
Merge branch 'locking-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull locking fix from Thomas Gleixner:
"A single fix to address a race in the static key logic"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/static_key: Fix concurrent static_key_slow_inc()
Linus Torvalds [Sat, 25 Jun 2016 13:09:59 +0000 (06:09 -0700)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull irq fix from Thomas Gleixner:
"A single fix for the fallout from the conversion of MIPS GIC to irq
domains"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/mips-gic: Fix IRQs in gic_dev_domain
Linus Torvalds [Sat, 25 Jun 2016 13:01:48 +0000 (06:01 -0700)]
Merge tag 'powerpc-4.7-4' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"mm/radix (Aneesh Kumar K.V):
- Update to tlb functions ric argument
- Flush page walk cache when freeing page table
- Update Radix tree size as per ISA 3.0
mm/hash (Aneesh Kumar K.V):
- Use the correct PPP mask when updating HPTE
- Don't add memory coherence if cache inhibited is set
eeh (Gavin Shan):
- Fix invalid cached PE primary bus
bpf/jit (Naveen N. Rao):
- Disable classic BPF JIT on ppc64le
.. and fix faults caused by radix patching of SLB miss handler
(Michael Ellerman)"
* tag 'powerpc-4.7-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/bpf/jit: Disable classic BPF JIT on ppc64le
powerpc: Fix faults caused by radix patching of SLB miss handler
powerpc/eeh: Fix invalid cached PE primary bus
powerpc/mm/radix: Update Radix tree size as per ISA 3.0
powerpc/mm/hash: Don't add memory coherence if cache inhibited is set
powerpc/mm/hash: Use the correct PPP mask when updating HPTE
powerpc/mm/radix: Flush page walk cache when freeing page table
powerpc/mm/radix: Update to tlb functions ric argument
Michael Ellerman [Sat, 25 Jun 2016 11:53:30 +0000 (21:53 +1000)]
Fix build break in fork.c when THREAD_SIZE < PAGE_SIZE
Commit
b235beea9e99 ("Clarify naming of thread info/stack allocators")
breaks the build on some powerpc configs, where THREAD_SIZE < PAGE_SIZE:
kernel/fork.c:235:2: error: implicit declaration of function 'free_thread_stack'
kernel/fork.c:355:8: error: assignment from incompatible pointer type
stack = alloc_thread_stack_node(tsk, node);
^
Fix it by renaming free_stack() to free_thread_stack(), and updating the
return type of alloc_thread_stack_node().
Fixes:
b235beea9e99 ("Clarify naming of thread info/stack allocators")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 25 Jun 2016 02:08:33 +0000 (19:08 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"Two weeks worth of fixes here"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (41 commits)
init/main.c: fix initcall_blacklisted on ia64, ppc64 and parisc64
autofs: don't get stuck in a loop if vfs_write() returns an error
mm/page_owner: avoid null pointer dereference
tools/vm/slabinfo: fix spelling mistake: "Ocurrences" -> "Occurrences"
fs/nilfs2: fix potential underflow in call to crc32_le
oom, suspend: fix oom_reaper vs. oom_killer_disable race
ocfs2: disable BUG assertions in reading blocks
mm, compaction: abort free scanner if split fails
mm: prevent KASAN false positives in kmemleak
mm/hugetlb: clear compound_mapcount when freeing gigantic pages
mm/swap.c: flush lru pvecs on compound page arrival
memcg: css_alloc should return an ERR_PTR value on error
memcg: mem_cgroup_migrate() may be called with irq disabled
hugetlb: fix nr_pmds accounting with shared page tables
Revert "mm: disable fault around on emulated access bit architecture"
Revert "mm: make faultaround produce old ptes"
mailmap: add Boris Brezillon's email
mailmap: add Antoine Tenart's email
mm, sl[au]b: add __GFP_ATOMIC to the GFP reclaim mask
mm: mempool: kasan: don't poot mempool objects in quarantine
...
Linus Torvalds [Sat, 25 Jun 2016 01:52:31 +0000 (18:52 -0700)]
Merge tag 'for-linus' of git://git./linux/kernel/git/dledford/rdma
Pull rdma fixes from Doug Ledford:
"This is the second batch of queued up rdma patches for this rc cycle.
There isn't anything really major in here. It's passed 0day,
linux-next, and local testing across a wide variety of hardware.
There are still a few known issues to be tracked down, but this should
amount to the vast majority of the rdma RC fixes.
Round two of 4.7 rc fixes:
- A couple minor fixes to the rdma core
- Multiple minor fixes to hfi1
- Multiple minor fixes to mlx4/mlx4
- A few minor fixes to i40iw"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (31 commits)
IB/srpt: Reduce QP buffer size
i40iw: Enable level-1 PBL for fast memory registration
i40iw: Return correct max_fast_reg_page_list_len
i40iw: Correct status check on i40iw_get_pble
i40iw: Correct CQ arming
IB/rdmavt: Correct qp_priv_alloc() return value test
IB/hfi1: Don't zero out qp->s_ack_queue in rvt_reset_qp
IB/hfi1: Fix deadlock with txreq allocation slow path
IB/mlx4: Prevent cross page boundary allocation
IB/mlx4: Fix memory leak if QP creation failed
IB/mlx4: Verify port number in flow steering create flow
IB/mlx4: Fix error flow when sending mads under SRIOV
IB/mlx4: Fix the SQ size of an RC QP
IB/mlx5: Fix wrong naming of port_rcv_data counter
IB/mlx5: Fix post send fence logic
IB/uverbs: Initialize ib_qp_init_attr with zeros
IB/core: Fix false search of the IB_SA_WELL_KNOWN_GUID
IB/core: Fix RoCE v1 multicast join logic issue
IB/core: Fix no default GIDs when netdevice reregisters
IB/hfi1: Send a pkey change event on driver pkey update
...
Linus Torvalds [Sat, 25 Jun 2016 01:43:58 +0000 (18:43 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jikos/hid
Pull HID fix from Jiri Kosina:
"hiddev ioctl() validation fix from Scott Bauer"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
HID: hiddev: validate num_values for HIDIOCGUSAGES, HIDIOCSUSAGES commands
Linus Torvalds [Sat, 25 Jun 2016 01:36:15 +0000 (18:36 -0700)]
Merge tag 'hwmon-for-linus-v4.7-rc5' of git://git./linux/kernel/git/groeck/linux-staging
Pull hwmon fix from Guenter Roeck:
"Improve fan type detection for dell-smm to prevent kernel hang"
* tag 'hwmon-for-linus-v4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (dell-smm) Cache fan_type() calls and change fan detection
Linus Torvalds [Sat, 25 Jun 2016 01:29:55 +0000 (18:29 -0700)]
Merge tag 'acpi-4.7-rc5' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Stable-candidate fix for a deadlock in ACPICA introduced during the
4.5 development cycle by a commit attempting to improve the handling
of AML code that doesn't belong to any namespace objects in a given
definition block (Lv Zheng)"
* tag 'acpi-4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPICA: Namespace: Fix deadlock triggered by MLC support in dynamic table loading
Linus Torvalds [Sat, 25 Jun 2016 01:03:22 +0000 (18:03 -0700)]
Merge tag 'pm-4.7-rc5' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"Fix for a latent cpufreq driver bug uncovered by a recent ACPICA
change and several fixes for the devfreq framework, including one fix
for an issue introduced recently.
Specifics:
- Fix a latent initialization issue in the pcc-cpufreq driver
(incorrect initial value of a structure field) that has been
uncovered by a recent ACPICA commit (Mike Galbraith).
- Add a missing notification in an update_devfreq() error code path
forgotten by a recent devfreq commit (Chanwoo Choi).
- Fix devfreq device frequency initialization (Lukasz Luba).
- Fix an incorrect IS_ERR() check in the devfreq framework discovered
by the Smatch checker (Dan Carpenter).
- Drop two excessive put_device() calls from the devfreq framework
(MyungJoo Ham, Cai Zhiyong).
- Fix a possible memory leak in the devfreq framework and drop an
unnecessary kfree() invocation from it (MyungJoo Ham)"
* tag 'pm-4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / devfreq: Send the DEVFREQ_POSTCHANGE notification when target() is failed
cpufreq: pcc-cpufreq: Fix doorbell.access_width
PM / devfreq: fix initialization of current frequency in last status
PM / devfreq: exynos-nocp: Remove incorrect IS_ERR() check
PM / devfreq: remove double put_device
PM / devfreq: fix double call put_device
PM / devfreq: fix duplicated kfree on devfreq pointer
PM / devfreq: devm_kzalloc to have dev pointer more precisely
Linus Torvalds [Sat, 25 Jun 2016 00:57:37 +0000 (17:57 -0700)]
Merge tag 'for-linus-4.7b-rc4-tag' of git://git./linux/kernel/git/xen/tip
Pull xen bug fixes from David Vrabel:
- fix x86 PV dom0 crash during early boot on some hardware
- fix two pciback bugs affects certain devices
- fix potential overflow when clearing page tables in x86 PV
* tag 'for-linus-4.7b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen-pciback: return proper values during BAR sizing
x86/xen: avoid m2p lookup when setting early page table entries
xen/pciback: Fix conf_space read/write overlap check.
x86/xen: fix upper bound of pmd loop in xen_cleanhighmap()
xen/balloon: Fix declared-but-not-defined warning
Linus Torvalds [Sat, 25 Jun 2016 00:51:14 +0000 (17:51 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"Here are a few more arm64 fixes, but things do finally appear to be
slowing down. The main fix is avoiding hibernation in a previously
unanticipated situation where we have CPUs parked in the kernel, but
it's all good stuff.
- Fix icache/dcache sync for anonymous pages under migration
- Correct the ASID limit check
- Fix parallel builds of Image and Image.gz
- Refuse to hibernate when we have CPUs that we can't offline"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: hibernate: Don't hibernate on systems with stuck CPUs
arm64: smp: Add function to determine if cpus are stuck in the kernel
arm64: mm: remove page_mapping check in __sync_icache_dcache
arm64: fix boot image dependencies to not generate invalid images
arm64: update ASID limit
Rasmus Villemoes [Fri, 24 Jun 2016 21:50:30 +0000 (14:50 -0700)]
init/main.c: fix initcall_blacklisted on ia64, ppc64 and parisc64
When I replaced kasprintf("%pf") with a direct call to
sprint_symbol_no_offset I must have broken the initcall blacklisting
feature on the arches where dereference_function_descriptor() is
non-trivial.
Fixes:
c8cdd2be213f (init/main.c: simplify initcall_blacklisted())
Link: http://lkml.kernel.org/r/1466027283-4065-1-git-send-email-linux@rasmusvillemoes.dk
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Yang Shi <yang.shi@linaro.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Petr Mladek <pmladek@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrey Vagin [Fri, 24 Jun 2016 21:50:27 +0000 (14:50 -0700)]
autofs: don't get stuck in a loop if vfs_write() returns an error
__vfs_write() returns a negative value in a error case.
Link: http://lkml.kernel.org/r/20160616083108.6278.65815.stgit@pluto.themaw.net
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sudip Mukherjee [Fri, 24 Jun 2016 21:50:24 +0000 (14:50 -0700)]
mm/page_owner: avoid null pointer dereference
We have dereferenced page_ext before checking it. Lets check it first
and then used it.
Fixes:
f86e4271978b ("mm: check the return value of lookup_page_ext for all call sites")
Link: http://lkml.kernel.org/r/1465249059-7883-1-git-send-email-sudipm.mukherjee@gmail.com
Signed-off-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Colin Ian King [Fri, 24 Jun 2016 21:50:21 +0000 (14:50 -0700)]
tools/vm/slabinfo: fix spelling mistake: "Ocurrences" -> "Occurrences"
trivial fix to spelling mistake
Link: http://lkml.kernel.org/r/1466672144-831-1-git-send-email-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Torsten Hilbrich [Fri, 24 Jun 2016 21:50:18 +0000 (14:50 -0700)]
fs/nilfs2: fix potential underflow in call to crc32_le
The value `bytes' comes from the filesystem which is about to be
mounted. We cannot trust that the value is always in the range we
expect it to be.
Check its value before using it to calculate the length for the crc32_le
call. It value must be larger (or equal) sumoff + 4.
This fixes a kernel bug when accidentially mounting an image file which
had the nilfs2 magic value 0x3434 at the right offset 0x406 by chance.
The bytes 0x01 0x00 were stored at 0x408 and were interpreted as a
s_bytes value of 1. This caused an underflow when substracting sumoff +
4 (20) in the call to crc32_le.
BUG: unable to handle kernel paging request at
ffff88021e600000
IP: crc32_le+0x36/0x100
...
Call Trace:
nilfs_valid_sb.part.5+0x52/0x60 [nilfs2]
nilfs_load_super_block+0x142/0x300 [nilfs2]
init_nilfs+0x60/0x390 [nilfs2]
nilfs_mount+0x302/0x520 [nilfs2]
mount_fs+0x38/0x160
vfs_kern_mount+0x67/0x110
do_mount+0x269/0xe00
SyS_mount+0x9f/0x100
entry_SYSCALL_64_fastpath+0x16/0x71
Link: http://lkml.kernel.org/r/1466778587-5184-2-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
Tested-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Fri, 24 Jun 2016 21:50:16 +0000 (14:50 -0700)]
oom, suspend: fix oom_reaper vs. oom_killer_disable race
Tetsuo has reported the following potential oom_killer_disable vs.
oom_reaper race:
(1) freeze_processes() starts freezing user space threads.
(2) Somebody (maybe a kenrel thread) calls out_of_memory().
(3) The OOM killer calls mark_oom_victim() on a user space thread
P1 which is already in __refrigerator().
(4) oom_killer_disable() sets oom_killer_disabled = true.
(5) P1 leaves __refrigerator() and enters do_exit().
(6) The OOM reaper calls exit_oom_victim(P1) before P1 can call
exit_oom_victim(P1).
(7) oom_killer_disable() returns while P1 not yet finished
(8) P1 perform IO/interfere with the freezer.
This situation is unfortunate. We cannot move oom_killer_disable after
all the freezable kernel threads are frozen because the oom victim might
depend on some of those kthreads to make a forward progress to exit so
we could deadlock. It is also far from trivial to teach the oom_reaper
to not call exit_oom_victim() because then we would lose a guarantee of
the OOM killer and oom_killer_disable forward progress because
exit_mm->mmput might block and never call exit_oom_victim.
It seems the easiest way forward is to workaround this race by calling
try_to_freeze_tasks again after oom_killer_disable. This will make sure
that all the tasks are frozen or it bails out.
Fixes:
449d777d7ad6 ("mm, oom_reaper: clear TIF_MEMDIE for all tasks queued for oom_reaper")
Link: http://lkml.kernel.org/r/1466597634-16199-1-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Gang He [Fri, 24 Jun 2016 21:50:13 +0000 (14:50 -0700)]
ocfs2: disable BUG assertions in reading blocks
According to some high-load testing, these two BUG assertions were
encountered, this led system panic. Actually, there were some
discussions about removing these two BUG() assertions, it would not
bring any side effect.
Then, I did the the following changes,
1) use the existing macro CATCH_BH_JBD_RACES to wrap BUG() in the
ocfs2_read_blocks_sync function like before.
2) disable the macro CATCH_BH_JBD_RACES in Makefile by default.
Link: http://lkml.kernel.org/r/1466574294-26863-1-git-send-email-ghe@suse.com
Signed-off-by: Gang He <ghe@suse.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Rientjes [Fri, 24 Jun 2016 21:50:10 +0000 (14:50 -0700)]
mm, compaction: abort free scanner if split fails
If the memory compaction free scanner cannot successfully split a free
page (only possible due to per-zone low watermark), terminate the free
scanner rather than continuing to scan memory needlessly. If the
watermark is insufficient for a free page of order <= cc->order, then
terminate the scanner since all future splits will also likely fail.
This prevents the compaction freeing scanner from scanning all memory on
very large zones (very noticeable for zones > 128GB, for instance) when
all splits will likely fail while holding zone->lock.
compaction_alloc() iterating a 128GB zone has been benchmarked to take
over 400ms on some systems whereas any free page isolated and ready to
be split ends up failing in split_free_page() because of the low
watermark check and thus the iteration continues.
The next time compaction occurs, the freeing scanner will likely start
at the end of the zone again since no success was made previously and we
get the same lengthy iteration until the zone is brought above the low
watermark. All thp page faults can take >400ms in such a state without
this fix.
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1606211820350.97086@chino.kir.corp.google.com
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dmitry Vyukov [Fri, 24 Jun 2016 21:50:07 +0000 (14:50 -0700)]
mm: prevent KASAN false positives in kmemleak
When kmemleak dumps contents of leaked objects it reads whole objects
regardless of user-requested size. This upsets KASAN. Disable KASAN
checks around object dump.
Link: http://lkml.kernel.org/r/1466617631-68387-1-git-send-email-dvyukov@google.com
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Gerald Schaefer [Fri, 24 Jun 2016 21:50:04 +0000 (14:50 -0700)]
mm/hugetlb: clear compound_mapcount when freeing gigantic pages
While working on s390 support for gigantic hugepages I ran into the
following "Bad page state" warning when freeing gigantic pages:
BUG: Bad page state in process bash pfn:580001
page:
000003d116000040 count:0 mapcount:0 mapping:
ffffffff00000000 index:0x0
flags: 0x7fffc0000000000()
page dumped because: non-NULL mapping
This is because page->compound_mapcount, which is part of a union with
page->mapping, is initialized with -1 in prep_compound_gigantic_page(),
and not cleared again during destroy_compound_gigantic_page(). Fix this
by clearing the compound_mapcount in destroy_compound_gigantic_page()
before clearing compound_head.
Interestingly enough, the warning will not show up on x86_64, although
this should not be architecture specific. Apparently there is an
endianness issue, combined with the fact that the union contains both a
64 bit ->mapping pointer and a 32 bit atomic_t ->compound_mapcount as
members. The resulting bogus page->mapping on x86_64 therefore contains
00000000ffffffff instead of
ffffffff00000000 on s390, which will falsely
trigger the PageAnon() check in free_pages_prepare() because
page->mapping & PAGE_MAPPING_ANON is true on little-endian architectures
like x86_64 in this case (the page is not compound anymore,
->compound_head was already cleared before). As a result, page->mapping
will be cleared before doing the checks in free_pages_check().
Not sure if the bogus "PageAnon() returning true" on x86_64 for the
first tail page of a gigantic page (at this stage) has other theoretical
implications, but they would also be fixed with this patch.
Link: http://lkml.kernel.org/r/1466612719-5642-1-git-send-email-gerald.schaefer@de.ibm.com
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lukasz Odzioba [Fri, 24 Jun 2016 21:50:01 +0000 (14:50 -0700)]
mm/swap.c: flush lru pvecs on compound page arrival
Currently we can have compound pages held on per cpu pagevecs, which
leads to a lot of memory unavailable for reclaim when needed. In the
systems with hundreads of processors it can be GBs of memory.
On of the way of reproducing the problem is to not call munmap
explicitly on all mapped regions (i.e. after receiving SIGTERM). After
that some pages (with THP enabled also huge pages) may end up on
lru_add_pvec, example below.
void main() {
#pragma omp parallel
{
size_t size = 55 * 1000 * 1000; // smaller than MEM/CPUS
void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS , -1, 0);
if (p != MAP_FAILED)
memset(p, 0, size);
//munmap(p, size); // uncomment to make the problem go away
}
}
When we run it with THP enabled it will leave significant amount of
memory on lru_add_pvec. This memory will be not reclaimed if we hit
OOM, so when we run above program in a loop:
for i in `seq 100`; do ./a.out; done
many processes (95% in my case) will be killed by OOM.
The primary point of the LRU add cache is to save the zone lru_lock
contention with a hope that more pages will belong to the same zone and
so their addition can be batched. The huge page is already a form of
batched addition (it will add 512 worth of memory in one go) so skipping
the batching seems like a safer option when compared to a potential
excess in the caching which can be quite large and much harder to fix
because lru_add_drain_all is way to expensive and it is not really clear
what would be a good moment to call it.
Similarly we can reproduce the problem on lru_deactivate_pvec by adding:
madvise(p, size, MADV_FREE); after memset.
This patch flushes lru pvecs on compound page arrival making the problem
less severe - after applying it kill rate of above example drops to 0%,
due to reducing maximum amount of memory held on pvec from 28MB (with
THP) to 56kB per CPU.
Suggested-by: Michal Hocko <mhocko@suse.com>
Link: http://lkml.kernel.org/r/1466180198-18854-1-git-send-email-lukasz.odzioba@intel.com
Signed-off-by: Lukasz Odzioba <lukasz.odzioba@intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: Ming Li <mingli199x@qq.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tejun Heo [Fri, 24 Jun 2016 21:49:58 +0000 (14:49 -0700)]
memcg: css_alloc should return an ERR_PTR value on error
mem_cgroup_css_alloc() was returning NULL on failure while cgroup core
expected it to return an ERR_PTR value leading to the following NULL
deref after a css allocation failure. Fix it by return
ERR_PTR(-ENOMEM) instead. I'll also update cgroup core so that it
can handle NULL returns.
mkdir: page allocation failure: order:6, mode:0x240c0c0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO)
CPU: 0 PID: 8738 Comm: mkdir Not tainted 4.7.0-rc3+ #123
...
Call Trace:
dump_stack+0x68/0xa1
warn_alloc_failed+0xd6/0x130
__alloc_pages_nodemask+0x4c6/0xf20
alloc_pages_current+0x66/0xe0
alloc_kmem_pages+0x14/0x80
kmalloc_order_trace+0x2a/0x1a0
__kmalloc+0x291/0x310
memcg_update_all_caches+0x6c/0x130
mem_cgroup_css_alloc+0x590/0x610
cgroup_apply_control_enable+0x18b/0x370
cgroup_mkdir+0x1de/0x2e0
kernfs_iop_mkdir+0x55/0x80
vfs_mkdir+0xb9/0x150
SyS_mkdir+0x66/0xd0
do_syscall_64+0x53/0x120
entry_SYSCALL64_slow_path+0x25/0x25
...
BUG: unable to handle kernel NULL pointer dereference at
00000000000000d0
IP: init_and_link_css+0x37/0x220
PGD
34b1e067 PUD
3a109067 PMD 0
Oops: 0002 [#1] SMP
Modules linked in:
CPU: 0 PID: 8738 Comm: mkdir Not tainted 4.7.0-rc3+ #123
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.2-20160422_131301-anatol 04/01/2014
task:
ffff88007cbc5200 ti:
ffff8800666d4000 task.ti:
ffff8800666d4000
RIP: 0010:[<
ffffffff810f2ca7>] [<
ffffffff810f2ca7>] init_and_link_css+0x37/0x220
RSP: 0018:
ffff8800666d7d90 EFLAGS:
00010246
RAX:
0000000000000000 RBX:
0000000000000000 RCX:
0000000000000000
RDX:
ffffffff810f2499 RSI:
0000000000000000 RDI:
0000000000000008
RBP:
ffff8800666d7db8 R08:
0000000000000003 R09:
0000000000000000
R10:
0000000000000001 R11:
0000000000000000 R12:
ffff88005a5fb400
R13:
ffffffff81f0f8a0 R14:
ffff88005a5fb400 R15:
0000000000000010
FS:
00007fc944689700(0000) GS:
ffff88007fc00000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
00007f3aed0d2b80 CR3:
000000003a1e8000 CR4:
00000000000006f0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
Call Trace:
cgroup_apply_control_enable+0x1ac/0x370
cgroup_mkdir+0x1de/0x2e0
kernfs_iop_mkdir+0x55/0x80
vfs_mkdir+0xb9/0x150
SyS_mkdir+0x66/0xd0
do_syscall_64+0x53/0x120
entry_SYSCALL64_slow_path+0x25/0x25
Code: 89 f5 48 89 fb 49 89 d4 48 83 ec 08 8b 05 72 3b d8 00 85 c0 0f 85 60 01 00 00 4c 89 e7 e8 72 f7 ff ff 48 8d 7b 08 48 89 d9 31 c0 <48> c7 83 d0 00 00 00 00 00 00 00 48 83 e7 f8 48 29 f9 81 c1 d8
RIP init_and_link_css+0x37/0x220
RSP <
ffff8800666d7d90>
CR2:
00000000000000d0
---[ end trace
a2d8836ae1e852d1 ]---
Link: http://lkml.kernel.org/r/20160621165740.GJ3262@mtj.duckdns.org
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tejun Heo [Fri, 24 Jun 2016 21:49:54 +0000 (14:49 -0700)]
memcg: mem_cgroup_migrate() may be called with irq disabled
mem_cgroup_migrate() uses local_irq_disable/enable() but can be called
with irq disabled from migrate_page_copy(). This ends up enabling irq
while holding a irq context lock triggering the following lockdep
warning. Fix it by using irq_save/restore instead.
=================================
[ INFO: inconsistent lock state ]
4.7.0-rc1+ #52 Tainted: G W
---------------------------------
inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
kcompactd0/151 [HC0[0]:SC0[0]:HE1:SE1] takes:
(&(&ctx->completion_lock)->rlock){+.?.-.}, at: [<
000000000038fd96>] aio_migratepage+0x156/0x1e8
{IN-SOFTIRQ-W} state was registered at:
__lock_acquire+0x5b6/0x1930
lock_acquire+0xee/0x270
_raw_spin_lock_irqsave+0x66/0xb0
aio_complete+0x98/0x328
dio_complete+0xe4/0x1e0
blk_update_request+0xd4/0x450
scsi_end_request+0x48/0x1c8
scsi_io_completion+0x272/0x698
blk_done_softirq+0xca/0xe8
__do_softirq+0xc8/0x518
irq_exit+0xee/0x110
do_IRQ+0x6a/0x88
io_int_handler+0x11a/0x25c
__mutex_unlock_slowpath+0x144/0x1d8
__mutex_unlock_slowpath+0x140/0x1d8
kernfs_iop_permission+0x64/0x80
__inode_permission+0x9e/0xf0
link_path_walk+0x6e/0x510
path_lookupat+0xc4/0x1a8
filename_lookup+0x9c/0x160
user_path_at_empty+0x5c/0x70
SyS_readlinkat+0x68/0x140
system_call+0xd6/0x270
irq event stamp: 971410
hardirqs last enabled at (971409): migrate_page_move_mapping+0x3ea/0x588
hardirqs last disabled at (971410): _raw_spin_lock_irqsave+0x3c/0xb0
softirqs last enabled at (970526): __do_softirq+0x460/0x518
softirqs last disabled at (970519): irq_exit+0xee/0x110
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&(&ctx->completion_lock)->rlock);
<Interrupt>
lock(&(&ctx->completion_lock)->rlock);
*** DEADLOCK ***
3 locks held by kcompactd0/151:
#0: (&(&mapping->private_lock)->rlock){+.+.-.}, at: aio_migratepage+0x42/0x1e8
#1: (&ctx->ring_lock){+.+.+.}, at: aio_migratepage+0x5a/0x1e8
#2: (&(&ctx->completion_lock)->rlock){+.?.-.}, at: aio_migratepage+0x156/0x1e8
stack backtrace:
CPU: 20 PID: 151 Comm: kcompactd0 Tainted: G W 4.7.0-rc1+ #52
Call Trace:
show_trace+0xea/0xf0
show_stack+0x72/0xf0
dump_stack+0x9a/0xd8
print_usage_bug.part.27+0x2d4/0x2e8
mark_lock+0x17e/0x758
mark_held_locks+0xa2/0xd0
trace_hardirqs_on_caller+0x140/0x1c0
mem_cgroup_migrate+0x266/0x370
aio_migratepage+0x16a/0x1e8
move_to_new_page+0xb0/0x260
migrate_pages+0x8f4/0x9f0
compact_zone+0x4dc/0xdc8
kcompactd_do_work+0x1aa/0x358
kcompactd+0xba/0x2c8
kthread+0x10a/0x110
kernel_thread_starter+0x6/0xc
kernel_thread_starter+0x0/0xc
INFO: lockdep is turned off.
Link: http://lkml.kernel.org/r/20160620184158.GO3262@mtj.duckdns.org
Link: http://lkml.kernel.org/g/5767CFE5.7080904@de.ibm.com
Fixes:
74485cf2bc85 ("mm: migrate: consolidate mem_cgroup_migrate() calls")
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: <stable@vger.kernel.org> [4.5+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>