Linus Torvalds [Mon, 2 May 2016 16:59:57 +0000 (09:59 -0700)]
Merge branch 'for_linus' of git://git./linux/kernel/git/jack/linux-fs
Pull UDF fix from Jan Kara:
"A fix of a regression in UDF that got introduced in 4.6-rc1 by one of
the charset encoding fixes"
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
udf: Fix conversion of 'dstring' fields to UTF8
Linus Torvalds [Mon, 2 May 2016 16:54:22 +0000 (09:54 -0700)]
Merge tag 'gpio-v4.6-4' of git://git./linux/kernel/git/linusw/linux-gpio
Pull GPIO fixes from Linus Walleij:
"Here are some late but important fixes for the v4.6 kernel series.
ACPI and RCAR, so two driver fixes (PM related) and a self-evident
string lookup fix for ACPI GPIOs:
- A serious ACPI fix targeted for stable: lookup strings were being
free'd.
- Revert two patches from the RCAR driver"
* tag 'gpio-v4.6-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpiolib-acpi: Duplicate con_id string when adding it to the crs lookup list
Revert "gpio: rcar: Fine-grained Runtime PM support"
Revert "gpio: rcar: Add Runtime PM handling for interrupts"
Linus Torvalds [Mon, 2 May 2016 16:40:42 +0000 (09:40 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) MODULE_FIRMWARE firmware string not correct for iwlwifi 8000 chips,
from Sara Sharon.
2) Fix SKB size checks in batman-adv stack on receive, from Sven
Eckelmann.
3) Leak fix on mac80211 interface add error paths, from Johannes Berg.
4) Cannot invoke napi_disable() with BH disabled in myri10ge driver,
fix from Stanislaw Gruszka.
5) Fix sign extension problem when computing feature masks in
net_gso_ok(), from Marcelo Ricardo Leitner.
6) lan78xx driver doesn't count packets and packet lengths in its
statistics properly, fix from Woojung Huh.
7) Fix the buffer allocation sizes in pegasus USB driver, from Petko
Manolov.
8) Fix refcount overflows in bpf, from Alexei Starovoitov.
9) Unified dst cache handling introduced a preempt warning in
ip_tunnel, fix by resetting rather then setting the cached route.
From Paolo Abeni.
10) Listener hash collision test fix in soreuseport, from Craig Gallak
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (47 commits)
gre: do not pull header in ICMP error processing
net: Implement net_dbg_ratelimited() for CONFIG_DYNAMIC_DEBUG case
tipc: only process unicast on intended node
cxgb3: fix out of bounds read
net/smscx5xx: use the device tree for mac address
soreuseport: Fix TCP listener hash collision
net: l2tp: fix reversed udp6 checksum flags
ip_tunnel: fix preempt warning in ip tunnel creation/updating
samples/bpf: fix trace_output example
bpf: fix check_map_func_compatibility logic
bpf: fix refcnt overflow
drivers: net: cpsw: use of_phy_connect() in fixed-link case
dt: cpsw: phy-handle, phy_id, and fixed-link are mutually exclusive
drivers: net: cpsw: don't ignore phy-mode if phy-handle is used
drivers: net: cpsw: fix segfault in case of bad phy-handle
drivers: net: cpsw: fix parsing of phy-handle DT property in dual_emac config
MAINTAINERS: net: Change maintainer for GRETH 10/100/1G Ethernet MAC device driver
gre: reject GUE and FOU in collect metadata mode
pegasus: fixes reported packet length
pegasus: fixes URB buffer allocation size;
...
Linus Torvalds [Mon, 2 May 2016 16:32:50 +0000 (09:32 -0700)]
Merge git://git./linux/kernel/git/davem/sparc
Pull sparc fixes from David Miller:
1) Fix panics with SR-IOV, from Babu Moger.
2) Wire up preadv2/pwritev2.
3) Allow proper auto-loading of VIO devices, from John Paul Adrian
Glaubitz.
4) Recognize Sonoma cpus, from Khalid Aziz.
5) Fix bootup regressions caused by syscall trace fixes made recently.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc64: Fix bootup regressions on some Kconfig combinations.
sparc64: recognize and support Sonoma CPU type
sparc: Implement and wire up vio_hotplug for vio.
sparc: Implement and wire up modalias_show for vio.
sparc/pci: Refactor dev_archdata initialization into pci_init_dev_archdata
sparc/defconfigs: Remove CONFIG_IPV6_PRIVACY
sparc: Write up preadv2/pwritev2 syscalls.
sparc/PCI: Fix for panic while enabling SR-IOV
Jiri Benc [Fri, 29 Apr 2016 21:31:32 +0000 (23:31 +0200)]
gre: do not pull header in ICMP error processing
iptunnel_pull_header expects that IP header was already pulled; with this
expectation, it pulls the tunnel header. This is not true in gre_err.
Furthermore, ipv4_update_pmtu and ipv4_redirect expect that skb->data points
to the IP header.
We cannot pull the tunnel header in this path. It's just a matter of not
calling iptunnel_pull_header - we don't need any of its effects.
Fixes:
bda7bb463436 ("gre: Allow multiple protocol listener for gre protocol.")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tim Bingham [Fri, 29 Apr 2016 17:30:23 +0000 (13:30 -0400)]
net: Implement net_dbg_ratelimited() for CONFIG_DYNAMIC_DEBUG case
Prior to commit
d92cff89a0c8 ("net_dbg_ratelimited: turn into no-op
when !DEBUG") the implementation of net_dbg_ratelimited() was buggy
for both the DEBUG and CONFIG_DYNAMIC_DEBUG cases.
The bug was that net_ratelimit() was being called and, despite
returning true, nothing was being printed to the console. This
resulted in messages like the following -
"net_ratelimit: %d callbacks suppressed"
with no other output nearby.
After commit
d92cff89a0c8 ("net_dbg_ratelimited: turn into no-op when
!DEBUG") the bug is fixed for the DEBUG case. However, there's no
output at all for CONFIG_DYNAMIC_DEBUG case.
This patch restores debug output (if enabled) for the
CONFIG_DYNAMIC_DEBUG case.
Add a definition of net_dbg_ratelimited() for the CONFIG_DYNAMIC_DEBUG
case. The implementation takes care to check that dynamic debugging is
enabled before calling net_ratelimit().
Fixes:
d92cff89a0c8 ("net_dbg_ratelimited: turn into no-op when !DEBUG")
Signed-off-by: Tim Bingham <tbingham@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hamish Martin [Fri, 29 Apr 2016 14:40:24 +0000 (10:40 -0400)]
tipc: only process unicast on intended node
We have observed complete lock up of broadcast-link transmission due to
unacknowledged packets never being removed from the 'transmq' queue. This
is traced to nodes having their ack field set beyond the sequence number
of packets that have actually been transmitted to them.
Consider an example where node 1 has sent 10 packets to node 2 on a
link and node 3 has sent 20 packets to node 2 on another link. We
see examples of an ack from node 2 destined for node 3 being treated as
an ack from node 2 at node 1. This leads to the ack on the node 1 to node
2 link being increased to 20 even though we have only sent 10 packets.
When node 1 does get around to sending further packets, none of the
packets with sequence numbers less than 21 are actually removed from the
transmq.
To resolve this we reinstate some code lost in commit
d999297c3dbb ("tipc:
reduce locking scope during packet reception") which ensures that only
messages destined for the receiving node are processed by that node. This
prevents the sequence numbers from getting out of sync and resolves the
packet leakage, thereby resolving the broadcast-link transmission
lock-ups we observed.
While we are aware that this change only patches over a root problem that
we still haven't identified, this is a sanity test that it is always
legitimate to do. It will remain in the code even after we identify and
fix the real problem.
Reviewed-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Reviewed-by: John Thompson <john.thompson@alliedtelesis.co.nz>
Signed-off-by: Hamish Martin <hamish.martin@alliedtelesis.co.nz>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Schmidt [Fri, 29 Apr 2016 09:06:50 +0000 (11:06 +0200)]
cxgb3: fix out of bounds read
An out of bounds read of 2 bytes was discovered in cxgb3 with KASAN.
t3_config_rss() expects both arrays it gets as parameters to have
terminators. setup_rss(), the caller, forgets to add a terminator to
one of the arrays. Thankfully the iteration in t3_config_rss() stops
anyway, but in the last iteration the check for the terminator
is an out of bounds read.
Add the missing terminator to rspq_map[].
Reported-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Fri, 29 Apr 2016 07:05:59 +0000 (09:05 +0200)]
net/smscx5xx: use the device tree for mac address
This takes the MAC address for smsc75xx/smsc95xx USB network devices
from a the device tree. This is required to get a usable persistent
address on the popular beagleboard, whose hardware designers
accidentally forgot that an ethernet device really requires an a
MAC address to be functional.
The Raspberry Pi also ships smsc9514 without a serial EEPROM, stores
the MAC address in ROM accessible via VC4 firmware.
The smsc75xx and smsc95xx drivers are just two copies of the
same code, so better fix both.
[lkundrak@v3.sk: updated to use of_get_property() as per suggestion from
Arnd, reworded the message and comments a bit]
Tested-by: Lubomir Rintel <lkundrak@v3.sk>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Craig Gallek [Thu, 28 Apr 2016 23:24:32 +0000 (19:24 -0400)]
soreuseport: Fix TCP listener hash collision
I forgot to include a check for listener port equality when deciding
if two sockets should belong to the same reuseport group. This was
not caught previously because it's only necessary when two listening
sockets for the same user happen to hash to the same listener bucket.
The same error does not exist in the UDP path.
Fixes:
c125e80b8868("soreuseport: fast reuseport TCP socket selection")
Signed-off-by: Craig Gallek <kraig@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wang Shanker [Thu, 28 Apr 2016 17:29:43 +0000 (01:29 +0800)]
net: l2tp: fix reversed udp6 checksum flags
This patch fixes a bug which causes the behavior of whether to ignore
udp6 checksum of udp6 encapsulated l2tp tunnel contrary to what
userspace program requests.
When the flag `L2TP_ATTR_UDP_ZERO_CSUM6_RX` is set by userspace, it is
expected that udp6 checksums of received packets of the l2tp tunnel
to create should be ignored. In `l2tp_netlink.c`:
`l2tp_nl_cmd_tunnel_create()`, `cfg.udp6_zero_rx_checksums` is set
according to the flag, and then passed to `l2tp_core.c`:
`l2tp_tunnel_create()` and then `l2tp_tunnel_sock_create()`. In
`l2tp_tunnel_sock_create()`, `udp_conf.use_udp6_rx_checksums` is set
the same to `cfg.udp6_zero_rx_checksums`. However, if we want the
checksum to be ignored, `udp_conf.use_udp6_rx_checksums` should be set
to `false`, i.e. be set to the contrary. Similarly, the same should be
done to `udp_conf.use_udp6_tx_checksums`.
Signed-off-by: Miao Wang <shankerwangmiao@gmail.com>
Acked-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sun, 1 May 2016 22:52:31 +0000 (15:52 -0700)]
Linux 4.6-rc6
Linus Torvalds [Sun, 1 May 2016 01:57:42 +0000 (18:57 -0700)]
Merge branch 'fixes' of git://git./linux/kernel/git/evalenti/linux-soc-thermal
Pull thermal fixes from Eduardo Valentin:
"A couple of minor fixes for the thermal subsystem.
Specifics in this pull request:
- Fixes in hisilicon thermal driver
- More fixes of unsigned to int type change in thermal_core.c"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal:
thermal: use %d to print S32 parameters
thermal: hisilicon: increase temperature resolution
Ville Syrjälä [Mon, 25 Apr 2016 13:01:19 +0000 (16:01 +0300)]
gpiolib-acpi: Duplicate con_id string when adding it to the crs lookup list
Calling gpiod_get() from a module and then unloading the module leads to an
oops due to acpi_can_fallback_to_crs() storing the pointer to the passed
'con_id' string onto acpi_crs_lookup_list. The next guy to come along will then
try to access the string but the memory may now be gone with the module.
Make a copy of the passed string instead, and store the copy on the list.
BUG: unable to handle kernel paging request at
ffffffffa03e7855
IP: [<
ffffffff81338322>] strcmp+0x12/0x30
PGD
2a07067 PUD
2a08063 PMD
74720067 PTE 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: i915(+) drm_kms_helper drm intel_gtt snd_hda_codec snd_hda_core i2c_algo_bit syscopya
rea sysfillrect sysimgblt fb_sys_fops agpgart snd_soc_sst_bytcr_rt5640 coretemp hwmon intel_rapl intel_soc_dts_thermal
punit_atom_debug snd_soc_rt5640 snd_soc_rl6231 serio snd_intel_sst_acpi snd_intel_sst_core video snd_soc_sst_mfld_platf
orm snd_soc_sst_match backlight int3402_thermal processor_thermal_device int3403_thermal int3400_thermal acpi_thermal_r
el snd_soc_core intel_soc_dts_iosf int340x_thermal_zone snd_compress i2c_hid hid snd_pcm snd_timer snd soundcore evdev
sch_fq_codel efivarfs ipv6 autofs4 [last unloaded: drm]
CPU: 2 PID: 3064 Comm: modprobe Tainted: G U W 4.6.0-rc3-ffrd-ipvr+ #302
Hardware name: Intel Corp. VALLEYVIEW C0 PLATFORM/BYT-T FFD8, BIOS BLAKFF81.X64.0088.R10.
1403240443 FFD8
_X64_R_2014_13_1_00 03/24/2014
task:
ffff8800701cd200 ti:
ffff880070034000 task.ti:
ffff880070034000
RIP: 0010:[<
ffffffff81338322>] [<
ffffffff81338322>] strcmp+0x12/0x30
RSP: 0000:
ffff880070037748 EFLAGS:
00010286
RAX:
0000000080000000 RBX:
ffff88007a342800 RCX:
0000000000000006
RDX:
0000000000000006 RSI:
ffffffffa054f856 RDI:
ffffffffa03e7856
RBP:
ffff880070037748 R08:
0000000000000000 R09:
0000000000000001
R10:
0000000000000000 R11:
0000000000000000 R12:
ffffffffa054f855
R13:
ffff88007281cae0 R14:
0000000000000010 R15:
ffffffffffffffea
FS:
00007faa51447700(0000) GS:
ffff880079300000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
ffffffffa03e7855 CR3:
0000000041eba000 CR4:
00000000001006e0
Stack:
ffff880070037770 ffffffff8136ad28 ffffffffa054f855 0000000000000000
ffff88007a0a2098 ffff8800700377e8 ffffffff8136852e ffff88007a342800
00000007700377a0 ffff8800700377a0 ffffffff81412442 70672d6c656e6170
Call Trace:
[<
ffffffff8136ad28>] acpi_can_fallback_to_crs+0x88/0x100
[<
ffffffff8136852e>] gpiod_get_index+0x25e/0x310
[<
ffffffff81412442>] ? mipi_dsi_attach+0x22/0x30
[<
ffffffff813685f2>] gpiod_get+0x12/0x20
[<
ffffffffa04fcf41>] intel_dsi_init+0x421/0x480 [i915]
[<
ffffffffa04d3783>] intel_modeset_init+0x853/0x16b0 [i915]
[<
ffffffffa0504864>] ? intel_setup_gmbus+0x214/0x260 [i915]
[<
ffffffffa0510158>] i915_driver_load+0xdc8/0x19b0 [i915]
[<
ffffffff8160fb53>] ? _raw_spin_unlock_irqrestore+0x43/0x70
[<
ffffffffa026b13b>] drm_dev_register+0xab/0xc0 [drm]
[<
ffffffffa026d7b3>] drm_get_pci_dev+0x93/0x1f0 [drm]
[<
ffffffff8160fb53>] ? _raw_spin_unlock_irqrestore+0x43/0x70
[<
ffffffffa043f1f4>] i915_pci_probe+0x34/0x50 [i915]
[<
ffffffff81379751>] pci_device_probe+0x91/0x100
[<
ffffffff8141a75a>] driver_probe_device+0x20a/0x2d0
[<
ffffffff8141a8be>] __driver_attach+0x9e/0xb0
[<
ffffffff8141a820>] ? driver_probe_device+0x2d0/0x2d0
[<
ffffffff81418439>] bus_for_each_dev+0x69/0xa0
[<
ffffffff8141a04e>] driver_attach+0x1e/0x20
[<
ffffffff81419c20>] bus_add_driver+0x1c0/0x240
[<
ffffffff8141b6d0>] driver_register+0x60/0xe0
[<
ffffffff81377d20>] __pci_register_driver+0x60/0x70
[<
ffffffffa026d9f4>] drm_pci_init+0xe4/0x110 [drm]
[<
ffffffff810ce04e>] ? trace_hardirqs_on+0xe/0x10
[<
ffffffffa02f1000>] ? 0xffffffffa02f1000
[<
ffffffffa02f1094>] i915_init+0x94/0x9b [i915]
[<
ffffffff810003bb>] do_one_initcall+0x8b/0x1c0
[<
ffffffff810eb616>] ? rcu_read_lock_sched_held+0x86/0x90
[<
ffffffff811de6d6>] ? kmem_cache_alloc_trace+0x1f6/0x270
[<
ffffffff81183826>] do_init_module+0x60/0x1dc
[<
ffffffff81115a8d>] load_module+0x1d0d/0x2390
[<
ffffffff811120b0>] ? __symbol_put+0x70/0x70
[<
ffffffff811f41b2>] ? kernel_read_file+0x92/0x120
[<
ffffffff811162f4>] SYSC_finit_module+0xa4/0xb0
[<
ffffffff8111631e>] SyS_finit_module+0xe/0x10
[<
ffffffff81001ff3>] do_syscall_64+0x63/0x350
[<
ffffffff816103da>] entry_SYSCALL64_slow_path+0x25/0x25
Code: f7 48 8d 76 01 48 8d 52 01 0f b6 4e ff 84 c9 88 4a ff 75 ed 5d c3 0f 1f 00 55 48 89 e5 eb 04 84 c0
74 18 48 8d 7f 01 48 8d 76 01 <0f> b6 47 ff 3a 46 ff 74 eb 19 c0 83 c8 01 5d c3 31 c0 5d c3 66
RIP [<
ffffffff81338322>] strcmp+0x12/0x30
RSP <
ffff880070037748>
CR2:
ffffffffa03e7855
v2: Make the copied con_id const
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Alexandre Courbot <gnurou@gmail.com>
Cc: stable@vger.kernel.org
Fixes:
10cf4899f8af ("gpiolib: tighten up ACPI legacy gpio lookups")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Linus Torvalds [Sat, 30 Apr 2016 01:50:08 +0000 (18:50 -0700)]
Merge tag 'powerpc-4.6-4' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"A few more powerpc fixes for 4.6:
- cxl: Keep IRQ mappings on context teardown from Michael Neuling
- cxl: Poll for outstanding IRQs when detaching a context from
Michael Neuling
- Wire up preadv2 and pwritev2 syscalls from Rui Salvaterra"
* tag 'powerpc-4.6-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc: wire up preadv2 and pwritev2 syscalls
cxl: Poll for outstanding IRQs when detaching a context
cxl: Keep IRQ mappings on context teardown
Linus Torvalds [Sat, 30 Apr 2016 00:59:26 +0000 (17:59 -0700)]
Merge tag 'edac_fix_for_4.6' of git://git./linux/kernel/git/bp/bp
Pull EDAC fix from Borislav Petkov:
"Make sure sb_edac and i7core_edac do not terminate MCE processing on
the decoding callchain prematurely"
* tag 'edac_fix_for_4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
EDAC: i7core, sb_edac: Don't return NOTIFY_BAD from mce_decoder callback
Linus Torvalds [Sat, 30 Apr 2016 00:39:51 +0000 (17:39 -0700)]
Merge tag 'pm+acpi-4.6-rc6' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"One revert of a recent cpufreq commit that introduced a regression and
a fix for intel_pstate's Turbo Activation Ratio handling code.
Specifics:
- Revert cpufreq commit that attempted to fix a problem in the
ondemand/conservative governor code, but did that incorrectly and
introduced another problem instead (Rafael Wysocki).
- Fix incorrect decoding of MSR contents related to the Turbo
Activation Ratio (TAR) handling in the intel_pstate driver
(Srinivas Pandruvada)"
* tag 'pm+acpi-4.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: intel_pstate: Fix processing for turbo activation ratio
Revert "cpufreq: governor: Fix negative idle_time when configured with CONFIG_HZ_PERIODIC"
Linus Torvalds [Sat, 30 Apr 2016 00:32:19 +0000 (17:32 -0700)]
Merge tag 'mmc-v4.6-rc4' of git://git.linaro.org/people/ulf.hansson/mmc
Pull MMC fixes from Ulf Hansson:
"Here are a two MMC host fixes:
- sdhci-acpi: Reduce Baytrail eMMC/SD/SDIO hangs
- sunxi: Disable eMMC HS-DDR for Allwinner A80"
* tag 'mmc-v4.6-rc4' of git://git.linaro.org/people/ulf.hansson/mmc:
mmc: sunxi: Disable eMMC HS-DDR (MMC_CAP_1_8V_DDR) for Allwinner A80
mmc: sdhci-acpi: Reduce Baytrail eMMC/SD/SDIO hangs
Linus Torvalds [Sat, 30 Apr 2016 00:18:55 +0000 (17:18 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"A few fixes all over the place:
radeon is probably the biggest standout, it's a fix for screen
corruption or hung black outputs so I thought it was worth pulling in.
Otherwise some amdgpu power control fixes, some misc vmwgfx fixes, one
etnaviv fix, one virtio-gpu fix, two DP MST fixes, and a single TTM
fix"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/vmwgfx: Fix order of operation
drm/vmwgfx: use vmw_cmd_dx_cid_check for query commands.
drm/vmwgfx: Enable SVGA_3D_CMD_DX_SET_PREDICATION
drm/amdgpu: disable vm interrupts with vm_fault_stop=2
drm/amdgpu: print a message if ATPX dGPU power control is missing
Revert "drm/amdgpu: disable runtime pm on PX laptops without dGPU power control"
drm/radeon: fix vertical bars appear on monitor (v2)
drm/ttm: fix kref count mess in ttm_bo_move_to_lru_tail
drm/virtio: send vblank event after crtc updates
drm/dp/mst: Restore primary hub guid on resume
drm/dp/mst: Get validated port ref in drm_dp_update_payload_part1()
drm/etnaviv: don't move linear memory window on 3D cores without MC2.0
Linus Torvalds [Sat, 30 Apr 2016 00:07:54 +0000 (17:07 -0700)]
Merge tag 'for-linus' of git://git./linux/kernel/git/dledford/rdma
Pull rdma fixes from Doug Ledford:
"Final set of -rc fixes for 4.6.
I've collected up a number of patches that are all pretty small with
the exception of only a couple. The hfi1 driver has a number of
important patches, and it is what really drives the line count of this
pull request up. These are all small and I've got this kernel built
and running in the test lab (I have most of the hardware, I think nes
is the only thing in this patch set that I can't say I've personally
tested and have up and running).
Summary:
- A number of collected fixes for oopses, memory corruptions,
deadlocks, etc. All of these fixes are small (many only 5-10
lines), obvious, and tested.
- Fix for the security issue related to the use of write for
bi-directional communications"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
RDMA/nes: don't leak skb if carrier down
IB/security: Restrict use of the write() interface
IB/hfi1: Use kernel default llseek for ui device
IB/hfi1: Don't attempt to free resources if initialization failed
IB/hfi1: Fix missing lock/unlock in verbs drain callback
IB/rdmavt: Fix send scheduling
IB/hfi1: Prevent unpinning of wrong pages
IB/hfi1: Fix deadlock caused by locking with wrong scope
IB/hfi1: Prevent NULL pointer deferences in caching code
MAINTAINERS: Update iser/isert maintainer contact info
IB/mlx5: Expose correct max_sge_rd limit
RDMA/iw_cxgb4: Fix bar2 virt addr calculation for T4 chips
iw_cxgb4: handle draining an idle qp
iw_cxgb3: initialize ibdev.iwcm->ifname for port mapping
iw_cxgb4: initialize ibdev.iwcm->ifname for port mapping
IB/core: Don't drain non-existent rq queue-pair
IB/core: Fix oops in ib_cache_gid_set_default_gid
Linus Torvalds [Fri, 29 Apr 2016 18:21:22 +0000 (11:21 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge fixes from Andrew Morton:
"20 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
Documentation/sysctl/vm.txt: update numa_zonelist_order description
lib/stackdepot.c: allow the stack trace hash to be zero
rapidio: fix potential NULL pointer dereference
mm/memory-failure: fix race with compound page split/merge
ocfs2/dlm: return zero if deref_done message is successfully handled
Ananth has moved
kcov: don't profile branches in kcov
kcov: don't trace the code coverage code
mm: wake kcompactd before kswapd's short sleep
.mailmap: add Frank Rowand
mm/hwpoison: fix wrong num_poisoned_pages accounting
mm: call swap_slot_free_notify() with page lock held
mm: vmscan: reclaim highmem zone if buffer_heads is over limit
numa: fix /proc/<pid>/numa_maps for THP
mm/huge_memory: replace VM_NO_THP VM_BUG_ON with actual VMA check
mailmap: fix Krzysztof Kozlowski's misspelled name
thp: keep huge zero page pinned until tlb flush
mm: exclude HugeTLB pages from THP page_mapped() logic
kexec: export OFFSET(page.compound_head) to find out compound tail page
kexec: update VMCOREINFO for compound_order/dtor
Paolo Abeni [Thu, 28 Apr 2016 09:04:51 +0000 (11:04 +0200)]
ip_tunnel: fix preempt warning in ip tunnel creation/updating
After the commit
e09acddf873b ("ip_tunnel: replace dst_cache with generic
implementation"), a preemption debug warning is triggered on ip4
tunnels updating; the dst cache helper needs to be invoked in unpreemptible
context.
We don't need to load the cache on tunnel update, so this commit fixes
the warning replacing the load with a dst cache reset, which is
preempt safe.
Fixes:
e09acddf873b ("ip_tunnel: replace dst_cache with generic implementation")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Luck [Fri, 29 Apr 2016 13:42:25 +0000 (15:42 +0200)]
EDAC: i7core, sb_edac: Don't return NOTIFY_BAD from mce_decoder callback
Both of these drivers can return NOTIFY_BAD, but this terminates
processing other callbacks that were registered later on the chain.
Since the driver did nothing to log the error it seems wrong to prevent
other interested parties from seeing it. E.g. neither of them had even
bothered to check the type of the error to see if it was a memory error
before the return NOTIFY_BAD.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Acked-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/72937355dd92318d2630979666063f8a2853495b.1461864507.git.tony.luck@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Rafael J. Wysocki [Fri, 29 Apr 2016 12:22:25 +0000 (14:22 +0200)]
Merge branch 'pm-cpufreq-fixes'
* pm-cpufreq-fixes:
cpufreq: intel_pstate: Fix processing for turbo activation ratio
Revert "cpufreq: governor: Fix negative idle_time when configured with CONFIG_HZ_PERIODIC"
Dave Airlie [Fri, 29 Apr 2016 04:31:44 +0000 (14:31 +1000)]
Merge branch 'drm-fixes-4.6' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
A few fixes for 4.6.
- revert amdgpu PX commit that was previously reverted on the radeon side
- cleaned up version of the NI+ MC update display fix for radeon
- TTM kref fix
* 'drm-fixes-4.6' of git://people.freedesktop.org/~agd5f/linux:
drm/amdgpu: disable vm interrupts with vm_fault_stop=2
drm/amdgpu: print a message if ATPX dGPU power control is missing
Revert "drm/amdgpu: disable runtime pm on PX laptops without dGPU power control"
drm/radeon: fix vertical bars appear on monitor (v2)
drm/ttm: fix kref count mess in ttm_bo_move_to_lru_tail
Dave Airlie [Fri, 29 Apr 2016 04:27:50 +0000 (14:27 +1000)]
Merge branch 'drm-vmwgfx-fixes' of git://people.freedesktop.org/~syeh/repos_linux into drm-fixes
three misc vmwgfx fixes
* 'drm-vmwgfx-fixes' of git://people.freedesktop.org/~syeh/repos_linux:
drm/vmwgfx: Fix order of operation
drm/vmwgfx: use vmw_cmd_dx_cid_check for query commands.
drm/vmwgfx: Enable SVGA_3D_CMD_DX_SET_PREDICATION
Linus Torvalds [Fri, 29 Apr 2016 03:24:27 +0000 (20:24 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Two boot crash fixes and an IRQ handling crash fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic: Handle zero vector gracefully in clear_vector_irq()
Revert "x86/mm/32: Set NX in __supported_pte_mask before enabling paging"
xen/qspinlock: Don't kick CPU if IRQ is not initialized
Linus Torvalds [Fri, 29 Apr 2016 03:19:04 +0000 (20:19 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"x86 PMU driver fixes plus a core code race fix"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Fix incorrect lbr_sel_mask value
perf/x86/intel/pt: Don't die on VMXON
perf/core: Fix perf_event_open() vs. execve() race
perf/x86/amd: Set the size of event map array to PERF_COUNT_HW_MAX
perf/core: Make sysctl_perf_cpu_time_max_percent conform to documentation
perf/x86/intel/rapl: Add missing Haswell model
perf/x86/intel: Add model number for Skylake Server to perf
Linus Torvalds [Fri, 29 Apr 2016 02:59:17 +0000 (19:59 -0700)]
Merge branch 'locking-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull locking fixes from Ingo Molnar:
"Two lockdep fixes"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
lockdep: Fix lock_chain::base size
locking/lockdep: Fix ->irq_context calculation
Linus Torvalds [Fri, 29 Apr 2016 02:54:50 +0000 (19:54 -0700)]
Merge branch 'efi-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull EFI fix from Ingo Molnar:
"This fixes a bug in the efivars code"
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi: Fix out-of-bounds read in variable_matches()
Linus Torvalds [Fri, 29 Apr 2016 02:44:47 +0000 (19:44 -0700)]
Merge tag 'media/v4.6-4' of git://git./linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
"Some regression fixes:
- videobuf2 core: avoid the risk of going past buffer on multi-planes
and fix rw mode
- fix support for 4K formats at V4L2 core
- fix a trouble at davinci_fpe, caused by a bad patch
- usbvision: revert a patch with a partial fixup. The fixup patch
was merged already, and this one has some issues"
* tag 'media/v4.6-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
[media] vb2-memops: Fix over allocation of frame vectors
[media] media: vb2: Fix regression on poll() for RW mode
[media] v4l2-dv-timings.h: fix polarity for 4k formats
[media] davinci_vpfe: Revert "staging: media: davinci_vpfe: remove,unnecessary ret variable"
[media] usbvision: revert commit
588afcc1
[media] videobuf2-v4l2: Verify planes array in buffer dequeueing
[media] videobuf2-core: Check user space planes array in dqbuf
Linus Torvalds [Fri, 29 Apr 2016 02:38:45 +0000 (19:38 -0700)]
Merge tag 'sound-4.6-rc6' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Usually we get a big collection of fixes for ASoC once during rc. And
this is it.
At this time, most of fixes are about Intel Skylake ASoC driver, which
is a new and still on-going development. Along with it, a slight
large LOC is seen in legacy HD-audio driver, but it's merely a code
move to the upper layer.
Other than that, the rest are small or trivial fixes to various
drivers, in addition to an ASoC dapm debugfs code fix"
* tag 'sound-4.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (24 commits)
ALSA: hda - Update BCLK also at hotplug for i915 HSW/BDW
ALSA: hda - Add dock support for ThinkPad X260
ASoC: wm5102: Free compressed IRQ in CODEC remove
ASoC: arizona: Free speaker thermal IRQs in CODEC remove
ASoC: Intel: Skylake: Fix ibs/obs calc for non-integral sampling rates
ASoC: Intel: sst: fix a loop timeout in sst_hsw_stream_reset()
ASoC: Intel: Skylake: Fix to turn OFF codec power when entering S3
ASoC: hdac_hdmi: Fix codec power state in S3 during playback
ASoC: hdac_hdmi: Fix to use dev_pm ops instead soc pm
ASoC: wm8962: Correct typo when setting DSPCLK rate
ASoC: nau8825: Fix jack detection across suspend
ASoC: Intel: Skylake: Fix DSP resource de-allocation
ASoC: Intel: Skylake: Fix for unloading module only when it is loaded
ASoC: Intel: Skylake: Fix kbuild dependency
ASoC: dapm: Make sure we have a card when displaying component widgets
ASoC: rt5640: Correct the digital interface data select
ASoC: Intel: Skylake: remove call to pci_dev_put
ASoC: Intel: Skylake: Call i915 exit last
ASoC: Intel: Skylake: Unmap the address last
ASoC: Intel: Skylake: Freeup properly on skl_dsp_free
...
Xishi Qiu [Thu, 28 Apr 2016 23:19:11 +0000 (16:19 -0700)]
Documentation/sysctl/vm.txt: update numa_zonelist_order description
Commit
3193913ce62c ("mm: page_alloc: default node-ordering on 64-bit
NUMA, zone-ordering on 32-bit") changes the default value of
numa_zonelist_order. Update the document.
Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexander Potapenko [Thu, 28 Apr 2016 23:19:09 +0000 (16:19 -0700)]
lib/stackdepot.c: allow the stack trace hash to be zero
Do not bail out from depot_save_stack() if the stack trace has zero hash.
Initially depot_save_stack() silently dropped stack traces with zero
hashes, however there's actually no point in reserving this zero value.
Reported-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Alexander Potapenko <glider@google.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vladimir Zapolskiy [Thu, 28 Apr 2016 23:19:06 +0000 (16:19 -0700)]
rapidio: fix potential NULL pointer dereference
The change fixes improper check for a returned error value by
class_create() function, which on error returns ERR_PTR() value, thus the
original check always results in a dead code on error path.
Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Konstantin Khlebnikov [Thu, 28 Apr 2016 23:19:03 +0000 (16:19 -0700)]
mm/memory-failure: fix race with compound page split/merge
get_hwpoison_page() must recheck relation between head and tail pages.
n-horiguchi said: without this recheck, the race causes kernel to pin an
irrelevant page, and finally makes kernel crash for refcount mismatch.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
xuejiufei [Thu, 28 Apr 2016 23:19:01 +0000 (16:19 -0700)]
ocfs2/dlm: return zero if deref_done message is successfully handled
dlm_deref_lockres_done_handler() should return zero if the message is
successfully handled.
Fixes:
60d663cb5273 ("ocfs2/dlm: add DEREF_DONE message").
Signed-off-by: xuejiufei <xuejiufei@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ananth N Mavinakayanahalli [Thu, 28 Apr 2016 23:18:58 +0000 (16:18 -0700)]
Ananth has moved
The current ID is going away soon... update email address
Signed-off-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrey Ryabinin [Thu, 28 Apr 2016 23:18:55 +0000 (16:18 -0700)]
kcov: don't profile branches in kcov
Profiling 'if' statements in __sanitizer_cov_trace_pc() leads to
unbound recursion and crash:
__sanitizer_cov_trace_pc() ->
ftrace_likely_update ->
__sanitizer_cov_trace_pc() ...
Define DISABLE_BRANCH_PROFILING to disable this tracer.
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
James Morse [Thu, 28 Apr 2016 23:18:52 +0000 (16:18 -0700)]
kcov: don't trace the code coverage code
Kcov causes the compiler to add a call to __sanitizer_cov_trace_pc() in
every basic block. Ftrace patches in a call to _mcount() to each
function it has annotated.
Letting these mechanisms annotate each other is a bad thing. Break the
loop by adding 'notrace' to __sanitizer_cov_trace_pc() so that ftrace
won't try to patch this code.
This patch lets arm64 with KCOV and STACK_TRACER boot.
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vlastimil Babka [Thu, 28 Apr 2016 23:18:49 +0000 (16:18 -0700)]
mm: wake kcompactd before kswapd's short sleep
When kswapd goes to sleep it checks if the node is balanced and at first
it sleeps only for HZ/10 time, then rechecks if the node is still
balanced and nobody has woken it during the initial sleep. Only then it
goes fully sleep until an allocation slowpath wakes it up again.
For higher-order allocations, waking up kcompactd is done only before
the full sleep. This turns out to be an issue in case another
high-order allocation fails during the initial sleep. It will wake
kswapd up, however kswapd considers the zone balanced from the order-0
perspective, and will just quickly try to sleep again. So if there's a
longer stream of high-order allocations hitting the slowpath and waking
up kswapd, it might never actually wake up kcompactd, which may be
considered a regression from kswapd-based compaction. In the worst
case, it might be that a single allocation that cannot direct
reclaim/compact itself is waking kswapd in the retry loop and preventing
kcompactd from being woken up and unblocking it.
This patch makes sure kcompactd is woken up in such situations by simply
moving the wakeup before the short initial sleep. More efficient
solution would be to wake kcompactd immediately instead of kswapd if the
node is already order-0 balanced, but in that case we should also move
reset_isolation_suitable() call to kcompactd so it's not adding to the
allocator's latency. Since it's late in the 4.6 cycle, let's go with
the simpler change for now.
Fixes:
accf62422b3a ("mm, kswapd: replace kswapd compaction with waking up kcompactd")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Frank Rowand [Thu, 28 Apr 2016 23:18:47 +0000 (16:18 -0700)]
.mailmap: add Frank Rowand
Set current email address to replace obsolete email addresses.
Signed-off-by: Frank Rowand <frank.rowand@sonymobile.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Minchan Kim [Thu, 28 Apr 2016 23:18:44 +0000 (16:18 -0700)]
mm/hwpoison: fix wrong num_poisoned_pages accounting
Currently, migration code increses num_poisoned_pages on *failed*
migration page as well as successfully migrated one at the trial of
memory-failure. It will make the stat wrong. As well, it marks the
page as PG_HWPoison even if the migration trial failed. It would mean
we cannot recover the corrupted page using memory-failure facility.
This patches fixes it.
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Minchan Kim [Thu, 28 Apr 2016 23:18:41 +0000 (16:18 -0700)]
mm: call swap_slot_free_notify() with page lock held
Kyeongdon reported below error which is BUG_ON(!PageSwapCache(page)) in
page_swap_info. The reason is that page_endio in rw_page unlocks the
page if read I/O is completed so we need to hold a PG_lock again to
check PageSwapCache. Otherwise, the page can be removed from swapcache.
Kernel BUG at
c00f9040 [verbose debug info unavailable]
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
Modules linked in:
CPU: 4 PID: 13446 Comm: RenderThread Tainted: G W
3.10.84-g9f14aec-dirty #73
task:
c3b73200 ti:
dd192000 task.ti:
dd192000
PC is at page_swap_info+0x10/0x2c
LR is at swap_slot_free_notify+0x18/0x6c
pc : [<
c00f9040>] lr : [<
c00f5560>] psr:
400f0113
sp :
dd193d78 ip :
c2deb1e4 fp :
da015180
r10:
00000000 r9 :
000200da r8 :
c120fe08
r7 :
00000000 r6 :
00000000 r5 :
c249a6c0 r4 : =
c249a6c0
r3 :
00000000 r2 :
40080009 r1 :
200f0113 r0 : =
c249a6c0
..<snip> ..
Call Trace:
page_swap_info+0x10/0x2c
swap_slot_free_notify+0x18/0x6c
swap_readpage+0x90/0x11c
read_swap_cache_async+0x134/0x1ac
swapin_readahead+0x70/0xb0
handle_pte_fault+0x320/0x6fc
handle_mm_fault+0xc0/0xf0
do_page_fault+0x11c/0x36c
do_DataAbort+0x34/0x118
Fixes:
3f2b1a04f44933f2 ("zram: revive swap_slot_free_notify")
Signed-off-by: Minchan Kim <minchan@kernel.org>
Tested-by: Kyeongdon Kim <kyeongdon.kim@lge.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Minchan Kim [Thu, 28 Apr 2016 23:18:38 +0000 (16:18 -0700)]
mm: vmscan: reclaim highmem zone if buffer_heads is over limit
We have been reclaimed highmem zone if buffer_heads is over limit but
commit
6b4f7799c6a5 ("mm: vmscan: invoke slab shrinkers from
shrink_zone()") changed the behavior so it doesn't reclaim highmem zone
although buffer_heads is over the limit. This patch restores the logic.
Fixes:
6b4f7799c6a5 ("mm: vmscan: invoke slab shrinkers from shrink_zone()")
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Gerald Schaefer [Thu, 28 Apr 2016 23:18:35 +0000 (16:18 -0700)]
numa: fix /proc/<pid>/numa_maps for THP
In gather_pte_stats() a THP pmd is cast into a pte, which is wrong
because the layouts may differ depending on the architecture. On s390
this will lead to inaccurate numa_maps accounting in /proc because of
misguided pte_present() and pte_dirty() checks on the fake pte.
On other architectures pte_present() and pte_dirty() may work by chance,
but there may be an issue with direct-access (dax) mappings w/o
underlying struct pages when HAVE_PTE_SPECIAL is set and THP is
available. In vm_normal_page() the fake pte will be checked with
pte_special() and because there is no "special" bit in a pmd, this will
always return false and the VM_PFNMAP | VM_MIXEDMAP checking will be
skipped. On dax mappings w/o struct pages, an invalid struct page
pointer would then be returned that can crash the kernel.
This patch fixes the numa_maps THP handling by introducing new "_pmd"
variants of the can_gather_numa_stats() and vm_normal_page() functions.
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org> [4.3+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Konstantin Khlebnikov [Thu, 28 Apr 2016 23:18:32 +0000 (16:18 -0700)]
mm/huge_memory: replace VM_NO_THP VM_BUG_ON with actual VMA check
Khugepaged detects own VMAs by checking vm_file and vm_ops but this way
it cannot distinguish private /dev/zero mappings from other special
mappings like /dev/hpet which has no vm_ops and popultes PTEs in mmap.
This fixes false-positive VM_BUG_ON and prevents installing THP where
they are not expected.
Link: http://lkml.kernel.org/r/CACT4Y+ZmuZMV5CjSFOeXviwQdABAgT7T+StKfTqan9YDtgEi5g@mail.gmail.com
Fixes:
78f11a255749 ("mm: thp: fix /dev/zero MAP_PRIVATE and vm_flags cleanups")
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Krzysztof Kozlowski [Thu, 28 Apr 2016 23:18:30 +0000 (16:18 -0700)]
mailmap: fix Krzysztof Kozlowski's misspelled name
Patchwork introduced a garbled Polish character in commit
1e3012d0fdc5
("crypto: s5p-sss - Use memcpy_toio for iomem annotated memory") so fix
the mail mapping. Additionally prefer to use kernel.org account for
personal work, instead of my gmail address.
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Thu, 28 Apr 2016 23:18:27 +0000 (16:18 -0700)]
thp: keep huge zero page pinned until tlb flush
Andrea has found[1] a race condition on MMU-gather based TLB flush vs
split_huge_page() or shrinker which frees huge zero under us (patch 1/2
and 2/2 respectively).
With new THP refcounting, we don't need patch 1/2: mmu_gather keeps the
page pinned until flush is complete and the pin prevents the page from
being split under us.
We still need patch 2/2. This is simplified version of Andrea's patch.
We don't need fancy encoding.
[1] http://lkml.kernel.org/r/
1447938052-22165-1-git-send-email-aarcange@redhat.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Steve Capper [Thu, 28 Apr 2016 23:18:24 +0000 (16:18 -0700)]
mm: exclude HugeTLB pages from THP page_mapped() logic
HugeTLB pages cannot be split, so we use the compound_mapcount to track
rmaps.
Currently page_mapped() will check the compound_mapcount, but will also
go through the constituent pages of a THP compound page and query the
individual _mapcount's too.
Unfortunately, page_mapped() does not distinguish between HugeTLB and
THP compound pages and assumes that a compound page always needs to have
HPAGE_PMD_NR pages querying.
For most cases when dealing with HugeTLB this is just inefficient, but
for scenarios where the HugeTLB page size is less than the pmd block
size (e.g. when using contiguous bit on ARM) this can lead to crashes.
This patch adjusts the page_mapped function such that we skip the
unnecessary THP reference checks for HugeTLB pages.
Fixes:
e1534ae95004 ("mm: differentiate page_mapped() from page_mapcount() for compound pages")
Signed-off-by: Steve Capper <steve.capper@arm.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Atsushi Kumagai [Thu, 28 Apr 2016 23:18:21 +0000 (16:18 -0700)]
kexec: export OFFSET(page.compound_head) to find out compound tail page
PageAnon() always look at head page to check PAGE_MAPPING_ANON and tail
page's page->mapping has just a poisoned data since commit
1c290f642101
("mm: sanitize page->mapping for tail pages").
If makedumpfile checks page->mapping of a compound tail page to
distinguish anonymous page as usual, it must fail in newer kernel. So
it's necessary to export OFFSET(page.compound_head) to avoid checking
compound tail pages.
The problem is that unnecessary hugepages won't be removed from a dump
file in kernels 4.5.x and later. This means that extra disk space would
be consumed. It's a problem, but not critical.
Signed-off-by: Atsushi Kumagai <ats-kumagai@wm.jp.nec.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Atsushi Kumagai [Thu, 28 Apr 2016 23:18:18 +0000 (16:18 -0700)]
kexec: update VMCOREINFO for compound_order/dtor
makedumpfile refers page.lru.next to get the order of compound pages for
page filtering.
However, now the order is stored in page.compound_order, hence
VMCOREINFO should be updated to export the offset of
page.compound_order.
The fact is, page.compound_order was introduced already in kernel 4.0,
but the offset of it was the same as page.lru.next until kernel 4.3, so
this was not actual problem.
The above can be said also for page.lru.prev and page.compound_dtor,
it's necessary to detect hugetlbfs pages. Further, the content was
changed from direct address to the ID which means dtor.
The problem is that unnecessary hugepages won't be removed from a dump
file in kernels 4.4.x and later. This means that extra disk space would
be consumed. It's a problem, but not critical.
Signed-off-by: Atsushi Kumagai <ats-kumagai@wm.jp.nec.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 29 Apr 2016 01:59:24 +0000 (18:59 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/sage/ceph-client
Pull Ceph fixes from Sage Weil:
"There is a lifecycle fix in the auth code, a fix for a narrow race
condition on map, and a helpful message in the log when there is a
feature mismatch (which happens frequently now that the default
server-side options have changed)"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
rbd: report unsupported features to syslog
rbd: fix rbd map vs notify races
libceph: make authorizer destruction independent of ceph_auth_client
Linus Torvalds [Fri, 29 Apr 2016 01:52:11 +0000 (18:52 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
"Three more bug fixes for 4.6
- Due to a race in the dynamic page table code a multi-threaded
program can cause a translation specification exception. With
panic_on_oops a user space program can crash the system.
- An information leak with the /dev/sclp device.
- A use after free in the s390 PCI code"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/sclp_ctl: fix potential information leak with /dev/sclp
s390/mm: fix asce_bits handling with dynamic pagetable levels
s390/pci: fix use after free in dma_init
Florian Westphal [Sun, 24 Apr 2016 20:18:59 +0000 (22:18 +0200)]
RDMA/nes: don't leak skb if carrier down
Alternatively one could free the skb, OTOH I don't think this test is
useful so just remove it.
Cc: <linux-rdma@vger.kernel.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
David S. Miller [Thu, 28 Apr 2016 21:29:46 +0000 (17:29 -0400)]
Merge branch 'bpf-fixes'
Alexei Starovoitov says:
====================
bpf: fix several bugs
First two patches address bugs found by Jann Horn.
Last patch is a minor samples fix spotted during the testing.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov [Thu, 28 Apr 2016 01:56:22 +0000 (18:56 -0700)]
samples/bpf: fix trace_output example
llvm cannot always recognize memset as builtin function and optimize
it away, so just delete it. It was a leftover from testing
of bpf_perf_event_output() with large data structures.
Fixes:
39111695b1b8 ("samples: bpf: add bpf_perf_event_output example")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov [Thu, 28 Apr 2016 01:56:21 +0000 (18:56 -0700)]
bpf: fix check_map_func_compatibility logic
The commit
35578d798400 ("bpf: Implement function bpf_perf_event_read() that get the selected hardware PMU conuter")
introduced clever way to check bpf_helper<->map_type compatibility.
Later on commit
a43eec304259 ("bpf: introduce bpf_perf_event_output() helper") adjusted
the logic and inadvertently broke it.
Get rid of the clever bool compare and go back to two-way check
from map and from helper perspective.
Fixes:
a43eec304259 ("bpf: introduce bpf_perf_event_output() helper")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov [Thu, 28 Apr 2016 01:56:20 +0000 (18:56 -0700)]
bpf: fix refcnt overflow
On a system with >32Gbyte of phyiscal memory and infinite RLIMIT_MEMLOCK,
the malicious application may overflow 32-bit bpf program refcnt.
It's also possible to overflow map refcnt on 1Tb system.
Impose 32k hard limit which means that the same bpf program or
map cannot be shared by more than 32k processes.
Fixes:
1be7f75d1668 ("bpf: enable non-root eBPF programs")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 28 Apr 2016 21:27:31 +0000 (17:27 -0400)]
Merge branch 'cpsw-phy-handle-fixes'
David Rivshin says:
====================
drivers: net: cpsw: phy-handle fixes
This series fixes a number of related issues around using phy-handle
properties in cpsw emac nodes.
Patch 1 fixes a bug if more than one slave is used, and either
slave uses the phy-handle property in the devicetree.
Patch 2 fixes a NULL pointer dereference which can occur if a
phy-handle property is used and of_phy_connect() return NULL,
such as with a bad devicetree.
Patch 3 fixes an issue where the phy-mode property would be ignored
if a phy-handle property was used. This also fixes a bogus error
message that would be emitted.
Patch 4 fixes makes the binding documentation more explicit that
exactly one PHY property should be used, and also marks phy_id as
deprecated.
Patch 5 cleans up the fixed-link case to work like the now-fixed
phy-handle case.
I have tested on the following hardware configurations:
- (EVMSK) dual emac, phy_id property in both slaves
- (EVMSK) dual emac, phy-handle property in both slaves
- (EVMSK) a bad phy-handle property pointing to &mmc1
- (EVMSK) phy_id property with incorrect PHY address
- (BeagleBoneBlack) single emac, phy_id property
- (custom) single emac, fixed-link subnode
Andrew Goodbody reported testing v2 on a board that doesn't use
dual_emac mode, but with 2 PHYs using phy-handle properties [1].
Nicolas Chauvet reported testing v2 on an HP t410 (dm8148).
Markus Brunner reported testing v1 on the following [2]:
- emac0 with phy_id and emac1 with fixed phy
- emac0 with phy-handle and emac1 with fixed phy
- emac0 with fixed phy and emac1 with fixed phy
[1] https://lkml.org/lkml/2016/4/22/537
[2] http://www.spinics.net/lists/netdev/msg357890.html
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David Rivshin [Thu, 28 Apr 2016 01:45:45 +0000 (21:45 -0400)]
drivers: net: cpsw: use of_phy_connect() in fixed-link case
If a fixed-link DT subnode is used, the phy_device was looked up so
that a PHY ID string could be constructed and passed to phy_connect().
This is not necessary, as the device_node can be passed directly to
of_phy_connect() instead. This reuses the same codepath as if the
phy-handle DT property was used.
Signed-off-by: David Rivshin <drivshin@allworx.com>
Tested-by: Nicolas Chauvet <kwizart@gmail.com>
Tested-by: Andrew Goodbody <andrew.goodbody@cambrionix.com>
Reviewed-by: Mugunthan V N <mugunthanvnm@ti.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Rivshin [Thu, 28 Apr 2016 01:42:47 +0000 (21:42 -0400)]
dt: cpsw: phy-handle, phy_id, and fixed-link are mutually exclusive
The phy-handle, phy_id, and fixed-link properties are mutually exclusive,
and only one need be specified. Make this clear in the binding doc.
Also mark the phy_id property as deprecated, as phy-handle should be
used instead.
Signed-off-by: David Rivshin <drivshin@allworx.com>
Reviewed-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Rivshin [Thu, 28 Apr 2016 01:38:26 +0000 (21:38 -0400)]
drivers: net: cpsw: don't ignore phy-mode if phy-handle is used
The phy-mode emac property was only being processed in the phy_id
or fixed-link cases. However if phy-handle was specified instead,
an error message would complain about the lack of phy_id or
fixed-link, and then jump past the of_get_phy_mode(). This would
result in the PHY mode defaulting to MII, regardless of what the
devicetree specified.
Fixes:
9e42f715264f ("drivers: net: cpsw: add phy-handle parsing")
Signed-off-by: David Rivshin <drivshin@allworx.com>
Tested-by: Nicolas Chauvet <kwizart@gmail.com>
Tested-by: Andrew Goodbody <andrew.goodbody@cambrionix.com>
Reviewed-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Rivshin [Thu, 28 Apr 2016 01:32:31 +0000 (21:32 -0400)]
drivers: net: cpsw: fix segfault in case of bad phy-handle
If an emac node has a phy-handle property that points to something
which is not a phy, then a segmentation fault will occur when the
interface is brought up. This is because while phy_connect() will
return ERR_PTR() on failure, of_phy_connect() will return NULL.
The common error check uses IS_ERR(), and so missed when
of_phy_connect() fails. The NULL pointer is then dereferenced.
Also, the common error message referenced slave->data->phy_id,
which would be empty in the case of phy-handle. Instead, use the
name of the device_node as a useful identifier. And in the phy_id
case add the error code for completeness.
Fixes:
9e42f715264f ("drivers: net: cpsw: add phy-handle parsing")
Signed-off-by: David Rivshin <drivshin@allworx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Rivshin [Thu, 28 Apr 2016 01:25:25 +0000 (21:25 -0400)]
drivers: net: cpsw: fix parsing of phy-handle DT property in dual_emac config
Commit
9e42f715264ff158478fa30eaed847f6e131366b ("drivers: net: cpsw: add
phy-handle parsing") saved the "phy-handle" phandle into a new cpsw_priv
field. However, phy connections are per-slave, so the phy_node field should
be in cpsw_slave_data rather than cpsw_priv.
This would go unnoticed in a single emac configuration. But in dual_emac
mode, the last "phy-handle" property parsed for either slave would be used
by both of them, causing them both to refer to the same phy_device.
Fixes:
9e42f715264f ("drivers: net: cpsw: add phy-handle parsing")
Signed-off-by: David Rivshin <drivshin@allworx.com>
Tested-by: Nicolas Chauvet <kwizart@gmail.com>
Tested-by: Andrew Goodbody <andrew.goodbody@cambrionix.com>
Reviewed-by: Mugunthan V N <mugunthanvnm@ti.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andreas Larsson [Wed, 27 Apr 2016 14:46:10 +0000 (16:46 +0200)]
MAINTAINERS: net: Change maintainer for GRETH 10/100/1G Ethernet MAC device driver
Signed-off-by: Andreas Larsson <andreas@gaisler.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Benc [Wed, 27 Apr 2016 12:08:01 +0000 (14:08 +0200)]
gre: reject GUE and FOU in collect metadata mode
The collect metadata mode does not support GUE nor FOU. This might be
implemented later; until then, we should reject such config.
I think this is okay to be changed. It's unlikely anyone has such
configuration (as it doesn't work anyway) and we may need a way to
distinguish whether it's supported or not by the kernel later.
For backwards compatibility with iproute2, it's not possible to just check
the attribute presence (iproute2 always includes the attribute), the actual
value has to be checked, too.
Fixes:
2e15ea390e6f4 ("ip_gre: Add support to collect tunnel metadata.")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 28 Apr 2016 21:05:25 +0000 (17:05 -0400)]
Merge branch 'pegasus-sizes'
Petko Manolov says:
====================
pegasus: correct buffer & packet sizes
As noticed by Lincoln Ramsay <
a1291762@gmail.com> some old (usb 1.1) Pegasus
based devices may actually return more bytes than the specified in the datasheet
amount. That would not be a problem if the allocated space for the SKB was
equal to the parameter passed to usb_fill_bulk_urb(). Some poor bugger (i
really hope it was not me, but 'git blame' is useless in this case, so anyway)
decided to add '+ 8' to the buffer length parameter. Sometimes the usb transfer
overflows and corrupts the socket structure, leading to kernel panic.
The above doesn't seem to happen for newer (Pegasus2 based) devices which did
help this bug to hide for so long.
The new default is to not include the CRC at the end of each received package.
So far CRC has been ignored which makes no sense to do it in a first place.
The patch is against v4.6-rc5 and was tested on ADM8515 device by transferring
multiple gigabytes of data over a couple of days without any complaints from the
kernel. Please apply it to whatever net tree you deem fit.
Changes since v1:
- split the patch in two parts;
- corrected the subject lines;
Changes since v2:
- do not append CRC by default (based on a discussion with Johannes Berg);
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Petko Manolov [Wed, 27 Apr 2016 11:24:50 +0000 (14:24 +0300)]
pegasus: fixes reported packet length
The default Pegasus setup was to append the status and CRC at the end of each
received packet. The status bits are used to update various stats, but CRC has
been ignored. The new default is to not append CRC at the end of RX packets.
Signed-off-by: Petko Manolov <petkan@mip-labs.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petko Manolov [Wed, 27 Apr 2016 11:24:49 +0000 (14:24 +0300)]
pegasus: fixes URB buffer allocation size;
usb_fill_bulk_urb() receives buffer length parameter 8 bytes larger
than what's allocated by alloc_skb(); This seems to be a problem with
older (pegasus usb-1.1) devices, which may silently return more data
than the maximal packet length.
Reported-by: Lincoln Ramsay <a1291762@gmail.com>
Signed-off-by: Petko Manolov <petkan@mip-labs.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 28 Apr 2016 21:02:45 +0000 (17:02 -0400)]
Merge branch 'gre-lwt-fixes'
Jiri Benc says:
====================
gre: fix lwtunnel support
This patchset fixes a few bugs in ipgre metadata mode implementation.
As an example, in this setup:
ip a a 192.168.1.1/24 dev eth0
ip l a gre1 type gre external
ip l s gre1 up
ip a a 192.168.99.1/24 dev gre1
ip r a 192.168.99.2/32 encap ip dst 192.168.1.2 ttl 10 dev gre1
ping 192.168.99.2
the traffic does not go through before this patchset and does as expected
with it applied.
v3: Back to v1 in order not to break existing users. Dropped patch 3, will
be fixed in iproute2 instead.
v2: Rejecting invalid configuration, added patch 3, dropped patch for
ETH_P_TEB (will target net-next).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Benc [Wed, 27 Apr 2016 09:29:07 +0000 (11:29 +0200)]
gre: build header correctly for collect metadata tunnels
In ipgre (i.e. not gretap) + collect metadata mode, the skb was assumed to
contain Ethernet header and was encapsulated as ETH_P_TEB. This is not the
case, the interface is ARPHRD_IPGRE and the protocol to be used for
encapsulation is skb->protocol.
Fixes:
2e15ea390e6f4 ("ip_gre: Add support to collect tunnel metadata.")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Benc [Wed, 27 Apr 2016 09:29:06 +0000 (11:29 +0200)]
gre: do not assign header_ops in collect metadata mode
In ipgre mode (i.e. not gretap) with collect metadata flag set, the tunnel
is incorrectly assumed to be mGRE in NBMA mode (see commit
6a5f44d7a048c).
This is not the case, we're controlling the encapsulation addresses by
lwtunnel metadata. And anyway, assigning dev->header_ops in collect metadata
mode does not make sense.
Although it would be more user firendly to reject requests that specify
both the collect metadata flag and a remote/local IP address, this would
break current users of gretap or introduce ugly code and differences in
handling ipgre and gretap configuration. Keep the current behavior of
remote/local IP address being ignored in such case.
v3: Back to v1, added explanation paragraph.
v2: Reject configuration specifying both remote/local address and collect
metadata flag.
Fixes:
2e15ea390e6f4 ("ip_gre: Add support to collect tunnel metadata.")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 28 Apr 2016 20:55:26 +0000 (16:55 -0400)]
Merge tag 'mac80211-for-davem-2016-04-27' of git://git./linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
Just a single fix, for a per-CPU memory leak in a
(root user triggerable) error case.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Timur Tabi [Tue, 26 Apr 2016 17:44:18 +0000 (12:44 -0500)]
net: phy: at803x: only the AT8030 needs a hardware reset on link change
Commit
13a56b44 ("at803x: Add support for hardware reset") added a
work-around for a hardware bug on the AT8030. However, the work-around
was being called for all 803x PHYs, even those that don't need it.
Function at803x_link_change_notify() checks to make sure that it only
resets the PHY on the 8030, but it makes more sense to not call that
function at all if it isn't needed.
Signed-off-by: Timur Tabi <timur@codeaurora.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 28 Apr 2016 20:42:40 +0000 (16:42 -0400)]
Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge
Antonio Quartulli says:
====================
In this patchset you can find the following fixes:
1) check skb size to avoid reading beyond its border when delivering
payloads, by Sven Eckelmann
2) initialize last_seen time in neigh_node object to prevent cleanup
routine from accidentally purge it, by Marek Lindner
3) release "recently added" slave interfaces upon virtual/batman
interface shutdown, by Sven Eckelmann
4) properly decrease router object reference counter upon routing table
update, by Sven Eckelmann
5) release queue slots when purging OGM packets of deactivating slave
interface, by Linus Lüssing
Patch 2 and 3 have no "Fixes:" tag because the offending commits date
back to when batman-adv was not yet officially in the net tree.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Christophe Jaillet [Tue, 26 Apr 2016 02:33:43 +0000 (04:33 +0200)]
ps3_gelic: fix memcpy parameter
The size allocated for target->hwinfo and the number of bytes copied in it
should be consistent.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Woojung Huh [Mon, 25 Apr 2016 22:22:36 +0000 (22:22 +0000)]
lan78xx: workaround of forced 100 Full/Half duplex mode error
At forced 100 Full & Half duplex mode, chip may fail to set mode correctly
when cable is switched between long(~50+m) and short one.
As workaround, set to 10 before setting to 100 at forced 100 F/H mode.
Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Woojung Huh [Mon, 25 Apr 2016 22:22:32 +0000 (22:22 +0000)]
lan78xx: fix statistics counter error
Fix rx_bytes, tx_bytes and tx_frames error in netdev.stats.
- rx_bytes counted bytes excluding size of struct ethhdr.
- tx_packets didn't count multiple packets in a single urb
- tx_bytes included 8 bytes of extra commands.
Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Mon, 25 Apr 2016 22:11:22 +0000 (23:11 +0100)]
net: dsa: mv88e6xxx: fix uninitialized error return
The error return err is not initialized and there is a possibility
that err is not assigned causing mv88e6xxx_port_bridge_join to
return a garbage error return status. Fix this by initializing err
to 0.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcelo Ricardo Leitner [Mon, 25 Apr 2016 18:13:17 +0000 (15:13 -0300)]
net: fix net_gso_ok for new GSO types.
Fix casting in net_gso_ok. Otherwise the shift on
gso_type << NETIF_F_GSO_SHIFT may hit the 32th bit and make it look like
a INT_MIN, which is then promoted from signed to uint64 which is
0xffffffff80000000, resulting in wrong behavior when it is and'ed with
the feature itself, as in:
This test app:
#include <stdio.h>
#include <stdint.h>
int main(int argc, char **argv)
{
uint64_t feature1;
uint64_t feature2;
int gso_type = 1 << 15;
feature1 = gso_type << 16;
feature2 = (uint64_t)gso_type << 16;
printf("%lx %lx\n", feature1, feature2);
return 0;
}
Gives:
ffffffff80000000 80000000
So that this:
return (features & feature) == feature;
Actually works on more bits than expected and invalid ones.
Fix is to promote it earlier.
Issue noted while rebasing SCTP GSO patch but posting separetely as
someone else may experience this meanwhile.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Neil Armstrong [Mon, 25 Apr 2016 17:41:38 +0000 (19:41 +0200)]
net: ethernet: davinci_emac: Fix devioctl while in fixed link
When configured in fixed link, the DaVinci emac driver sets the
priv->phydev to NULL and further ioctl calls to the phy_mii_ioctl()
causes the kernel to crash.
Cc: Brian Hutchinson <b.hutchman@gmail.com>
Fixes:
1bb6aa56bb38 ("net: davinci_emac: Add support for fixed-link PHY")
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Reviewed-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 28 Apr 2016 18:23:27 +0000 (14:23 -0400)]
Merge tag 'wireless-drivers-for-davem-2016-04-25' of git://git./linux/kernel/git/kvalo/wireless-drivers
Kalle Valo says:
====================
wireless-drivers fixes for 4.6
ath9k
* fix a couple release old throughput regression on ar9281
iwlwifi
* add new device IDs for 8265
* fix a NULL pointer dereference when paging firmware asserts
* remove a WARNING on gscan capabilities
* fix MODULE_FIRMWARE for 8260
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Bert Kenward [Mon, 25 Apr 2016 16:42:12 +0000 (17:42 +0100)]
MAINTAINERS: net: update sfc maintainers
Add myself and Edward Cree as maintainers.
Remove Shradha Shah, who is on extended leave.
Cc: David S. Miller <davem@davemloft.net>
Cc: Edward Cree <ecree@solarflare.com>
Cc: Shradha Shah <sshah@solarflare.com>
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Cooper [Mon, 25 Apr 2016 15:51:00 +0000 (16:51 +0100)]
sfc: disable RSS when unsupported
When certain firmware variants are selected (via the sfboot utility) the
SFC7000 and SFC8000 series NICs don't support RSS. The driver still
tries (and fails) to insert filters with the RSS flag, and the NIC fails
to pass traffic.
When the firmware reports RSS_LIMITED suppress allocating a default RSS
context. The absence of an RSS context is picked up in filter insertion
and RSS flags are discarded.
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stanislaw Gruszka [Mon, 25 Apr 2016 08:59:19 +0000 (10:59 +0200)]
myri10ge: fix sleeping with bh disabled
napi_disable() can not be called with bh disabled, move locking just
around myri10ge_ss_lock_napi() .
Patches fixes following bug:
[ 114.278378] BUG: sleeping function called from invalid context at net/core/dev.c:4383
<snip>
[ 114.313712] Call Trace:
[ 114.314943] [<
ffffffff817010ce>] dump_stack+0x19/0x1b
[ 114.317673] [<
ffffffff810ce7f3>] __might_sleep+0x173/0x230
[ 114.320566] [<
ffffffff815b3117>] napi_disable+0x27/0x90
[ 114.323254] [<
ffffffffa01e437f>] myri10ge_close+0xbf/0x3f0 [myri10ge]
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Hyong-Youb Kim <hykim@myri.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Engestrom [Mon, 25 Apr 2016 06:36:56 +0000 (07:36 +0100)]
Documentation: networking: fix spelling mistakes
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sinclair Yeh [Thu, 21 Apr 2016 18:29:31 +0000 (11:29 -0700)]
drm/vmwgfx: Fix order of operation
mode->hdisplay * (var->bits_per_pixel + 7) gets evaluated before
the division, potentially making the pitch larger than it should
be.
Since the original intention is to do a div-round-up, just use
the macro instead.
Signed-off-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Charmaine Lee [Tue, 12 Apr 2016 15:19:08 +0000 (08:19 -0700)]
drm/vmwgfx: use vmw_cmd_dx_cid_check for query commands.
Instead of calling vmw_cmd_ok, call vmw_cmd_dx_cid_check to
validate the context id for query commands.
Signed-off-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Charmaine Lee [Tue, 12 Apr 2016 15:14:23 +0000 (08:14 -0700)]
drm/vmwgfx: Enable SVGA_3D_CMD_DX_SET_PREDICATION
Fixes piglit tests nv_conditional_render-* crashes.
Signed-off-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Jason Gunthorpe [Mon, 11 Apr 2016 01:13:13 +0000 (19:13 -0600)]
IB/security: Restrict use of the write() interface
The drivers/infiniband stack uses write() as a replacement for
bi-directional ioctl(). This is not safe. There are ways to
trigger write calls that result in the return structure that
is normally written to user space being shunted off to user
specified kernel memory instead.
For the immediate repair, detect and deny suspicious accesses to
the write API.
For long term, update the user space libraries and the kernel API
to something that doesn't present the same security vulnerabilities
(likely a structured ioctl() interface).
The impacted uAPI interfaces are generally only available if
hardware from drivers/infiniband is installed in the system.
Reported-by: Jann Horn <jann@thejh.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
[ Expanded check to all known write() entry points ]
Cc: stable@vger.kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Fri, 22 Apr 2016 18:17:03 +0000 (11:17 -0700)]
IB/hfi1: Use kernel default llseek for ui device
The ui device llseek had a mistake with SEEK_END and did
not fully follow seek semantics. Correct all this by
using a kernel supplied function for fixed size devices.
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mitko Haralanov [Wed, 20 Apr 2016 13:05:36 +0000 (06:05 -0700)]
IB/hfi1: Don't attempt to free resources if initialization failed
Attempting to free resources which have not been allocated and
initialized properly led to the following kernel backtrace:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<
ffffffffa09658fe>] unlock_exp_tids.isra.8+0x2e/0x120 [hfi1]
PGD
852a43067 PUD
85d4a6067 PMD 0
Oops: 0000 [#1] SMP
CPU: 0 PID: 2831 Comm: osu_bw Tainted: G IO 3.12.18-wfr+ #1
task:
ffff88085b15b540 ti:
ffff8808588fe000 task.ti:
ffff8808588fe000
RIP: 0010:[<
ffffffffa09658fe>] [<
ffffffffa09658fe>] unlock_exp_tids.isra.8+0x2e/0x120 [hfi1]
RSP: 0018:
ffff8808588ffde0 EFLAGS:
00010282
RAX:
0000000000000000 RBX:
ffff880858a31800 RCX:
0000000000000000
RDX:
ffff88085d971bc0 RSI:
ffff880858a318f8 RDI:
ffff880858a318c0
RBP:
ffff8808588ffe20 R08:
0000000000000000 R09:
0000000000000000
R10:
ffff88087ffd6f40 R11:
0000000001100348 R12:
ffff880852900000
R13:
ffff880858a318c0 R14:
0000000000000000 R15:
ffff88085d971be8
FS:
00007f4674e83740(0000) GS:
ffff88087f400000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
0000000000000000 CR3:
000000085c377000 CR4:
00000000001407f0
Stack:
ffffffffa0941a71 ffff880858a318f8 ffff88085d971bc0 ffff880858a31800
ffff880852900000 ffff880858a31800 00000000003ffff7 ffff88085d971bc0
ffff8808588ffe60 ffffffffa09663fc ffff8808588ffe60 ffff880858a31800
Call Trace:
[<
ffffffffa0941a71>] ? find_mmu_handler+0x51/0x70 [hfi1]
[<
ffffffffa09663fc>] hfi1_user_exp_rcv_free+0x6c/0x120 [hfi1]
[<
ffffffffa0932809>] hfi1_file_close+0x1a9/0x340 [hfi1]
[<
ffffffff8116c189>] __fput+0xe9/0x270
[<
ffffffff8116c35e>] ____fput+0xe/0x10
[<
ffffffff81065707>] task_work_run+0xa7/0xe0
[<
ffffffff81002969>] do_notify_resume+0x59/0x80
[<
ffffffff814ffc1a>] int_signal+0x12/0x17
This commit re-arranges the context initialization code in a way that
would allow for context event flags to be used to determine whether
the context has been successfully initialized.
In turn, this can be used to skip the resource de-allocation if they
were never allocated in the first place.
Fixes:
3abb33ac6521 ("staging/hfi1: Add TID cache receive init and free funcs")
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com.
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mike Marciniszyn [Wed, 20 Apr 2016 13:05:30 +0000 (06:05 -0700)]
IB/hfi1: Fix missing lock/unlock in verbs drain callback
The iowait_sdma_drained() callback lacked locking to
protect the qp s_flags field.
This causes the s_flags to be out of sync
on multiple CPUs, potentially corrupting the s_flags.
Fixes:
a545f5308b6c ("staging/rdma/hfi: fix CQ completion order issue")
Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jubin John [Tue, 12 Apr 2016 17:47:00 +0000 (10:47 -0700)]
IB/rdmavt: Fix send scheduling
call_send is used to determine whether to send immediately or schedule
a send for later. The current logic in rdmavt is inverted and has a
negative impact on the latency of the hfi1 and qib drivers. Fix this
regression by correctly calling send immediately when call_send is set.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mitko Haralanov [Tue, 12 Apr 2016 17:46:16 +0000 (10:46 -0700)]
IB/hfi1: Prevent unpinning of wrong pages
The routine used by the SDMA cache to handle already
cached nodes can extend an already existing node.
In its error handling code, the routine will unpin pages
when not all pages of the buffer extension were pinned.
There was a bug in that part of the routine, which would
mistakenly unpin pages from the original set rather than
the newly pinned pages.
This commit fixes that bug by offsetting the page array
to the proper place pointing at the beginning of the newly
pinned pages.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mitko Haralanov [Tue, 12 Apr 2016 17:46:03 +0000 (10:46 -0700)]
IB/hfi1: Fix deadlock caused by locking with wrong scope
The locking around the interval RB tree is designed to prevent
access to the tree while it's being modified. The locking in its
current form is too overzealous, which is causing a deadlock in
certain cases with the following backtrace:
Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
CPU: 0 PID: 5836 Comm: IMB-MPI1 Tainted: G O 3.12.18-wfr+ #1
0000000000000000 ffff88087f206c50 ffffffff814f1caa ffffffff817b53f0
ffff88087f206cc8 ffffffff814ecd56 0000000000000010 ffff88087f206cd8
ffff88087f206c78 0000000000000000 0000000000000000 0000000000001662
Call Trace:
<NMI> [<
ffffffff814f1caa>] dump_stack+0x45/0x56
[<
ffffffff814ecd56>] panic+0xc2/0x1cb
[<
ffffffff810d4370>] ? restart_watchdog_hrtimer+0x50/0x50
[<
ffffffff810d4432>] watchdog_overflow_callback+0xc2/0xd0
[<
ffffffff81109b4e>] __perf_event_overflow+0x8e/0x2b0
[<
ffffffff8110a714>] perf_event_overflow+0x14/0x20
[<
ffffffff8101c906>] intel_pmu_handle_irq+0x1b6/0x390
[<
ffffffff814f927b>] perf_event_nmi_handler+0x2b/0x50
[<
ffffffff814f8ad8>] nmi_handle.isra.3+0x88/0x180
[<
ffffffff814f8d39>] do_nmi+0x169/0x310
[<
ffffffff814f8177>] end_repeat_nmi+0x1e/0x2e
[<
ffffffff81272600>] ? unmap_single+0x30/0x30
[<
ffffffff814f780d>] ? _raw_spin_lock_irqsave+0x2d/0x40
[<
ffffffff814f780d>] ? _raw_spin_lock_irqsave+0x2d/0x40
[<
ffffffff814f780d>] ? _raw_spin_lock_irqsave+0x2d/0x40
<<EOE>> <IRQ> [<
ffffffffa056c4a8>] hfi1_mmu_rb_search+0x38/0x70 [hfi1]
[<
ffffffffa05919cb>] user_sdma_free_request+0xcb/0x120 [hfi1]
[<
ffffffffa0593393>] user_sdma_txreq_cb+0x263/0x350 [hfi1]
[<
ffffffffa057fad7>] ? sdma_txclean+0x27/0x1c0 [hfi1]
[<
ffffffffa0593130>] ? user_sdma_send_pkts+0x1710/0x1710 [hfi1]
[<
ffffffffa057fdd6>] sdma_make_progress+0x166/0x480 [hfi1]
[<
ffffffff810762c9>] ? ttwu_do_wakeup+0x19/0xd0
[<
ffffffffa0581c7e>] sdma_engine_interrupt+0x8e/0x100 [hfi1]
[<
ffffffffa0546bdd>] sdma_interrupt+0x5d/0xa0 [hfi1]
[<
ffffffff81097e57>] handle_irq_event_percpu+0x47/0x1d0
[<
ffffffff81098017>] handle_irq_event+0x37/0x60
[<
ffffffff8109aa5f>] handle_edge_irq+0x6f/0x120
[<
ffffffff810044af>] handle_irq+0xbf/0x150
[<
ffffffff8104c9b7>] ? irq_enter+0x17/0x80
[<
ffffffff8150168d>] do_IRQ+0x4d/0xc0
[<
ffffffff814f7c6a>] common_interrupt+0x6a/0x6a
<EOI> [<
ffffffff81073524>] ? finish_task_switch+0x54/0xe0
[<
ffffffff814f56c6>] __schedule+0x3b6/0x7e0
[<
ffffffff810763a6>] __cond_resched+0x26/0x30
[<
ffffffff814f5eda>] _cond_resched+0x3a/0x50
[<
ffffffff814f4f82>] down_write+0x12/0x30
[<
ffffffffa0591619>] hfi1_release_user_pages+0x69/0x90 [hfi1]
[<
ffffffffa059173a>] sdma_rb_remove+0x9a/0xc0 [hfi1]
[<
ffffffffa056c00d>] __mmu_rb_remove.isra.5+0x5d/0x70 [hfi1]
[<
ffffffffa056c536>] hfi1_mmu_rb_remove+0x56/0x70 [hfi1]
[<
ffffffffa059427b>] hfi1_user_sdma_process_request+0x74b/0x1160 [hfi1]
[<
ffffffffa055c763>] hfi1_aio_write+0xc3/0x100 [hfi1]
[<
ffffffff8116a14c>] do_sync_readv_writev+0x4c/0x80
[<
ffffffff8116b58b>] do_readv_writev+0xbb/0x230
[<
ffffffff811a9da1>] ? fsnotify+0x241/0x320
[<
ffffffff81073524>] ? finish_task_switch+0x54/0xe0
[<
ffffffff8116b795>] vfs_writev+0x35/0x60
[<
ffffffff8116b8c9>] SyS_writev+0x49/0xc0
[<
ffffffff810cd876>] ? __audit_syscall_exit+0x1f6/0x2a0
[<
ffffffff814ff992>] system_call_fastpath+0x16/0x1b
As evident from the backtrace above, the process was being put to sleep
while holding the lock.
Limiting the scope of the lock only to the RB tree operation fixes the
above error allowing for proper locking and the process being put to
sleep when needed.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mitko Haralanov [Tue, 12 Apr 2016 17:45:57 +0000 (10:45 -0700)]
IB/hfi1: Prevent NULL pointer deferences in caching code
There is a potential kernel crash when the MMU notifier calls the
invalidation routines in the hfi1 pinned page caching code for sdma.
The invalidation routine could call the remove callback
for the node, which in turn ends up dereferencing the
current task_struct to get a pointer to the mm_struct.
However, the mm_struct pointer could be NULL resulting in
the following backtrace:
BUG: unable to handle kernel NULL pointer dereference at
00000000000000a8
IP: [<
ffffffffa041f75a>] sdma_rb_remove+0xaa/0x100 [hfi1]
15
task:
ffff88085e66e080 ti:
ffff88085c244000 task.ti:
ffff88085c244000
RIP: 0010:[<
ffffffffa041f75a>] [<
ffffffffa041f75a>] sdma_rb_remove+0xaa/0x100 [hfi1]
RSP: 0000:
ffff88085c245878 EFLAGS:
00010002
RAX:
0000000000000000 RBX:
ffff88105b9bbd40 RCX:
ffffea003931a830
RDX:
0000000000000004 RSI:
ffff88105754a9c0 RDI:
ffff88105754a9c0
RBP:
ffff88085c245890 R08:
ffff88105b9bbd70 R09:
00000000fffffffb
R10:
ffff88105b9bbd58 R11:
0000000000000013 R12:
ffff88105754a9c0
R13:
0000000000000001 R14:
0000000000000001 R15:
ffff88105b9bbd40
FS:
0000000000000000(0000) GS:
ffff88107ef40000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
00000000000000a8 CR3:
0000000001a0b000 CR4:
00000000001407e0
Stack:
ffff88105b9bbd40 ffff88080ec481a8 ffff88080ec481b8 ffff88085c2458c0
ffffffffa03fa00e ffff88080ec48190 ffff88080ed9cd00 0000000001024000
0000000000000000 ffff88085c245920 ffffffffa03fa0e7 0000000000000282
Call Trace:
[<
ffffffffa03fa00e>] __mmu_rb_remove.isra.5+0x5e/0x70 [hfi1]
[<
ffffffffa03fa0e7>] mmu_notifier_mem_invalidate+0xc7/0xf0 [hfi1]
[<
ffffffffa03fa143>] mmu_notifier_page+0x13/0x20 [hfi1]
[<
ffffffff81156dd0>] __mmu_notifier_invalidate_page+0x50/0x70
[<
ffffffff81140bbb>] try_to_unmap_one+0x20b/0x470
[<
ffffffff81141ee7>] try_to_unmap_anon+0xa7/0x120
[<
ffffffff81141fad>] try_to_unmap+0x4d/0x60
[<
ffffffff8111fd7b>] shrink_page_list+0x2eb/0x9d0
[<
ffffffff81120ab3>] shrink_inactive_list+0x243/0x490
[<
ffffffff81121491>] shrink_lruvec+0x4c1/0x640
[<
ffffffff81121641>] shrink_zone+0x31/0x100
[<
ffffffff81121b0f>] kswapd_shrink_zone.constprop.62+0xef/0x1c0
[<
ffffffff811229e3>] kswapd+0x403/0x7e0
[<
ffffffff811225e0>] ? shrink_all_memory+0xf0/0xf0
[<
ffffffff81068ac0>] kthread+0xc0/0xd0
[<
ffffffff81068a00>] ? insert_kthread_work+0x40/0x40
[<
ffffffff814ff8ec>] ret_from_fork+0x7c/0xb0
[<
ffffffff81068a00>] ? insert_kthread_work+0x40/0x40
To correct this, the mm_struct passed to us by the MMU notifier is
used (which is what should have been done to begin with). This avoids
the broken derefences and ensures that the correct mm_struct is used.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Sun, 3 Apr 2016 12:03:12 +0000 (15:03 +0300)]
MAINTAINERS: Update iser/isert maintainer contact info
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Acked-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 31 Mar 2016 16:03:25 +0000 (19:03 +0300)]
IB/mlx5: Expose correct max_sge_rd limit
mlx5 devices (Connect-IB, ConnectX-4, ConnectX-4-LX) has a limitation
where rdma read work queue entries cannot exceed 512 bytes.
A rdma_read wqe needs to fit in 512 bytes:
- wqe control segment (16 bytes)
- rdma segment (16 bytes)
- scatter elements (16 bytes each)
So max_sge_rd should be: (512 - 16 - 16) / 16 = 30.
Cc: linux-stable@vger.kernel.org
Reported-by: Christoph Hellwig <hch@lst.de>
Tested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagig@grimberg.me>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>