Rongjun Chen [Wed, 29 Mar 2017 07:30:35 +0000 (15:30 +0800)]
wifi/bt: initial add wifi and bt driver
PD#138714: initial add wifi and bt
Change-Id: I96b3731297860c77a5a125573b98be7e54950cb1
Signed-off-by: Rongjun Chen <rongjun.chen@amlogic.com>
Jianxin Pan [Fri, 31 Mar 2017 11:10:59 +0000 (19:10 +0800)]
defconfig: meson64: defconfig for no video device
PD#138714: meson64_audio_defconfig
Change-Id: I6af9bae918ab684c5353c0331eb3c85f9b3b003e
Signed-off-by: Jianxin Pan <jianxin.pan@amlogic.com>
Yan Wang [Fri, 31 Mar 2017 08:00:59 +0000 (16:00 +0800)]
meson32: fix m8b cpu hotplug crash issue
PD#141217: fix m8b cpu hotplug crash issue
Change-Id: I7a80785a37a9ed4ab05f25cdc5ba43deba83d74c
Signed-off-by: Yan Wang <yan.wang@amlogic.com>
Bo Yang [Fri, 31 Mar 2017 13:08:16 +0000 (21:08 +0800)]
led: add system gpio led driver support
PD#138714: led: add system gpio led driver support
add new configuration:
+CONFIG_AMLOGIC_LED=y
+CONFIG_AMLOGIC_LED_SYS=y
Change-Id: I34c3740eaf9efb02667d9e3d7e95ef8570e2c63c
Signed-off-by: Bo Yang <bo.yang@amlogic.com>
Bo Yang [Fri, 31 Mar 2017 09:44:40 +0000 (17:44 +0800)]
watchdog: initial add meson watchdog driver
PD#138714: watchdog: initial add meson watchdog driver
add two configurations
+CONFIG_AMLOGIC_WDT=y
+CONFIG_AMLOGIC_WDT_MESON=y
Change-Id: If48e1cb61ec6f51e4fa903a83f67cdeffdd5a5b4
Signed-off-by: Bo Yang <bo.yang@amlogic.com>
Sunny Luo [Fri, 31 Mar 2017 13:14:34 +0000 (21:14 +0800)]
spicc: initial add spicc driver
PD#138714: spicc: initial add spicc driver
Change-Id: I60aa176f7bd9d64bd6e9db56adc7f592bc856f50
Signed-off-by: Sunny Luo <sunny.luo@amlogic.com>
Guosong Zhou [Thu, 30 Mar 2017 06:56:39 +0000 (14:56 +0800)]
amlvideo: initial add the driver
PD#138714: initial add the driver
1.Add amlogic amlvideo driver;
2.related Makefiles/Kconfig/Headfiles update;
Change-Id: If506455a1611af9a940112a8b37ab8c63b6a37d8
Signed-off-by: Guosong Zhou <guosong.zhou@amlogic.com>
Bo Yang [Fri, 31 Mar 2017 12:08:17 +0000 (20:08 +0800)]
jtag: add jtag setup driver support
PD#138714: jtag: add jtag setup driver support
add two configuration fot jtag support:
+CONFIG_AMLOGIC_JTAG=y
+CONFIG_AMLOGIC_JTAG_MESON=y
Change-Id: Ic0bdd336b1ec8ca31359f1a6ab0b0e305a8e37d1
Signed-off-by: Bo Yang <bo.yang@amlogic.com>
Xing Wang [Thu, 30 Mar 2017 07:50:13 +0000 (15:50 +0800)]
audio: pcm capability for channel and sample rate info
PD#138714: pcm capabilit for master mode
pcm supports 1~16channel, 8k~192k, S16_LE/S24_LE/S32_LE
Change-Id: Idc61667a4e374fa156f936277f74c7f19801b3d3
Signed-off-by: Xing Wang <xing.wang@amlogic.com>
Jianxin Pan [Thu, 30 Mar 2017 08:06:48 +0000 (01:06 -0700)]
Merge "cec: fix system server crash for long time waiting" into amlogic-4.9-dev
Tao Zeng [Thu, 30 Mar 2017 07:56:28 +0000 (15:56 +0800)]
cec: fix system server crash for long time waiting
PD#138714: cec: fix system server crash in cec
Avoid too long check for physical address valid
Change-Id: Ia012c0f2b9901c39d0d71fd326df2828c2a2e991
Signed-off-by: Tao Zeng <tao.zeng@amlogic.com>
Qiufang Dai [Thu, 16 Mar 2017 11:02:23 +0000 (19:02 +0800)]
PM / sleep: GX family PM driver inital import
PD#138714: GX family PM driver inital import
Enable cpu idle.
Change-Id: Ifd2d80dbef835926bdadc95d75a0cd4e3b388eb2
Signed-off-by: Qiufang Dai <qiufang.dai@amlogic.com>
Signed-off-by: Yan Wang <yan.wang@amlogic.com>
Xing Wang [Wed, 29 Mar 2017 04:24:50 +0000 (12:24 +0800)]
audio: add audio dsp
PD#138714: add audio dsp driver
Change-Id: I289771eed7b5ec4ce6c8f657f32593461cc61770
Signed-off-by: Xing Wang <xing.wang@amlogic.com>
Jianxin Pan [Wed, 29 Mar 2017 03:20:27 +0000 (11:20 +0800)]
clk: fix gxl clock not enabled issue
PD#138714: gxl clock is not enabled after
d3e988b45
Change-Id: I29867355feac23161f3078570973ea566671d11f
Signed-off-by: Jianxin Pan <jianxin.pan@amlogic.com>
Yue Wang [Tue, 28 Mar 2017 03:11:55 +0000 (11:11 +0800)]
usb: enable meson8b usb driver
PD#141217: usb: enable meson8b usb driver
Change-Id: I4c7e99b83322da8a691293f4cee45a91d7e0b38e
Signed-off-by: Yue Wang <yue.wang@amlogic.com>
Guosong Zhou [Fri, 24 Mar 2017 12:05:04 +0000 (20:05 +0800)]
ppmgr: initial add the driver
PD#138714: initial add the driver
1.add amlogic ppmgr driver;
2.device tree support ppmgr for p212/q200/skt/p400/p401;
3.related Makefiles/Kconfig/Headfiles update;
Change-Id: I03f6b17090b854c6552156a2d6f35895fe8ecd3b
Signed-off-by: Guosong Zhou <guosong.zhou@amlogic.com>
Xing Wang [Mon, 27 Mar 2017 11:53:59 +0000 (19:53 +0800)]
audio: add sound card
PD#138714: add sound card driver
Change-Id: I8c5cea4c62507976ab28ad76f6a3e42a9472cea4
Signed-off-by: Xing Wang <xing.wang@amlogic.com>
Yun Cai [Thu, 23 Mar 2017 11:11:07 +0000 (19:11 +0800)]
clk: move from driver/clk/meson
PD#141217: initialize clock tree
Change-Id: Ie00bbdd65b4e430434e66d4c1f4275a6c46ee44a
Signed-off-by: Yun Cai <yun.cai@amlogic.com>
Frank Chen [Sat, 25 Mar 2017 14:52:28 +0000 (22:52 +0800)]
nand: fix data corruption after mtd write
PD#138714: fix data corruption after mtd write
nand_chip.erase is obligatory, without this implement, nand write
will fail and cause ecc fail next time read data.
Change-Id: I79cabce7c03938e13785ecc42d48b9e750508f3e
Signed-off-by: Frank Chen <frank.chen@amlogic.com>
Evoke Zhang [Mon, 27 Mar 2017 07:56:35 +0000 (15:56 +0800)]
PD#138714: hdmitx: fix validate_vmode checking crash for invalid mode
Change-Id: I2c57617316f24ac378143fc9e7ff66071b8dced1
Signed-off-by: Evoke Zhang <evoke.zhang@amlogic.com>
Zongdong Jiao [Wed, 22 Mar 2017 03:15:46 +0000 (11:15 +0800)]
hdmitx: optimize 'valid_mode' checking
PD#138714: hdmitx: optimize 'valid_mode' checking
Pick from PD#139639.
For Y420 modes, only 4k50/60hz modes have this colorspace.
Forbidden for any other hdmi modes.
Change-Id: Ib929b55c68a6c2f535fb576ebe98254e464c1daa
Signed-off-by: Zongdong Jiao <zongdong.jiao@amlogic.com>
Victor Wan [Fri, 24 Mar 2017 09:35:59 +0000 (17:35 +0800)]
Merge branch 'linux-linaro-lsk-v4.9' into amlogic-4.9-dev
Jianxin Pan [Fri, 24 Mar 2017 05:40:43 +0000 (13:40 +0800)]
defconfig: meson32: set printk buffer to 512K
PD#141217: set CONFIG_LOG_BUF_SHIFT=19
Change-Id: Ic9a8c61848dfcee9ca46ef4277ba6f5271b92feb
Signed-off-by: Jianxin Pan <jianxin.pan@amlogic.com>
Frank Chen [Fri, 24 Mar 2017 06:31:23 +0000 (14:31 +0800)]
defconfig: meson64: enable squashfs and ubifs
PD#138714: enable squashfs and ubifs
+CONFIG_MTD_UBI=y
+CONFIG_UBIFS_FS=y
+CONFIG_SQUASHFS=y
Change-Id: I0a5dbe748ae210fd4e7f027005a5768d11f17db6
Signed-off-by: Frank Chen <frank.chen@amlogic.com>
Liang Yang [Fri, 17 Mar 2017 14:35:23 +0000 (22:35 +0800)]
mtd: add mtd driver
PD#138714: initial mtd driver
Change-Id: I8ba13b64621899f420ffcc10c68eacf9d9763b9b
Signed-off-by: Liang Yang <liang.yang@amlogic.com>
Signed-off-by: Frank Chen <frank.chen@amlogic.com>
Jianxin Pan [Tue, 21 Mar 2017 05:43:19 +0000 (13:43 +0800)]
meson8b: initial add m8b
PD#141217: initial add meson8b
1. smp
2. ioremap
3. uart
4. gcc6.3.1
Change-Id: I23aa5d64260452df13d6d8965386977581dcdb09
Signed-off-by: Jianxin Pan <jianxin.pan@amlogic.com>
Guosong Zhou [Tue, 21 Mar 2017 12:46:54 +0000 (20:46 +0800)]
amlvideo2: initial add the driver
PD#138714: initial add the driver
1.Add amlogic amlvideo2 driver;
2.device tree support of amlvideo2 for p212/q200/skt;
3.related Makefiles/Kconfig/Headfiles update;
Change-Id: I6cbd4e7f19a7ec7199106f586f195a2099130e09
Signed-off-by: Guosong Zhou <guosong.zhou@amlogic.com>
Greg Kroah-Hartman [Wed, 22 Mar 2017 11:44:07 +0000 (12:44 +0100)]
Linux 4.9.17
Daniel Axtens [Fri, 3 Mar 2017 06:56:55 +0000 (17:56 +1100)]
crypto: powerpc - Fix initialisation of crc32c context
commit
aa2be9b3d6d2d699e9ca7cbfc00867c80e5da213 upstream.
Turning on crypto self-tests on a POWER8 shows:
alg: hash: Test 1 failed for crc32c-vpmsum
00000000: ff ff ff ff
Comparing the code with the Intel CRC32c implementation on which
ours is based shows that we are doing an init with 0, not ~0
as CRC32c requires.
This probably wasn't caught because btrfs does its own weird
open-coded initialisation.
Initialise our internal context to ~0 on init.
This makes the self-tests pass, and btrfs continues to work.
Fixes:
6dd7a82cc54e ("crypto: powerpc - Add POWER8 optimised crc32c")
Cc: Anton Blanchard <anton@samba.org>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Acked-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Niklas Cassel [Sat, 25 Feb 2017 00:17:53 +0000 (01:17 +0100)]
locking/rwsem: Fix down_write_killable() for CONFIG_RWSEM_GENERIC_SPINLOCK=y
commit
17fcbd590d0c3e35bd9646e2215f86586378bc42 upstream.
We hang if SIGKILL has been sent, but the task is stuck in down_read()
(after do_exit()), even though no task is doing down_write() on the
rwsem in question:
INFO: task libupnp:21868 blocked for more than 120 seconds.
libupnp D 0 21868 1 0x08100008
...
Call Trace:
__schedule()
schedule()
__down_read()
do_exit()
do_group_exit()
__wake_up_parent()
This bug has already been fixed for CONFIG_RWSEM_XCHGADD_ALGORITHM=y in
the following commit:
04cafed7fc19 ("locking/rwsem: Fix down_write_killable()")
... however, this bug also exists for CONFIG_RWSEM_GENERIC_SPINLOCK=y.
Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Niklas Cassel <niklass@axis.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes:
d47996082f52 ("locking/rwsem: Introduce basis for down_write_killable()")
Link: http://lkml.kernel.org/r/1487981873-12649-1-git-send-email-niklass@axis.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Zijlstra [Sat, 4 Mar 2017 09:27:19 +0000 (10:27 +0100)]
futex: Add missing error handling to FUTEX_REQUEUE_PI
commit
9bbb25afeb182502ca4f2c4f3f88af0681b34cae upstream.
Thomas spotted that fixup_pi_state_owner() can return errors and we
fail to unlock the rt_mutex in that case.
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Darren Hart <dvhart@linux.intel.com>
Cc: juri.lelli@arm.com
Cc: bigeasy@linutronix.de
Cc: xlpang@redhat.com
Cc: rostedt@goodmis.org
Cc: mathieu.desnoyers@efficios.com
Cc: jdesfossez@efficios.com
Cc: dvhart@infradead.org
Cc: bristot@redhat.com
Link: http://lkml.kernel.org/r/20170304093558.867401760@infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Zijlstra [Sat, 4 Mar 2017 09:27:18 +0000 (10:27 +0100)]
futex: Fix potential use-after-free in FUTEX_REQUEUE_PI
commit
c236c8e95a3d395b0494e7108f0d41cf36ec107c upstream.
While working on the futex code, I stumbled over this potential
use-after-free scenario. Dmitry triggered it later with syzkaller.
pi_mutex is a pointer into pi_state, which we drop the reference on in
unqueue_me_pi(). So any access to that pointer after that is bad.
Since other sites already do rt_mutex_unlock() with hb->lock held, see
for example futex_lock_pi(), simply move the unlock before
unqueue_me_pi().
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Darren Hart <dvhart@linux.intel.com>
Cc: juri.lelli@arm.com
Cc: bigeasy@linutronix.de
Cc: xlpang@redhat.com
Cc: rostedt@goodmis.org
Cc: mathieu.desnoyers@efficios.com
Cc: jdesfossez@efficios.com
Cc: dvhart@infradead.org
Cc: bristot@redhat.com
Link: http://lkml.kernel.org/r/20170304093558.801744246@infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andy Lutomirski [Thu, 16 Mar 2017 19:59:39 +0000 (12:59 -0700)]
x86/perf: Fix CR4.PCE propagation to use active_mm instead of mm
commit
5dc855d44c2ad960a86f593c60461f1ae1566b6d upstream.
If one thread mmaps a perf event while another thread in the same mm
is in some context where active_mm != mm (which can happen in the
scheduler, for example), refresh_pce() would write the wrong value
to CR4.PCE. This broke some PAPI tests.
Reported-and-tested-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes:
7911d3f7af14 ("perf/x86: Only allow rdpmc if a perf_event is mapped")
Link: http://lkml.kernel.org/r/0c5b38a76ea50e405f9abe07a13dfaef87c173a1.1489694270.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andrey Ryabinin [Mon, 13 Mar 2017 16:33:37 +0000 (19:33 +0300)]
x86/kasan: Fix boot with KASAN=y and PROFILE_ANNOTATED_BRANCHES=y
commit
be3606ff739d1c1be36389f8737c577ad87e1f57 upstream.
The kernel doesn't boot with both PROFILE_ANNOTATED_BRANCHES=y and KASAN=y
options selected. With branch profiling enabled we end up calling
ftrace_likely_update() before kasan_early_init(). ftrace_likely_update() is
built with KASAN instrumentation, so calling it before kasan has been
initialized leads to crash.
Use DISABLE_BRANCH_PROFILING define to make sure that we don't call
ftrace_likely_update() from early code before kasan_early_init().
Fixes:
ef7f0d6a6ca8 ("x86_64: add KASan support")
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: kasan-dev@googlegroups.com
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: lkp@01.org
Cc: Dmitry Vyukov <dvyukov@google.com>
Link: http://lkml.kernel.org/r/20170313163337.1704-1-aryabinin@virtuozzo.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Zijlstra [Mon, 13 Mar 2017 14:57:12 +0000 (15:57 +0100)]
x86/tsc: Fix ART for TSC_KNOWN_FREQ
commit
44fee88cea43d3c2cac962e0439cb10a3cabff6d upstream.
Subhransu reported that convert_art_to_tsc() isn't working for him.
The ART to TSC relation is only set up for systems which use the refined
TSC calibration. Systems with known TSC frequency (available via CPUID 15)
are not using the refined calibration and therefor the ART to TSC relation
is never established.
Add the setup to the known frequency init path which skips ART
calibration. The init code needs to be duplicated as for systems which use
refined calibration the ART setup must be delayed until calibration has
been done.
The problem has been there since the ART support was introdduced, but only
detected now because Subhransu tested the first time on hardware which has
TSC frequency enumerated via CPUID 15.
Note for stable: The conditional has changed from TSC_RELIABLE to
TSC_KNOWN_FREQUENCY.
[ tglx: Rewrote changelog and identified the proper 'Fixes' commit ]
Fixes:
f9677e0f8308 ("x86/tsc: Always Running Timer (ART) correlated clocksource")
Reported-by: "Prusty, Subhransu S" <subhransu.s.prusty@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Cc: christopher.s.hall@intel.com
Cc: kevin.b.stanton@intel.com
Cc: john.stultz@linaro.org
Cc: akataria@vmware.com
Link: http://lkml.kernel.org/r/20170313145712.GI3312@twins.programming.kicks-ass.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Shanker Donthineni [Tue, 7 Mar 2017 14:20:38 +0000 (08:20 -0600)]
irqchip/gicv3-its: Add workaround for QDF2400 ITS erratum 0065
commit
90922a2d03d84de36bf8a9979d62580102f31a92 upstream.
On Qualcomm Datacenter Technologies QDF2400 SoCs, the ITS hardware
implementation uses 16Bytes for Interrupt Translation Entry (ITE),
but reports an incorrect value of 8Bytes in GITS_TYPER.ITTE_size.
It might cause kernel memory corruption depending on the number
of MSI(x) that are configured and the amount of memory that has
been allocated for ITEs in its_create_device().
This patch fixes the potential memory corruption by setting the
correct ITE size to 16Bytes.
Cc: stable@vger.kernel.org
Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Marc Zyngier [Fri, 17 Feb 2017 14:32:18 +0000 (14:32 +0000)]
arm64: KVM: VHE: Clear HCR_TGE when invalidating guest TLBs
commit
68925176296a8b995e503349200e256674bfe5ac upstream.
When invalidating guest TLBs, special care must be taken to
actually shoot the guest TLBs and not the host ones if we're
running on a VHE system. This is controlled by the HCR_EL2.TGE
bit, which we forget to clear before invalidating TLBs.
Address the issue by introducing two wrappers (__tlb_switch_to_guest
and __tlb_switch_to_host) that take care of both the VTTBR_EL2
and HCR_EL2.TGE switching.
Reported-by: Tomasz Nowicki <tnowicki@caviumnetworks.com>
Tested-by: Tomasz Nowicki <tnowicki@caviumnetworks.com>
Reviewed-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Boris Brezillon [Fri, 2 Dec 2016 13:48:07 +0000 (14:48 +0100)]
drm/vc4: Fix ->clock_select setting for the VEC encoder
commit
ab8df60e3a3b68420d0d4477c5f07c00fbfb078b upstream.
PV_CONTROL_CLK_SELECT_VEC is actually 2 and not 0. Fix the definition and
rework the vc4_set_crtc_possible_masks() to cover the full range of the
PV_CONTROL_CLK_SELECT field.
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Cc: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Derek Foreman [Thu, 24 Nov 2016 18:11:55 +0000 (12:11 -0600)]
drm/vc4: Fix race between page flip completion event and clean-up
commit
26fc78f6fef39b9d7a15def5e7e9826ff68303f4 upstream.
There was a small window where a userspace program could submit
a pageflip after receiving a pageflip completion event yet still
receive EBUSY.
Signed-off-by: Derek Foreman <derekf@osg.samsung.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Cc: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Boris Brezillon [Tue, 22 Nov 2016 20:45:28 +0000 (12:45 -0800)]
clk: bcm2835: Fix ->fixed_divider of pllh_aux
commit
f2a46926aba1f0c33944901d2420a6a887455ddc upstream.
There is no fixed divider on pllh_aux.
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Cc: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Michael Ellerman [Fri, 17 Mar 2017 00:48:33 +0000 (00:48 +0000)]
powerpc/mm: Fix build break when CMA=n && SPAPR_TCE_IOMMU=y
[ Upstream commit
a05ef161cdd22faccffe06f21fc8f1e249565385 ]
Currently the build breaks if CMA=n and SPAPR_TCE_IOMMU=y:
arch/powerpc/mm/mmu_context_iommu.c: In function ‘mm_iommu_get’:
arch/powerpc/mm/mmu_context_iommu.c:193:42: error: ‘MIGRATE_CMA’ undeclared (first use in this function)
if (get_pageblock_migratetype(page) == MIGRATE_CMA) {
^~~~~~~~~~~
Fix it by using the existing is_migrate_cma_page(), which evaulates to
false when CMA=n.
Fixes:
2e5bbb5461f1 ("KVM: PPC: Book3S HV: Migrate pinned pages out of CMA")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alexandre Belloni [Fri, 17 Mar 2017 00:48:33 +0000 (00:48 +0000)]
usb: gadget: udc: atmel: remove memory leak
[ Upstream commit
32856eea7bf75dfb99b955ada6e147f553a11366 ]
Commit
bbe097f092b0 ("usb: gadget: udc: atmel: fix endpoint name")
introduced a memory leak when unbinding the driver. The endpoint names
would not be freed. Solve that by including the name as a string in struct
usba_ep so it is freed when the endpoint is.
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Gabriel Krisman Bertazi [Fri, 17 Mar 2017 00:48:32 +0000 (00:48 +0000)]
serial: 8250_pci: Detach low-level driver during PCI error recovery
[ Upstream commit
f209fa03fc9d131b3108c2e4936181eabab87416 ]
During a PCI error recovery, like the ones provoked by EEH in the ppc64
platform, all IO to the device must be blocked while the recovery is
completed. Current 8250_pci implementation only suspends the port
instead of detaching it, which doesn't prevent incoming accesses like
TIOCMGET and TIOCMSET calls from reaching the device. Those end up
racing with the EEH recovery, crashing it. Similar races were also
observed when opening the device and when shutting it down during
recovery.
This patch implements a more robust IO blockage for the 8250_pci
recovery by unregistering the port at the beginning of the procedure and
re-adding it afterwards. Since the port is detached from the uart
layer, we can be sure that no request will make through to the device
during recovery. This is similar to the solution used by the JSM serial
driver.
I thank Peter Hurley <peter@hurleysoftware.com> for valuable input on
this one over one year ago.
Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Michael Pobega [Fri, 17 Mar 2017 00:48:32 +0000 (00:48 +0000)]
ACPI / blacklist: Make Dell Latitude 3350 ethernet work
[ Upstream commit
708f5dcc21ae9b35f395865fc154b0105baf4de4 ]
The Dell Latitude 3350's ethernet card attempts to use a reserved
IRQ (18), resulting in ACPI being unable to enable the ethernet.
Adding it to acpi_rev_dmi_table[] helps to work around this problem.
Signed-off-by: Michael Pobega <mpobega@neverware.com>
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alex Hung [Fri, 17 Mar 2017 00:48:32 +0000 (00:48 +0000)]
ACPI / blacklist: add _REV quirks for Dell Precision 5520 and 3520
[ Upstream commit
9523b9bf6dceef6b0215e90b2348cd646597f796 ]
Precision 5520 and 3520 either hang at login and during suspend or reboot.
It turns out that that adding them to acpi_rev_dmi_table[] helps to work
around those issues.
Signed-off-by: Alex Hung <alex.hung@canonical.com>
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Vladimir Davydov [Fri, 17 Mar 2017 00:48:31 +0000 (00:48 +0000)]
slub: move synchronize_sched out of slab_mutex on shrink
[ Upstream commit
89e364db71fb5e7fc8d93228152abfa67daf35fa ]
synchronize_sched() is a heavy operation and calling it per each cache
owned by a memory cgroup being destroyed may take quite some time. What
is worse, it's currently called under the slab_mutex, stalling all works
doing cache creation/destruction.
Actually, there isn't much point in calling synchronize_sched() for each
cache - it's enough to call it just once - after setting cpu_partial for
all caches and before shrinking them. This way, we can also move it out
of the slab_mutex, which we have to hold for iterating over the slab
cache list.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=172991
Link: http://lkml.kernel.org/r/0a10d71ecae3db00fb4421bcd3f82bcc911f4be4.1475329751.git.vdavydov.dev@gmail.com
Signed-off-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Reported-by: Doug Smythies <dsmythies@telus.net>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Henrik Ingo [Fri, 17 Mar 2017 00:48:31 +0000 (00:48 +0000)]
uvcvideo: uvc_scan_fallback() for webcams with broken chain
[ Upstream commit
e950267ab802c8558f1100eafd4087fd039ad634 ]
Some devices have invalid baSourceID references, causing uvc_scan_chain()
to fail, but if we just take the entities we can find and put them
together in the most sensible chain we can think of, turns out they do
work anyway. Note: This heuristic assumes there is a single chain.
At the time of writing, devices known to have such a broken chain are
- Acer Integrated Camera (5986:055a)
- Realtek rtl157a7 (0bda:57a7)
Signed-off-by: Henrik Ingo <henrik.ingo@avoinelama.fi>
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Harald Freudenberger [Fri, 17 Mar 2017 00:48:31 +0000 (00:48 +0000)]
s390/zcrypt: Introduce CEX6 toleration
[ Upstream commit
b3e8652bcbfa04807e44708d4d0c8cdad39c9215 ]
Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mauricio Faria de Oliveira [Fri, 17 Mar 2017 00:48:30 +0000 (00:48 +0000)]
block: allow WRITE_SAME commands with the SG_IO ioctl
[ Upstream commit
25cdb64510644f3e854d502d69c73f21c6df88a9 ]
The WRITE_SAME commands are not present in the blk_default_cmd_filter
write_ok list, and thus are failed with -EPERM when the SG_IO ioctl()
is executed without CAP_SYS_RAWIO capability (e.g., unprivileged users).
[ sg_io() -> blk_fill_sghdr_rq() > blk_verify_command() -> -EPERM ]
The problem can be reproduced with the sg_write_same command
# sg_write_same --num 1 --xferlen 512 /dev/sda
#
# capsh --drop=cap_sys_rawio -- -c \
'sg_write_same --num 1 --xferlen 512 /dev/sda'
Write same: pass through os error: Operation not permitted
#
For comparison, the WRITE_VERIFY command does not observe this problem,
since it is in that list:
# capsh --drop=cap_sys_rawio -- -c \
'sg_write_verify --num 1 --ilen 512 --lba 0 /dev/sda'
#
So, this patch adds the WRITE_SAME commands to the list, in order
for the SG_IO ioctl to finish successfully:
# capsh --drop=cap_sys_rawio -- -c \
'sg_write_same --num 1 --xferlen 512 /dev/sda'
#
That case happens to be exercised by QEMU KVM guests with 'scsi-block' devices
(qemu "-device scsi-block" [1], libvirt "<disk type='block' device='lun'>" [2]),
which employs the SG_IO ioctl() and runs as an unprivileged user (libvirt-qemu).
In that scenario, when a filesystem (e.g., ext4) performs its zero-out calls,
which are translated to write-same calls in the guest kernel, and then into
SG_IO ioctls to the host kernel, SCSI I/O errors may be observed in the guest:
[...] sd 0:0:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[...] sd 0:0:0:0: [sda] tag#0 Sense Key : Aborted Command [current]
[...] sd 0:0:0:0: [sda] tag#0 Add. Sense: I/O process terminated
[...] sd 0:0:0:0: [sda] tag#0 CDB: Write Same(10) 41 00 01 04 e0 78 00 00 08 00
[...] blk_update_request: I/O error, dev sda, sector
17096824
Links:
[1] http://git.qemu.org/?p=qemu.git;a=commit;h=
336a6915bc7089fb20fea4ba99972ad9a97c5f52
[2] https://libvirt.org/formatdomain.html#elementsDisks (see 'disk' -> 'device')
Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Signed-off-by: Brahadambal Srinivasan <latha@linux.vnet.ibm.com>
Reported-by: Manjunatha H R <manjuhr1@in.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ben Skeggs [Fri, 17 Mar 2017 00:48:30 +0000 (00:48 +0000)]
drm/nouveau/disp/nv50-: specify ctrl/user separately when constructing classes
[ Upstream commit
2a32b9b1866a2ee9f01fbf2a48d99012f0120739 ]
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ben Skeggs [Fri, 17 Mar 2017 00:48:29 +0000 (00:48 +0000)]
drm/nouveau/disp/nv50-: split chid into chid.ctrl and chid.user
[ Upstream commit
4391d7f5c79a9fe6fa11cf6c160ca7f7bdb49d2a ]
GP102/GP104 make life difficult by redefining the channel indices for
some registers, but not others.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ben Skeggs [Fri, 17 Mar 2017 00:48:29 +0000 (00:48 +0000)]
drm/nouveau/disp/gp102: fix cursor/overlay immediate channel indices
[ Upstream commit
e50fcff15fe120ef2103a9e18af6644235c2b14d ]
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alexey Kardashevskiy [Fri, 17 Mar 2017 00:48:29 +0000 (00:48 +0000)]
vfio/spapr: Postpone default window creation
[ Upstream commit
d9c728949ddc9de5734bf3b12ea906ca8a77f2a0 ]
We are going to allow the userspace to configure container in
one memory context and pass container fd to another so
we are postponing memory allocations accounted against
the locked memory limit. One of previous patches took care of
it_userspace.
At the moment we create the default DMA window when the first group is
attached to a container; this is done for the userspace which is not
DDW-aware but familiar with the SPAPR TCE IOMMU v2 in the part of memory
pre-registration - such client expects the default DMA window to exist.
This postpones the default DMA window allocation till one of
the folliwing happens:
1. first map/unmap request arrives;
2. new window is requested;
This adds noop for the case when the userspace requested removal
of the default window which has not been created yet.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alexey Kardashevskiy [Fri, 17 Mar 2017 00:48:28 +0000 (00:48 +0000)]
vfio/spapr: Add a helper to create default DMA window
[ Upstream commit
6f01cc692a16405235d5c34056455b182682123c ]
There is already a helper to create a DMA window which does allocate
a table and programs it to the IOMMU group. However
tce_iommu_take_ownership_ddw() did not use it and did these 2 calls
itself to simplify error path.
Since we are going to delay the default window creation till
the default window is accessed/removed or new window is added,
we need a helper to create a default window from all these cases.
This adds tce_iommu_create_default_window(). Since it relies on
a VFIO container to have at least one IOMMU group (for future use),
this changes tce_iommu_attach_group() to add a group to the container
first and then call the new helper.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alexey Kardashevskiy [Fri, 17 Mar 2017 00:48:27 +0000 (00:48 +0000)]
powerpc/mm/iommu, vfio/spapr: Put pages on VFIO container shutdown
[ Upstream commit
4b6fad7097f883335b6d9627c883cb7f276d94c9 ]
At the moment the userspace tool is expected to request pinning of
the entire guest RAM when VFIO IOMMU SPAPR v2 driver is present.
When the userspace process finishes, all the pinned pages need to
be put; this is done as a part of the userspace memory context (MM)
destruction which happens on the very last mmdrop().
This approach has a problem that a MM of the userspace process
may live longer than the userspace process itself as kernel threads
use userspace process MMs which was runnning on a CPU where
the kernel thread was scheduled to. If this happened, the MM remains
referenced until this exact kernel thread wakes up again
and releases the very last reference to the MM, on an idle system this
can take even hours.
This moves preregistered regions tracking from MM to VFIO; insteads of
using mm_iommu_table_group_mem_t::used, tce_container::prereg_list is
added so each container releases regions which it has pre-registered.
This changes the userspace interface to return EBUSY if a memory
region is already registered in a container. However it should not
have any practical effect as the only userspace tool available now
does register memory region once per container anyway.
As tce_iommu_register_pages/tce_iommu_unregister_pages are called
under container->lock, this does not need additional locking.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alexey Kardashevskiy [Fri, 17 Mar 2017 00:48:27 +0000 (00:48 +0000)]
vfio/spapr: Reference mm in tce_container
[ Upstream commit
bc82d122ae4a0e9f971f13403995898fcfa0c09e ]
In some situations the userspace memory context may live longer than
the userspace process itself so if we need to do proper memory context
cleanup, we better have tce_container take a reference to mm_struct and
use it later when the process is gone (@current or @current->mm is NULL).
This references mm and stores the pointer in the container; this is done
in a new helper - tce_iommu_mm_set() - when one of the following happens:
- a container is enabled (IOMMU v1);
- a first attempt to pre-register memory is made (IOMMU v2);
- a DMA window is created (IOMMU v2).
The @mm stays referenced till the container is destroyed.
This replaces current->mm with container->mm everywhere except debug
prints.
This adds a check that current->mm is the same as the one stored in
the container to prevent userspace from making changes to a memory
context of other processes.
DMA map/unmap ioctls() do not check for @mm as they already check
for @enabled which is set after tce_iommu_mm_set() is called.
This does not reference a task as multiple threads within the same mm
are allowed to ioctl() to vfio and supposedly they will have same limits
and capabilities and if they do not, we'll just fail with no harm made.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alexey Kardashevskiy [Fri, 17 Mar 2017 00:48:27 +0000 (00:48 +0000)]
powerpc/iommu: Stop using @current in mm_iommu_xxx
[ Upstream commit
d7baee6901b34c4895eb78efdbf13a49079d7404 ]
This changes mm_iommu_xxx helpers to take mm_struct as a parameter
instead of getting it from @current which in some situations may
not have a valid reference to mm.
This changes helpers to receive @mm and moves all references to @current
to the caller, including checks for !current and !current->mm;
checks in mm_iommu_preregistered() are removed as there is no caller
yet.
This moves the mm_iommu_adjust_locked_vm() call to the caller as
it receives mm_iommu_table_group_mem_t but it needs mm.
This should cause no behavioral change.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alexey Kardashevskiy [Fri, 17 Mar 2017 00:48:26 +0000 (00:48 +0000)]
powerpc/iommu: Pass mm_struct to init/cleanup helpers
[ Upstream commit
88f54a3581eb9deaa3bd1aade40aef266d782385 ]
We are going to get rid of @current references in mmu_context_boos3s64.c
and cache mm_struct in the VFIO container. Since mm_context_t does not
have reference counting, we will be using mm_struct which does have
the reference counter.
This changes mm_iommu_init/mm_iommu_cleanup to receive mm_struct rather
than mm_context_t (which is embedded into mm).
This should not cause any behavioral change.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alexey Kardashevskiy [Fri, 17 Mar 2017 00:48:26 +0000 (00:48 +0000)]
vfio/spapr: Postpone allocation of userspace version of TCE table
[ Upstream commit
39701e56f5f16ea0cf8fc9e8472e645f8de91d23 ]
The iommu_table struct manages a hardware TCE table and a vmalloc'd
table with corresponding userspace addresses. Both are allocated when
the default DMA window is created and this happens when the very first
group is attached to a container.
As we are going to allow the userspace to configure container in one
memory context and pas container fd to another, we have to postpones
such allocations till a container fd is passed to the destination
user process so we would account locked memory limit against the actual
container user constrainsts.
This postpones the it_userspace array allocation till it is used first
time for mapping. The unmapping patch already checks if the array is
allocated.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Vitaly Kuznetsov [Fri, 17 Mar 2017 00:48:26 +0000 (00:48 +0000)]
Drivers: hv: ring_buffer: count on wrap around mappings in get_next_pkt_raw() (v2)
[ Upstream commit
fa32ff6576623616c1751562edaed8c164ca5199 ]
With wrap around mappings in place we can always provide drivers with
direct links to packets on the ring buffer, even when they wrap around.
Do the required updates to get_next_pkt_raw()/put_pkt_raw()
The first version of this commit was reverted (
65a532f3d50a) to deal with
cross-tree merge issues which are (hopefully) resolved now.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Tested-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Thomas Falcon [Fri, 17 Mar 2017 00:48:25 +0000 (00:48 +0000)]
ibmveth: calculate gso_segs for large packets
[ Upstream commit
94acf164dc8f1184e8d0737be7125134c2701dbe ]
Include calculations to compute the number of segments
that comprise an aggregated large packet.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Jonathan Maxwell <jmaxwell37@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Gavin Shan [Fri, 17 Mar 2017 00:48:25 +0000 (00:48 +0000)]
PCI: Do any VF BAR updates before enabling the BARs
[ Upstream commit
f40ec3c748c6912f6266c56a7f7992de61b255ed ]
Previously we enabled VFs and enable their memory space before calling
pcibios_sriov_enable(). But pcibios_sriov_enable() may update the VF BARs:
for example, on PPC PowerNV we may change them to manage the association of
VFs to PEs.
Because 64-bit BARs cannot be updated atomically, it's unsafe to update
them while they're enabled. The half-updated state may conflict with other
devices in the system.
Call pcibios_sriov_enable() before enabling the VFs so any BAR updates
happen while the VF BARs are disabled.
[bhelgaas: changelog]
Tested-by: Carol Soto <clsoto@us.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bjorn Helgaas [Fri, 17 Mar 2017 00:48:24 +0000 (00:48 +0000)]
PCI: Ignore BAR updates on virtual functions
[ Upstream commit
63880b230a4af502c56dde3d4588634c70c66006 ]
VF BARs are read-only zero, so updating VF BARs will not have any effect.
See the SR-IOV spec r1.1, sec 3.4.1.11.
We already ignore these updates because of
70675e0b6a1a ("PCI: Don't try to
restore VF BARs"); this merely restructures it slightly to make it easier
to split updates for standard and SR-IOV BARs.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bjorn Helgaas [Fri, 17 Mar 2017 00:48:24 +0000 (00:48 +0000)]
PCI: Update BARs using property bits appropriate for type
[ Upstream commit
45d004f4afefdd8d79916ee6d97a9ecd94bb1ffe ]
The BAR property bits (0-3 for memory BARs, 0-1 for I/O BARs) are supposed
to be read-only, but we do save them in res->flags and include them when
updating the BAR.
Mask the I/O property bits with ~PCI_BASE_ADDRESS_IO_MASK (0x3) instead of
PCI_REGION_FLAG_MASK (0xf) to make it obvious that we can't corrupt bits
2-3 of I/O addresses.
Use PCI_ROM_ADDRESS_MASK for ROM BARs. This means we'll only check the top
21 bits (instead of the 28 bits we used to check) of a ROM BAR to see if
the update was successful.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bjorn Helgaas [Fri, 17 Mar 2017 00:48:24 +0000 (00:48 +0000)]
PCI: Don't update VF BARs while VF memory space is enabled
[ Upstream commit
546ba9f8f22f71b0202b6ba8967be5cc6dae4e21 ]
If we update a VF BAR while it's enabled, there are two potential problems:
1) Any driver that's using the VF has a cached BAR value that is stale
after the update, and
2) We can't update 64-bit BARs atomically, so the intermediate state
(new lower dword with old upper dword) may conflict with another
device, and an access by a driver unrelated to the VF may cause a bus
error.
Warn about attempts to update VF BARs while they are enabled. This is a
programming error, so use dev_WARN() to get a backtrace.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bjorn Helgaas [Fri, 17 Mar 2017 00:48:23 +0000 (00:48 +0000)]
PCI: Decouple IORESOURCE_ROM_ENABLE and PCI_ROM_ADDRESS_ENABLE
[ Upstream commit
7a6d312b50e63f598f5b5914c4fd21878ac2b595 ]
Remove the assumption that IORESOURCE_ROM_ENABLE == PCI_ROM_ADDRESS_ENABLE.
PCI_ROM_ADDRESS_ENABLE is the ROM enable bit defined by the PCI spec, so if
we're reading or writing a BAR register value, that's what we should use.
IORESOURCE_ROM_ENABLE is a corresponding bit in struct resource flags.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bjorn Helgaas [Fri, 17 Mar 2017 00:48:23 +0000 (00:48 +0000)]
PCI: Add comments about ROM BAR updating
[ Upstream commit
0b457dde3cf8b7c76a60f8e960f21bbd4abdc416 ]
pci_update_resource() updates a hardware BAR so its address matches the
kernel's struct resource UNLESS it's a disabled ROM BAR. We only update
those when we enable the ROM.
It's not obvious from the code why ROM BARs should be handled specially.
Apparently there are Matrox devices with defective ROM BARs that read as
zero when disabled. That means that if pci_enable_rom() reads the disabled
BAR, sets PCI_ROM_ADDRESS_ENABLE (without re-inserting the address), and
writes it back, it would enable the ROM at address zero.
Add comments and references to explain why we can't make the code look more
rational.
The code changes are from
755528c860b0 ("Ignore disabled ROM resources at
setup") and
8085ce084c0f ("[PATCH] Fix PCI ROM mapping").
Link: https://lkml.org/lkml/2005/8/30/138
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bjorn Helgaas [Fri, 17 Mar 2017 00:48:23 +0000 (00:48 +0000)]
PCI: Remove pci_resource_bar() and pci_iov_resource_bar()
[ Upstream commit
286c2378aaccc7343ebf17ec6cd86567659caf70 ]
pci_std_update_resource() only deals with standard BARs, so we don't have
to worry about the complications of VF BARs in an SR-IOV capability.
Compute the BAR address inline and remove pci_resource_bar(). That makes
pci_iov_resource_bar() unused, so remove that as well.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bjorn Helgaas [Fri, 17 Mar 2017 00:48:22 +0000 (00:48 +0000)]
PCI: Separate VF BAR updates from standard BAR updates
[ Upstream commit
6ffa2489c51da77564a0881a73765ea2169f955d ]
Previously pci_update_resource() used the same code path for updating
standard BARs and VF BARs in SR-IOV capabilities.
Split the VF BAR update into a new pci_iov_update_resource() internal
interface, which makes it simpler to compute the BAR address (we can get
rid of pci_resource_bar() and pci_iov_resource_bar()).
This patch:
- Renames pci_update_resource() to pci_std_update_resource(),
- Adds pci_iov_update_resource(),
- Makes pci_update_resource() a wrapper that calls the appropriate one,
No functional change intended.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Vitaly Kuznetsov [Fri, 17 Mar 2017 00:48:22 +0000 (00:48 +0000)]
x86/hyperv: Handle unknown NMIs on one CPU when unknown_nmi_panic
[ Upstream commit
59107e2f48831daedc46973ce4988605ab066de3 ]
There is a feature in Hyper-V ('Debug-VM --InjectNonMaskableInterrupt')
which injects NMI to the guest. We may want to crash the guest and do kdump
on this NMI by enabling unknown_nmi_panic. To make kdump succeed we need to
allow the kdump kernel to re-establish VMBus connection so it will see
VMBus devices (storage, network,..).
To properly unload VMBus making it possible to start over during kdump we
need to do the following:
- Send an 'unload' message to the hypervisor. This can be done on any CPU
so we do this the crashing CPU.
- Receive the 'unload finished' reply message. WS2012R2 delivers this
message to the CPU which was used to establish VMBus connection during
module load and this CPU may differ from the CPU sending 'unload'.
Receiving a VMBus message means the following:
- There is a per-CPU slot in memory for one message. This slot can in
theory be accessed by any CPU.
- We get an interrupt on the CPU when a message was placed into the slot.
- When we read the message we need to clear the slot and signal the fact
to the hypervisor. In case there are more messages to this CPU pending
the hypervisor will deliver the next message. The signaling is done by
writing to an MSR so this can only be done on the appropriate CPU.
To avoid doing cross-CPU work on crash we have vmbus_wait_for_unload()
function which checks message slots for all CPUs in a loop waiting for the
'unload finished' messages. However, there is an issue which arises when
these conditions are met:
- We're crashing on a CPU which is different from the one which was used
to initially contact the hypervisor.
- The CPU which was used for the initial contact is blocked with interrupts
disabled and there is a message pending in the message slot.
In this case we won't be able to read the 'unload finished' message on the
crashing CPU. This is reproducible when we receive unknown NMIs on all CPUs
simultaneously: the first CPU entering panic() will proceed to crash and
all other CPUs will stop themselves with interrupts disabled.
The suggested solution is to handle unknown NMIs for Hyper-V guests on the
first CPU which gets them only. This will allow us to rely on VMBus
interrupt handler being able to receive the 'unload finish' message in
case it is delivered to a different CPU.
The issue is not reproducible on WS2016 as Debug-VM delivers NMI to the
boot CPU only, WS2012R2 and earlier Hyper-V versions are affected.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: devel@linuxdriverproject.org
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Link: http://lkml.kernel.org/r/20161202100720.28121-1-vkuznets@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Michael Cyr [Fri, 17 Mar 2017 00:48:22 +0000 (00:48 +0000)]
scsi: ibmvscsis: Synchronize cmds at remove time
[ Upstream commit
8bf11557d44d00562360d370de8aa70ba89aa0d5 ]
This patch adds code to disconnect from the client, which will make sure
any outstanding commands have been completed, before continuing on with
the remove operation.
Signed-off-by: Michael Cyr <mikecyr@us.ibm.com>
Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Tested-by: Steven Royer <seroyer@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Michael Cyr [Fri, 17 Mar 2017 00:48:21 +0000 (00:48 +0000)]
scsi: ibmvscsis: Synchronize cmds at tpg_enable_store time
[ Upstream commit
c9b3379f60a83288a5e2f8ea75476460978689b0 ]
This patch changes the way the IBM vSCSI server driver manages its
Command/Response Queue (CRQ). We used to register the CRQ with phyp at
probe time. Now we wait until tpg_enable_store. Similarly, when
tpg_enable_store is called to "disable" (i.e. the stored value is 0),
we unregister the queue with phyp.
One consquence to this is that we have no need for the PART_UP_WAIT_ENAB
state, since we can't get an Init Message from the client in our CRQ if
we're waiting to be enabled, since we haven't registered the queue yet.
Signed-off-by: Michael Cyr <mikecyr@us.ibm.com>
Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Tested-by: Steven Royer <seroyer@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Michael Cyr [Fri, 17 Mar 2017 00:48:21 +0000 (00:48 +0000)]
scsi: ibmvscsis: Rearrange functions for future patches
[ Upstream commit
79fac9c9b74f4951c9ce82b22e714bcc34ae4a56 ]
This patch reorders functions in a manner necessary for a follow-on
patch. It also makes some minor styling changes (mostly removing extra
spaces) and fixes some typos.
There are no code changes in this patch, with one exception: due to the
reordering of the functions, I needed to explicitly declare a function
at the top of the file. However, this will be removed in the next patch,
since the code requiring the predeclaration will be removed.
Signed-off-by: Michael Cyr <mikecyr@us.ibm.com>
Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Tested-by: Steven Royer <seroyer@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Michael Cyr [Fri, 17 Mar 2017 00:48:20 +0000 (00:48 +0000)]
scsi: ibmvscsis: Clean up properly if target_submit_cmd/tmr fails
[ Upstream commit
7435b32e2d2fb5da6c2ae9b9c8ce56d8a3cb3bc3 ]
Signed-off-by: Michael Cyr <mikecyr@us.ibm.com>
Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Tested-by: Steven Royer <seroyer@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Michael Cyr [Fri, 17 Mar 2017 00:48:20 +0000 (00:48 +0000)]
scsi: ibmvscsis: Return correct partition name/# to client
[ Upstream commit
9c93cf03d4eb3dc58931ff7cac0af9c344fe5e0b ]
Signed-off-by: Michael Cyr <mikecyr@us.ibm.com>
Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Tested-by: Steven Royer <seroyer@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Michael Cyr [Fri, 17 Mar 2017 00:48:20 +0000 (00:48 +0000)]
scsi: ibmvscsis: Issues from Dan Carpenter/Smatch
[ Upstream commit
11950d70b52d2bc5e3580da8cd63909ef38d67db ]
Signed-off-by: Michael Cyr <mikecyr@us.ibm.com>
Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Tested-by: Steven Royer <seroyer@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Todd Fujinaka [Fri, 17 Mar 2017 00:48:19 +0000 (00:48 +0000)]
igb: add i211 to i210 PHY workaround
[ Upstream commit
5bc8c230e2a993b49244f9457499f17283da9ec7 ]
i210 and i211 share the same PHY but have different PCI IDs. Don't
forget i211 for any i210 workarounds.
Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Chris J Arges [Fri, 17 Mar 2017 00:48:19 +0000 (00:48 +0000)]
igb: Workaround for igb i210 firmware issue
[ Upstream commit
4e684f59d760a2c7c716bb60190783546e2d08a1 ]
Sometimes firmware may not properly initialize I347AT4_PAGE_SELECT causing
the probe of an igb i210 NIC to fail. This patch adds an addition zeroing
of this register during igb_get_phy_id to workaround this issue.
Thanks for Jochen Henneberg for the idea and original patch.
Signed-off-by: Chris J Arges <christopherarges@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dan Streetman [Fri, 17 Mar 2017 00:48:18 +0000 (00:48 +0000)]
xen: do not re-use pirq number cached in pci device msi msg data
[ Upstream commit
c74fd80f2f41d05f350bb478151021f88551afe8 ]
Revert the main part of commit:
af42b8d12f8a ("xen: fix MSI setup and teardown for PV on HVM guests")
That commit introduced reading the pci device's msi message data to see
if a pirq was previously configured for the device's msi/msix, and re-use
that pirq. At the time, that was the correct behavior. However, a
later change to Qemu caused it to call into the Xen hypervisor to unmap
all pirqs for a pci device, when the pci device disables its MSI/MSIX
vectors; specifically the Qemu commit:
c976437c7dba9c7444fb41df45468968aaa326ad
("qemu-xen: free all the pirqs for msi/msix when driver unload")
Once Qemu added this pirq unmapping, it was no longer correct for the
kernel to re-use the pirq number cached in the pci device msi message
data. All Qemu releases since 2.1.0 contain the patch that unmaps the
pirqs when the pci device disables its MSI/MSIX vectors.
This bug is causing failures to initialize multiple NVMe controllers
under Xen, because the NVMe driver sets up a single MSIX vector for
each controller (concurrently), and then after using that to talk to
the controller for some configuration data, it disables the single MSIX
vector and re-configures all the MSIX vectors it needs. So the MSIX
setup code tries to re-use the cached pirq from the first vector
for each controller, but the hypervisor has already given away that
pirq to another controller, and its initialization fails.
This is discussed in more detail at:
https://lists.xen.org/archives/html/xen-devel/2017-01/msg00447.html
Fixes:
af42b8d12f8a ("xen: fix MSI setup and teardown for PV on HVM guests")
Signed-off-by: Dan Streetman <dan.streetman@canonical.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Krister Johansen [Wed, 4 Jan 2017 09:22:52 +0000 (01:22 -0800)]
dmaengine: iota: ioat_alloc_chan_resources should not perform sleeping allocations.
commit
21d25f6a4217e755906cb548b55ddab39d0e88b9 upstream.
On a kernel with DEBUG_LOCKS, ioat_free_chan_resources triggers an
in_interrupt() warning. With PROVE_LOCKING, it reports detecting a
SOFTIRQ-safe to SOFTIRQ-unsafe lock ordering in the same code path.
This is because dma_generic_alloc_coherent() checks if the GFP flags
permit blocking. It allocates from different subsystems if blocking is
permitted. The free path knows how to return the memory to the correct
allocator. If GFP_KERNEL is specified then the alloc and free end up
going through cma_alloc(), which uses mutexes.
Given that ioat_free_chan_resources() can be called in interrupt
context, ioat_alloc_chan_resources() must specify GFP_NOWAIT so that the
allocations do not block and instead use an allocator that uses
spinlocks.
Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
Acked-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Daniel Borkmann [Sun, 18 Dec 2016 00:52:59 +0000 (01:52 +0100)]
bpf: fix mark_reg_unknown_value for spilled regs on map value marking
[ Upstream commit
6760bf2ddde8ad64f8205a651223a93de3a35494 ]
Martin reported a verifier issue that hit the BUG_ON() for his
test case in the mark_reg_unknown_value() function:
[ 202.861380] kernel BUG at kernel/bpf/verifier.c:467!
[...]
[ 203.291109] Call Trace:
[ 203.296501] [<
ffffffff811364d5>] mark_map_reg+0x45/0x50
[ 203.308225] [<
ffffffff81136558>] mark_map_regs+0x78/0x90
[ 203.320140] [<
ffffffff8113938d>] do_check+0x226d/0x2c90
[ 203.331865] [<
ffffffff8113a6ab>] bpf_check+0x48b/0x780
[ 203.343403] [<
ffffffff81134c8e>] bpf_prog_load+0x27e/0x440
[ 203.355705] [<
ffffffff8118a38f>] ? handle_mm_fault+0x11af/0x1230
[ 203.369158] [<
ffffffff812d8188>] ? security_capable+0x48/0x60
[ 203.382035] [<
ffffffff811351a4>] SyS_bpf+0x124/0x960
[ 203.393185] [<
ffffffff810515f6>] ? __do_page_fault+0x276/0x490
[ 203.406258] [<
ffffffff816db320>] entry_SYSCALL_64_fastpath+0x13/0x94
This issue got uncovered after the fix in
a08dd0da5307 ("bpf: fix
regression on verifier pruning wrt map lookups"). The reason why it
wasn't noticed before was, because as mentioned in
a08dd0da5307,
mark_map_regs() was doing the id matching incorrectly based on the
uncached regs[regno].id. So, in the first loop, we walked all regs
and as soon as we found regno == i, then this reg's id was cleared
when calling mark_reg_unknown_value() thus that every subsequent
register was probed against id of 0 (which, in combination with the
PTR_TO_MAP_VALUE_OR_NULL type is an invalid condition that no other
register state can hold), and therefore wasn't type transitioned such
as in the spilled register case for the second loop.
Now since that got fixed, it turned out that
57a09bf0a416 ("bpf:
Detect identical PTR_TO_MAP_VALUE_OR_NULL registers") used
mark_reg_unknown_value() incorrectly for the spilled regs, and thus
hitting the BUG_ON() in some cases due to regno >= MAX_BPF_REG.
Although spilled regs have the same type as the non-spilled regs
for the verifier state, that is, struct bpf_reg_state, they are
semantically different from the non-spilled regs. In other words,
there can be up to 64 (MAX_BPF_STACK / BPF_REG_SIZE) spilled regs
in the stack, for example, register R<x> could have been spilled by
the program to stack location X, Y, Z, and in mark_map_regs() we
need to scan these stack slots of type STACK_SPILL for potential
registers that we have to transition from PTR_TO_MAP_VALUE_OR_NULL.
Therefore, depending on the location, the spilled_regs regno can
be a lot higher than just MAX_BPF_REG's value since we operate on
stack instead. The reset in mark_reg_unknown_value() itself is
just fine, only that the BUG_ON() was inappropriate for this. Fix
it by making a __mark_reg_unknown_value() version that can be
called from mark_map_reg() generically; we know for the non-spilled
case that the regno is always < MAX_BPF_REG anyway.
Fixes:
57a09bf0a416 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers")
Reported-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Daniel Borkmann [Thu, 15 Dec 2016 00:30:06 +0000 (01:30 +0100)]
bpf: fix regression on verifier pruning wrt map lookups
[ Upstream commit
a08dd0da5307ba01295c8383923e51e7997c3576 ]
Commit
57a09bf0a416 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL
registers") introduced a regression where existing programs stopped
loading due to reaching the verifier's maximum complexity limit,
whereas prior to this commit they were loading just fine; the affected
program has roughly 2k instructions.
What was found is that state pruning couldn't be performed effectively
anymore due to mismatches of the verifier's register state, in particular
in the id tracking. It doesn't mean that
57a09bf0a416 is incorrect per
se, but rather that verifier needs to perform a lot more work for the
same program with regards to involved map lookups.
Since commit
57a09bf0a416 is only about tracking registers with type
PTR_TO_MAP_VALUE_OR_NULL, the id is only needed to follow registers
until they are promoted through pattern matching with a NULL check to
either PTR_TO_MAP_VALUE or UNKNOWN_VALUE type. After that point, the
id becomes irrelevant for the transitioned types.
For UNKNOWN_VALUE, id is already reset to 0 via mark_reg_unknown_value(),
but not so for PTR_TO_MAP_VALUE where id is becoming stale. It's even
transferred further into other types that don't make use of it. Among
others, one example is where UNKNOWN_VALUE is set on function call
return with RET_INTEGER return type.
states_equal() will then fall through the memcmp() on register state;
note that the second memcmp() uses offsetofend(), so the id is part of
that since
d2a4dd37f6b4 ("bpf: fix state equivalence"). But the bisect
pointed already to
57a09bf0a416, where we really reach beyond complexity
limit. What I found was that states_equal() often failed in this
case due to id mismatches in spilled regs with registers in type
PTR_TO_MAP_VALUE. Unlike non-spilled regs, spilled regs just perform
a memcmp() on their reg state and don't have any other optimizations
in place, therefore also id was relevant in this case for making a
pruning decision.
We can safely reset id to 0 as well when converting to PTR_TO_MAP_VALUE.
For the affected program, it resulted in a ~17 fold reduction of
complexity and let the program load fine again. Selftest suite also
runs fine. The only other place where env->id_gen is used currently is
through direct packet access, but for these cases id is long living, thus
a different scenario.
Also, the current logic in mark_map_regs() is not fully correct when
marking NULL branch with UNKNOWN_VALUE. We need to cache the destination
reg's id in any case. Otherwise, once we marked that reg as UNKNOWN_VALUE,
it's id is reset and any subsequent registers that hold the original id
and are of type PTR_TO_MAP_VALUE_OR_NULL won't be marked UNKNOWN_VALUE
anymore, since mark_map_reg() reuses the uncached regs[regno].id that
was just overridden. Note, we don't need to cache it outside of
mark_map_regs(), since it's called once on this_branch and the other
time on other_branch, which are both two independent verifier states.
A test case for this is added here, too.
Fixes:
57a09bf0a416 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alexei Starovoitov [Wed, 7 Dec 2016 18:57:59 +0000 (10:57 -0800)]
bpf: fix state equivalence
[ Upstream commit
d2a4dd37f6b41fbcad76efbf63124eb3126c66fe ]
Commmits
57a09bf0a416 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers")
and
484611357c19 ("bpf: allow access into map value arrays") by themselves
are correct, but in combination they make state equivalence ignore 'id' field
of the register state which can lead to accepting invalid program.
Fixes:
57a09bf0a416 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers")
Fixes:
484611357c19 ("bpf: allow access into map value arrays")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Thomas Graf [Tue, 18 Oct 2016 17:51:19 +0000 (19:51 +0200)]
bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers
[ Upstream commit
57a09bf0a416700676e77102c28f9cfcb48267e0 ]
A BPF program is required to check the return register of a
map_elem_lookup() call before accessing memory. The verifier keeps
track of this by converting the type of the result register from
PTR_TO_MAP_VALUE_OR_NULL to PTR_TO_MAP_VALUE after a conditional
jump ensures safety. This check is currently exclusively performed
for the result register 0.
In the event the compiler reorders instructions, BPF_MOV64_REG
instructions may be moved before the conditional jump which causes
them to keep their type PTR_TO_MAP_VALUE_OR_NULL to which the
verifier objects when the register is accessed:
0: (b7) r1 = 10
1: (7b) *(u64 *)(r10 -8) = r1
2: (bf) r2 = r10
3: (07) r2 += -8
4: (18) r1 = 0x59c00000
6: (85) call 1
7: (bf) r4 = r0
8: (15) if r0 == 0x0 goto pc+1
R0=map_value(ks=8,vs=8) R4=map_value_or_null(ks=8,vs=8) R10=fp
9: (7a) *(u64 *)(r4 +0) = 0
R4 invalid mem access 'map_value_or_null'
This commit extends the verifier to keep track of all identical
PTR_TO_MAP_VALUE_OR_NULL registers after a map_elem_lookup() by
assigning them an ID and then marking them all when the conditional
jump is observed.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Hannes Frederic Sowa [Sun, 12 Mar 2017 23:01:30 +0000 (00:01 +0100)]
dccp: fix memory leak during tear-down of unsuccessful connection request
[ Upstream commit
72ef9c4125c7b257e3a714d62d778ab46583d6a3 ]
This patch fixes a memory leak, which happens if the connection request
is not fulfilled between parsing the DCCP options and handling the SYN
(because e.g. the backlog is full), because we forgot to free the
list of ack vectors.
Reported-by: Jianwen Ji <jiji@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Hannes Frederic Sowa [Sun, 12 Mar 2017 23:00:26 +0000 (00:00 +0100)]
tun: fix premature POLLOUT notification on tun devices
[ Upstream commit
b20e2d54789c6acbf6bd0efdbec2cf5fa4d90ef1 ]
aszlig observed failing ssh tunnels (-w) during initialization since
commit
cc9da6cc4f56e0 ("ipv6: addrconf: use stable address generator for
ARPHRD_NONE"). We already had reports that the mentioned commit breaks
Juniper VPN connections. I can't clearly say that the Juniper VPN client
has the same problem, but it is worth a try to hint to this patch.
Because of the early generation of link local addresses, the kernel now
can start asking for routers on the local subnet much earlier than usual.
Those router solicitation packets arrive inside the ssh channels and
should be transmitted to the tun fd before the configuration scripts
might have upped the interface and made it ready for transmission.
ssh polls on the interface and receives back a POLL_OUT. It tries to send
the earily router solicitation packet to the tun interface. Unfortunately
it hasn't been up'ed yet by config scripts, thus failing with -EIO. ssh
doesn't retry again and considers the tun interface broken forever.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=121131
Fixes:
cc9da6cc4f56 ("ipv6: addrconf: use stable address generator for ARPHRD_NONE")
Cc: Bjørn Mork <bjorn@mork.no>
Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Reported-by: Jonas Lippuner <jonas@lippuner.ca>
Cc: Jonas Lippuner <jonas@lippuner.ca>
Reported-by: aszlig <aszlig@redmoonstudios.org>
Cc: aszlig <aszlig@redmoonstudios.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jon Maxwell [Fri, 10 Mar 2017 05:40:33 +0000 (16:40 +1100)]
dccp/tcp: fix routing redirect race
[ Upstream commit
45caeaa5ac0b4b11784ac6f932c0ad4c6b67cda0 ]
As Eric Dumazet pointed out this also needs to be fixed in IPv6.
v2: Contains the IPv6 tcp/Ipv6 dccp patches as well.
We have seen a few incidents lately where a dst_enty has been freed
with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
dst_entry. If the conditions/timings are right a crash then ensues when the
freed dst_entry is referenced later on. A Common crashing back trace is:
#8 [] page_fault at
ffffffff8163e648
[exception RIP: __tcp_ack_snd_check+74]
.
.
#9 [] tcp_rcv_established at
ffffffff81580b64
#10 [] tcp_v4_do_rcv at
ffffffff8158b54a
#11 [] tcp_v4_rcv at
ffffffff8158cd02
#12 [] ip_local_deliver_finish at
ffffffff815668f4
#13 [] ip_local_deliver at
ffffffff81566bd9
#14 [] ip_rcv_finish at
ffffffff8156656d
#15 [] ip_rcv at
ffffffff81566f06
#16 [] __netif_receive_skb_core at
ffffffff8152b3a2
#17 [] __netif_receive_skb at
ffffffff8152b608
#18 [] netif_receive_skb at
ffffffff8152b690
#19 [] vmxnet3_rq_rx_complete at
ffffffffa015eeaf [vmxnet3]
#20 [] vmxnet3_poll_rx_only at
ffffffffa015f32a [vmxnet3]
#21 [] net_rx_action at
ffffffff8152bac2
#22 [] __do_softirq at
ffffffff81084b4f
#23 [] call_softirq at
ffffffff8164845c
#24 [] do_softirq at
ffffffff81016fc5
#25 [] irq_exit at
ffffffff81084ee5
#26 [] do_IRQ at
ffffffff81648ff8
Of course it may happen with other NIC drivers as well.
It's found the freed dst_entry here:
224 static bool tcp_in_quickack_mode(struct sock *sk)↩
225 {↩
226 ▹ const struct inet_connection_sock *icsk = inet_csk(sk);↩
227 ▹ const struct dst_entry *dst = __sk_dst_get(sk);↩
228 ↩
229 ▹ return (dst && dst_metric(dst, RTAX_QUICKACK)) ||↩
230 ▹ ▹ (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);↩
231 }↩
But there are other backtraces attributed to the same freed dst_entry in
netfilter code as well.
All the vmcores showed 2 significant clues:
- Remote hosts behind the default gateway had always been redirected to a
different gateway. A rtable/dst_entry will be added for that host. Making
more dst_entrys with lower reference counts. Making this more probable.
- All vmcores showed a postitive LockDroppedIcmps value, e.g:
LockDroppedIcmps 267
A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
regardless of whether user space has the socket locked. This can result in a
race condition where the same dst_entry cached in sk->sk_dst_entry can be
decremented twice for the same socket via:
do_redirect()->__sk_dst_check()-> dst_release().
Which leads to the dst_entry being prematurely freed with another socket
pointing to it via sk->sk_dst_cache and a subsequent crash.
To fix this skip do_redirect() if usespace has the socket locked. Instead let
the redirect take place later when user space does not have the socket
locked.
The dccp/IPv6 code is very similar in this respect, so fixing it there too.
As Eric Garver pointed out the following commit now invalidates routes. Which
can set the dst->obsolete flag so that ipv4_dst_check() returns null and
triggers the dst_release().
Fixes:
ceb3320610d6 ("ipv4: Kill routes during PMTU/redirect updates.")
Cc: Eric Garver <egarver@redhat.com>
Cc: Hannes Sowa <hsowa@redhat.com>
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Florian Westphal [Mon, 13 Mar 2017 16:38:17 +0000 (17:38 +0100)]
bridge: drop netfilter fake rtable unconditionally
[ Upstream commit
a13b2082ece95247779b9995c4e91b4246bed023 ]
Andreas reports kernel oops during rmmod of the br_netfilter module.
Hannes debugged the oops down to a NULL rt6info->rt6i_indev.
Problem is that br_netfilter has the nasty concept of adding a fake
rtable to skb->dst; this happens in a br_netfilter prerouting hook.
A second hook (in bridge LOCAL_IN) is supposed to remove these again
before the skb is handed up the stack.
However, on module unload hooks get unregistered which means an
skb could traverse the prerouting hook that attaches the fake_rtable,
while the 'fake rtable remove' hook gets removed from the hooklist
immediately after.
Fixes:
34666d467cbf1e2e3c7 ("netfilter: bridge: move br_netfilter out of the core")
Reported-by: Andreas Karis <akaris@redhat.com>
Debugged-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Florian Westphal [Mon, 13 Mar 2017 15:24:28 +0000 (16:24 +0100)]
ipv6: avoid write to a possibly cloned skb
[ Upstream commit
79e49503efe53a8c51d8b695bedc8a346c5e4a87 ]
ip6_fragment, in case skb has a fraglist, checks if the
skb is cloned. If it is, it will move to the 'slow path' and allocates
new skbs for each fragment.
However, right before entering the slowpath loop, it updates the
nexthdr value of the last ipv6 extension header to NEXTHDR_FRAGMENT,
to account for the fragment header that will be inserted in the new
ipv6-fragment skbs.
In case original skb is cloned this munges nexthdr value of another
skb. Avoid this by doing the nexthdr update for each of the new fragment
skbs separately.
This was observed with tcpdump on a bridge device where netfilter ipv6
reassembly is active: tcpdump shows malformed fragment headers as
the l4 header (icmpv6, tcp, etc). is decoded as a fragment header.
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Reported-by: Andreas Karis <akaris@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sabrina Dubroca [Mon, 13 Mar 2017 12:28:09 +0000 (13:28 +0100)]
ipv6: make ECMP route replacement less greedy
[ Upstream commit
67e194007be08d071294456274dd53e0a04fdf90 ]
Commit
27596472473a ("ipv6: fix ECMP route replacement") introduced a
loop that removes all siblings of an ECMP route that is being
replaced. However, this loop doesn't stop when it has replaced
siblings, and keeps removing other routes with a higher metric.
We also end up triggering the WARN_ON after the loop, because after
this nsiblings < 0.
Instead, stop the loop when we have taken care of all routes with the
same metric as the route being replaced.
Reproducer:
===========
#!/bin/sh
ip netns add ns1
ip netns add ns2
ip -net ns1 link set lo up
for x in 0 1 2 ; do
ip link add veth$x netns ns2 type veth peer name eth$x netns ns1
ip -net ns1 link set eth$x up
ip -net ns2 link set veth$x up
done
ip -net ns1 -6 r a 2000::/64 nexthop via fe80::0 dev eth0 \
nexthop via fe80::1 dev eth1 nexthop via fe80::2 dev eth2
ip -net ns1 -6 r a 2000::/64 via fe80::42 dev eth0 metric 256
ip -net ns1 -6 r a 2000::/64 via fe80::43 dev eth0 metric 2048
echo "before replace, 3 routes"
ip -net ns1 -6 r | grep -v '^fe80\|^ff00'
echo
ip -net ns1 -6 r c 2000::/64 nexthop via fe80::4 dev eth0 \
nexthop via fe80::5 dev eth1 nexthop via fe80::6 dev eth2
echo "after replace, only 2 routes, metric 2048 is gone"
ip -net ns1 -6 r | grep -v '^fe80\|^ff00'
Fixes:
27596472473a ("ipv6: fix ECMP route replacement")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
David Ahern [Fri, 10 Mar 2017 22:11:39 +0000 (14:11 -0800)]
mpls: Do not decrement alive counter for unregister events
[ Upstream commit
79099aab38c8f5c746748b066ae74ba984fe2cc8 ]
Multipath routes can be rendered usesless when a device in one of the
paths is deleted. For example:
$ ip -f mpls ro ls
100
nexthop as to 200 via inet 172.16.2.2 dev virt12
nexthop as to 300 via inet 172.16.3.2 dev br0
101
nexthop as to 201 via inet6 2000:2::2 dev virt12
nexthop as to 301 via inet6 2000:3::2 dev br0
$ ip li del br0
When br0 is deleted the other hop is not considered in
mpls_select_multipath because of the alive check -- rt_nhn_alive
is 0.
rt_nhn_alive is decremented once in mpls_ifdown when the device is taken
down (NETDEV_DOWN) and again when it is deleted (NETDEV_UNREGISTER). For
a 2 hop route, deleting one device drops the alive count to 0. Since
devices are taken down before unregistering, the decrement on
NETDEV_UNREGISTER is redundant.
Fixes:
c89359a42e2a4 ("mpls: support for dead routes")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
David Ahern [Fri, 10 Mar 2017 17:46:15 +0000 (09:46 -0800)]
mpls: Send route delete notifications when router module is unloaded
[ Upstream commit
e37791ec1ad785b59022ae211f63a16189bacebf ]
When the mpls_router module is unloaded, mpls routes are deleted but
notifications are not sent to userspace leaving userspace caches
out of sync. Add the call to mpls_notify_route in mpls_net_exit as
routes are freed.
Fixes:
0189197f44160 ("mpls: Basic routing support")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Etienne Noss [Fri, 10 Mar 2017 15:55:32 +0000 (16:55 +0100)]
act_connmark: avoid crashing on malformed nlattrs with null parms
[ Upstream commit
52491c7607c5527138095edf44c53169dc1ddb82 ]
tcf_connmark_init does not check in its configuration if TCA_CONNMARK_PARMS
is set, resulting in a null pointer dereference when trying to access it.
[501099.043007] BUG: unable to handle kernel NULL pointer dereference at
0000000000000004
[501099.043039] IP: [<
ffffffffc10c60fb>] tcf_connmark_init+0x8b/0x180 [act_connmark]
...
[501099.044334] Call Trace:
[501099.044345] [<
ffffffffa47270e8>] ? tcf_action_init_1+0x198/0x1b0
[501099.044363] [<
ffffffffa47271b0>] ? tcf_action_init+0xb0/0x120
[501099.044380] [<
ffffffffa47250a4>] ? tcf_exts_validate+0xc4/0x110
[501099.044398] [<
ffffffffc0f5fa97>] ? u32_set_parms+0xa7/0x270 [cls_u32]
[501099.044417] [<
ffffffffc0f60bf0>] ? u32_change+0x680/0x87b [cls_u32]
[501099.044436] [<
ffffffffa4725d1d>] ? tc_ctl_tfilter+0x4dd/0x8a0
[501099.044454] [<
ffffffffa44a23a1>] ? security_capable+0x41/0x60
[501099.044471] [<
ffffffffa470ca01>] ? rtnetlink_rcv_msg+0xe1/0x220
[501099.044490] [<
ffffffffa470c920>] ? rtnl_newlink+0x870/0x870
[501099.044507] [<
ffffffffa472cc61>] ? netlink_rcv_skb+0xa1/0xc0
[501099.044524] [<
ffffffffa47073f4>] ? rtnetlink_rcv+0x24/0x30
[501099.044541] [<
ffffffffa472c634>] ? netlink_unicast+0x184/0x230
[501099.044558] [<
ffffffffa472c9d8>] ? netlink_sendmsg+0x2f8/0x3b0
[501099.044576] [<
ffffffffa46d8880>] ? sock_sendmsg+0x30/0x40
[501099.044592] [<
ffffffffa46d8e03>] ? SYSC_sendto+0xd3/0x150
[501099.044608] [<
ffffffffa425fda1>] ? __do_page_fault+0x2d1/0x510
[501099.044626] [<
ffffffffa47fbd7b>] ? system_call_fast_compare_end+0xc/0x9b
Fixes:
22a5dc0e5e3e ("net: sched: Introduce connmark action")
Signed-off-by: Étienne Noss <etienne.noss@wifirst.fr>
Signed-off-by: Victorien Molle <victorien.molle@wifirst.fr>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dmitry V. Levin [Tue, 7 Mar 2017 20:50:50 +0000 (23:50 +0300)]
uapi: fix linux/packet_diag.h userspace compilation error
[ Upstream commit
745cb7f8a5de0805cade3de3991b7a95317c7c73 ]
Replace MAX_ADDR_LEN with its numeric value to fix the following
linux/packet_diag.h userspace compilation error:
/usr/include/linux/packet_diag.h:67:17: error: 'MAX_ADDR_LEN' undeclared here (not in a function)
__u8 pdmc_addr[MAX_ADDR_LEN];
This is not the first case in the UAPI where the numeric value
of MAX_ADDR_LEN is used instead of symbolic one, uapi/linux/if_link.h
already does the same:
$ grep MAX_ADDR_LEN include/uapi/linux/if_link.h
__u8 mac[32]; /* MAX_ADDR_LEN */
There are no UAPI headers besides these two that use MAX_ADDR_LEN.
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Acked-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Paolo Abeni [Tue, 7 Mar 2017 17:33:31 +0000 (18:33 +0100)]
net/tunnel: set inner protocol in network gro hooks
[ Upstream commit
294acf1c01bace5cea5d30b510504238bf5f7c25 ]
The gso code of several tunnels type (gre and udp tunnels)
takes for granted that the skb->inner_protocol is properly
initialized and drops the packet elsewhere.
On the forwarding path no one is initializing such field,
so gro encapsulated packets are dropped on forward.
Since commit
38720352412a ("gre: Use inner_proto to obtain
inner header protocol"), this can be reproduced when the
encapsulated packets use gre as the tunneling protocol.
The issue happens also with vxlan and geneve tunnels since
commit
8bce6d7d0d1e ("udp: Generalize skb_udp_segment"), if the
forwarding host's ingress nic has h/w offload for such tunnel
and a vxlan/geneve device is configured on top of it, regardless
of the configured peer address and vni.
To address the issue, this change initialize the inner_protocol
field for encapsulated packets in both ipv4 and ipv6 gro complete
callbacks.
Fixes:
38720352412a ("gre: Use inner_proto to obtain inner header protocol")
Fixes:
8bce6d7d0d1e ("udp: Generalize skb_udp_segment")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
David Ahern [Mon, 6 Mar 2017 16:53:04 +0000 (08:53 -0800)]
vrf: Fix use-after-free in vrf_xmit
[ Upstream commit
f7887d40e541f74402df0684a1463c0a0bb68c68 ]
KASAN detected a use-after-free:
[ 269.467067] BUG: KASAN: use-after-free in vrf_xmit+0x7f1/0x827 [vrf] at addr
ffff8800350a21c0
[ 269.467067] Read of size 4 by task ssh/1879
[ 269.467067] CPU: 1 PID: 1879 Comm: ssh Not tainted 4.10.0+ #249
[ 269.467067] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 269.467067] Call Trace:
[ 269.467067] dump_stack+0x81/0xb6
[ 269.467067] kasan_object_err+0x21/0x78
[ 269.467067] kasan_report+0x2f7/0x450
[ 269.467067] ? vrf_xmit+0x7f1/0x827 [vrf]
[ 269.467067] ? ip_output+0xa4/0xdb
[ 269.467067] __asan_load4+0x6b/0x6d
[ 269.467067] vrf_xmit+0x7f1/0x827 [vrf]
...
Which corresponds to the skb access after xmit handling. Fix by saving
skb->len and using the saved value to update stats.
Fixes:
193125dbd8eb2 ("net: Introduce VRF device driver")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Eric Dumazet [Sun, 5 Mar 2017 18:52:16 +0000 (10:52 -0800)]
dccp: fix use-after-free in dccp_feat_activate_values
[ Upstream commit
62f8f4d9066c1c6f2474845d1ca7e2891f2ae3fd ]
Dmitry reported crashes in DCCP stack [1]
Problem here is that when I got rid of listener spinlock, I missed the
fact that DCCP stores a complex state in struct dccp_request_sock,
while TCP does not.
Since multiple cpus could access it at the same time, we need to add
protection.
[1]
BUG: KASAN: use-after-free in dccp_feat_activate_values+0x967/0xab0
net/dccp/feat.c:1541 at addr
ffff88003713be68
Read of size 8 by task syz-executor2/8457
CPU: 2 PID: 8457 Comm: syz-executor2 Not tainted 4.10.0-rc7+ #127
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:15 [inline]
dump_stack+0x292/0x398 lib/dump_stack.c:51
kasan_object_err+0x1c/0x70 mm/kasan/report.c:162
print_address_description mm/kasan/report.c:200 [inline]
kasan_report_error mm/kasan/report.c:289 [inline]
kasan_report.part.1+0x20e/0x4e0 mm/kasan/report.c:311
kasan_report mm/kasan/report.c:332 [inline]
__asan_report_load8_noabort+0x29/0x30 mm/kasan/report.c:332
dccp_feat_activate_values+0x967/0xab0 net/dccp/feat.c:1541
dccp_create_openreq_child+0x464/0x610 net/dccp/minisocks.c:121
dccp_v6_request_recv_sock+0x1f6/0x1960 net/dccp/ipv6.c:457
dccp_check_req+0x335/0x5a0 net/dccp/minisocks.c:186
dccp_v6_rcv+0x69e/0x1d00 net/dccp/ipv6.c:711
ip6_input_finish+0x46d/0x17a0 net/ipv6/ip6_input.c:279
NF_HOOK include/linux/netfilter.h:257 [inline]
ip6_input+0xdb/0x590 net/ipv6/ip6_input.c:322
dst_input include/net/dst.h:507 [inline]
ip6_rcv_finish+0x289/0x890 net/ipv6/ip6_input.c:69
NF_HOOK include/linux/netfilter.h:257 [inline]
ipv6_rcv+0x12ec/0x23d0 net/ipv6/ip6_input.c:203
__netif_receive_skb_core+0x1ae5/0x3400 net/core/dev.c:4190
__netif_receive_skb+0x2a/0x170 net/core/dev.c:4228
process_backlog+0xe5/0x6c0 net/core/dev.c:4839
napi_poll net/core/dev.c:5202 [inline]
net_rx_action+0xe70/0x1900 net/core/dev.c:5267
__do_softirq+0x2fb/0xb7d kernel/softirq.c:284
do_softirq_own_stack+0x1c/0x30 arch/x86/entry/entry_64.S:902
</IRQ>
do_softirq.part.17+0x1e8/0x230 kernel/softirq.c:328
do_softirq kernel/softirq.c:176 [inline]
__local_bh_enable_ip+0x1f2/0x200 kernel/softirq.c:181
local_bh_enable include/linux/bottom_half.h:31 [inline]
rcu_read_unlock_bh include/linux/rcupdate.h:971 [inline]
ip6_finish_output2+0xbb0/0x23d0 net/ipv6/ip6_output.c:123
ip6_finish_output+0x302/0x960 net/ipv6/ip6_output.c:148
NF_HOOK_COND include/linux/netfilter.h:246 [inline]
ip6_output+0x1cb/0x8d0 net/ipv6/ip6_output.c:162
ip6_xmit+0xcdf/0x20d0 include/net/dst.h:501
inet6_csk_xmit+0x320/0x5f0 net/ipv6/inet6_connection_sock.c:179
dccp_transmit_skb+0xb09/0x1120 net/dccp/output.c:141
dccp_xmit_packet+0x215/0x760 net/dccp/output.c:280
dccp_write_xmit+0x168/0x1d0 net/dccp/output.c:362
dccp_sendmsg+0x79c/0xb10 net/dccp/proto.c:796
inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
sock_sendmsg_nosec net/socket.c:635 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:645
SYSC_sendto+0x660/0x810 net/socket.c:1687
SyS_sendto+0x40/0x50 net/socket.c:1655
entry_SYSCALL_64_fastpath+0x1f/0xc2
RIP: 0033:0x4458b9
RSP: 002b:
00007f8ceb77bb58 EFLAGS:
00000282 ORIG_RAX:
000000000000002c
RAX:
ffffffffffffffda RBX:
0000000000000017 RCX:
00000000004458b9
RDX:
0000000000000023 RSI:
0000000020e60000 RDI:
0000000000000017
RBP:
00000000006e1b90 R08:
00000000200f9fe1 R09:
0000000000000020
R10:
0000000000008010 R11:
0000000000000282 R12:
00000000007080a8
R13:
0000000000000000 R14:
00007f8ceb77c9c0 R15:
00007f8ceb77c700
Object at
ffff88003713be50, in cache kmalloc-64 size: 64
Allocated:
PID = 8446
save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
save_stack+0x43/0xd0 mm/kasan/kasan.c:502
set_track mm/kasan/kasan.c:514 [inline]
kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:605
kmem_cache_alloc_trace+0x82/0x270 mm/slub.c:2738
kmalloc include/linux/slab.h:490 [inline]
dccp_feat_entry_new+0x214/0x410 net/dccp/feat.c:467
dccp_feat_push_change+0x38/0x220 net/dccp/feat.c:487
__feat_register_sp+0x223/0x2f0 net/dccp/feat.c:741
dccp_feat_propagate_ccid+0x22b/0x2b0 net/dccp/feat.c:949
dccp_feat_server_ccid_dependencies+0x1b3/0x250 net/dccp/feat.c:1012
dccp_make_response+0x1f1/0xc90 net/dccp/output.c:423
dccp_v6_send_response+0x4ec/0xc20 net/dccp/ipv6.c:217
dccp_v6_conn_request+0xaba/0x11b0 net/dccp/ipv6.c:377
dccp_rcv_state_process+0x51e/0x1650 net/dccp/input.c:606
dccp_v6_do_rcv+0x213/0x350 net/dccp/ipv6.c:632
sk_backlog_rcv include/net/sock.h:893 [inline]
__sk_receive_skb+0x36f/0xcc0 net/core/sock.c:479
dccp_v6_rcv+0xba5/0x1d00 net/dccp/ipv6.c:742
ip6_input_finish+0x46d/0x17a0 net/ipv6/ip6_input.c:279
NF_HOOK include/linux/netfilter.h:257 [inline]
ip6_input+0xdb/0x590 net/ipv6/ip6_input.c:322
dst_input include/net/dst.h:507 [inline]
ip6_rcv_finish+0x289/0x890 net/ipv6/ip6_input.c:69
NF_HOOK include/linux/netfilter.h:257 [inline]
ipv6_rcv+0x12ec/0x23d0 net/ipv6/ip6_input.c:203
__netif_receive_skb_core+0x1ae5/0x3400 net/core/dev.c:4190
__netif_receive_skb+0x2a/0x170 net/core/dev.c:4228
process_backlog+0xe5/0x6c0 net/core/dev.c:4839
napi_poll net/core/dev.c:5202 [inline]
net_rx_action+0xe70/0x1900 net/core/dev.c:5267
__do_softirq+0x2fb/0xb7d kernel/softirq.c:284
Freed:
PID = 15
save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
save_stack+0x43/0xd0 mm/kasan/kasan.c:502
set_track mm/kasan/kasan.c:514 [inline]
kasan_slab_free+0x73/0xc0 mm/kasan/kasan.c:578
slab_free_hook mm/slub.c:1355 [inline]
slab_free_freelist_hook mm/slub.c:1377 [inline]
slab_free mm/slub.c:2954 [inline]
kfree+0xe8/0x2b0 mm/slub.c:3874
dccp_feat_entry_destructor.part.4+0x48/0x60 net/dccp/feat.c:418
dccp_feat_entry_destructor net/dccp/feat.c:416 [inline]
dccp_feat_list_pop net/dccp/feat.c:541 [inline]
dccp_feat_activate_values+0x57f/0xab0 net/dccp/feat.c:1543
dccp_create_openreq_child+0x464/0x610 net/dccp/minisocks.c:121
dccp_v6_request_recv_sock+0x1f6/0x1960 net/dccp/ipv6.c:457
dccp_check_req+0x335/0x5a0 net/dccp/minisocks.c:186
dccp_v6_rcv+0x69e/0x1d00 net/dccp/ipv6.c:711
ip6_input_finish+0x46d/0x17a0 net/ipv6/ip6_input.c:279
NF_HOOK include/linux/netfilter.h:257 [inline]
ip6_input+0xdb/0x590 net/ipv6/ip6_input.c:322
dst_input include/net/dst.h:507 [inline]
ip6_rcv_finish+0x289/0x890 net/ipv6/ip6_input.c:69
NF_HOOK include/linux/netfilter.h:257 [inline]
ipv6_rcv+0x12ec/0x23d0 net/ipv6/ip6_input.c:203
__netif_receive_skb_core+0x1ae5/0x3400 net/core/dev.c:4190
__netif_receive_skb+0x2a/0x170 net/core/dev.c:4228
process_backlog+0xe5/0x6c0 net/core/dev.c:4839
napi_poll net/core/dev.c:5202 [inline]
net_rx_action+0xe70/0x1900 net/core/dev.c:5267
__do_softirq+0x2fb/0xb7d kernel/softirq.c:284
Memory state around the buggy address:
ffff88003713bd00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88003713bd80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>
ffff88003713be00: fc fc fc fc fc fc fc fc fc fc fb fb fb fb fb fb
^
Fixes:
079096f103fa ("tcp/dccp: install syn_recv requests into ehash table")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alexey Khoroshilov [Sun, 5 Mar 2017 00:01:55 +0000 (03:01 +0300)]
net/sched: act_skbmod: remove unneeded rcu_read_unlock in tcf_skbmod_dump
[ Upstream commit
6c4dc75c251721f517e9daeb5370ea606b5b35ce ]
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Eric Dumazet [Sat, 4 Mar 2017 05:01:03 +0000 (21:01 -0800)]
net: fix socket refcounting in skb_complete_tx_timestamp()
[ Upstream commit
9ac25fc063751379cb77434fef9f3b088cd3e2f7 ]
TX skbs do not necessarily hold a reference on skb->sk->sk_refcnt
By the time TX completion happens, sk_refcnt might be already 0.
sock_hold()/sock_put() would then corrupt critical state, like
sk_wmem_alloc and lead to leaks or use after free.
Fixes:
62bccb8cdb69 ("net-timestamp: Make the clone operation stand-alone from phy timestamping")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Eric Dumazet [Sat, 4 Mar 2017 05:01:02 +0000 (21:01 -0800)]
net: fix socket refcounting in skb_complete_wifi_ack()
[ Upstream commit
dd4f10722aeb10f4f582948839f066bebe44e5fb ]
TX skbs do not necessarily hold a reference on skb->sk->sk_refcnt
By the time TX completion happens, sk_refcnt might be already 0.
sock_hold()/sock_put() would then corrupt critical state, like
sk_wmem_alloc.
Fixes:
bf7fa551e0ce ("mac80211: Resolve sk_refcnt/sk_wmem_alloc issue in wifi ack path")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>