Andrew F. Davis [Tue, 2 Aug 2016 21:07:09 +0000 (14:07 -0700)]
w1: add helper macro module_w1_family
The helper macro module_w1_family can be used in module drivers that
only register a w1 driver in their module init functions. Add this
macro and use it in all applicable drivers.
Link: http://lkml.kernel.org/r/20160531204313.20979-2-afd@ti.com
Signed-off-by: Andrew F. Davis <afd@ti.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew F. Davis [Tue, 2 Aug 2016 21:07:06 +0000 (14:07 -0700)]
w1: remove need for ida and use PLATFORM_DEVID_AUTO
PLATFORM_DEVID_AUTO can be used to have the platform core assign a
unique ID instead of manually creating one with IDA. Do this in all
applicable drivers.
Link: http://lkml.kernel.org/r/20160531204313.20979-1-afd@ti.com
Signed-off-by: Andrew F. Davis <afd@ti.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexandre Bounine [Tue, 2 Aug 2016 21:07:03 +0000 (14:07 -0700)]
rapidio/switches: add driver for IDT gen3 switches
Add RapidIO switch driver for IDT Gen3 switch devices: RXS1632 and
RXS2448.
[alexandre.bounine@idt.com: fixup for original driver patch]
Link: http://lkml.kernel.org/r/1469137596-18241-1-git-send-email-alexandre.bounine@idt.com
Link: http://lkml.kernel.org/r/1469125134-16523-14-git-send-email-alexandre.bounine@idt.com
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Tested-by: Barry Wood <barry.wood@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexandre Bounine [Tue, 2 Aug 2016 21:07:00 +0000 (14:07 -0700)]
powerpc/fsl_rio: apply changes for RIO spec rev 3
- Remove check for parallel PHY
- Set LP-Serial Register Map type
[akpm@linux-foundation.org: fix build]
[alexandre.bounine@idt.com: fix build fix]
Link: http://lkml.kernel.org/r/20160802184932.2755-1-alexandre.bounine@idt.com
Link: http://lkml.kernel.org/r/1469125134-16523-13-git-send-email-alexandre.bounine@idt.com
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexandre Bounine [Tue, 2 Aug 2016 21:06:57 +0000 (14:06 -0700)]
rapidio: modify for rev.3 specification changes
Implement changes made in RapidIO specification rev.3 to LP-Serial Physical
Layer register definitions:
- use per-port register offset calculations based on LP-Serial Extended
Features Block (EFB) Register Map type (I or II) with different
per-port offset step (0x20 vs 0x40 respectfully).
- remove deprecated Parallel Physical layer definitions and related
code.
[alexandre.bounine@idt.com: fix DocBook warning for gen3 update]
Link: http://lkml.kernel.org/r/1469191173-19338-1-git-send-email-alexandre.bounine@idt.com
Link: http://lkml.kernel.org/r/1469125134-16523-12-git-send-email-alexandre.bounine@idt.com
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Tested-by: Barry Wood <barry.wood@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexandre Bounine [Tue, 2 Aug 2016 21:06:54 +0000 (14:06 -0700)]
rapidio: change inbound window size type to u64
Current definition of map_inb() mport operations callback uses u32 type
to specify required inbound window (IBW) size. This is limiting factor
because existing hardware - tsi721 and fsl_rio, both support IBW size up
to 16GB.
Changing type of size parameter to u64 to allow IBW size configurations
larger than 4GB.
[alexandre.bounine@idt.com: remove compiler warning about size of constant]
Link: http://lkml.kernel.org/r/20160802184856.2566-1-alexandre.bounine@idt.com
Link: http://lkml.kernel.org/r/1469125134-16523-11-git-send-email-alexandre.bounine@idt.com
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexandre Bounine [Tue, 2 Aug 2016 21:06:52 +0000 (14:06 -0700)]
rapidio/idt_gen2: fix locking warning
Fix lockdep warning during device probing: move sysfs initialization out
of code protected by a spin lock.
Link: http://lkml.kernel.org/r/1469125134-16523-10-git-send-email-alexandre.bounine@idt.com
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexandre Bounine [Tue, 2 Aug 2016 21:06:49 +0000 (14:06 -0700)]
rapidio: fix error handling in mbox request/release functions
Add checking for error code returned by HW-specific mbox open routines.
Ensure that resources are properly release if failed.
This patch is applicable to kernel versions starting from v2.6.15.
Link: http://lkml.kernel.org/r/1469125134-16523-9-git-send-email-alexandre.bounine@idt.com
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexandre Bounine [Tue, 2 Aug 2016 21:06:46 +0000 (14:06 -0700)]
rapidio/tsi721_dma: advance queue processing from transfer submit call
Add advancing transfer queue immediately from transfer submit call. DMA
performance improvement: This will start transfer without waiting for
'issue_pending' command if there is no DMA transfer in progress.
Link: http://lkml.kernel.org/r/1469125134-16523-8-git-send-email-alexandre.bounine@idt.com
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexandre Bounine [Tue, 2 Aug 2016 21:06:43 +0000 (14:06 -0700)]
rapidio/tsi721: add messaging mbox selector parameter
Add module parameter to allow load time configuration of available
RapidIO messaging mailboxes (MBOX1 - MBOX4).
Having a messaging MBOX selector mask allows to define which MBOXes are
controlled by the mport device driver and reserve some of them for
direct use by other drivers.
Link: http://lkml.kernel.org/r/1469125134-16523-7-git-send-email-alexandre.bounine@idt.com
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Tested-by: Barry Wood <barry.wood@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexandre Bounine [Tue, 2 Aug 2016 21:06:40 +0000 (14:06 -0700)]
rapidio/tsi721: add PCIe MRRS override parameter
Add PCIe Maximum Read Request Size (MRRS) adjustment parameter to allow
users to override configuration register value set during PCIe bus
initialization.
Performance of Tsi721 device as PCIe bus master can be improved if MRRS
is set to its maximum value (4096 bytes). Some platforms have
limitations for supported MRRS and therefore the default value should be
preserved, unless it is known that given platform supports full set of
MRRS values defined by PCI Express specification.
Link: http://lkml.kernel.org/r/1469125134-16523-6-git-send-email-alexandre.bounine@idt.com
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexandre Bounine [Tue, 2 Aug 2016 21:06:37 +0000 (14:06 -0700)]
rapidio/tsi721_dma: add channel mask and queue size parameters
Add module parameters to allow load time configuration of DMA channels.
Depending on application, performance of DMA data transfers can benefit
from adjusted sizes of buffer descriptor ring and/or transaction
requests queue.
Having HW DMA channel selector mask allows to define which channels
(from seven available) are controlled by the mport device driver and
reserve some of them for direct use by other drivers.
Link: http://lkml.kernel.org/r/1469125134-16523-5-git-send-email-alexandre.bounine@idt.com
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Tested-by: Barry Wood <barry.wood@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexandre Bounine [Tue, 2 Aug 2016 21:06:34 +0000 (14:06 -0700)]
rapidio: fix return value description for dma_prep functions
Update return value description for rio_dma_prep_... functions to
include error-valued pointer that can be returned by HW mport device
drivers. Return values from these functions must be checked using
IS_ERR_OR_NULL macro.
This patch is applicable to kernel versions starting from v4.6-rc1.
Link: http://lkml.kernel.org/r/1469125134-16523-4-git-send-email-alexandre.bounine@idt.com
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexandre Bounine [Tue, 2 Aug 2016 21:06:31 +0000 (14:06 -0700)]
rapidio/documentation: fix mangled paragraph in mport_cdev
Minor edits to correct parameter description.
This patch is applicable to kernel versions starting from v4.6.
Link: http://lkml.kernel.org/r/1469125134-16523-3-git-send-email-alexandre.bounine@idt.com
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Reported-by: Barry Wood <barry.wood@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Tue, 2 Aug 2016 21:06:28 +0000 (14:06 -0700)]
rapidio: remove unnecessary 0x prefixes before %pa extension uses
Patch series "RapidIO subsystem updates".
This set of patches contains RapidIO subsystem fixes and updates that
have been made since kernel v4.6. The most significant update brings
changes related to the latest revision of RapidIO specification
(rev.3.x) and introduction of next generation of RapidIO switches by IDT
(RXS1632 and RXS2448).
This patch (of 13):
This is RapidIO part of the original patch submitted by Joe Perches.
(see: https://lkml.org/lkml/2016/3/5/19)
Since commit
3cab1e711297 ("lib/vsprintf: refactor duplicate code
to special_hex_number()") %pa uses have been output with a 0x prefix.
These 0x prefixes in the formats are unnecessary.
Link: http://lkml.kernel.org/r/1469125134-16523-2-git-send-email-alexandre.bounine@idt.com
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexandre Bounine [Tue, 2 Aug 2016 21:06:25 +0000 (14:06 -0700)]
rapidio: add RapidIO channelized messaging driver
Add channelized messaging driver to support native RapidIO messaging
exchange between multiple senders/recipients on devices that use kernel
RapidIO subsystem services.
This device driver is the result of collaboration within the RapidIO.org
Software Task Group (STG) between Texas Instruments, Prodrive
Technologies, Nokia Networks, BAE and IDT. Additional input was
received from other members of RapidIO.org.
The objective was to create a character mode driver interface which
exposes messaging capabilities of RapidIO endpoint devices (mports)
directly to applications, in a manner that allows the numerous and
varied RapidIO implementations to interoperate.
This char mode device driver allows user-space applications to setup
messaging communication channels using single shared RapidIO messaging
mailbox.
By default this driver uses RapidIO MBOX_1 (MBOX_0 is reserved for use by
RIONET Ethernet emulation driver).
[weiyj.lk@gmail.com: rapidio/rio_cm: fix return value check in riocm_init()]
Link: http://lkml.kernel.org/r/1469198221-21970-1-git-send-email-alexandre.bounine@idt.com
Link: http://lkml.kernel.org/r/1468952862-18056-1-git-send-email-alexandre.bounine@idt.com
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Tested-by: Barry Wood <barry.wood@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
zhong jiang [Tue, 2 Aug 2016 21:06:22 +0000 (14:06 -0700)]
kexec: add restriction on kexec_load() segment sizes
I hit the following issue when run trinity in my system. The kernel is
3.4 version, but mainline has the same issue.
The root cause is that the segment size is too large so the kerenl
spends too long trying to allocate a page. Other cases will block until
the test case quits. Also, OOM conditions will occur.
Call Trace:
__alloc_pages_nodemask+0x14c/0x8f0
alloc_pages_current+0xaf/0x120
kimage_alloc_pages+0x10/0x60
kimage_alloc_control_pages+0x5d/0x270
machine_kexec_prepare+0xe5/0x6c0
? kimage_free_page_list+0x52/0x70
sys_kexec_load+0x141/0x600
? vfs_write+0x100/0x180
system_call_fastpath+0x16/0x1b
The patch changes sanity_check_segment_list() to verify that the usage by
all segments does not exceed half of memory.
[akpm@linux-foundation.org: fix for kexec-return-error-number-directly.patch, update comment]
Link: http://lkml.kernel.org/r/1469625474-53904-1-git-send-email-zhongjiang@huawei.com
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Suggested-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Petr Tesarik [Tue, 2 Aug 2016 21:06:19 +0000 (14:06 -0700)]
kexec: allow kdump with crash_kexec_post_notifiers
If a crash kernel is loaded, do not crash the running domain. This is
needed if the kernel is loaded with crash_kexec_post_notifiers, because
panic notifiers are run before __crash_kexec() in that case, and this
Xen hook prevents its being called later.
[akpm@linux-foundation.org: build fix: unconditionally include kexec.h]
Link: http://lkml.kernel.org/r/20160713122000.14969.99963.stgit@hananiah.suse.cz
Signed-off-by: Petr Tesarik <ptesarik@suse.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Petr Tesarik [Tue, 2 Aug 2016 21:06:16 +0000 (14:06 -0700)]
kexec: add a kexec_crash_loaded() function
Provide a wrapper function to be used by kernel code to check whether a
crash kernel is loaded. It returns the same value that can be seen in
/sys/kernel/kexec_crash_loaded by userspace programs.
I'm exporting the function, because it will be used by Xen, and it is
possible to compile Xen modules separately to enable the use of PV
drivers with unmodified bare-metal kernels.
Link: http://lkml.kernel.org/r/20160713121955.14969.69080.stgit@hananiah.suse.cz
Signed-off-by: Petr Tesarik <ptesarik@suse.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hidehiro Kawai [Tue, 2 Aug 2016 21:06:13 +0000 (14:06 -0700)]
kexec: use core_param for crash_kexec_post_notifiers boot option
crash_kexec_post_notifiers ia a boot option which controls whether the
1st kernel calls panic notifiers or not before booting the 2nd kernel.
However, there is no need to limit it to being modifiable only at boot
time. So, use core_param instead of early_param.
Link: http://lkml.kernel.org/r/20160705113327.5864.43139.stgit@softrs
Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Russell King [Tue, 2 Aug 2016 21:06:10 +0000 (14:06 -0700)]
ARM: kexec: fix kexec for Keystone 2
Provide kexec with the boot view of memory by overriding the normal
kexec translation functions added in a previous patch. We also need to
fix a call to memblock in machine_kexec_prepare() so that we provide it
with a running-view physical address rather than a boot- view physical
address.
Link: http://lkml.kernel.org/r/E1b8koa-0004Hl-Ey@rmk-PC.armlinux.org.uk
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Keerthy <j-keerthy@ti.com>
Cc: Pratyush Anand <panand@redhat.com>
Cc: Vitaly Andrianov <vitalya@ti.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Simon Horman <horms@verge.net.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vitaly Andrianov [Tue, 2 Aug 2016 21:06:07 +0000 (14:06 -0700)]
ARM: keystone: dts: add psci command definition
This commit adds definition for cpu_on, cpu_off and cpu_suspend
commands. These definitions must match the corresponding PSCI
definitions in boot monitor.
Having those command and corresponding PSCI support in boot monitor
allows run time CPU hot plugin.
Link: http://lkml.kernel.org/r/E1b8koV-0004Hf-2j@rmk-PC.armlinux.org.uk
Signed-off-by: Keerthy <j-keerthy@ti.com>
Signed-off-by: Vitaly Andrianov <vitalya@ti.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Pratyush Anand <panand@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Simon Horman <horms@verge.net.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Russell King [Tue, 2 Aug 2016 21:06:04 +0000 (14:06 -0700)]
kexec: allow architectures to override boot mapping
kexec physical addresses are the boot-time view of the system. For
certain ARM systems (such as Keystone 2), the boot view of the system
does not match the kernel's view of the system: the boot view uses a
special alias in the lower 4GB of the physical address space.
To cater for these kinds of setups, we need to translate between the
boot view physical addresses and the normal kernel view physical
addresses. This patch extracts the current transation points into
linux/kexec.h, and allows an architecture to override the functions.
Due to the translations required, we unfortunately end up with six
translation functions, which are reduced down to four that the
architecture can override.
[akpm@linux-foundation.org: kexec.h needs asm/io.h for phys_to_virt()]
Link: http://lkml.kernel.org/r/E1b8koP-0004HZ-Vf@rmk-PC.armlinux.org.uk
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Keerthy <j-keerthy@ti.com>
Cc: Pratyush Anand <panand@redhat.com>
Cc: Vitaly Andrianov <vitalya@ti.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Simon Horman <horms@verge.net.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Russell King [Tue, 2 Aug 2016 21:06:00 +0000 (14:06 -0700)]
kdump: arrange for paddr_vmcoreinfo_note() to return phys_addr_t
On PAE systems (eg, ARM LPAE) the vmcore note may be located above 4GB
physical on 32-bit architectures, so we need a wider type than "unsigned
long" here. Arrange for paddr_vmcoreinfo_note() to return a
phys_addr_t, thereby allowing it to be located above 4GB.
This makes no difference for kexec-tools, as they already assume a
64-bit type when reading from this file.
Link: http://lkml.kernel.org/r/E1b8koK-0004HS-K9@rmk-PC.armlinux.org.uk
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Reviewed-by: Pratyush Anand <panand@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Keerthy <j-keerthy@ti.com>
Cc: Vitaly Andrianov <vitalya@ti.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Simon Horman <horms@verge.net.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Russell King [Tue, 2 Aug 2016 21:05:57 +0000 (14:05 -0700)]
kexec: ensure user memory sizes do not wrap
Ensure that user memory sizes do not wrap around when validating the
user input, which can lead to the following input validation working
incorrectly.
[akpm@linux-foundation.org: fix it for kexec-return-error-number-directly.patch]
Link: http://lkml.kernel.org/r/E1b8koF-0004HM-5x@rmk-PC.armlinux.org.uk
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Reviewed-by: Pratyush Anand <panand@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Keerthy <j-keerthy@ti.com>
Cc: Vitaly Andrianov <vitalya@ti.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Simon Horman <horms@verge.net.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Russell King [Tue, 2 Aug 2016 21:05:54 +0000 (14:05 -0700)]
kexec: don't invoke OOM-killer for control page allocation
If we are unable to find a suitable page when allocating the control
page, do not invoke the OOM-killer: killing processes probably isn't
going to help.
Link: http://lkml.kernel.org/r/E1b8ko9-0004HG-R5@rmk-PC.armlinux.org.uk
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Reviewed-by: Pratyush Anand <panand@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Keerthy <j-keerthy@ti.com>
Cc: Vitaly Andrianov <vitalya@ti.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Simon Horman <horms@verge.net.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Russell King [Tue, 2 Aug 2016 21:05:51 +0000 (14:05 -0700)]
ARM: kexec: advertise location of bootable RAM
Advertise the location of bootable RAM to kexec-tools. kexec needs to
know where it can place the kernel in RAM, and so be executable when the
system needs to jump into it.
Advertise these areas in /proc/iomem with a "System RAM (boot alias)"
tag.
Link: http://lkml.kernel.org/r/E1b8ko4-0004HA-GF@rmk-PC.armlinux.org.uk
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Reviewed-by: Pratyush Anand <panand@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Keerthy <j-keerthy@ti.com>
Cc: Vitaly Andrianov <vitalya@ti.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Simon Horman <horms@verge.net.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Russell King [Tue, 2 Aug 2016 21:05:48 +0000 (14:05 -0700)]
ARM: kdump: advertise boot aliased crash kernel resource
Advertise a resource which describes where the crash kernel is located
in the boot view of RAM. This allows kexec-tools to have this vital
information.
Link: http://lkml.kernel.org/r/E1b8knz-0004H4-Bd@rmk-PC.armlinux.org.uk
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Baoquan He <bhe@redhat.com>
Cc: Keerthy <j-keerthy@ti.com>
Cc: Pratyush Anand <panand@redhat.com>
Cc: Vitaly Andrianov <vitalya@ti.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Simon Horman <horms@verge.net.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Minfei Huang [Tue, 2 Aug 2016 21:05:45 +0000 (14:05 -0700)]
kexec: return error number directly
This is a cleanup patch to make kexec more clear to return error number
directly. The variable result is useless, because there is no other
function's return value assignes to it. So remove it.
Link: http://lkml.kernel.org/r/1464179273-57668-1-git-send-email-mnghuan@gmail.com
Signed-off-by: Minfei Huang <mnghuan@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Xunlei Pang <xlpang@redhat.com>
Cc: Atsushi Kumagai <ats-kumagai@wm.jp.nec.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Geliang Tang [Tue, 2 Aug 2016 21:05:42 +0000 (14:05 -0700)]
cpumask: fix code comment
Fix code comment for cpumask_parse().
Link: http://lkml.kernel.org/r/71aae2c60ae5dae0cf554199ce6aea8f88c69347.1465380581.git.geliangtang@gmail.com
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Anton Blanchard [Tue, 2 Aug 2016 21:05:40 +0000 (14:05 -0700)]
kernel/exit.c: quieten greatest stack depth printk
Many targets enable CONFIG_DEBUG_STACK_USAGE, and while the information
is useful, it isn't worthy of pr_warn(). Reduce it to pr_info().
Link: http://lkml.kernel.org/r/1466982072-29836-1-git-send-email-anton@ozlabs.org
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Lutomirski [Tue, 2 Aug 2016 21:05:36 +0000 (14:05 -0700)]
signal: consolidate {TS,TLF}_RESTORE_SIGMASK code
In general, there's no need for the "restore sigmask" flag to live in
ti->flags. alpha, ia64, microblaze, powerpc, sh, sparc (64-bit only),
tile, and x86 use essentially identical alternative implementations,
placing the flag in ti->status.
Replace those optimized implementations with an equally good common
implementation that stores it in a bitfield in struct task_struct and
drop the custom implementations.
Additional architectures can opt in by removing their
TIF_RESTORE_SIGMASK defines.
Link: http://lkml.kernel.org/r/8a14321d64a28e40adfddc90e18a96c086a6d6f9.1468522723.git.luto@kernel.org
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Tested-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc]
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Mahoney [Tue, 2 Aug 2016 21:05:33 +0000 (14:05 -0700)]
reiserfs: fix "new_insert_key may be used uninitialized ..."
new_insert_key only makes any sense when it's associated with a
new_insert_ptr, which is initialized to NULL and changed to a
buffer_head when we also initialize new_insert_key. We can key off of
that to avoid the uninitialized warning.
Link: http://lkml.kernel.org/r/5eca5ffb-2155-8df2-b4a2-f162f105efed@suse.com
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ryusuke Konishi [Tue, 2 Aug 2016 21:05:30 +0000 (14:05 -0700)]
nilfs2: move ioctl interface and disk layout to uapi separately
The header file "include/linux/nilfs2_fs.h" is composed of parts for
ioctl and disk format, and both are intended to be shared with user
space programs.
This moves them to the uapi directory "include/uapi/linux" splitting the
file to "nilfs2_api.h" and "nilfs2_ondisk.h". The following minor
changes are accompanied by this migration:
- nilfs_direct_node struct in nilfs2/direct.h is converged to
nilfs2_ondisk.h because it's an on-disk structure.
- inline functions nilfs_rec_len_from_disk() and
nilfs_rec_len_to_disk() are moved to nilfs2/dir.c.
Link: http://lkml.kernel.org/r/1465825507-3407-4-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ryusuke Konishi [Tue, 2 Aug 2016 21:05:28 +0000 (14:05 -0700)]
nilfs2: use BIT() macro
Replace bit shifts by BIT macro for clarity.
Link: http://lkml.kernel.org/r/1465825507-3407-3-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ryusuke Konishi [Tue, 2 Aug 2016 21:05:25 +0000 (14:05 -0700)]
nilfs2: fix misuse of a semaphore in sysfs code
Variables ns_seg_seq, ns_segnum, ns_nextnum, ns_pseg_offset, ns_cno,
ns_ctime, ns_nongc_ctime, and ns_ndirtyblks, are protected by
ns_segctor_sem, but ns_sem is wrongly used by the nilfs sysfs code when
reading these variables. This fixes the misuse and clarifies which
semaphore protects them in the comment of the_nilfs struct.
Link: http://lkml.kernel.org/r/1465825507-3407-2-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ryusuke Konishi [Tue, 2 Aug 2016 21:05:22 +0000 (14:05 -0700)]
nilfs2: refactor parser of snapshot mount option
Move parser of snapshot mount option to a separate function
nilfs_parse_snapshot_option(), replace simple_strtoull() with
kstrtoull() to avoid checkpatch.pl warning "WARNING: simple_strtoull is
obsolete, use kstrtoull instead", and refine the error message of the
parser.
Link: http://lkml.kernel.org/r/1464875891-5443-9-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ryusuke Konishi [Tue, 2 Aug 2016 21:05:19 +0000 (14:05 -0700)]
nilfs2: do not use yield()
Use cond_resched() instead of yield() in the loop of
nilfs_transaction_lock() since the usage corresponds to the "be nice for
others" case that the comment of yield() says.
This removes the following checkpatch.pl warning:
"WARNING: Using yield() is generally wrong. See yield() kernel-doc
(sched/core.c)"
Link: http://lkml.kernel.org/r/1464875891-5443-8-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ryusuke Konishi [Tue, 2 Aug 2016 21:05:17 +0000 (14:05 -0700)]
nilfs2: emit error message when I/O error is detected
When nilfs returned -EIO as an error code, it's not always clear if it
came from the underlying block device or not. This will mend the issue
by having low level I/O routines of nilfs output an error message when
they detected an I/O error.
Link: http://lkml.kernel.org/r/1464875891-5443-7-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ryusuke Konishi [Tue, 2 Aug 2016 21:05:14 +0000 (14:05 -0700)]
nilfs2: replace nilfs_warning() with nilfs_msg()
Use nilfs_msg() to output warning messages and get rid of
nilfs_warning() function. This also removes function names from the
messages unless we embed them explicitly in format strings. Instead,
some messages are revised to clarify the context.
[arnd@arndb.de: avoid warning about unused variables]
Link: http://lkml.kernel.org/r/20160615201945.3348205-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/1464875891-5443-6-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ryusuke Konishi [Tue, 2 Aug 2016 21:05:10 +0000 (14:05 -0700)]
nilfs2: reduce bare use of printk() with nilfs_msg()
Replace most use of printk() in nilfs2 implementation with nilfs_msg(),
and reduce the following checkpatch.pl warning:
"WARNING: Prefer [subsystem eg: netdev]_crit([subsystem]dev, ...
then dev_crit(dev, ... then pr_crit(... to printk(KERN_CRIT ..."
This patch also fixes a minor checkpatch warning "WARNING: quoted string
split across lines" that often accompanies the prior warning, and amends
message format as needed.
Link: http://lkml.kernel.org/r/1464875891-5443-5-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ryusuke Konishi [Tue, 2 Aug 2016 21:05:06 +0000 (14:05 -0700)]
nilfs2: embed a back pointer to super block instance in nilfs object
Insert a back pointer to super block instance in nilfs object so that
functions of nilfs2 easily refer to the super block instance. This
simplifies replacement of printk() in the successive change.
Link: http://lkml.kernel.org/r/1464875891-5443-4-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ryusuke Konishi [Tue, 2 Aug 2016 21:05:02 +0000 (14:05 -0700)]
nilfs2: add nilfs_msg() message interface
Define an own output routine to replace bare use of printk() function.
The output routine is implemented with a macro and a helper function,
which are named nilfs_msg() and __nilfs_msg(), respectively.
__nilfs_msg() formats a message like "NILFS (<device-name>): <message>",
prefixing it with a given log level, and terminates the statement with a
newline. The "device-name" is optional to make it available in early
stages; it will be omitted if a NULL pointer is passed to super block
instance argument. nilfs_msg() wraps __nilfs_msg() and is removed if
CONFIG_PRINTK is not set.
Link: http://lkml.kernel.org/r/1464875891-5443-3-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ryusuke Konishi [Tue, 2 Aug 2016 21:05:00 +0000 (14:05 -0700)]
nilfs2: hide function name argument from nilfs_error()
Simplify nilfs_error(), an output function used to report critical
issues in file system. This renames the original nilfs_error() function
to __nilfs_error() and redefines it as a macro to hide its function name
argument within the macro.
Every call site of nilfs_error() is changed to strip __func__ argument
except nilfs_bmap_convert_error(); nilfs_bmap_convert_error() directly
calls __nilfs_error() because it inherits caller's function name.
Link: http://lkml.kernel.org/r/1464875891-5443-2-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Daniel Wagner [Tue, 2 Aug 2016 21:04:57 +0000 (14:04 -0700)]
fs/binfmt_em86.c: fix incompatible pointer type
Since the -Wincompatible-pointer-types is reported as error, alpha
doesn't build anymore. Let's fix it in a minimal way.
fs/binfmt_em86.c:73:35: error: passing argument 2 of `copy_strings_kernel' from incompatible pointer type [-Werror=incompatible-pointer-types]
retval = copy_strings_kernel(1, &i_arg, bprm);
^ ^
fs/binfmt_em86.c:77:34: error: passing argument 2 of `copy_strings_kernel' from incompatible pointer type [-Werror=incompatible-pointer-types]
retval = copy_strings_kernel(1, &i_name, bprm);
^
Link: http://lkml.kernel.org/r/1469525978-23359-1-git-send-email-wagi@monom.org
Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kees Cook [Tue, 2 Aug 2016 21:04:54 +0000 (14:04 -0700)]
mm: refuse wrapped vm_brk requests
The vm_brk() alignment calculations should refuse to overflow. The ELF
loader depending on this, but it has been fixed now. No other unsafe
callers have been found.
Link: http://lkml.kernel.org/r/1468014494-25291-3-git-send-email-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Hector Marco-Gisbert <hecmargi@upv.es>
Cc: Ismael Ripoll Ripoll <iripoll@upv.es>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Chen Gang <gang.chen.5i5j@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kees Cook [Tue, 2 Aug 2016 21:04:51 +0000 (14:04 -0700)]
binfmt_elf: fix calculations for bss padding
A double-bug exists in the bss calculation code, where an overflow can
happen in the "last_bss - elf_bss" calculation, but vm_brk internally
aligns the argument, underflowing it, wrapping back around safe. We
shouldn't depend on these bugs staying in sync, so this cleans up the
bss padding handling to avoid the overflow.
This moves the bss padzero() before the last_bss > elf_bss case, since
the zero-filling of the ELF_PAGE should have nothing to do with the
relationship of last_bss and elf_bss: any trailing portion should be
zeroed, and a zero size is already handled by padzero().
Then it handles the math on elf_bss vs last_bss correctly. These need
to both be ELF_PAGE aligned to get the comparison correct, since that's
the expected granularity of the mappings. Since elf_bss already had
alignment-based padding happen in padzero(), the "start" of the new
vm_brk() should be moved forward as done in the original code. However,
since the "end" of the vm_brk() area will already become PAGE_ALIGNed in
vm_brk() then last_bss should get aligned here to avoid hiding it as a
side-effect.
Additionally makes a cosmetic change to the initial last_bss calculation
so it's easier to read in comparison to the load_addr calculation above
it (i.e. the only difference is p_filesz vs p_memsz).
Link: http://lkml.kernel.org/r/1468014494-25291-2-git-send-email-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Hector Marco-Gisbert <hecmargi@upv.es>
Cc: Ismael Ripoll Ripoll <iripoll@upv.es>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Chen Gang <gang.chen.5i5j@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Allen Hubbe [Tue, 2 Aug 2016 21:04:47 +0000 (14:04 -0700)]
checkpatch: if no filenames then read stdin
If no filenames are given, then read the patch from stdin.
Link: http://lkml.kernel.org/r/a8784f291ccb5067361992bf5d41ff6cfb0ce5cb.1469830917.git.allenbh@gmail.com
Signed-off-by: Allen Hubbe <allenbh@gmail.com>
Acked-by: Joe Perches <joe@perches.com>
Cc: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Allen Hubbe [Tue, 2 Aug 2016 21:04:45 +0000 (14:04 -0700)]
checkpatch: check signoff when reading stdin
Signoff was not checked if the filename is '-', indicating reading the
patch from stdin. Commands such as the below would not warn about a
missing signoff, because the patch filename is '-'. This change allows
checkpatch to warn about a missing signoff, even if the input filename
is '-', but only if the patch has a commit message.
git show --pretty=email | scripts/checkpatch.pl -
A more common use of checkpatch with stdin is for piping git diff
through checkpatch. The diff output would not contain a commit message,
and therefore it would not contain a signoff line. For this common use
case, a warning should not be printed about the missing signoff. With
this change we will only warn about a missing signoff if the input
contains a commit message.
git diff | scripts/checkpatch.pl -
Before this patch, a workaround for the first command was to refer to
stdin by a name other than '-'. The workaround is not an elegant
solution, because elsewhere checkpatch uses the fact that filename
equals '-', such as in setting '$vname' to 'Your patch' for stdin. The
command below would report "/dev/stdin has style problems" instead of
"Your patch has style problems."
git show --pretty=email | scripts/checkpatch.pl /dev/stdin
Link: http://lkml.kernel.org/r/48be31e414bddc65bccfa6b1322359be9ba032eb.1469670589.git.allenbh@gmail.com
Signed-off-by: Allen Hubbe <allenbh@gmail.com>
Acked-by: Joe Perches <joe@perches.com>
Cc: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Tue, 2 Aug 2016 21:04:42 +0000 (14:04 -0700)]
checkpatch: improve 'bare use of' signed/unsigned types warning
Fix false positive warning of identifiers ending in signed with an =
assignment of WARNING: Prefer 'signed int' to bare use of 'signed'.
Link: http://lkml.kernel.org/r/6a0e24c3e9102337528ecfcbbe91a0eb5b4820ed.1469529497.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Reported-by: Alan Douglas <alanjhd@gmail.com>
Acked-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tomas Winkler [Tue, 2 Aug 2016 21:04:39 +0000 (14:04 -0700)]
checkpatch: don't complain about BIT macro in uapi
BIT macro cannot be exported to UAPI, don't complain about it.
Link: http://lkml.kernel.org/r/1468707033-16173-1-git-send-email-tomas.winkler@intel.com
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Acked-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Tue, 2 Aug 2016 21:04:36 +0000 (14:04 -0700)]
checkpatch: yet another commit id improvement
Using \b isn't good enough to isolate what appears to be a commit id in
a commit message.
Make sure there is a space or a quote like character after a continuous
run of hexadecimal characters that could be a commit id.
Link: http://lkml.kernel.org/r/fdd22b47463a21c21132edbb8aa35e372950a1e6.1468869915.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Cc: "Zhuo, Qiuxu" <qiuxu.zhuo@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Tue, 2 Aug 2016 21:04:33 +0000 (14:04 -0700)]
checkpatch: allow c99 style // comments
Sanitise the lines that contain c99 comments so that the error doesn't
get emitted.
Link: http://lkml.kernel.org/r/d4d22c34ad7bcc1bceb52f0742f76b7a6d585235.1468368420.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Tue, 2 Aug 2016 21:04:31 +0000 (14:04 -0700)]
checkpatch: skip long lines that use an EFI_GUID macro
These are also possible single line uses that exceed the generic maximum
line length (typically 80 columns)
Link: http://lkml.kernel.org/r/32a6a85fbd6161f1bb55ce176a464e44591afc5b.1468368420.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stephen Boyd [Tue, 2 Aug 2016 21:04:28 +0000 (14:04 -0700)]
firmware: support loading into a pre-allocated buffer
Some systems are memory constrained but they need to load very large
firmwares. The firmware subsystem allows drivers to request this
firmware be loaded from the filesystem, but this requires that the
entire firmware be loaded into kernel memory first before it's provided
to the driver. This can lead to a situation where we map the firmware
twice, once to load the firmware into kernel memory and once to copy the
firmware into the final resting place.
This creates needless memory pressure and delays loading because we have
to copy from kernel memory to somewhere else. Let's add a
request_firmware_into_buf() API that allows drivers to request firmware
be loaded directly into a pre-allocated buffer. This skips the
intermediate step of allocating a buffer in kernel memory to hold the
firmware image while it's read from the filesystem. It also requires
that drivers know how much memory they'll require before requesting the
firmware and negates any benefits of firmware caching because the
firmware layer doesn't manage the buffer lifetime.
For a 16MB buffer, about half the time is spent performing a memcpy from
the buffer to the final resting place. I see loading times go from
0.081171 seconds to 0.047696 seconds after applying this patch. Plus
the vmalloc pressure is reduced.
This is based on a patch from Vikram Mulukutla on codeaurora.org:
https://www.codeaurora.org/cgit/quic/la/kernel/msm-3.18/commit/drivers/base/firmware_class.c?h=rel/msm-3.18&id=
0a328c5f6cd999f5c591f172216835636f39bcb5
Link: http://lkml.kernel.org/r/20160607164741.31849-4-stephen.boyd@linaro.org
Signed-off-by: Stephen Boyd <stephen.boyd@linaro.org>
Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: Vikram Mulukutla <markivx@codeaurora.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Ming Lei <ming.lei@canonical.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vikram Mulukutla [Tue, 2 Aug 2016 21:04:25 +0000 (14:04 -0700)]
firmware: provide infrastructure to make fw caching optional
Some low memory systems with complex peripherals cannot afford to have
the relatively large firmware images taking up valuable memory during
suspend and resume. Change the internal implementation of
firmware_class to disallow caching based on a configurable option. In
the near future, variants of request_firmware will take advantage of
this feature.
Link: http://lkml.kernel.org/r/20160607164741.31849-3-stephen.boyd@linaro.org
[stephen.boyd@linaro.org: Drop firmware_desc design and use flags]
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
Signed-off-by: Stephen Boyd <stephen.boyd@linaro.org>
Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Ming Lei <ming.lei@canonical.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stephen Boyd [Tue, 2 Aug 2016 21:04:22 +0000 (14:04 -0700)]
firmware: consolidate kmap/read/write logic
Some systems are memory constrained but they need to load very large
firmwares. The firmware subsystem allows drivers to request this
firmware be loaded from the filesystem, but this requires that the
entire firmware be loaded into kernel memory first before it's provided
to the driver. This can lead to a situation where we map the firmware
twice, once to load the firmware into kernel memory and once to copy the
firmware into the final resting place.
This design creates needless memory pressure and delays loading because
we have to copy from kernel memory to somewhere else. This patch sets
adds support to the request firmware API to load the firmware directly
into a pre-allocated buffer, skipping the intermediate copying step and
alleviating memory pressure during firmware loading. The drawback is
that we can't use the firmware caching feature because the memory for
the firmware cache is not managed by the firmware layer.
This patch (of 3):
We use similar structured code to read and write the kmapped firmware
pages. The only difference is read copies from the kmap region and
write copies to it. Consolidate this into one function to reduce
duplication.
Link: http://lkml.kernel.org/r/20160607164741.31849-2-stephen.boyd@linaro.org
Signed-off-by: Stephen Boyd <stephen.boyd@linaro.org>
Cc: Vikram Mulukutla <markivx@codeaurora.org>
Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Ming Lei <ming.lei@canonical.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ross Zwisler [Tue, 2 Aug 2016 21:04:19 +0000 (14:04 -0700)]
radix-tree: fix comment about "exceptional" bits
The bottom two bits of radix tree entries are reserved for special use
by the radix tree code itself. A comment detailing their usage was
added by commit
3bcadd6fa6c4 ("radix-tree: free up the bottom bit of
exceptional entries for reuse")
This comment states that if the bottom two bits are '11', this means
that this is a locked exceptional entry.
It turns out that this bit combination was never actually used. Radix
tree locking for DAX was indeed implemented, but it actually used the
third LSB:
/* We use lowest available exceptional entry bit for locking */
#define RADIX_DAX_ENTRY_LOCK (1 << RADIX_TREE_EXCEPTIONAL_SHIFT)
This locking code was also made specific to the DAX code instead of
being generally implemented in radix-tree.h.
So, fix the comment.
Link: http://lkml.kernel.org/r/1468997731-2155-1-git-send-email-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arnd Bergmann [Tue, 2 Aug 2016 21:04:16 +0000 (14:04 -0700)]
crc32: use ktime_get_ns() for measurement
The crc32 test function measures the elapsed time in nanoseconds, but
uses 'struct timespec' for that. We want to remove timespec from the
kernel for y2038 compatibility, and ktime_get_ns() also helps make the
code simpler here.
It is also slightly better to use monontonic time, as we are only
interested in the time difference.
Link: http://lkml.kernel.org/r/20160617143932.3289626-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: "David S . Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sebastian Ott [Tue, 2 Aug 2016 21:04:13 +0000 (14:04 -0700)]
lib/iommu-helper: skip to next segment
When a large enough area in the iommu bitmap is found but would span a
boundary we continue the search starting from the next bit position.
For large allocations this can lead to several useless invocations of
bitmap_find_next_zero_area() and iommu_is_span_boundary().
Continue the search from the start of the next segment (which is the
next bit position such that we'll not cross the same segment boundary
again).
Link: http://lkml.kernel.org/r/alpine.LFD.2.20.1606081910070.3211@schleppi
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Tue, 2 Aug 2016 21:04:10 +0000 (14:04 -0700)]
get_maintainer.pl: reduce need for command-line option -f
If a vcs is used, look to see if the vcs tracks the file specified and
so the -f option becomes optional.
Link: http://lkml.kernel.org/r/7c86a8df0d48770c45778a43b6b3e4627b2a90ee.1469746395.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Borislav Petkov [Tue, 2 Aug 2016 21:04:07 +0000 (14:04 -0700)]
printk: add kernel parameter to control writes to /dev/kmsg
Add a "printk.devkmsg" kernel command line parameter which controls how
userspace writes into /dev/kmsg. It has three options:
* ratelimit - ratelimit logging from userspace.
* on - unlimited logging from userspace
* off - logging from userspace gets ignored
The default setting is to ratelimit the messages written to it.
This changes the kernel default setting of "on" to "ratelimit" and we do
that because we want to keep userspace spamming /dev/kmsg to sane
levels. This is especially moot when a small kernel log buffer wraps
around and messages get lost. So the ratelimiting setting should be a
sane setting where kernel messages should have a bit higher chance of
survival from all the spamming.
It additionally does not limit logging to /dev/kmsg while the system is
booting if we haven't disabled it on the command line.
Furthermore, we can control the logging from a lower priority sysctl
interface - kernel.printk_devkmsg.
That interface will succeed only if printk.devkmsg *hasn't* been
supplied on the command line. If it has, then printk.devkmsg is a
one-time setting which remains for the duration of the system lifetime.
This "locking" of the setting is to prevent userspace from changing the
logging on us through sysctl(2).
This patch is based on previous patches from Linus and Steven.
[bp@suse.de: fixes]
Link: http://lkml.kernel.org/r/20160719072344.GC25563@nazgul.tnic
Link: http://lkml.kernel.org/r/20160716061745.15795-3-bp@alien8.de
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Dave Young <dyoung@redhat.com>
Cc: Franck Bui <fbui@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Borislav Petkov [Tue, 2 Aug 2016 21:04:04 +0000 (14:04 -0700)]
ratelimit: extend to print suppressed messages on release
Extend the ratelimiting facility to print the amount of suppressed lines
when it is being released.
This use case is aimed at short-termed, burst-like users for which we
want to output the suppressed lines stats only once, after it has been
disposed of. For an example, see /dev/kmsg usage in a follow-on patch.
Also, change the printk() line we issue on release to not use
"callbacks" as it is misleading: we're not suppressing callbacks but
printk() calls.
This has been separated from a previous patch by Linus.
Link: http://lkml.kernel.org/r/20160716061745.15795-2-bp@alien8.de
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Dave Young <dyoung@redhat.com>
Cc: Franck Bui <fbui@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Borislav Petkov [Tue, 2 Aug 2016 21:04:01 +0000 (14:04 -0700)]
fbdev/bfin_adv7393fb: move DRIVER_NAME before its first use
Move the DRIVER_NAME macro definition before the first usage site and
fix build error.
Link: http://lkml.kernel.org/r/20160801163937.GA28119@nazgul.tnic
Signed-off-by: Borislav Petkov <bp@suse.de>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Hellwig [Tue, 2 Aug 2016 21:03:59 +0000 (14:03 -0700)]
printk: include <asm/sections.h> instead of <asm-generic/sections.h>
asm-generic headers are generic implementations for architecture
specific code and should not be included by common code. Thus use the
asm/ version of sections.h to get at the linker sections.
Link: http://lkml.kernel.org/r/1468285008-7331-1-git-send-email-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sergey Senozhatsky [Tue, 2 Aug 2016 21:03:56 +0000 (14:03 -0700)]
printk: introduce suppress_message_printing()
Messages' levels and console log level are inspected when the actual
printing occurs, which may provoke console_unlock() and
console_cont_flush() to waste CPU cycles on every message that has
loglevel above the current console_loglevel.
Schematically, console_unlock() does the following:
console_unlock()
{
...
for (;;) {
...
raw_spin_lock_irqsave(&logbuf_lock, flags);
skip:
msg = log_from_idx(console_idx);
if (msg->flags & LOG_NOCONS) {
...
goto skip;
}
level = msg->level;
len += msg_print_text(); >> sprintfs
memcpy,
etc.
if (nr_ext_console_drivers) {
ext_len = msg_print_ext_header(); >> scnprintf
ext_len += msg_print_ext_body(); >> scnprintfs
etc.
}
...
raw_spin_unlock(&logbuf_lock);
call_console_drivers(level, ext_text, ext_len, text, len)
{
if (level >= console_loglevel && >> drop the message
!ignore_loglevel)
return;
console->write(...);
}
local_irq_restore(flags);
}
...
}
The thing here is this deferred `level >= console_loglevel' check. We
are wasting CPU cycles on sprintfs/memcpy/etc. preparing the messages
that we will eventually drop.
This can be huge when we register a new CON_PRINTBUFFER console, for
instance. For every such a console register_console() resets the
console_seq, console_idx, console_prev
and sets a `exclusive console' pointer to replay the log buffer to that
just-registered console. And there can be a lot of messages to replay,
in the worst case most of which can be dropped after console_loglevel
test.
We know messages' levels long before we call msg_print_text() and
friends, so we can just move console_loglevel check out of
call_console_drivers() and format a new message only if we are sure that
it won't be dropped.
The patch factors out loglevel check into suppress_message_printing()
function and tests message->level and console_loglevel before formatting
functions in console_unlock() and console_cont_flush() are getting
executed. This improves things not only for exclusive CON_PRINTBUFFER
consoles, but for every console_unlock() that attempts to print a
message of level above the console_loglevel.
Link: http://lkml.kernel.org/r/20160627135012.8229-1-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Calvin Owens <calvinowens@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Tue, 2 Aug 2016 21:03:53 +0000 (14:03 -0700)]
printk: create pr_<level> functions
Using functions instead of macros can reduce overall code size by
eliminating unnecessary "KERN_SOH<digit>" prefixes from format strings.
defconfig x86-64:
$ size vmlinux*
text data bss dec hex filename
10193570 4331464 1105920 15630954 ee826a vmlinux.new
10192623 4335560 1105920 15634103 ee8eb7 vmlinux.old
As the return value are unimportant and unused in the kernel tree, these
new functions return void.
Miscellanea:
- change pr_<level> macros to call new __pr_<level> functions
- change vprintk_nmi and vprintk_default to add LOGLEVEL_<level> argument
[akpm@linux-foundation.org: fix LOGLEVEL_INFO, per Joe]
Link: http://lkml.kernel.org/r/e16cc34479dfefcae37c98b481e6646f0f69efc3.1466718827.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sergey Senozhatsky [Tue, 2 Aug 2016 21:03:50 +0000 (14:03 -0700)]
printk: do not include interrupt.h
A trivial cosmetic change: interrupt.h header is redundant since commit
6b898c07cb1d ("console: use might_sleep in console_lock").
Link: http://lkml.kernel.org/r/20160620132847.21930-1-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Luis de Bethencourt [Tue, 2 Aug 2016 21:03:47 +0000 (14:03 -0700)]
dynamic_debug: only add header when used
kernel.h header doesn't directly use dynamic debug, instead we can
include it in module.c (which used it via kernel.h). printk.h only uses
it if CONFIG_DYNAMIC_DEBUG is on, changing the inclusion to only happen
in that case.
Link: http://lkml.kernel.org/r/1468429793-16917-1-git-send-email-luisbg@osg.samsung.com
[luisbg@osg.samsung.com: include dynamic_debug.h in drb_int.h]
Link: http://lkml.kernel.org/r/1468447828-18558-2-git-send-email-luisbg@osg.samsung.com
Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Tue, 2 Aug 2016 21:03:44 +0000 (14:03 -0700)]
task_work: use READ_ONCE/lockless_dereference, avoid pi_lock if !task_works
Change task_work_cancel() to use lockless_dereference(), this is what
the code really wants but we didn't have this helper when it was
written.
Also add the fast-path task->task_works == NULL check, in the likely
case this task has no pending works and we can avoid
spin_lock(task->pi_lock).
While at it, change other users of ACCESS_ONCE() to use READ_ONCE().
Link: http://lkml.kernel.org/r/20160610150042.GA13868@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Chen Gang [Tue, 2 Aug 2016 21:03:42 +0000 (14:03 -0700)]
include: mman: use bool instead of int for the return value of arch_validate_prot
For pure bool function's return value, bool is a little better more or
less than int.
Link: http://lkml.kernel.org/r/1469331815-2026-1-git-send-email-chengang@emindsoft.com.cn
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Lüssing [Tue, 2 Aug 2016 21:03:39 +0000 (14:03 -0700)]
mailmap: add Linus Lüssing
For one thing, summarizes all non-umlaut versions into the umlaut one
(Linus Luessing -> Linus Lüssing).
For another, maps obsolete email addresses to the current @c0d3.blue
one.
Link: http://lkml.kernel.org/r/1467805371-2773-1-git-send-email-linus.luessing@c0d3.blue
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Tue, 2 Aug 2016 21:03:36 +0000 (14:03 -0700)]
uapi: move forward declarations of internal structures
Don't user forward declarations of internal kernel structures in headers
exported to userspace.
Move "struct completion;".
Move "struct task_struct;".
Link: http://lkml.kernel.org/r/20160713215808.GA22486@p183.telecom.by
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fabian Frederick [Tue, 2 Aug 2016 21:03:33 +0000 (14:03 -0700)]
treewide: replace obsolete _refok by __ref
There was only one use of __initdata_refok and __exit_refok
__init_refok was used 46 times against 82 for __ref.
Those definitions are obsolete since commit
312b1485fb50 ("Introduce new
section reference annotations tags: __ref, __refdata, __refconst")
This patch removes the following compatibility definitions and replaces
them treewide.
/* compatibility defines */
#define __init_refok __ref
#define __initdata_refok __refdata
#define __exit_refok __ref
I can also provide separate patches if necessary.
(One patch per tree and check in 1 month or 2 to remove old definitions)
[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/1466796271-3043-1-git-send-email-fabf@skynet.be
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
NeilBrown [Tue, 2 Aug 2016 21:03:30 +0000 (14:03 -0700)]
memstick: don't allocate unused major for ms_block
When alloc_disk(0) is used the ->major number is completely ignored.
All devices are allocated with a major of BLOCK_EXT_MAJOR.
So remove registration and deregistration of 'major'.
Link: http://lkml.kernel.org/r/20160602064318.4403.49955.stgit@noble
Signed-off-by: NeilBrown <neilb@suse.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Maxim Levitsky <maximlevitsky@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Richard Weinberger [Tue, 2 Aug 2016 21:03:27 +0000 (14:03 -0700)]
init/Kconfig: make COMPILE_TEST depend on !UML
UML is a bit special since it does not have iomem nor dma. That means a
lot of drivers will not build if they miss a dependency on HAS_IOMEM.
s390 used to have the same issues but since it gained PCI support UML is
the only stranger.
We are tired of patching dozens of new drivers after every merge window
just to un-break allmod/yesconfig UML builds. One could argue that a
decent driver has to know on what it depends and therefore a missing
HAS_IOMEM dependency is a clear driver bug. But the dependency not
obvious and not everyone does UML builds with COMPILE_TEST enabled when
developing a device driver.
A possible solution to make these builds succeed on UML would be
providing stub functions for ioremap() and friends which fail upon
runtime. Another one is simply disabling COMPILE_TEST for UML. Since
it is the least hassle and does not force use to fake iomem support
let's do the latter.
Link: http://lkml.kernel.org/r/1466152995-28367-1-git-send-email-richard@nod.at
Signed-off-by: Richard Weinberger <richard@nod.at>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Valdis Kletnieks [Tue, 2 Aug 2016 21:03:25 +0000 (14:03 -0700)]
fs/proc/task_mmu.c: suppress compilation warnings with W=1
Suppress a bunch of warnings of the form:
fs/proc/task_mmu.c: In function 'show_smap_vma_flags':
fs/proc/task_mmu.c:635:22: warning: initialized field overwritten [-Wt override-init]
[ilog2(VM_READ)] = "rd",
^~~~
fs/proc/task_mmu.c:635:22: note: (near initialization for 'mnemonics[0]')
They happen because of the way we intentionally build the table, so
silence the warning when building with 'make W=1'.
Link: http://lkml.kernel.org/r/8727.1470022083@turing-police.cc.vt.edu
Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arnd Bergmann [Tue, 2 Aug 2016 21:03:22 +0000 (14:03 -0700)]
procfs: avoid 32-bit time_t in /proc/*/stat
/proc/stat shows (among lots of other things) the current boottime (i.e.
number of seconds since boot). While a 32-bit number is sufficient for
this particular case, we want to get rid of the 'struct timespec'
suffers from a 32-bit overflow in 2038.
This changes the code to use a struct timespec64, which is known to be
safe in all cases.
Link: http://lkml.kernel.org/r/20160617201247.2292101-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Tue, 2 Aug 2016 21:03:19 +0000 (14:03 -0700)]
proc_oom_score: remove tasklist_lock and pid_alive()
This was needed before to ensure that ->signal != 0 and do_each_thread()
is safe, see commit
b95c35e76b29b ("oom: fix the unsafe usage of
badness() in proc_oom_score()") for details.
Today tsk->signal can't go away and for_each_thread(tsk) is always safe.
Link: http://lkml.kernel.org/r/20160608211921.GA15508@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Luis de Bethencourt [Tue, 2 Aug 2016 21:03:16 +0000 (14:03 -0700)]
MAINTAINERS: befs: add new maintainers
Salah Triki and Luis de Bethencourt are taking over maintainership of
befs.
Link: http://lkml.kernel.org/r/1469651079-32455-1-git-send-email-luisbg@osg.samsung.com
Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
seokhoon.yoon [Tue, 2 Aug 2016 21:03:13 +0000 (14:03 -0700)]
cgroup: update cgroup's document path
cgroup's document path is changed to "cgroup-v1". update it.
Link: http://lkml.kernel.org/r/1470148443-6509-1-git-send-email-iamyooon@gmail.com
Signed-off-by: seokhoon.yoon <iamyooon@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nicolas Iooss [Tue, 2 Aug 2016 21:03:10 +0000 (14:03 -0700)]
UBSAN: fix typo in format string
handle_object_size_mismatch() used %pk to format a kernel pointer with
pr_err(). This seemed to be a misspelling for %pK, but using this to
format a kernel pointer does not make much sence here.
Therefore use %p instead, like in handle_missaligned_access().
Link: http://lkml.kernel.org/r/20160730083010.11569-1-nicolas.iooss_linux@m4x.org
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fabian Frederick [Tue, 2 Aug 2016 21:03:07 +0000 (14:03 -0700)]
sysv, ipc: fix security-layer leaking
Commit
53dad6d3a8e5 ("ipc: fix race with LSMs") updated ipc_rcu_putref()
to receive rcu freeing function but used generic ipc_rcu_free() instead
of msg_rcu_free() which does security cleaning.
Running LTP msgsnd06 with kmemleak gives the following:
cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff88003c0a11f8 (size 8):
comm "msgsnd06", pid 1645, jiffies
4294672526 (age 6.549s)
hex dump (first 8 bytes):
1b 00 00 00 01 00 00 00 ........
backtrace:
kmemleak_alloc+0x23/0x40
kmem_cache_alloc_trace+0xe1/0x180
selinux_msg_queue_alloc_security+0x3f/0xd0
security_msg_queue_alloc+0x2e/0x40
newque+0x4e/0x150
ipcget+0x159/0x1b0
SyS_msgget+0x39/0x40
entry_SYSCALL_64_fastpath+0x13/0x8f
Manfred Spraul suggested to fix sem.c as well and Davidlohr Bueso to
only use ipc_rcu_free in case of security allocation failure in newary()
Fixes:
53dad6d3a8e ("ipc: fix race with LSMs")
Link: http://lkml.kernel.org/r/1470083552-22966-1-git-send-email-fabf@skynet.be
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@vger.kernel.org> [3.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vladimir Davydov [Tue, 2 Aug 2016 21:03:04 +0000 (14:03 -0700)]
mm: vmscan: fix memcg-aware shrinkers not called on global reclaim
We must call shrink_slab() for each memory cgroup on both global and
memcg reclaim in shrink_node_memcg(). Commit
d71df22b55099 accidentally
changed that so that now shrink_slab() is only called with memcg != NULL
on memcg reclaim. As a result, memcg-aware shrinkers (including
dentry/inode) are never invoked on global reclaim. Fix that.
Fixes:
b2e18757f2c9 ("mm, vmscan: begin reclaiming pages on a per-node basis")
Link: http://lkml.kernel.org/r/1470056590-7177-1-git-send-email-vdavydov@virtuozzo.com
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vladimir Davydov [Tue, 2 Aug 2016 21:03:01 +0000 (14:03 -0700)]
radix-tree: account nodes to memcg only if explicitly requested
Radix trees may be used not only for storing page cache pages, so
unconditionally accounting radix tree nodes to the current memory cgroup
is bad: if a radix tree node is used for storing data shared among
different cgroups we risk pinning dead memory cgroups forever.
So let's only account radix tree nodes if it was explicitly requested by
passing __GFP_ACCOUNT to INIT_RADIX_TREE. Currently, we only want to
account page cache entries, so mark mapping->page_tree so.
Fixes:
58e698af4c63 ("radix-tree: account radix_tree_node to memory cgroup")
Link: http://lkml.kernel.org/r/1470057188-7864-1-git-send-email-vdavydov@virtuozzo.com
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org> [4.6+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexander Potapenko [Tue, 2 Aug 2016 21:02:58 +0000 (14:02 -0700)]
kasan: avoid overflowing quarantine size on low memory systems
If the total amount of memory assigned to quarantine is less than the
amount of memory assigned to per-cpu quarantines, |new_quarantine_size|
may overflow. Instead, set it to zero.
[akpm@linux-foundation.org: cleanup: use WARN_ONCE return value]
Link: http://lkml.kernel.org/r/1470063563-96266-1-git-send-email-glider@google.com
Fixes:
55834c59098d ("mm: kasan: initial memory quarantine implementation")
Signed-off-by: Alexander Potapenko <glider@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrey Ryabinin [Tue, 2 Aug 2016 21:02:55 +0000 (14:02 -0700)]
kasan: improve double-free reports
Currently we just dump stack in case of double free bug.
Let's dump all info about the object that we have.
[aryabinin@virtuozzo.com: change double free message per Alexander]
Link: http://lkml.kernel.org/r/1470153654-30160-1-git-send-email-aryabinin@virtuozzo.com
Link: http://lkml.kernel.org/r/1470062715-14077-6-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrey Ryabinin [Tue, 2 Aug 2016 21:02:52 +0000 (14:02 -0700)]
mm/kasan: get rid of ->state in struct kasan_alloc_meta
The state of object currently tracked in two places - shadow memory, and
the ->state field in struct kasan_alloc_meta. We can get rid of the
latter. The will save us a little bit of memory. Also, this allow us
to move free stack into struct kasan_alloc_meta, without increasing
memory consumption. So now we should always know when the last time the
object was freed. This may be useful for long delayed use-after-free
bugs.
As a side effect this fixes following UBSAN warning:
UBSAN: Undefined behaviour in mm/kasan/quarantine.c:102:13
member access within misaligned address
ffff88000d1efebc for type 'struct qlist_node'
which requires 8 byte alignment
Link: http://lkml.kernel.org/r/1470062715-14077-5-git-send-email-aryabinin@virtuozzo.com
Reported-by: kernel test robot <xiaolong.ye@intel.com>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrey Ryabinin [Tue, 2 Aug 2016 21:02:49 +0000 (14:02 -0700)]
mm/kasan: get rid of ->alloc_size in struct kasan_alloc_meta
Size of slab object already stored in cache->object_size.
Note, that kmalloc() internally rounds up size of allocation, so
object_size may be not equal to alloc_size, but, usually we don't need
to know the exact size of allocated object. In case if we need that
information, we still can figure it out from the report. The dump of
shadow memory allows to identify the end of allocated memory, and
thereby the exact allocation size.
Link: http://lkml.kernel.org/r/1470062715-14077-4-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrey Ryabinin [Tue, 2 Aug 2016 21:02:46 +0000 (14:02 -0700)]
mm/kasan, slub: don't disable interrupts when object leaves quarantine
SLUB doesn't require disabled interrupts to call ___cache_free().
Link: http://lkml.kernel.org/r/1470062715-14077-3-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrey Ryabinin [Tue, 2 Aug 2016 21:02:43 +0000 (14:02 -0700)]
mm/kasan: don't reduce quarantine in atomic contexts
Currently we call quarantine_reduce() for ___GFP_KSWAPD_RECLAIM (implied
by __GFP_RECLAIM) allocation. So, basically we call it on almost every
allocation. quarantine_reduce() sometimes is heavy operation, and
calling it with disabled interrupts may trigger hard LOCKUP:
NMI watchdog: Watchdog detected hard LOCKUP on cpu 2irq event stamp:
1411258
Call Trace:
<NMI> dump_stack+0x68/0x96
watchdog_overflow_callback+0x15b/0x190
__perf_event_overflow+0x1b1/0x540
perf_event_overflow+0x14/0x20
intel_pmu_handle_irq+0x36a/0xad0
perf_event_nmi_handler+0x2c/0x50
nmi_handle+0x128/0x480
default_do_nmi+0xb2/0x210
do_nmi+0x1aa/0x220
end_repeat_nmi+0x1a/0x1e
<<EOE>> __kernel_text_address+0x86/0xb0
print_context_stack+0x7b/0x100
dump_trace+0x12b/0x350
save_stack_trace+0x2b/0x50
set_track+0x83/0x140
free_debug_processing+0x1aa/0x420
__slab_free+0x1d6/0x2e0
___cache_free+0xb6/0xd0
qlist_free_all+0x83/0x100
quarantine_reduce+0x177/0x1b0
kasan_kmalloc+0xf3/0x100
Reduce the quarantine_reduce iff direct reclaim is allowed.
Fixes:
55834c59098d("mm: kasan: initial memory quarantine implementation")
Link: http://lkml.kernel.org/r/1470062715-14077-2-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Acked-by: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrey Ryabinin [Tue, 2 Aug 2016 21:02:40 +0000 (14:02 -0700)]
mm/kasan: fix corruptions and false positive reports
Once an object is put into quarantine, we no longer own it, i.e. object
could leave the quarantine and be reallocated. So having set_track()
call after the quarantine_put() may corrupt slab objects.
BUG kmalloc-4096 (Not tainted): Poison overwritten
-----------------------------------------------------------------------------
Disabling lock debugging due to kernel taint
INFO: 0xffff8804540de850-0xffff8804540de857. First byte 0xb5 instead of 0x6b
...
INFO: Freed in qlist_free_all+0x42/0x100 age=75 cpu=3 pid=24492
__slab_free+0x1d6/0x2e0
___cache_free+0xb6/0xd0
qlist_free_all+0x83/0x100
quarantine_reduce+0x177/0x1b0
kasan_kmalloc+0xf3/0x100
kasan_slab_alloc+0x12/0x20
kmem_cache_alloc+0x109/0x3e0
mmap_region+0x53e/0xe40
do_mmap+0x70f/0xa50
vm_mmap_pgoff+0x147/0x1b0
SyS_mmap_pgoff+0x2c7/0x5b0
SyS_mmap+0x1b/0x30
do_syscall_64+0x1a0/0x4e0
return_from_SYSCALL_64+0x0/0x7a
INFO: Slab 0xffffea0011503600 objects=7 used=7 fp=0x (null) flags=0x8000000000004080
INFO: Object 0xffff8804540de848 @offset=26696 fp=0xffff8804540dc588
Redzone
ffff8804540de840: bb bb bb bb bb bb bb bb ........
Object
ffff8804540de848: 6b 6b 6b 6b 6b 6b 6b 6b b5 52 00 00 f2 01 60 cc kkkkkkkk.R....`.
Similarly, poisoning after the quarantine_put() leads to false positive
use-after-free reports:
BUG: KASAN: use-after-free in anon_vma_interval_tree_insert+0x304/0x430 at addr
ffff880405c540a0
Read of size 8 by task trinity-c0/3036
CPU: 0 PID: 3036 Comm: trinity-c0 Not tainted 4.7.0-think+ #9
Call Trace:
dump_stack+0x68/0x96
kasan_report_error+0x222/0x600
__asan_report_load8_noabort+0x61/0x70
anon_vma_interval_tree_insert+0x304/0x430
anon_vma_chain_link+0x91/0xd0
anon_vma_clone+0x136/0x3f0
anon_vma_fork+0x81/0x4c0
copy_process.part.47+0x2c43/0x5b20
_do_fork+0x16d/0xbd0
SyS_clone+0x19/0x20
do_syscall_64+0x1a0/0x4e0
entry_SYSCALL64_slow_path+0x25/0x25
Fix this by putting an object in the quarantine after all other
operations.
Fixes:
80a9201a5965 ("mm, kasan: switch SLUB to stackdepot, enable memory quarantine for SLUB")
Link: http://lkml.kernel.org/r/1470062715-14077-1-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Reported-by: Sasha Levin <alexander.levin@verizon.com>
Acked-by: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Tue, 2 Aug 2016 21:02:37 +0000 (14:02 -0700)]
memcg: put soft limit reclaim out of way if the excess tree is empty
We've had a report about soft lockups caused by lock bouncing in the
soft reclaim path:
BUG: soft lockup - CPU#0 stuck for 22s! [kav4proxy-kavic:3128]
RIP: 0010:[<
ffffffff81469798>] [<
ffffffff81469798>] _raw_spin_lock+0x18/0x20
Call Trace:
mem_cgroup_soft_limit_reclaim+0x25a/0x280
shrink_zones+0xed/0x200
do_try_to_free_pages+0x74/0x320
try_to_free_pages+0x112/0x180
__alloc_pages_slowpath+0x3ff/0x820
__alloc_pages_nodemask+0x1e9/0x200
alloc_pages_vma+0xe1/0x290
do_wp_page+0x19f/0x840
handle_pte_fault+0x1cd/0x230
do_page_fault+0x1fd/0x4c0
page_fault+0x25/0x30
There are no memcgs created so there cannot be any in the soft limit
excess obviously:
[...]
memory 0 1 1
so all this just seems to be mem_cgroup_largest_soft_limit_node trying
to get spin_lock_irq(&mctz->lock) just to find out that the soft limit
excess tree is empty. This is just pointless wasting of cycles and
cache line bouncing during heavy parallel reclaim on large machines.
The particular machine wasn't very healthy and most probably suffering
from a memory leak which just caused the memory reclaim to trash
heavily. But bouncing on the lock certainly didn't help...
Fix this by optimistic lockless check and bail out early if the tree is
empty. This is theoretically racy but that shouldn't matter all that
much. First of all soft limit is a best effort feature and it is slowly
getting deprecated and its usage should be really scarce. Bouncing on a
lock without a good reason is surely much bigger problem, especially on
large CPU machines.
Link: http://lkml.kernel.org/r/1470073277-1056-1-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Tue, 2 Aug 2016 21:02:34 +0000 (14:02 -0700)]
mm, hugetlb: fix huge_pte_alloc BUG_ON
Zhong Jiang has reported a BUG_ON from huge_pte_alloc hitting when he
runs his database load with memory online and offline running in
parallel. The reason is that huge_pmd_share might detect a shared pmd
which is currently migrated and so it has migration pte which is
!pte_huge.
There doesn't seem to be any easy way to prevent from the race and in
fact seeing the migration swap entry is not harmful. Both callers of
huge_pte_alloc are prepared to handle them. copy_hugetlb_page_range
will copy the swap entry and make it COW if needed. hugetlb_fault will
back off and so the page fault is retries if the page is still under
migration and waits for its completion in hugetlb_fault.
That means that the BUG_ON is wrong and we should update it. Let's
simply check that all present ptes are pte_huge instead.
Link: http://lkml.kernel.org/r/20160721074340.GA26398@dhcp22.suse.cz
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: zhongjiang <zhongjiang@huawei.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jia He [Tue, 2 Aug 2016 21:02:31 +0000 (14:02 -0700)]
mm/hugetlb: avoid soft lockup in set_max_huge_pages()
In powerpc servers with large memory(32TB), we watched several soft
lockups for hugepage under stress tests.
The call traces are as follows:
1.
get_page_from_freelist+0x2d8/0xd50
__alloc_pages_nodemask+0x180/0xc20
alloc_fresh_huge_page+0xb0/0x190
set_max_huge_pages+0x164/0x3b0
2.
prep_new_huge_page+0x5c/0x100
alloc_fresh_huge_page+0xc8/0x190
set_max_huge_pages+0x164/0x3b0
This patch fixes such soft lockups. It is safe to call cond_resched()
there because it is out of spin_lock/unlock section.
Link: http://lkml.kernel.org/r/1469674442-14848-1-git-send-email-hejianet@gmail.com
Signed-off-by: Jia He <hejianet@gmail.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Valdis Kletnieks [Tue, 2 Aug 2016 21:02:28 +0000 (14:02 -0700)]
tools/testing/radix-tree/linux/gfp.h: fix bitrotted value
Apparently, the tools/testing version dates to a few flags ago, and
we've sprouted 4 new ones since. Keep in sync with the value in the
main tree...
Link: http://lkml.kernel.org/r/23400.1469702675@turing-police.cc.vt.edu
Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Minchan Kim [Tue, 2 Aug 2016 21:02:25 +0000 (14:02 -0700)]
mm: move swap-in anonymous page into active list
Every swap-in anonymous page starts from inactive lru list's head. It
should be activated unconditionally when VM decide to reclaim because
page table entry for the page always usually has marked accessed bit.
Thus, their window size for getting a new referece is 2 * NR_inactive +
NR_active while others is NR_inactive + NR_active.
It's not fair that it has more chance to be referenced compared to other
newly allocated page which starts from active lru list's head.
Johannes:
: The page can still have a valid copy on the swap device, so prefering to
: reclaim that page over a fresh one could make sense. But as you point
: out, having it start inactive instead of active actually ends up giving it
: *more* LRU time, and that seems to be without justification.
Rik:
: The reason newly read in swap cache pages start on the inactive list is
: that we do some amount of read-around, and do not know which pages will
: get used.
:
: However, immediately activating the ones that DO get used, like your patch
: does, is the right thing to do.
Link: http://lkml.kernel.org/r/1469762740-17860-1-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vegard Nossum [Tue, 2 Aug 2016 21:02:22 +0000 (14:02 -0700)]
mm: fail prefaulting if page table allocation fails
I ran into this:
BUG: sleeping function called from invalid context at mm/page_alloc.c:3784
in_atomic(): 0, irqs_disabled(): 0, pid: 1434, name: trinity-c1
2 locks held by trinity-c1/1434:
#0: (&mm->mmap_sem){......}, at: [<
ffffffff810ce31e>] __do_page_fault+0x1ce/0x8f0
#1: (rcu_read_lock){......}, at: [<
ffffffff81378f86>] filemap_map_pages+0xd6/0xdd0
CPU: 0 PID: 1434 Comm: trinity-c1 Not tainted 4.7.0+ #58
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
Call Trace:
dump_stack+0x65/0x84
panic+0x185/0x2dd
___might_sleep+0x51c/0x600
__might_sleep+0x90/0x1a0
__alloc_pages_nodemask+0x5b1/0x2160
alloc_pages_current+0xcc/0x370
pte_alloc_one+0x12/0x90
__pte_alloc+0x1d/0x200
alloc_set_pte+0xe3e/0x14a0
filemap_map_pages+0x42b/0xdd0
handle_mm_fault+0x17d5/0x28b0
__do_page_fault+0x310/0x8f0
trace_do_page_fault+0x18d/0x310
do_async_page_fault+0x27/0xa0
async_page_fault+0x28/0x30
The important bits from the above is that filemap_map_pages() is calling
into the page allocator while holding rcu_read_lock (sleeping is not
allowed inside RCU read-side critical sections).
According to Kirill Shutemov, the prefaulting code in do_fault_around()
is supposed to take care of this, but missing error handling means that
the allocation failure can go unnoticed.
We don't need to return VM_FAULT_OOM (or any other error) here, since we
can just let the normal fault path try again.
Fixes:
7267ec008b5c ("mm: postpone page table allocation until we have page to map")
Link: http://lkml.kernel.org/r/1469708107-11868-1-git-send-email-vegard.nossum@oracle.com
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "Hillf Danton" <hillf.zj@alibaba-inc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
piaojun [Tue, 2 Aug 2016 21:02:19 +0000 (14:02 -0700)]
ocfs2/dlm: continue to purge recovery lockres when recovery master goes down
We found a dlm-blocked situation caused by continuous breakdown of
recovery masters described below. To solve this problem, we should
purge recovery lock once detecting recovery master goes down.
N3 N2 N1(reco master)
go down
pick up recovery lock and
begin recoverying for N2
go down
pick up recovery
lock failed, then
purge it:
dlm_purge_lockres
->DROPPING_REF is set
send deref to N1 failed,
recovery lock is not purged
find N1 go down, begin
recoverying for N1, but
blocked in dlm_do_recovery
as DROPPING_REF is set:
dlm_do_recovery
->dlm_pick_recovery_master
->dlmlock
->dlm_get_lock_resource
->__dlm_wait_on_lockres_flags(tmpres,
DLM_LOCK_RES_DROPPING_REF);
Fixes:
8c0343968163 ("ocfs2/dlm: clear DROPPING_REF flag when the master goes down")
Link: http://lkml.kernel.org/r/578453AF.8030404@huawei.com
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Jiufei Xue <xuejiufei@huawei.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
piaojun [Tue, 2 Aug 2016 21:02:16 +0000 (14:02 -0700)]
ocfs2/dlm: solve a BUG when deref failed in dlm_drop_lockres_ref
We found a BUG situation that lockres is migrated during deref described
below. To solve the BUG, we could purge lockres directly when other
node says I did not have a ref. Additionally, we'd better purge lockres
if master goes down, as no one will response deref done.
Node 1 Node 2(old master) Node3(new master)
dlm_purge_lockres
send deref to N2
leave domain
migrate lockres to N3
finish migration
send do assert
master to N1
receive do assert msg
form N3, but can not
find lockres because
DROPPING_REF is set,
so the owner is still
N2.
receive deref from N1
and response -EINVAL
because lockres is migrated
BUG when receive -EINVAL
in dlm_drop_lockres_ref
Fixes:
842b90b62461d ("ocfs2/dlm: return in progress if master can not clear the refmap bit right now")
Link: http://lkml.kernel.org/r/57845103.3070406@huawei.com
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Jiufei Xue <xuejiufei@huawei.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>