GitHub/LineageOS/android_kernel_motorola_exynos9610.git
7 years agoarm64: dts: ls1043a: Share all MSIs
Minghuan Lian [Wed, 5 Jul 2017 06:58:59 +0000 (14:58 +0800)]
arm64: dts: ls1043a: Share all MSIs

In order to maximize the use of MSI, a PCIe controller will share
all MSI controllers. The patch changes "msi-parent" to refer to all
MSI controller dts nodes.

Signed-off-by: Minghuan Lian <Minghuan.Lian@nxp.com>
Signed-off-by: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
7 years agoarm: dts: ls1021a: Share all MSIs
Minghuan Lian [Wed, 5 Jul 2017 06:58:58 +0000 (14:58 +0800)]
arm: dts: ls1021a: Share all MSIs

In order to maximize the use of MSI, a PCIe controller will share
all MSI controllers. The patch changes msi-parent to refer to all
MSI controller dts nodes.

Signed-off-by: Minghuan Lian <Minghuan.Lian@nxp.com>
Signed-off-by: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
7 years agoarm64: dts: ls1043a: Fix typo of MSI compatible string
Minghuan Lian [Wed, 5 Jul 2017 06:58:57 +0000 (14:58 +0800)]
arm64: dts: ls1043a: Fix typo of MSI compatible string

"1" should be replaced by "l". This is a typo.
The patch is to fix it.

Signed-off-by: Minghuan Lian <Minghuan.Lian@nxp.com>
Signed-off-by: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
7 years agoarm: dts: ls1021a: Fix typo of MSI compatible string
Minghuan Lian [Wed, 5 Jul 2017 06:58:56 +0000 (14:58 +0800)]
arm: dts: ls1021a: Fix typo of MSI compatible string

"1" should be replaced by "l". This is a typo.
The patch is to fix it.

Signed-off-by: Minghuan Lian <Minghuan.Lian@nxp.com>
Signed-off-by: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
7 years agoirqchip/ls-scfg-msi: Fix typo of MSI compatible strings
Minghuan Lian [Wed, 5 Jul 2017 06:58:55 +0000 (14:58 +0800)]
irqchip/ls-scfg-msi: Fix typo of MSI compatible strings

The patch is to fix typo of the Layerscape SCFG MSI dts compatible
strings. "1" is replaced by "l".

Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Minghuan Lian <Minghuan.Lian@nxp.com>
Signed-off-by: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
7 years agoirqchip/irq-bcm7120-l2: Use correct I/O accessors for irq_fwd_mask
Florian Fainelli [Thu, 31 Aug 2017 00:29:16 +0000 (17:29 -0700)]
irqchip/irq-bcm7120-l2: Use correct I/O accessors for irq_fwd_mask

Initialization of irq_fwd_mask was done using __raw_writel() which
happens to work for all cases except when using ARM BE8 which requires
writel() (with the proper swapping). Move the initialization of the
irq_fwd_mask till later when we have correctly defined our I/O
accessors.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
7 years agoirqchip/mmp: Make mmp_intc_conf const
Bhumika Goyal [Thu, 24 Aug 2017 10:26:21 +0000 (15:56 +0530)]
irqchip/mmp: Make mmp_intc_conf const

Make these const as they are only used during a copy operation. Done
using Coccinelle.

@match disable optional_qualifier@
identifier s;
@@
static struct mmp_intc_conf s = {...};

@ref@
position p;
identifier match.s;
@@
s@p

@good1@
position ref.p;
identifier match.s,f,c;
expression e;
@@
(
e = s@p
|
e = s@p.f
|
c(...,s@p.f,...)
|
c(...,s@p,...)
)

@bad depends on  !good1@
position ref.p;
identifier match.s;
@@
s@p

@depends on forall !bad disable optional_qualifier@
identifier match.s;
@@
static
+ const
struct mmp_intc_conf s;

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
7 years agoirqchip/gic: Make irq_chip const
Bhumika Goyal [Sat, 19 Aug 2017 10:52:37 +0000 (16:22 +0530)]
irqchip/gic: Make irq_chip const

Make this const as it is only used in a copy operation.
Done using Coccinelle.

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
7 years agoirqchip/gic-v3: Advertise GICv4 support to KVM
Marc Zyngier [Sun, 25 Jun 2017 13:10:46 +0000 (14:10 +0100)]
irqchip/gic-v3: Advertise GICv4 support to KVM

As KVM needs to know about the availability of GICv4 to enable
direct injection of interrupts, let's advertise the feature in
the gic_kvm_info structure.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
7 years agoirqchip/gic-v4: Enable low-level GICv4 operations
Marc Zyngier [Tue, 20 Dec 2016 15:31:54 +0000 (15:31 +0000)]
irqchip/gic-v4: Enable low-level GICv4 operations

Get the show on the road...

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
7 years agoirqchip/gic-v4: Add some basic documentation
Marc Zyngier [Wed, 21 Dec 2016 17:40:16 +0000 (17:40 +0000)]
irqchip/gic-v4: Add some basic documentation

Do a braindump of the way things are supposed to work.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
7 years agoirqchip/gic-v4: Add VLPI configuration interface
Marc Zyngier [Wed, 21 Dec 2016 21:50:32 +0000 (21:50 +0000)]
irqchip/gic-v4: Add VLPI configuration interface

Add the required interfaces to map, unmap and update a VLPI.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
7 years agoirqchip/gic-v4: Add VPE command interface
Marc Zyngier [Tue, 20 Dec 2016 15:31:02 +0000 (15:31 +0000)]
irqchip/gic-v4: Add VPE command interface

Add the required interfaces to schedule a VPE and perform a
VINVALL command.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
7 years agoirqchip/gic-v4: Add per-VM VPE domain creation
Marc Zyngier [Tue, 20 Dec 2016 15:27:52 +0000 (15:27 +0000)]
irqchip/gic-v4: Add per-VM VPE domain creation

When creating a VM, it is very convenient to have an irq domain
containing all the doorbell interrupts associated with that VM
(each interrupt representing a VPE).

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
7 years agoirqchip/gic-v3-its: Set implementation defined bit to enable VLPIs
Marc Zyngier [Tue, 27 Jun 2017 20:24:25 +0000 (21:24 +0100)]
irqchip/gic-v3-its: Set implementation defined bit to enable VLPIs

A long time ago, GITS_CTLR[1] used to be called GITC_CTLR.EnableVLPI.
It has been subsequently deprecated and is now an "Implementation
Defined" bit that may ot may not be set for GICv4. Brilliant.

And the current crop of the FastModel requires that bit for VLPIs
to be enabled. Oh well... Let's set it and find out what breaks.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
7 years agoirqchip/gic-v3-its: Allow doorbell interrupts to be injected/cleared
Marc Zyngier [Mon, 31 Jul 2017 13:47:24 +0000 (14:47 +0100)]
irqchip/gic-v3-its: Allow doorbell interrupts to be injected/cleared

While the doorbell interrupts are usually driven by the HW itself,
having a way to trigger them independently has proved to be a
really useful debug feature. As it is actually very little code,
let's add it to the VPE irqchip operations.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
7 years agoirqchip/gic-v3-its: Move pending doorbell after VMOVP
Marc Zyngier [Fri, 18 Aug 2017 15:14:17 +0000 (16:14 +0100)]
irqchip/gic-v3-its: Move pending doorbell after VMOVP

After moving a VPE from a redistributor to another, we're still left
with a potential pending doorbell interrupt on the old redistributor.
That interrupt should be moved to the new one to be either cleared
or take, depending on what the hypervisor wishes to do.

So let's move it right after having execited VMOVP. This doesn't
add much cost in the !DirectLPI case (we trade a DISCARD for a MOVI),
and the cost of the DIRECTLPI case should be minimal (two extra MMIO
accesses).

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
7 years agoirqchip/gic-v3-its: Add device proxy for VPE management if !DirectLpi
Marc Zyngier [Tue, 20 Dec 2016 15:23:22 +0000 (15:23 +0000)]
irqchip/gic-v3-its: Add device proxy for VPE management if !DirectLpi

When we don't have the DirectLPI feature, we must work around the
architecture shortcomings to be able to perform the required
maintenance (interrupt masking, clearing and injection).

For this, we create a fake device whose sole purpose is to
provide a way to issue commands as if we were dealing with LPIs
coming from that device (while they actually originate from
the ITS). This fake device doesn't have LPIs allocated to it,
but instead uses the VPE LPIs.

Of course, this could be a real bottleneck, and a naive
implementation would require 6 commands to issue an invalidation.

Instead, let's allocate at least one event per physical CPU
(rounded up to the next power of 2), and opportunistically
map the VPE doorbell to an event. This doorbell will be mapped
until we roll over and need to reallocate this slot.

This ensures that most of the time, we only need 2 commands
to issue an INV, INT or CLEAR, making the performance a lot
better, given that we always issue a CLEAR on entry, and
an INV on each side of a trapped WFI.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
7 years agoirqchip/gic-v3-its: Make LPI allocation optional on device creation
Marc Zyngier [Fri, 4 Aug 2017 17:37:09 +0000 (18:37 +0100)]
irqchip/gic-v3-its: Make LPI allocation optional on device creation

The normal course of action when allocating the ITS' view of a
device is to allocate the corresponding LPIs. But we're about
to introduce devices that borrow their interrupts from
some other entities.

So let's make the allocation optional.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
7 years agoirqchip/gic-v3-its: Add VPE interrupt masking
Marc Zyngier [Tue, 20 Dec 2016 15:20:38 +0000 (15:20 +0000)]
irqchip/gic-v3-its: Add VPE interrupt masking

When masking/unmasking a doorbell interrupt, it is necessary
to issue an invalidation to the corresponding redistributor.
We use the DirectLPI feature by writting directly to the corresponding
redistributor.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
7 years agoirqchip/gic-v3-its: Add VPE affinity changes
Marc Zyngier [Tue, 20 Dec 2016 15:17:28 +0000 (15:17 +0000)]
irqchip/gic-v3-its: Add VPE affinity changes

When we're about to run a vcpu, it is crucial that the redistributor
associated with the physical CPU is being told about the new residency.

This is abstracted by hijacking the irq_set_affinity method for the
doorbell interrupt associated with the VPE. It is expected that the
hypervisor will call this method before scheduling the VPE.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
7 years agoirqchip/gic-v3-its: Add VPE invalidation hook
Marc Zyngier [Tue, 20 Dec 2016 15:10:50 +0000 (15:10 +0000)]
irqchip/gic-v3-its: Add VPE invalidation hook

When a guest issues a INVALL command targetting a collection, it must
be translated into a VINVALL for the VPE that has this collection.

This patch implements a hook that offers this functionallity to the
hypervisor.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
7 years agoirqchip/gic-v3-its: Add VPE scheduling
Marc Zyngier [Tue, 20 Dec 2016 15:09:31 +0000 (15:09 +0000)]
irqchip/gic-v3-its: Add VPE scheduling

When a VPE is scheduled to run, the corresponding redistributor must
be told so, by setting VPROPBASER to the VM's property table, and
VPENDBASER to the vcpu's pending table.

When scheduled out, we preserve the IDAI and PendingLast bits. The
latter is specially important, as it tells the hypervisor that
there are pending interrupts for this vcpu.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
7 years agoirqchip/gic-v3-its: Add VPENDBASER/VPROPBASER accessors
Marc Zyngier [Tue, 3 Jan 2017 13:39:52 +0000 (13:39 +0000)]
irqchip/gic-v3-its: Add VPENDBASER/VPROPBASER accessors

V{PEND,PROP}BASER being 64bit registers, they need some ad-hoc
accessors on 32bit, specially given that VPENDBASER contains
a Valid bit, making the access a bit convoluted.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
7 years agoirqchip/gic-v3-its: Add VPE irq domain [de]activation
Marc Zyngier [Tue, 20 Dec 2016 14:47:05 +0000 (14:47 +0000)]
irqchip/gic-v3-its: Add VPE irq domain [de]activation

On activation, a VPE is mapped using the VMAPP command, followed
by a VINVALL for a good measure. On deactivation, the VPE is
simply unmapped.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
7 years agoirqchip/gic-v3-its: Add VPE irq domain allocation/teardown
Marc Zyngier [Tue, 20 Dec 2016 13:55:54 +0000 (13:55 +0000)]
irqchip/gic-v3-its: Add VPE irq domain allocation/teardown

When creating a VM, the low level GICv4 code is responsible for:
- allocating each VPE a unique VPEID
- allocating a doorbell interrupt for each VPE
- allocating the pending tables for each VPE
- allocating the property table for the VM

This of course has to be reversed when the VM is brought down.

All of this is wired into the irq domain alloc/free methods.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
7 years agoirqchip/gic-v3-its: Add VPE domain infrastructure
Marc Zyngier [Tue, 20 Dec 2016 13:41:55 +0000 (13:41 +0000)]
irqchip/gic-v3-its: Add VPE domain infrastructure

Add the basic GICv4 VPE (vcpu in GICv4 parlance) infrastructure
(irqchip, irq domain) that is going to be populated in the following
patches.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
7 years agoirqchip/gic-v3-its: Add VLPI configuration handling
Marc Zyngier [Tue, 20 Dec 2016 09:54:57 +0000 (09:54 +0000)]
irqchip/gic-v3-its: Add VLPI configuration handling

When a VLPI is reconfigured (enabled, disabled, change in priority),
the full configuration byte must be written, and the caches invalidated.

Also, when using the irq_mask/irq_unmask methods, it is necessary
to disable the doorbell for that particular interrupt (by mapping it
to 1023) on top of clearing the Enable bit.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
7 years agoirqchip/gic-v3-its: Add VLPI map/unmap operations
Marc Zyngier [Tue, 20 Dec 2016 09:44:41 +0000 (09:44 +0000)]
irqchip/gic-v3-its: Add VLPI map/unmap operations

In order to let a VLPI being injected into a guest, the VLPI must
be mapped using the VMAPTI command. When moved to a different vcpu,
it must be moved with the VMOVI command.

These commands are issued via the irq_set_vcpu_affinity method,
making sure we unmap the corresponding host LPI first.

The reverse is also done when the VLPI is unmapped from the guest.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
7 years agoirqchip/gic-v3-its: Add VLPI configuration hook
Marc Zyngier [Tue, 20 Dec 2016 09:31:20 +0000 (09:31 +0000)]
irqchip/gic-v3-its: Add VLPI configuration hook

Add the skeleton irq_set_vcpu_affinity method that will be used
to configure VLPIs.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
7 years agoirqchip/gic-v3-its: Add GICv4 ITS command definitions
Marc Zyngier [Tue, 20 Dec 2016 15:11:47 +0000 (15:11 +0000)]
irqchip/gic-v3-its: Add GICv4 ITS command definitions

Add the new GICv4 ITS command definitions, most of them, being
defined in terms of their physical counterparts.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
7 years agoirqchip/gic-v4: Add management structure definitions
Marc Zyngier [Mon, 19 Dec 2016 19:25:00 +0000 (19:25 +0000)]
irqchip/gic-v4: Add management structure definitions

Add a bunch of GICv4-specific data structures that will get used in
subsequent patches.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
7 years agoirqchip/gic-v3-its: Generalize LPI configuration
Marc Zyngier [Mon, 19 Dec 2016 19:18:13 +0000 (19:18 +0000)]
irqchip/gic-v3-its: Generalize LPI configuration

We're are going to need to change a bit more than just the enable
bit in the LPI property table in the future. So let's change the
LPI configuration funtion to take a set of bits to be cleared,
and a set of bits to be set.

This way, we'll be able to use it when a guest updates an LPI
property (priority, for example).

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
7 years agoirqchip/gic-v3-its: Generalize device table allocation
Marc Zyngier [Mon, 19 Dec 2016 18:53:02 +0000 (18:53 +0000)]
irqchip/gic-v3-its: Generalize device table allocation

As we want to use 2-level tables for VCPUs, let's hack the device
table allocator in order to make it slightly more generic. It
will get reused in subsequent patches.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
7 years agoirqchip/gic-v3-its: Rework LPI freeing
Marc Zyngier [Mon, 19 Dec 2016 18:49:59 +0000 (18:49 +0000)]
irqchip/gic-v3-its: Rework LPI freeing

Rework LPI deallocation so that it can be reused by the v4 support
code.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
7 years agoirqchip/gic-v3-its: Split out pending table allocation
Marc Zyngier [Mon, 19 Dec 2016 18:34:38 +0000 (18:34 +0000)]
irqchip/gic-v3-its: Split out pending table allocation

Just as for the property table, let's move the pending table
allocation to a separate function.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
7 years agoirqchip/gic-v3-its: Allow use of indirect VCPU tables
Marc Zyngier [Mon, 19 Dec 2016 18:18:34 +0000 (18:18 +0000)]
irqchip/gic-v3-its: Allow use of indirect VCPU tables

The VCPU tables can be quite sparse as well, and it makes sense
to use indirect tables as well if possible.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
7 years agoirqchip/gic-v3-its: Split out property table allocation
Marc Zyngier [Mon, 19 Dec 2016 18:15:05 +0000 (18:15 +0000)]
irqchip/gic-v3-its: Split out property table allocation

Move the LPI property table allocation into its own function, as
this is going to be required for those associated with VMs in
the future.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
7 years agoirqchip/gic-v3-its: Implement irq_set_irqchip_state for pending state
Marc Zyngier [Mon, 19 Dec 2016 18:02:13 +0000 (18:02 +0000)]
irqchip/gic-v3-its: Implement irq_set_irqchip_state for pending state

Allow the pending state of an LPI to be set or cleared via
irq_set_irqchip_state.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
7 years agoirqchip/gic-v3-its: Macro-ize its_send_single_command
Marc Zyngier [Mon, 19 Dec 2016 17:56:32 +0000 (17:56 +0000)]
irqchip/gic-v3-its: Macro-ize its_send_single_command

Most ITS commands do operate on a collection object, and require
a SYNC command to be performed on that collection in order to
guarantee the execution of the first command.

With GICv4 ITS, another set of commands perform similar operations
on a VPE object, and a VSYNC operations must be executed to guarantee
their execution.

Given the similarities (post a command, perform a synchronization
operation on a sync object), it makes sense to reuse the same
mechanism for both class of commands.

Let's start with turning its_send_single_command into a huge macro
that performs the bulk of the work, and a set of helpers that
make this macro usable for the GICv3 ITS commands.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
7 years agoirqchip/gic-v3-its: Add probing for VLPI properties
Marc Zyngier [Mon, 19 Dec 2016 17:25:54 +0000 (17:25 +0000)]
irqchip/gic-v3-its: Add probing for VLPI properties

Add the probing code for the ITS VLPI support. This includes
configuring the ITS number if not supporting the single VMOVP
command feature.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
7 years agoirqchip/gic-v3-its: Move LPI definitions around
Marc Zyngier [Mon, 19 Dec 2016 17:15:24 +0000 (17:15 +0000)]
irqchip/gic-v3-its: Move LPI definitions around

The various LPI definitions are in the middle of the code, and
would be better placed at the beginning, given that we're going
to use some of them much earlier.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
7 years agoirqchip/gic-v3: Add VLPI/DirectLPI discovery
Marc Zyngier [Mon, 19 Dec 2016 17:01:52 +0000 (17:01 +0000)]
irqchip/gic-v3: Add VLPI/DirectLPI discovery

Add helper functions that probe for VLPI and DirectLPI properties.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
7 years agoirqchip/gic-v3: Add redistributor iterator
Marc Zyngier [Mon, 19 Dec 2016 17:00:38 +0000 (17:00 +0000)]
irqchip/gic-v3: Add redistributor iterator

In order to discover the VLPI properties, we need to iterate over
the redistributor regions. As we already have code that does this,
let's factor it out and make it slightly more generic.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
7 years agogenirq: Let irq_set_vcpu_affinity() iterate over hierarchy
Marc Zyngier [Fri, 23 Jun 2017 20:42:57 +0000 (21:42 +0100)]
genirq: Let irq_set_vcpu_affinity() iterate over hierarchy

When assigning an interrupt to a vcpu, it is not unlikely that
the level of the hierarchy implementing irq_set_vcpu_affinity
is not the top level (think a generic MSI domain on top of a
virtualization aware interrupt controller).

In such a case, let's iterate over the hierarchy until we find
an irqchip implementing it.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
7 years agoirqchip: Convert to using %pOF instead of full_name
Rob Herring [Tue, 18 Jul 2017 21:43:10 +0000 (16:43 -0500)]
irqchip: Convert to using %pOF instead of full_name

Now that we have a custom printf format specifier, convert users of
full_name to use %pOF instead. This is preparation to remove storing
of the full path string for each node.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Lee Jones <lee@kernel.org>
Cc: Stefan Wahren <stefan.wahren@i2se.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Ray Jui <rjui@broadcom.com>
Cc: Scott Branden <sbranden@broadcom.com>
Cc: bcm-kernel-feedback-list@broadcom.com
Cc: Sylvain Lemieux <slemieux.tyco@gmail.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Chen-Yu Tsai <wens@csie.org>
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Jonathan Hunter <jonathanh@nvidia.com>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: "Sören Brinkmann" <soren.brinkmann@xilinx.com>
Cc: linux-rpi-kernel@lists.infradead.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-mediatek@lists.infradead.org
Cc: linux-tegra@vger.kernel.org
Acked-by: Eric Anholt <eric@anholt.net>
Acked-by: Baruch Siach <baruch@tkos.co.il>
Acked-by: Vladimir Zapolskiy <vz@mleia.com>
Acked-by: Matthias Brugger <matthias.bgg@gmail.com>
Acked-by: Alexandre Torgue <alexandre.torgue@st.com>
Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
7 years agoirqchip: Add UniPhier AIDET irqchip driver
Masahiro Yamada [Wed, 23 Aug 2017 01:31:47 +0000 (10:31 +0900)]
irqchip: Add UniPhier AIDET irqchip driver

UniPhier SoCs contain AIDET (ARM Interrupt Detector).  This is intended
to provide additional features that are not covered by GIC.  The main
purpose is to provide logic inverter to support low level and falling
edge trigger types for interrupt lines from on-board devices.

Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
7 years agoirqchip/gic-v3-its: Properly handle command queue wrapping
Marc Zyngier [Sat, 19 Aug 2017 09:16:02 +0000 (10:16 +0100)]
irqchip/gic-v3-its: Properly handle command queue wrapping

wait_for_range_completion() is nicely busted when handling
wrapping of the command queue, leading to an early exit
instead of waiting for the command to have been executed.

Fortunately, the impact is pretty minor, as it only impair
the detection of an ITS that doesn't make any forward progress
for a whole second. And an ITS should *never* lock up.

Reported-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
7 years agoirqchip/armada-370-xp: Enable MSI-X support
Stefan Roese [Fri, 18 Aug 2017 12:59:26 +0000 (14:59 +0200)]
irqchip/armada-370-xp: Enable MSI-X support

Armada XP does not only support MSI, but also MSI-X. This patch sets
the MSI_FLAG_PCI_MSIX flag in the interrupt controller driver which
is the only change necessary to enable MSI-X support on this SoC. As
the Linux PCI MSI-X infrastructure takes care of writing the data and
address structures into the BAR specified by the MSI-X controller.

Signed-off-by: Stefan Roese <sr@denx.de>
Reviewed-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
7 years agoLinux 4.13-rc5
Linus Torvalds [Sun, 13 Aug 2017 23:01:32 +0000 (16:01 -0700)]
Linux 4.13-rc5

7 years agoMerge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Linus Torvalds [Sun, 13 Aug 2017 22:34:28 +0000 (15:34 -0700)]
Merge branch 'upstream' of git://git.linux-mips.org/ralf/upstream-linus

Pull MIPS fixes from Ralf Baechle:
 "Another round of MIPS fixes:

   - compressed boot: Ignore a generated .c file

   - VDSO: Fix a register clobber list

   - DECstation: Fix an int-handler.S CPU_DADDI_WORKAROUNDS regression

   - Octeon: Fix recent cleanups that cleaned away a bit too much thus
     breaking the arch side of the EDAC and USB drivers.

   - uasm: Fix duplicate const in "const struct foo const bar[]" which
     GCC 7.1 no longer accepts.

   - Fix race on setting and getting cpu_online_mask

   - Fix preemption issue. To do so cleanly introduce macro to get the
     size of L3 cache line.

   - Revert include cleanup that sometimes results in build error

   - MicroMIPS uses bit 0 of the PC to indicate microMIPS mode. Make
     sure this bit is set for kernel entry as well.

   - Prevent configuring the kernel for both microMIPS and MT. There are
     no such CPUs currently and thus the combination is unsupported and
     results in build errors.

  This has been sitting in linux-next for a few days and has survived
  automated testing by Imagination's test farm. No known regressions
  pending except a number of issues that crept up due to lots of people
  switching to GCC 7.1"

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
  MIPS: Set ISA bit in entry-y for microMIPS kernels
  MIPS: Prevent building MT support for microMIPS kernels
  MIPS: PCI: Fix smp_processor_id() in preemptible
  MIPS: Introduce cpu_tcache_line_size
  MIPS: DEC: Fix an int-handler.S CPU_DADDI_WORKAROUNDS regression
  MIPS: VDSO: Fix clobber lists in fallback code paths
  Revert "MIPS: Don't unnecessarily include kmalloc.h into <asm/cache.h>."
  MIPS: OCTEON: Fix USB platform code breakage.
  MIPS: Octeon: Fix broken EDAC driver.
  MIPS: gitignore: ignore generated .c files
  MIPS: Fix race on setting and getting cpu_online_mask
  MIPS: mm: remove duplicate "const" qualifier on insn_table

7 years agoMerge tag 'driver-core-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 13 Aug 2017 19:44:18 +0000 (12:44 -0700)]
Merge tag 'driver-core-4.13-rc5' of git://git./linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
 "Here are three firmware core fixes for 4.13-rc5.

  All three of these fix reported issues and have been floating around
  for a few weeks. They have been in linux-next with no reported
  problems"

* tag 'driver-core-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  firmware: avoid invalid fallback aborts by using killable wait
  firmware: fix batched requests - send wake up on failure on direct lookups
  firmware: fix batched requests - wake all waiters

7 years agoMerge tag 'char-misc-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregk...
Linus Torvalds [Sun, 13 Aug 2017 19:41:58 +0000 (12:41 -0700)]
Merge tag 'char-misc-4.13-rc5' of git://git./linux/kernel/git/gregkh/char-misc

Pull char/misc fixes from Greg KH:
 "Here are two patches for 4.13-rc5.

  One is a fix for a reported thunderbolt issue, and the other a fix for
  an MEI driver issue. Both have been in linux-next with no reported
  issues"

* tag 'char-misc-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  thunderbolt: Do not enumerate more ports from DROM than the controller has
  mei: exclude device from suspend direct complete optimization

7 years agoMerge tag 'tty-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Linus Torvalds [Sun, 13 Aug 2017 19:33:35 +0000 (12:33 -0700)]
Merge tag 'tty-4.13-rc5' of git://git./linux/kernel/git/gregkh/tty

Pull tty/serial fixes from Greg KH:
 "Here are two tty serial driver fixes for 4.13-rc5. One is a revert of
  a -rc1 patch that turned out to not be a good idea, and the other is a
  fix for the pl011 serial driver.

  Both have been in linux-next with no reported issues"

* tag 'tty-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  Revert "serial: Delete dead code for CIR serial ports"
  tty: pl011: fix initialization order of QDF2400 E44

7 years agoMerge tag 'staging-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh...
Linus Torvalds [Sun, 13 Aug 2017 19:30:17 +0000 (12:30 -0700)]
Merge tag 'staging-4.13-rc5' of git://git./linux/kernel/git/gregkh/staging

Pull staging/iio fixes from Greg KH:
 "Here are some Staging and IIO driver fixes for 4.13-rc5.

  Nothing major, just a number of small fixes for reported issues. All
  of these have been in linux-next for a while now with no reported
  issues. Full details are in the shortlog"

* tag 'staging-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: comedi: comedi_fops: do not call blocking ops when !TASK_RUNNING
  iio: aspeed-adc: wait for initial sequence.
  iio: accel: bmc150: Always restore device to normal mode after suspend-resume
  staging:iio:resolver:ad2s1210 fix negative IIO_ANGL_VEL read
  iio: adc: axp288: Fix the GPADC pin reading often wrongly returning 0
  iio: adc: vf610_adc: Fix VALT selection value for REFSEL bits
  iio: accel: st_accel: add SPI-3wire support
  iio: adc: Revert "axp288: Drop bogus AXP288_ADC_TS_PIN_CTRL register modifications"
  iio: adc: sun4i-gpadc-iio: fix unbalanced irq enable/disable
  iio: pressure: st_pressure_core: disable multiread by default for LPS22HB
  iio: light: tsl2563: use correct event code

7 years agoMerge tag 'usb-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Linus Torvalds [Sun, 13 Aug 2017 19:27:42 +0000 (12:27 -0700)]
Merge tag 'usb-4.13-rc5' of git://git./linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
 "Here are a number of small USB driver fixes and new device ids for
  4.13-rc5. There is the usual gadget driver fixes, some new quirks for
  "messy" hardware, and some new device ids.

  All have been in linux-next with no reported issues"

* tag 'usb-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  USB: serial: pl2303: add new ATEN device id
  usb: quirks: Add no-lpm quirk for Moshi USB to Ethernet Adapter
  USB: Check for dropped connection before switching to full speed
  usb:xhci:Add quirk for Certain failing HP keyboard on reset after resume
  usb: renesas_usbhs: gadget: fix unused-but-set-variable warning
  usb: renesas_usbhs: Fix UGCTRL2 value for R-Car Gen3
  usb: phy: phy-msm-usb: Fix usage of devm_regulator_bulk_get()
  usb: gadget: udc: renesas_usb3: Fix usb_gadget_giveback_request() calling
  usb: dwc3: gadget: Correct ISOC DATA PIDs for short packets
  USB: serial: option: add D-Link DWM-222 device ID
  usb: musb: fix tx fifo flush handling again
  usb: core: unlink urbs from the tail of the endpoint's urb_list
  usb-storage: fix deadlock involving host lock and scsi_done
  uas: Add US_FL_IGNORE_RESIDUE for Initio Corporation INIC-3069
  USB: hcd: Mark secondary HCD as dead if the primary one died
  USB: serial: cp210x: add support for Qivicon USB ZigBee dongle

7 years agoMerge tag 'for-linus-20170812' of git://git.infradead.org/linux-mtd
Linus Torvalds [Sat, 12 Aug 2017 23:19:43 +0000 (16:19 -0700)]
Merge tag 'for-linus-20170812' of git://git.infradead.org/linux-mtd

Pull another MTD fix from Brian Norris:
 "An mtdblock regression occurred in -rc1 (all writes were broken!), in
  the process of some block subsystem refactoring. Noticed and fixed
  last week, but I'm a little slow on the uptake"

* tag 'for-linus-20170812' of git://git.infradead.org/linux-mtd:
  mtd: blkdevs: Fix mtd block write failure

7 years agomtd: blkdevs: Fix mtd block write failure
Abhishek Sahu [Wed, 2 Aug 2017 12:33:05 +0000 (18:03 +0530)]
mtd: blkdevs: Fix mtd block write failure

All the MTD block write requests are failing with
following error messages

    mkfs.ext4  /dev/mtdblock0

    print_req_error: I/O error, dev mtdblock0, sector 0
    Buffer I/O error on dev mtdblock0, logical block 0,
    lost async page write

The control is going to default case after block write request
because of missing return.

Fixes: commit 2a842acab109 ("block: introduce new block status code type")
Signed-off-by: Abhishek Sahu <absahu@codeaurora.org>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
7 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Linus Torvalds [Sat, 12 Aug 2017 19:08:59 +0000 (12:08 -0700)]
Merge git://git./linux/kernel/git/nab/target-pending

Pull SCSI target fixes from Nicholas Bellinger:
 "The highlights include:

   - Fix iscsi-target payload memory leak during
     ISCSI_FLAG_TEXT_CONTINUE (Varun Prakash)

   - Fix tcm_qla2xxx incorrect use of tcm_qla2xxx_free_cmd during ABORT
     (Pascal de Bruijn + Himanshu Madhani + nab)

   - Fix iscsi-target long-standing issue with parallel delete of a
     single network portal across multiple target instances (Gary Guo +
     nab)

   - Fix target dynamic se_node GPF during uncached shutdown regression
     (Justin Maggard + nab)"

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  target: Fix node_acl demo-mode + uncached dynamic shutdown regression
  iscsi-target: Fix iscsi_np reset hung task during parallel delete
  qla2xxx: Fix incorrect tcm_qla2xxx_free_cmd use during TMR ABORT (v2)
  cxgbit: fix sg_nents calculation
  iscsi-target: fix invalid flags in text response
  iscsi-target: fix memory leak in iscsit_setup_text_cmd()
  cxgbit: add missing __kfree_skb()
  tcmu: free old string on reconfig
  tcmu: Fix possible to/from address overflow when doing the memcpy

7 years agoMerge tag 'for-linus-4.13b-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 12 Aug 2017 16:01:36 +0000 (09:01 -0700)]
Merge tag 'for-linus-4.13b-rc5-tag' of git://git./linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:
 "Some fixes for Xen:

   - a fix for a regression introduced in 4.13 for a Xen HVM-guest
     configured with KASLR

   - a fix for a possible deadlock in the xenbus driver when booting the
     system

   - a fix for lost interrupts in Xen guests"

* tag 'for-linus-4.13b-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/events: Fix interrupt lost during irq_disable and irq_enable
  xen: avoid deadlock in xenbus
  xen: fix hvm guest with kaslr enabled
  xen: split up xen_hvm_init_shared_info()
  x86: provide an init_mem_mapping hypervisor hook

7 years agoMerge tag 'nfs-for-4.13-5' of git://git.linux-nfs.org/projects/anna/linux-nfs
Linus Torvalds [Fri, 11 Aug 2017 20:54:09 +0000 (13:54 -0700)]
Merge tag 'nfs-for-4.13-5' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client fixes from Anna Schumaker:
 "A few more NFS client bugfixes from me for rc5.

  Dros has a stable fix for flexfiles to prevent leaking the
  nfs4_ff_ds_version arrays when freeing a layout, Trond fixed a
  potential recovery loop situation with the TEST_STATEID operation, and
  Christoph fixed up the pNFS blocklayout Kconfig options to prevent
  unsafe use with kernels that don't have large block device support.
  Summary:

  Stable fix:
   - fix leaking nfs4_ff_ds_version array

  Other fixes:
   - improve TEST_STATEID OLD_STATEID handling to prevent recovery loop

   - require 64-bit sector_t for pNFS blocklayout to prevent 32-bit
     compile errors"

* tag 'nfs-for-4.13-5' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  pnfs/blocklayout: require 64-bit sector_t
  NFSv4: Ignore NFS4ERR_OLD_STATEID in nfs41_check_open_stateid()
  nfs/flexfiles: fix leak of nfs4_ff_ds_version arrays

7 years agoMerge branch 'for-linus' of git://git.kernel.dk/linux-block
Linus Torvalds [Fri, 11 Aug 2017 19:26:49 +0000 (12:26 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "A set of fixes that should go into this series. This contains:

   - Fix from Bart for blk-mq requeue queue running, preventing a
     continued loop of run/restart.

   - Fix for a bio/blk-integrity issue, in two parts. One from
     Christoph, fixing where verification happens, and one from Milan,
     for a NULL profile.

   - NVMe pull request, most of the changes being for nvme-fc, but also
     a few trivial core/pci fixes"

* 'for-linus' of git://git.kernel.dk/linux-block:
  nvme: fix directive command numd calculation
  nvme: fix nvme reset command timeout handling
  nvme-pci: fix CMB sysfs file removal in reset path
  lpfc: support nvmet_fc defer_rcv callback
  nvmet_fc: add defer_req callback for deferment of cmd buffer return
  nvme: strip trailing 0-bytes in wwid_show
  block: Make blk_mq_delay_kick_requeue_list() rerun the queue at a quiet time
  bio-integrity: only verify integrity on the lowest stacked driver
  bio-integrity: Fix regression if profile verify_fn is NULL

7 years agoMerge tag 'mmc-v4.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Linus Torvalds [Fri, 11 Aug 2017 18:56:54 +0000 (11:56 -0700)]
Merge tag 'mmc-v4.13-rc4' of git://git./linux/kernel/git/ulfh/mmc

Pull MMC fixes from Ulf Hansson:
 "MMC core:

   - fix lockdep splat when removing mmc_block module

   - fix the logic for setting eMMC HS400ES signal voltage

  MMC host:

   - omap_hsmmc: add CMD23 capability to fix -EIO errors"

* tag 'mmc-v4.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: block: fix lockdep splat when removing mmc_block module
  mmc: mmc: correct the logic for setting HS400ES signal voltage
  mmc: host: omap_hsmmc: Add CMD23 capability to omap_hsmmc driver

7 years agoMerge tag 'fbdev-v4.13-rc5' of git://github.com/bzolnier/linux
Linus Torvalds [Fri, 11 Aug 2017 18:44:18 +0000 (11:44 -0700)]
Merge tag 'fbdev-v4.13-rc5' of git://github.com/bzolnier/linux

Pull fbdev fixes from Bartlomiej Zolnierkiewicz:

 - allow user to disable write combined mapping in efifb driver (Dave
   Airlie)

 - fix use after free bugs on driver removal in imxfb driver (Dan
   Carpenter)

 - fix unused variable warning in omapfb driver (Arnd Bergmann)

* tag 'fbdev-v4.13-rc5' of git://github.com/bzolnier/linux:
  efifb: allow user to disable write combined mapping.
  fbdev: omapfb: remove unused variable
  video: fbdev: imxfb: use after free in imxfb_remove()

7 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi...
Linus Torvalds [Fri, 11 Aug 2017 18:20:48 +0000 (11:20 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/mszeredi/fuse

Pull fuse fixes from Miklos Szeredi:
 "Fix a few bugs in fuse"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: set mapping error in writepage_locked when it fails
  fuse: Dont call set_page_dirty_lock() for ITER_BVEC pages for async_dio
  fuse: initialize the flock flag in fuse_file on allocation

7 years agoMerge tag 'iommu-fixes-v4.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 11 Aug 2017 18:15:51 +0000 (11:15 -0700)]
Merge tag 'iommu-fixes-v4.13-rc4' of git://git./linux/kernel/git/joro/iommu

Pull IOMMU fix from Joerg Roedel:
 "Fix a NULL-pointer dereference in arm_smmu_add_device"

* tag 'iommu-fixes-v4.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/arm-smmu: fix null-pointer dereference in arm_smmu_add_device

7 years agopnfs/blocklayout: require 64-bit sector_t
Christoph Hellwig [Sat, 5 Aug 2017 08:59:14 +0000 (10:59 +0200)]
pnfs/blocklayout: require 64-bit sector_t

The blocklayout code does not compile cleanly for a 32-bit sector_t,
and also has no reliable checks for devices sizes, which makes it
unsafe to use with a kernel that doesn't support large block devices.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 5c83746a0cf2 ("pnfs/blocklayout: in-kernel GETDEVICEINFO XDR parsing")
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
7 years agoMerge tag 'powerpc-4.13-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Fri, 11 Aug 2017 15:56:01 +0000 (08:56 -0700)]
Merge tag 'powerpc-4.13-6' of git://git./linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "All fixes for code that went in this cycle.

   - a revert of an optimisation to the syscall exit path, which could
     lead to an oops on either older machines or machines with > 1TB of
     memory

   - disable some deep idle states if the firmware configuration for
     them fails

   - re-enable HARD/SOFT lockup detectors in defconfigs after a Kconfig
     change

   - six fairly small patches fixing bugs in our new watchdog code

  Thanks to: Gautham R Shenoy, Nicholas Piggin"

* tag 'powerpc-4.13-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/watchdog: add locking around init/exit functions
  powerpc/watchdog: Fix marking of stuck CPUs
  powerpc/watchdog: Fix final-check recovered case
  powerpc/watchdog: Moderate touch_nmi_watchdog overhead
  powerpc/watchdog: Improve watchdog lock primitive
  powerpc: NMI IPI improve lock primitive
  powerpc/configs: Re-enable HARD/SOFT lockup detectors
  powerpc/powernv/idle: Disable LOSE_FULL_CONTEXT states when stop-api fails
  Revert "powerpc/64: Avoid restore_math call if possible in syscall exit"

7 years agoiommu/arm-smmu: fix null-pointer dereference in arm_smmu_add_device
Artem Savkov [Tue, 8 Aug 2017 10:26:02 +0000 (12:26 +0200)]
iommu/arm-smmu: fix null-pointer dereference in arm_smmu_add_device

Commit c54451a "iommu/arm-smmu: Fix the error path in arm_smmu_add_device"
removed fwspec assignment in legacy_binding path as redundant which is
wrong. It needs to be updated after fwspec initialisation in
arm_smmu_register_legacy_master() as it is dereferenced later. Without
this there is a NULL-pointer dereference panic during boot on some hosts.

Signed-off-by: Artem Savkov <asavkov@redhat.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
7 years agoxen/events: Fix interrupt lost during irq_disable and irq_enable
Liu Shuo [Sat, 29 Jul 2017 16:59:57 +0000 (00:59 +0800)]
xen/events: Fix interrupt lost during irq_disable and irq_enable

Here is a device has xen-pirq-MSI interrupt. Dom0 might lost interrupt
during driver irq_disable/irq_enable. Here is the scenario,
 1. irq_disable -> disable_dynirq -> mask_evtchn(irq channel)
 2. dev interrupt raised by HW and Xen mark its evtchn as pending
 3. irq_enable -> startup_pirq -> eoi_pirq ->
    clear_evtchn(channel of irq) -> clear pending status
 4. consume_one_event process the irq event without pending bit assert
    which result in interrupt lost once
 5. No HW interrupt raising anymore.

Now use enable_dynirq for enable_pirq of xen_pirq_chip to remove
eoi_pirq when irq_enable.

Signed-off-by: Liu Shuo <shuo.a.liu@intel.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
7 years agoxen: avoid deadlock in xenbus
Juergen Gross [Fri, 28 Jul 2017 14:53:55 +0000 (16:53 +0200)]
xen: avoid deadlock in xenbus

When starting the xenwatch thread a theoretical deadlock situation is
possible:

xs_init() contains:

    task = kthread_run(xenwatch_thread, NULL, "xenwatch");
    if (IS_ERR(task))
        return PTR_ERR(task);
    xenwatch_pid = task->pid;

And xenwatch_thread() does:

    mutex_lock(&xenwatch_mutex);
    ...
    event->handle->callback();
    ...
    mutex_unlock(&xenwatch_mutex);

The callback could call unregister_xenbus_watch() which does:

    ...
    if (current->pid != xenwatch_pid)
        mutex_lock(&xenwatch_mutex);
    ...

In case a watch is firing before xenwatch_pid could be set and the
callback of that watch unregisters a watch, then a self-deadlock would
occur.

Avoid this by setting xenwatch_pid in xenwatch_thread().

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
7 years agoMerge branch 'nvme-4.13' of git://git.infradead.org/nvme into for-linus
Jens Axboe [Fri, 11 Aug 2017 14:07:19 +0000 (08:07 -0600)]
Merge branch 'nvme-4.13' of git://git.infradead.org/nvme into for-linus

Pull NVMe fixes from Christoph:

"A few more small fixes - the fc/lpfc update is the biggest by far."

7 years agoxen: fix hvm guest with kaslr enabled
Juergen Gross [Fri, 28 Jul 2017 10:23:14 +0000 (12:23 +0200)]
xen: fix hvm guest with kaslr enabled

A Xen HVM guest running with KASLR enabled will die rather soon today
because the shared info page mapping is using va() too early. This was
introduced by commit a5d5f328b0e2baa5ee7c119fd66324eb79eeeb66 ("xen:
allocate page for shared info page from low memory").

In order to fix this use early_memremap() to get a temporary virtual
address for shared info until va() can be used safely.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
7 years agoxen: split up xen_hvm_init_shared_info()
Juergen Gross [Fri, 28 Jul 2017 10:23:13 +0000 (12:23 +0200)]
xen: split up xen_hvm_init_shared_info()

Instead of calling xen_hvm_init_shared_info() on boot and resume split
it up into a boot time function searching for the pfn to use and a
mapping function doing the hypervisor mapping call.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
7 years agox86: provide an init_mem_mapping hypervisor hook
Juergen Gross [Fri, 28 Jul 2017 10:23:12 +0000 (12:23 +0200)]
x86: provide an init_mem_mapping hypervisor hook

Provide a hook in hypervisor_x86 called after setting up initial
memory mapping.

This is needed e.g. by Xen HVM guests to map the hypervisor shared
info page.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
7 years agofuse: set mapping error in writepage_locked when it fails
Jeff Layton [Thu, 25 May 2017 10:57:50 +0000 (06:57 -0400)]
fuse: set mapping error in writepage_locked when it fails

This ensures that we see errors on fsync when writeback fails.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
7 years agoMerge tag 'drm-fixes-for-v4.13-rc5' of git://people.freedesktop.org/~airlied/linux
Linus Torvalds [Fri, 11 Aug 2017 05:33:47 +0000 (22:33 -0700)]
Merge tag 'drm-fixes-for-v4.13-rc5' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
 "Nothing too earth shattering here, it just seems like lots of little
  things all over the place.

  msm has probably the larger amount of changes, but they all seem fine,
  otherwise, some rockchip, i915, etnaviv and exynos fixes, along with
  one nouveau regression fix for some older GPUs"

* tag 'drm-fixes-for-v4.13-rc5' of git://people.freedesktop.org/~airlied/linux: (35 commits)
  drm/nouveau/disp/nv04: avoid creation of output paths
  drm: make DRM_STM default n
  drm/exynos: forbid creating framebuffers from too small GEM buffers
  drm/etnaviv: Fix off-by-one error in reloc checking
  drm/i915: fix backlight invert for non-zero minimum brightness
  drm/i915/shrinker: Wrap need_resched() inside preempt-disable
  drm/i915/perf: fix flex eu registers programming
  drm/i915: Fix out-of-bounds array access in bdw_load_gamma_lut
  drm/i915/gvt: Change the max length of mmio_reg_rw from 4 to 8
  drm/i915/gvt: Initialize MMIO Block with HW state
  drm/rockchip: vop: report error when check resource error
  drm/rockchip: vop: round_up pitches to word align
  drm/rockchip: vop: fix NV12 video display error
  drm/rockchip: vop: fix iommu page fault when resume
  drm/i915/gvt: clean workload queue if error happened
  drm/i915/gvt: change resetting to resetting_eng
  drm/msm: gpu: don't abuse dma_alloc for non-DMA allocations
  drm/msm: gpu: call qcom_mdt interfaces only for ARCH_QCOM
  drm/msm/adreno: Prevent unclocked access when retrieving timestamps
  drm/msm: Remove __user from __u64 data types
  ...

7 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Thu, 10 Aug 2017 23:20:52 +0000 (16:20 -0700)]
Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
 "21 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (21 commits)
  userfaultfd: replace ENOSPC with ESRCH in case mm has gone during copy/zeropage
  zram: rework copy of compressor name in comp_algorithm_store()
  rmap: do not call mmu_notifier_invalidate_page() under ptl
  mm: fix list corruptions on shmem shrinklist
  mm/balloon_compaction.c: don't zero ballooned pages
  MAINTAINERS: copy virtio on balloon_compaction.c
  mm: fix KSM data corruption
  mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem
  mm: make tlb_flush_pending global
  mm: refactor TLB gathering API
  Revert "mm: numa: defer TLB flush for THP migration as long as possible"
  mm: migrate: fix barriers around tlb_flush_pending
  mm: migrate: prevent racy access to tlb_flush_pending
  fault-inject: fix wrong should_fail() decision in task context
  test_kmod: fix small memory leak on filesystem tests
  test_kmod: fix the lock in register_test_dev_kmod()
  test_kmod: fix bug which allows negative values on two config options
  test_kmod: fix spelling mistake: "EMTPY" -> "EMPTY"
  userfaultfd: hugetlbfs: remove superfluous page unlock in VM_SHARED case
  mm: ratelimit PFNs busy info message
  ...

7 years agouserfaultfd: replace ENOSPC with ESRCH in case mm has gone during copy/zeropage
Mike Rapoport [Thu, 10 Aug 2017 22:24:32 +0000 (15:24 -0700)]
userfaultfd: replace ENOSPC with ESRCH in case mm has gone during copy/zeropage

When the process exit races with outstanding mcopy_atomic, it would be
better to return ESRCH error.  When such race occurs the process and
it's mm are going away and returning "no such process" to the uffd
monitor seems better fit than ENOSPC.

Link: http://lkml.kernel.org/r/1502111545-32305-1-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agozram: rework copy of compressor name in comp_algorithm_store()
Matthias Kaehlcke [Thu, 10 Aug 2017 22:24:29 +0000 (15:24 -0700)]
zram: rework copy of compressor name in comp_algorithm_store()

comp_algorithm_store() passes the size of the source buffer to strlcpy()
instead of the destination buffer size.  Make it explicit that the two
buffers have the same size and use strcpy() instead of strlcpy().  The
latter can be done safely since the function ensures that the string in
the source buffer is terminated.

Link: http://lkml.kernel.org/r/20170803163350.45245-1-mka@chromium.org
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agormap: do not call mmu_notifier_invalidate_page() under ptl
Kirill A. Shutemov [Thu, 10 Aug 2017 22:24:27 +0000 (15:24 -0700)]
rmap: do not call mmu_notifier_invalidate_page() under ptl

MMU notifiers can sleep, but in page_mkclean_one() we call
mmu_notifier_invalidate_page() under page table lock.

Let's instead use mmu_notifier_invalidate_range() outside
page_vma_mapped_walk() loop.

[jglisse@redhat.com: try_to_unmap_one() do not call mmu_notifier under ptl]
Link: http://lkml.kernel.org/r/20170809204333.27485-1-jglisse@redhat.com
Link: http://lkml.kernel.org/r/20170804134928.l4klfcnqatni7vsc@black.fi.intel.com
Fixes: c7ab0d2fdc84 ("mm: convert try_to_unmap_one() to use page_vma_mapped_walk()")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Reported-by: axie <axie@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Writer, Tim" <Tim.Writer@amd.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agomm: fix list corruptions on shmem shrinklist
Cong Wang [Thu, 10 Aug 2017 22:24:24 +0000 (15:24 -0700)]
mm: fix list corruptions on shmem shrinklist

We saw many list corruption warnings on shmem shrinklist:

  WARNING: CPU: 18 PID: 177 at lib/list_debug.c:59 __list_del_entry+0x9e/0xc0
  list_del corruption. prev->next should be ffff9ae5694b82d8, but was ffff9ae5699ba960
  Modules linked in: intel_rapl sb_edac edac_core x86_pkg_temp_thermal coretemp iTCO_wdt iTCO_vendor_support crct10dif_pclmul crc32_pclmul ghash_clmulni_intel raid0 dcdbas shpchp wmi hed i2c_i801 ioatdma lpc_ich i2c_smbus acpi_cpufreq tcp_diag inet_diag sch_fq_codel ipmi_si ipmi_devintf ipmi_msghandler igb ptp crc32c_intel pps_core i2c_algo_bit i2c_core dca ipv6 crc_ccitt
  CPU: 18 PID: 177 Comm: kswapd1 Not tainted 4.9.34-t3.el7.twitter.x86_64 #1
  Hardware name: Dell Inc. PowerEdge C6220/0W6W6G, BIOS 2.2.3 11/07/2013
  Call Trace:
    dump_stack+0x4d/0x66
    __warn+0xcb/0xf0
    warn_slowpath_fmt+0x4f/0x60
    __list_del_entry+0x9e/0xc0
    shmem_unused_huge_shrink+0xfa/0x2e0
    shmem_unused_huge_scan+0x20/0x30
    super_cache_scan+0x193/0x1a0
    shrink_slab.part.41+0x1e3/0x3f0
    shrink_slab+0x29/0x30
    shrink_node+0xf9/0x2f0
    kswapd+0x2d8/0x6c0
    kthread+0xd7/0xf0
    ret_from_fork+0x22/0x30

  WARNING: CPU: 23 PID: 639 at lib/list_debug.c:33 __list_add+0x89/0xb0
  list_add corruption. prev->next should be next (ffff9ae5699ba960), but was ffff9ae5694b82d8. (prev=ffff9ae5694b82d8).
  Modules linked in: intel_rapl sb_edac edac_core x86_pkg_temp_thermal coretemp iTCO_wdt iTCO_vendor_support crct10dif_pclmul crc32_pclmul ghash_clmulni_intel raid0 dcdbas shpchp wmi hed i2c_i801 ioatdma lpc_ich i2c_smbus acpi_cpufreq tcp_diag inet_diag sch_fq_codel ipmi_si ipmi_devintf ipmi_msghandler igb ptp crc32c_intel pps_core i2c_algo_bit i2c_core dca ipv6 crc_ccitt
  CPU: 23 PID: 639 Comm: systemd-udevd Tainted: G        W       4.9.34-t3.el7.twitter.x86_64 #1
  Hardware name: Dell Inc. PowerEdge C6220/0W6W6G, BIOS 2.2.3 11/07/2013
  Call Trace:
    dump_stack+0x4d/0x66
    __warn+0xcb/0xf0
    warn_slowpath_fmt+0x4f/0x60
    __list_add+0x89/0xb0
    shmem_setattr+0x204/0x230
    notify_change+0x2ef/0x440
    do_truncate+0x5d/0x90
    path_openat+0x331/0x1190
    do_filp_open+0x7e/0xe0
    do_sys_open+0x123/0x200
    SyS_open+0x1e/0x20
    do_syscall_64+0x61/0x170
    entry_SYSCALL64_slow_path+0x25/0x25

The problem is that shmem_unused_huge_shrink() moves entries from the
global sbinfo->shrinklist to its local lists and then releases the
spinlock.  However, a parallel shmem_setattr() could access one of these
entries directly and add it back to the global shrinklist if it is
removed, with the spinlock held.

The logic itself looks solid since an entry could be either in a local
list or the global list, otherwise it is removed from one of them by
list_del_init().  So probably the race condition is that, one CPU is in
the middle of INIT_LIST_HEAD() but the other CPU calls list_empty()
which returns true too early then the following list_add_tail() sees a
corrupted entry.

list_empty_careful() is designed to fix this situation.

[akpm@linux-foundation.org: add comments]
Link: http://lkml.kernel.org/r/20170803054630.18775-1-xiyou.wangcong@gmail.com
Fixes: 779750d20b93 ("shmem: split huge pages beyond i_size under memory pressure")
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agomm/balloon_compaction.c: don't zero ballooned pages
Wei Wang [Thu, 10 Aug 2017 22:24:21 +0000 (15:24 -0700)]
mm/balloon_compaction.c: don't zero ballooned pages

Revert commit bb01b64cfab7 ("mm/balloon_compaction.c: enqueue zero page
to balloon device")'

Zeroing ballon pages is rather time consuming, especially when a lot of
pages are in flight. E.g. 7GB worth of ballooned memory takes 2.8s with
__GFP_ZERO while it takes ~491ms without it.

The original commit argued that zeroing will help ksmd to merge these
pages on the host but this argument is assuming that the host actually
marks balloon pages for ksm which is not universally true.  So we pay
performance penalty for something that even might not be used in the end
which is wrong.  The host can zero out pages on its own when there is a
need.

[mhocko@kernel.org: new changelog text]
Link: http://lkml.kernel.org/r/1501761557-9758-1-git-send-email-wei.w.wang@intel.com
Fixes: bb01b64cfab7 ("mm/balloon_compaction.c: enqueue zero page to balloon device")
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: zhenwei.pi <zhenwei.pi@youruncloud.com>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoMAINTAINERS: copy virtio on balloon_compaction.c
Michael S. Tsirkin [Thu, 10 Aug 2017 22:24:18 +0000 (15:24 -0700)]
MAINTAINERS: copy virtio on balloon_compaction.c

Changes to mm/balloon_compaction.c can easily break virtio, and virtio
is the only user of that interface.  Add a line to MAINTAINERS so
whoever changes that file remembers to copy us.

Link: http://lkml.kernel.org/r/1501764010-24456-1-git-send-email-mst@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Rafael Aquini <aquini@redhat.com>
Acked-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agomm: fix KSM data corruption
Minchan Kim [Thu, 10 Aug 2017 22:24:15 +0000 (15:24 -0700)]
mm: fix KSM data corruption

Nadav reported KSM can corrupt the user data by the TLB batching
race[1].  That means data user written can be lost.

Quote from Nadav Amit:
 "For this race we need 4 CPUs:

  CPU0: Caches a writable and dirty PTE entry, and uses the stale value
  for write later.

  CPU1: Runs madvise_free on the range that includes the PTE. It would
  clear the dirty-bit. It batches TLB flushes.

  CPU2: Writes 4 to /proc/PID/clear_refs , clearing the PTEs soft-dirty.
  We care about the fact that it clears the PTE write-bit, and of
  course, batches TLB flushes.

  CPU3: Runs KSM. Our purpose is to pass the following test in
  write_protect_page():

if (pte_write(*pvmw.pte) || pte_dirty(*pvmw.pte) ||
    (pte_protnone(*pvmw.pte) && pte_savedwrite(*pvmw.pte)))

  Since it will avoid TLB flush. And we want to do it while the PTE is
  stale. Later, and before replacing the page, we would be able to
  change the page.

  Note that all the operations the CPU1-3 perform canhappen in parallel
  since they only acquire mmap_sem for read.

  We start with two identical pages. Everything below regards the same
  page/PTE.

  CPU0        CPU1        CPU2        CPU3
  ----        ----        ----        ----
  Write the same
  value on page

  [cache PTE as
   dirty in TLB]

              MADV_FREE
              pte_mkclean()

                          4 > clear_refs
                          pte_wrprotect()

                                      write_protect_page()
                                      [ success, no flush ]

                                      pages_indentical()
                                      [ ok ]

  Write to page
  different value

  [Ok, using stale
   PTE]

                                      replace_page()

  Later, CPU1, CPU2 and CPU3 would flush the TLB, but that is too late.
  CPU0 already wrote on the page, but KSM ignored this write, and it got
  lost"

In above scenario, MADV_FREE is fixed by changing TLB batching API
including [set|clear]_tlb_flush_pending.  Remained thing is soft-dirty
part.

This patch changes soft-dirty uses TLB batching API instead of
flush_tlb_mm and KSM checks pending TLB flush by using
mm_tlb_flush_pending so that it will flush TLB to avoid data lost if
there are other parallel threads pending TLB flush.

[1] http://lkml.kernel.org/r/BD3A0EBE-ECF4-41D4-87FA-C755EA9AB6BD@gmail.com

Link: http://lkml.kernel.org/r/20170802000818.4760-8-namit@vmware.com
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Nadav Amit <namit@vmware.com>
Reported-by: Nadav Amit <namit@vmware.com>
Tested-by: Nadav Amit <namit@vmware.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agomm: fix MADV_[FREE|DONTNEED] TLB flush miss problem
Minchan Kim [Thu, 10 Aug 2017 22:24:12 +0000 (15:24 -0700)]
mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem

Nadav reported parallel MADV_DONTNEED on same range has a stale TLB
problem and Mel fixed it[1] and found same problem on MADV_FREE[2].

Quote from Mel Gorman:
 "The race in question is CPU 0 running madv_free and updating some PTEs
  while CPU 1 is also running madv_free and looking at the same PTEs.
  CPU 1 may have writable TLB entries for a page but fail the pte_dirty
  check (because CPU 0 has updated it already) and potentially fail to
  flush.

  Hence, when madv_free on CPU 1 returns, there are still potentially
  writable TLB entries and the underlying PTE is still present so that a
  subsequent write does not necessarily propagate the dirty bit to the
  underlying PTE any more. Reclaim at some unknown time at the future
  may then see that the PTE is still clean and discard the page even
  though a write has happened in the meantime. I think this is possible
  but I could have missed some protection in madv_free that prevents it
  happening."

This patch aims for solving both problems all at once and is ready for
other problem with KSM, MADV_FREE and soft-dirty story[3].

TLB batch API(tlb_[gather|finish]_mmu] uses [inc|dec]_tlb_flush_pending
and mmu_tlb_flush_pending so that when tlb_finish_mmu is called, we can
catch there are parallel threads going on.  In that case, forcefully,
flush TLB to prevent for user to access memory via stale TLB entry
although it fail to gather page table entry.

I confirmed this patch works with [4] test program Nadav gave so this
patch supersedes "mm: Always flush VMA ranges affected by zap_page_range
v2" in current mmotm.

NOTE:

This patch modifies arch-specific TLB gathering interface(x86, ia64,
s390, sh, um).  It seems most of architecture are straightforward but
s390 need to be careful because tlb_flush_mmu works only if
mm->context.flush_mm is set to non-zero which happens only a pte entry
really is cleared by ptep_get_and_clear and friends.  However, this
problem never changes the pte entries but need to flush to prevent
memory access from stale tlb.

[1] http://lkml.kernel.org/r/20170725101230.5v7gvnjmcnkzzql3@techsingularity.net
[2] http://lkml.kernel.org/r/20170725100722.2dxnmgypmwnrfawp@suse.de
[3] http://lkml.kernel.org/r/BD3A0EBE-ECF4-41D4-87FA-C755EA9AB6BD@gmail.com
[4] https://patchwork.kernel.org/patch/9861621/

[minchan@kernel.org: decrease tlb flush pending count in tlb_finish_mmu]
Link: http://lkml.kernel.org/r/20170808080821.GA31730@bbox
Link: http://lkml.kernel.org/r/20170802000818.4760-7-namit@vmware.com
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Nadav Amit <namit@vmware.com>
Reported-by: Nadav Amit <namit@vmware.com>
Reported-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agomm: make tlb_flush_pending global
Minchan Kim [Thu, 10 Aug 2017 22:24:09 +0000 (15:24 -0700)]
mm: make tlb_flush_pending global

Currently, tlb_flush_pending is used only for CONFIG_[NUMA_BALANCING|
COMPACTION] but upcoming patches to solve subtle TLB flush batching
problem will use it regardless of compaction/NUMA so this patch doesn't
remove the dependency.

[akpm@linux-foundation.org: remove more ifdefs from world's ugliest printk statement]
Link: http://lkml.kernel.org/r/20170802000818.4760-6-namit@vmware.com
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Nadav Amit <namit@vmware.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agomm: refactor TLB gathering API
Minchan Kim [Thu, 10 Aug 2017 22:24:05 +0000 (15:24 -0700)]
mm: refactor TLB gathering API

This patch is a preparatory patch for solving race problems caused by
TLB batch.  For that, we will increase/decrease TLB flush pending count
of mm_struct whenever tlb_[gather|finish]_mmu is called.

Before making it simple, this patch separates architecture specific part
and rename it to arch_tlb_[gather|finish]_mmu and generic part just
calls it.

It shouldn't change any behavior.

Link: http://lkml.kernel.org/r/20170802000818.4760-5-namit@vmware.com
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Nadav Amit <namit@vmware.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoRevert "mm: numa: defer TLB flush for THP migration as long as possible"
Nadav Amit [Thu, 10 Aug 2017 22:24:02 +0000 (15:24 -0700)]
Revert "mm: numa: defer TLB flush for THP migration as long as possible"

While deferring TLB flushes is a good practice, the reverted patch
caused pending TLB flushes to be checked while the page-table lock is
not taken.  As a result, in architectures with weak memory model (PPC),
Linux may miss a memory-barrier, miss the fact TLB flushes are pending,
and cause (in theory) a memory corruption.

Since the alternative of using smp_mb__after_unlock_lock() was
considered a bit open-coded, and the performance impact is expected to
be small, the previous patch is reverted.

This reverts b0943d61b8fa ("mm: numa: defer TLB flush for THP migration
as long as possible").

Link: http://lkml.kernel.org/r/20170802000818.4760-4-namit@vmware.com
Signed-off-by: Nadav Amit <namit@vmware.com>
Suggested-by: Mel Gorman <mgorman@suse.de>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agomm: migrate: fix barriers around tlb_flush_pending
Nadav Amit [Thu, 10 Aug 2017 22:23:59 +0000 (15:23 -0700)]
mm: migrate: fix barriers around tlb_flush_pending

Reading tlb_flush_pending while the page-table lock is taken does not
require a barrier, since the lock/unlock already acts as a barrier.
Removing the barrier in mm_tlb_flush_pending() to address this issue.

However, migrate_misplaced_transhuge_page() calls mm_tlb_flush_pending()
while the page-table lock is already released, which may present a
problem on architectures with weak memory model (PPC).  To deal with
this case, a new parameter is added to mm_tlb_flush_pending() to
indicate if it is read without the page-table lock taken, and calling
smp_mb__after_unlock_lock() in this case.

Link: http://lkml.kernel.org/r/20170802000818.4760-3-namit@vmware.com
Signed-off-by: Nadav Amit <namit@vmware.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agomm: migrate: prevent racy access to tlb_flush_pending
Nadav Amit [Thu, 10 Aug 2017 22:23:56 +0000 (15:23 -0700)]
mm: migrate: prevent racy access to tlb_flush_pending

Patch series "fixes of TLB batching races", v6.

It turns out that Linux TLB batching mechanism suffers from various
races.  Races that are caused due to batching during reclamation were
recently handled by Mel and this patch-set deals with others.  The more
fundamental issue is that concurrent updates of the page-tables allow
for TLB flushes to be batched on one core, while another core changes
the page-tables.  This other core may assume a PTE change does not
require a flush based on the updated PTE value, while it is unaware that
TLB flushes are still pending.

This behavior affects KSM (which may result in memory corruption) and
MADV_FREE and MADV_DONTNEED (which may result in incorrect behavior).  A
proof-of-concept can easily produce the wrong behavior of MADV_DONTNEED.
Memory corruption in KSM is harder to produce in practice, but was
observed by hacking the kernel and adding a delay before flushing and
replacing the KSM page.

Finally, there is also one memory barrier missing, which may affect
architectures with weak memory model.

This patch (of 7):

Setting and clearing mm->tlb_flush_pending can be performed by multiple
threads, since mmap_sem may only be acquired for read in
task_numa_work().  If this happens, tlb_flush_pending might be cleared
while one of the threads still changes PTEs and batches TLB flushes.

This can lead to the same race between migration and
change_protection_range() that led to the introduction of
tlb_flush_pending.  The result of this race was data corruption, which
means that this patch also addresses a theoretically possible data
corruption.

An actual data corruption was not observed, yet the race was was
confirmed by adding assertion to check tlb_flush_pending is not set by
two threads, adding artificial latency in change_protection_range() and
using sysctl to reduce kernel.numa_balancing_scan_delay_ms.

Link: http://lkml.kernel.org/r/20170802000818.4760-2-namit@vmware.com
Fixes: 20841405940e ("mm: fix TLB flush race between migration, and
change_protection_range")
Signed-off-by: Nadav Amit <namit@vmware.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agofault-inject: fix wrong should_fail() decision in task context
Akinobu Mita [Thu, 10 Aug 2017 22:23:53 +0000 (15:23 -0700)]
fault-inject: fix wrong should_fail() decision in task context

Commit 1203c8e6fb0a ("fault-inject: simplify access check for fail-nth")
unintentionally broke a conditional statement in should_fail().  Any
faults are not injected in the task context by the change when the
systematic fault injection is not used.

This change restores to the previous correct behaviour.

Link: http://lkml.kernel.org/r/1501633700-3488-1-git-send-email-akinobu.mita@gmail.com
Fixes: 1203c8e6fb0a ("fault-inject: simplify access check for fail-nth")
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reported-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Tested-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agotest_kmod: fix small memory leak on filesystem tests
Dan Carpenter [Thu, 10 Aug 2017 22:23:50 +0000 (15:23 -0700)]
test_kmod: fix small memory leak on filesystem tests

The break was in the wrong place so file system tests don't work as
intended, leaking memory at each test switch.

[mcgrof@kernel.org: massaged commit subject, noted memory leak issue without the fix]
Link: http://lkml.kernel.org/r/20170802211450.27928-6-mcgrof@kernel.org
Fixes: 39258f448d71 ("kmod: add test driver to stress test the module loader")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Reported-by: David Binderman <dcb314@hotmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Jessica Yu <jeyu@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michal Marek <mmarek@suse.com>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agotest_kmod: fix the lock in register_test_dev_kmod()
Dan Carpenter [Thu, 10 Aug 2017 22:23:47 +0000 (15:23 -0700)]
test_kmod: fix the lock in register_test_dev_kmod()

We accidentally just drop the lock twice instead of taking it and then
releasing it.  This isn't a big issue unless you are adding more than
one device to test on, and the kmod.sh doesn't do that yet, however this
obviously is the correct thing to do.

[mcgrof@kernel.org: massaged subject, explain what happens]
Link: http://lkml.kernel.org/r/20170802211450.27928-5-mcgrof@kernel.org
Fixes: 39258f448d71 ("kmod: add test driver to stress test the module loader")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: David Binderman <dcb314@hotmail.com>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Jessica Yu <jeyu@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michal Marek <mmarek@suse.com>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agotest_kmod: fix bug which allows negative values on two config options
Luis R. Rodriguez [Thu, 10 Aug 2017 22:23:44 +0000 (15:23 -0700)]
test_kmod: fix bug which allows negative values on two config options

Parsing with kstrtol() enables values to be negative, and we failed to
check for negative values when parsing with test_dev_config_update_uint_sync()
or test_dev_config_update_uint_range().

test_dev_config_update_uint_range() has a minimum check though so an
issue is not present there.  test_dev_config_update_uint_sync() is only
used for the number of threads to use (config_num_threads_store()), and
indeed this would fail with an attempt for a large allocation.

Although the issue is only present in practice with the first fix both
by using kstrtoul() instead of kstrtol().

Link: http://lkml.kernel.org/r/20170802211450.27928-4-mcgrof@kernel.org
Fixes: 39258f448d71 ("kmod: add test driver to stress test the module loader")
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: David Binderman <dcb314@hotmail.com>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Jessica Yu <jeyu@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michal Marek <mmarek@suse.com>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agotest_kmod: fix spelling mistake: "EMTPY" -> "EMPTY"
Colin Ian King [Thu, 10 Aug 2017 22:23:40 +0000 (15:23 -0700)]
test_kmod: fix spelling mistake: "EMTPY" -> "EMPTY"

Trivial fix to spelling mistake in snprintf text

[mcgrof@kernel.org: massaged commit message]
Link: http://lkml.kernel.org/r/20170802211450.27928-3-mcgrof@kernel.org
Fixes: 39258f448d71 ("kmod: add test driver to stress test the module loader")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Jessica Yu <jeyu@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michal Marek <mmarek@suse.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Binderman <dcb314@hotmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agouserfaultfd: hugetlbfs: remove superfluous page unlock in VM_SHARED case
Andrea Arcangeli [Thu, 10 Aug 2017 22:23:38 +0000 (15:23 -0700)]
userfaultfd: hugetlbfs: remove superfluous page unlock in VM_SHARED case

huge_add_to_page_cache->add_to_page_cache implicitly unlocks the page
before returning in case of errors.

The error returned was -EEXIST by running UFFDIO_COPY on a non-hole
offset of a VM_SHARED hugetlbfs mapping.  It was an userland bug that
triggered it and the kernel must cope with it returning -EEXIST from
ioctl(UFFDIO_COPY) as expected.

  page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
  kernel BUG at mm/filemap.c:964!
  invalid opcode: 0000 [#1] SMP
  CPU: 1 PID: 22582 Comm: qemu-system-x86 Not tainted 4.11.11-300.fc26.x86_64 #1
  RIP: unlock_page+0x4a/0x50
  Call Trace:
    hugetlb_mcopy_atomic_pte+0xc0/0x320
    mcopy_atomic+0x96f/0xbe0
    userfaultfd_ioctl+0x218/0xe90
    do_vfs_ioctl+0xa5/0x600
    SyS_ioctl+0x79/0x90
    entry_SYSCALL_64_fastpath+0x1a/0xa9

Link: http://lkml.kernel.org/r/20170802165145.22628-2-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Alexey Perevalov <a.perevalov@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agomm: ratelimit PFNs busy info message
Jonathan Toppins [Thu, 10 Aug 2017 22:23:35 +0000 (15:23 -0700)]
mm: ratelimit PFNs busy info message

The RDMA subsystem can generate several thousand of these messages per
second eventually leading to a kernel crash.  Ratelimit these messages
to prevent this crash.

Doug said:
 "I've been carrying a version of this for several kernel versions. I
  don't remember when they started, but we have one (and only one) class
  of machines: Dell PE R730xd, that generate these errors. When it
  happens, without a rate limit, we get rcu timeouts and kernel oopses.
  With the rate limit, we just get a lot of annoying kernel messages but
  the machine continues on, recovers, and eventually the memory
  operations all succeed"

And:
 "> Well... why are all these EBUSY's occurring? It sounds inefficient
  > (at least) but if it is expected, normal and unavoidable then
  > perhaps we should just remove that message altogether?

  I don't have an answer to that question. To be honest, I haven't
  looked real hard. We never had this at all, then it started out of the
  blue, but only on our Dell 730xd machines (and it hits all of them),
  but no other classes or brands of machines. And we have our 730xd
  machines loaded up with different brands and models of cards (for
  instance one dedicated to mlx4 hardware, one for qib, one for mlx5, an
  ocrdma/cxgb4 combo, etc), so the fact that it hit all of the machines
  meant it wasn't tied to any particular brand/model of RDMA hardware.
  To me, it always smelled of a hardware oddity specific to maybe the
  CPUs or mainboard chipsets in these machines, so given that I'm not an
  mm expert anyway, I never chased it down.

  A few other relevant details: it showed up somewhere around 4.8/4.9 or
  thereabouts. It never happened before, but the prinkt has been there
  since the 3.18 days, so possibly the test to trigger this message was
  changed, or something else in the allocator changed such that the
  situation started happening on these machines?

  And, like I said, it is specific to our 730xd machines (but they are
  all identical, so that could mean it's something like their specific
  ram configuration is causing the allocator to hit this on these
  machine but not on other machines in the cluster, I don't want to say
  it's necessarily the model of chipset or CPU, there are other bits of
  identicalness between these machines)"

Link: http://lkml.kernel.org/r/499c0f6cc10d6eb829a67f2a4d75b4228a9b356e.1501695897.git.jtoppins@redhat.com
Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
Reviewed-by: Doug Ledford <dledford@redhat.com>
Tested-by: Doug Ledford <dledford@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agomm: fix global NR_SLAB_.*CLAIMABLE counter reads
Johannes Weiner [Thu, 10 Aug 2017 22:23:31 +0000 (15:23 -0700)]
mm: fix global NR_SLAB_.*CLAIMABLE counter reads

As Tetsuo points out:
 "Commit 385386cff4c6 ("mm: vmstat: move slab statistics from zone to
  node counters") broke "Slab:" field of /proc/meminfo . It shows nearly
  0kB"

In addition to /proc/meminfo, this problem also affects the slab
counters OOM/allocation failure info dumps, can cause early -ENOMEM from
overcommit protection, and miscalculate image size requirements during
suspend-to-disk.

This is because the patch in question switched the slab counters from
the zone level to the node level, but forgot to update the global
accessor functions to read the aggregate node data instead of the
aggregate zone data.

Use global_node_page_state() to access the global slab counters.

Fixes: 385386cff4c6 ("mm: vmstat: move slab statistics from zone to node counters")
Link: http://lkml.kernel.org/r/20170801134256.5400-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Stefan Agner <stefan@agner.ch>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoMerge tag 'pci-v4.13-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaa...
Linus Torvalds [Thu, 10 Aug 2017 21:52:45 +0000 (14:52 -0700)]
Merge tag 'pci-v4.13-fixes-2' of git://git./linux/kernel/git/helgaas/pci

Pull PCI fix from Bjorn Helgaas:
 "Work around Renesas uPD72020x 32-bit DMA issue"

* tag 'pci-v4.13-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  xhci: Reset Renesas uPD72020x USB controller for 32-bit DMA issue
  PCI: Add pci_reset_function_locked()