git.stricted.de - GitHub/mt8127/android_kernel_alcatel

ARM: 8618/1: decompressor: reset ttbcr fields to use TTBR0 on ARMv7

commit 117e5e9c4cfcb7628f08de074fbfefec1bb678b7 upstream.

If the bootloader uses the long descriptor format and jumps to
kernel decompressor code, TTBCR may not be in a right state.
Before enabling the MMU, it is required to clear the TTBCR.PD0
field to use TTBR0 for translation table walks.

The commit dbece45894d3a ("ARM: 7501/1: decompressor:
reset ttbcr for VMSA ARMv7 cores") does the reset of TTBCR.N, but
doesn't consider all the bits for the size of TTBCR.N.

Clear TTBCR.PD0 field and reset all the three bits of TTBCR.N to
indicate the use of TTBR0 and the correct base address width.

Fixes: dbece45894d3 ("ARM: 7501/1: decompressor: reset ttbcr for VMSA ARMv7 cores")
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Srinivas Ramana <sramana@codeaurora.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>

ARM: 8616/1: dt: Respect property size when parsing CPUs

commit ba6dea4f7cedb4b1c17e36f4087675d817c2e24b upstream.

Whilst MPIDR values themselves are less than 32 bits, it is still
perfectly valid for a DT to have #address-cells > 1 in the CPUs node,
resulting in the "reg" property having leading zero cell(s). In that
situation, the big-endian nature of the data conspires with the current
behaviour of only reading the first cell to cause the kernel to think
all CPUs have ID 0, and become resoundingly unhappy as a consequence.

Take the full property length into account when parsing CPUs so as to
be correct under any circumstances.

Cc: Russell King <linux@armlinux.org.uk>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>

iommu/amd: Free domain id when free a domain of struct dma_ops_domain

commit c3db901c54466a9c135d1e6e95fec452e8a42666 upstream.

The current code missed freeing domain id when free a domain of
struct dma_ops_domain.

Signed-off-by: Baoquan He <bhe@redhat.com>
Fixes: ec487d1a110a ('x86, AMD IOMMU: add domain allocation and deallocation functions')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Willy Tarreau <w@1wt.eu>

iommu/amd: Update Alias-DTE in update_device_table()

commit 3254de6bf74fe94c197c9f819fe62a3a3c36f073 upstream.

Not doing so might cause IO-Page-Faults when a device uses
an alias request-id and the alias-dte is left in a lower
page-mode which does not cover the address allocated from
the iova-allocator.

Fixes: 492667dacc0a ('x86/amd-iommu: Remove amd_iommu_pd_table')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Willy Tarreau <w@1wt.eu>

x86/um: reuse asm-generic/barrier.h

commit 577f183acc88645eae116326cc2203dc88ea730c upstream.

On x86/um CONFIG_SMP is never defined. As a result, several macros
match the asm-generic variant exactly. Drop the local definitions and
pull in asm-generic/barrier.h instead.

This is in preparation to refactoring this code area.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Richard Weinberger <richard@nod.at>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

x86/build: Build compressed x86 kernels as PIE

commit 6d92bc9d483aa1751755a66fee8fb39dffb088c0 upstream.

The 32-bit x86 assembler in binutils 2.26 will generate R_386_GOT32X
relocation to get the symbol address in PIC.  When the compressed x86
kernel isn't built as PIC, the linker optimizes R_386_GOT32X relocations
to their fixed symbol addresses.  However, when the compressed x86
kernel is loaded at a different address, it leads to the following
load failure:

  Failed to allocate space for phdrs

during the decompression stage.

If the compressed x86 kernel is relocatable at run-time, it should be
compiled with -fPIE, instead of -fPIC, if possible and should be built as
Position Independent Executable (PIE) so that linker won't optimize
R_386_GOT32X relocation to its fixed symbol address.

Older linkers generate R_386_32 relocations against locally defined
symbols, _bss, _ebss, _got and _egot, in PIE.  It isn't wrong, just less
optimal than R_386_RELATIVE.  But the x86 kernel fails to properly handle
R_386_32 relocations when relocating the kernel.  To generate
R_386_RELATIVE relocations, we mark _bss, _ebss, _got and _egot as
hidden in both 32-bit and 64-bit x86 kernels.

To build a 64-bit compressed x86 kernel as PIE, we need to disable the
relocation overflow check to avoid relocation overflow errors. We do
this with a new linker command-line option, -z noreloc-overflow, which
got added recently:

commit 4c10bbaa0912742322f10d9d5bb630ba4e15dfa7
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Tue Mar 15 11:07:06 2016 -0700

    Add -z noreloc-overflow option to x86-64 ld

    Add -z noreloc-overflow command-line option to the x86-64 ELF linker to
    disable relocation overflow check.  This can be used to avoid relocation
    overflow check if there will be no dynamic relocation overflow at
    run-time.

The 64-bit compressed x86 kernel is built as PIE only if the linker supports
-z noreloc-overflow.  So far 64-bit relocatable compressed x86 kernel
boots fine even when it is built as a normal executable.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
[ Edited the changelog and comments. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

x86/paravirt: Do not trace _paravirt_ident_*() functions

commit 15301a570754c7af60335d094dd2d1808b0641a5 upstream.

Å\81ukasz Daniluk reported that on a RHEL kernel that his machine would lock up
after enabling function tracer. I asked him to bisect the functions within
available_filter_functions, which he did and it came down to three:

  _paravirt_nop(), _paravirt_ident_32() and _paravirt_ident_64()

It was found that this is only an issue when noreplace-paravirt is added
to the kernel command line.

This means that those functions are most likely called within critical
sections of the funtion tracer, and must not be traced.

In newer kenels _paravirt_nop() is defined within gcc asm(), and is no
longer an issue.  But both _paravirt_ident_{32,64}() causes the
following splat when they are traced:

mm/pgtable-generic.c:33: bad pmd ffff8800d2435150(0000000001d00054)
mm/pgtable-generic.c:33: bad pmd ffff8800d3624190(0000000001d00070)
mm/pgtable-generic.c:33: bad pmd ffff8800d36a5110(0000000001d00054)
mm/pgtable-generic.c:33: bad pmd ffff880118eb1450(0000000001d00054)
NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [systemd-journal:469]
Modules linked in: e1000e
CPU: 2 PID: 469 Comm: systemd-journal Not tainted 4.6.0-rc4-test+ #513
Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
task: ffff880118f740c0 ti: ffff8800d4aec000 task.ti: ffff8800d4aec000
RIP: 0010:[<ffffffff81134148>]  [<ffffffff81134148>] queued_spin_lock_slowpath+0x118/0x1a0
RSP: 0018:ffff8800d4aefb90  EFLAGS: 00000246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88011eb16d40
RDX: ffffffff82485760 RSI: 000000001f288820 RDI: ffffea0000008030
RBP: ffff8800d4aefb90 R08: 00000000000c0000 R09: 0000000000000000
R10: ffffffff821c8e0e R11: 0000000000000000 R12: ffff880000200fb8
R13: 00007f7a4e3f7000 R14: ffffea000303f600 R15: ffff8800d4b562e0
FS:  00007f7a4e3d7840(0000) GS:ffff88011eb00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f7a4e3f7000 CR3: 00000000d3e71000 CR4: 00000000001406e0
Call Trace:
   _raw_spin_lock+0x27/0x30
   handle_pte_fault+0x13db/0x16b0
   handle_mm_fault+0x312/0x670
   __do_page_fault+0x1b1/0x4e0
   do_page_fault+0x22/0x30
   page_fault+0x28/0x30
   __vfs_read+0x28/0xe0
   vfs_read+0x86/0x130
   SyS_read+0x46/0xa0
   entry_SYSCALL_64_fastpath+0x1e/0xa8
Code: 12 48 c1 ea 0c 83 e8 01 83 e2 30 48 98 48 81 c2 40 6d 01 00 48 03 14 c5 80 6a 5d 82 48 89 0a 8b 41 08 85 c0 75 09 f3 90 8b 41 08 <85> c0 74 f7 4c 8b 09 4d 85 c9 74 08 41 0f 18 09 eb 02 f3 90 8b

Reported-by: Å\81ukasz Daniluk <lukasz.daniluk@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

x86/mm/pat, /dev/mem: Remove superfluous error message

commit 39380b80d72723282f0ea1d1bbf2294eae45013e upstream.

Currently it's possible for broken (or malicious) userspace to flood a
kernel log indefinitely with messages a-la

Program dmidecode tried to access /dev/mem between f0000->100000

because range_is_allowed() is case of CONFIG_STRICT_DEVMEM being turned on
dumps this information each and every time devmem_is_allowed() fails.

Reportedly userspace that is able to trigger contignuous flow of these
messages exists.

It would be possible to rate limit this message, but that'd have a
questionable value; the administrator wouldn't get information about all
the failing accessess, so then the information would be both superfluous
and incomplete at the same time :)

Returning EPERM (which is what is actually happening) is enough indication
for userspace what has happened; no need to log this particular error as
some sort of special condition.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1607081137020.24757@cbobk.fhfr.pm
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

x86/apic: Do not init irq remapping if ioapic is disabled

commit 2e63ad4bd5dd583871e6602f9d398b9322d358d9 upstream.

native_smp_prepare_cpus
  -> default_setup_apic_routing
    -> enable_IR_x2apic
      -> irq_remapping_prepare
        -> intel_prepare_irq_remapping
          -> intel_setup_irq_remapping

So IR table is setup even if "noapic" boot parameter is added. As a result we
crash later when the interrupt affinity is set due to a half initialized
remapping infrastructure.

Prevent remap initialization when IOAPIC is disabled.

Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Joerg Roedel <joro@8bytes.org>
Link: http://lkml.kernel.org/r/1471954039-3942-1-git-send-email-wanpeng.li@hotmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Willy Tarreau <w@1wt.eu>

x86/mm: Disable preemption during CR3 read+write

commit 5cf0791da5c162ebc14b01eb01631cfa7ed4fa6e upstream.

There's a subtle preemption race on UP kernels:

Usually current->mm (and therefore mm->pgd) stays the same during the
lifetime of a task so it does not matter if a task gets preempted during
the read and write of the CR3.

But then, there is this scenario on x86-UP:

TaskA is in do_exit() and exit_mm() sets current->mm = NULL followed by:

-> mmput()
-> exit_mmap()
-> tlb_finish_mmu()
-> tlb_flush_mmu()
-> tlb_flush_mmu_tlbonly()
-> tlb_flush()
-> flush_tlb_mm_range()
-> __flush_tlb_up()
-> __flush_tlb()
-> __native_flush_tlb()

At this point current->mm is NULL but current->active_mm still points to
the "old" mm.

Let's preempt taskA _after_ native_read_cr3() by taskB. TaskB has its
own mm so CR3 has changed.

Now preempt back to taskA. TaskA has no ->mm set so it borrows taskB's
mm and so CR3 remains unchanged. Once taskA gets active it continues
where it was interrupted and that means it writes its old CR3 value
back. Everything is fine because userland won't need its memory
anymore.

Now the fun part:

Let's preempt taskA one more time and get back to taskB. This
time switch_mm() won't do a thing because oldmm (->active_mm)
is the same as mm (as per context_switch()). So we remain
with a bad CR3 / PGD and return to userland.

The next thing that happens is handle_mm_fault() with an address for
the execution of its code in userland. handle_mm_fault() realizes that
it has a PTE with proper rights so it returns doing nothing. But the
CPU looks at the wrong PGD and insists that something is wrong and
faults again. And again. And one more timeâ\80¦

This pagefault circle continues until the scheduler gets tired of it and
puts another task on the CPU. It gets little difficult if the task is a
RT task with a high priority. The system will either freeze or it gets
fixed by the software watchdog thread which usually runs at RT-max prio.
But waiting for the watchdog will increase the latency of the RT task
which is no good.

Fix this by disabling preemption across the critical code section.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1470404259-26290-1-git-send-email-bigeasy@linutronix.de
[ Prettified the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

x86/traps: Ignore high word of regs->cs in early_idt_handler_common

This is a backport of:
commit fc0e81b2bea0ebceb71889b61d2240856141c9ee upstream

On the 80486 DX, it seems that some exceptions may leave garbage in
the high bits of CS.  This causes sporadic failures in which
early_fixup_exception() refuses to fix up an exception.

As far as I can tell, this has been buggy for a long time, but the
problem seems to have been exacerbated by commits:

  1e02ce4cccdc ("x86: Store a per-cpu shadow copy of CR4")
  e1bfc11c5a6f ("x86/init: Fix cr4_init_shadow() on CR4-less machines")

This appears to have broken for as long as we've had early
exception handling.

[ This backport should apply to kernels from 3.4 - 4.5. ]

Fixes: 4c5023a3fa2e ("x86-32: Handle exception table entries during early boot")
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: stable@vger.kernel.org
Reported-by: Matthew Whitehead <tedheadster@gmail.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

x86/xen: fix upper bound of pmd loop in xen_cleanhighmap()

commit 1cf38741308c64d08553602b3374fb39224eeb5a upstream.

xen_cleanhighmap() is operating on level2_kernel_pgt only. The upper
bound of the loop setting non-kernel-image entries to zero should not
exceed the size of level2_kernel_pgt.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

xen-pciback: Add name prefix to global 'permissive' variable

commit 8014bcc86ef112eab9ee1db312dba4e6b608cf89 upstream.

The variable for the 'permissive' module parameter used to be static
but was recently changed to be extern. This puts it in the kernel
global namespace if the driver is built-in, so its name should begin
with a prefix identifying the driver.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Fixes: af6fc858a35b ("xen-pciback: limit guest control of command register")
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

xen/pciback: Don't allow MSI-X ops if PCI_COMMAND_MEMORY is not set.

commit 408fb0e5aa7fda0059db282ff58c3b2a4278baa0 upstream.

commit f598282f51 ("PCI: Fix the NIU MSI-X problem in a better way")
teaches us that dealing with MSI-X can be troublesome.

Further checks in the MSI-X architecture shows that if the
PCI_COMMAND_MEMORY bit is turned of in the PCI_COMMAND we
may not be able to access the BAR (since they are memory regions).

Since the MSI-X tables are located in there.. that can lead
to us causing PCIe errors. Inhibit us performing any
operation on the MSI-X unless the MEMORY bit is set.

Note that Xen hypervisor with:
"x86/MSI-X: access MSI-X table only after having enabled MSI-X"
will return:
xen_pciback: 0000:0a:00.1: error -6 enabling MSI-X for guest 3!

When the generic MSI code tries to setup the PIRQ without
MEMORY bit set. Which means with later versions of Xen
(4.6) this patch is not neccessary.

This is part of XSA-157

Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

xen/pciback: For XEN_PCI_OP_disable_msi[|x] only disable if device has MSI(X) enabled.

commit 7cfb905b9638982862f0331b36ccaaca5d383b49 upstream.

Otherwise just continue on, returning the same values as
previously (return of 0, and op->result has the PIRQ value).

This does not change the behavior of XEN_PCI_OP_disable_msi[|x].

The pci_disable_msi or pci_disable_msix have the checks for
msi_enabled or msix_enabled so they will error out immediately.

However the guest can still call these operations and cause
us to disable the 'ack_intr'. That means the backend IRQ handler
for the legacy interrupt will not respond to interrupts anymore.

This will lead to (if the device is causing an interrupt storm)
for the Linux generic code to disable the interrupt line.

Naturally this will only happen if the device in question
is plugged in on the motherboard on shared level interrupt GSI.

This is part of XSA-157

Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

xen/pciback: Do not install an IRQ handler for MSI interrupts.

commit a396f3a210c3a61e94d6b87ec05a75d0be2a60d0 upstream.

Otherwise an guest can subvert the generic MSI code to trigger
an BUG_ON condition during MSI interrupt freeing:

for (i = 0; i < entry->nvec_used; i++)
BUG_ON(irq_has_action(entry->irq + i));

Xen PCI backed installs an IRQ handler (request_irq) for
the dev->irq whenever the guest writes PCI_COMMAND_MEMORY
(or PCI_COMMAND_IO) to the PCI_COMMAND register. This is
done in case the device has legacy interrupts the GSI line
is shared by the backend devices.

To subvert the backend the guest needs to make the backend
to change the dev->irq from the GSI to the MSI interrupt line,
make the backend allocate an interrupt handler, and then command
the backend to free the MSI interrupt and hit the BUG_ON.

Since the backend only calls 'request_irq' when the guest
writes to the PCI_COMMAND register the guest needs to call
XEN_PCI_OP_enable_msi before any other operation. This will
cause the generic MSI code to setup an MSI entry and
populate dev->irq with the new PIRQ value.

Then the guest can write to PCI_COMMAND PCI_COMMAND_MEMORY
and cause the backend to setup an IRQ handler for dev->irq
(which instead of the GSI value has the MSI pirq). See
'xen_pcibk_control_isr'.

Then the guest disables the MSI: XEN_PCI_OP_disable_msi
which ends up triggering the BUG_ON condition in 'free_msi_irqs'
as there is an IRQ handler for the entry->irq (dev->irq).

Note that this cannot be done using MSI-X as the generic
code does not over-write dev->irq with the MSI-X PIRQ values.

The patch inhibits setting up the IRQ handler if MSI or
MSI-X (for symmetry reasons) code had been called successfully.

P.S.
Xen PCIBack when it sets up the device for the guest consumption
ends up writting 0 to the PCI_COMMAND (see xen_pcibk_reset_device).
XSA-120 addendum patch removed that - however when upstreaming said
addendum we found that it caused issues with qemu upstream. That
has now been fixed in qemu upstream.

This is part of XSA-157

Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

xen/pciback: Return error on XEN_PCI_OP_enable_msix when device has MSI or MSI-X enabled

commit 5e0ce1455c09dd61d029b8ad45d82e1ac0b6c4c9 upstream.

The guest sequence of:

a) XEN_PCI_OP_enable_msix
b) XEN_PCI_OP_enable_msix

results in hitting an NULL pointer due to using freed pointers.

The device passed in the guest MUST have MSI-X capability.

The a) constructs and SysFS representation of MSI and MSI groups.
The b) adds a second set of them but adding in to SysFS fails (duplicate entry).
'populate_msi_sysfs' frees the newly allocated msi_irq_groups (note that
in a) pdev->msi_irq_groups is still set) and also free's ALL of the
MSI-X entries of the device (the ones allocated in step a) and b)).

The unwind code: 'free_msi_irqs' deletes all the entries and tries to
delete the pdev->msi_irq_groups (which hasn't been set to NULL).
However the pointers in the SysFS are already freed and we hit an
NULL pointer further on when 'strlen' is attempted on a freed pointer.

The patch adds a simple check in the XEN_PCI_OP_enable_msix to guard
against that. The check for msi_enabled is not stricly neccessary.

This is part of XSA-157

Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

xen/pciback: Return error on XEN_PCI_OP_enable_msi when device has MSI or MSI-X enabled

commit 56441f3c8e5bd45aab10dd9f8c505dd4bec03b0d upstream.

The guest sequence of:

a) XEN_PCI_OP_enable_msi
b) XEN_PCI_OP_enable_msi
c) XEN_PCI_OP_disable_msi

results in hitting an BUG_ON condition in the msi.c code.

The MSI code uses an dev->msi_list to which it adds MSI entries.
Under the above conditions an BUG_ON() can be hit. The device
passed in the guest MUST have MSI capability.

The a) adds the entry to the dev->msi_list and sets msi_enabled.
The b) adds a second entry but adding in to SysFS fails (duplicate entry)
and deletes all of the entries from msi_list and returns (with msi_enabled
is still set). c) pci_disable_msi passes the msi_enabled checks and hits:

BUG_ON(list_empty(dev_to_msi_list(&dev->dev)));

and blows up.

The patch adds a simple check in the XEN_PCI_OP_enable_msi to guard
against that. The check for msix_enabled is not stricly neccessary.

This is part of XSA-157.

Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

xen/pciback: Save the number of MSI-X entries to be copied later.

commit d159457b84395927b5a52adb72f748dd089ad5e5 upstream.

Commit 8135cf8b092723dbfcc611fe6fdcb3a36c9951c5 (xen/pciback: Save
xen_pci_op commands before processing it) broke enabling MSI-X because
it would never copy the resulting vectors into the response. The
number of vectors requested was being overwritten by the return value
(typically zero for success).

Save the number of vectors before processing the op, so the correct
number of vectors are copied afterwards.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

xen/pciback: Save xen_pci_op commands before processing it

commit 8135cf8b092723dbfcc611fe6fdcb3a36c9951c5 upstream.

Double fetch vulnerabilities that happen when a variable is
fetched twice from shared memory but a security check is only
performed the first time.

The xen_pcibk_do_op function performs a switch statements on the op->cmd
value which is stored in shared memory. Interestingly this can result
in a double fetch vulnerability depending on the performed compiler
optimization.

This patch fixes it by saving the xen_pci_op command before
processing it. We also use 'barrier' to make sure that the
compiler does not perform any optimization.

This is part of XSA155.

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: "Jan Beulich" <JBeulich@suse.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

xen-blkback: only read request operation from shared ring once

commit 1f13d75ccb806260079e0679d55d9253e370ec8a upstream.

A compiler may load a switch statement value multiple times, which could
be bad when the value is in memory shared with the frontend.

When converting a non-native request to a native one, ensure that
src->operation is only loaded once by using READ_ONCE().

This is part of XSA155.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: "Jan Beulich" <JBeulich@suse.com>
[wt: s/READ_ONCE/ACCESS_ONCE for 3.10]
Signed-off-by: Willy Tarreau <w@1wt.eu>

xen-netback: use RING_COPY_REQUEST() throughout

commit 68a33bfd8403e4e22847165d149823a2e0e67c9c upstream.

Instead of open-coding memcpy()s and directly accessing Tx and Rx
requests, use the new RING_COPY_REQUEST() that ensures the local copy
is correct.

This is more than is strictly necessary for guest Rx requests since
only the id and gref fields are used and it is harmless if the
frontend modifies these.

This is part of XSA155.

Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[wt: adjustments for 3.10 : netbk_rx_meta instead of struct xenvif_rx_meta]

Signed-off-by: Willy Tarreau <w@1wt.eu>

xen-netback: don't use last request to determine minimum Tx credit

commit 0f589967a73f1f30ab4ac4dd9ce0bb399b4d6357 upstream.

The last from guest transmitted request gives no indication about the
minimum amount of credit that the guest might need to send a packet
since the last packet might have been a small one.

Instead allow for the worst case 128 KiB packet.

This is part of XSA155.

Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

xen: Add RING_COPY_REQUEST()

commit 454d5d882c7e412b840e3c99010fe81a9862f6fb upstream.

Using RING_GET_REQUEST() on a shared ring is easy to use incorrectly
(i.e., by not considering that the other end may alter the data in the
shared ring while it is being inspected). Safe usage of a request
generally requires taking a local copy.

Provide a RING_COPY_REQUEST() macro to use instead of
RING_GET_REQUEST() and an open-coded memcpy(). This takes care of
ensuring that the copy is done correctly regardless of any possible
compiler optimizations.

Use a volatile source to prevent the compiler from reordering or
omitting the copy.

This is part of XSA155.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

x86/mm/xen: Suppress hugetlbfs in PV guests

commit 103f6112f253017d7062cd74d17f4a514ed4485c upstream.

Huge pages are not normally available to PV guests. Not suppressing
hugetlbfs use results in an endless loop of page faults when user mode
code tries to access a hugetlbfs mapped area (since the hypervisor
denies such PTEs to be created, but error indications can't be
propagated out of xen_set_pte_at(), just like for various of its
siblings), and - once killed in an oops like this:

  kernel BUG at .../fs/hugetlbfs/inode.c:428!
  invalid opcode: 0000 [#1] SMP
  ...
  RIP: e030:[<ffffffff811c333b>]  [<ffffffff811c333b>] remove_inode_hugepages+0x25b/0x320
  ...
  Call Trace:
   [<ffffffff811c3415>] hugetlbfs_evict_inode+0x15/0x40
   [<ffffffff81167b3d>] evict+0xbd/0x1b0
   [<ffffffff8116514a>] __dentry_kill+0x19a/0x1f0
   [<ffffffff81165b0e>] dput+0x1fe/0x220
   [<ffffffff81150535>] __fput+0x155/0x200
   [<ffffffff81079fc0>] task_work_run+0x60/0xa0
   [<ffffffff81063510>] do_exit+0x160/0x400
   [<ffffffff810637eb>] do_group_exit+0x3b/0xa0
   [<ffffffff8106e8bd>] get_signal+0x1ed/0x470
   [<ffffffff8100f854>] do_signal+0x14/0x110
   [<ffffffff810030e9>] prepare_exit_to_usermode+0xe9/0xf0
   [<ffffffff814178a5>] retint_user+0x8/0x13

This is CVE-2016-3961 / XSA-174.

Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <JGross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: xen-devel <xen-devel@lists.xenproject.org>
Link: http://lkml.kernel.org/r/57188ED802000078000E431C@prv-mh.provo.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

ppp: defer netns reference release for ppp channel

commit 205e1e255c479f3fd77446415706463b282f94e4 upstream

Matt reported that we have a NULL pointer dereference
in ppp_pernet() from ppp_connect_channel(),
i.e. pch->chan_net is NULL.

This is due to that a parallel ppp_unregister_channel()
could happen while we are in ppp_connect_channel(), during
which pch->chan_net set to NULL. Since we need a reference
to net per channel, it makes sense to sync the refcnt
with the life time of the channel, therefore we should
release this reference when we destroy it.

Fixes: 1f461dcdd296 ("ppp: take reference on channels netns")
Reported-by: Matt Bennett <Matt.Bennett@alliedtelesis.co.nz>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linux-ppp@vger.kernel.org
Cc: Guillaume Nault <g.nault@alphalink.fr>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>

PM / devfreq: Fix incorrect type issue.

commit 5f25f066f75a67835abb5e400471a27abd09395b upstream

time_in_state in struct devfreq is defined as unsigned long, so
devm_kzalloc should use sizeof(unsigned long) as argument instead
of sizeof(unsigned int), otherwise it will cause unexpected result
in 64bit system.

Signed-off-by: Xiaolong Ye <yexl@marvell.com>
Signed-off-by: Kevin Liu <kliu5@marvell.com>
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

KVM: Disable irq while unregistering user notifier

commit 1650b4ebc99da4c137bfbfc531be4a2405f951dd upstream.

Function user_notifier_unregister should be called only once for each
registered user notifier.

Function kvm_arch_hardware_disable can be executed from an IPI context
which could cause a race condition with a VCPU returning to user mode
and attempting to unregister the notifier.

Signed-off-by: Ignacio Alvarado <ikalvarado@google.com>
Fixes: 18863bdd60f8 ("KVM: x86 shared msr infrastructure")
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim KrÄ\8dmÃ¡Å\99 <rkrcmar@redhat.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

KVM: x86: fix missed SRCU usage in kvm_lapic_set_vapic_addr

commit 7301d6abaea926d685832f7e1f0c37dd206b01f4 upstream.

Reported by syzkaller:

    [ INFO: suspicious RCU usage. ]
    4.9.0-rc4+ #47 Not tainted
    -------------------------------
    ./include/linux/kvm_host.h:536 suspicious rcu_dereference_check() usage!

    stack backtrace:
    CPU: 1 PID: 6679 Comm: syz-executor Not tainted 4.9.0-rc4+ #47
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
     ffff880039e2f6d0 ffffffff81c2e46b ffff88003e3a5b40 0000000000000000
     0000000000000001 ffffffff83215600 ffff880039e2f700 ffffffff81334ea9
     ffffc9000730b000 0000000000000004 ffff88003c4f8420 ffff88003d3f8000
    Call Trace:
     [<     inline     >] __dump_stack lib/dump_stack.c:15
     [<ffffffff81c2e46b>] dump_stack+0xb3/0x118 lib/dump_stack.c:51
     [<ffffffff81334ea9>] lockdep_rcu_suspicious+0x139/0x180 kernel/locking/lockdep.c:4445
     [<     inline     >] __kvm_memslots include/linux/kvm_host.h:534
     [<     inline     >] kvm_memslots include/linux/kvm_host.h:541
     [<ffffffff8105d6ae>] kvm_gfn_to_hva_cache_init+0xa1e/0xce0 virt/kvm/kvm_main.c:1941
     [<ffffffff8112685d>] kvm_lapic_set_vapic_addr+0xed/0x140 arch/x86/kvm/lapic.c:2217

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Fixes: fda4e2e85589191b123d31cdc21fd33ee70f50fd
Cc: Andrew Honig <ahonig@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim KrÄ\8dmÃ¡Å\99 <rkrcmar@redhat.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

KVM: x86: fix wbinvd_dirty_mask use-after-free

commit bd768e146624cbec7122ed15dead8daa137d909d upstream.

vcpu->arch.wbinvd_dirty_mask may still be used after freeing it,
corrupting memory. For example, the following call trace may set a bit
in an already freed cpu mask:
    kvm_arch_vcpu_load
    vcpu_load
    vmx_free_vcpu_nested
    vmx_free_vcpu
    kvm_arch_vcpu_free

Fix this by deferring freeing of wbinvd_dirty_mask.

Signed-off-by: Ido Yariv <ido@wizery.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim KrÄ\8dmÃ¡Å\99 <rkrcmar@redhat.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

KVM: MIPS: Make ERET handle ERL before EXL

commit ede5f3e7b54a4347be4d8525269eae50902bd7cd upstream.

The ERET instruction to return from exception is used for returning from
exception level (Status.EXL) and error level (Status.ERL). If both bits
are set however we should be returning from ERL first, as ERL can
interrupt EXL, for example when an NMI is taken. KVM however checks EXL
first.

Fix the order of the checks to match the pseudocode in the instruction
set manual.

Fixes: e685c689f3a8 ("KVM/MIPS32: Privileged instruction/target branch emulation.")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim KrÄ\8dmÃ¡Å\99" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

KVM: nVMX: postpone VMCS changes on MSR_IA32_APICBASE write

commit dccbfcf52cebb8963246eba5b177b77f26b34da0 upstream.

If vmcs12 does not intercept APIC_BASE writes, then KVM will handle the
write with vmcs02 as the current VMCS.
This will incorrectly apply modifications intended for vmcs01 to vmcs02
and L2 can use it to gain access to L0's x2APIC registers by disabling
virtualized x2APIC while using msr bitmap that assumes enabled.

Postpone execution of vmx_set_virtual_x2apic_mode until vmcs01 is the
current VMCS. An alternative solution would temporarily make vmcs01 the
current VMCS, but it requires more care.

Fixes: 8d14695f9542 ("x86, apicv: add virtual x2apic support")
Reported-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim KrÄ\8dmÃ¡Å\99 <rkrcmar@redhat.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

KVM: MIPS: Drop other CPU ASIDs on guest MMU changes

commit 91e4f1b6073dd680d86cdb7e42d7cccca9db39d8 upstream.

When a guest TLB entry is replaced by TLBWI or TLBWR, we only invalidate
TLB entries on the local CPU. This doesn't work correctly on an SMP host
when the guest is migrated to a different physical CPU, as it could pick
up stale TLB mappings from the last time the vCPU ran on that physical
CPU.

Therefore invalidate both user and kernel host ASIDs on other CPUs,
which will cause new ASIDs to be generated when it next runs on those
CPUs.

We're careful only to do this if the TLB entry was already valid, and
only for the kernel ASID where the virtual address it mapped is outside
of the guest user address range.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim KrÄ\8dmÃ¡Å\99" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Cc: <stable@vger.kernel.org> # 3.10.x-
Cc: Jiri Slaby <jslaby@suse.cz>
[james.hogan@imgtec.com: Backport to 3.10..3.16]
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

KVM: MIPS: Precalculate MMIO load resume PC

commit e1e575f6b026734be3b1f075e780e91ab08ca541 upstream.

The advancing of the PC when completing an MMIO load is done before
re-entering the guest, i.e. before restoring the guest ASID. However if
the load is in a branch delay slot it may need to access guest code to
read the prior branch instruction. This isn't safe in TLB mapped code at
the moment, nor in the future when we'll access unmapped guest segments
using direct user accessors too, as it could read the branch from host
user memory instead.

Therefore calculate the resume PC in advance while we're still in the
right context and save it in the new vcpu->arch.io_pc (replacing the no
longer needed vcpu->arch.pending_load_cause), and restore it on MMIO
completion.

Fixes: e685c689f3a8 ("KVM/MIPS32: Privileged instruction/target branch emulation.")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim KrÄ\8dmÃ¡Å\99 <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Cc: <stable@vger.kernel.org> # 3.10.x-3.16.x: 5f508c43a764: MIPS: KVM: Fix unused variable build warning
Cc: <stable@vger.kernel.org> # 3.10.x-3.16.x
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[james.hogan@imgtec.com: Backport to 3.10..3.16]
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

MIPS: KVM: Fix unused variable build warning

commit 5f508c43a7648baa892528922402f1e13f258bd4 upstream.

As kvm_mips_complete_mmio_load() did not yet modify PC at this point
as James Hogans <james.hogan@imgtec.com> explained the curr_pc variable
and the comments along with it can be dropped.

Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Link: http://lkml.org/lkml/2015/5/8/422
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: kvm@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/9993/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
[james.hogan@imgtec.com: Backport to 3.10..3.16]
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

crypto: gcm - Fix IV buffer size in crypto_gcm_setkey

commit 50d2e6dc1f83db0563c7d6603967bf9585ce934b upstream.

The cipher block size for GCM is 16 bytes, and thus the CTR transform
used in crypto_gcm_setkey() will also expect a 16-byte IV. However,
the code currently reserves only 8 bytes for the IV, causing
an out-of-bounds access in the CTR transform. This patch fixes
the issue by setting the size of the IV buffer to 16 bytes.

Fixes: 84c911523020 ("[CRYPTO] gcm: Add support for async ciphers")
Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Willy Tarreau <w@1wt.eu>

crypto: skcipher - Fix blkcipher walk OOM crash

commit acdb04d0b36769b3e05990c488dc74d8b7ac8060 upstream.

When we need to allocate a temporary blkcipher_walk_next and it
fails, the code is supposed to take the slow path of processing
the data block by block. However, due to an unrelated change
we instead end up dereferencing the NULL pointer.

This patch fixes it by moving the unrelated bsize setting out
of the way so that we enter the slow path as inteded.

Fixes: 7607bd8ff03b ("[CRYPTO] blkcipher: Added blkcipher_walk_virt_block")
Cc: stable@vger.kernel.org
Reported-by: xiakaixu <xiakaixu@huawei.com>
Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

crypto: cryptd - initialize child shash_desc on import

commit 0bd2223594a4dcddc1e34b15774a3a4776f7749e upstream.

When calling .import() on a cryptd ahash_request, the structure members
that describe the child transform in the shash_desc need to be initialized
like they are when calling .init()

Cc: stable@vger.kernel.org
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

crypto: algif_skcipher - Load TX SG list after waiting

commit 4f0414e54e4d1893c6f08260693f8ef84c929293 upstream.

We need to load the TX SG list in sendmsg(2) after waiting for
incoming data, not before.

Cc: stable@vger.kernel.org
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

crypto: algif_skcipher - Fix race condition in skcipher_check_key

commit 1822793a523e5d5730b19cc21160ff1717421bc8 upstream.

We need to lock the child socket in skcipher_check_key as otherwise
two simultaneous calls can cause the parent socket to be freed.

Cc: stable@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

crypto: algif_hash - Fix race condition in hash_check_key

commit ad46d7e33219218605ea619e32553daf4f346b9f upstream.

We need to lock the child socket in hash_check_key as otherwise
two simultaneous calls can cause the parent socket to be freed.

Cc: stable@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

crypto: af_alg - Forbid bind(2) when nokey child sockets are present

commit a6a48c565f6f112c6983e2a02b1602189ed6e26e upstream.

This patch forbids the calling of bind(2) when there are child
sockets created by accept(2) in existence, even if they are created
on the nokey path.

This is needed as those child sockets have references to the tfm
object which bind(2) will destroy.

Cc: stable@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

crypto: algif_skcipher - Remove custom release parent function

commit d7b65aee1e7b4c87922b0232eaba56a8a143a4a0 upstream.

This patch removes the custom release parent function as the
generic af_alg_release_parent now works for nokey sockets too.

Cc: stable@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

crypto: algif_hash - Remove custom release parent function

commit f1d84af1835846a5a2b827382c5848faf2bb0e75 upstream.

This patch removes the custom release parent function as the
generic af_alg_release_parent now works for nokey sockets too.

Cc: stable@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

crypto: af_alg - Allow af_af_alg_release_parent to be called on nokey path

commit 6a935170a980024dd29199e9dbb5c4da4767a1b9 upstream.

This patch allows af_alg_release_parent to be called even for
nokey sockets.

Cc: stable@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

crypto: algif_skcipher - Add key check exception for cipher_null

commit 6e8d8ecf438792ecf7a3207488fb4eebc4edb040 upstream.

This patch adds an exception to the key check so that cipher_null
users may continue to use algif_skcipher without setting a key.

Cc: stable@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

crypto: skcipher - Add crypto_skcipher_has_setkey

commit a1383cd86a062fc798899ab20f0ec2116cce39cb upstream.

This patch adds a way for skcipher users to determine whether a key
is required by a transform.

Cc: stable@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

crypto: algif_hash - Require setkey before accept(2)

commit 6de62f15b581f920ade22d758f4c338311c2f0d4 upstream.

Hash implementations that require a key may crash if you use
them without setting a key. This patch adds the necessary checks
so that if you do attempt to use them without a key that we return
-ENOKEY instead of proceeding.

This patch also adds a compatibility path to support old applications
that do acept(2) before setkey.

Cc: stable@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

crypto: shash - Fix has_key setting

commit 00420a65fa2beb3206090ead86942484df2275f3 upstream.

The has_key logic is wrong for shash algorithms as they always
have a setkey function. So we should instead be testing against
shash_no_setkey.

Fixes: a5596d633278 ("crypto: hash - Add crypto_ahash_has_setkey")
Cc: stable@vger.kernel.org
Reported-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

crypto: hash - Add crypto_ahash_has_setkey

commit a5596d6332787fd383b3b5427b41f94254430827 upstream.

This patch adds a way for ahash users to determine whether a key
is required by a crypto_ahash transform.

Cc: stable@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

crypto: algif_skcipher - Add nokey compatibility path

commit a0fa2d037129a9849918a92d91b79ed6c7bd2818 upstream.

This patch adds a compatibility path to support old applications
that do acept(2) before setkey.

Cc: stable@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

crypto: af_alg - Add nokey compatibility path

commit 37766586c965d63758ad542325a96d5384f4a8c9 upstream.

This patch adds a compatibility path to support old applications
that do acept(2) before setkey.

Cc: stable@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

crypto: af_alg - Disallow bind/setkey/... after accept(2)

commit c840ac6af3f8713a71b4d2363419145760bd6044 upstream.

Each af_alg parent socket obtained by socket(2) corresponds to a
tfm object once bind(2) has succeeded. An accept(2) call on that
parent socket creates a context which then uses the tfm object.

Therefore as long as any child sockets created by accept(2) exist
the parent socket must not be modified or freed.

This patch guarantees this by using locks and a reference count
on the parent socket. Any attempt to modify the parent socket will
fail with EBUSY.

Cc: stable@vger.kernel.org
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

crypto: algif_skcipher - Require setkey before accept(2)

commit dd504589577d8e8e70f51f997ad487a4cb6c026f upstream.

Some cipher implementations will crash if you try to use them
without calling setkey first. This patch adds a check so that
the accept(2) call will fail with -ENOKEY if setkey hasn't been
done on the socket yet.

Cc: stable@vger.kernel.org
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

sched/core: Fix an SMP ordering race in try_to_wake_up() vs. schedule()

commit ecf7d01c229d11a44609c0067889372c91fb4f36 upstream.

Oleg noticed that its possible to falsely observe p->on_cpu == 0 such
that we'll prematurely continue with the wakeup and effectively run p on
two CPUs at the same time.

Even though the overlap is very limited; the task is in the middle of
being scheduled out; it could still result in corruption of the
scheduler data structures.

        CPU0                            CPU1

        set_current_state(...)

        <preempt_schedule>
          context_switch(X, Y)
            prepare_lock_switch(Y)
              Y->on_cpu = 1;
            finish_lock_switch(X)
              store_release(X->on_cpu, 0);

                                        try_to_wake_up(X)
                                          LOCK(p->pi_lock);

                                          t = X->on_cpu; // 0

          context_switch(Y, X)
            prepare_lock_switch(X)
              X->on_cpu = 1;
            finish_lock_switch(Y)
              store_release(Y->on_cpu, 0);
        </preempt_schedule>

        schedule();
          deactivate_task(X);
          X->on_rq = 0;

                                          if (X->on_rq) // false

                                          if (t) while (X->on_cpu)
                                            cpu_relax();

          context_switch(X, ..)
            finish_lock_switch(X)
              store_release(X->on_cpu, 0);

Avoid the load of X->on_cpu being hoisted over the X->on_rq load.

Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

sched/core: Fix a race between try_to_wake_up() and a woken up task

commit 135e8c9250dd5c8c9aae5984fde6f230d0cbfeaf upstream.

The origin of the issue I've seen is related to
a missing memory barrier between check for task->state and
the check for task->on_rq.

The task being woken up is already awake from a schedule()
and is doing the following:

do {
schedule()
set_current_state(TASK_(UN)INTERRUPTIBLE);
} while (!cond);

The waker, actually gets stuck doing the following in
try_to_wake_up():

while (p->on_cpu)
cpu_relax();

Analysis:

The instance I've seen involves the following race:

CPU1 CPU2

while () {
   if (cond)
     break;
   do {
     schedule();
     set_current_state(TASK_UN..)
   } while (!cond);
wakeup_routine()
  spin_lock_irqsave(wait_lock)
   raw_spin_lock_irqsave(wait_lock)   wake_up_process()
}   try_to_wake_up()
set_current_state(TASK_RUNNING);   ..
list_del(&waiter.list);

CPU2 wakes up CPU1, but before it can get the wait_lock and set
current state to TASK_RUNNING the following occurs:

CPU3
wakeup_routine()
raw_spin_lock_irqsave(wait_lock)
if (!list_empty)
   wake_up_process()
   try_to_wake_up()
   raw_spin_lock_irqsave(p->pi_lock)
   ..
   if (p->on_rq && ttwu_wakeup())
   ..
   while (p->on_cpu)
     cpu_relax()
   ..

CPU3 tries to wake up the task on CPU1 again since it finds
it on the wait_queue, CPU1 is spinning on wait_lock, but immediately
after CPU2, CPU3 got it.

CPU3 checks the state of p on CPU1, it is TASK_UNINTERRUPTIBLE and
the task is spinning on the wait_lock. Interestingly since p->on_rq
is checked under pi_lock, I've noticed that try_to_wake_up() finds
p->on_rq to be 0. This was the most confusing bit of the analysis,
but p->on_rq is changed under runqueue lock, rq_lock, the p->on_rq
check is not reliable without this fix IMHO. The race is visible
(based on the analysis) only when ttwu_queue() does a remote wakeup
via ttwu_queue_remote. In which case the p->on_rq change is not
done uder the pi_lock.

The result is that after a while the entire system locks up on
the raw_spin_irqlock_save(wait_lock) and the holder spins infintely

Reproduction of the issue:

The issue can be reproduced after a long run on my system with 80
threads and having to tweak available memory to very low and running
memory stress-ng mmapfork test. It usually takes a long time to
reproduce. I am trying to work on a test case that can reproduce
the issue faster, but thats work in progress. I am still testing the
changes on my still in a loop and the tests seem OK thus far.

Big thanks to Benjamin and Nick for helping debug this as well.
Ben helped catch the missing barrier, Nick caught every missing
bit in my theory.

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
[ Updated comment to clarify matching barriers. Many
  architectures do not have a full barrier in switch_to()
  so that cannot be relied upon. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <nicholas.piggin@gmail.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/e02cce7b-d9ca-1ad0-7a61-ea97c7582b37@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

Linux 3.10.104

mm: remove gup_flags FOLL_WRITE games from __get_user_pages()

commit 19be0eaffa3ac7d8eb6784ad9bdbc7d67ed8e619 upstream.

This is an ancient bug that was actually attempted to be fixed once
(badly) by me eleven years ago in commit 4ceb5db9757a ("Fix
get_user_pages() race for write access") but that was then undone due to
problems on s390 by commit f33ea7f404e5 ("fix get_user_pages bug").

In the meantime, the s390 situation has long been fixed, and we can now
fix it by checking the pte_dirty() bit properly (and do it better).  The
s390 dirty bit was implemented in abf09bed3cce ("s390/mm: implement
software dirty bits") which made it into v3.9.  Earlier kernels will
have to look at the page state itself.

Also, the VM has become more scalable, and what used a purely
theoretical race back then has become easier to trigger.

To fix it, we introduce a new internal FOLL_COW flag to mark the "yes,
we already did a COW" rather than play racy games with FOLL_WRITE that
is very fundamental, and then use the pte dirty flag to validate that
the FOLL_COW flag is still valid.

Reported-and-tested-by: Phil "not Paul" Oester <kernel@linuxace.com>
Acked-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Michal Hocko <mhocko@suse.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[wt: s/gup.c/memory.c; s/follow_page_pte/follow_page_mask;
     s/faultin_page/__get_user_page]
Signed-off-by: Willy Tarreau <w@1wt.eu>

xen-netback: ref count shared rings

... so that we can make sure the rings are not freed until all SKBs in
internal queues are consumed.

1. The VM is receiving packets through bonding + bridge + netback +
   netfront.
2. For some unknown reason at least one packet remains in the rx queue
   and is not delivered to the domU immediately by netback.
3. The VM finishes shutting down.
4. The shared ring between dom0 and domU is freed.
5. then xen-netback continues processing the pending requests and tries
   to put the packet into the now already released shared ring.

> XXXlan0: port 9(vif26.0) entered disabled state
> BUG: unable to handle kernel paging request at ffffc900108641d8
> IP: [<ffffffffa04147dc>] xen_netbk_rx_action+0x18b/0x6f0 [xen_netback]
> PGD 57e20067 PUD 57e21067 PMD 571a7067 PTE 0
> Oops: 0000 [#1] SMP
> ...
> CPU: 0 PID: 12587 Comm: netback/0 Not tainted 3.10.0-ucs58-amd64 #1 Debian 3.10.11-1.58.201405060908
> Hardware name: FUJITSU PRIMERGY BX620 S6/D3051, BIOS 080015 Rev.3C78.3051 07/22/2011
> task: ffff880004b067c0 ti: ffff8800561ec000 task.ti: ffff8800561ec000
> RIP: e030:[<ffffffffa04147dc>]  [<ffffffffa04147dc>] xen_netbk_rx_action+0x18b/0x6f0 [xen_netback]
> RSP: e02b:ffff8800561edce8  EFLAGS: 00010202
> RAX: ffffc900104adac0 RBX: ffff8800541e95c0 RCX: ffffc90010864000
> RDX: 000000000000003b RSI: 0000000000000000 RDI: ffff880040014380
> RBP: ffff8800570e6800 R08: 0000000000000000 R09: ffff880004799800
> R10: ffffffff813ca115 R11: ffff88005e4fdb08 R12: ffff880054e6f800
> R13: ffff8800561edd58 R14: ffffc900104a1000 R15: 0000000000000000
> FS:  00007f19a54a8700(0000) GS:ffff88005da00000(0000) knlGS:0000000000000000
> CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: ffffc900108641d8 CR3: 0000000054cb3000 CR4: 0000000000002660
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Stack:
>  ffff880004b06ba0 0000000000000000 ffff88005da13ec0 ffff88005da13ec0
>  0000000004b067c0 ffffc900104a8ac0 ffffc900104a1020 000000005da13ec0
>  0000000000000000 0000000000000001 ffffc900104a8ac0 ffffc900104adac0
> Call Trace:
>  [<ffffffff813ca32d>] ? _raw_spin_lock_irqsave+0x11/0x2f
>  [<ffffffffa0416033>] ? xen_netbk_kthread+0x174/0x841 [xen_netback]
>  [<ffffffff8105d373>] ? wake_up_bit+0x20/0x20
>  [<ffffffffa0415ebf>] ? xen_netbk_tx_build_gops+0xce8/0xce8 [xen_netback]
>  [<ffffffff8105cd73>] ? kthread_freezable_should_stop+0x56/0x56
>  [<ffffffffa0415ebf>] ? xen_netbk_tx_build_gops+0xce8/0xce8 [xen_netback]
>  [<ffffffff8105ce1e>] ? kthread+0xab/0xb3
>  [<ffffffff81003638>] ? xen_end_context_switch+0xe/0x1c
>  [<ffffffff8105cd73>] ? kthread_freezable_should_stop+0x56/0x56
>  [<ffffffff813cfbfc>] ? ret_from_fork+0x7c/0xb0
>  [<ffffffff8105cd73>] ? kthread_freezable_should_stop+0x56/0x56
> Code: 8b b3 d0 00 00 00 48 8b bb d8 00 00 00 0f b7 74 37 02 89 70 08 eb 07 c7 40 08 00 00 00 00 89 d2 c7 40 04 00 00 00 00 48 83 c2 08 <0f> b7 34 d1 89 30 c7 44 24 60 00 00 00 00 8b 44 d1 04 89 44 24
> RIP  [<ffffffffa04147dc>] xen_netbk_rx_action+0x18b/0x6f0 [xen_netback]
>  RSP <ffff8800561edce8>
> CR2: ffffc900108641d8

Track the shared ring buffer being unmapped and drop those packets.

Ref-count the rings as followed:
  map         -> set to 1
   start_xmit -> inc when queueing SKB to internal queue
   rx_action  -> dec after finishing processing a SKB
  unmap       -> dec and wait to be 0

Note that this is different from ref counting the vif structure itself.
Currently only guest Rx path is taken care of because that's where the
bug surfaced.

This bug doesn't exist in kernel >=3.12 as multi-queue support was added
there.

Link: <https://lists.xenproject.org/archives/html/xen-devel/2014-06/msg00818.html>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Philipp Hahn <hahn@univention.de>
Cc: David Vrabel <david.vrabel@citrix.com>
Tested-by: Philipp Hahn <hahn@univention.de>
Signed-off-by: Willy Tarreau <w@1wt.eu>

security: let security modules use PTRACE_MODE_* with bitmasks

commit 3dfb7d8cdbc7ea0c2970450e60818bb3eefbad69 upstream.

It looks like smack and yama weren't aware that the ptrace mode
can have flags ORed into it - PTRACE_MODE_NOAUDIT until now, but
only for /proc/$pid/stat, and with the PTRACE_MODE_*CREDS patch,
all modes have flags ORed into them.

Signed-off-by: Jann Horn <jann@thejh.net>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: "Serge E. Hallyn" <serge.hallyn@ubuntu.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[wt: no smk_ptrace_mode() in 3.10]
Signed-off-by: Willy Tarreau <w@1wt.eu>

MIPS: KVM: Check for pfn noslot case

commit ba913e4f72fc9cfd03dad968dfb110eb49211d80 upstream.

When mapping a page into the guest we error check using is_error_pfn(),
however this doesn't detect a value of KVM_PFN_NOSLOT, indicating an
error HVA for the page. This can only happen on MIPS right now due to
unusual memslot management (e.g. being moved / removed / resized), or
with an Enhanced Virtual Memory (EVA) configuration where the default
KVM_HVA_ERR_* and kvm_is_error_hva() definitions are unsuitable (fixed
in a later patch). This case will be treated as a pfn of zero, mapping
the first page of physical memory into the guest.

It would appear the MIPS KVM port wasn't updated prior to being merged
(in v3.10) to take commit 81c52c56e2b4 ("KVM: do not treat noslot pfn as
a error pfn") into account (merged v3.8), which converted a bunch of
is_error_pfn() calls to is_error_noslot_pfn(). Switch to using
is_error_noslot_pfn() instead to catch this case properly.

Fixes: 858dd5d45733 ("KVM/MIPS32: MMU/TLB operations for the Guest.")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim KrÄ\8dmÃ¡Å\99" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[james.hogan@imgtec.com: Backport to v3.16.y]
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

mm: thp: fix SMP race condition between THP page fault and MADV_DONTNEED

commit ad33bb04b2a6cee6c1f99fabb15cddbf93ff0433 upstream.

pmd_trans_unstable()/pmd_none_or_trans_huge_or_clear_bad() were
introduced to locklessy (but atomically) detect when a pmd is a regular
(stable) pmd or when the pmd is unstable and can infinitely transition
from pmd_none() and pmd_trans_huge() from under us, while only holding
the mmap_sem for reading (for writing not).

While holding the mmap_sem only for reading, MADV_DONTNEED can run from
under us and so before we can assume the pmd to be a regular stable pmd
we need to compare it against pmd_none() and pmd_trans_huge() in an
atomic way, with pmd_trans_unstable(). The old pmd_trans_huge() left a
tiny window for a race.

Useful applications are unlikely to notice the difference as doing
MADV_DONTNEED concurrently with a page fault would lead to undefined
behavior.

[js] 3.12 backport: no pmd_devmap in 3.12 yet.

[akpm@linux-foundation.org: tidy up comment grammar/layout]
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

ACPI / sysfs: fix error code in get_status()

commit f18ebc211e259d4f591e39e74b2aa2de226c9a1d upstream.

The problem with ornamental, do-nothing gotos is that they lead to
"forgot to set the error code" bugs.  We should be returning -EINVAL
here but we don't.  It leads to an uninitalized variable in
counter_show():

    drivers/acpi/sysfs.c:603 counter_show()
    error: uninitialized symbol 'status'.

Fixes: 1c8fce27e275 (ACPI: introduce drivers/acpi/sysfs.c)
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

staging: comedi: daqboard2000: bug fix board type matching code

commit 80e162ee9b31d77d851b10f8c5299132be1e120f upstream.

`daqboard2000_find_boardinfo()` is supposed to check if the
DaqBoard/2000 series model is supported, based on the PCI subvendor and
subdevice ID. The current code is wrong as it is comparing the PCI
device's subdevice ID to an expected, fixed value for the subvendor ID.
It should be comparing the PCI device's subvendor ID to this fixed
value. Correct it.

Fixes: 7e8401b23e7f ("staging: comedi: daqboard2000: add back
subsystem_device check")
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Cc: <stable@vger.kernel.org> # 3.7+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

crypto: nx - off by one bug in nx_of_update_msc()

commit e514cc0a492a3f39ef71b31590a7ef67537ee04b upstream.

The props->ap[] array is defined like this:

struct alg_props ap[NX_MAX_FC][NX_MAX_MODE][3];

So we can see that if msc->fc and msc->mode are == to NX_MAX_FC or
NX_MAX_MODE then we're off by one.

Fixes: ae0222b7289d ('powerpc/crypto: nx driver code supporting nx encryption')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Willy Tarreau <w@1wt.eu>

megaraid_sas: Fix probing cards without io port

commit e7f851684efb3377e9c93aca7fae6e76212e5680 upstream.

Found one megaraid_sas HBA probe fails,

[  187.235190] scsi host2: Avago SAS based MegaRAID driver
[  191.112365] megaraid_sas 0000:89:00.0: BAR 0: can't reserve [io  0x0000-0x00ff]
[  191.120548] megaraid_sas 0000:89:00.0: IO memory region busy!

and the card has resource like,
[  125.097714] pci 0000:89:00.0: [1000:005d] type 00 class 0x010400
[  125.104446] pci 0000:89:00.0: reg 0x10: [io  0x0000-0x00ff]
[  125.110686] pci 0000:89:00.0: reg 0x14: [mem 0xce400000-0xce40ffff 64bit]
[  125.118286] pci 0000:89:00.0: reg 0x1c: [mem 0xce300000-0xce3fffff 64bit]
[  125.125891] pci 0000:89:00.0: reg 0x30: [mem 0xce200000-0xce2fffff pref]

that does not io port resource allocated from BIOS, and kernel can not
assign one as io port shortage.

The driver is only looking for MEM, and should not fail.

It turns out megasas_init_fw() etc are using bar index as mask.  index 1
is used as mask 1, so that pci_request_selected_regions() is trying to
request BAR0 instead of BAR1.

Fix all related reference.

Fixes: b6d5d8808b4c ("megaraid_sas: Use lowest memory bar for SR-IOV VF support")
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

aacraid: Check size values after double-fetch from user

commit fa00c437eef8dc2e7b25f8cd868cfa405fcc2bb3 upstream.

In aacraid's ioctl_send_fib() we do two fetches from userspace, one the
get the fib header's size and one for the fib itself. Later we use the
size field from the second fetch to further process the fib. If for some
reason the size from the second fetch is different than from the first
fix, we may encounter an out-of- bounds access in aac_fib_send(). We
also check the sender size to insure it is not out of bounds. This was
reported in https://bugzilla.kernel.org/show_bug.cgi?id=116751 and was
assigned CVE-2016-6480.

Reported-by: Pengfei Wang <wpengfeinudt@gmail.com>
Fixes: 7c00ffa31 '[SCSI] 2.6 aacraid: Variable FIB size (updated patch)'
Cc: stable@vger.kernel.org
Signed-off-by: Dave Carroll <david.carroll@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

PCI: Limit config space size for Netronome NFP4000

commit c2e771b02792d222cbcd9617fe71482a64f52647 upstream.

Like the NFP6000, the NFP4000 as an erratum where reading/writing to PCI
config space addresses above 0x600 can cause the NFP to generate PCIe
completion timeouts.

Limit the NFP4000's PF's config space size to 0x600 bytes as is already
done for the NFP6000.

The NFP4000's VF is 0x6004 (PCI_DEVICE_ID_NETRONOME_NFP6000_VF), the same
device ID as the NFP6000's VF. Thus, its config space is already limited
by the existing use of quirk_nfp6000().

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

PCI: Add Netronome NFP4000 PF device ID

commit 69874ec233871a62e1bc8c89e643993af93a8630 upstream.

Add the device ID for the PF of the NFP4000. The device ID for the VF,
0x6003, is already present as PCI_DEVICE_ID_NETRONOME_NFP6000_VF.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

PCI: Limit config space size for Netronome NFP6000 family

commit 9f33a2ae59f24452c1076749deb615bccd435ca9 upstream.

The NFP6000 has an erratum where reading/writing to PCI config space
addresses above 0x600 can cause the NFP to generate PCIe completion
timeouts.

Limit the NFP6000's config space size to 0x600 bytes.

Signed-off-by: Jason S. McMullan <jason.mcmullan@netronome.com>
[simon: edited changelog]
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

PCI: Add Netronome vendor and device IDs

commit a755e169031dac9ebaed03302c4921687c271d62 upstream.

Device IDs for the Netronome NFP3200, NFP3240, NFP6000, and NFP6000 SR-IOV
devices.

Signed-off-by: Jason S. McMullan <jason.mcmullan@netronome.com>
[simon: edited changelog]
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

PCI: Support PCIe devices with short cfg_size

commit c20aecf6963d1273d8f6d61c042b4845441ca592 upstream.

If a device quirk modifies the pci_dev->cfg_size to be less than
PCI_CFG_SPACE_EXP_SIZE (4096), but greater than PCI_CFG_SPACE_SIZE (256),
the PCI sysfs interface truncates the readable size to PCI_CFG_SPACE_SIZE.

Allow sysfs access to config space up to cfg_size, even if the device
doesn't support the entire 4096-byte PCIe config space.

Note that pci_read_config() and pci_write_config() limit access to
dev->cfg_size even though pcie_config_attr contains 4096 (the maximum
size).

Signed-off-by: Jason S. McMullan <jason.mcmullan@netronome.com>
[simon: edited changelog]
Signed-off-by: Simon Horman <simon.horman@netronome.com>
[bhelgaas: more changelog edits]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

Revert "powerpc/tm: Always reclaim in start_thread() for exec() class syscalls"

This reverts commit 8110080dc53335d5dd99b123144a6174f19ffc65.

Guenter noticed that this breaks PPC build when CONFIG_PPC_TRANSACTIONAL_MEM
is set, because this patch was not for 3.10.

Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>

Linux 3.10.103

spi: spi-xilinx: cleanup a check in xilinx_spi_txrx_bufs()

commit e33d085d11e54bc9fb07b2555cd104d8e7b3089b upstream.

'!' has higher precedence than comparisons so the original condition
is equivalent to "if (xspi->remaining_bytes == 0)". This makes the
static checkers complain.

xspi->remaining_bytes is signed and from looking at the code
briefly, I think it might be able to go negative. I suspect that
going negative may cause a bug, but I don't have the hardware and
can't test.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

stb6100: fix buffer length check in stb6100_write_reg_range()

commit 7e6bd12fb77b0067df13fb3ba3fadbdff2945396 upstream.

We are checking sizeof() the wrong variable!

Signed-off-by: Alexander Shiyan <shc_work@mail.ru>
Signed-off-by: Michael Krufky <mkrufky@linuxtv.org>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

isdn: hfcpci_softirq: get func return to suppress compiler warning

commit d6d6d1bc44362112e10a48d434e5b3c716152003 upstream.

Signed-off-by: Antonio Alecrim Jr <antonio.alecrim@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>

net: rfkill: Do not ignore errors from regulator_enable()

commit dee08ab83d0378d922b67e7cf10bbec3e4ea343b upstream.

Function regulator_enable() may return an error that has to be checked.
This patch changes function rfkill_regulator_set_block() so that it checks
for the return code. Also, rfkill_data->reg_enabled is set to 'true' only
if there is no error.

This fixes the following compilation warning:

net/rfkill/rfkill-regulator.c:43:20: warning: ignoring return value of 'regulator_enable', declared with attribute warn_unused_result [-Wunused-result]

Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

ALSA: oxygen: Fix logical-not-parentheses warning

commit 8ec7cfce3762299ae289c384e281b2f4010ae231 upstream.

This fixes the following warning, that is seen with gcc 5.1:
warning: logical not is only applied to the left hand side of comparison [-Wlogical-not-parentheses].

Signed-off-by: Tomer Barletz <barletz@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Willy Tarreau <w@1wt.eu>

HID: hid-input: Add parentheses to quell gcc warning

commit 09a5c34e8d6b05663ec4c3d22b1fbd9fec89aaf9 upstream.

GCC reports a -Wlogical-not-parentheses warning here; therefore
add parentheses to shut it up and to express our intent more.

Signed-off-by: James C Boyd <jcboyd.dev@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Willy Tarreau <w@1wt.eu>

squash mm: Export migrate_page_... : also make it non-static

commit ce16887b69e94a8c0305e88c918989f8bc1bd6b7 upstream.

Signed-off-by: Willy Tarreau <w@1wt.eu>

be2iscsi: Fix bogus WARN_ON length check

commit dd29dae00d39186890a5eaa2fe4ad8768bfd41a9 upstream.

drivers/scsi/be2iscsi/be_main.c: In function 'be_sgl_create_contiguous':
drivers/scsi/be2iscsi/be_main.c:3187:18: warning: logical not is only applied to the left hand side of comparison [-Wlogical-not-parentheses]
WARN_ON(!length > 0);

gcc version 5.2.1

Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Cc: Jayamohan Kallickal <jayamohan.kallickal@avagotech.com>
Cc: Minh Tran <minh.tran@avagotech.com>
Cc: John Soni Jose <sony.john-n@avagotech.com>
Cc: "James E.J. Bottomley" <JBottomley@odin.com>
Reported-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Manoj Kumar <manoj@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

module: Invalidate signatures on force-loaded modules

commit bca014caaa6130e57f69b5bf527967aa8ee70fdd upstream.

Signing a module should only make it trusted by the specific kernel it
was built for, not anything else. Loading a signed module meant for a
kernel with a different ABI could have interesting effects.
Therefore, treat all signatures as invalid when a module is
force-loaded.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

dm flakey: error READ bios during the down_interval

commit 99f3c90d0d85708e7401a81ce3314e50bf7f2819 upstream.

When the corrupt_bio_byte feature was introduced it caused READ bios to
no longer be errored with -EIO during the down_interval. This had to do
with the complexity of needing to submit READs if the corrupt_bio_byte
feature was used.

Fix it so READ bios are properly errored with -EIO; doing so early in
flakey_map() as long as there isn't a match for the corrupt_bio_byte
feature.

Fixes: a3998799fb4df ("dm flakey: add corrupt_bio_byte feature")
Reported-by: Akira Hayakawa <ruby.wktk@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Willy Tarreau <w@1wt.eu>

ubi: Fix race condition between ubi device creation and udev

commit 714fb87e8bc05ff78255afc0dca981e8c5242785 upstream.

Install the UBI device object before we arm sysfs.
Otherwise udev tries to read sysfs attributes before UBI is ready and
udev rules will not match.

Cc: <stable@vger.kernel.org>
Signed-off-by: Iosif Harutyunov <iharutyunov@sonicwall.com>
[rw: massaged commit message]
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Willy Tarreau <w@1wt.eu>

ubi: Make volume resize power cut aware

commit 4946784bd3924b1374f05eebff2fd68660bae866 upstream.

When the volume resize operation shrinks a volume,
LEBs will be unmapped. Since unmapping will not erase these
LEBs immediately we have to wait for that operation to finish.
Otherwise in case of a power cut right after writing the new
volume table the UBI attach process can find more LEBs than the
volume table knows. This will render the UBI image unattachable.

Fix this issue by waiting for erase to complete and write the new
volume table afterward.

Cc: <stable@vger.kernel.org>
Reported-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Reviewed-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Willy Tarreau <w@1wt.eu>

metag: Fix __cmpxchg_u32 asm constraint for CMP

commit 6154c187b97ee7513046bb4eb317a89f738f13ef upstream.

The LNKGET based atomic sequence in __cmpxchg_u32 has slightly incorrect
constraints for the return value which under certain circumstances can
allow an address unit register to be used as the first operand of a CMP
instruction. This isn't a valid instruction however as the encodings
only allow a data unit to be specified. This would result in an
assembler error like the following:

Error: failed to assemble instruction: "CMP A0.2,D0Ar6"

Fix by changing the constraint from "=&da" (assigned, early clobbered,
data or address unit register) to "=&d" (data unit register only).

The constraint for the second operand, "bd" (an op2 register where op1
is a data unit register and the instruction supports O2R) is already
correct assuming the first operand is a data unit register.

Other cases of CMP in inline asm have had their constraints checked, and
appear to all be fine.

Fixes: 6006c0d8ce94 ("metag: Atomics, locks and bitops")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: linux-metag@vger.kernel.org
Cc: <stable@vger.kernel.org> # 3.9.x-
Signed-off-by: Willy Tarreau <w@1wt.eu>

ftrace/recordmcount: Work around for addition of metag magic but not relocations

commit b2e1c26f0b62531636509fbcb6dab65617ed8331 upstream.

glibc recently did a sync up (94e73c95d9b5 "elf.h: Sync with the gabi
webpage") that added a #define for EM_METAG but did not add relocations

This triggers build errors:

scripts/recordmcount.c: In function 'do_file':
scripts/recordmcount.c:466:28: error: 'R_METAG_ADDR32' undeclared (first use in this function)
  case EM_METAG:  reltype = R_METAG_ADDR32;
                            ^~~~~~~~~~~~~~
scripts/recordmcount.c:466:28: note: each undeclared identifier is reported only once for each function it appears in
scripts/recordmcount.c:468:20: error: 'R_METAG_NONE' undeclared (first use in this function)
     rel_type_nop = R_METAG_NONE;
                    ^~~~~~~~~~~~

Work around this change with some more #ifdefery for the relocations.

Fedora Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1354034

Link: http://lkml.kernel.org/r/1468005530-14757-1-git-send-email-labbott@redhat.com
Cc: stable@vger.kernel.org # v3.9+
Cc: James Hogan <james.hogan@imgtec.com>
Fixes: 00512bdd4573 ("metag: ftrace support")
Reported-by: Ross Burton <ross.burton@intel.com>
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

balloon: check the number of available pages in leak balloon

commit 37cf99e08c6fb4dcea0f9ad2b13b6daa8c76a711 upstream.

The balloon has a special mechanism that is subscribed to the oom
notification which leads to deflation for a fixed number of pages.
The number is always fixed even when the balloon is fully deflated.
But leak_balloon did not expect that the pages to deflate will be more
than taken, and raise a "BUG" in balloon_page_dequeue when page list
will be empty.

So, the simplest solution would be to check that the number of releases
pages is less or equal to the number taken pages.

Cc: stable@vger.kernel.org
Signed-off-by: Konstantin Neumoin <kneumoin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

netlabel: add address family checks to netlbl_{sock,req}_delattr()

commit 0e0e36774081534783aa8eeb9f6fbddf98d3c061 upstream.

It seems risky to always rely on the caller to ensure the socket's
address family is correct before passing it to the NetLabel kAPI,
especially since we see at least one LSM which didn't. Add address
family checks to the *_delattr() functions to help prevent future
problems.

Cc: <stable@vger.kernel.org>
Reported-by: Maninder Singh <maninder1.s@samsung.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

cifs: Check for existing directory when opening file with O_CREAT

commit 8d9535b6efd86e6c07da59f97e68f44efb7fe080 upstream.

When opening a file with O_CREAT flag, check to see if the file opened
is an existing directory.

This prevents the directory from being opened which subsequently causes
a crash when the close function for directories cifs_closedir() is called
which frees up the file->private_data memory while the file is still
listed on the open file list for the tcon.

Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
CC: Stable <stable@vger.kernel.org>
Reported-by: Xiaoli Feng <xifeng@redhat.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

Bluetooth: Fix l2cap_sock_setsockopt() with optname BT_RCVMTU

commit 23bc6ab0a0912146fd674a0becc758c3162baabc upstream.

When we retrieve imtu value from userspace we should use 16 bit pointer
cast instead of 32 as it's defined that way in headers. Fixes setsockopt
calls on big-endian platforms.

Signed-off-by: Amadeusz SÅ\82awiÅ\84ski <amadeusz.slawinski@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org
Signed-off-by: Willy Tarreau <w@1wt.eu>

s5p-mfc: Add release callback for memory region devs

commit 6311f1261f59ce5e51fbe5cc3b5e7737197316ac upstream.

When s5p_mfc_remove() calls put_device() for the reserved memory region
devs, the driver core warns that the dev doesn't have a release callback:

WARNING: CPU: 0 PID: 591 at drivers/base/core.c:251 device_release+0x8c/0x90
Device 's5p-mfc-l' does not have a release() function, it is broken and must be fixed.

Also, the declared DMA memory using dma_declare_coherent_memory() isn't
relased so add a dev .release that calls dma_release_declared_memory().

Cc: <stable@vger.kernel.org>
Fixes: 6e83e6e25eb4 ("[media] s5p-mfc: Fix kernel warning on memory init")
Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

s5p-mfc: Set device name for reserved memory region devs

commit 29debab0a94035a390801d1f177d171d014b7765 upstream.

The devices don't have a name set, so makes dev_name() returns NULL which
makes harder to identify the devices that are causing issues, for example:

WARNING: CPU: 2 PID: 616 at drivers/base/core.c:251 device_release+0x8c/0x90
Device '(null)' does not have a release() function, it is broken and must be fixed.

And after setting the device name:

WARNING: CPU: 0 PID: 591 at drivers/base/core.c:251 device_release+0x8c/0x90
Device 's5p-mfc-l' does not have a release() function, it is broken and must be fixed.

Cc: <stable@vger.kernel.org>
Fixes: 6e83e6e25eb4 ("[media] s5p-mfc: Fix kernel warning on memory init")
Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

hp-wmi: Fix wifi cannot be hard-unblocked

commit fc8a601e1175ae351f662506030f9939cb7fdbfe upstream.

Several users reported wifi cannot be unblocked as discussed in [1].
This patch removes the use of the 2009 flag by BIOS but uses the actual
WMI function calls - it will be skipped if WMI reports unsupported.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=69131

Signed-off-by: Alex Hung <alex.hung@canonical.com>
Tested-by: Evgenii Shatokhin <eugene.shatokhin@yandex.ru>
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

gpio: pca953x: Fix NBANK calculation for PCA9536

commit a246b8198f776a16d1d3a3bbfc2d437bad766b29 upstream.

NBANK() macro assumes that ngpios is a multiple of 8(BANK_SZ) and
hence results in 0 banks for PCA9536 which has just 4 gpios. This is
wrong as PCA9356 has 1 bank with 4 gpios. This results in uninitialized
PCA953X_INVERT register. Fix this by using DIV_ROUND_UP macro in
NBANK().

Cc: stable@vger.kernel.org
Signed-off-by: Vignesh R <vigneshr@ti.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

net/irda: fix NULL pointer dereference on memory allocation failure

commit d3e6952cfb7ba5f4bfa29d4803ba91f96ce1204d upstream.

I ran into this:

    kasan: CONFIG_KASAN_INLINE enabled
    kasan: GPF could be caused by NULL-ptr deref or user memory access
    general protection fault: 0000 [#1] PREEMPT SMP KASAN
    CPU: 2 PID: 2012 Comm: trinity-c3 Not tainted 4.7.0-rc7+ #19
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
    task: ffff8800b745f2c0 ti: ffff880111740000 task.ti: ffff880111740000
    RIP: 0010:[<ffffffff82bbf066>]  [<ffffffff82bbf066>] irttp_connect_request+0x36/0x710
    RSP: 0018:ffff880111747bb8  EFLAGS: 00010286
    RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000069dd8358
    RDX: 0000000000000009 RSI: 0000000000000027 RDI: 0000000000000048
    RBP: ffff880111747c00 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000069dd8358 R11: 1ffffffff0759723 R12: 0000000000000000
    R13: ffff88011a7e4780 R14: 0000000000000027 R15: 0000000000000000
    FS:  00007fc738404700(0000) GS:ffff88011af00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fc737fdfb10 CR3: 0000000118087000 CR4: 00000000000006e0
    Stack:
     0000000000000200 ffff880111747bd8 ffffffff810ee611 ffff880119f1f220
     ffff880119f1f4f8 ffff880119f1f4f0 ffff88011a7e4780 ffff880119f1f232
     ffff880119f1f220 ffff880111747d58 ffffffff82bca542 0000000000000000
    Call Trace:
     [<ffffffff82bca542>] irda_connect+0x562/0x1190
     [<ffffffff825ae582>] SYSC_connect+0x202/0x2a0
     [<ffffffff825b4489>] SyS_connect+0x9/0x10
     [<ffffffff8100334c>] do_syscall_64+0x19c/0x410
     [<ffffffff83295ca5>] entry_SYSCALL64_slow_path+0x25/0x25
    Code: 41 89 ca 48 89 e5 41 57 41 56 41 55 41 54 41 89 d7 53 48 89 fb 48 83 c7 48 48 89 fa 41 89 f6 48 c1 ea 03 48 83 ec 20 4c 8b 65 10 <0f> b6 04 02 84 c0 74 08 84 c0 0f 8e 4c 04 00 00 80 7b 48 00 74
    RIP  [<ffffffff82bbf066>] irttp_connect_request+0x36/0x710
     RSP <ffff880111747bb8>
    ---[ end trace 4cda2588bc055b30 ]---

The problem is that irda_open_tsap() can fail and leave self->tsap = NULL,
and then irttp_connect_request() almost immediately dereferences it.

Cc: stable@vger.kernel.org
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>

fuse: fix wrong assignment of ->flags in fuse_send_init()

commit 9446385f05c9af25fed53dbed3cc75763730be52 upstream.

FUSE_HAS_IOCTL_DIR should be assigned to ->flags, it may be a typo.

Signed-off-by: Wei Fang <fangwei1@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 69fe05c90ed5 ("fuse: add missing INIT flags")
Cc: <stable@vger.kernel.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>

block: fix use-after-free in seq file

commit 77da160530dd1dc94f6ae15a981f24e5f0021e84 upstream.

I got a KASAN report of use-after-free:

    ==================================================================
    BUG: KASAN: use-after-free in klist_iter_exit+0x61/0x70 at addr ffff8800b6581508
    Read of size 8 by task trinity-c1/315
    =============================================================================
    BUG kmalloc-32 (Not tainted): kasan: bad access detected
    -----------------------------------------------------------------------------

    Disabling lock debugging due to kernel taint
    INFO: Allocated in disk_seqf_start+0x66/0x110 age=144 cpu=1 pid=315
            ___slab_alloc+0x4f1/0x520
            __slab_alloc.isra.58+0x56/0x80
            kmem_cache_alloc_trace+0x260/0x2a0
            disk_seqf_start+0x66/0x110
            traverse+0x176/0x860
            seq_read+0x7e3/0x11a0
            proc_reg_read+0xbc/0x180
            do_loop_readv_writev+0x134/0x210
            do_readv_writev+0x565/0x660
            vfs_readv+0x67/0xa0
            do_preadv+0x126/0x170
            SyS_preadv+0xc/0x10
            do_syscall_64+0x1a1/0x460
            return_from_SYSCALL_64+0x0/0x6a
    INFO: Freed in disk_seqf_stop+0x42/0x50 age=160 cpu=1 pid=315
            __slab_free+0x17a/0x2c0
            kfree+0x20a/0x220
            disk_seqf_stop+0x42/0x50
            traverse+0x3b5/0x860
            seq_read+0x7e3/0x11a0
            proc_reg_read+0xbc/0x180
            do_loop_readv_writev+0x134/0x210
            do_readv_writev+0x565/0x660
            vfs_readv+0x67/0xa0
            do_preadv+0x126/0x170
            SyS_preadv+0xc/0x10
            do_syscall_64+0x1a1/0x460
            return_from_SYSCALL_64+0x0/0x6a

    CPU: 1 PID: 315 Comm: trinity-c1 Tainted: G    B           4.7.0+ #62
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
     ffffea0002d96000 ffff880119b9f918 ffffffff81d6ce81 ffff88011a804480
     ffff8800b6581500 ffff880119b9f948 ffffffff8146c7bd ffff88011a804480
     ffffea0002d96000 ffff8800b6581500 fffffffffffffff4 ffff880119b9f970
    Call Trace:
     [<ffffffff81d6ce81>] dump_stack+0x65/0x84
     [<ffffffff8146c7bd>] print_trailer+0x10d/0x1a0
     [<ffffffff814704ff>] object_err+0x2f/0x40
     [<ffffffff814754d1>] kasan_report_error+0x221/0x520
     [<ffffffff8147590e>] __asan_report_load8_noabort+0x3e/0x40
     [<ffffffff83888161>] klist_iter_exit+0x61/0x70
     [<ffffffff82404389>] class_dev_iter_exit+0x9/0x10
     [<ffffffff81d2e8ea>] disk_seqf_stop+0x3a/0x50
     [<ffffffff8151f812>] seq_read+0x4b2/0x11a0
     [<ffffffff815f8fdc>] proc_reg_read+0xbc/0x180
     [<ffffffff814b24e4>] do_loop_readv_writev+0x134/0x210
     [<ffffffff814b4c45>] do_readv_writev+0x565/0x660
     [<ffffffff814b8a17>] vfs_readv+0x67/0xa0
     [<ffffffff814b8de6>] do_preadv+0x126/0x170
     [<ffffffff814b92ec>] SyS_preadv+0xc/0x10

This problem can occur in the following situation:

open()
- pread()
    - .seq_start()
       - iter = kmalloc() // succeeds
       - seqf->private = iter
    - .seq_stop()
       - kfree(seqf->private)
- pread()
    - .seq_start()
       - iter = kmalloc() // fails
    - .seq_stop()
       - class_dev_iter_exit(seqf->private) // boom! old pointer

As the comment in disk_seqf_stop() says, stop is called even if start
failed, so we need to reinitialise the private pointer to NULL when seq
iteration stops.

An alternative would be to set the private pointer to NULL when the
kmalloc() in disk_seqf_start() fails.

Cc: stable@vger.kernel.org
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>

scsi_lib: correctly retry failed zero length REQ_TYPE_FS commands

commit a621bac3044ed6f7ec5fa0326491b2d4838bfa93 upstream.

When SCSI was written, all commands coming from the filesystem
(REQ_TYPE_FS commands) had data.  This meant that our signal for needing
to complete the command was the number of bytes completed being equal to
the number of bytes in the request.  Unfortunately, with the advent of
flush barriers, we can now get zero length REQ_TYPE_FS commands, which
confuse this logic because they satisfy the condition every time.  This
means they never get retried even for retryable conditions, like UNIT
ATTENTION because we complete them early assuming they're done.  Fix
this by special casing the early completion condition to recognise zero
length commands with errors and let them drop through to the retry code.

Reported-by: Sebastian Parschauer <s.parschauer@gmx.de>
Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
Tested-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
[ jwang: backport from upstream 4.7 to fix scsi resize issue ]
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>