git.stricted.de - GitHub/moto-9609/android_kernel_motorola

projects / GitHub / moto-9609 / android_kernel_motorola_exynos9610.git / log

Ingo Molnar [Fri, 27 Mar 2015 09:09:21 +0000 (10:09 +0100)]

Merge branch 'timers/core' into perf/timer, to apply dependent patch

An upcoming patch will depend on tai_ns() and NMI-safe ktime_get_raw_fast(),
so merge timers/core here in a separate topic branch until it's all cooked
and timers/core is merged upstream.

Signed-off-by: Ingo Molnar <mingo@kernel.org>

commit | commitdiff | tree

Peter Zijlstra [Tue, 17 Mar 2015 11:39:10 +0000 (12:39 +0100)]

time: Add ktime_get_tai_ns()

Because it was the only clock for which we didn't have a _ns()
accessor yet.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

commit | commitdiff | tree

Peter Zijlstra [Thu, 19 Mar 2015 08:39:08 +0000 (09:39 +0100)]

time: Introduce tk_fast_raw

Add the NMI safe CLOCK_MONOTONIC_RAW accessor..

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150319093400.562746929@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

commit | commitdiff | tree

Peter Zijlstra [Thu, 19 Mar 2015 08:36:19 +0000 (09:36 +0100)]

time: Parametrize all tk_fast_mono users

In preparation for more tk_fast instances, remove all hard-coded
tk_fast_mono references.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150319093400.484279927@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

commit | commitdiff | tree

Peter Zijlstra [Thu, 19 Mar 2015 08:28:44 +0000 (09:28 +0100)]

time: Add timerkeeper::tkr_raw

Introduce tkr_raw and make use of it.

base_raw -> tkr_raw.base
clock->{mult,shift} -> tkr_raw.{mult.shift}

Kill timekeeping_get_ns_raw() in favour of
timekeeping_get_ns(&tkr_raw), this removes all mono_raw special
casing.

Duplicate the updates to tkr_mono.cycle_last into tkr_raw.cycle_last,
both need the same value.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150319093400.422589590@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

commit | commitdiff | tree

Peter Zijlstra [Thu, 19 Mar 2015 09:09:06 +0000 (10:09 +0100)]

time: Rename timekeeper::tkr to timekeeper::tkr_mono

In preparation of adding another tkr field, rename this one to
tkr_mono. Also rename tk_read_base::base_mono to tk_read_base::base,
since the structure is not specific to CLOCK_MONOTONIC and the mono
name got added to the tk_read_base instance.

Lots of trivial churn.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150319093400.344679419@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

commit | commitdiff | tree

Ingo Molnar [Fri, 27 Mar 2015 06:08:06 +0000 (07:08 +0100)]

timers, sched/clock: Clean up the code a bit

Trivial cleanups, to improve the readability of the generic sched_clock() code:

- Improve and standardize comments
- Standardize the coding style
- Use vertical spacing where appropriate
- etc.

No code changed:

  md5:
    19a053b31e0c54feaeff1492012b019a  sched_clock.o.before.asm
    19a053b31e0c54feaeff1492012b019a  sched_clock.o.after.asm

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

commit | commitdiff | tree

Daniel Thompson [Thu, 26 Mar 2015 19:23:26 +0000 (12:23 -0700)]

timers, sched/clock: Avoid deadlock during read from NMI

Currently it is possible for an NMI (or FIQ on ARM) to come in
and read sched_clock() whilst update_sched_clock() has locked
the seqcount for writing. This results in the NMI handler
locking up when it calls raw_read_seqcount_begin().

This patch fixes the NMI safety issues by providing banked clock
data. This is a similar approach to the one used in Thomas
Gleixner's 4396e058c52e("timekeeping: Provide fast and NMI safe
access to CLOCK_MONOTONIC").

Suggested-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/1427397806-20889-6-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

commit | commitdiff | tree

Daniel Thompson [Thu, 26 Mar 2015 19:23:25 +0000 (12:23 -0700)]

timers, sched/clock: Remove redundant notrace from update function

Currently update_sched_clock() is marked as notrace but this
function is not called by ftrace. This is trivially fixed by
removing the mark up.

Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/1427397806-20889-5-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

commit | commitdiff | tree

Daniel Thompson [Thu, 26 Mar 2015 19:23:24 +0000 (12:23 -0700)]

timers, sched/clock: Remove suspend from clock_read_data()

Currently cd.read_data.suspended is read by the hotpath function
sched_clock(). This variable need not be accessed on the
hotpath. In fact, once it is removed, we can remove the
conditional branches from sched_clock() and install a dummy
read_sched_clock function to suspend the clock.

The new master copy of the function pointer
(actual_read_sched_clock) is introduced and is used for all
reads of the clock hardware except those within sched_clock
itself.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/1427397806-20889-4-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

commit | commitdiff | tree

Daniel Thompson [Thu, 26 Mar 2015 19:23:23 +0000 (12:23 -0700)]

timers, sched/clock: Optimize cache line usage

Currently sched_clock(), a very hot code path, is not optimized
to minimise its cache profile. In particular:

  1. cd is not ____cacheline_aligned,

  2. struct clock_data does not distinguish between hotpath and
     coldpath data, reducing locality of reference in the hotpath,

  3. Some hotpath data is missing from struct clock_data and is marked
     __read_mostly (which more or less guarantees it will not share a
     cache line with cd).

This patch corrects these problems by extracting all hotpath
data into a separate structure and using ____cacheline_aligned
to ensure the hotpath uses a single (64 byte) cache line.

Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/1427397806-20889-3-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

commit | commitdiff | tree

Daniel Thompson [Thu, 26 Mar 2015 19:23:22 +0000 (12:23 -0700)]

timers, sched/clock: Match scope of read and write seqcounts

Currently the scope of the raw_write_seqcount_begin/end() in
sched_clock_register() far exceeds the scope of the read section
in sched_clock(). This gives the impression of safety during
cursory review but achieves little.

Note that this is likely to be a latent issue at present because
sched_clock_register() is typically called before we enable
interrupts, however the issue does risk bugs being needlessly
introduced as the code evolves.

This patch fixes the problem by increasing the scope of the read
locking performed by sched_clock() to cover all data modified by
sched_clock_register.

We also improve clarity by moving writes to struct clock_data
that do not impact sched_clock() outside of the critical
section.

Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
[ Reworked it slightly to apply to tip/timers/core]
Signed-off-by: John Stultz <john.stultz@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/1427397806-20889-2-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

commit | commitdiff | tree

Linus Torvalds [Thu, 26 Mar 2015 22:04:05 +0000 (15:04 -0700)]

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm refcounting fixes from Dave Airlie:
"Here is the complete set of i915 bug/warn/refcounting fixes"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  drm/i915: Fixup legacy plane->crtc link for initial fb config
  drm/i915: Fix atomic state when reusing the firmware fb
  drm/i915: Keep ring->active_list and ring->requests_list consistent
  drm/i915: Don't try to reference the fb in get_initial_plane_config()
  drm: Fixup racy refcounting in plane_force_disable

commit | commitdiff | tree

Linus Torvalds [Thu, 26 Mar 2015 21:53:47 +0000 (14:53 -0700)]

Merge tag 'dm-4.0-fix-2' of git://git./linux/kernel/git/device-mapper/linux-dm

Pull device mapper fix from Mike Snitzer:
"Fix DM core device cleanup regression -- due to a latent race that was
exposed by the bdi changes that were introduced during the 4.0 merge"

* tag 'dm-4.0-fix-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm: fix add_disk() NULL pointer due to race with free_dev()

commit | commitdiff | tree

Linus Torvalds [Thu, 26 Mar 2015 21:43:42 +0000 (14:43 -0700)]

Merge tag 'linux-kselftest-4.0-rc6' of git://git./linux/kernel/git/shuah/linux-kselftest

Pull kselftest fix from Shuah Khan.

* tag 'linux-kselftest-4.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: Fix build failures when invoked from kselftest target

commit | commitdiff | tree

Dave Airlie [Thu, 26 Mar 2015 21:39:45 +0000 (07:39 +1000)]

Merge tag 'drm-intel-fixes-2015-03-26' of git://anongit.freedesktop.org/drm-intel into drm-fixes

This should cover the final warnings in -rc5 with two more backports
from our development branch (drm-intel-next-queued). They're the ones
from Daniel and Damien, with references to the reports.

This is on top of drm-fixes because of the dependency on the two earlier
fixes not yet in Linus' tree.

There's an additional regression fix from Chris.

* tag 'drm-intel-fixes-2015-03-26' of git://anongit.freedesktop.org/drm-intel:
  drm/i915: Fixup legacy plane->crtc link for initial fb config
  drm/i915: Fix atomic state when reusing the firmware fb
  drm/i915: Keep ring->active_list and ring->requests_list consistent

commit | commitdiff | tree

Linus Torvalds [Thu, 26 Mar 2015 21:11:17 +0000 (14:11 -0700)]

Merge branch 'for-linus' of git://git./linux/kernel/git/s390/linux

Pull s390 fixes from Martin Schwidefsky:
"A couple of bug fixes for s390.

  The ftrace comile fix is quite large for a -rc6 release, but it would
  be nice to have it in 4.0"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/smp: reenable smt after resume
  s390/mm: limit STACK_RND_MASK for compat tasks
  s390/ftrace: fix compile error if CONFIG_KPROBES is disabled
  s390/cpum_sf: add diagnostic sampling event only if it is authorized

commit | commitdiff | tree

Daniel Vetter [Wed, 25 Mar 2015 17:30:38 +0000 (18:30 +0100)]

drm/i915: Fixup legacy plane->crtc link for initial fb config

This is a very similar bug in the load detect code fixed in

commit 9128b040eb774e04bc23777b005ace2b66ab2a85
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Tue Mar 3 17:31:21 2015 +0100

drm/i915: Fix modeset state confusion in the load detect code

But this time around it was the initial fb code that forgot to update
the plane->crtc pointer. Otherwise it's the exact same bug, with the
exact same restrains (any set_config call/ioctl that doesn't disable
the pipe papers over the bug for free, so fairly hard to hit in normal
testing). So if you want the full explanation just go read that one
over there - it's rather long ...

Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Reported-and-tested-by: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
[Jani: backported to drm-intel-fixes for v4.0-rc]
Reference: http://mid.gmane.org/CA+5PVA7ChbtJrknqws1qvZcbrg1CW2pQAFkSMURWWgyASRyGXg@mail.gmail.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>

commit | commitdiff | tree

Damien Lespiau [Thu, 5 Feb 2015 19:24:25 +0000 (19:24 +0000)]

drm/i915: Fix atomic state when reusing the firmware fb

Right now, we get a warning when taking over the firmware fb:

  [drm:drm_atomic_plane_check] FB set but no CRTC

with the following backtrace:

  [<ffffffffa010339d>] drm_atomic_check_only+0x35d/0x510 [drm]
  [<ffffffffa0103567>] drm_atomic_commit+0x17/0x60 [drm]
  [<ffffffffa00a6ccd>] drm_atomic_helper_plane_set_property+0x8d/0xd0 [drm_kms_helper]
  [<ffffffffa00f1fed>] drm_mode_plane_set_obj_prop+0x2d/0x90 [drm]
  [<ffffffffa00a8a1b>] restore_fbdev_mode+0x6b/0xf0 [drm_kms_helper]
  [<ffffffffa00aa969>] drm_fb_helper_restore_fbdev_mode_unlocked+0x29/0x80 [drm_kms_helper]
  [<ffffffffa00aa9e2>] drm_fb_helper_set_par+0x22/0x50 [drm_kms_helper]
  [<ffffffffa050a71a>] intel_fbdev_set_par+0x1a/0x60 [i915]
  [<ffffffff813ad444>] fbcon_init+0x4f4/0x580

That's because we update the plane state with the fb from the firmware, but we
never associate the plane to that CRTC.

We don't quite have the full DRM take over from HW state just yet, so
fake enough of the plane atomic state to pass the checks.

v2: Fix the state on which we set the CRTC in the case we're sharing the
    initial fb with another pipe. (Matt)

Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
[Jani: backported to drm-intel-fixes for v4.0-rc]
Reference: http://mid.gmane.org/CA+5PVA7yXH=U757w8V=Zj2U1URG4nYNav20NpjtQ4svVueyPNw@mail.gmail.com
Reference: http://lkml.kernel.org/r/CA+55aFweWR=nDzc2Y=rCtL_H8JfdprQiCimN5dwc+TgyD4Bjsg@mail.gmail.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>

commit | commitdiff | tree

Chris Wilson [Wed, 18 Mar 2015 18:19:22 +0000 (18:19 +0000)]

drm/i915: Keep ring->active_list and ring->requests_list consistent

If we retire requests last, we may use a later seqno and so clear
the requests lists without clearing the active list, leading to
confusion. Hence we should retire requests first for consistency with
the early return. The order used to be important as the lifecycle for
the object on the active list was determined by request->seqno. However,
the requests themselves are now reference counted removing the
constraint from the order of retirement.

Fixes regression from

commit 1b5a433a4dd967b125131da42b89b5cc0d5b1f57
Author: John Harrison <John.C.Harrison@Intel.com>
Date:   Mon Nov 24 18:49:42 2014 +0000

    drm/i915: Convert 'i915_seqno_passed' calls into 'i915_gem_request_completed
'

and a

WARNING: CPU: 0 PID: 1383 at drivers/gpu/drm/i915/i915_gem_evict.c:279 i915_gem_evict_vm+0x10c/0x140()
WARN_ON(!list_empty(&vm->active_list))

Identified by updating WATCH_LISTS:

[drm:i915_verify_lists] *ERROR* blitter ring: active list not empty, but no requests
WARNING: CPU: 0 PID: 681 at drivers/gpu/drm/i915/i915_gem.c:2751 i915_gem_retire_requests_ring+0x149/0x230()
WARN_ON(i915_verify_lists(ring->dev))

Note that this is only a problem in evict_vm where the following happens
after a retire_request has cleaned out all requests, but not all active
bo:
- intel_ring_idle called from i915_gpu_idle notices that no requests are
  outstanding and immediately returns.
- i915_gem_retire_requests_ring called from i915_gem_retire_requests also
  immediately returns when there's no request, still leaving the bo on the
  active list.
- evict_vm hits the WARN_ON(!list_empty(&vm->active_list)) after evicting
  all active objects that there's still stuff left that shouldn't be
  there.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>

commit | commitdiff | tree

Linus Torvalds [Wed, 25 Mar 2015 23:52:53 +0000 (16:52 -0700)]

Merge tag 'metag-fixes-v4.0-2' of git://git./linux/kernel/git/jhogan/metag

Pull arch/metag fix from James Hogan:
"Another metag architecture fix for v4.0

  This is another single fix, for an include dependency problem when
  using ioremap_wc() from asm/io.h without also including asm/pgtable.h"

* tag 'metag-fixes-v4.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag:
  metag: Fix ioremap_wc/ioremap_cached build errors

commit | commitdiff | tree

Linus Torvalds [Wed, 25 Mar 2015 23:21:17 +0000 (16:21 -0700)]

Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
"15 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm: numa: mark huge PTEs young when clearing NUMA hinting faults
  mm: numa: slow PTE scan rate if migration failures occur
  mm: numa: preserve PTE write permissions across a NUMA hinting fault
  mm: numa: group related processes based on VMA flags instead of page table flags
  hfsplus: fix B-tree corruption after insertion at position 0
  MAINTAINERS: add Jan as DMI/SMBIOS support maintainer
  fs/affs/file.c: unlock/release page on error
  mm/page_alloc.c: call kernel_map_pages in unset_migrateype_isolate
  mm/slub: fix lockups on PREEMPT && !SMP kernels
  mm/memory hotplug: postpone the reset of obsolete pgdat
  MAINTAINERS: correct rtc armada38x pattern entry
  mm/pagewalk.c: prevent positive return value of walk_page_test() from being passed to callers
  mm: fix anon_vma->degree underflow in anon_vma endless growing prevention
  drivers/rtc/rtc-mrst: fix suspend/resume
  aoe: update aoe maintainer information

commit | commitdiff | tree

Mel Gorman [Wed, 25 Mar 2015 22:55:45 +0000 (15:55 -0700)]

mm: numa: mark huge PTEs young when clearing NUMA hinting faults

Base PTEs are marked young when the NUMA hinting information is cleared
but the same does not happen for huge pages which this patch addresses.

Note that migrated pages are not marked young as the base page migration
code does not assume that migrated pages have been referenced. This
could be addressed but beyond the scope of this series which is aimed at
Dave Chinners shrink workload that is unlikely to be affected by this
issue.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Mel Gorman [Wed, 25 Mar 2015 22:55:42 +0000 (15:55 -0700)]

mm: numa: slow PTE scan rate if migration failures occur

Dave Chinner reported the following on https://lkml.org/lkml/2015/3/1/226

  Across the board the 4.0-rc1 numbers are much slower, and the degradation
  is far worse when using the large memory footprint configs. Perf points
  straight at the cause - this is from 4.0-rc1 on the "-o bhash=101073" config:

   -   56.07%    56.07%  [kernel]            [k] default_send_IPI_mask_sequence_phys
      - default_send_IPI_mask_sequence_phys
         - 99.99% physflat_send_IPI_mask
            - 99.37% native_send_call_func_ipi
                 smp_call_function_many
               - native_flush_tlb_others
                  - 99.85% flush_tlb_page
                       ptep_clear_flush
                       try_to_unmap_one
                       rmap_walk
                       try_to_unmap
                       migrate_pages
                       migrate_misplaced_page
                     - handle_mm_fault
                        - 99.73% __do_page_fault
                             trace_do_page_fault
                             do_async_page_fault
                           + async_page_fault
              0.63% native_send_call_func_single_ipi
                 generic_exec_single
                 smp_call_function_single

This is showing excessive migration activity even though excessive
migrations are meant to get throttled.  Normally, the scan rate is tuned
on a per-task basis depending on the locality of faults.  However, if
migrations fail for any reason then the PTE scanner may scan faster if
the faults continue to be remote.  This means there is higher system CPU
overhead and fault trapping at exactly the time we know that migrations
cannot happen.  This patch tracks when migration failures occur and
slows the PTE scanner.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reported-by: Dave Chinner <david@fromorbit.com>
Tested-by: Dave Chinner <david@fromorbit.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Mel Gorman [Wed, 25 Mar 2015 22:55:40 +0000 (15:55 -0700)]

mm: numa: preserve PTE write permissions across a NUMA hinting fault

Protecting a PTE to trap a NUMA hinting fault clears the writable bit
and further faults are needed after trapping a NUMA hinting fault to set
the writable bit again.  This patch preserves the writable bit when
trapping NUMA hinting faults.  The impact is obvious from the number of
minor faults trapped during the basis balancing benchmark and the system
CPU usage;

  autonumabench
                                             4.0.0-rc4             4.0.0-rc4
                                              baseline              preserve
  Time System-NUMA01                  107.13 (  0.00%)      103.13 (  3.73%)
  Time System-NUMA01_THEADLOCAL       131.87 (  0.00%)       83.30 ( 36.83%)
  Time System-NUMA02                    8.95 (  0.00%)       10.72 (-19.78%)
  Time System-NUMA02_SMT                4.57 (  0.00%)        3.99 ( 12.69%)
  Time Elapsed-NUMA01                 515.78 (  0.00%)      517.26 ( -0.29%)
  Time Elapsed-NUMA01_THEADLOCAL      384.10 (  0.00%)      384.31 ( -0.05%)
  Time Elapsed-NUMA02                  48.86 (  0.00%)       48.78 (  0.16%)
  Time Elapsed-NUMA02_SMT              47.98 (  0.00%)       48.12 ( -0.29%)

               4.0.0-rc4   4.0.0-rc4
                baseline    preserve
  User          44383.95    43971.89
  System          252.61      201.24
  Elapsed         998.68     1000.94

  Minor Faults   2597249     1981230
  Major Faults       365         364

There is a similar drop in system CPU usage using Dave Chinner's xfsrepair
workload

                                      4.0.0-rc4             4.0.0-rc4
                                       baseline              preserve
  Amean    real-xfsrepair      454.14 (  0.00%)      442.36 (  2.60%)
  Amean    syst-xfsrepair      277.20 (  0.00%)      204.68 ( 26.16%)

The patch looks hacky but the alternatives looked worse.  The tidest was
to rewalk the page tables after a hinting fault but it was more complex
than this approach and the performance was worse.  It's not generally
safe to just mark the page writable during the fault if it's a write
fault as it may have been read-only for COW so that approach was
discarded.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reported-by: Dave Chinner <david@fromorbit.com>
Tested-by: Dave Chinner <david@fromorbit.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Mel Gorman [Wed, 25 Mar 2015 22:55:37 +0000 (15:55 -0700)]

mm: numa: group related processes based on VMA flags instead of page table flags

These are three follow-on patches based on the xfsrepair workload Dave
Chinner reported was problematic in 4.0-rc1 due to changes in page table
management -- https://lkml.org/lkml/2015/3/1/226.

Much of the problem was reduced by commit 53da3bc2ba9e ("mm: fix up numa
read-only thread grouping logic") and commit ba68bc0115eb ("mm: thp:
Return the correct value for change_huge_pmd").  It was known that the
performance in 3.19 was still better even if is far less safe.  This
series aims to restore the performance without compromising on safety.

For the test of this mail, I'm comparing 3.19 against 4.0-rc4 and the
three patches applied on top

  autonumabench
                                                3.19.0             4.0.0-rc4             4.0.0-rc4             4.0.0-rc4             4.0.0-rc4
                                               vanilla               vanilla          vmwrite-v5r8         preserve-v5r8         slowscan-v5r8
  Time System-NUMA01                  124.00 (  0.00%)      161.86 (-30.53%)      107.13 ( 13.60%)      103.13 ( 16.83%)      145.01 (-16.94%)
  Time System-NUMA01_THEADLOCAL       115.54 (  0.00%)      107.64 (  6.84%)      131.87 (-14.13%)       83.30 ( 27.90%)       92.35 ( 20.07%)
  Time System-NUMA02                    9.35 (  0.00%)       10.44 (-11.66%)        8.95 (  4.28%)       10.72 (-14.65%)        8.16 ( 12.73%)
  Time System-NUMA02_SMT                3.87 (  0.00%)        4.63 (-19.64%)        4.57 (-18.09%)        3.99 ( -3.10%)        3.36 ( 13.18%)
  Time Elapsed-NUMA01                 570.06 (  0.00%)      567.82 (  0.39%)      515.78 (  9.52%)      517.26 (  9.26%)      543.80 (  4.61%)
  Time Elapsed-NUMA01_THEADLOCAL      393.69 (  0.00%)      384.83 (  2.25%)      384.10 (  2.44%)      384.31 (  2.38%)      380.73 (  3.29%)
  Time Elapsed-NUMA02                  49.09 (  0.00%)       49.33 ( -0.49%)       48.86 (  0.47%)       48.78 (  0.63%)       50.94 ( -3.77%)
  Time Elapsed-NUMA02_SMT              47.51 (  0.00%)       47.15 (  0.76%)       47.98 ( -0.99%)       48.12 ( -1.28%)       49.56 ( -4.31%)

                3.19.0   4.0.0-rc4   4.0.0-rc4   4.0.0-rc4   4.0.0-rc4
               vanilla     vanillavmwrite-v5r8preserve-v5r8slowscan-v5r8
  User        46334.60    46391.94    44383.95    43971.89    44372.12
  System        252.84      284.66      252.61      201.24      249.00
  Elapsed      1062.14     1050.96      998.68     1000.94     1026.78

Overall the system CPU usage is comparable and the test is naturally a
bit variable.  The slowing of the scanner hurts numa01 but on this
machine it is an adverse workload and patches that dramatically help it
often hurt absolutely everything else.

Due to patch 2, the fault activity is interesting

                                  3.19.0   4.0.0-rc4   4.0.0-rc4   4.0.0-rc4   4.0.0-rc4
                                 vanilla     vanillavmwrite-v5r8preserve-v5r8slowscan-v5r8
  Minor Faults                   2097811     2656646     2597249     1981230     1636841
  Major Faults                       362         450         365         364         365

Note the impact preserving the write bit across protection updates and
fault reduces faults.

  NUMA alloc hit                 1229008     1217015     1191660     1178322     1199681
  NUMA alloc miss                      0           0           0           0           0
  NUMA interleave hit                  0           0           0           0           0
  NUMA alloc local               1228514     1216317     1190871     1177448     1199021
  NUMA base PTE updates        245706197   240041607   238195516   244704842   115012800
  NUMA huge PMD updates           479530      468448      464868      477573      224487
  NUMA page range updates      491225557   479886983   476207932   489222218   229950144
  NUMA hint faults                659753      656503      641678      656926      294842
  NUMA hint local faults          381604      373963      360478      337585      186249
  NUMA hint local percent             57          56          56          51          63
  NUMA pages migrated            5412140     6374899     6266530     5277468     5755096
  AutoNUMA cost                    5121%       5083%       4994%       5097%       2388%

Here the impact of slowing the PTE scanner on migratrion failures is
obvious as "NUMA base PTE updates" and "NUMA huge PMD updates" are
massively reduced even though the headline performance is very similar.

As xfsrepair was the reported workload here is the impact of the series
on it.

  xfsrepair
                                         3.19.0             4.0.0-rc4             4.0.0-rc4             4.0.0-rc4             4.0.0-rc4
                                        vanilla               vanilla          vmwrite-v5r8         preserve-v5r8         slowscan-v5r8
  Min      real-fsmark        1183.29 (  0.00%)     1165.73 (  1.48%)     1152.78 (  2.58%)     1153.64 (  2.51%)     1177.62 (  0.48%)
  Min      syst-fsmark        4107.85 (  0.00%)     4027.75 (  1.95%)     3986.74 (  2.95%)     3979.16 (  3.13%)     4048.76 (  1.44%)
  Min      real-xfsrepair      441.51 (  0.00%)      463.96 ( -5.08%)      449.50 ( -1.81%)      440.08 (  0.32%)      439.87 (  0.37%)
  Min      syst-xfsrepair      195.76 (  0.00%)      278.47 (-42.25%)      262.34 (-34.01%)      203.70 ( -4.06%)      143.64 ( 26.62%)
  Amean    real-fsmark        1188.30 (  0.00%)     1177.34 (  0.92%)     1157.97 (  2.55%)     1158.21 (  2.53%)     1182.22 (  0.51%)
  Amean    syst-fsmark        4111.37 (  0.00%)     4055.70 (  1.35%)     3987.19 (  3.02%)     3998.72 (  2.74%)     4061.69 (  1.21%)
  Amean    real-xfsrepair      450.88 (  0.00%)      468.32 ( -3.87%)      454.14 ( -0.72%)      442.36 (  1.89%)      440.59 (  2.28%)
  Amean    syst-xfsrepair      199.66 (  0.00%)      290.60 (-45.55%)      277.20 (-38.84%)      204.68 ( -2.51%)      150.55 ( 24.60%)
  Stddev   real-fsmark           4.12 (  0.00%)       10.82 (-162.29%)       4.14 ( -0.28%)        5.98 (-45.05%)        4.60 (-11.53%)
  Stddev   syst-fsmark           2.63 (  0.00%)       20.32 (-671.82%)       0.37 ( 85.89%)       16.47 (-525.59%)      15.05 (-471.79%)
  Stddev   real-xfsrepair        6.87 (  0.00%)        4.55 ( 33.75%)        3.46 ( 49.58%)        1.78 ( 74.12%)        0.52 ( 92.50%)
  Stddev   syst-xfsrepair        3.02 (  0.00%)       10.30 (-241.37%)      13.17 (-336.37%)       0.71 ( 76.63%)        5.00 (-65.61%)
  CoeffVar real-fsmark           0.35 (  0.00%)        0.92 (-164.73%)       0.36 ( -2.91%)        0.52 (-48.82%)        0.39 (-12.10%)
  CoeffVar syst-fsmark           0.06 (  0.00%)        0.50 (-682.41%)       0.01 ( 85.45%)        0.41 (-543.22%)       0.37 (-478.78%)
  CoeffVar real-xfsrepair        1.52 (  0.00%)        0.97 ( 36.21%)        0.76 ( 49.94%)        0.40 ( 73.62%)        0.12 ( 92.33%)
  CoeffVar syst-xfsrepair        1.51 (  0.00%)        3.54 (-134.54%)       4.75 (-214.31%)       0.34 ( 77.20%)        3.32 (-119.63%)
  Max      real-fsmark        1193.39 (  0.00%)     1191.77 (  0.14%)     1162.90 (  2.55%)     1166.66 (  2.24%)     1188.50 (  0.41%)
  Max      syst-fsmark        4114.18 (  0.00%)     4075.45 (  0.94%)     3987.65 (  3.08%)     4019.45 (  2.30%)     4082.80 (  0.76%)
  Max      real-xfsrepair      457.80 (  0.00%)      474.60 ( -3.67%)      457.82 ( -0.00%)      444.42 (  2.92%)      441.03 (  3.66%)
  Max      syst-xfsrepair      203.11 (  0.00%)      303.65 (-49.50%)      294.35 (-44.92%)      205.33 ( -1.09%)      155.28 ( 23.55%)

The really relevant lines as syst-xfsrepair which is the system CPU
usage when running xfsrepair.  Note that on my machine the overhead was
45% higher on 4.0-rc4 which may be part of what Dave is seeing.  Once we
preserve the write bit across faults, it's only 2.51% higher on average.
With the full series applied, system CPU usage is 24.6% lower on
average.

Again, the impact of preserving the write bit on minor faults is obvious
and the impact of slowing scanning after migration failures is obvious
on the PTE updates.  Note also that the number of pages migrated is much
reduced even though the headline performance is comparable.

                                  3.19.0   4.0.0-rc4   4.0.0-rc4   4.0.0-rc4   4.0.0-rc4
                                 vanilla     vanillavmwrite-v5r8preserve-v5r8slowscan-v5r8
  Minor Faults                 153466827   254507978   249163829   153501373   105737890
  Major Faults                       610         702         690         649         724
  NUMA base PTE updates        217735049   210756527   217729596   216937111   144344993
  NUMA huge PMD updates           129294       85044      106921      127246       79887
  NUMA pages migrated           21938995    29705270    28594162    22687324    16258075

                        3.19.0   4.0.0-rc4   4.0.0-rc4   4.0.0-rc4   4.0.0-rc4
                       vanilla     vanillavmwrite-v5r8preserve-v5r8slowscan-v5r8
  Mean sdb-avgqusz       13.47        2.54        2.55        2.47        2.49
  Mean sdb-avgrqsz      202.32      140.22      139.50      139.02      138.12
  Mean sdb-await         25.92        5.09        5.33        5.02        5.22
  Mean sdb-r_await        4.71        0.19        0.83        0.51        0.11
  Mean sdb-w_await      104.13        5.21        5.38        5.05        5.32
  Mean sdb-svctm          0.59        0.13        0.14        0.13        0.14
  Mean sdb-rrqm           0.16        0.00        0.00        0.00        0.00
  Mean sdb-wrqm           3.59     1799.43     1826.84     1812.21     1785.67
  Max  sdb-avgqusz      111.06       12.13       14.05       11.66       15.60
  Max  sdb-avgrqsz      255.60      190.34      190.01      187.33      191.78
  Max  sdb-await        168.24       39.28       49.22       44.64       65.62
  Max  sdb-r_await      660.00       52.00      280.00       76.00       12.00
  Max  sdb-w_await     7804.00       39.28       49.22       44.64       65.62
  Max  sdb-svctm          4.00        2.82        2.86        1.98        2.84
  Max  sdb-rrqm           8.30        0.00        0.00        0.00        0.00
  Max  sdb-wrqm          34.20     5372.80     5278.60     5386.60     5546.15

FWIW, I also checked SPECjbb in different configurations but it's
similar observations -- minor faults lower, PTE update activity lower
and performance is roughly comparable against 3.19.

This patch (of 3):

Threads that share writable data within pages are grouped together as
related tasks.  This decision is based on whether the PTE is marked
dirty which is subject to timing races between the PTE scanner update
and when the application writes the page.  If the page is file-backed,
then background flushes and sync also affect placement.  This is
unpredictable behaviour which is impossible to reason about so this
patch makes grouping decisions based on the VMA flags.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reported-by: Dave Chinner <david@fromorbit.com>
Tested-by: Dave Chinner <david@fromorbit.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Sergei Antonov [Wed, 25 Mar 2015 22:55:34 +0000 (15:55 -0700)]

hfsplus: fix B-tree corruption after insertion at position 0

Fix B-tree corruption when a new record is inserted at position 0 in the
node in hfs_brec_insert().  In this case a hfs_brec_update_parent() is
called to update the parent index node (if exists) and it is passed
hfs_find_data with a search_key containing a newly inserted key instead
of the key to be updated.  This results in an inconsistent index node.
The bug reproduces on my machine after an extents overflow record for
the catalog file (CNID=4) is inserted into the extents overflow B-tree.
Because of a low (reserved) value of CNID=4, it has to become the first
record in the first leaf node.

The resulting first leaf node is correct:

  ----------------------------------------------------
  | key0.CNID=4 | key1.CNID=123 | key2.CNID=456, ... |
  ----------------------------------------------------

But the parent index key0 still contains the previous key CNID=123:

  -----------------------
  | key0.CNID=123 | ... |
  -----------------------

A change in hfs_brec_insert() makes hfs_brec_update_parent() work
correctly by preventing it from getting fd->record=-1 value from
__hfs_brec_find().

Along the way, I removed duplicate code with unification of the if
condition.  The resulting code is equivalent to the original code
because node is never 0.

Also hfs_brec_update_parent() will now return an error after getting a
negative fd->record value.  However, the return value of
hfs_brec_update_parent() is not checked anywhere in the file and I'm
leaving it unchanged by this patch.  brec.c lacks error checking after
some other calls too, but this issue is of less importance than the one
being fixed by this patch.

Signed-off-by: Sergei Antonov <saproj@gmail.com>
Cc: Joe Perches <joe@perches.com>
Reviewed-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Acked-by: Hin-Tak Leung <htl10@users.sourceforge.net>
Cc: Anton Altaparmakov <aia21@cam.ac.uk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Jean Delvare [Wed, 25 Mar 2015 22:55:31 +0000 (15:55 -0700)]

MAINTAINERS: add Jan as DMI/SMBIOS support maintainer

I am familiar with these drivers and I care about them so let me add
myself as their maintainer.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Acked-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Taesoo Kim [Wed, 25 Mar 2015 22:55:29 +0000 (15:55 -0700)]

fs/affs/file.c: unlock/release page on error

When affs_bread_ino() fails, correctly unlock the page and release the
page cache with proper error value. All write_end() should
unlock/release the page that was locked by write_beg().

Signed-off-by: Taesoo Kim <tsgatesv@gmail.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Laura Abbott [Wed, 25 Mar 2015 22:55:26 +0000 (15:55 -0700)]

mm/page_alloc.c: call kernel_map_pages in unset_migrateype_isolate

Commit 3c605096d315 ("mm/page_alloc: restrict max order of merging on
isolated pageblock") changed the logic of unset_migratetype_isolate to
check the buddy allocator and explicitly call __free_pages to merge.

The page that is being freed in this path never had prep_new_page called
so set_page_refcounted is called explicitly but there is no call to
kernel_map_pages.  With the default kernel_map_pages this is mostly
harmless but if kernel_map_pages does any manipulation of the page
tables (unmapping or setting pages to read only) this may trigger a
fault:

    alloc_contig_range test_pages_isolated(ceb00, ced00) failed
    Unable to handle kernel paging request at virtual address ffffffc0cec00000
    pgd = ffffffc045fc4000
    [ffffffc0cec00000] *pgd=0000000000000000
    Internal error: Oops: 9600004f [#1] PREEMPT SMP
    Modules linked in: exfatfs
    CPU: 1 PID: 23237 Comm: TimedEventQueue Not tainted 3.10.49-gc72ad36-dirty #1
    task: ffffffc03de52100 ti: ffffffc015388000 task.ti: ffffffc015388000
    PC is at memset+0xc8/0x1c0
    LR is at kernel_map_pages+0x1ec/0x244

Fix this by calling kernel_map_pages to ensure the page is set in the
page table properly

Fixes: 3c605096d315 ("mm/page_alloc: restrict max order of merging on isolated pageblock")
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Gioh Kim <gioh.kim@lge.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Mark Rutland [Wed, 25 Mar 2015 22:55:23 +0000 (15:55 -0700)]

mm/slub: fix lockups on PREEMPT && !SMP kernels

Commit 9aabf810a67c ("mm/slub: optimize alloc/free fastpath by removing
preemption on/off") introduced an occasional hang for kernels built with
CONFIG_PREEMPT && !CONFIG_SMP.

The problem is the following loop the patch introduced to
slab_alloc_node and slab_free:

    do {
        tid = this_cpu_read(s->cpu_slab->tid);
        c = raw_cpu_ptr(s->cpu_slab);
    } while (IS_ENABLED(CONFIG_PREEMPT) && unlikely(tid != c->tid));

GCC 4.9 has been observed to hoist the load of c and c->tid above the
loop for !SMP kernels (as in this case raw_cpu_ptr(x) is compile-time
constant and does not force a reload).  On arm64 the generated assembly
looks like:

         ldr     x4, [x0,#8]
  loop:
         ldr     x1, [x0,#8]
         cmp     x1, x4
         b.ne    loop

If the thread is preempted between the load of c->tid (into x1) and tid
(into x4), and an allocation or free occurs in another thread (bumping
the cpu_slab's tid), the thread will be stuck in the loop until
s->cpu_slab->tid wraps, which may be forever in the absence of
allocations/frees on the same CPU.

This patch changes the loop condition to access c->tid with READ_ONCE.
This ensures that the value is reloaded even when the compiler would
otherwise assume it could cache the value, and also ensures that the
load will not be torn.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Steve Capper <steve.capper@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Gu Zheng [Wed, 25 Mar 2015 22:55:20 +0000 (15:55 -0700)]

mm/memory hotplug: postpone the reset of obsolete pgdat

Qiu Xishi reported the following BUG when testing hot-add/hot-remove node under
stress condition:

  BUG: unable to handle kernel paging request at 0000000000025f60
  IP: next_online_pgdat+0x1/0x50
  PGD 0
  Oops: 0000 [#1] SMP
  ACPI: Device does not support D3cold
  Modules linked in: fuse nls_iso8859_1 nls_cp437 vfat fat loop dm_mod coretemp mperf crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 pcspkr microcode igb dca i2c_algo_bit ipv6 megaraid_sas iTCO_wdt i2c_i801 i2c_core iTCO_vendor_support tg3 sg hwmon ptp lpc_ich pps_core mfd_core acpi_pad rtc_cmos button ext3 jbd mbcache sd_mod crc_t10dif scsi_dh_alua scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh ahci libahci libata scsi_mod [last unloaded: rasf]
  CPU: 23 PID: 238 Comm: kworker/23:1 Tainted: G           O 3.10.15-5885-euler0302 #1
  Hardware name: HUAWEI TECHNOLOGIES CO.,LTD. Huawei N1/Huawei N1, BIOS V100R001 03/02/2015
  Workqueue: events vmstat_update
  task: ffffa800d32c0000 ti: ffffa800d32ae000 task.ti: ffffa800d32ae000
  RIP: 0010: next_online_pgdat+0x1/0x50
  RSP: 0018:ffffa800d32afce8  EFLAGS: 00010286
  RAX: 0000000000001440 RBX: ffffffff81da53b8 RCX: 0000000000000082
  RDX: 0000000000000000 RSI: 0000000000000082 RDI: 0000000000000000
  RBP: ffffa800d32afd28 R08: ffffffff81c93bfc R09: ffffffff81cbdc96
  R10: 00000000000040ec R11: 00000000000000a0 R12: ffffa800fffb3440
  R13: ffffa800d32afd38 R14: 0000000000000017 R15: ffffa800e6616800
  FS:  0000000000000000(0000) GS:ffffa800e6600000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000025f60 CR3: 0000000001a0b000 CR4: 00000000001407e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
    refresh_cpu_vm_stats+0xd0/0x140
    vmstat_update+0x11/0x50
    process_one_work+0x194/0x3d0
    worker_thread+0x12b/0x410
    kthread+0xc6/0xd0
    ret_from_fork+0x7c/0xb0

The cause is the "memset(pgdat, 0, sizeof(*pgdat))" at the end of
try_offline_node, which will reset all the content of pgdat to 0, as the
pgdat is accessed lock-free, so that the users still using the pgdat
will panic, such as the vmstat_update routine.

process A: offline node XX:

vmstat_updat()
   refresh_cpu_vm_stats()
     for_each_populated_zone()
       find online node XX
     cond_resched()
offline cpu and memory, then try_offline_node()
node_set_offline(nid), and memset(pgdat, 0, sizeof(*pgdat))
       zone = next_zone(zone)
         pg_data_t *pgdat = zone->zone_pgdat;  // here pgdat is NULL now
           next_online_pgdat(pgdat)
             next_online_node(pgdat->node_id);  // NULL pointer access

So the solution here is postponing the reset of obsolete pgdat from
try_offline_node() to hotadd_new_pgdat(), and just resetting
pgdat->nr_zones and pgdat->classzone_idx to be 0 rather than the memset
0 to avoid breaking pointer information in pgdat.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Reported-by: Xishi Qiu <qiuxishi@huawei.com>
Suggested-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Xie XiuQi <xiexiuqi@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Joe Perches [Wed, 25 Mar 2015 22:55:17 +0000 (15:55 -0700)]

MAINTAINERS: correct rtc armada38x pattern entry

Commit c6a95dbee793 ("MAINTAINERS: add the RTC driver for the
Armada38x") typoed the pattern, fix it.

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Naoya Horiguchi [Wed, 25 Mar 2015 22:55:14 +0000 (15:55 -0700)]

mm/pagewalk.c: prevent positive return value of walk_page_test() from being passed to callers

walk_page_test() is purely pagewalk's internal stuff, and its positive
return values are not intended to be passed to the callers of pagewalk.

However, in the current code if the last vma in the do-while loop in
walk_page_range() happens to return a positive value, it leaks outside
walk_page_range(). So the user visible effect is invalid/unexpected
return value (according to the reporter, mbind() causes it.)

This patch fixes it simply by reinitializing the return value after
checked.

Another exposed interface, walk_page_vma(), already returns 0 for such
cases so no problem.

Fixes: fafaa4264eba ("pagewalk: improve vma handling")
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Kazutomo Yoshii <kazutomo.yoshii@gmail.com>
Reported-by: Kazutomo Yoshii <kazutomo.yoshii@gmail.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Leon Yu [Wed, 25 Mar 2015 22:55:11 +0000 (15:55 -0700)]

mm: fix anon_vma->degree underflow in anon_vma endless growing prevention

I have constantly stumbled upon "kernel BUG at mm/rmap.c:399!" after
upgrading to 3.19 and had no luck with 4.0-rc1 neither.

So, after looking into new logic introduced by commit 7a3ef208e662 ("mm:
prevent endless growth of anon_vma hierarchy"), I found chances are that
unlink_anon_vmas() is called without incrementing dst->anon_vma->degree
in anon_vma_clone() due to allocation failure.  If dst->anon_vma is not
NULL in error path, its degree will be incorrectly decremented in
unlink_anon_vmas() and eventually underflow when exiting as a result of
another call to unlink_anon_vmas().  That's how "kernel BUG at
mm/rmap.c:399!" is triggered for me.

This patch fixes the underflow by dropping dst->anon_vma when allocation
fails.  It's safe to do so regardless of original value of dst->anon_vma
because dst->anon_vma doesn't have valid meaning if anon_vma_clone()
fails.  Besides, callers don't care dst->anon_vma in such case neither.

Also suggested by Michal Hocko, we can clean up vma_adjust() a bit as
anon_vma_clone() now does the work.

[akpm@linux-foundation.org: tweak comment]
Fixes: 7a3ef208e662 ("mm: prevent endless growth of anon_vma hierarchy")
Signed-off-by: Leon Yu <chianglungyu@gmail.com>
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Lars-Peter Clausen [Wed, 25 Mar 2015 22:55:09 +0000 (15:55 -0700)]

drivers/rtc/rtc-mrst: fix suspend/resume

The Moorestown RTC driver implements suspend and resume callbacks and
assigns them to the suspend and resume fields of the device_driver
struct. These callbacks are never actually called by anything though.

Modify the driver to properly use dev_pm_ops so that the suspend and
resume functions are actually executed upon suspend/resume.

[akpm@linux-foundation.org: device_driver.name is const char *]
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Feng Tang <feng.tang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Ed Cashin [Wed, 25 Mar 2015 22:55:06 +0000 (15:55 -0700)]

aoe: update aoe maintainer information

The coraid.com email address is defunct. The old aoe support area hosted
at coraid.com is no longer up. These changes update the email and website
to current ones.

Signed-off-by: Ed Cashin <ed.cashin@acm.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Linus Torvalds [Wed, 25 Mar 2015 22:40:21 +0000 (15:40 -0700)]

Merge branch 'for-linus' of git://git.kernel.dk/linux-block

Pull block layer fixes from Jens Axboe:
"A small collection of fixes that has been gathered over the last few
  weeks.  This contains:

   - A one-liner fix for NVMe, fixing a missing list_head init that
     could makes us oops on hitting recovery at load time.

   - Two small blk-mq fixes:
        - Fixup a bad goto jump on error handling.
        - Fix for oopsing if running out of reserved tags.

   - A memory leak fix for NBD.

   - Two small writeback fixes from Tejun, fixing a missing init to
     INITIAL_JIFFIES, and a possible underflow introduced recently.

   - A core merge fixup in sg gap detection, where rq->biotail was
     indexed with the count of rq->bio"

* 'for-linus' of git://git.kernel.dk/linux-block:
  writeback: fix possible underflow in write bandwidth calculation
  NVMe: Initialize device list head before starting
  Fix bug in blk_rq_merge_ok
  blkmq: Fix NULL pointer deref when all reserved tags in
  blk-mq: fix use of incorrect goto label in blk_mq_init_queue error path
  nbd: fix possible memory leak
  writeback: add missing INITIAL_JIFFIES init in global_update_bandwidth()

commit | commitdiff | tree

Heiko Carstens [Sat, 21 Mar 2015 11:43:08 +0000 (12:43 +0100)]

s390/smp: reenable smt after resume

After a suspend/resume cycle we missed to enable smt again, which leads
to all sorts of bugs, since the kernel assumes smt is enabled, while the
hardware thinks it is not.

Reported-and-tested-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reported-by: Stefan Haberland <stefan.haberland@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

commit | commitdiff | tree

Linus Torvalds [Wed, 25 Mar 2015 00:27:18 +0000 (17:27 -0700)]

Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux

Pull two arm64 fixes from Catalin Marinas:

- switch_mm() fix where init_mm.pgd ends up in the user TTBR0;
   swapper_pg_dir is not suitable for user mappings

- this_cpu accessors fix for preemption safety

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: percpu: Make this_cpu accessors pre-empt safe
  arm64: Use the reserved TTBR0 if context switching to the init_mm

commit | commitdiff | tree

Linus Torvalds [Wed, 25 Mar 2015 00:23:03 +0000 (17:23 -0700)]

Merge tag 'powerpc-4.0-3' of git://git./linux/kernel/git/mpe/linux

Pull powerpc fixes from Michael Ellerman:

- Fix the MCE code to use CONFIG_KVM_BOOK3S_64_HANDLER

- Little endian fixes for post mobility device tree update

- Add PVR for POWER8NVL processor

- Fixes for hypervisor doorbell handling

* tag 'powerpc-4.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux:
  powerpc/book3s: Fix the MCE code to use CONFIG_KVM_BOOK3S_64_HANDLER
  powerpc/pseries: Little endian fixes for post mobility device tree update
  powerpc: Add PVR for POWER8NVL processor
  powerpc/powernv: Fixes for hypervisor doorbell handling

commit | commitdiff | tree

Linus Torvalds [Wed, 25 Mar 2015 00:13:44 +0000 (17:13 -0700)]

Merge git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Marcelo Tosatti:
"Fix for higher-order page allocation failures, fix Xen-on-KVM with
  x2apic, L1 crash with unrestricted guest mode (nested VMX)"

* git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm: avoid page allocation failure in kvm_set_memory_region()
  KVM: x86: call irq notifiers with directed EOI
  KVM: nVMX: mask unrestricted_guest if disabled on L0

commit | commitdiff | tree

Linus Torvalds [Wed, 25 Mar 2015 00:08:29 +0000 (17:08 -0700)]

Merge branch 'for-4.0-fixes' of git://git./linux/kernel/git/tj/libata

Pull libata fix from Tejun Heo:
"One patch to fix a regression from the recent switch to blk-mq tag
allocation which can cause oops on SAS-attached SATA drives"

* 'for-4.0-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
ata: Add a new flag to destinguish sas controller

commit | commitdiff | tree

Linus Torvalds [Wed, 25 Mar 2015 00:02:45 +0000 (17:02 -0700)]

Merge tag 'mfd-fixes-4.0' of git://git./linux/kernel/git/lee/mfd

Pull MFD fixes from Lee Jones:
- Use DMA'able addresses for DMA; rtsx_usb
- Use return value in the correct way; kempld-core

* tag 'mfd-fixes-4.0' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd:
mfd: kempld-core: Fix callback return value check
mfd: rtsx_usb: Prevent DMA from stack

commit | commitdiff | tree

Damien Lespiau [Thu, 5 Feb 2015 18:30:20 +0000 (18:30 +0000)]

drm/i915: Don't try to reference the fb in get_initial_plane_config()

Tvrtko noticed a new warning on boot:

  WARNING: CPU: 1 PID: 353 at include/linux/kref.h:47 drm_framebuffer_reference+0x6c/0x80 [drm]()
  Call Trace:
  [<ffffffff8161f10c>] dump_stack+0x4f/0x7b
  [<ffffffff81052caa>] warn_slowpath_common+0xaa/0xd0
  [<ffffffff81052d8a>] warn_slowpath_null+0x1a/0x20
  [<ffffffffa00d035c>] drm_framebuffer_reference+0x6c/0x80 [drm]
  [<ffffffffa01c0df7>] update_state_fb.isra.54+0x47/0x50 [i915]
  [<ffffffffa01ccd5c>] skylake_get_initial_plane_config+0x93c/0x950 [i915]
  [<ffffffffa01e8721>] intel_modeset_init+0x1551/0x17c0 [i915]
  [<ffffffffa02476e0>] i915_driver_load+0xed0/0x11e0 [i915]
  [<ffffffff81627aa1>] ? _raw_spin_unlock_irqrestore+0x51/0x70
  [<ffffffffa00ca8b7>] drm_dev_register+0x77/0x110 [drm]
  [<ffffffffa00cda3b>] drm_get_pci_dev+0x11b/0x1f0 [drm]
  [<ffffffff81098e3d>] ? trace_hardirqs_on+0xd/0x10
  [<ffffffff81627aa1>] ? _raw_spin_unlock_irqrestore+0x51/0x70
  [<ffffffffa0145276>] i915_pci_probe+0x56/0x60 [i915]
  [<ffffffff813ad59c>] pci_device_probe+0x7c/0x100
  [<ffffffff81466aad>] driver_probe_device+0x16d/0x380

We cannot take a reference at this point, not before
intel_framebuffer_init() and the underlying drm_framebuffer_init().

Introduced in:

  commit 706dc7b549175e47f23e913b7f1e52874a7d0f56
  Author: Matt Roper <matthew.d.roper@intel.com>
  Date:   Tue Feb 3 13:10:04 2015 -0800

      drm/i915: Ensure plane->state->fb stays in sync with plane->fb

v2: Don't move update_state_fb(). It was moved around because I
    originally put update_state_fb() in intel_alloc_plane_obj() before
    finding a better place. (Matt)

Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From drm-next:
(cherry picked from commit f55548b5af87ebfc586ca75748947f1c1b1a4a52)
Signed-off-by: Dave Airlie <airlied@redhat.com>

commit | commitdiff | tree

Linus Torvalds [Tue, 24 Mar 2015 23:58:29 +0000 (16:58 -0700)]

Merge tag 'spi-v4.0-rc5' of git://git./linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
"A couple of driver specific fixes of the usual "important if you have
  that device" kind together with a fix for a use after free bug that
  was introduced into the trace code in some of the recent refactoring
  of the message queue handling"

* tag 'spi-v4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: trigger trace event for message-done before mesg->complete
  spi: dw-mid: clear BUSY flag fist and test other one
  spi: qup: Fix cs-num DT property parsing

commit | commitdiff | tree

Linus Torvalds [Tue, 24 Mar 2015 23:51:42 +0000 (16:51 -0700)]

Merge tag 'regulator-fix-v4.0-rc5' of git://git./linux/kernel/git/broonie/regulator

Pull regulator fixes from Mark Brown:
"Two fixes here, one typo fix in the documentation and one fix for a
  system hang with one of the Palmas chips caused by the use of an
  incorrect offset being provided for one of the registers"

* tag 'regulator-fix-v4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: Fix documentation for regmap in the config
  regulator: palmas: Correct TPS659038 register definition for REGEN2

commit | commitdiff | tree

Linus Torvalds [Tue, 24 Mar 2015 23:42:54 +0000 (16:42 -0700)]

Merge tag 'regmap-fix-v4.0-rc5' of git://git./linux/kernel/git/broonie/regmap

Pull regmap fix from Mark Brown:
"This patch fixes a bad interaction between the support that was added
  for having regmaps without devices for early system controller
  initialization and the trace support.

  There's a very good analysis of the actual issue in the commit message
  for the change"

* tag 'regmap-fix-v4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap: introduce regmap_name to fix syscon regmap trace events

commit | commitdiff | tree

Steve Capper [Sun, 22 Mar 2015 14:51:51 +0000 (14:51 +0000)]

arm64: percpu: Make this_cpu accessors pre-empt safe

this_cpu operations were implemented for arm64 in:
5284e1b arm64: xchg: Implement cmpxchg_double
f97fc81 arm64: percpu: Implement this_cpu operations

Unfortunately, it is possible for pre-emption to take place between
address generation and data access. This can lead to cases where data
is being manipulated by this_cpu for a different CPU than it was
called on. Which effectively breaks the spec.

This patch disables pre-emption for the this_cpu operations
guaranteeing that address generation and data manipulation take place
without a pre-emption in-between.

Fixes: 5284e1b4bc8a ("arm64: xchg: Implement cmpxchg_double")
Fixes: f97fc810798c ("arm64: percpu: Implement this_cpu operations")
Reported-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Steve Capper <steve.capper@linaro.org>
[catalin.marinas@arm.com: remove space after type cast]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

commit | commitdiff | tree

Mark Brown [Tue, 24 Mar 2015 17:38:44 +0000 (10:38 -0700)]

Merge remote-tracking branches 'spi/fix/dw', 'spi/fix/queue' and 'spi/fix/qup' into spi-linus

commit | commitdiff | tree

Daniel Vetter [Fri, 27 Feb 2015 11:58:13 +0000 (12:58 +0100)]

drm: Fixup racy refcounting in plane_force_disable

Originally it was impossible to be dropping the last refcount in this
function since there was always one around still from the idr. But in

commit 83f45fc360c8e16a330474860ebda872d1384c8c
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Wed Aug 6 09:10:18 2014 +0200

drm: Don't grab an fb reference for the idr

we've switched to weak references, broke that assumption but forgot to
fix it up.

Since we still force-disable planes it's only possible to hit this
when racing multiple rmfb with fbdev restoring or similar evil things.
As long as userspace is nice it's impossible to hit the BUG_ON.

But the BUG_ON would most likely be hit from fbdev code, which usually
invovles the console_lock besides all modeset locks. So very likely
we'd never get the bug reports if this was hit in the wild, hence
better be safe than sorry and backport.

Spotted by Matt Roper while reviewing other patches.

[airlied: pull this back into 4.0 - the oops happens there]

Cc: stable@vger.kernel.org
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

commit | commitdiff | tree

Igor Mammedov [Fri, 20 Mar 2015 12:21:37 +0000 (12:21 +0000)]

kvm: avoid page allocation failure in kvm_set_memory_region()

KVM guest can fail to startup with following trace on host:

qemu-system-x86: page allocation failure: order:4, mode:0x40d0
Call Trace:
  dump_stack+0x47/0x67
  warn_alloc_failed+0xee/0x150
  __alloc_pages_direct_compact+0x14a/0x150
  __alloc_pages_nodemask+0x776/0xb80
  alloc_kmem_pages+0x3a/0x110
  kmalloc_order+0x13/0x50
  kmemdup+0x1b/0x40
  __kvm_set_memory_region+0x24a/0x9f0 [kvm]
  kvm_set_ioapic+0x130/0x130 [kvm]
  kvm_set_memory_region+0x21/0x40 [kvm]
  kvm_vm_ioctl+0x43f/0x750 [kvm]

Failure happens when attempting to allocate pages for
'struct kvm_memslots', however it doesn't have to be
present in physically contiguous (kmalloc-ed) address
space, change allocation to kvm_kvzalloc() so that
it will be vmalloc-ed when its size is more then a page.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

commit | commitdiff | tree

Radim Krčmář [Wed, 18 Mar 2015 18:38:22 +0000 (19:38 +0100)]

KVM: x86: call irq notifiers with directed EOI

kvm_ioapic_update_eoi() wasn't called if directed EOI was enabled.
We need to do that for irq notifiers. (Like with edge interrupts.)

Fix it by skipping EOI broadcast only.

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=82211
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Tested-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

commit | commitdiff | tree

Mike Snitzer [Mon, 23 Mar 2015 21:01:43 +0000 (17:01 -0400)]

dm: fix add_disk() NULL pointer due to race with free_dev()

Commit c4db59d31e39 ("fs: don't reassign dirty inodes to
default_backing_dev_info") exposed DM to a latent race in free_dev() vs
add_disk() in relation to management of the device's minor number.

Fix this by refactoring free_dev() to match cleanup order of the
alloc_dev() error path.  Move cleanup of the gendisk, queue, and bdev
to _before_ the cleanup of the idr managed minor number.

Also, purely due to cleanup that fell out during the free_dev() audit:
- adjust dm_blk_close() to access the gendisk's private_data under
  the _minor_lock spinlock.
- move __dm_destroy()'s dm_get_live_table() call out from under the
  _minor_lock spinlock.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1202449

Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
Reported-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

commit | commitdiff | tree

Mark Brown [Mon, 23 Mar 2015 18:43:42 +0000 (11:43 -0700)]

Merge remote-tracking branches 'regulator/fix/doc' and 'regulator/fix/palmas' into regulator-linus

commit | commitdiff | tree

Catalin Marinas [Mon, 23 Mar 2015 15:06:50 +0000 (15:06 +0000)]

arm64: Use the reserved TTBR0 if context switching to the init_mm

The idle_task_exit() function may call switch_mm() with next ==
&init_mm. On arm64, init_mm.pgd cannot be used for user mappings, so
this patch simply sets the reserved TTBR0.

Cc: <stable@vger.kernel.org>
Reported-by: Jon Medhurst (Tixy) <tixy@linaro.org>
Tested-by: Jon Medhurst (Tixy) <tixy@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

commit | commitdiff | tree

Linus Torvalds [Mon, 23 Mar 2015 17:16:13 +0000 (10:16 -0700)]

Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Validate iov ranges before feeding them into iov_iter_init(), from
    Al Viro.

2) We changed copy_from_msghdr_from_user() to zero out the msg_namelen
    is a NULL pointer is given for the msg_name.  Do the same in the
    compat code too.  From Catalin Marinas.

3) Fix partially initialized tuples in netfilter conntrack helper, from
    Ian Wilson.

4) Missing continue; statement in nft_hash walker can lead to crashes,
    from Herbert Xu.

5) tproxy_tg6_check looks for IP6T_INV_PROTO in ->flags instead of
    ->invflags, fix from Pablo Neira Ayuso.

6) Incorrect memory account of TCP FINs can result in negative socket
    memory accounting values.  Fix from Josh Hunt.

7) Don't allow virtual functions to enable VLAN promiscuous mode in
    be2net driver, from Vasundhara Volam.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  netfilter: nft_compat: set IP6T_F_PROTO flag if protocol is set
  cx82310_eth: wait for firmware to become ready
  net: validate the range we feed to iov_iter_init() in sys_sendto/sys_recvfrom
  net: compat: Update get_compat_msghdr() to match copy_msghdr_from_user() behaviour
  be2net: use PCI MMIO read instead of config read for errors
  be2net: restrict MODIFY_EQ_DELAY cmd to a max of 8 EQs
  be2net: Prevent VFs from enabling VLAN promiscuous mode
  tcp: fix tcp fin memory accounting
  ipv6: fix backtracking for throw routes
  net: ethernet: pcnet32: Setup the SRAM and NOUFLO on Am79C97{3, 5}
  ipv6: call ipv6_proxy_select_ident instead of ipv6_select_ident in udp6_ufo_fragment
  netfilter: xt_TPROXY: fix invflags check in tproxy_tg6_check()
  netfilter: restore rule tracing via nfnetlink_log
  netfilter: nf_tables: allow to change chain policy without hook if it exists
  netfilter: Fix potential crash in nft_hash walker
  netfilter: Zero the tuple in nfnl_cthelper_parse_tuple()

commit | commitdiff | tree

Linus Torvalds [Mon, 23 Mar 2015 17:04:02 +0000 (10:04 -0700)]

Merge git://git./linux/kernel/git/davem/sparc

Pull sparc fixes from David Miller:
"Some perf bug fixes from David Ahern, and the fix for that nasty
  memmove() bug"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc64: Fix several bugs in memmove().
  sparc: Touch NMI watchdog when walking cpus and calling printk
  sparc: perf: Add support M7 processor
  sparc: perf: Make counting mode actually work
  sparc: perf: Remove redundant perf_pmu_{en|dis}able calls

commit | commitdiff | tree

David S. Miller [Mon, 23 Mar 2015 16:22:10 +0000 (09:22 -0700)]

sparc64: Fix several bugs in memmove().

Firstly, handle zero length calls properly.  Believe it or not there
are a few of these happening during early boot.

Next, we can't just drop to a memcpy() call in the forward copy case
where dst <= src.  The reason is that the cache initializing stores
used in the Niagara memcpy() implementations can end up clearing out
cache lines before we've sourced their original contents completely.

For example, considering NG4memcpy, the main unrolled loop begins like
this:

     load   src + 0x00
     load   src + 0x08
     load   src + 0x10
     load   src + 0x18
     load   src + 0x20
     store  dst + 0x00

Assume dst is 64 byte aligned and let's say that dst is src - 8 for
this memcpy() call.  That store at the end there is the one to the
first line in the cache line, thus clearing the whole line, which thus
clobbers "src + 0x28" before it even gets loaded.

To avoid this, just fall through to a simple copy only mildly
optimized for the case where src and dst are 8 byte aligned and the
length is a multiple of 8 as well.  We could get fancy and call
GENmemcpy() but this is good enough for how this thing is actually
used.

Reported-by: David Ahern <david.ahern@oracle.com>
Reported-by: Bob Picco <bpicco@meloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Tejun Heo [Mon, 23 Mar 2015 04:18:48 +0000 (00:18 -0400)]

writeback: fix possible underflow in write bandwidth calculation

From 1ebf33901ecc75d9496862dceb1ef0377980587c Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj@kernel.org>
Date: Mon, 23 Mar 2015 00:08:19 -0400

2f800fbd777b ("writeback: fix dirtied pages accounting on redirty")
introduced account_page_redirty() which reverts stat updates for a
redirtied page, making BDI_DIRTIED no longer monotonically increasing.

bdi_update_write_bandwidth() uses the delta in BDI_DIRTIED as the
basis for bandwidth calculation.  While unlikely, since the above
patch, the newer value may be lower than the recorded past value and
underflow the bandwidth calculation leading to a wild result.

Fix it by subtracing min of the old and new values when calculating
delta.  AFAIK, there hasn't been any report of it happening but the
resulting erratic behavior would be non-critical and temporary, so
it's possible that the issue is happening without being reported.  The
risk of the fix is very low, so tagged for -stable.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jan Kara <jack@suse.cz>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Greg Thelen <gthelen@google.com>
Fixes: 2f800fbd777b ("writeback: fix dirtied pages accounting on redirty")
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>

commit | commitdiff | tree

Keith Busch [Mon, 23 Mar 2015 15:32:37 +0000 (09:32 -0600)]

NVMe: Initialize device list head before starting

Driver recovery requires the device's list node to have been initialized.

Fixes: https://lkml.org/lkml/2015/3/22/262

Reported-by: Steven Noonan <steven@uplinklabs.net>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Jens Axboe <axboe@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

commit | commitdiff | tree

James Hogan [Mon, 23 Mar 2015 11:17:56 +0000 (11:17 +0000)]

metag: Fix ioremap_wc/ioremap_cached build errors

When ioremap_wc() or ioremap_cached() are used without first including
asm/pgtable.h, the _PAGE_CACHEABLE or _PAGE_WR_COMBINE definitions
aren't found, resulting in build errors like the following (in
next-20150323 due to "lib: devres: add a helper function for
ioremap_wc"):

lib/devres.c: In function ‘devm_ioremap_wc’:
lib/devres.c:91: error: ‘_PAGE_WR_COMBINE’ undeclared

We can't easily include asm/pgtable.h in asm/io.h due to dependency
problems, so split out the _PAGE_* definitions from asm/pgtable.h into a
separate asm/pgtable-bits.h header (as a couple of other architectures
already do), and include that in io.h instead.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: linux-metag@vger.kernel.org
Cc: Abhilash Kesavan <a.kesavan@samsung.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Mahesh Salgaonkar [Tue, 17 Mar 2015 10:44:41 +0000 (16:14 +0530)]

powerpc/book3s: Fix the MCE code to use CONFIG_KVM_BOOK3S_64_HANDLER

commit id 2ba9f0d has changed CONFIG_KVM_BOOK3S_64_HV to tristate to allow
HV/PR bits to be built as modules. But the MCE code still depends on
CONFIG_KVM_BOOK3S_64_HV which is wrong. When user selects
CONFIG_KVM_BOOK3S_64_HV=m to build HV/PR bits as a separate module the
relevant MCE code gets excluded.

This patch fixes the MCE code to use CONFIG_KVM_BOOK3S_64_HANDLER. This
makes sure that the relevant MCE code is included when HV/PR bits
are built as a separate modules.

Fixes: 2ba9f0d88750 ("kvm: powerpc: book3s: Support building HV and PR KVM as module")
Cc: stable@vger.kernel.org # v3.14+
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

commit | commitdiff | tree

Linus Torvalds [Sun, 22 Mar 2015 23:50:21 +0000 (16:50 -0700)]

Linux 4.0-rc5

commit | commitdiff | tree

Linus Torvalds [Sun, 22 Mar 2015 23:38:19 +0000 (16:38 -0700)]

Merge tag 'md/4.0-rc4-fix' of git://neil.brown.name/md

Pull bugfix for md from Neil Brown:
"One fix for md in 4.0-rc4

Regression in recent patch causes crash on error path"

* tag 'md/4.0-rc4-fix' of git://neil.brown.name/md:
md: fix problems with freeing private data after ->run failure.

commit | commitdiff | tree

David S. Miller [Sun, 22 Mar 2015 20:57:07 +0000 (16:57 -0400)]

Merge git://git./pub/scm/linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for your net tree,
they are:

1) Fix missing initialization of tuple structure in nfnetlink_cthelper
   to avoid mismatches when looking up to attach userspace helpers to
   flows, from Ian Wilson.

2) Fix potential crash in nft_hash when we hit -EAGAIN in
   nft_hash_walk(), from Herbert Xu.

3) We don't need to indicate the hook information to update the
   basechain default policy in nf_tables.

4) Restore tracing over nfnetlink_log due to recent rework to
   accomodate logging infrastructure into nf_tables.

5) Fix wrong IP6T_INV_PROTO check in xt_TPROXY.

6) Set IP6T_F_PROTO flag in nft_compat so we can use SYNPROXY6 and
   REJECT6 from xt over nftables.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Linus Torvalds [Sun, 22 Mar 2015 19:07:47 +0000 (12:07 -0700)]

Merge tag 'driver-core-4.0-rc5' of git://git./linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
"Here are two bugfixes for things reported.  One regression in kernfs,
  and another issue fixed in the LZ4 code that was fixed in the
  "upstream" codebase that solves a reported kernel crash

  Both have been in linux-next for a while"

* tag 'driver-core-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  LZ4 : fix the data abort issue
  kernfs: handle poll correctly on 'direct_read' files.

commit | commitdiff | tree

Linus Torvalds [Sun, 22 Mar 2015 19:03:14 +0000 (12:03 -0700)]

Merge tag 'char-misc-4.0-rc5' of git://git./linux/kernel/git/gregkh/char-misc

Pull char/misc fixes from Greg KH:
"Here are three fixes for 4.0-rc5 that revert 3 PCMCIA patches that
  were merged in 4.0-rc1 that cause regressions.  So let's revert them
  for now and they will be reworked and resent sometime in the future.

  All have been tested in linux-next for a while"

* tag 'char-misc-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  Revert "pcmcia: add a new resource manager for non ISA systems"
  Revert "pcmcia: fix incorrect bracketing on a test"
  Revert "pcmcia: add missing include for new pci resource handler"

commit | commitdiff | tree

Linus Torvalds [Sun, 22 Mar 2015 18:59:02 +0000 (11:59 -0700)]

Merge tag 'staging-4.0-rc5' of git://git./linux/kernel/git/gregkh/staging

Pull staging driver fixes from Greg KH:
"Here are four small staging driver fixes, all for the vt6656 and
  vt6655 drivers, that resolve some reported issues with them.

  All of these patches have been in linux next for a while"

* tag 'staging-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  vt6655: Fix late setting of byRFType.
  vt6655: RFbSetPower fix missing rate RATE_12M
  staging: vt6656: vnt_rf_setpower: fix missing rate RATE_12M
  staging: vt6655: vnt_tx_packet fix dma_idx selection.

commit | commitdiff | tree

Linus Torvalds [Sun, 22 Mar 2015 18:54:29 +0000 (11:54 -0700)]

Merge tag 'tty-4.0-rc5' of git://git./linux/kernel/git/gregkh/tty

Pull tty/serial driver fix from Greg KH:
"Here's a single 8250 serial driver that fixes a reported deadlock with
  the serial console and the tty driver.

  It's been in linux-next for a while now"

* tag 'tty-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  serial: 8250_dw: Fix deadlock in LCR workaround

commit | commitdiff | tree

Linus Torvalds [Sun, 22 Mar 2015 18:33:55 +0000 (11:33 -0700)]

Merge tag 'usb-4.0-rc5' of git://git./linux/kernel/git/gregkh/usb

Pull USB / PHY driver fixes from Greg KH:
"Here's a number of USB and PHY driver fixes for 4.0-rc5.

  The largest thing here is a revert of a gadget function driver patch
  that removes 500 lines of code.  Other than that, it's a number of
  reported bugs fixes and new quirk/id entries.

  All have been in linux-next for a while"

* tag 'usb-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (33 commits)
  usb: common: otg-fsm: only signal connect after switching to peripheral
  uas: Add US_FL_NO_ATA_1X for Initio Corporation controllers / devices
  USB: ehci-atmel: rework clk handling
  MAINTAINERS: add entry for USB OTG FSM
  usb: chipidea: otg: add a_alt_hnp_support response for B device
  phy: omap-usb2: Fix missing clk_prepare call when using old dt name
  phy: ti/omap: Fix modalias
  phy: core: Fixup return value of phy_exit when !pm_runtime_enabled
  phy: miphy28lp: Convert to devm_kcalloc and fix wrong sizof
  phy: miphy365x: Convert to devm_kcalloc and fix wrong sizeof
  phy: twl4030-usb: Remove redundant assignment for twl->linkstat
  phy: exynos5-usbdrd: Fix off-by-one valid value checking for args->args[0]
  phy: Find the right match in devm_phy_destroy()
  phy: rockchip-usb: Fixup rockchip_usb_phy_power_on failure path
  phy: ti-pipe3: Simplify ti_pipe3_dpll_wait_lock implementation
  phy: samsung-usb2: Remove NULL terminating entry from phys array
  phy: hix5hd2-sata: Check return value of platform_get_resource
  phy: exynos-dp-video: Kill exynos_dp_video_phy_pwr_isol function
  Revert "usb: gadget: zero: Add support for interrupt EP"
  Revert "xhci: Clear the host side toggle manually when endpoint is 'soft reset'"
  ...

commit | commitdiff | tree

Pablo Neira Ayuso [Sat, 21 Mar 2015 18:25:05 +0000 (19:25 +0100)]

netfilter: nft_compat: set IP6T_F_PROTO flag if protocol is set

ip6tables extensions check for this flag to restrict match/target to a
given protocol. Without this flag set, SYNPROXY6 returns an error.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Patrick McHardy <kaber@trash.net>

commit | commitdiff | tree

Ondrej Zary [Sat, 21 Mar 2015 10:29:37 +0000 (11:29 +0100)]

cx82310_eth: wait for firmware to become ready

When the device is powered up, some (older) firmware versions fail to work
properly if we send commands before the boot is complete (everything is OK
when the device is hot-plugged). The firmware indicates its ready status by
putting the link up.
Newer firmwares delay the first command so they don't suffer from this problem.
They also report the link being always up.

Wait for firmware to become ready (link up) before sending any commands and/or
data.

This also allows lowering CMD_TIMEOUT value to a reasonable time.

Tested with 4.1.0.9 (old) and 4.1.0.30 (new) firmware versions.

Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Linus Torvalds [Sat, 21 Mar 2015 20:05:37 +0000 (13:05 -0700)]

Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dma

Pull slave dmaengine fixes from Vinod Koul:
"Four fixes for dw, pl08x, imx-sdma and at_hdmac driver.  Nothing
  unusual here, simple fixes to these drivers"

* 'fixes' of git://git.infradead.org/users/vkoul/slave-dma:
  dmaengine: pl08x: Define capabilities for generic capabilities reporting
  dmaengine: dw: append MODULE_ALIAS for platform driver
  dmaengine: imx-sdma: switch to dynamic context mode after script loaded
  dmaengine: at_hdmac: Fix calculation of the residual bytes

commit | commitdiff | tree

Linus Torvalds [Sat, 21 Mar 2015 19:51:36 +0000 (12:51 -0700)]

Merge tag 'pm+acpi-4.0-rc5' of git://git./linux/kernel/git/rafael/linux-pm

Pull power management and ACPI fixes from Rafael Wysocki:
"These are fixes for recent regressions (PCI/ACPI resources and at91
  RTC locking), a stable-candidate powercap RAPL driver fix and two ARM
  cpuidle fixes (one stable-candidate too).

  Specifics:

   - Revert a recent PCI commit related to IRQ resources management that
     introduced a regression for drivers attempting to bind to devices
     whose previous drivers did not balance pci_enable_device() and
     pci_disable_device() as expected (Rafael J Wysocki).

   - Fix a deadlock in at91_rtc_interrupt() introduced by a typo in a
     recent commit related to wakeup interrupt handling (Dan Carpenter).

   - Allow the power capping RAPL (Running-Average Power Limit) driver
     to use different energy units for domains within one CPU package
     which is necessary to handle Intel Haswell EP processors correctly
     (Jacob Pan).

   - Improve the cpuidle mvebu driver's handling of Armada XP SoCs by
     updating the target residency and exit latency numbers for those
     chips (Sebastien Rannou).

   - Prevent the cpuidle mvebu driver from calling cpu_pm_enter() twice
     in a row before cpu_pm_exit() is called on the same CPU which
     breaks the core's assumptions regarding the usage of those
     functions (Gregory Clement)"

* tag 'pm+acpi-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  Revert "x86/PCI: Refine the way to release PCI IRQ resources"
  rtc: at91rm9200: double locking bug in at91_rtc_interrupt()
  powercap / RAPL: handle domains with different energy units
  cpuidle: mvebu: Update cpuidle thresholds for Armada XP SOCs
  cpuidle: mvebu: Fix the CPU PM notifier usage

commit | commitdiff | tree

Linus Torvalds [Sat, 21 Mar 2015 19:41:50 +0000 (12:41 -0700)]

Merge git://people.freedesktop.org/~airlied/linux

Pull drm updates from Dave Airlie:
"A bunch of fixes across drivers:

  radeon:
     disable two ended allocation for now, it breaks some stuff

  amdkfd:
     misc fixes

  nouveau:
     fix irq loop problem, add basic support for GM206 (new hw)

  i915:
     fix some WARNs people were seeing

  exynos:
     fix some iommu interactions causing boot failures"

* git://people.freedesktop.org/~airlied/linux:
  drm/radeon: drop ttm two ended allocation
  drm/exynos: fix the initialization order in FIMD
  drm/exynos: fix typo config name correctly.
  drm/exynos: Check for NULL dereference of crtc
  drm/exynos: IS_ERR() vs NULL bug
  drm/exynos: remove unused files
  drm/i915: Make sure the primary plane is enabled before reading out the fb state
  drm/nouveau/bios: fix i2c table parsing for dcb 4.1
  drm/nouveau/device/gm100: Basic GM206 bring up (as copy of GM204)
  drm/nouveau/device: post write to NV_PMC_BOOT_1 when flipping endian switch
  drm/nouveau/gr/gf100: fix some accidental or'ing of buffer addresses
  drm/nouveau/fifo/nv04: remove the loop from the interrupt handler
  drm/radeon: Changing number of compute pipe lines
  drm/amdkfd: Fix SDMA queue init. in non-HWS mode
  drm/amdkfd: destroy mqd when destroying kernel queue
  drm/i915: Ensure plane->state->fb stays in sync with plane->fb

commit | commitdiff | tree

Linus Torvalds [Sat, 21 Mar 2015 19:33:01 +0000 (12:33 -0700)]

Merge tag 'devicetree-fixes-for-4.0-part2' of git://git./linux/kernel/git/robh/linux

Pull more DeviceTree fixes vfom Rob Herring:

- revert setting stdout-path as preferred console.  This caused
   regressions in PowerMACs and other systems.

- yet another fix for stdout-path option parsing.

- fix error path handling in of_irq_parse_one

* tag 'devicetree-fixes-for-4.0-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  Revert "of: Fix premature bootconsole disable with 'stdout-path'"
  of: handle both '/' and ':' in path strings
  of: unittest: Add option string test case with longer path
  of/irq: Fix of_irq_parse_one() returned error codes

commit | commitdiff | tree

Linus Torvalds [Sat, 21 Mar 2015 18:24:38 +0000 (11:24 -0700)]

Merge git://git./linux/kernel/git/nab/target-pending

Pull SCSI target fixes from Nicholas Bellinger:
"Here are current target-pending fixes for v4.0-rc5 code that have made
  their way into the queue over the last weeks.

  The fixes this round include:

   - Fix long-standing iser-target logout bug related to early
     conn_logout_comp completion, resulting in iscsi_conn use-after-tree
     OOpsen.  (Sagi + nab)

   - Fix long-standing tcm_fc bug in ft_invl_hw_context() failure
     handing for DDP hw offload.  (DanC)

   - Fix incorrect use of unprotected __transport_register_session() in
     tcm_qla2xxx + other single local se_node_acl fabrics.  (Bart)

   - Fix reference leak in target_submit_cmd() -> target_get_sess_cmd()
     for ack_kref=1 failure path.  (Bart)

   - Fix pSCSI backend ->get_device_type() statistics OOPs with
     un-configured device.  (Olaf + nab)

   - Fix virtual LUN=0 target_configure_device failure OOPs at modprobe
     time.  (Claudio + nab)

   - Fix FUA write false positive failure regression in v4.0-rc1 code.
     (Christophe Vu-Brugier + HCH)"

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  target: do not reject FUA CDBs when write cache is enabled but emulate_write_cache is 0
  target: Fix virtual LUN=0 target_configure_device failure OOPs
  target/pscsi: Fix NULL pointer dereference in get_device_type
  tcm_fc: missing curly braces in ft_invl_hw_context()
  target: Fix reference leak in target_get_sess_cmd() error path
  loop/usb/vhost-scsi/xen-scsiback: Fix use of __transport_register_session
  tcm_qla2xxx: Fix incorrect use of __transport_register_session
  iscsi-target: Avoid early conn_logout_comp for iser connections
  Revert "iscsi-target: Avoid IN_LOGOUT failure case for iser-target"
  target: Disallow changing of WRITE cache/FUA attrs after export

commit | commitdiff | tree

Linus Torvalds [Sat, 21 Mar 2015 18:15:13 +0000 (11:15 -0700)]

Merge tag 'dm-4.0-fixes' of git://git./linux/kernel/git/device-mapper/linux-dm

Pull devicemapper fixes from Mike Snitzer:
"A handful of stable fixes for DM:
   - fix thin target to always zero-fill reads to unprovisioned blocks
   - fix to interlock device destruction's suspend from internal
     suspends
   - fix 2 snapshot exception store handover bugs
   - fix dm-io to cope with DISCARD and WRITE_SAME capabilities changing"

* tag 'dm-4.0-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm io: deal with wandering queue limits when handling REQ_DISCARD and REQ_WRITE_SAME
  dm snapshot: suspend merging snapshot when doing exception handover
  dm snapshot: suspend origin when doing exception handover
  dm: hold suspend_lock while suspending device during device deletion
  dm thin: fix to consistently zero-fill reads to unprovisioned blocks

commit | commitdiff | tree

Linus Torvalds [Sat, 21 Mar 2015 17:53:37 +0000 (10:53 -0700)]

Merge branch 'for-linus' of git://git./linux/kernel/git/mason/linux-btrfs

Pull btrfs fixes from Chris Mason:
"Most of these are fixing extent reservation accounting, or corners
  with tree writeback during commit.

  Josef's set does add a test, which isn't strictly a fix, but it'll
  keep us from making this same mistake again"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix outstanding_extents accounting in DIO
  Btrfs: add sanity test for outstanding_extents accounting
  Btrfs: just free dummy extent buffers
  Btrfs: account merges/splits properly
  Btrfs: prepare block group cache before writing
  Btrfs: fix ASSERT(list_empty(&cur_trans->dirty_bgs_list)
  Btrfs: account for the correct number of extents for delalloc reservations
  Btrfs: fix merge delalloc logic
  Btrfs: fix comp_oper to get right order
  Btrfs: catch transaction abortion after waiting for it
  btrfs: fix sizeof format specifier in btrfs_check_super_valid()

commit | commitdiff | tree

Linus Torvalds [Sat, 21 Mar 2015 17:41:15 +0000 (10:41 -0700)]

Merge branch 'for-4.0' of git://linux-nfs.org/~bfields/linux

Pull nfsd bufix from Bruce Fields:
"This is a fix for a crash easily triggered by 4.1 activity to a server
  built with CONFIG_NFSD_PNFS.

  There are some more bugfixes queued up that I intend to pass along
  next week, but this is the most critical"

* 'for-4.0' of git://linux-nfs.org/~bfields/linux:
  Subject: nfsd: don't recursively call nfsd4_cb_layout_fail

commit | commitdiff | tree

Linus Torvalds [Sat, 21 Mar 2015 17:36:44 +0000 (10:36 -0700)]

Merge tag 'upstream-4.0-rc5' of git://git.infradead.org/linux-ubifs

Pull UBI fix from Artem Bityutskiy:
"This fixes a bug introduced during the v4.0 merge window where we
forgot to put braces where they should be"

* tag 'upstream-4.0-rc5' of git://git.infradead.org/linux-ubifs:
UBI: fix missing brace control flow

commit | commitdiff | tree

Linus Torvalds [Sat, 21 Mar 2015 17:24:10 +0000 (10:24 -0700)]

Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:

- mm switching fix where the kernel pgd ends up in the user TTBR0 after
   returning from an EFI run-time services call

- fix __GFP_ZERO handling for atomic pool and CMA DMA allocations (the
   generic code does get the gfp flags, so it's left with the arch code
   to memzero accordingly)

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: Honor __GFP_ZERO in dma allocations
  arm64: efi: don't restore TTBR0 if active_mm points at init_mm

commit | commitdiff | tree

Linus Torvalds [Sat, 21 Mar 2015 17:03:22 +0000 (10:03 -0700)]

Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm

Pull ARM fixes from Russell King:
"Another few ARM fixes.  Fabrice fixed the L2 cache DT parsing to allow
  prefetch configuration to be specified even when the cache size
  parsing fails.

  Laura noticed that the setting of page attributes wasn't working for
  modules due to is_module_addr() always returning false.

  Marc Gonzalez (aka Mason) noticed a potential latent bug with the way
  we read one of the CPUID registers (where we could attempt to read a
  non-present CPUID register which may fault.)

  I've fixed an issue where 32-bit DMA masks were failing with memory
  which extended to the top of physical address space, and I've also
  added debugging output of the page tables when we hit a data access
  exception which we don't specifically handle - prompted by the lack of
  information in a bug report"

* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
  ARM: 8313/1: Use read_cpuid_ext() macro instead of inline asm
  ARM: 8311/1: Don't use is_module_addr in setting page attributes
  ARM: 8310/1: l2c: Fix prefetch settings dt parsing
  ARM: dump pgd, pmd and pte states on unhandled data abort faults
  ARM: dma-api: fix off-by-one error in __dma_supported()

commit | commitdiff | tree

Rafael J. Wysocki [Fri, 20 Mar 2015 23:39:12 +0000 (00:39 +0100)]

Merge branches 'pm-cpuidle', 'powercap', 'irq-pm' and 'acpi-resources'

* pm-cpuidle:
  cpuidle: mvebu: Update cpuidle thresholds for Armada XP SOCs
  cpuidle: mvebu: Fix the CPU PM notifier usage

* powercap:
  powercap / RAPL: handle domains with different energy units

* irq-pm:
  rtc: at91rm9200: double locking bug in at91_rtc_interrupt()

* acpi-resources:
  Revert "x86/PCI: Refine the way to release PCI IRQ resources"

commit | commitdiff | tree

NeilBrown [Fri, 13 Mar 2015 00:51:18 +0000 (11:51 +1100)]

md: fix problems with freeing private data after ->run failure.

If ->run() fails, it can either free the data structures it
allocated, or leave that task to ->free() which will be called
on failures.

However:
  md.c calls ->free() even if ->private_data is NULL, which
     causes problems in some personalities.
  raid0.c frees the data, but doesn't clear ->private_data,
     which will become a problem when we fix md.c

So better fix both these issues at once.

Reported-by: Richard W.M. Jones <rjones@redhat.com>
Fixes: 5aa61f427e4979be733e4847b9199ff9cc48a47e
URL: https://bugzilla.kernel.org/show_bug.cgi?id=94381
Signed-off-by: NeilBrown <neilb@suse.de>

commit | commitdiff | tree

Al Viro [Fri, 20 Mar 2015 17:41:43 +0000 (17:41 +0000)]

net: validate the range we feed to iov_iter_init() in sys_sendto/sys_recvfrom

Cc: stable@vger.kernel.org # v3.19
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Catalin Marinas [Fri, 20 Mar 2015 16:48:13 +0000 (16:48 +0000)]

net: compat: Update get_compat_msghdr() to match copy_msghdr_from_user() behaviour

Commit db31c55a6fb2 (net: clamp ->msg_namelen instead of returning an
error) introduced the clamping of msg_namelen when the unsigned value
was larger than sizeof(struct sockaddr_storage). This caused a
msg_namelen of -1 to be valid. The native code was subsequently fixed by
commit dbb490b96584 (net: socket: error on a negative msg_namelen).

In addition, the native code sets msg_namelen to 0 when msg_name is
NULL. This was done in commit (6a2a2b3ae075 net:socket: set msg_namelen
to 0 if msg_name is passed as NULL in msghdr struct from userland) and
subsequently updated by 08adb7dabd48 (fold verify_iovec() into
copy_msghdr_from_user()).

This patch brings the get_compat_msghdr() in line with
copy_msghdr_from_user().

Fixes: db31c55a6fb2 (net: clamp ->msg_namelen instead of returning an error)
Cc: David S. Miller <davem@davemloft.net>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Suzuki K. Poulose [Thu, 19 Mar 2015 18:17:09 +0000 (18:17 +0000)]

arm64: Honor __GFP_ZERO in dma allocations

Current implementation doesn't zero out the pages allocated.
Honor the __GFP_ZERO flag and zero out if set.

Cc: <stable@vger.kernel.org> # v3.14+
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

commit | commitdiff | tree

David S. Miller [Fri, 20 Mar 2015 17:25:56 +0000 (13:25 -0400)]

Merge branch 'be2net'

Sathya Perla says:

====================
be2net: patch set

Hi David, this patch set includes 3 bug fixes to the be2net driver.

Patch 1 fixes a vlan isolation issue with VFs. When a VF is placed in
promiscous mode, it could receive packets belonging to any vlan, as
the PF driver grants vlan promisc capability to VFs. The PF
driver now disables the vlan promisc capability for VFs to fix this
problem.

Patch 2 fixes the call to MODIFY_EQ_DELAY FW cmd to not include more
than 8 EQs per cmd. The FW is not capable of handling more than 8 EQs
per cmd.

Patch 3 fixes an EEH error detection issue. On Power platforms,
when an EEH error occurs, the slot disconnect state is more reliably
detected via an MMIO read compared to a config read. So, the error
register reads that occur every second are now done via MMIO.

Pls apply this patch set to the "net" tree. Thanks!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Suresh Reddy [Fri, 20 Mar 2015 10:28:25 +0000 (06:28 -0400)]

be2net: use PCI MMIO read instead of config read for errors

When an EEH error occurs, the device/slot is disconnected. This condition
is more reliably detected (i.e., returns all ones) with an MMIO read rather
than a config read -- especially on power platforms.

Hence, this patch fixes EEH error detection by replacing config reads with
MMIO reads for reading the error registers. The error registers in
Skyhawk-R/BE2/BE3 are accessible both via the config space and the
PCICFG (BAR0) memory space.

Reported-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Suresh Reddy <Suresh.Reddy@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Suresh Reddy [Fri, 20 Mar 2015 10:28:24 +0000 (06:28 -0400)]

be2net: restrict MODIFY_EQ_DELAY cmd to a max of 8 EQs

Issuing this cmd for more than 8 EQs does not have the intended effect
even on BEx and Skyhawk-R.

This patch fixes this by issuing this cmd for upto 8 EQs at a time.
Signed-off-by: Suresh Reddy <Suresh.Reddy@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Vasundhara Volam [Fri, 20 Mar 2015 10:28:23 +0000 (06:28 -0400)]

be2net: Prevent VFs from enabling VLAN promiscuous mode

Currently, a PF does not restrict its VF interface from enabling vlan
promiscuous mode. This breaks vlan isolation when a vlan
(transparent tagging) is configured on a VF.

This patch fixes this problem by disabling the vlan promisc capability
for VFs.

Reported-by: Yoann Juet <veilletechno-irts@univ-nantes.fr>
Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Josh Hunt [Thu, 19 Mar 2015 23:19:30 +0000 (19:19 -0400)]

tcp: fix tcp fin memory accounting

tcp_send_fin() does not account for the memory it allocates properly, so
sk_forward_alloc can be negative in cases where we've sent a FIN:

ss example output (ss -amn | grep -B1 f4294):
tcp FIN-WAIT-1 0 1 192.168.0.1:45520 192.0.2.1:8080
skmem:(r0,rb87380,t0,tb87380,f4294966016,w1280,o0,bl0)
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Will Deacon [Thu, 19 Mar 2015 15:43:00 +0000 (15:43 +0000)]

arm64: efi: don't restore TTBR0 if active_mm points at init_mm

init_mm isn't a normal mm: it has swapper_pg_dir as its pgd (which
contains kernel mappings) and is used as the active_mm for the idle
thread.

When restoring the pgd after an EFI call, we write current->active_mm
into TTBR0. If the current task is actually the idle thread (e.g. when
initialising the EFI RTC before entering userspace), then the TLB can
erroneously populate itself with junk global entries as a result of
speculative table walks.

When we do eventually return to userspace, the task can end up hitting
these junk mappings leading to lockups, corruption or crashes.

This patch fixes the problem in the same way as the CPU suspend code by
ensuring that we never switch to the init_mm in efi_set_pgd and instead
point TTBR0 at the zero page. A check is also added to cpu_switch_mm to
BUG if we get passed swapper_pg_dir.

Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Fixes: f3cdfd239da5 ("arm64/efi: move SetVirtualAddressMap() to UEFI stub")
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

commit | commitdiff | tree

Steven Barth [Thu, 19 Mar 2015 15:16:04 +0000 (16:16 +0100)]

ipv6: fix backtracking for throw routes

for throw routes to trigger evaluation of other policy rules
EAGAIN needs to be propagated up to fib_rules_lookup
similar to how its done for IPv4

A simple testcase for verification is:

ip -6 rule add lookup 33333 priority 33333
ip -6 route add throw 2001:db8::1
ip -6 route add 2001:db8::1 via fe80::1 dev wlan0 table 33333
ip route get 2001:db8::1

Signed-off-by: Steven Barth <cyrus@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Markos Chandras [Thu, 19 Mar 2015 10:28:14 +0000 (10:28 +0000)]

net: ethernet: pcnet32: Setup the SRAM and NOUFLO on Am79C97{3, 5}

On a MIPS Malta board, tons of fifo underflow errors have been observed
when using u-boot as bootloader instead of YAMON. The reason for that
is that YAMON used to set the pcnet device to SRAM mode but u-boot does
not. As a result, the default Tx threshold (64 bytes) is now too small to
keep the fifo relatively used and it can result to Tx fifo underflow errors.
As a result of which, it's best to setup the SRAM on supported controllers
so we can always use the NOUFLO bit.

Cc: <netdev@vger.kernel.org>
Cc: <stable@vger.kernel.org>
Cc: <linux-kernel@vger.kernel.org>
Cc: Don Fry <pcnet32@frontier.com>
Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Sabrina Dubroca [Thu, 19 Mar 2015 10:22:32 +0000 (11:22 +0100)]

ipv6: call ipv6_proxy_select_ident instead of ipv6_select_ident in udp6_ufo_fragment

Matt Grant reported frequent crashes in ipv6_select_ident when
udp6_ufo_fragment is called from openvswitch on a skb that doesn't
have a dst_entry set.

ipv6_proxy_select_ident generates the frag_id without using the dst
associated with the skb. This approach was suggested by Vladislav
Yasevich.

Fixes: 0508c07f5e0c ("ipv6: Select fragment id during UFO segmentation if not set.")
Cc: Vladislav Yasevich <vyasevic@redhat.com>
Reported-by: Matt Grant <matt@mattgrant.net.nz>
Tested-by: Matt Grant <matt@mattgrant.net.nz>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Wenbo Wang [Fri, 20 Mar 2015 05:04:54 +0000 (01:04 -0400)]

Fix bug in blk_rq_merge_ok

Use the right array index to reference the last
element of rq->biotail->bi_io_vec[]

Signed-off-by: Wenbo Wang <wenbo.wang@memblaze.com>
Reviewed-by: Chong Yuan <chong.yuan@memblaze.com>
Fixes: 66cb45aa41315 ("block: add support for limiting gaps in SG lists")
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>

commit | commitdiff | tree

Rafael J. Wysocki [Fri, 20 Mar 2015 13:56:19 +0000 (14:56 +0100)]

Revert "x86/PCI: Refine the way to release PCI IRQ resources"

Commit b4b55cda5874 (Refine the way to release PCI IRQ resources)
introduced a regression in the PCI IRQ resource management by causing
the IRQ resource of a device, established when pci_enabled_device()
is called on a fully disabled device, to be released when the driver
is unbound from the device, regardless of the enable_cnt.

This leads to the situation that an ill-behaved driver can now make a
device unusable to subsequent drivers by an imbalance in their use of
pci_enable/disable_device(). That is a serious problem for secondary
drivers like vfio-pci, which are innocent of the transgressions of
the previous driver.

Since the solution of this problem is not immediate and requires
further discussion, revert commit b4b55cda5874 and the issue it was
supposed to address (a bug related to xen-pciback) will be taken
care of in a different way going forward.

Reported-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

RSS Atom