GitHub/moto-9609/android_kernel_motorola_exynos9610.git
8 years agomd: simplify the code with md_kick_rdev_from_array
Guoqing Jiang [Fri, 3 Jun 2016 03:32:05 +0000 (23:32 -0400)]
md: simplify the code with md_kick_rdev_from_array

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
8 years agomd-cluster: fix deadlock issue when add disk to an recoverying array
Guoqing Jiang [Fri, 3 Jun 2016 03:32:04 +0000 (23:32 -0400)]
md-cluster: fix deadlock issue when add disk to an recoverying array

Add a disk to an array which is performing recovery
is a little complicated, we need to do both reap the
sync thread and perform add disk for the case, then
it caused deadlock as follows.

linux44:~ # ps aux|grep md|grep D
root      1822  0.0  0.0      0     0 ?        D    16:50   0:00 [md127_resync]
root      1848  0.0  0.0  19860   952 pts/0    D+   16:50   0:00 mdadm --manage /dev/md127 --re-add /dev/vdb
linux44:~ # cat /proc/1848/stack
[<ffffffff8107afde>] kthread_stop+0x6e/0x120
[<ffffffffa051ddb0>] md_unregister_thread+0x40/0x80 [md_mod]
[<ffffffffa0526e45>] md_reap_sync_thread+0x15/0x150 [md_mod]
[<ffffffffa05271e0>] action_store+0x260/0x270 [md_mod]
[<ffffffffa05206b4>] md_attr_store+0xb4/0x100 [md_mod]
[<ffffffff81214a7e>] sysfs_write_file+0xbe/0x140
[<ffffffff811a6b98>] vfs_write+0xb8/0x1e0
[<ffffffff811a75b8>] SyS_write+0x48/0xa0
[<ffffffff8152a5c9>] system_call_fastpath+0x16/0x1b
[<00007f068ea1ed30>] 0x7f068ea1ed30
linux44:~ # cat /proc/1822/stack
[<ffffffffa05251a6>] md_do_sync+0x846/0xf40 [md_mod]
[<ffffffffa052402d>] md_thread+0x16d/0x180 [md_mod]
[<ffffffff8107ad94>] kthread+0xb4/0xc0
[<ffffffff8152a518>] ret_from_fork+0x58/0x90

                        Task1848                                Task1822
md_attr_store (held reconfig_mutex by call mddev_lock())
                        action_store
md_reap_sync_thread
md_unregister_thread
kthread_stop                    md_wakeup_thread(mddev->thread);
wait_event(mddev->sb_wait, !test_bit(MD_CHANGE_PENDING))

md_check_recovery is triggered by wakeup mddev->thread,
but it can't clear MD_CHANGE_PENDING flag since it can't
get lock which was held by md_attr_store already.

To solve the deadlock problem, we move "->resync_finish()"
from md_do_sync to md_reap_sync_thread (after md_update_sb),
also MD_HELD_RESYNC_LOCK is introduced since it is possible
that node can't get resync lock in md_do_sync.

Then we do not need to wait for MD_CHANGE_PENDING is cleared
or not since metadata should be updated after md_update_sb,
so just call resync_finish if MD_HELD_RESYNC_LOCK is set.

We also unified the code after skip label, since set PENDING
for non-clustered case should be harmless.

Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
8 years agoright meaning of PARITY_ENABLE_RMW and PARITY_PREFER_RMW
Song Liu [Tue, 24 May 2016 00:25:06 +0000 (17:25 -0700)]
right meaning of PARITY_ENABLE_RMW and PARITY_PREFER_RMW

In current handle_stripe_dirtying, the code prefers rmw with
PARITY_ENABLE_RMW; while prefers rcw with PARITY_PREFER_RMW.

This patch reverses this behavior.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
8 years agoMerge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 26 May 2016 00:37:33 +0000 (17:37 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:
 "Misc fixes: EFI, entry code, pkeys and MPX fixes, TASK_SIZE cleanups
  and a tsc frequency table fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Switch from TASK_SIZE to TASK_SIZE_MAX in the page fault code
  x86/fsgsbase/64: Use TASK_SIZE_MAX for FSBASE/GSBASE upper limits
  x86/mm/mpx: Work around MPX erratum SKD046
  x86/entry/64: Fix stack return address retrieval in thunk
  x86/efi: Fix 7-parameter efi_call()s
  x86/cpufeature, x86/mm/pkeys: Fix broken compile-time disabling of pkeys
  x86/tsc: Add missing Cherrytrail frequency to the table

8 years agoMerge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 26 May 2016 00:11:43 +0000 (17:11 -0700)]
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull scheduler fixes from Ingo Molnar:
 "Two fixes: one for a lost wakeup, the other to fix the compiler
  optimizing out preempt operations on ARM64 (and possibly other non-x86
  architectures)"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/core: Fix remote wakeups
  sched/preempt: Fix preempt_count manipulations

8 years agoMerge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 26 May 2016 00:05:40 +0000 (17:05 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull perf updates from Ingo Molnar:
 "Mostly tooling and PMU driver fixes, but also a number of late updates
  such as the reworking of the call-chain size limiting logic to make
  call-graph recording more robust, plus tooling side changes for the
  new 'backwards ring-buffer' extension to the perf ring-buffer"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits)
  perf record: Read from backward ring buffer
  perf record: Rename variable to make code clear
  perf record: Prevent reading invalid data in record__mmap_read
  perf evlist: Add API to pause/resume
  perf trace: Use the ptr->name beautifier as default for "filename" args
  perf trace: Use the fd->name beautifier as default for "fd" args
  perf report: Add srcline_from/to branch sort keys
  perf evsel: Record fd into perf_mmap
  perf evsel: Add overwrite attribute and check write_backward
  perf tools: Set buildid dir under symfs when --symfs is provided
  perf trace: Only auto set call-graph to "dwarf" when syscalls are being traced
  perf annotate: Sort list of recognised instructions
  perf annotate: Fix identification of ARM blt and bls instructions
  perf tools: Fix usage of max_stack sysctl
  perf callchain: Stop validating callchains by the max_stack sysctl
  perf trace: Fix exit_group() formatting
  perf top: Use machine->kptr_restrict_warned
  perf trace: Warn when trying to resolve kernel addresses with kptr_restrict=1
  perf machine: Do not bail out if not managing to read ref reloc symbol
  perf/x86/intel/p4: Trival indentation fix, remove space
  ...

8 years agoMerge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Wed, 25 May 2016 23:52:19 +0000 (16:52 -0700)]
Merge branch 'core-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull objtool build fix from Ingo Molnar:
 "An libtool fix for older libelf versions"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Allow building with older libelf

8 years agoMerge branch 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Wed, 25 May 2016 22:59:09 +0000 (15:59 -0700)]
Merge branch 'work.iov_iter' of git://git./linux/kernel/git/viro/vfs

Pull vfs iov_iter regression fix from Al Viro:
 "Fix for braino in 'fold checks into iterate_and_advance()'"

* 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  do "fold checks into iterate_and_advance()" right

8 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Wed, 25 May 2016 22:54:35 +0000 (15:54 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs

Pull vfs xattr regression fixes from Al Viro.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  make xattr_resolve_handlers() safe to use with NULL ->s_xattr
  xattr: Fail with -EINVAL for NULL attribute names

8 years agoMerge tag 'acpi-4.7-rc1-more' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Wed, 25 May 2016 22:38:56 +0000 (15:38 -0700)]
Merge tag 'acpi-4.7-rc1-more' of git://git./linux/kernel/git/rafael/linux-pm

Pull ACPI fix from Rafael Wysocki:
 "Additional ACPI update for v4.7-rc1

  Just one fix for incorrect async_synchronize_cookie() usage in the
  ACPI battery driver (Chris Wilson)"

* tag 'acpi-4.7-rc1-more' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI / battery: Correctly serialise with the pending async probe

8 years agoMerge tag 'pm-4.7-rc1-more' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Wed, 25 May 2016 22:29:21 +0000 (15:29 -0700)]
Merge tag 'pm-4.7-rc1-more' of git://git./linux/kernel/git/rafael/linux-pm

Pull more power management updates from Rafael Wysocki:
 "These are two stable-candidate fixes (PM core, cpuidle) and a bunch of
  cpufreq cleanups.

  Specifics:

   - Stable-candidate cpuidle fix to make it check the right variable
     when deciding whether or not to enable interrupts on the local CPU
     so as to avoid enabling iterrupts too early in some cases if the
     system has both coupled and per-core idle states (Daniel Lezcano).

   - Stable-candidate PM core fix to make it handle failures at the
     "late suspend" stage of device suspend consistently for all devices
     regardless of whether or not async suspend/resume is enabled for
     them (Rafael Wysocki).

   - Cleanups in the cpufreq core, the schedutil governor and the
     intel_pstate driver (Rafael Wysocki, Pankaj Gupta, Viresh Kumar)"

* tag 'pm-4.7-rc1-more' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM / sleep: Handle failures in device_suspend_late() consistently
  cpufreq: schedutil: Improve prints messages with pr_fmt
  cpuidle: Fix cpuidle_state_is_coupled() argument in cpuidle_enter()
  cpufreq: simplified goto out in cpufreq_register_driver()
  cpufreq: governor: CPUFREQ_GOV_STOP never fails
  cpufreq: governor: CPUFREQ_GOV_POLICY_EXIT never fails
  intel_pstate: Simplify conditional in intel_pstate_set_policy()

8 years agodo "fold checks into iterate_and_advance()" right
Al Viro [Wed, 25 May 2016 21:36:19 +0000 (17:36 -0400)]
do "fold checks into iterate_and_advance()" right

the only case when we should skip the iterate_and_advance() guts
is when nothing's left in the iterator, _not_ just when requested
amount is 0.  Said guts will do nothing in the latter case anyway;
the problem we tried to deal with in the aforementioned commit is
that when there's nothing left *and* the amount requested is 0,
we might end up deferencing one iovec too many; the value we fetch
from there is discarded in that case, but theoretically it might
oops if the iovec array ends exactly at the end of page with the
next page not mapped.

Bailing out on zero size requested had an unexpected side effect -
zero-length segment in the beginning of iovec array ended up
throwing do_loop_readv_writev() into infinite spin; we do not
advance past the empty segment at all.  Reproducer is trivial:
echo '#include <sys/uio.h>' >a.c
echo 'main() {char c; struct iovec v[] = {{&c,0},{&c,1}}; readv(0,v,2);}' >>a.c
cc a.c && ./a.out </proc/uptime

which should end up with the process not hanging.  Probably ought to
go into LTP or xfstests...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
8 years agomake xattr_resolve_handlers() safe to use with NULL ->s_xattr
Al Viro [Wed, 25 May 2016 21:34:41 +0000 (17:34 -0400)]
make xattr_resolve_handlers() safe to use with NULL ->s_xattr

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
8 years agoxattr: Fail with -EINVAL for NULL attribute names
Andreas Gruenbacher [Mon, 9 May 2016 11:28:49 +0000 (13:28 +0200)]
xattr: Fail with -EINVAL for NULL attribute names

Commit 98e9cb57 improved the xattr name checks in xattr_resolve_name but
didn't update the NULL attribute name check appropriately, so NULL
attribute names lead to NULL pointer dereferences.  Turn that into
-EINVAL results instead.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
  fs/xattr.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
8 years agoMerge branch 'acpi-battery'
Rafael J. Wysocki [Wed, 25 May 2016 20:11:28 +0000 (22:11 +0200)]
Merge branch 'acpi-battery'

* acpi-battery:
  ACPI / battery: Correctly serialise with the pending async probe

8 years agoMerge branches 'pm-cpufreq', 'pm-cpuidle' and 'pm-core'
Rafael J. Wysocki [Wed, 25 May 2016 19:54:45 +0000 (21:54 +0200)]
Merge branches 'pm-cpufreq', 'pm-cpuidle' and 'pm-core'

* pm-cpufreq:
  cpufreq: schedutil: Improve prints messages with pr_fmt
  cpufreq: simplified goto out in cpufreq_register_driver()
  cpufreq: governor: CPUFREQ_GOV_STOP never fails
  cpufreq: governor: CPUFREQ_GOV_POLICY_EXIT never fails
  intel_pstate: Simplify conditional in intel_pstate_set_policy()

* pm-cpuidle:
  cpuidle: Fix cpuidle_state_is_coupled() argument in cpuidle_enter()

* pm-core:
  PM / sleep: Handle failures in device_suspend_late() consistently

8 years agoMerge tag 'pwm/for-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry...
Linus Torvalds [Wed, 25 May 2016 17:40:15 +0000 (10:40 -0700)]
Merge tag 'pwm/for-4.7-rc1' of git://git./linux/kernel/git/thierry.reding/linux-pwm

Pull pwm updates from Thierry Reding:
 "This set of changes introduces an atomic API to the PWM subsystem.
  This is influenced by the DRM atomic API that was introduced a while
  back, though it is obviously a lot simpler.  The fundamental idea
  remains the same, though: drivers provide a single callback to
  implement the atomic configuration of a PWM channel.

  As a side-effect the PWM subsystem gains the ability for initial state
  retrieval, so that the logical state mirrors that of the hardware.
  Many use-cases don't care about this, but for others it is essential.

  These new features require changes in all users, which these patches
  take care of.  The core is transitioned to use the atomic callback if
  available and provides a fallback mechanism for other drivers.

  Changes to transition users and drivers to the atomic API are
  postponed to v4.8"

* tag 'pwm/for-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm: (30 commits)
  pwm: Add information about polarity, duty cycle and period to debugfs
  pwm: Switch to the atomic API
  pwm: Update documentation
  pwm: Add core infrastructure to allow atomic updates
  pwm: Add hardware readout infrastructure
  pwm: Move the enabled/disabled info into pwm_state
  pwm: Introduce the pwm_state concept
  pwm: Keep PWM state in sync with hardware state
  ARM: Explicitly apply PWM config extracted from pwm_args
  drm: i915: Explicitly apply PWM config extracted from pwm_args
  input: misc: pwm-beeper: Explicitly apply PWM config extracted from pwm_args
  input: misc: max8997: Explicitly apply PWM config extracted from pwm_args
  backlight: lm3630a: explicitly apply PWM config extracted from pwm_args
  backlight: lp855x: Explicitly apply PWM config extracted from pwm_args
  backlight: lp8788: Explicitly apply PWM config extracted from pwm_args
  backlight: pwm_bl: Use pwm_get_args() where appropriate
  fbdev: ssd1307fb: Use pwm_get_args() where appropriate
  regulator: pwm: Use pwm_get_args() where appropriate
  leds: pwm: Use pwm_get_args() where appropriate
  input: misc: max77693: Use pwm_get_args() where appropriate
  ...

8 years agoMerge git://www.linux-watchdog.org/linux-watchdog
Linus Torvalds [Wed, 25 May 2016 17:19:17 +0000 (10:19 -0700)]
Merge git://www.linux-watchdog.org/linux-watchdog

Pull watchdog updates from Wim Van Sebroeck:

 - add support for Fintek F81865 Super-IO chip

 - add support for watchdogs (RWDT and SWDT) found on RCar Gen3 based
   SoCs from Renesas

 - octeon: Handle the FROZEN hot plug notifier actions

 - f71808e_wdt fixes and cleanups

 - some small improvements in code and documentation

* git://www.linux-watchdog.org/linux-watchdog:
  MAINTAINERS: Add file patterns for watchdog device tree bindings
  Documentation: Add ebc-c384_wdt watchdog-parameters.txt entry
  watchdog: shwdt: Use setup_timer()
  watchdog: cpwd: Use setup_timer()
  arm64: defconfig: enable Renesas Watchdog Timer
  watchdog: renesas-wdt: add driver
  watchdog: remove error message when unable to allocate watchdog device
  watchdog: f71808e_wdt: Fix WDTMOUT_STS register read
  watchdog: f71808e_wdt: Fix typo
  watchdog: f71808e_wdt: Add F81865 support
  watchdog: sp5100_tco: properly check for new register layouts
  watchdog: core: Fix circular locking dependency
  watchdog: core: fix trivial typo in a comment
  watchdog: hpwdt: Adjust documentation to match latest kernel module parameters.
  watchdog: imx2_wdt: add external reset support via dt prop
  watchdog: octeon: Handle the FROZEN hot plug notifier actions.
  watchdog: qcom: Report reboot reason

8 years agoMerge tag 'vfio-v4.7-rc1' of git://github.com/awilliam/linux-vfio
Linus Torvalds [Wed, 25 May 2016 16:47:26 +0000 (09:47 -0700)]
Merge tag 'vfio-v4.7-rc1' of git://github.com/awilliam/linux-vfio

Pull VFIO updates from Alex Williamson:

 - Hide INTx on certain known broken devices (Alex Williamson)

 - Additional backdoor reset detection (Alex Williamson)

 - Remove unused iommudata reference (Alexey Kardashevskiy)

 - Use cfg_size to avoid probing extended config space (Alexey
   Kardashevskiy)

* tag 'vfio-v4.7-rc1' of git://github.com/awilliam/linux-vfio:
  vfio_pci: Test for extended capabilities if config space > 256 bytes
  vfio_iommu_spapr_tce: Remove unneeded iommu_group_get_iommudata
  vfio/pci: Add test for BAR restore
  vfio/pci: Hide broken INTx support from user

8 years agoMerge tag 'drm-4.7-rc1-headers-fix' of git://people.freedesktop.org/~airlied/linux
Linus Torvalds [Wed, 25 May 2016 16:37:50 +0000 (09:37 -0700)]
Merge tag 'drm-4.7-rc1-headers-fix' of git://people.freedesktop.org/~airlied/linux

Pull header warning fix from Dave Airlie:
 "Here is the C++ guards warning fix from Arnd"

[ Background: there are 'extern "C" { }' guards in include/uapi for the
  GPU headers.

  They should arguably be wrapped somehow, but as it is they caused
  checkpatch to warn because it would trigger on the 'extern' and think
  it's exporting a function or variable from the kernel to user space.

  This just fixes checkpatch.  Whether we wrap the C++ guards some way
  in the future will be an independent issue. ]

* tag 'drm-4.7-rc1-headers-fix' of git://people.freedesktop.org/~airlied/linux:
  headers_check: don't warn about c++ guards

8 years agoMerge branch 'parisc-4.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller...
Linus Torvalds [Wed, 25 May 2016 16:27:52 +0000 (09:27 -0700)]
Merge branch 'parisc-4.7-1' of git://git./linux/kernel/git/deller/parisc-linux

Pull parisc updates from Helge Deller:

 - Add native high-resolution timing code for sched_clock() and other
   timing functions based on the processor internal cr16 cycle counters

 - Add syscall tracepoint support

 - Add regset support

 - Speed up get_user() and put_user() functions

 - Updated futex.h to match generic implementation (John David Anglin)

 - A few smaller ftrace build fixes

 - Fixed thuge-gen kernel self test to utilize architectured MAP_HUGETLB
   value

 - Added parisc architecture to seccomp_bpf kernel self test

 - Various typo fixes (Andrea Gelmini)

* 'parisc-4.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Whitespace cleanups in unistd.h
  parisc: Use long jump to reach ftrace_return_to_handler()
  parisc: Fix typo in fpudispatch.c
  parisc: Fix typos in eisa_eeprom.h
  parisc: Fix typo in ldcw.h
  parisc: Fix typo in pdc.h
  parisc: Update futex.h to match generic implementation
  parisc: Merge ftrace C-helper and assembler functions into .text.hot section
  selftests/thuge-gen: Use platform specific MAP_HUGETLB value
  parisc: Add native high-resolution sched_clock() implementation
  parisc: Add ARCH_TRACEHOOK and regset support
  parisc: Add 64bit get_user() and put_user() for 32bit kernel
  parisc: Simplify and speed up get_user() and put_user()
  parisc: Add syscall tracepoint support

8 years agoparisc: Whitespace cleanups in unistd.h
Helge Deller [Wed, 25 May 2016 13:40:49 +0000 (15:40 +0200)]
parisc: Whitespace cleanups in unistd.h

Clean up whitespaces and mark unused syscalls as such.

Signed-off-by: Helge Deller <deller@gmx.de>
8 years agosched/core: Fix remote wakeups
Peter Zijlstra [Mon, 23 May 2016 09:19:07 +0000 (11:19 +0200)]
sched/core: Fix remote wakeups

Commit:

  b5179ac70de8 ("sched/fair: Prepare to fix fairness problems on migration")

... introduced a bug: Mike Galbraith found that it introduced a
performance regression, while Paul E. McKenney reported lost
wakeups and bisected it to this commit.

The reason is that I mis-read ttwu_queue() such that I assumed any
wakeup that got a remote queue must have had the task migrated.

Since this is not so; we need to transfer this information between
queueing the wakeup and actually doing the wakeup. Use a new
task_struct::sched_flag for this, we already write to
sched_contributes_to_load in the wakeup path so this is a hot and
modified cacheline.

Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Tested-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Hunter <ahh@google.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ben Segall <bsegall@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Turner <pjt@google.com>
Cc: Pavan Kondeti <pkondeti@codeaurora.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: byungchul.park@lge.com
Fixes: b5179ac70de8 ("sched/fair: Prepare to fix fairness problems on migration")
Link: http://lkml.kernel.org/r/20160523091907.GD15728@worktop.ger.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
8 years agoMerge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Linus Torvalds [Tue, 24 May 2016 22:50:58 +0000 (15:50 -0700)]
Merge tag 'armsoc-fixes' of git://git./linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Arnd Bergmann:
 "This is a first set of bug fixes on top of what was merged for 4.7.

  Two patches for lpc32xx address a harmless build warning that was just
  introduced, one patch for the mediatek soc driver fixes a warning for
  arm64, and the pxa changes are minor cleanups that should have been
  part of the original pull requests but that I forgot to apply to the
  cleanup-fixes branch earlier"

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: lpc32xx: fix NR_IRQS confict
  ARM: lpc32xx: remove legacy irq controller driver
  soc: mtk-pmic-wrap: avoid integer overflow warning
  ARM: pxa: Remove CLK_IS_ROOT
  ARM: pxa: activate pinctrl for device-tree machines

8 years agoMerge tag 'armsoc-late' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Linus Torvalds [Tue, 24 May 2016 22:46:06 +0000 (15:46 -0700)]
Merge tag 'armsoc-late' of git://git./linux/kernel/git/arm/arm-soc

Pull ARM SoC late DT updates from Arnd Bergmann:
 "This is a collection of a few late fixes and other misc stuff that had
  dependencies on things being merged from other trees.

  The Renesas R-Car power domain handling, and the Nvidia Tegra USB
  support both hand notable changes that required changing the DT
  binding in a way that only provides compatibility with old DT blobs on
  new kernels but not vice versa.  As a consequence, the DT changes are
  based on top of the driver changes and are now in this branch.

  For NXP i.MX and Samsung Exynos, the changes in here depend on other
  changes that got merged through the clk maintainer tree"

* tag 'armsoc-late' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (35 commits)
  ARM: dts: exynos: Add support of Bus frequency using VDD_INT for exynos5422-odroidxu3
  ARM: dts: exynos: Add bus nodes using VDD_INT for Exynos542x SoC
  ARM: dts: exynos: Add NoC Probe dt node for Exynos542x SoC
  ARM: dts: exynos: Add support of bus frequency for exynos4412-trats/odroidu3
  ARM: dts: exynos: Expand the voltage range of buck1/3 regulator for exynos4412-odroidu3
  ARM: dts: exynos: Add support of bus frequency using VDD_INT for exynos3250-rinato
  ARM: dts: exynos: Add exynos4412-ppmu-common dtsi to delete duplicate PPMU nodes
  ARM: dts: exynos: Add bus nodes using VDD_MIF for Exynos4210
  ARM: dts: exynos: Add bus nodes using VDD_INT for Exynos4x12
  ARM: dts: exynos: Add bus nodes using VDD_MIF for Exynos4x12
  ARM: dts: exynos: Add bus nodes using VDD_INT for Exynos3250
  ARM: dts: exynos: Add DMC bus frequency for exynos3250-rinato/monk
  ARM: dts: exynos: Add DMC bus node for Exynos3250
  ARM: tegra: Enable XUSB on Nyan
  ARM: tegra: Enable XUSB on Jetson TK1
  ARM: tegra: Enable XUSB on Venice2
  ARM: tegra: Add Tegra124 XUSB controller
  ARM: tegra: Move Tegra124 to the new XUSB pad controller binding
  ARM: dts: r8a7794: Use SYSC "always-on" PM Domain
  ARM: dts: r8a7793: Use SYSC "always-on" PM Domain
  ...

8 years agoMerge tag 'asm-generic-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd...
Linus Torvalds [Tue, 24 May 2016 22:24:37 +0000 (15:24 -0700)]
Merge tag 'asm-generic-4.7' of git://git./linux/kernel/git/arnd/asm-generic

Pull asm-generic cleanup from Arnd Bergmann:
 "I have only one patch for asm-generic in this release, this one is
  from James Hogan and updates the generic system call table for
  renameat2 so we don't need to provide both renameat and renameat2 in
  newly added architectures"

* tag 'asm-generic-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  asm-generic: Drop renameat syscall from default list

8 years agoMerge tag 'nfsd-4.7' of git://linux-nfs.org/~bfields/linux
Linus Torvalds [Tue, 24 May 2016 21:39:20 +0000 (14:39 -0700)]
Merge tag 'nfsd-4.7' of git://linux-nfs.org/~bfields/linux

Pull nfsd updates from Bruce Fields:
 "A very quiet cycle for nfsd, mainly just an RDMA update from Chuck
  Lever"

* tag 'nfsd-4.7' of git://linux-nfs.org/~bfields/linux:
  sunrpc: fix stripping of padded MIC tokens
  svcrpc: autoload rdma module
  svcrdma: Generalize svc_rdma_xdr_decode_req()
  svcrdma: Eliminate code duplication in svc_rdma_recvfrom()
  svcrdma: Drain QP before freeing svcrdma_xprt
  svcrdma: Post Receives only for forward channel requests
  svcrdma: Remove superfluous line from rdma_read_chunks()
  svcrdma: svc_rdma_put_context() is invoked twice in Send error path
  svcrdma: Do not add XDR padding to xdr_buf page vector
  svcrdma: Support IPv6 with NFS/RDMA
  nfsd: handle seqid wraparound in nfsd4_preprocess_layout_stateid
  Remove unnecessary allocation

8 years agoMerge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso...
Linus Torvalds [Tue, 24 May 2016 19:55:26 +0000 (12:55 -0700)]
Merge tag 'ext4_for_linus' of git://git./linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "Fix a number of bugs, most notably a potential stale data exposure
  after a crash and a potential BUG_ON crash if a file has the data
  journalling flag enabled while it has dirty delayed allocation blocks
  that haven't been written yet.  Also fix a potential crash in the new
  project quota code and a maliciously corrupted file system.

  In addition, fix some DAX-specific bugs, including when there is a
  transient ENOSPC situation and races between writes via direct I/O and
  an mmap'ed segment that could lead to lost I/O.

  Finally the usual set of miscellaneous cleanups"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (23 commits)
  ext4: pre-zero allocated blocks for DAX IO
  ext4: refactor direct IO code
  ext4: fix race in transient ENOSPC detection
  ext4: handle transient ENOSPC properly for DAX
  dax: call get_blocks() with create == 1 for write faults to unwritten extents
  ext4: remove unmeetable inconsisteny check from ext4_find_extent()
  jbd2: remove excess descriptions for handle_s
  ext4: remove unnecessary bio get/put
  ext4: silence UBSAN in ext4_mb_init()
  ext4: address UBSAN warning in mb_find_order_for_block()
  ext4: fix oops on corrupted filesystem
  ext4: fix check of dqget() return value in ext4_ioctl_setproject()
  ext4: clean up error handling when orphan list is corrupted
  ext4: fix hang when processing corrupted orphaned inode list
  ext4: remove trailing \n from ext4_warning/ext4_error calls
  ext4: fix races between changing inode journal mode and ext4_writepages
  ext4: handle unwritten or delalloc buffers before enabling data journaling
  ext4: fix jbd2 handle extension in ext4_ext_truncate_extend_restart()
  ext4: do not ask jbd2 to write data for delalloc buffers
  jbd2: add support for avoiding data writes during transaction commits
  ...

8 years agoARM: lpc32xx: fix NR_IRQS confict
Arnd Bergmann [Thu, 19 May 2016 08:31:31 +0000 (10:31 +0200)]
ARM: lpc32xx: fix NR_IRQS confict

With the change to sparse IRQs, the lpc32xx platform gets a warning about
conflicting macros:

In file included from arch/arm/mach-lpc32xx/irq.c:31:0:
arch/arm/mach-lpc32xx/include/mach/irqs.h:115:0: warning: "NR_IRQS" redefined
 #define NR_IRQS    96
arch/arm/include/asm/irq.h:9:0: note: this is the location of the previous definition
 #define NR_IRQS NR_IRQS_LEGACY

One such instance was in the old irq driver that is now removed by
the previous patch, but any other file including mach/irqs.h still
has the issue. Since none of them use this constant, we can just
remove the old definition.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 8cb17b5ed017 ("irqchip: Add LPC32xx interrupt controller driver")

8 years agoARM: lpc32xx: remove legacy irq controller driver
Vladimir Zapolskiy [Mon, 25 Apr 2016 01:00:44 +0000 (04:00 +0300)]
ARM: lpc32xx: remove legacy irq controller driver

New NXP LPC32xx irq chip driver is used instead of a legacy one.

[this also fixes a harmless build warning about the NR_IRQS redefinition]

Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
Acked-by: Sylvain Lemieux <slemieux.tyco@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
8 years agoMerge tag 'spi-v4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Linus Torvalds [Tue, 24 May 2016 18:12:32 +0000 (11:12 -0700)]
Merge tag 'spi-v4.7' of git://git./linux/kernel/git/broonie/spi

Pull spi updates from Mark Brown:
 "Another quiet release for SPI, almost entirely driver specific changes
  with the diffstat dominated by two new drivers which are about two
  thirds of it in terms of lines of code:

   - new drivers for PIC32 standard and SQI controllers
   - the Cadence driver has had runtime PM support added and quite a few
     fixes and cleanups
   - flash-specific accelerated path support now has a feature query
     interface
   - the pxa2xx driver has been moved to use the core DMA mapping support"

* tag 'spi-v4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (48 commits)
  spi: pic32-sqi: Fix linker error, undefined reference to `bad_dma_ops'
  spi: dw-pci: Spelling s/paltforms/platforms/g
  spi: pic32-sqi: Remove pic32_sqi_setup and pic32_sqi_cleanup
  spi: Fix simple typo s/impelment/implement
  spi: rockchip: potential NULL dereference on error
  spi: zynqmp: disable clocks in error paths
  spi: Drop unnecessary dependencies on relaxed I/O accessors
  spi: qup: Add spi_master_put in remove function
  spi: qup: Handle clocks in pm_runtime suspend and resume
  spi: st-ssc4: Fix missing spi_master_put in spi_st_probe error paths
  spi: st-ssc4: Allow compile test build
  spi: omap2-mcspi: Use dma_request_chan() for requesting DMA channel
  spi: davinci: Use dma_request_chan() for requesting DMA channel
  spi: pic32: Fix checking return value of devm_ioremap_resource
  spi: spi-fsl-dspi: Update DT binding documentation
  spi: Drop duplicate code to set master->dev.parent
  spi: pic32: Set proper bits_per_word_mask
  spi: return error if kmap'd buffers passed to spi_map_buf()
  spi: core: add hook flash_read_supported to spi_master
  spi: pic32-sqi: silence array overflow warning
  ...

8 years agoMerge tag 'for-linus-20160523' of git://git.infradead.org/linux-mtd
Linus Torvalds [Tue, 24 May 2016 18:00:20 +0000 (11:00 -0700)]
Merge tag 'for-linus-20160523' of git://git.infradead.org/linux-mtd

Pull MTD updates from Brian Norris:
 "First cycle with Boris as NAND maintainer! Many (most) bullets stolen
  from him.

  Generic:
   - Migrated NAND LED trigger to be a generic MTD trigger

  NAND:
   - Introduction of the "ECC algorithm" concept, to avoid overloading
     the ECC mode field too much more
   - Replaced the nand_ecclayout infrastructure with something a little
     more flexible (finally!) and future proof
   - Rework of the OMAP GPMC and NAND drivers; the TI folks pulled some
     of this into their own tree as well
   - Prepare the sunxi NAND driver to receive DMA support
   - Handle bitflips in erased pages on GPMI revisions that do not
     support this in hardware.

  SPI NOR:
   - Start using the spi_flash_read() API for SPI drivers that support
     it (i.e., SPI drivers with special memory-mapped flash modes)

  And other small scattered improvments"

* tag 'for-linus-20160523' of git://git.infradead.org/linux-mtd: (155 commits)
  mtd: spi-nor: support GigaDevice gd25lq64c
  mtd: nand_bch: fix spelling of "probably"
  mtd: brcmnand: respect ECC algorithm set by NAND subsystem
  gpmi-nand: Handle ECC Errors in erased pages
  Documentation: devicetree: deprecate "soft_bch" nand-ecc-mode value
  mtd: nand: add support for "nand-ecc-algo" DT property
  mtd: mtd: drop NAND_ECC_SOFT_BCH enum value
  mtd: drop support for NAND_ECC_SOFT_BCH as "soft_bch" mapping
  mtd: nand: read ECC algorithm from the new field
  mtd: nand: fsmc: validate ECC setup by checking algorithm directly
  mtd: nand: set ECC algorithm to Hamming on fallback
  staging: mt29f_spinand: set ECC algorithm explicitly
  CRIS v32: nand: set ECC algorithm explicitly
  mtd: nand: atmel: set ECC algorithm explicitly
  mtd: nand: davinci: set ECC algorithm explicitly
  mtd: nand: bf5xx: set ECC algorithm explicitly
  mtd: nand: omap2: Fix high memory dma prefetch transfer
  mtd: nand: omap2: Start dma request before enabling prefetch
  mtd: nandsim: add __init attribute
  mtd: nand: move of_get_nand_xxx() helpers into nand_base.c
  ...

8 years agoMerge tag 'for-linus-4.7-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Tue, 24 May 2016 17:22:34 +0000 (10:22 -0700)]
Merge tag 'for-linus-4.7-rc0-tag' of git://git./linux/kernel/git/xen/tip

Pull xen bug fixes from David Vrabel.

* tag 'for-linus-4.7-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: use same main loop for counting and remapping pages
  xen/events: Don't move disabled irqs
  xen/x86: actually allocate legacy interrupts on PV guests
  Xen: don't warn about 2-byte wchar_t in efi
  xen/gntdev: reduce copy batch size to 16
  xen/x86: don't lose event interrupts

8 years agoMerge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Linus Torvalds [Tue, 24 May 2016 16:46:45 +0000 (09:46 -0700)]
Merge tag 'for_linus' of git://git./linux/kernel/git/mst/vhost

Pull virtio updates from Michael Tsirkin:
 "Looks like a quiet cycle for virtio.  There's a new inorder option for
  the ringtest tool, and a bugfix for balloon for ppc platforms when
  using virtio 1 mode"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  ringtest: pass buf != NULL
  virtio_balloon: fix PFN format for virtio-1
  virtio: add inorder option

8 years agoMerge tag 'nios2-v4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2
Linus Torvalds [Tue, 24 May 2016 16:41:46 +0000 (09:41 -0700)]
Merge tag 'nios2-v4.7' of git://git./linux/kernel/git/lftan/nios2

Pull nios2 update from Ley Foon Tan:
 - add order-only DTC dependency to %.dtb target
 - fix libgcc location detection

* tag 'nios2-v4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2:
  nios2: Add order-only DTC dependency to %.dtb target
  nios2: Fix libgcc location detection

8 years agoMerge tag 'microblaze-4.7-rc1' of git://git.monstr.eu/linux-2.6-microblaze
Linus Torvalds [Tue, 24 May 2016 16:19:38 +0000 (09:19 -0700)]
Merge tag 'microblaze-4.7-rc1' of git://git.monstr.eu/linux-2.6-microblaze

Pull Microblaze updates from Michal Simek:

 - Wire-up new syscalls

 - Fix link error

* tag 'microblaze-4.7-rc1' of git://git.monstr.eu/linux-2.6-microblaze:
  microblaze: pci: export isa_io_base to fix link errors
  microblaze: Wire up userfaultfd, membarrier, mlock2 syscalls

8 years agoxen: use same main loop for counting and remapping pages
Juergen Gross [Wed, 18 May 2016 14:44:54 +0000 (16:44 +0200)]
xen: use same main loop for counting and remapping pages

Instead of having two functions for cycling through the E820 map in
order to count to be remapped pages and remap them later, just use one
function with a caller supplied sub-function called for each region to
be processed. This eliminates the possibility of a mismatch between
both loops which showed up in certain configurations.

Suggested-by: Ed Swierk <eswierk@skyportsystems.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
8 years agoxen/events: Don't move disabled irqs
Ross Lagerwall [Tue, 10 May 2016 15:11:00 +0000 (16:11 +0100)]
xen/events: Don't move disabled irqs

Commit ff1e22e7a638 ("xen/events: Mask a moving irq") open-coded
irq_move_irq() but left out checking if the IRQ is disabled. This broke
resuming from suspend since it tries to move a (disabled) irq without
holding the IRQ's desc->lock. Fix it by adding in a check for disabled
IRQs.

The resulting stacktrace was:
kernel BUG at /build/linux-UbQGH5/linux-4.4.0/kernel/irq/migration.c:31!
invalid opcode: 0000 [#1] SMP
Modules linked in: xenfs xen_privcmd ...
CPU: 0 PID: 9 Comm: migration/0 Not tainted 4.4.0-22-generic #39-Ubuntu
Hardware name: Xen HVM domU, BIOS 4.6.1-xs125180 05/04/2016
task: ffff88003d75ee00 ti: ffff88003d7bc000 task.ti: ffff88003d7bc000
RIP: 0010:[<ffffffff810e26e2>]  [<ffffffff810e26e2>] irq_move_masked_irq+0xd2/0xe0
RSP: 0018:ffff88003d7bfc50  EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffff88003d40ba00 RCX: 0000000000000001
RDX: 0000000000000001 RSI: 0000000000000100 RDI: ffff88003d40bad8
RBP: ffff88003d7bfc68 R08: 0000000000000000 R09: ffff88003d000000
R10: 0000000000000000 R11: 000000000000023c R12: ffff88003d40bad0
R13: ffffffff81f3a4a0 R14: 0000000000000010 R15: 00000000ffffffff
FS:  0000000000000000(0000) GS:ffff88003da00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fd4264de624 CR3: 0000000037922000 CR4: 00000000003406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Stack:
 ffff88003d40ba38 0000000000000024 0000000000000000 ffff88003d7bfca0
 ffffffff814c8d92 00000010813ef89d 00000000805ea732 0000000000000009
 0000000000000024 ffff88003cc39b80 ffff88003d7bfce0 ffffffff814c8f66
Call Trace:
 [<ffffffff814c8d92>] eoi_pirq+0xb2/0xf0
 [<ffffffff814c8f66>] __startup_pirq+0xe6/0x150
 [<ffffffff814ca659>] xen_irq_resume+0x319/0x360
 [<ffffffff814c7e75>] xen_suspend+0xb5/0x180
 [<ffffffff81120155>] multi_cpu_stop+0xb5/0xe0
 [<ffffffff811200a0>] ? cpu_stop_queue_work+0x80/0x80
 [<ffffffff811203d0>] cpu_stopper_thread+0xb0/0x140
 [<ffffffff810a94e6>] ? finish_task_switch+0x76/0x220
 [<ffffffff810ca731>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
 [<ffffffff810a3935>] smpboot_thread_fn+0x105/0x160
 [<ffffffff810a3830>] ? sort_range+0x30/0x30
 [<ffffffff810a0588>] kthread+0xd8/0xf0
 [<ffffffff810a04b0>] ? kthread_create_on_node+0x1e0/0x1e0
 [<ffffffff8182568f>] ret_from_fork+0x3f/0x70
 [<ffffffff810a04b0>] ? kthread_create_on_node+0x1e0/0x1e0

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
8 years agoxen/x86: actually allocate legacy interrupts on PV guests
Stefano Stabellini [Wed, 20 Apr 2016 13:15:01 +0000 (14:15 +0100)]
xen/x86: actually allocate legacy interrupts on PV guests

b4ff8389ed14 is incomplete: relies on nr_legacy_irqs() to get the number
of legacy interrupts when actually nr_legacy_irqs() returns 0 after
probe_8259A(). Use NR_IRQS_LEGACY instead.

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
CC: stable@vger.kernel.org
8 years agoXen: don't warn about 2-byte wchar_t in efi
Arnd Bergmann [Wed, 11 May 2016 12:47:59 +0000 (14:47 +0200)]
Xen: don't warn about 2-byte wchar_t in efi

The XEN UEFI code has become available on the ARM architecture
recently, but now causes a link-time warning:

ld: warning: drivers/xen/efi.o uses 2-byte wchar_t yet the output is to use 4-byte wchar_t; use of wchar_t values across objects may fail

This seems harmless, because the efi code only uses 2-byte
characters when interacting with EFI, so we don't pass on those
strings to elsewhere in the system, and we just need to
silence the warning.

It is not clear to me whether we actually need to build the file
with the -fshort-wchar flag, but if we do, then we should also
pass --no-wchar-size-warning to the linker, to avoid the warning.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Fixes: 37060935dc04 ("ARM64: XEN: Add a function to initialize Xen specific UEFI runtime services")

8 years agoxen/gntdev: reduce copy batch size to 16
David Vrabel [Mon, 9 May 2016 09:59:48 +0000 (10:59 +0100)]
xen/gntdev: reduce copy batch size to 16

IOCTL_GNTDEV_GRANT_COPY batches copy operations to reduce the number
of hypercalls.  The stack is used to avoid a memory allocation in a
hot path. However, a batch size of 24 requires more than 1024 bytes of
stack which in some configurations causes a compiler warning.

    xen/gntdev.c: In function â€˜gntdev_ioctl_grant_copy’:
    xen/gntdev.c:949:1: warning: the frame size of 1248 bytes is
    larger than 1024 bytes [-Wframe-larger-than=]

This is a harmless warning as there is still plenty of stack spare,
but people keep trying to "fix" it.  Reduce the batch size to 16 to
reduce stack usage to less than 1024 bytes.  This should have minimal
impact on performance.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
8 years agoxen/x86: don't lose event interrupts
Stefano Stabellini [Sat, 16 Apr 2016 01:23:00 +0000 (18:23 -0700)]
xen/x86: don't lose event interrupts

On slow platforms with unreliable TSC, such as QEMU emulated machines,
it is possible for the kernel to request the next event in the past. In
that case, in the current implementation of xen_vcpuop_clockevent, we
simply return -ETIME. To be precise the Xen returns -ETIME and we pass
it on. However the result of this is a missed event, which simply causes
the kernel to hang.

Instead it is better to always ask the hypervisor for a timer event,
even if the timeout is in the past. That way there are no lost
interrupts and the kernel survives. To do that, remove the
VCPU_SSHOTTMR_future flag.

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Juergen Gross <jgross@suse.com>
8 years agoMerge tag 'perf-core-for-mingo-20160523' of git://git.kernel.org/pub/scm/linux/kernel...
Ingo Molnar [Tue, 24 May 2016 05:40:52 +0000 (07:40 +0200)]
Merge tag 'perf-core-for-mingo-20160523' of git://git./linux/kernel/git/acme/linux into perf/urgent

Pull perf/core improvements from Arnaldo Carvalho de Melo:

User visible changes:

- Add "srcline_from" and "srcline_to" branch sort keys to 'perf top' and
  'perf report' (Andi Kleen)

Infrastructure changes:

- Make 'perf trace' auto-attach fd->name and ptr->name beautifiers based
  on the name of syscall arguments, this way new syscalls that have
  'const char * (path,pathname,filename)' will use the fd->name beautifier
  (vfs_getname perf probe, if in place) and the 'fd->name' (vfs_getname
  or via /proc/PID/fd/) (Arnaldo Carvalho de Melo)

- Infrastructure to read from a ring buffer in backward write mode (Wang Nan)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
8 years agoheaders_check: don't warn about c++ guards
Arnd Bergmann [Wed, 18 May 2016 16:07:29 +0000 (18:07 +0200)]
headers_check: don't warn about c++ guards

A recent addition to the DRM tree for 4.7 added 'extern "C"' guards
for c++ to all the DRM headers, and that now causes warnings
in 'make headers_check':

usr/include/drm/amdgpu_drm.h:38: userspace cannot reference function or variable defined in the kernel
usr/include/drm/drm.h:63: userspace cannot reference function or variable defined in the kernel
usr/include/drm/drm.h:699: userspace cannot reference function or variable defined in the kernel
usr/include/drm/drm_fourcc.h:30: userspace cannot reference function or variable defined in the kernel
usr/include/drm/drm_mode.h:33: userspace cannot reference function or variable defined in the kernel
usr/include/drm/drm_sarea.h:38: userspace cannot reference function or variable defined in the kernel
usr/include/drm/exynos_drm.h:21: userspace cannot reference function or variable defined in the kernel
usr/include/drm/i810_drm.h:7: userspace cannot reference function or variable defined in the kernel

This changes the headers_check.pl script to not warn about this.
I'm listing the merge commit as introducing the problem, because
there are several patches in this branch that each do this for
one file.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 7c10ddf87472 ("Merge branch 'drm-uapi-extern-c-fixes' of https://github.com/evelikov/linux into drm-next")
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
8 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Tue, 24 May 2016 02:42:28 +0000 (19:42 -0700)]
Merge branch 'akpm' (patches from Andrew)

Merge yet more updates from Andrew Morton:

 - Oleg's "wait/ptrace: assume __WALL if the child is traced".  It's a
   kernel-based workaround for existing userspace issues.

 - A few hotfixes

 - befs cleanups

 - nilfs2 updates

 - sys_wait() changes

 - kexec updates

 - kdump

 - scripts/gdb updates

 - the last of the MM queue

 - a few other misc things

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (84 commits)
  kgdb: depends on VT
  drm/amdgpu: make amdgpu_mn_get wait for mmap_sem killable
  drm/radeon: make radeon_mn_get wait for mmap_sem killable
  drm/i915: make i915_gem_mmap_ioctl wait for mmap_sem killable
  uprobes: wait for mmap_sem for write killable
  prctl: make PR_SET_THP_DISABLE wait for mmap_sem killable
  exec: make exec path waiting for mmap_sem killable
  aio: make aio_setup_ring killable
  coredump: make coredump_wait wait for mmap_sem for write killable
  vdso: make arch_setup_additional_pages wait for mmap_sem for write killable
  ipc, shm: make shmem attach/detach wait for mmap_sem killable
  mm, fork: make dup_mmap wait for mmap_sem for write killable
  mm, proc: make clear_refs killable
  mm: make vm_brk killable
  mm, elf: handle vm_brk error
  mm, aout: handle vm_brk failures
  mm: make vm_munmap killable
  mm: make vm_mmap killable
  mm: make mmap_sem for write waits killable for mm syscalls
  MAINTAINERS: add co-maintainer for scripts/gdb
  ...

8 years agoMerge tag 'linux-kselftest-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Tue, 24 May 2016 02:37:41 +0000 (19:37 -0700)]
Merge tag 'linux-kselftest-4.7-rc1' of git://git./linux/kernel/git/shuah/linux-kselftest

Pull kselftest updates from Shuah Khan:
 "This update for Kselftest adds:

   - a new ftrace testcase
   - fixes for ftrace and intel_pstate tests"

* tag 'linux-kselftest-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  tools: testing: define the _GNU_SOURCE macro
  kselftests/ftrace: Add a test case for event pid filtering
  kselftests/ftrace: Detect tracefs mount point

8 years agoMerge tag 'trace-v4.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt...
Linus Torvalds [Tue, 24 May 2016 02:30:30 +0000 (19:30 -0700)]
Merge tag 'trace-v4.7-3' of git://git./linux/kernel/git/rostedt/linux-trace

Pull tracing fix from Steven Rostedt:
 "Reviewing the selftest I recently submitted, I realize that the second
  part of it uses my old hack to get the PID of the spawned background
  tasks, which doesn't work for all shells, instead of the common use of
  $!"

* tag 'trace-v4.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ftracetest: Use proper logic to find process PID

8 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile
Linus Torvalds [Tue, 24 May 2016 02:05:11 +0000 (19:05 -0700)]
Merge git://git./linux/kernel/git/cmetcalf/linux-tile

Pull arch/tile updates from Chris Metcalf:
 "This is an even quieter cycle than usual"

* git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
  Fix typo
  Fix typo
  Fix typo
  tile: sort the "select" lines in the TILE/TILEGX configs
  tile: clarify barrier semantics of atomic_add_return
  tile/defconfigs: Remove CONFIG_IPV6_PRIVACY

8 years agoMerge branch 'for-4.7-dw' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata
Linus Torvalds [Tue, 24 May 2016 01:19:21 +0000 (18:19 -0700)]
Merge branch 'for-4.7-dw' of git://git./linux/kernel/git/tj/libata

Pull libata sata_dwc_460ex updates from Tejun Heo:
 "Patches to bring sata_dwc_460ex up to snuff.

  It was a separate pull request because it depends on dmaengine dw
  platform changes which are now in mainline"

* 'for-4.7-dw' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: (24 commits)
  ata: dwc: add DMADEVICES dependency
  powerpc/4xx: Device tree update for the 460ex DWC SATA
  ata: sata_dwc_460ex: make debug messages neat
  ata: sata_dwc_460ex: supply physical address of FIFO to DMA
  ata: sata_dwc_460ex: use devm_ioremap
  ata: sata_dwc_460ex: tidy up sata_dwc_clear_dmacr()
  ata: sata_dwc_460ex: use readl/writel_relaxed()
  ata: sata_dwc_460ex: switch to new dmaengine_terminate_* API
  ata: sata_dwc_460ex: add __iomem to register base pointer
  ata: sata_dwc_460ex: get rid of incorrect cast
  ata: sata_dwc_460ex: get rid of some pointless casts
  ata: sata_dwc_460ex: remove empty libata callback
  ata: sata_dwc_460ex: correct HOSTDEV{P}_FROM_*() macros
  ata: sata_dwc_460ex: get rid of global data
  ata: sata_dwc_460ex: add phy support
  ata: sata_dwc_460ex: use "dmas" DT property to find dma channel
  ata: sata_dwc_460ex: don't call ata_sff_qc_issue() on DMA commands
  ata: sata_dwc_460ex: skip dma setup for non-dma commands
  ata: sata_dwc_460ex: select only core part of DMA driver
  ata: sata_dwc_460ex: DMA is always a flow controller
  ...

8 years agoMerge branch 'for-4.7-zac' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata
Linus Torvalds [Tue, 24 May 2016 00:53:39 +0000 (17:53 -0700)]
Merge branch 'for-4.7-zac' of git://git./linux/kernel/git/tj/libata

Pull libata ZAC support from Tejun Heo:
 "This contains Zone ATA Command support for Shingled Magnetic Recording
  devices.

  In addition to sending the new commands down to the device, as ZAC
  commands depend on getting a lot of responses from the device, piping
  up responses is beefed up too.  However, it doesn't involve changes to
  libata core mechanism or its interaction with upper layers, so I'm not
  expecting too many fallouts.

  Kudos to Hannes for driving SMR support"

* 'for-4.7-zac' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: (28 commits)
  libata: support host-aware and host-managed ZAC devices
  libata: support device-managed ZAC devices
  libata: NCQ encapsulation for ZAC MANAGEMENT OUT
  libata: Implement ZBC OUT translation
  libata: implement ZBC IN translation
  libata: fixup ZAC device disabling
  libata-scsi: Generate sense code for disabled devices
  libata-trace: decode subcommands
  libata: Check log page directory before accessing pages
  libata: Add command definitions for NCQ Encapsulation for READ LOG DMA EXT
  libata: Separate out ata_dev_config_ncq_send_recv()
  libata/libsas: Define ATA_CMD_NCQ_NON_DATA
  libsas: enable FPDMA SEND/RECEIVE
  libata: do not attempt to retrieve sense code twice
  libata-scsi: Set information sense field for invalid parameter
  libata-scsi: set bit pointer for sense code information
  libata-scsi: Set field pointer in sense code
  scsi: add scsi_set_sense_field_pointer()
  libata: Implement control mode page to select sense format
  libata-scsi: generate correct ATA pass-through sense
  ...

8 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris...
Linus Torvalds [Tue, 24 May 2016 00:26:27 +0000 (17:26 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/linux-security

Pull more security subsystem updates from James Morris:
 "Minor updates for the keys code"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  MAINTAINERS: Update keyrings record and add asymmetric keys record
  lib: asn1_decoder - add MODULE_LICENSE("GPL")
  KEYS: The PKCS#7 test key type should use the secondary keyring

8 years agokgdb: depends on VT
Jiri Slaby [Mon, 23 May 2016 23:26:20 +0000 (16:26 -0700)]
kgdb: depends on VT

With VT=n, the kernel build fails with:

  drivers/built-in.o: In function `kgdboc_pre_exp_handler':
  kgdboc.c:(.text+0x7b5aa): undefined reference to `fg_console'
  kgdboc.c:(.text+0x7b5ce): undefined reference to `vc_cons'
  kgdboc.c:(.text+0x7b5d5): undefined reference to `vc_cons'

kgdboc.o is built when KGDB_SERIAL_CONSOLE is set.  So make
KGDB_SERIAL_CONSOLE depend on HW_CONSOLE which includes those symbols.

Link: http://lkml.kernel.org/r/1459412955-4696-1-git-send-email-jslaby@suse.cz
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Reported-by: "Jim Davis" <jim.epost@gmail.com>
Acked-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agodrm/amdgpu: make amdgpu_mn_get wait for mmap_sem killable
Michal Hocko [Mon, 23 May 2016 23:26:17 +0000 (16:26 -0700)]
drm/amdgpu: make amdgpu_mn_get wait for mmap_sem killable

amdgpu_mn_get which is called during ioct path relies on mmap_sem for
write.  If the waiting task gets killed by the oom killer it would block
oom_reaper from asynchronous address space reclaim and reduce the
chances of timely OOM resolving.  Wait for the lock in the killable mode
and return with EINTR if the task got killed while waiting.

[arnd@arndb.de: use ERR_PTR() to return from amdgpu_mn_get]
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Christian König <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agodrm/radeon: make radeon_mn_get wait for mmap_sem killable
Michal Hocko [Mon, 23 May 2016 23:26:14 +0000 (16:26 -0700)]
drm/radeon: make radeon_mn_get wait for mmap_sem killable

radeon_mn_get which is called during ioct path relies on mmap_sem for
write.  If the waiting task gets killed by the oom killer it would block
oom_reaper from asynchronous address space reclaim and reduce the
chances of timely OOM resolving.  Wait for the lock in the killable mode
and return with EINTR if the task got killed while waiting.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: David Airlie <airlied@linux.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agodrm/i915: make i915_gem_mmap_ioctl wait for mmap_sem killable
Michal Hocko [Mon, 23 May 2016 23:26:11 +0000 (16:26 -0700)]
drm/i915: make i915_gem_mmap_ioctl wait for mmap_sem killable

i915_gem_mmap_ioctl relies on mmap_sem for write.  If the waiting task
gets killed by the oom killer it would block oom_reaper from
asynchronous address space reclaim and reduce the chances of timely OOM
resolving.  Wait for the lock in the killable mode and return with EINTR
if the task got killed while waiting.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: David Airlie <airlied@linux.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agouprobes: wait for mmap_sem for write killable
Michal Hocko [Mon, 23 May 2016 23:26:08 +0000 (16:26 -0700)]
uprobes: wait for mmap_sem for write killable

xol_add_vma needs mmap_sem for write.  If the waiting task gets killed
by the oom killer it would block oom_reaper from asynchronous address
space reclaim and reduce the chances of timely OOM resolving.  Wait for
the lock in the killable mode and return with EINTR if the task got
killed while waiting.

Do not warn in dup_xol_work if __create_xol_area failed due to fatal
signal pending because this is usually considered a kernel issue.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoprctl: make PR_SET_THP_DISABLE wait for mmap_sem killable
Michal Hocko [Mon, 23 May 2016 23:26:05 +0000 (16:26 -0700)]
prctl: make PR_SET_THP_DISABLE wait for mmap_sem killable

PR_SET_THP_DISABLE requires mmap_sem for write.  If the waiting task
gets killed by the oom killer it would block oom_reaper from
asynchronous address space reclaim and reduce the chances of timely OOM
resolving.  Wait for the lock in the killable mode and return with EINTR
if the task got killed while waiting.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Alex Thorlton <athorlton@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoexec: make exec path waiting for mmap_sem killable
Michal Hocko [Mon, 23 May 2016 23:26:02 +0000 (16:26 -0700)]
exec: make exec path waiting for mmap_sem killable

setup_arg_pages requires mmap_sem for write.  If the waiting task gets
killed by the oom killer it would block oom_reaper from asynchronous
address space reclaim and reduce the chances of timely OOM resolving.
Wait for the lock in the killable mode and return with EINTR if the task
got killed while waiting.  All the callers are already handling error
path and the fatal signal doesn't need any additional treatment.

The same applies to __bprm_mm_init.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoaio: make aio_setup_ring killable
Michal Hocko [Mon, 23 May 2016 23:25:59 +0000 (16:25 -0700)]
aio: make aio_setup_ring killable

aio_setup_ring waits for mmap_sem in writable mode.  If the waiting task
gets killed by the oom killer it would block oom_reaper from
asynchronous address space reclaim and reduce the chances of timely OOM
resolving.  Wait for the lock in the killable mode and return with EINTR
if the task got killed while waiting.  This will also expedite the
return to the userspace and do_exit.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Benamin LaHaise <bcrl@kvack.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agocoredump: make coredump_wait wait for mmap_sem for write killable
Michal Hocko [Mon, 23 May 2016 23:25:57 +0000 (16:25 -0700)]
coredump: make coredump_wait wait for mmap_sem for write killable

coredump_wait waits for mmap_sem for write currently which can prevent
oom_reaper to reclaim the oom victims address space asynchronously
because that requires mmap_sem for read.  This might happen if the oom
victim is multi threaded and some thread(s) is holding mmap_sem for read
(e.g.  page fault) and it is stuck in the page allocator while other
thread(s) reached coredump_wait already.

This patch simply uses down_write_killable and bails out with EINTR if
the lock got interrupted by the fatal signal.  do_coredump will return
right away and do_group_exit will take care to zap the whole thread
group.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agovdso: make arch_setup_additional_pages wait for mmap_sem for write killable
Michal Hocko [Mon, 23 May 2016 23:25:54 +0000 (16:25 -0700)]
vdso: make arch_setup_additional_pages wait for mmap_sem for write killable

most architectures are relying on mmap_sem for write in their
arch_setup_additional_pages.  If the waiting task gets killed by the oom
killer it would block oom_reaper from asynchronous address space reclaim
and reduce the chances of timely OOM resolving.  Wait for the lock in
the killable mode and return with EINTR if the task got killed while
waiting.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Andy Lutomirski <luto@amacapital.net> [x86 vdso]
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoipc, shm: make shmem attach/detach wait for mmap_sem killable
Michal Hocko [Mon, 23 May 2016 23:25:51 +0000 (16:25 -0700)]
ipc, shm: make shmem attach/detach wait for mmap_sem killable

shmat and shmdt rely on mmap_sem for write.  If the waiting task gets
killed by the oom killer it would block oom_reaper from asynchronous
address space reclaim and reduce the chances of timely OOM resolving.
Wait for the lock in the killable mode and return with EINTR if the task
got killed while waiting.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agomm, fork: make dup_mmap wait for mmap_sem for write killable
Michal Hocko [Mon, 23 May 2016 23:25:48 +0000 (16:25 -0700)]
mm, fork: make dup_mmap wait for mmap_sem for write killable

dup_mmap needs to lock current's mm mmap_sem for write.  If the waiting
task gets killed by the oom killer it would block oom_reaper from
asynchronous address space reclaim and reduce the chances of timely OOM
resolving.  Wait for the lock in the killable mode and return with EINTR
if the task got killed while waiting.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agomm, proc: make clear_refs killable
Michal Hocko [Mon, 23 May 2016 23:25:45 +0000 (16:25 -0700)]
mm, proc: make clear_refs killable

CLEAR_REFS_MM_HIWATER_RSS and CLEAR_REFS_SOFT_DIRTY are relying on
mmap_sem for write.  If the waiting task gets killed by the oom killer
and it would operate on the current's mm it would block oom_reaper from
asynchronous address space reclaim and reduce the chances of timely OOM
resolving.  Wait for the lock in the killable mode and return with EINTR
if the task got killed while waiting.  This will also expedite the
return to the userspace and do_exit even if the mm is remote.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Petr Cermak <petrcermak@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agomm: make vm_brk killable
Michal Hocko [Mon, 23 May 2016 23:25:42 +0000 (16:25 -0700)]
mm: make vm_brk killable

Now that all the callers handle vm_brk failure we can change it wait for
mmap_sem killable to help oom_reaper to not get blocked just because
vm_brk gets blocked behind mmap_sem readers.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agomm, elf: handle vm_brk error
Michal Hocko [Mon, 23 May 2016 23:25:39 +0000 (16:25 -0700)]
mm, elf: handle vm_brk error

load_elf_library doesn't handle vm_brk failure although nothing really
indicates it cannot do that because the function is allowed to fail due
to vm_mmap failures already.  This might be not a problem now but later
patch will make vm_brk killable (resp.  mmap_sem for write waiting will
become killable) and so the failure will be more probable.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agomm, aout: handle vm_brk failures
Michal Hocko [Mon, 23 May 2016 23:25:36 +0000 (16:25 -0700)]
mm, aout: handle vm_brk failures

vm_brk is allowed to fail but load_aout_binary simply ignores the error
and happily continues.  I haven't noticed any problem from that in real
life but later patches will make the failure more likely because vm_brk
will become killable (resp.  mmap_sem for write waiting will become
killable) so we should be more careful now.

The error handling should be quite straightforward because there are
calls to vm_mmap which check the error properly already.  The only
notable exception is set_brk which is called after beyond_if label.  But
nothing indicates that we cannot move it above set_binfmt as the two do
not depend on each other and fail before we do set_binfmt and alter
reference counting.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agomm: make vm_munmap killable
Michal Hocko [Mon, 23 May 2016 23:25:33 +0000 (16:25 -0700)]
mm: make vm_munmap killable

Almost all current users of vm_munmap are ignoring the return value and
so they do not handle potential error.  This means that some VMAs might
stay behind.  This patch doesn't try to solve those potential problems.
Quite contrary it adds a new failure mode by using down_write_killable
in vm_munmap.  This should be safer than other failure modes, though,
because the process is guaranteed to die as soon as it leaves the kernel
and exit_mmap will clean the whole address space.

This will help in the OOM conditions when the oom victim might be stuck
waiting for the mmap_sem for write which in turn can block oom_reaper
which relies on the mmap_sem for read to make a forward progress and
reclaim the address space of the victim.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agomm: make vm_mmap killable
Michal Hocko [Mon, 23 May 2016 23:25:30 +0000 (16:25 -0700)]
mm: make vm_mmap killable

All the callers of vm_mmap seem to check for the failure already and
bail out in one way or another on the error which means that we can
change it to use killable version of vm_mmap_pgoff and return -EINTR if
the current task gets killed while waiting for mmap_sem.  This also
means that vm_mmap_pgoff can be killable by default and drop the
additional parameter.

This will help in the OOM conditions when the oom victim might be stuck
waiting for the mmap_sem for write which in turn can block oom_reaper
which relies on the mmap_sem for read to make a forward progress and
reclaim the address space of the victim.

Please note that load_elf_binary is ignoring vm_mmap error for
current->personality & MMAP_PAGE_ZERO case but that shouldn't be a
problem because the address is not used anywhere and we never return to
the userspace if we got killed.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agomm: make mmap_sem for write waits killable for mm syscalls
Michal Hocko [Mon, 23 May 2016 23:25:27 +0000 (16:25 -0700)]
mm: make mmap_sem for write waits killable for mm syscalls

This is a follow up work for oom_reaper [1].  As the async OOM killing
depends on oom_sem for read we would really appreciate if a holder for
write didn't stood in the way.  This patchset is changing many of
down_write calls to be killable to help those cases when the writer is
blocked and waiting for readers to release the lock and so help
__oom_reap_task to process the oom victim.

Most of the patches are really trivial because the lock is help from a
shallow syscall paths where we can return EINTR trivially and allow the
current task to die (note that EINTR will never get to the userspace as
the task has fatal signal pending).  Others seem to be easy as well as
the callers are already handling fatal errors and bail and return to
userspace which should be sufficient to handle the failure gracefully.
I am not familiar with all those code paths so a deeper review is really
appreciated.

As this work is touching more areas which are not directly connected I
have tried to keep the CC list as small as possible and people who I
believed would be familiar are CCed only to the specific patches (all
should have received the cover though).

This patchset is based on linux-next and it depends on
down_write_killable for rw_semaphores which got merged into tip
locking/rwsem branch and it is merged into this next tree.  I guess it
would be easiest to route these patches via mmotm because of the
dependency on the tip tree but if respective maintainers prefer other
way I have no objections.

I haven't covered all the mmap_write(mm->mmap_sem) instances here

  $ git grep "down_write(.*\<mmap_sem\>)" next/master | wc -l
  98
  $ git grep "down_write(.*\<mmap_sem\>)" | wc -l
  62

I have tried to cover those which should be relatively easy to review in
this series because this alone should be a nice improvement.  Other
places can be changed on top.

[0] http://lkml.kernel.org/r/1456752417-9626-1-git-send-email-mhocko@kernel.org
[1] http://lkml.kernel.org/r/1452094975-551-1-git-send-email-mhocko@kernel.org
[2] http://lkml.kernel.org/r/1456750705-7141-1-git-send-email-mhocko@kernel.org

This patch (of 18):

This is the first step in making mmap_sem write waiters killable.  It
focuses on the trivial ones which are taking the lock early after
entering the syscall and they are not changing state before.

Therefore it is very easy to change them to use down_write_killable and
immediately return with -EINTR.  This will allow the waiter to pass away
without blocking the mmap_sem which might be required to make a forward
progress.  E.g.  the oom reaper will need the lock for reading to
dismantle the OOM victim address space.

The only tricky function in this patch is vm_mmap_pgoff which has many
call sites via vm_mmap.  To reduce the risk keep vm_mmap with the
original non-killable semantic for now.

vm_munmap callers do not bother checking the return value so open code
it into the munmap syscall path for now for simplicity.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoMAINTAINERS: add co-maintainer for scripts/gdb
Kieran Bingham [Mon, 23 May 2016 23:25:24 +0000 (16:25 -0700)]
MAINTAINERS: add co-maintainer for scripts/gdb

Add myself as a co-maintainer for scripts/gdb supporting Jan Kizka

Link: http://lkml.kernel.org/r/fb5d34ce563f33d2f324f26f592b24ded30032ee.1462865983.git.jan.kiszka@siemens.com
Signed-off-by: Kieran Bingham <kieran@bingham.xyz>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoscripts/gdb: decode bytestream on dmesg for Python3
Kieran Bingham [Mon, 23 May 2016 23:25:21 +0000 (16:25 -0700)]
scripts/gdb: decode bytestream on dmesg for Python3

The recent fixes to lx-dmesg, now allow the command to print
successfully on Python3, however the python interpreter wraps the bytes
for each line with a b'<text>' marker.

To remove this, we need to decode the line, where .decode() will default
to 'UTF-8'

Link: http://lkml.kernel.org/r/d67ccf93f2479c94cb3399262b9b796e0dbefcf2.1462865983.git.jan.kiszka@siemens.com
Signed-off-by: Kieran Bingham <kieran@bingham.xyz>
Acked-by: Dom Cote <buzdelabuz2@gmail.com>
Tested-by: Dom Cote <buzdelabuz2@gmail.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoscripts/gdb: fix issue with dmesg.py and python 3.X
Dom Cote [Mon, 23 May 2016 23:25:19 +0000 (16:25 -0700)]
scripts/gdb: fix issue with dmesg.py and python 3.X

When built against Python 3, GDB differs in the return type for its
read_memory function, causing the lx-dmesg command to fail.

Now that we have an improved read_16() we can use the new
read_memoryview() abstraction to make lx-dmesg return valid data on both
current Python APIs

Tested with python 3.4 and 2.7
Tested with gdb 7.7

Link: http://lkml.kernel.org/r/28477b727ff7fe3101fd4e426060e8a68317a639.1462865983.git.jan.kiszka@siemens.com
Signed-off-by: Dom Cote <buzdelabuz2+git@gmail.com>
[kieran@bingham.xyz: Adjusted commit log to better reflect code changes]
Tested-by: Kieran Bingham <kieran@bingham.xyz> (Py2.7,Py3.4,GDB10)
Signed-off-by: Kieran Bingham <kieran@bingham.xyz>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoscripts/gdb: improve types abstraction for gdb python scripts
Dom Cote [Mon, 23 May 2016 23:25:16 +0000 (16:25 -0700)]
scripts/gdb: improve types abstraction for gdb python scripts

Change the read_u16 function so it accepts both 'str' and 'byte' as type
for the arguments.

When calling read_memory() from gdb API, depending on if it was built
with 2.7 or 3.X, the format used to return the data will differ ( 'str'
for 2.7, and 'byte' for 3.X ).

Add a function read_memoryview() to be able to get a 'memoryview' object
back from read_memory() both with python 2.7 and 3.X .

Tested with python 3.4 and 2.7
Tested with gdb 7.7

Link: http://lkml.kernel.org/r/73621f564503137a002a639d174e4fb35f73f462.1462865983.git.jan.kiszka@siemens.com
Signed-off-by: Dom Cote <buzdelabuz2+git@gmail.com>
Tested-by: Kieran Bingham <kieran@bingham.xyz> (Py2.7,Py3.4,GDB10)
Signed-off-by: Kieran Bingham <kieran@bingham.xyz>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoscripts/gdb: add lx_thread_info_by_pid helper
Kieran Bingham [Mon, 23 May 2016 23:25:13 +0000 (16:25 -0700)]
scripts/gdb: add lx_thread_info_by_pid helper

The tasks module already provides helpers to find the task struct by
pid, and the thread_info by task struct; however this is cumbersome to
utilise on the gdb commandline.

Wrap these two functionalities together in an extra single helper to
allow exploring the thread info, from a PID value

Link: http://lkml.kernel.org/r/dadc5667f053ec811eb3e3033d99d937fedbc93b.1462865983.git.jan.kiszka@siemens.com
Signed-off-by: Kieran Bingham <kieran.bingham@linaro.org>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoscripts/gdb: add documentation example for radix tree
Kieran Bingham [Mon, 23 May 2016 23:25:10 +0000 (16:25 -0700)]
scripts/gdb: add documentation example for radix tree

Provide a worked example for utilising the lx_radix_tree_lookup function

Link: http://lkml.kernel.org/r/e786008ac5aec4b84198812805b326d718bdeb4b.1462865983.git.jan.kiszka@siemens.com
Signed-off-by: Kieran Bingham <kieran.bingham@linaro.org>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoscripts/gdb: add a Radix Tree Parser
Kieran Bingham [Mon, 23 May 2016 23:25:07 +0000 (16:25 -0700)]
scripts/gdb: add a Radix Tree Parser

Linux makes use of the Radix Tree data structure to store pointers
indexed by integer values.  This structure is utilised across many
structures in the kernel including the IRQ descriptor tables, and
several filesystems.

This module provides a method to lookup values from a structure given
its head node.

Usage:

The function lx_radix_tree_lookup, must be given a symbol of type struct
radix_tree_root, and an index into that tree.

The object returned is a generic integer value, and must be cast
correctly to the type based on the storage in the data structure.

For example, to print the irq descriptor in the sparse irq_desc_tree at
index 18, try the following:

 (gdb) print (struct irq_desc)$lx_radix_tree_lookup(irq_desc_tree, 18)

Link: http://lkml.kernel.org/r/d2028c55e50cf95a9b7f8ca0d11885174b0cc709.1462865983.git.jan.kiszka@siemens.com
Signed-off-by: Kieran Bingham <kieran.bingham@linaro.org>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoscripts/gdb: cast CPU numbers to integer
Jan Kiszka [Mon, 23 May 2016 23:25:05 +0000 (16:25 -0700)]
scripts/gdb: cast CPU numbers to integer

We won't see more than 2 billion CPUs any time soon, and having cpu_list
return long makes the output of lx-cpus a bit ugly.

Link: http://lkml.kernel.org/r/dcb45c3b0a59e0fd321fa56ff7aa398458c689b3.1462865983.git.jan.kiszka@siemens.com
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoscripts/gdb: add cpu iterators
Kieran Bingham [Mon, 23 May 2016 23:25:02 +0000 (16:25 -0700)]
scripts/gdb: add cpu iterators

The linux kernel provides macro's for iterating against values from the
cpu_list masks.  By providing some commonly used masks, we can mirror
the kernels helper macros with easy to use generators.

Link: http://lkml.kernel.org/r/d045c6599771ada1999d49612ee30fd2f9acf17f.1462865983.git.jan.kiszka@siemens.com
Signed-off-by: Kieran Bingham <kieran.bingham@linaro.org>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoscripts/gdb: add mount point list command
Kieran Bingham [Mon, 23 May 2016 23:24:59 +0000 (16:24 -0700)]
scripts/gdb: add mount point list command

lx-mounts will identify current mount points based on the 'init_task'
namespace by default, as we do not yet have a kernel thread list
implementation to select the current running thread.

Optionally, a user can specify a PID to list from that process'
namespace

Link: http://lkml.kernel.org/r/e614c7bc32d2350b4ff1627ec761a7148e65bfe6.1462865983.git.jan.kiszka@siemens.com
Signed-off-by: Kieran Bingham <kieran.bingham@linaro.org>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoscripts/gdb: add io resource readers
Kieran Bingham [Mon, 23 May 2016 23:24:56 +0000 (16:24 -0700)]
scripts/gdb: add io resource readers

Provide iomem_resource and ioports_resource printers and command hooks

It can be quite interesting to halt the kernel as it's booting and check
to see this list as it is being populated.

It should be useful in the event that a kernel is not booting, you can
identify what memory resources have been registered

Link: http://lkml.kernel.org/r/f0a6b9fa9c92af4d7ed2e7343ccc84150e9c6fc5.1462865983.git.jan.kiszka@siemens.com
Signed-off-by: Kieran Bingham <kieran.bingham@linaro.org>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoscripts/gdb: provide a dentry_name VFS path helper
Kieran Bingham [Mon, 23 May 2016 23:24:53 +0000 (16:24 -0700)]
scripts/gdb: provide a dentry_name VFS path helper

Walk the VFS entries, pre-pending the iname strings to generate a full
VFS path name from a dentry.

Link: http://lkml.kernel.org/r/4328fdb2d15ba7f1b21ad21c2eecc38d9cfc4d13.1462865983.git.jan.kiszka@siemens.com
Signed-off-by: Kieran Bingham <kieran.bingham@linaro.org>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoscripts/gdb: support !CONFIG_MODULES gracefully
Kieran Bingham [Mon, 23 May 2016 23:24:51 +0000 (16:24 -0700)]
scripts/gdb: support !CONFIG_MODULES gracefully

If CONFIG_MODULES is not enabled, lx-lsmod tries to find a non-existent
symbol and generates an unfriendly traceback:

  (gdb) lx-lsmod
  Address    Module                  Size  Used by
  Traceback (most recent call last):
    File "scripts/gdb/linux/modules.py", line 75, in invoke
      for module in module_list():
    File "scripts/gdb/linux/modules.py", line 24, in module_list
      module_ptr_type = module_type.get_type().pointer()
    File "scripts/gdb/linux/utils.py", line 28, in get_type
      self._type = gdb.lookup_type(self._name)
  gdb.error: No struct type named module.
  Error occurred in Python command: No struct type named module.

Catch the error and return an empty module_list() for a clean command
output as follows:

  (gdb) lx-lsmod
  Address    Module                  Size  Used by
  (gdb)

Link: http://lkml.kernel.org/r/94d533819437408b85ae5864f939dd7ca6fbfcd6.1462865983.git.jan.kiszka@siemens.com
Signed-off-by: Kieran Bingham <kieran.bingham@linaro.org>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoscripts/gdb: provide exception catching parser
Kieran Bingham [Mon, 23 May 2016 23:24:48 +0000 (16:24 -0700)]
scripts/gdb: provide exception catching parser

If we attempt to read a value that is not available to GDB, an exception
is raised.  Most of the time, this is a good thing; however on occasion
we will want to be able to determine if a symbol is available.

By catching the exception to simply return None, we can determine if we
tried to read an invalid value, without the exception taking our
execution context away from us

Link: http://lkml.kernel.org/r/c72b25c06fc66e1d68371154097e2cbb112555d8.1462865983.git.jan.kiszka@siemens.com
Signed-off-by: Kieran Bingham <kieran.bingham@linaro.org>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoscripts/gdb: convert modules usage to lists functions
Kieran Bingham [Mon, 23 May 2016 23:24:45 +0000 (16:24 -0700)]
scripts/gdb: convert modules usage to lists functions

Simplify the module list functions with the new list_for_each_entry
abstractions

Link: http://lkml.kernel.org/r/ad0101c9391088608166fcec26af179868973d86.1462865983.git.jan.kiszka@siemens.com
Signed-off-by: Kieran Bingham <kieran.bingham@linaro.org>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoscripts/gdb: provide kernel list item generators
Kieran Bingham [Mon, 23 May 2016 23:24:42 +0000 (16:24 -0700)]
scripts/gdb: provide kernel list item generators

Facilitate linked-list items by providing a generator to return the
dereferenced, and type-cast objects from a kernel linked list

Link: http://lkml.kernel.org/r/2b0998564e6e5abe53585d466f87e491331fd2a4.1462865983.git.jan.kiszka@siemens.com
Signed-off-by: Kieran Bingham <kieran.bingham@linaro.org>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoscripts/gdb: provide linux constants
Kieran Bingham [Mon, 23 May 2016 23:24:40 +0000 (16:24 -0700)]
scripts/gdb: provide linux constants

Some macro's and defines are needed when parsing memory, and without
compiling the kernel as -g3 they are not available in the debug-symbols.

We use the pre-processor here to extract constants to a dedicated module
for the linux debugger extensions

Top level Kbuild is used to call in and generate the constants file,
while maintaining dependencies on autogenerated files in
include/generated

Link: http://lkml.kernel.org/r/bc3df9c25f57ea72177c066a51a446fc19e2c27f.1462865983.git.jan.kiszka@siemens.com
Signed-off-by: Kieran Bingham <kieran.bingham@linaro.org>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoscripts/gdb: Adjust module reference counter reported by lx-lsmod
Jan Kiszka [Mon, 23 May 2016 23:24:37 +0000 (16:24 -0700)]
scripts/gdb: Adjust module reference counter reported by lx-lsmod

This takes the MODULE_REF_BASE into account.

Link: http://lkml.kernel.org/r/d926d2d54caa034adb964b52215090cbdb875249.1462865983.git.jan.kiszka@siemens.com
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoarch/defconfig: remove CONFIG_RESOURCE_COUNTERS
Konstantin Khlebnikov [Mon, 23 May 2016 23:24:34 +0000 (16:24 -0700)]
arch/defconfig: remove CONFIG_RESOURCE_COUNTERS

This option was replaced by PAGE_COUNTER which is selected by MEMCG.

Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agodrivers/memstick/core/mspro_block: use kmemdup
Muhammad Falak R Wani [Mon, 23 May 2016 23:24:31 +0000 (16:24 -0700)]
drivers/memstick/core/mspro_block: use kmemdup

Use kmemdup when some other buffer is immediately copied into allocated
region.  It replaces call to allocation followed by memcpy, by a single
call to kmemdup.

[akpm@linux-foundation.org: remove unneeded cast to void*]
Link: http://lkml.kernel.org/r/1463665743-16269-1-git-send-email-falakreyaz@gmail.com
Signed-off-by: Muhammad Falak R Wani <falakreyaz@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agortsx_usb_ms: use schedule_timeout_idle() in polling loop
Oleksandr Natalenko [Mon, 23 May 2016 23:24:28 +0000 (16:24 -0700)]
rtsx_usb_ms: use schedule_timeout_idle() in polling loop

First version of this patch has already been posted to LKML by Ben
Hutchings ~6 months ago, but no further action were performed.

Ben's original message:

: rtsx_usb_ms creates a task that mostly sleeps, but tasks in
: uninterruptible sleep still contribute to the load average (for
: bug-compatibility with Unix).  A load average of ~1 on a system that
: should be idle is somewhat alarming.
:
: Change the sleep to be interruptible, but still ignore signals.

References: https://bugs.debian.org/765717
Link: http://lkml.kernel.org/r/b49f95ae83057efa5d96f532803cba47@natalenko.name
Signed-off-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Lee Jones <lee.jones@linaro.org>
Cc: Wolfram Sang <wsa@the-dreams.de>
Cc: Roger Tseng <rogerable@realtek.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agokdump: fix gdb macros work work with newer and 64-bit kernels
Corey Minyard [Mon, 23 May 2016 23:24:25 +0000 (16:24 -0700)]
kdump: fix gdb macros work work with newer and 64-bit kernels

Lots of little changes needed to be made to clean these up, remove the
four byte pointer assumption and traverse the pid queue properly.  Also
consolidate the traceback code into a single function instead of having
three copies of it.

Link: http://lkml.kernel.org/r/1462926655-9390-1-git-send-email-minyard@acm.org
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Haren Myneni <hbabu@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agos390/kexec: consolidate crash_map/unmap_reserved_pages() and arch_kexec_protect(unpro...
Xunlei Pang [Mon, 23 May 2016 23:24:22 +0000 (16:24 -0700)]
s390/kexec: consolidate crash_map/unmap_reserved_pages() and arch_kexec_protect(unprotect)_crashkres()

Commit 3f625002581b ("kexec: introduce a protection mechanism for the
crashkernel reserved memory") is a similar mechanism for protecting the
crash kernel reserved memory to previous crash_map/unmap_reserved_pages()
implementation, the new one is more generic in name and cleaner in code
(besides, some arch may not be allowed to unmap the pgtable).

Therefore, this patch consolidates them, and uses the new
arch_kexec_protect(unprotect)_crashkres() to replace former
crash_map/unmap_reserved_pages() which by now has been only used by
S390.

The consolidation work needs the crash memory to be mapped initially,
this is done in machine_kdump_pm_init() which is after
reserve_crashkernel().  Once kdump kernel is loaded, the new
arch_kexec_protect_crashkres() implemented for S390 will actually
unmap the pgtable like before.

Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Minfei Huang <mhuang@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agokexec: do a cleanup for function kexec_load
Minfei Huang [Mon, 23 May 2016 23:24:19 +0000 (16:24 -0700)]
kexec: do a cleanup for function kexec_load

There are a lof of work to be done in function kexec_load, not only for
allocating structs and loading initram, but also for some misc.

To make it more clear, wrap a new function do_kexec_load which is used
to allocate structs and load initram.  And the pre-work will be done in
kexec_load.

Signed-off-by: Minfei Huang <mnfhuang@gmail.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Xunlei Pang <xlpang@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agokexec: make a pair of map/unmap reserved pages in error path
Minfei Huang [Mon, 23 May 2016 23:24:16 +0000 (16:24 -0700)]
kexec: make a pair of map/unmap reserved pages in error path

For some arch, kexec shall map the reserved pages, then use them, when
we try to start the kdump service.

kexec may return directly, without unmaping the reserved pages, if it
fails during starting service.  To fix it, we make a pair of map/unmap
reserved pages both in generic path and error path.

This patch only affects s390.  Other architecturess don't implement the
interface of crash_unmap_reserved_pages and crash_map_reserved_pages.

It isn't a urgent patch.  Kernel can work well without any risk,
although the reserved pages are not unmapped before returning in error
path.

Signed-off-by: Minfei Huang <mnfhuang@gmail.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Xunlei Pang <xlpang@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agokexec: provide arch_kexec_protect(unprotect)_crashkres()
Xunlei Pang [Mon, 23 May 2016 23:24:13 +0000 (16:24 -0700)]
kexec: provide arch_kexec_protect(unprotect)_crashkres()

Implement the protection method for the crash kernel memory reservation
for the 64-bit x86 kdump.

Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Minfei Huang <mhuang@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agokexec: introduce a protection mechanism for the crashkernel reserved memory
Xunlei Pang [Mon, 23 May 2016 23:24:10 +0000 (16:24 -0700)]
kexec: introduce a protection mechanism for the crashkernel reserved memory

For the cases that some kernel (module) path stamps the crash reserved
memory(already mapped by the kernel) where has been loaded the second
kernel data, the kdump kernel will probably fail to boot when panic
happens (or even not happens) leaving the culprit at large, this is
unacceptable.

The patch introduces a mechanism for detecting such cases:

1) After each crash kexec loading, it simply marks the reserved memory
   regions readonly since we no longer access it after that.  When someone
   stamps the region, the first kernel will panic and trigger the kdump.
   The weak arch_kexec_protect_crashkres() is introduced to do the actual
   protection.

2) To allow multiple loading, once 1) was done we also need to remark
   the reserved memory to readwrite each time a system call related to
   kdump is made.  The weak arch_kexec_unprotect_crashkres() is introduced
   to do the actual protection.

The architecture can make its specific implementation by overriding
arch_kexec_protect_crashkres() and arch_kexec_unprotect_crashkres().

Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Minfei Huang <mhuang@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoexec: remove the no longer needed remove_arg_zero()->free_arg_page()
Oleg Nesterov [Mon, 23 May 2016 23:24:08 +0000 (16:24 -0700)]
exec: remove the no longer needed remove_arg_zero()->free_arg_page()

remove_arg_zero() does free_arg_page() for no reason.  This was needed
before and only if CONFIG_MMU=y: see commit 4fc75ff4816c ("exec: fix
remove_arg_zero"), install_arg_page() was called for every page != NULL
in bprm->page[] array.  Today install_arg_page() has already gone and
free_arg_page() is nop after another commit b6a2fea39318 ("mm: variable
length argument support").

CONFIG_MMU=n does free_arg_pages() in free_bprm() and thus it doesn't
need remove_arg_zero()->free_arg_page() too; apart from get_arg_page()
it never checks if the page in bprm->page[] was allocated or not, so the
"extra" non-freed page is fine.  OTOH, this free_arg_page() can add the
minor pessimization, the caller is going to do copy_strings_kernel()
right after remove_arg_zero() which will likely need to re-allocate the
same page again.

And as Hujunjie pointed out, the "offset == PAGE_SIZE" check is wrong
because we are going to increment bprm->p once again before return, so
CONFIG_MMU=n "leaks" the page anyway if '0' is the final byte in this
page.

NOTE: remove_arg_zero() assumes that argv[0] is null-terminated but this
is not necessarily true.  copy_strings() does "len = strnlen_user(...)",
then copy_from_user(len) but another thread or debuger can overwrite the
trailing '0' in between.  Afaics nothing really bad can happen because
we must always have the null-terminated bprm->filename copied by the 1st
copy_strings_kernel(), but perhaps we should change this code to check
"bprm->p < bprm->exec" anyway, and/or change copy_strings() to ensure
that the last byte in string is always zero.

Link: http://lkml.kernel.org/r/20160517155335.GA31435@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported by: hujunjie <jj.net@163.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agokernek/fork.c: allocate idle task for a CPU always on its local node
Andi Kleen [Mon, 23 May 2016 23:24:05 +0000 (16:24 -0700)]
kernek/fork.c: allocate idle task for a CPU always on its local node

Linux preallocates the task structs of the idle tasks for all possible
CPUs.  This currently means they all end up on node 0.  This also
implies that the cache line of MWAIT, which is around the flags field in
the task struct, are all located in node 0.

We see a noticeable performance improvement on Knights Landing CPUs when
the cache lines used for MWAIT are located in the local nodes of the
CPUs using them.  I would expect this to give a (likely slight)
improvement on other systems too.

The patch implements placing the idle task in the node of its CPUs, by
passing the right target node to copy_process()

[akpm@linux-foundation.org: use NUMA_NO_NODE, not a bare -1]
Link: http://lkml.kernel.org/r/1463492694-15833-1-git-send-email-andi@firstfloor.org
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agosignal: move the "sig < SIGRTMIN" check into siginmask(sig)
Oleg Nesterov [Mon, 23 May 2016 23:24:02 +0000 (16:24 -0700)]
signal: move the "sig < SIGRTMIN" check into siginmask(sig)

All the users of siginmask() must ensure that sig < SIGRTMIN.  sig_fatal()
doesn't and this is wrong:

UBSAN: Undefined behaviour in kernel/signal.c:911:6
shift exponent 32 is too large for 32-bit type 'long unsigned int'

the patch doesn't add the neccesary check to sig_fatal(), it moves the
check into siginmask() and updates other callers.

Link: http://lkml.kernel.org/r/20160517195052.GA15187@redhat.com
Reported-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>