Oleg Nesterov [Tue, 29 Nov 2016 17:51:10 +0000 (18:51 +0100)]
kthread: Don't abuse kthread_create_on_cpu() in __kthread_create_worker()
kthread_create_on_cpu() sets KTHREAD_IS_PER_CPU and kthread->cpu, this
only makes sense if this kthread can be parked/unparked by cpuhp code.
kthread workers never call kthread_parkme() so this has no effect.
Change __kthread_create_worker() to simply call kthread_bind(task, cpu).
The very fact that kthread_create_on_cpu() doesn't accept a generic fmt
shows that it should not be used outside of smpboot.c.
Now, the only reason we can not unexport this helper and move it into
smpboot.c is that it sets kthread->cpu and struct kthread is not exported.
And the only reason we can not kill kthread->cpu is that kthread_unpark()
is used by drivers/gpu/drm/amd/scheduler/gpu_scheduler.c and thus we can
not turn _unpark into kthread_unpark(struct smp_hotplug_thread *, cpu).
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: Petr Mladek <pmladek@suse.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Chunming Zhou <David1.Zhou@amd.com>
Cc: Roman Pen <roman.penyaev@profitbricks.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20161129175110.GA5342@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Oleg Nesterov [Tue, 29 Nov 2016 17:51:07 +0000 (18:51 +0100)]
kthread: Don't use to_live_kthread() in kthread_[un]park()
Now that to_kthread() is always validm change kthread_park() and
kthread_unpark() to use it and kill to_live_kthread().
The conversion of kthread_unpark() is trivial. If KTHREAD_IS_PARKED is set
then the task has called complete(&self->parked) and there the function
cannot race against a concurrent kthread_stop() and exit.
kthread_park() is more tricky, because its semantics are not well
defined. It returns -ENOSYS if the thread exited but this can never happen
and as Roman pointed out kthread_park() can obviously block forever if it
would race with the exiting kthread.
The usage of kthread_park() in cpuhp code (cpu.c, smpboot.c, stop_machine.c)
is fine. It can never see an exiting/exited kthread, smpboot_destroy_threads()
clears *ht->store, smpboot_park_thread() checks it is not NULL under the same
smpboot_threads_lock. cpuhp_threads and cpu_stop_threads never exit, so other
callers are fine too.
But it has two more users:
- watchdog_park_threads():
The code is actually correct, get_online_cpus() ensures that
kthread_park() can't race with itself (note that kthread_park() can't
handle this race correctly), but it should not use kthread_park()
directly.
- drivers/gpu/drm/amd/scheduler/gpu_scheduler.c should not use
kthread_park() either.
kthread_park() must not be called after amd_sched_fini() which does
kthread_stop(), otherwise even to_live_kthread() is not safe because
task_struct can be already freed and sched->thread can point to nowhere.
The usage of kthread_park/unpark should either be restricted to core code
which is properly protected against the exit race or made more robust so it
is safe to use it in drivers.
To catch eventual exit issues, add a WARN_ON(PF_EXITING) for now.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Chunming Zhou <David1.Zhou@amd.com>
Cc: Roman Pen <roman.penyaev@profitbricks.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20161129175107.GA5339@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Oleg Nesterov [Tue, 29 Nov 2016 17:51:03 +0000 (18:51 +0100)]
kthread: Don't use to_live_kthread() in kthread_stop()
kthread_stop() had to use to_live_kthread() simply because it was not
possible to access kthread->exited after the exiting task clears
task_struct->vfork_done. Now that to_kthread() is always valid,
wake_up_process() + wait_for_completion() can be done
ununconditionally. It's not an issue anymore if the task has already issued
complete_vfork_done() or died.
The exiting task can get the spurious wakeup after mm_release() but this is
possible without this change too and is fine; do_task_dead() ensures that
this can't make any harm.
As a further enhancement this could be converted to task_work_add() later,
so ->vfork_done can be avoided completely.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Chunming Zhou <David1.Zhou@amd.com>
Cc: Roman Pen <roman.penyaev@profitbricks.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20161129175103.GA5336@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Oleg Nesterov [Tue, 29 Nov 2016 17:51:00 +0000 (18:51 +0100)]
Revert "kthread: Pin the stack via try_get_task_stack()/put_task_stack() in to_live_kthread() function"
This reverts commit
23196f2e5f5d810578a772785807dcdc2b9fdce9.
Now that struct kthread is kmalloc'ed and not longer on the task stack
there is no need anymore to pin the stack.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Chunming Zhou <David1.Zhou@amd.com>
Cc: Roman Pen <roman.penyaev@profitbricks.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20161129175100.GA5333@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Oleg Nesterov [Tue, 29 Nov 2016 17:50:57 +0000 (18:50 +0100)]
kthread: Make struct kthread kmalloc'ed
commit
23196f2e5f5d "kthread: Pin the stack via try_get_task_stack() /
put_task_stack() in to_live_kthread() function" is a workaround for the
fragile design of struct kthread being allocated on the task stack.
struct kthread in its current form should be removed, but this needs
cleanups outside of kthread.c.
As a first step move struct kthread away from the task stack by making it
kmalloc'ed. This allows to access kthread.exited without the magic of
trying to pin task stack and the try logic in to_live_kthread().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Chunming Zhou <David1.Zhou@amd.com>
Cc: Roman Pen <roman.penyaev@profitbricks.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20161129175057.GA5330@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Peter Zijlstra [Tue, 22 Nov 2016 09:57:15 +0000 (10:57 +0100)]
x86/uaccess, sched/preempt: Verify access_ok() context
I recently encountered wreckage because access_ok() was used where it
should not be, add an explicit WARN when access_ok() is used wrongly.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Wed, 30 Nov 2016 07:33:54 +0000 (08:33 +0100)]
sched/x86: Make CONFIG_SCHED_MC_PRIO=y easier to enable
Right now CONFIG_SCHED_MC_PRIO has X86_INTEL_PSTATE as a dependency,
which is not enabled by default and which hides the CONFIG_SCHED_MC_PRIO
hardware-enabling feature.
Select X86_INTEL_PSTATE instead, plus its dependency (CPU_FREQ), if the
user enables CONFIG_SCHED_MC_PRIO=y.
(Also align the CONFIG_SCHED_MC_PRIO Kconfig help text in standard style.)
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: bp@suse.de
Cc: jolsa@redhat.com
Cc: linux-acpi@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: rjw@rjwysocki.net
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tim Chen [Tue, 29 Nov 2016 18:43:27 +0000 (10:43 -0800)]
sched/x86: Change CONFIG_SCHED_ITMT to CONFIG_SCHED_MC_PRIO
Rename CONFIG_SCHED_ITMT for Intel Turbo Boost Max Technology 3.0
to CONFIG_SCHED_MC_PRIO. This makes the configuration extensible
in future to other architectures that wish to similarly establish
CPU core priorities support in the scheduler.
The description in Kconfig is updated to reflect this change with
added details for better clarity. The configuration is explicitly
default-y, to enable the feature on CPUs that have this feature.
It has no effect on non-TBM3 CPUs.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@suse.de
Cc: jolsa@redhat.com
Cc: linux-acpi@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: rjw@rjwysocki.net
Link: http://lkml.kernel.org/r/2b2ee29d93e3f162922d72d0165a1405864fbb23.1480444902.git.tim.c.chen@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Mon, 28 Nov 2016 08:43:49 +0000 (09:43 +0100)]
x86/sched: Use #include <linux/mutex.h> instead of #include <asm/mutex.h>
asm/mutex.h is gone from the locking tree, which makes sched/core break the build.
Use linux/mutex.h instead, which is the canonical method.
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: peterz@infradead.org
Cc: jolsa@redhat.com
Cc: rjw@rjwysocki.net
Cc: bp@suse.de
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Rafael J. Wysocki [Tue, 22 Nov 2016 20:24:00 +0000 (12:24 -0800)]
cpufreq/intel_pstate: Use CPPC to get max performance
Use the acpi cppc_lib interface to get CPPC performance limits and update
the per cpu priority for the ITMT scheduler. If the highest performance of
CPUs differs the ITMT feature is enabled.
Co-developed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: linux-pm@vger.kernel.org
Cc: peterz@infradead.org
Cc: jolsa@redhat.com
Cc: rjw@rjwysocki.net
Cc: linux-acpi@vger.kernel.org
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: bp@suse.de
Link: http://lkml.kernel.org/r/0998b98943bcdec7d1ddd4ff27358da555ea8e92.1479844244.git.tim.c.chen@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Srinivas Pandruvada [Tue, 22 Nov 2016 20:23:59 +0000 (12:23 -0800)]
acpi/bus: Set _OSC for diverse core support
Set the OSC_SB_CPC_DIVERSE_HIGH_SUPPORT (bit 12) to enable diverse
core support.
This is required to enable the BIOS support of the Intel Turbo Boost Max
Technology 3.0 feature.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: linux-pm@vger.kernel.org
Cc: peterz@infradead.org
Cc: jolsa@redhat.com
Cc: rjw@rjwysocki.net
Cc: linux-acpi@vger.kernel.org
Cc: bp@suse.de
Link: http://lkml.kernel.org/r/a023623a727e86040a1715797055f6402caefd7e.1479844244.git.tim.c.chen@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Srinivas Pandruvada [Tue, 22 Nov 2016 20:23:58 +0000 (12:23 -0800)]
acpi/bus: Enable HWP CPPC objects
Need to set platform wide _OSC bits to enable CPPC and CPPC version 2.
If platform supports CPPC, then BIOS exposes CPPC tables.
The primary reason to enable CPPC support is to get the maximum
performance of each CPU to check and enable Intel Turbo Boost Max
Technology 3.0 (ITMT).
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: linux-pm@vger.kernel.org
Cc: peterz@infradead.org
Cc: jolsa@redhat.com
Cc: rjw@rjwysocki.net
Cc: linux-acpi@vger.kernel.org
Cc: bp@suse.de
Link: http://lkml.kernel.org/r/a696f6b17843cee9a542482fae6abab087be9587.1479844244.git.tim.c.chen@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tim Chen [Tue, 22 Nov 2016 20:23:57 +0000 (12:23 -0800)]
x86/sched: Add SD_ASYM_PACKING flags to x86 ITMT CPU
Some Intel cores in a package can be boosted to a higher turbo frequency
with ITMT 3.0 technology. The scheduler can use the asymmetric packing
feature to move tasks to the more capable cores.
If ITMT is enabled, add SD_ASYM_PACKING flag to the thread and core
sched domains to enable asymmetric packing.
Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: linux-pm@vger.kernel.org
Cc: peterz@infradead.org
Cc: jolsa@redhat.com
Cc: rjw@rjwysocki.net
Cc: linux-acpi@vger.kernel.org
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: bp@suse.de
Link: http://lkml.kernel.org/r/9bbb885bedbef4eb50e197305eb16b160cff0831.1479844244.git.tim.c.chen@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tim Chen [Tue, 22 Nov 2016 20:23:56 +0000 (12:23 -0800)]
x86/sysctl: Add sysctl for ITMT scheduling feature
Intel Turbo Boost Max Technology 3.0 (ITMT) feature
allows some cores to be boosted to higher turbo
frequency than others.
Add /proc/sys/kernel/sched_itmt_enabled so operator
can enable/disable scheduling of tasks that favor cores
with higher turbo boost frequency potential.
By default, system that is ITMT capable and single
socket has this feature turned on. It is more likely
to be lightly loaded and operates in Turbo range.
When there is a change in the ITMT scheduling operation
desired, a rebuild of the sched domain is initiated
so the scheduler can set up sched domains with appropriate
flag to enable/disable ITMT scheduling operations.
Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Co-developed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: linux-pm@vger.kernel.org
Cc: peterz@infradead.org
Cc: jolsa@redhat.com
Cc: rjw@rjwysocki.net
Cc: linux-acpi@vger.kernel.org
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: bp@suse.de
Link: http://lkml.kernel.org/r/07cc62426a28bad57b01ab16bb903a9c84fa5421.1479844244.git.tim.c.chen@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tim Chen [Tue, 22 Nov 2016 20:23:55 +0000 (12:23 -0800)]
x86: Enable Intel Turbo Boost Max Technology 3.0
On platforms supporting Intel Turbo Boost Max Technology 3.0, the maximum
turbo frequencies of some cores in a CPU package may be higher than for
the other cores in the same package. In that case, better performance
(and possibly lower energy consumption as well) can be achieved by
making the scheduler prefer to run tasks on the CPUs with higher max
turbo frequencies.
To that end, set up a core priority metric to abstract the core
preferences based on the maximum turbo frequency. In that metric,
the cores with higher maximum turbo frequencies are higher-priority
than the other cores in the same package and that causes the scheduler
to favor them when making load-balancing decisions using the asymmertic
packing approach. At the same time, the priority of SMT threads with a
higher CPU number is reduced so as to avoid scheduling tasks on all of
the threads that belong to a favored core before all of the other cores
have been given a task to run.
The priority metric will be initialized by the P-state driver with the
help of the sched_set_itmt_core_prio() function. The P-state driver
will also determine whether or not ITMT is supported by the platform
and will call sched_set_itmt_support() to indicate that.
Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Co-developed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: linux-pm@vger.kernel.org
Cc: peterz@infradead.org
Cc: jolsa@redhat.com
Cc: rjw@rjwysocki.net
Cc: linux-acpi@vger.kernel.org
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: bp@suse.de
Link: http://lkml.kernel.org/r/cd401ccdff88f88c8349314febdc25d51f7c48f7.1479844244.git.tim.c.chen@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tim Chen [Tue, 22 Nov 2016 20:23:54 +0000 (12:23 -0800)]
x86/topology: Define x86's arch_update_cpu_topology
The scheduler calls arch_update_cpu_topology() to check whether the
scheduler domains have to be rebuilt.
So far x86 has no requirement for this, but the upcoming ITMT support
makes this necessary.
Request the rebuild when the x86 internal update flag is set.
Suggested-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: linux-pm@vger.kernel.org
Cc: peterz@infradead.org
Cc: jolsa@redhat.com
Cc: rjw@rjwysocki.net
Cc: linux-acpi@vger.kernel.org
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: bp@suse.de
Link: http://lkml.kernel.org/r/bfbf5591276ec60b2af2da798adc1060df1e2a5f.1479844244.git.tim.c.chen@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tim Chen [Tue, 22 Nov 2016 20:23:53 +0000 (12:23 -0800)]
sched: Extend scheduler's asym packing
We generalize the scheduler's asym packing to provide an ordering
of the cpu beyond just the cpu number. This allows the use of the
ASYM_PACKING scheduler machinery to move loads to preferred CPU in a
sched domain. The preference is defined with the cpu priority
given by arch_asym_cpu_priority(cpu).
We also record the most preferred cpu in a sched group when
we build the cpu's capacity for fast lookup of preferred cpu
during load balancing.
Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: linux-pm@vger.kernel.org
Cc: jolsa@redhat.com
Cc: rjw@rjwysocki.net
Cc: linux-acpi@vger.kernel.org
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: bp@suse.de
Link: http://lkml.kernel.org/r/0e73ae12737dfaafa46c07066cc7c5d3f1675e46.1479844244.git.tim.c.chen@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Ingo Molnar [Wed, 23 Nov 2016 06:37:00 +0000 (07:37 +0100)]
sched/fair: Clean up the tunable parameter definitions
No change in functionality:
- align the default values vertically to make them easier to scan
- standardize the 'default:' lines
- fix minor whitespace typos
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
T.Zhou [Wed, 23 Nov 2016 00:48:32 +0000 (08:48 +0800)]
sched/dl: Fix comment in pick_next_task_dl()
Fix cut & paste oversight:
s/pull_rt_task/pull_dl_task
Signed-off-by: T.Zhou <t1zhou@163.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: juri.lelli@gmail.com
Link: http://lkml.kernel.org/r/20161123004832.GA2983@geo
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Wed, 23 Nov 2016 09:23:09 +0000 (10:23 +0100)]
Merge branch 'linus' into sched/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Linus Torvalds [Tue, 22 Nov 2016 21:53:01 +0000 (13:53 -0800)]
Merge branch 'for-rc' of git://git./linux/kernel/git/rzhang/linux
Pull thermal management fix from Zhang Rui:
"We only have one urgent fix this time.
Commit
3105f234e0ab ("thermal/powerclamp: correct cpu support check"),
which is shipped in 4.9-rc3, fixed a problem introduced by commit
b721ca0d1927 ("thermal/powerclamp: remove cpu whitelist").
But unfortunately, it broke intel_powerclamp driver module auto-
loading at the same time. Thus we need this change to add back module
auto-loading for 4.9"
* 'for-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
thermal/powerclamp: add back module device table
Linus Torvalds [Tue, 22 Nov 2016 21:48:05 +0000 (13:48 -0800)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two small fixes.
One prevents timeouts on mpt3sas when trying to use the secure erase
protocol which causes the erase protocol to be aborted. The second is
a regression in a prior fix which causes all commands to abort during
PCI extended error recovery, which is incorrect because PCI EEH is
independent from what's happening on the FC transport"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: qla2xxx: do not abort all commands in the adapter during EEH recovery
scsi: mpt3sas: Fix secure erase premature termination
Linus Torvalds [Tue, 22 Nov 2016 21:20:34 +0000 (13:20 -0800)]
Merge tag 'clk-fixes-for-linus' of git://git./linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"A handful of driver fixes.
The sunxi fixes are for an incorrect clk tree configuration and a bad
frequency calculation. The other two are fixes for passing the wrong
pointer in drivers recently converted to clk_hw style registration"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: efm32gg: Pass correct type to hw provider registration
clk: berlin: Pass correct type to hw provider registration
clk: sunxi: Fix M factor computation for APB1
clk: sunxi-ng: sun6i-a31: Force AHB1 clock to use PLL6 as parent
Linus Torvalds [Tue, 22 Nov 2016 20:51:35 +0000 (12:51 -0800)]
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Two fixes for autogroup scheduling, for races when turning the feature
on/off via /proc/sys/kernel/sched_autogroup_enabled"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/autogroup: Do not use autogroup->tg in zombie threads
sched/autogroup: Fix autogroup_move_group() to never skip sched_move_task()
Linus Torvalds [Tue, 22 Nov 2016 20:17:49 +0000 (12:17 -0800)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Misc fixes:
- two fixes to make (very) old Intel CPUs boot reliably
- fix the intel-mid driver and rename it
- two KASAN false positive fixes
- an FPU fix
- two sysfb fixes
- two build fixes related to new toolchain versions"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/platform/intel-mid: Rename platform_wdt to platform_mrfld_wdt
x86/build: Build compressed x86 kernels as PIE when !CONFIG_RELOCATABLE as well
x86/platform/intel-mid: Register watchdog device after SCU
x86/fpu: Fix invalid FPU ptrace state after execve()
x86/boot: Fail the boot if !M486 and CPUID is missing
x86/traps: Ignore high word of regs->cs in early_fixup_exception()
x86/dumpstack: Prevent KASAN false positive warnings
x86/unwind: Prevent KASAN false positive warnings in guess unwinder
x86/boot: Avoid warning for zero-filling .bss
x86/sysfb: Fix lfb_size calculation
x86/sysfb: Add support for 64bit EFI lfb_base
Oleg Nesterov [Mon, 14 Nov 2016 18:46:12 +0000 (19:46 +0100)]
sched/autogroup: Do not use autogroup->tg in zombie threads
Exactly because for_each_thread() in autogroup_move_group() can't see it
and update its ->sched_task_group before _put() and possibly free().
So the exiting task needs another sched_move_task() before exit_notify()
and we need to re-introduce the PF_EXITING (or similar) check removed by
the previous change for another reason.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: hartsjc@redhat.com
Cc: vbendel@redhat.com
Cc: vlovejoy@redhat.com
Link: http://lkml.kernel.org/r/20161114184612.GA15968@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Oleg Nesterov [Mon, 14 Nov 2016 18:46:09 +0000 (19:46 +0100)]
sched/autogroup: Fix autogroup_move_group() to never skip sched_move_task()
The PF_EXITING check in task_wants_autogroup() is no longer needed. Remove
it, but see the next patch.
However the comment is correct in that autogroup_move_group() must always
change task_group() for every thread so the sysctl_ check is very wrong;
we can race with cgroups and even sys_setsid() is not safe because a task
running with task_group() == ag->tg must participate in refcounting:
int main(void)
{
int sctl = open("/proc/sys/kernel/sched_autogroup_enabled", O_WRONLY);
assert(sctl > 0);
if (fork()) {
wait(NULL); // destroy the child's ag/tg
pause();
}
assert(pwrite(sctl, "1\n", 2, 0) == 2);
assert(setsid() > 0);
if (fork())
pause();
kill(getppid(), SIGKILL);
sleep(1);
// The child has gone, the grandchild runs with kref == 1
assert(pwrite(sctl, "0\n", 2, 0) == 2);
assert(setsid() > 0);
// runs with the freed ag/tg
for (;;)
sleep(1);
return 0;
}
crashes the kernel. It doesn't really need sleep(1), it doesn't matter if
autogroup_move_group() actually frees the task_group or this happens later.
Reported-by: Vern Lovejoy <vlovejoy@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: hartsjc@redhat.com
Cc: vbendel@redhat.com
Link: http://lkml.kernel.org/r/20161114184609.GA15965@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Linus Torvalds [Mon, 21 Nov 2016 23:27:41 +0000 (15:27 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/linux-security
Pull apparmor bugfix from James Morris:
"This has a fix for a policy replacement bug that is fairly serious for
apache mod_apparmor users, as it results in the wrong policy being
applied on an network facing service"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
apparmor: fix change_hat not finding hat after policy replacement
Linus Torvalds [Mon, 21 Nov 2016 21:56:17 +0000 (13:56 -0800)]
Merge git://git./linux/kernel/git/davem/sparc
Pull sparc fixes from David Miller:
1) With modern networking cards we can run out of 32-bit DMA space, so
support 64-bit DMA addressing when possible on sparc64. From Dave
Tushar.
2) Some signal frame validation checks are inverted on sparc32, fix
from Andreas Larsson.
3) Lockdep tables can get too large in some circumstances on sparc64,
add a way to adjust the size a bit. From Babu Moger.
4) Fix NUMA node probing on some sun4v systems, from Thomas Tai.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc: drop duplicate header scatterlist.h
lockdep: Limit static allocations if PROVE_LOCKING_SMALL is defined
config: Adding the new config parameter CONFIG_PROVE_LOCKING_SMALL for sparc
sunbmac: Fix compiler warning
sunqe: Fix compiler warnings
sparc64: Enable 64-bit DMA
sparc64: Enable sun4v dma ops to use IOMMU v2 APIs
sparc64: Bind PCIe devices to use IOMMU v2 service
sparc64: Initialize iommu_map_table and iommu_pool
sparc64: Add ATU (new IOMMU) support
sparc64: Add FORCE_MAX_ZONEORDER and default to 13
sparc64: fix compile warning section mismatch in find_node()
sparc32: Fix inverted invalid_frame_pointer checks on sigreturns
sparc64: Fix find_node warning if numa node cannot be found
Linus Torvalds [Mon, 21 Nov 2016 21:26:28 +0000 (13:26 -0800)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Clear congestion control state when changing algorithms on an
existing socket, from Florian Westphal.
2) Fix register bit values in altr_tse_pcs portion of stmmac driver,
from Jia Jie Ho.
3) Fix PTP handling in stammc driver for GMAC4, from Giuseppe
CAVALLARO.
4) Fix udplite multicast delivery handling, it ignores the udp_table
parameter passed into the lookups, from Pablo Neira Ayuso.
5) Synchronize the space estimated by rtnl_vfinfo_size and the space
actually used by rtnl_fill_vfinfo. From Sabrina Dubroca.
6) Fix memory leak in fib_info when splitting nodes, from Alexander
Duyck.
7) If a driver does a napi_hash_del() explicitily and not via
netif_napi_del(), it must perform RCU synchronization as needed. Fix
this in virtio-net and bnxt drivers, from Eric Dumazet.
8) Likewise, it is not necessary to invoke napi_hash_del() is we are
also doing neif_napi_del() in the same code path. Remove such calls
from be2net and cxgb4 drivers, also from Eric Dumazet.
9) Don't allocate an ID in peernet2id_alloc() if the netns is dead,
from WANG Cong.
10) Fix OF node and device struct leaks in of_mdio, from Johan Hovold.
11) We cannot cache routes in ip6_tunnel when using inherited traffic
classes, from Paolo Abeni.
12) Fix several crashes and leaks in cpsw driver, from Johan Hovold.
13) Splice operations cannot use freezable blocking calls in AF_UNIX,
from WANG Cong.
14) Link dump filtering by master device and kind support added an error
in loop index updates during the dump if we actually do filter, fix
from Zhang Shengju.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (59 commits)
tcp: zero ca_priv area when switching cc algorithms
net: l2tp: Treat NET_XMIT_CN as success in l2tp_eth_dev_xmit
ethernet: stmmac: make DWMAC_STM32 depend on it's associated SoC
tipc: eliminate obsolete socket locking policy description
rtnl: fix the loop index update error in rtnl_dump_ifinfo()
l2tp: fix racy SOCK_ZAPPED flag check in l2tp_ip{,6}_bind()
net: macb: add check for dma mapping error in start_xmit()
rtnetlink: fix FDB size computation
netns: fix get_net_ns_by_fd(int pid) typo
af_unix: conditionally use freezable blocking calls in read
net: ethernet: ti: cpsw: fix fixed-link phy probe deferral
net: ethernet: ti: cpsw: add missing sanity check
net: ethernet: ti: cpsw: fix secondary-emac probe error path
net: ethernet: ti: cpsw: fix of_node and phydev leaks
net: ethernet: ti: cpsw: fix deferred probe
net: ethernet: ti: cpsw: fix mdio device reference leak
net: ethernet: ti: cpsw: fix bad register access in probe error path
net: sky2: Fix shutdown crash
cfg80211: limit scan results cache size
net sched filters: pass netlink message flags in event notification
...
Florian Westphal [Mon, 21 Nov 2016 09:08:37 +0000 (10:08 +0100)]
tcp: zero ca_priv area when switching cc algorithms
We need to zero out the private data area when application switches
connection to different algorithm (TCP_CONGESTION setsockopt).
When congestion ops get assigned at connect time everything is already
zeroed because sk_alloc uses GFP_ZERO flag. But in the setsockopt case
this contains whatever previous cc placed there.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gao Feng [Mon, 21 Nov 2016 00:56:21 +0000 (08:56 +0800)]
net: l2tp: Treat NET_XMIT_CN as success in l2tp_eth_dev_xmit
The tc could return NET_XMIT_CN as one congestion notification, but
it does not mean the packe is lost. Other modules like ipvlan,
macvlan, and others treat NET_XMIT_CN as success too.
So l2tp_eth_dev_xmit should add the NET_XMIT_CN check.
Signed-off-by: Gao Feng <gfree.wind@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peter Robinson [Sun, 20 Nov 2016 17:22:38 +0000 (17:22 +0000)]
ethernet: stmmac: make DWMAC_STM32 depend on it's associated SoC
There's not much point, except compile test, enabling the stmmac
platform drivers unless the STM32 SoC is enabled. It's not
useful without it.
Signed-off-by: Peter Robinson <pbrobinson@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jacob Pan [Mon, 14 Nov 2016 19:08:45 +0000 (11:08 -0800)]
thermal/powerclamp: add back module device table
Commit
3105f234e0aba43e44e277c20f9b32ee8add43d4 replaced module
cpu id table with a cpu feature check, which is logically correct.
But we need the module device table to allow module auto loading.
Cc: stable@vger.kernel.org # 4.8
Fixes:
3105f234 thermal/powerclamp: correct cpu support check
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Andy Shevchenko [Fri, 18 Nov 2016 17:27:23 +0000 (19:27 +0200)]
x86/platform/intel-mid: Rename platform_wdt to platform_mrfld_wdt
Rename the watchdog platform library file to explicitly show that is used only
on Intel Merrifield platforms.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161118172723.179761-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
H.J. Lu [Fri, 18 Nov 2016 21:07:19 +0000 (13:07 -0800)]
x86/build: Build compressed x86 kernels as PIE when !CONFIG_RELOCATABLE as well
Since the bootloader may load the compressed x86 kernel at any address,
it should always be built as PIE, not just when CONFIG_RELOCATABLE=y.
Otherwise, linker in binutils 2.27 will optimize GOT load into the
absolute address when building the compressed x86 kernel as a non-PIE
executable.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
[ Small wording changes. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andy Shevchenko [Fri, 18 Nov 2016 16:52:24 +0000 (18:52 +0200)]
x86/platform/intel-mid: Register watchdog device after SCU
Watchdog device in Intel Tangier relies on SCU to be present. It uses the SCU
IPC channel to send commands and receive responses. If watchdog driver is
initialized quite before SCU and a command has been sent the result is always
an error like the following:
intel_mid_wdt: Error stopping watchdog: 0xffffffed
Register watchdog device whne SCU is ready to avoid described issue.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161118165224.175514-1-andriy.shevchenko@linux.intel.com
[ Small cleanups. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Yu-cheng Yu [Thu, 17 Nov 2016 17:11:35 +0000 (09:11 -0800)]
x86/fpu: Fix invalid FPU ptrace state after execve()
Robert O'Callahan reported that after an execve PTRACE_GETREGSET
NT_X86_XSTATE continues to return the pre-exec register values
until the exec'ed task modifies FPU state.
The test code is at:
https://bugzilla.redhat.com/attachment.cgi?id=
1164286.
What is happening is fpu__clear() does not properly clear fpstate.
Fix it by doing just that.
Reported-by: Robert O'Callahan <robert@ocallahan.org>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1479402695-6553-1-git-send-email-yu-cheng.yu@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andy Lutomirski [Sat, 19 Nov 2016 23:37:30 +0000 (15:37 -0800)]
x86/boot: Fail the boot if !M486 and CPUID is missing
Linux will have all kinds of sporadic problems on systems that don't
have the CPUID instruction unless CONFIG_M486=y. In particular,
sync_core() will explode.
I believe that these kernels had a better chance of working before
commit
05fb3c199bb0 ("x86/boot: Initialize FPU and X86_FEATURE_ALWAYS
even if we don't have CPUID"). That commit inadvertently fixed a
serious bug: we used to fail to detect the FPU if CPUID wasn't
present. Because we also used to forget to set X86_FEATURE_ALWAYS, we
end up with no cpu feature bits set at all. This meant that
alternative patching didn't do anything and, if paravirt was disabled,
we could plausibly finish the entire boot process without calling
sync_core().
Rather than trying to work around these issues, just have the kernel
fail loudly if it's running on a CPUID-less 486, doesn't have CPUID,
and doesn't have CONFIG_M486 set.
Reported-by: Matthew Whitehead <tedheadster@gmail.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/70eac6639f23df8be5fe03fa1984aedd5d40077a.1479598603.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andy Lutomirski [Sun, 20 Nov 2016 02:42:40 +0000 (18:42 -0800)]
x86/traps: Ignore high word of regs->cs in early_fixup_exception()
On the 80486 DX, it seems that some exceptions may leave garbage in
the high bits of CS. This causes sporadic failures in which
early_fixup_exception() refuses to fix up an exception.
As far as I can tell, this has been buggy for a long time, but the
problem seems to have been exacerbated by commits:
1e02ce4cccdc ("x86: Store a per-cpu shadow copy of CR4")
e1bfc11c5a6f ("x86/init: Fix cr4_init_shadow() on CR4-less machines")
This appears to have broken for as long as we've had early
exception handling.
[ Note to stable maintainers: This patch is needed all the way back to 3.4,
but it will only apply to 4.6 and up, as it depends on commit:
0e861fbb5bda ("x86/head: Move early exception panic code into early_fixup_exception()")
If you want to backport to kernels before 4.6, please don't backport the
prerequisites (there was a big chain of them that rewrote a lot of the
early exception machinery); instead, ask me and I can send you a one-liner
that will apply. ]
Reported-by: Matthew Whitehead <tedheadster@gmail.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Fixes:
4c5023a3fa2e ("x86-32: Handle exception table entries during early boot")
Link: http://lkml.kernel.org/r/cb32c69920e58a1a58e7b5cad975038a69c0ce7d.1479609510.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
John Johansen [Thu, 1 Sep 2016 04:10:06 +0000 (21:10 -0700)]
apparmor: fix change_hat not finding hat after policy replacement
After a policy replacement, the task cred may be out of date and need
to be updated. However change_hat is using the stale profiles from
the out of date cred resulting in either: a stale profile being applied
or, incorrect failure when searching for a hat profile as it has been
migrated to the new parent profile.
Fixes:
01e2b670aa898a39259bc85c78e3d74820f4d3b6 (failure to find hat)
Fixes:
898127c34ec03291c86f4ff3856d79e9e18952bc (stale policy being applied)
Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=
1000287
Cc: stable@vger.kernel.org
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Linus Torvalds [Sun, 20 Nov 2016 21:52:19 +0000 (13:52 -0800)]
Linux 4.9-rc6
Linus Torvalds [Sun, 20 Nov 2016 18:27:39 +0000 (10:27 -0800)]
Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
"A few more ARM fixes:
- the assembly backtrace code suffers problems with the new printk()
implementation which assumes that kernel messages without KERN_CONT
should have newlines inserted between them. Fix this.
- fix a section naming error - ".init.text" rather than ".text.init"
- preallocate DMA debug memory at core_initcall() time rather than
fs_initcall(), as we have some core drivers that need to use DMA
mapping - and that triggers a kernel warning from the DMA debug
code.
- fix XIP kernels after the ro_after_init changes made this data
permanently read-only"
* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: Fix XIP kernels
ARM: 8628/1: dma-mapping: preallocate DMA-debug hash tables in core_initcall
ARM: 8624/1: proc-v7m.S: fix init section name
ARM: fix backtrace
Jon Paul Maloy [Sat, 19 Nov 2016 19:47:07 +0000 (14:47 -0500)]
tipc: eliminate obsolete socket locking policy description
The comment block in socket.c describing the locking policy is
obsolete, and does not reflect current reality. We remove it in this
commit.
Since the current locking policy is much simpler and follows a
mainstream approach, we see no need to add a new description.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zhang Shengju [Sat, 19 Nov 2016 15:28:32 +0000 (23:28 +0800)]
rtnl: fix the loop index update error in rtnl_dump_ifinfo()
If the link is filtered out, loop index should also be updated. If not,
loop index will not be correct.
Fixes:
dc599f76c22b0 ("net: Add support for filtering link dump by master device and kind")
Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault [Fri, 18 Nov 2016 21:13:00 +0000 (22:13 +0100)]
l2tp: fix racy SOCK_ZAPPED flag check in l2tp_ip{,6}_bind()
Lock socket before checking the SOCK_ZAPPED flag in l2tp_ip6_bind().
Without lock, a concurrent call could modify the socket flags between
the sock_flag(sk, SOCK_ZAPPED) test and the lock_sock() call. This way,
a socket could be inserted twice in l2tp_ip6_bind_table. Releasing it
would then leave a stale pointer there, generating use-after-free
errors when walking through the list or modifying adjacent entries.
BUG: KASAN: use-after-free in l2tp_ip6_close+0x22e/0x290 at addr
ffff8800081b0ed8
Write of size 8 by task syz-executor/10987
CPU: 0 PID: 10987 Comm: syz-executor Not tainted 4.8.0+ #39
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
ffff880031d97838 ffffffff829f835b ffff88001b5a1640 ffff8800081b0ec0
ffff8800081b15a0 ffff8800081b6d20 ffff880031d97860 ffffffff8174d3cc
ffff880031d978f0 ffff8800081b0e80 ffff88001b5a1640 ffff880031d978e0
Call Trace:
[<
ffffffff829f835b>] dump_stack+0xb3/0x118 lib/dump_stack.c:15
[<
ffffffff8174d3cc>] kasan_object_err+0x1c/0x70 mm/kasan/report.c:156
[< inline >] print_address_description mm/kasan/report.c:194
[<
ffffffff8174d666>] kasan_report_error+0x1f6/0x4d0 mm/kasan/report.c:283
[< inline >] kasan_report mm/kasan/report.c:303
[<
ffffffff8174db7e>] __asan_report_store8_noabort+0x3e/0x40 mm/kasan/report.c:329
[< inline >] __write_once_size ./include/linux/compiler.h:249
[< inline >] __hlist_del ./include/linux/list.h:622
[< inline >] hlist_del_init ./include/linux/list.h:637
[<
ffffffff8579047e>] l2tp_ip6_close+0x22e/0x290 net/l2tp/l2tp_ip6.c:239
[<
ffffffff850b2dfd>] inet_release+0xed/0x1c0 net/ipv4/af_inet.c:415
[<
ffffffff851dc5a0>] inet6_release+0x50/0x70 net/ipv6/af_inet6.c:422
[<
ffffffff84c4581d>] sock_release+0x8d/0x1d0 net/socket.c:570
[<
ffffffff84c45976>] sock_close+0x16/0x20 net/socket.c:1017
[<
ffffffff817a108c>] __fput+0x28c/0x780 fs/file_table.c:208
[<
ffffffff817a1605>] ____fput+0x15/0x20 fs/file_table.c:244
[<
ffffffff813774f9>] task_work_run+0xf9/0x170
[<
ffffffff81324aae>] do_exit+0x85e/0x2a00
[<
ffffffff81326dc8>] do_group_exit+0x108/0x330
[<
ffffffff81348cf7>] get_signal+0x617/0x17a0 kernel/signal.c:2307
[<
ffffffff811b49af>] do_signal+0x7f/0x18f0
[<
ffffffff810039bf>] exit_to_usermode_loop+0xbf/0x150 arch/x86/entry/common.c:156
[< inline >] prepare_exit_to_usermode arch/x86/entry/common.c:190
[<
ffffffff81006060>] syscall_return_slowpath+0x1a0/0x1e0 arch/x86/entry/common.c:259
[<
ffffffff85e4d726>] entry_SYSCALL_64_fastpath+0xc4/0xc6
Object at
ffff8800081b0ec0, in cache L2TP/IPv6 size: 1448
Allocated:
PID = 10987
[ 1116.897025] [<
ffffffff811ddcb6>] save_stack_trace+0x16/0x20
[ 1116.897025] [<
ffffffff8174c736>] save_stack+0x46/0xd0
[ 1116.897025] [<
ffffffff8174c9ad>] kasan_kmalloc+0xad/0xe0
[ 1116.897025] [<
ffffffff8174cee2>] kasan_slab_alloc+0x12/0x20
[ 1116.897025] [< inline >] slab_post_alloc_hook mm/slab.h:417
[ 1116.897025] [< inline >] slab_alloc_node mm/slub.c:2708
[ 1116.897025] [< inline >] slab_alloc mm/slub.c:2716
[ 1116.897025] [<
ffffffff817476a8>] kmem_cache_alloc+0xc8/0x2b0 mm/slub.c:2721
[ 1116.897025] [<
ffffffff84c4f6a9>] sk_prot_alloc+0x69/0x2b0 net/core/sock.c:1326
[ 1116.897025] [<
ffffffff84c58ac8>] sk_alloc+0x38/0xae0 net/core/sock.c:1388
[ 1116.897025] [<
ffffffff851ddf67>] inet6_create+0x2d7/0x1000 net/ipv6/af_inet6.c:182
[ 1116.897025] [<
ffffffff84c4af7b>] __sock_create+0x37b/0x640 net/socket.c:1153
[ 1116.897025] [< inline >] sock_create net/socket.c:1193
[ 1116.897025] [< inline >] SYSC_socket net/socket.c:1223
[ 1116.897025] [<
ffffffff84c4b46f>] SyS_socket+0xef/0x1b0 net/socket.c:1203
[ 1116.897025] [<
ffffffff85e4d685>] entry_SYSCALL_64_fastpath+0x23/0xc6
Freed:
PID = 10987
[ 1116.897025] [<
ffffffff811ddcb6>] save_stack_trace+0x16/0x20
[ 1116.897025] [<
ffffffff8174c736>] save_stack+0x46/0xd0
[ 1116.897025] [<
ffffffff8174cf61>] kasan_slab_free+0x71/0xb0
[ 1116.897025] [< inline >] slab_free_hook mm/slub.c:1352
[ 1116.897025] [< inline >] slab_free_freelist_hook mm/slub.c:1374
[ 1116.897025] [< inline >] slab_free mm/slub.c:2951
[ 1116.897025] [<
ffffffff81748b28>] kmem_cache_free+0xc8/0x330 mm/slub.c:2973
[ 1116.897025] [< inline >] sk_prot_free net/core/sock.c:1369
[ 1116.897025] [<
ffffffff84c541eb>] __sk_destruct+0x32b/0x4f0 net/core/sock.c:1444
[ 1116.897025] [<
ffffffff84c5aca4>] sk_destruct+0x44/0x80 net/core/sock.c:1452
[ 1116.897025] [<
ffffffff84c5ad33>] __sk_free+0x53/0x220 net/core/sock.c:1460
[ 1116.897025] [<
ffffffff84c5af23>] sk_free+0x23/0x30 net/core/sock.c:1471
[ 1116.897025] [<
ffffffff84c5cb6c>] sk_common_release+0x28c/0x3e0 ./include/net/sock.h:1589
[ 1116.897025] [<
ffffffff8579044e>] l2tp_ip6_close+0x1fe/0x290 net/l2tp/l2tp_ip6.c:243
[ 1116.897025] [<
ffffffff850b2dfd>] inet_release+0xed/0x1c0 net/ipv4/af_inet.c:415
[ 1116.897025] [<
ffffffff851dc5a0>] inet6_release+0x50/0x70 net/ipv6/af_inet6.c:422
[ 1116.897025] [<
ffffffff84c4581d>] sock_release+0x8d/0x1d0 net/socket.c:570
[ 1116.897025] [<
ffffffff84c45976>] sock_close+0x16/0x20 net/socket.c:1017
[ 1116.897025] [<
ffffffff817a108c>] __fput+0x28c/0x780 fs/file_table.c:208
[ 1116.897025] [<
ffffffff817a1605>] ____fput+0x15/0x20 fs/file_table.c:244
[ 1116.897025] [<
ffffffff813774f9>] task_work_run+0xf9/0x170
[ 1116.897025] [<
ffffffff81324aae>] do_exit+0x85e/0x2a00
[ 1116.897025] [<
ffffffff81326dc8>] do_group_exit+0x108/0x330
[ 1116.897025] [<
ffffffff81348cf7>] get_signal+0x617/0x17a0 kernel/signal.c:2307
[ 1116.897025] [<
ffffffff811b49af>] do_signal+0x7f/0x18f0
[ 1116.897025] [<
ffffffff810039bf>] exit_to_usermode_loop+0xbf/0x150 arch/x86/entry/common.c:156
[ 1116.897025] [< inline >] prepare_exit_to_usermode arch/x86/entry/common.c:190
[ 1116.897025] [<
ffffffff81006060>] syscall_return_slowpath+0x1a0/0x1e0 arch/x86/entry/common.c:259
[ 1116.897025] [<
ffffffff85e4d726>] entry_SYSCALL_64_fastpath+0xc4/0xc6
Memory state around the buggy address:
ffff8800081b0d80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8800081b0e00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>
ffff8800081b0e80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
^
ffff8800081b0f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8800081b0f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
The same issue exists with l2tp_ip_bind() and l2tp_ip_bind_table.
Fixes:
c51ce49735c1 ("l2tp: fix oops in L2TP IP sockets for connect() AF_UNSPEC case")
Reported-by: Baozeng Ding <sploving1@gmail.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Baozeng Ding <sploving1@gmail.com>
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sun, 20 Nov 2016 02:40:47 +0000 (18:40 -0800)]
Merge tag 'armsoc-fixes' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
"Again a set of smaller fixes across several platforms (OMAP, Marvell,
Allwinner, i.MX, etc).
A handful of typo fixes and smaller missing contents from device
trees, with some tweaks to OMAP mach files to deal with CPU feature
print misformatting, potential NULL ptr dereference and one setup
issue with UARTs"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ipmi/bt-bmc: change compatible node to 'aspeed, ast2400-ibt-bmc'
ARM: dts: STiH410-b2260: Fix typo in spi0 chipselect definition
ARM: dts: omap5: board-common: fix wrong SMPS6 (VDD-DDR3) voltage
ARM: omap3: Add missing memory node in SOM-LV
arm64: dts: marvell: add unique identifiers for Armada A8k SPI controllers
arm64: dts: marvell: fix clocksource for CP110 slave SPI0
arm64: dts: marvell: Fix typo in label name on Armada 37xx
ASoC: omap-abe-twl6040: fix typo in bindings documentation
dts: omap5: board-common: enable twl6040 headset jack detection
dts: omap5: board-common: add phandle to reference Palmas gpadc
ARM: OMAP2+: avoid NULL pointer dereference
ARM: OMAP2+: PRM: initialize en_uart4_mask and grpsel_uart4_mask
ARM: dts: omap3: Fix memory node in Torpedo board
ARM: AM43XX: Select OMAP_INTERCONNECT in Kconfig
ARM: OMAP3: Fix formatting of features printed
ARM: dts: imx53-qsb: Fix regulator constraints
ARM: dts: sun8i: fix the pinmux for UART1
Linus Torvalds [Sun, 20 Nov 2016 02:33:50 +0000 (18:33 -0800)]
Merge tag 'ext4_for_stable' of git://git./linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"A security fix (so a maliciously corrupted file system image won't
panic the kernel) and some fixes for CONFIG_VMAP_STACK"
* tag 'ext4_for_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: sanity check the block and cluster size at mount time
fscrypto: don't use on-stack buffer for key derivation
fscrypto: don't use on-stack buffer for filename encryption
Theodore Ts'o [Fri, 18 Nov 2016 18:00:24 +0000 (13:00 -0500)]
ext4: sanity check the block and cluster size at mount time
If the block size or cluster size is insane, reject the mount. This
is important for security reasons (although we shouldn't be just
depending on this check).
Ref: http://www.securityfocus.com/archive/1/539661
Ref: https://bugzilla.redhat.com/show_bug.cgi?id=
1332506
Reported-by: Borislav Petkov <bp@alien8.de>
Reported-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Eric Biggers [Mon, 14 Nov 2016 01:41:09 +0000 (20:41 -0500)]
fscrypto: don't use on-stack buffer for key derivation
With the new (in 4.9) option to use a virtually-mapped stack
(CONFIG_VMAP_STACK), stack buffers cannot be used as input/output for
the scatterlist crypto API because they may not be directly mappable to
struct page. get_crypt_info() was using a stack buffer to hold the
output from the encryption operation used to derive the per-file key.
Fix it by using a heap buffer.
This bug could most easily be observed in a CONFIG_DEBUG_SG kernel
because this allowed the BUG in sg_set_buf() to be triggered.
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Eric Biggers [Mon, 14 Nov 2016 01:35:52 +0000 (20:35 -0500)]
fscrypto: don't use on-stack buffer for filename encryption
With the new (in 4.9) option to use a virtually-mapped stack
(CONFIG_VMAP_STACK), stack buffers cannot be used as input/output for
the scatterlist crypto API because they may not be directly mappable to
struct page. For short filenames, fname_encrypt() was encrypting a
stack buffer holding the padded filename. Fix it by encrypting the
filename in-place in the output buffer, thereby making the temporary
buffer unnecessary.
This bug could most easily be observed in a CONFIG_DEBUG_SG kernel
because this allowed the BUG in sg_set_buf() to be triggered.
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Linus Torvalds [Sat, 19 Nov 2016 21:35:09 +0000 (13:35 -0800)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Some I2C driver bugfixes (and one documentation fix)"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: i2c-mux-pca954x: fix deselect enabling for device-tree
i2c: digicolor: use clk_disable_unprepare instead of clk_unprepare
i2c: mux: fix up dependencies
i2c: Documentation: i2c-topology: fix minor whitespace nit
i2c: mux: demux-pinctrl: make drivers with no pinctrl work again
Linus Torvalds [Sat, 19 Nov 2016 21:31:40 +0000 (13:31 -0800)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull KVM fixes from Radim Krčmář:
"ARM:
- Fix handling of the 32bit cycle counter
- Fix cycle counter filtering
x86:
- Fix a race leading to double unregistering of user notifiers
- Amend oversight in kvm_arch_set_irq that turned Hyper-V code dead
- Use SRCU around kvm_lapic_set_vapic_addr
- Avoid recursive flushing of asynchronous page faults
- Do not rely on deferred update in KVM_GET_CLOCK, which fixes #GP
- Let userspace know that KVM_GET_CLOCK is useful with master clock;
4.9 changed the return value to better match the guest clock, but
didn't provide means to let guests take advantage of it"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm: x86: merge kvm_arch_set_irq and kvm_arch_set_irq_inatomic
KVM: x86: fix missed SRCU usage in kvm_lapic_set_vapic_addr
KVM: async_pf: avoid recursive flushing of work items
kvm: kvmclock: let KVM_GET_CLOCK return whether the master clock is in use
KVM: Disable irq while unregistering user notifier
KVM: x86: do not go through vcpu in __get_kvmclock_ns
KVM: arm64: Fix the issues when guest PMCCFILTR is configured
arm64: KVM: pmu: Fix AArch32 cycle counter access
Alex Hemme [Sat, 19 Nov 2016 09:48:38 +0000 (10:48 +0100)]
i2c: i2c-mux-pca954x: fix deselect enabling for device-tree
Deselect functionality can be ignored for device-trees with
"i2c-mux-idle-disconnect" entries if no platform_data is available.
By enabling the deselect functionality outside the platform_data
block the logic works as it did in previous kernels.
Fixes:
7fcac9807175 ("i2c: i2c-mux-pca954x: convert to use an explicit i2c mux core")
Cc: <stable@vger.kernel.org> # v4.7+
Signed-off-by: Alex Hemme <ahemme@cisco.com>
Signed-off-by: Ziyang Wu <ziywu@cisco.com>
[touched up a few minor issues /peda]
Signed-off-by: Peter Rosin <peda@axentia.se>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Linus Torvalds [Sat, 19 Nov 2016 19:21:59 +0000 (11:21 -0800)]
Merge tag 'powerpc-4.9-5' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Fixes marked for stable:
- fix system reset interrupt winkle wakeups
- fix setting of AIL in hypervisor mode
Fixes for code merged this cycle:
- fix exception vector build with 2.23 era binutils
- fix missing update of HID register on secondary CPUs
Other:
- fix missing pr_cont()s
- invalidate ERAT on tlbiel for POWER9 DD1"
* tag 'powerpc-4.9-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/mm: Fix missing update of HID register on secondary CPUs
powerpc/mm/radix: Invalidate ERAT on tlbiel for POWER9 DD1
powerpc/64: Fix setting of AIL in hypervisor mode
powerpc/oops: Fix missing pr_cont()s in instruction dump
powerpc/oops: Fix missing pr_cont()s in show_regs()
powerpc/oops: Fix missing pr_cont()s in print_msr_bits() et. al.
powerpc/oops: Fix missing pr_cont()s in show_stack()
powerpc: Fix exception vector build with 2.23 era binutils
powerpc/64s: Fix system reset interrupt winkle wakeups
Linus Torvalds [Sat, 19 Nov 2016 19:15:45 +0000 (11:15 -0800)]
Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
"This fixes the following issues:
- Compiler warning in caam driver that was the last one remaining
- Do not register aes-xts in caam drivers on unsupported platforms
- Regression in algif_hash interface that may lead to an oops"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: algif_hash - Fix NULL hash crash with shash
crypto: caam - fix type mismatch warning
crypto: caam - do not register AES-XTS mode on LP units
Linus Torvalds [Sat, 19 Nov 2016 19:09:28 +0000 (11:09 -0800)]
Merge tag 'leds_4.9-rc6' of git://git./linux/kernel/git/j.anaszewski/linux-leds
Pull LED subsystem update from Jacek Anaszewski:
"I'd like to announce a new co-maintainer - Pavel Machek"
* tag 'leds_4.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds:
MAINTAINERS: Add LED subsystem co-maintainer
Linus Torvalds [Sat, 19 Nov 2016 19:05:47 +0000 (11:05 -0800)]
Merge tag 'dmaengine-fix-4.9-rc6' of git://git.infradead.org/users/vkoul/slave-dma
Pull dmaengine fixes from Vinod Koul:
"Some driver fixes which we pending in my tree:
- return error code fix in edma driver
- Kconfig fix for genric allocator in mmp_tdma
- fix uninitialized value in sun6i
- Runtime pm fixes for cppi"
* tag 'dmaengine-fix-4.9-rc6' of git://git.infradead.org/users/vkoul/slave-dma:
dmaengine: cppi41: More PM runtime fixes
dmaengine: cpp41: Fix handling of error path
dmaengine: cppi41: Fix unpaired pm runtime when only a USB hub is connected
dmaengine: cppi41: Fix list not empty warning on module removal
dmaengine: sun6i: fix the uninitialized value for v_lli
dmaengine: mmp_tdma: add missing select GENERIC_ALLOCATOR in Kconfig
dmaengine: edma: Fix error return code in edma_alloc_chan_resources()
Paolo Bonzini [Thu, 17 Nov 2016 14:55:47 +0000 (15:55 +0100)]
kvm: x86: merge kvm_arch_set_irq and kvm_arch_set_irq_inatomic
kvm_arch_set_irq is unused since commit
b97e6de9c96. Merge
its functionality with kvm_arch_set_irq_inatomic.
Reported-by: Jiang Biao <jiang.biao2@zte.com.cn>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Paolo Bonzini [Thu, 17 Nov 2016 14:55:46 +0000 (15:55 +0100)]
KVM: x86: fix missed SRCU usage in kvm_lapic_set_vapic_addr
Reported by syzkaller:
[ INFO: suspicious RCU usage. ]
4.9.0-rc4+ #47 Not tainted
-------------------------------
./include/linux/kvm_host.h:536 suspicious rcu_dereference_check() usage!
stack backtrace:
CPU: 1 PID: 6679 Comm: syz-executor Not tainted 4.9.0-rc4+ #47
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
ffff880039e2f6d0 ffffffff81c2e46b ffff88003e3a5b40 0000000000000000
0000000000000001 ffffffff83215600 ffff880039e2f700 ffffffff81334ea9
ffffc9000730b000 0000000000000004 ffff88003c4f8420 ffff88003d3f8000
Call Trace:
[< inline >] __dump_stack lib/dump_stack.c:15
[<
ffffffff81c2e46b>] dump_stack+0xb3/0x118 lib/dump_stack.c:51
[<
ffffffff81334ea9>] lockdep_rcu_suspicious+0x139/0x180 kernel/locking/lockdep.c:4445
[< inline >] __kvm_memslots include/linux/kvm_host.h:534
[< inline >] kvm_memslots include/linux/kvm_host.h:541
[<
ffffffff8105d6ae>] kvm_gfn_to_hva_cache_init+0xa1e/0xce0 virt/kvm/kvm_main.c:1941
[<
ffffffff8112685d>] kvm_lapic_set_vapic_addr+0xed/0x140 arch/x86/kvm/lapic.c:2217
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Fixes:
fda4e2e85589191b123d31cdc21fd33ee70f50fd
Cc: Andrew Honig <ahonig@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Paolo Bonzini [Thu, 17 Nov 2016 14:55:45 +0000 (15:55 +0100)]
KVM: async_pf: avoid recursive flushing of work items
This was reported by syzkaller:
[ INFO: possible recursive locking detected ]
4.9.0-rc4+ #49 Not tainted
---------------------------------------------
kworker/2:1/5658 is trying to acquire lock:
([ 1644.769018] (&work->work)
[< inline >] list_empty include/linux/compiler.h:243
[<
ffffffff8128dd60>] flush_work+0x0/0x660 kernel/workqueue.c:1511
but task is already holding lock:
([ 1644.769018] (&work->work)
[<
ffffffff812916ab>] process_one_work+0x94b/0x1900 kernel/workqueue.c:2093
stack backtrace:
CPU: 2 PID: 5658 Comm: kworker/2:1 Not tainted 4.9.0-rc4+ #49
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Workqueue: events async_pf_execute
ffff8800676ff630 ffffffff81c2e46b ffffffff8485b930 ffff88006b1fc480
0000000000000000 ffffffff8485b930 ffff8800676ff7e0 ffffffff81339b27
ffff8800676ff7e8 0000000000000046 ffff88006b1fcce8 ffff88006b1fccf0
Call Trace:
...
[<
ffffffff8128ddf3>] flush_work+0x93/0x660 kernel/workqueue.c:2846
[<
ffffffff812954ea>] __cancel_work_timer+0x17a/0x410 kernel/workqueue.c:2916
[<
ffffffff81295797>] cancel_work_sync+0x17/0x20 kernel/workqueue.c:2951
[<
ffffffff81073037>] kvm_clear_async_pf_completion_queue+0xd7/0x400 virt/kvm/async_pf.c:126
[< inline >] kvm_free_vcpus arch/x86/kvm/x86.c:7841
[<
ffffffff810b728d>] kvm_arch_destroy_vm+0x23d/0x620 arch/x86/kvm/x86.c:7946
[< inline >] kvm_destroy_vm virt/kvm/kvm_main.c:731
[<
ffffffff8105914e>] kvm_put_kvm+0x40e/0x790 virt/kvm/kvm_main.c:752
[<
ffffffff81072b3d>] async_pf_execute+0x23d/0x4f0 virt/kvm/async_pf.c:111
[<
ffffffff8129175c>] process_one_work+0x9fc/0x1900 kernel/workqueue.c:2096
[<
ffffffff8129274f>] worker_thread+0xef/0x1480 kernel/workqueue.c:2230
[<
ffffffff812a5a94>] kthread+0x244/0x2d0 kernel/kthread.c:209
[<
ffffffff831f102a>] ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:433
The reason is that kvm_put_kvm is causing the destruction of the VM, but
the page fault is still on the ->queue list. The ->queue list is owned
by the VCPU, not by the work items, so we cannot just add list_del to
the work item.
Instead, use work->vcpu to note async page faults that have been resolved
and will be processed through the done list. There is no need to flush
those.
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Paolo Bonzini [Wed, 9 Nov 2016 16:48:15 +0000 (17:48 +0100)]
kvm: kvmclock: let KVM_GET_CLOCK return whether the master clock is in use
Userspace can read the exact value of kvmclock by reading the TSC
and fetching the timekeeping parameters out of guest memory. This
however is brittle and not necessary anymore with KVM 4.11. Provide
a mechanism that lets userspace know if the new KVM_GET_CLOCK
semantics are in effect, and---since we are at it---if the clock
is stable across all VCPUs.
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Ignacio Alvarado [Fri, 4 Nov 2016 19:15:55 +0000 (12:15 -0700)]
KVM: Disable irq while unregistering user notifier
Function user_notifier_unregister should be called only once for each
registered user notifier.
Function kvm_arch_hardware_disable can be executed from an IPI context
which could cause a race condition with a VCPU returning to user mode
and attempting to unregister the notifier.
Signed-off-by: Ignacio Alvarado <ikalvarado@google.com>
Cc: stable@vger.kernel.org
Fixes:
18863bdd60f8 ("KVM: x86 shared msr infrastructure")
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Paolo Bonzini [Wed, 16 Nov 2016 17:31:30 +0000 (18:31 +0100)]
KVM: x86: do not go through vcpu in __get_kvmclock_ns
Going through the first VCPU is wrong if you follow a KVM_SET_CLOCK with
a KVM_GET_CLOCK immediately after, without letting the VCPU run and
call kvm_guest_time_update.
To fix this, compute the kvmclock value ourselves, using the master
clock (tsc, nsec) pair as the base and the host CPU frequency as
the scale.
Reported-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Radim Krčmář [Sat, 19 Nov 2016 17:02:07 +0000 (18:02 +0100)]
Merge tag 'kvm-arm-for-4.9-rc6' of git://git./linux/kernel/git/kvmarm/kvmarm
KVM/ARM updates for v4.9-rc6
- Fix handling of the 32bit cycle counter
- Fix cycle counter filtering
David S. Miller [Sat, 19 Nov 2016 16:11:52 +0000 (11:11 -0500)]
Merge tag 'batadv-net-for-davem-
20161119' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
Here are two batman-adv bugfix patches:
- Revert a splat on disabling interface which created another problem,
by Sven Eckelmann
- Fix error handling when the primary interface disappears during a
throughput meter test, by Sven Eckelmann
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Geliang Tang [Fri, 18 Nov 2016 14:21:17 +0000 (22:21 +0800)]
sparc: drop duplicate header scatterlist.h
Drop duplicate header scatterlist.h from iommu_common.h.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexey Khoroshilov [Fri, 18 Nov 2016 22:40:10 +0000 (01:40 +0300)]
net: macb: add check for dma mapping error in start_xmit()
at91ether_start_xmit() does not check for dma mapping errors.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sat, 19 Nov 2016 01:21:58 +0000 (17:21 -0800)]
Merge tag 'acpi-4.9-rc6' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"They fix an ACPI thermal management regression introduced by a recent
FADT handling cleanup, an ACPI tools build issue introduced by a
recent ACPICA commit and a PCC mailbox initialization bug causing
lockdep to complain loudly.
Specifics:
- Revert a recent ACPICA cleanup that attempted to get rid of all
FADT version 2 legacy, but broke ACPI thermal management on at
least one system (Rafael Wysocki).
- Fix cross-compiled builds of ACPI tools that stopped working after
a recent cleanup related to the handling of header files in ACPICA
(Lv Zheng).
- Fix a locking issue in the PCC channel initialization code that
invokes devm_request_irq() under a spinlock (among other things)
and causes lockdep to complain (Hoan Tran)"
* tag 'acpi-4.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
tools/power/acpi: Remove direct kernel source include reference
mailbox: PCC: Fix lockdep warning when request PCC channel
Revert "ACPICA: FADT support cleanup"
Linus Torvalds [Sat, 19 Nov 2016 00:45:21 +0000 (16:45 -0800)]
Merge branch 'rc-fixes' of git://git./linux/kernel/git/mmarek/kbuild
Pull kbuild fixes from Michal Marek:
"Here are some regression fixes for kbuild:
- modversion support for exported asm symbols (Nick Piggin). The
affected architectures need separate patches adding
asm-prototypes.h.
- fix rebuilds of lib-ksyms.o (Nick Piggin)
- -fno-PIE builds (Sebastian Siewior and Borislav Petkov). This is
not a kernel regression, but one of the Debian gcc package.
Nevertheless, it's quite annoying, so I think it should go into
mainline and stable now"
* 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
kbuild: Steal gcc's pie from the very beginning
kbuild: be more careful about matching preprocessed asm ___EXPORT_SYMBOL
x86/kexec: add -fno-PIE
scripts/has-stack-protector: add -fno-PIE
kbuild: add -fno-PIE
kbuild: modversions for EXPORT_SYMBOL() for asm
kbuild: prevent lib-ksyms.o rebuilds
Linus Torvalds [Sat, 19 Nov 2016 00:32:21 +0000 (16:32 -0800)]
Merge tag 'nfsd-4.9-2' of git://linux-nfs.org/~bfields/linux
Pull nfsd bugfix from Bruce Fields:
"Just one fix for an NFS/RDMA crash"
* tag 'nfsd-4.9-2' of git://linux-nfs.org/~bfields/linux:
sunrpc: svc_age_temp_xprts_now should not call setsockopt non-tcp transports
Pavel Machek [Tue, 15 Nov 2016 10:12:05 +0000 (11:12 +0100)]
MAINTAINERS: Add LED subsystem co-maintainer
Mark me as a co-maintainer of LED subsystem.
Signed-off-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Jacek Anaszewski <j.anaszewski@samsung.com>
Rafael J. Wysocki [Fri, 18 Nov 2016 20:34:42 +0000 (21:34 +0100)]
Merge branches 'acpica-fixes', 'acpi-cppc-fixes' and 'acpi-tools-fixes'
* acpica-fixes:
Revert "ACPICA: FADT support cleanup"
* acpi-cppc-fixes:
mailbox: PCC: Fix lockdep warning when request PCC channel
* acpi-tools-fixes:
tools/power/acpi: Remove direct kernel source include reference
David S. Miller [Fri, 18 Nov 2016 19:33:26 +0000 (11:33 -0800)]
Merge branch 'sparc-lockdep-small'
Babu Moger says:
====================
Adjust lockdep static allocations for sparc
These patches limit the static allocations for lockdep data structures
used for debugging locking correctness. For sparc, all the kernel's code,
data, and bss, must have locked translations in the TLB so that we don't
get TLB misses on kernel code and data. Current sparc chips have 8 TLB
entries available that may be locked down, and with a 4mb page size,
this gives a maximum of 32MB. With PROVE_LOCKING we could go over this
limit and cause system boot-up problems. These patches limit the static
allocations so that everything fits in current required size limit.
patch 1 : Adds new config parameter CONFIG_PROVE_LOCKING_SMALL
Patch 2 : Adjusts the sizes based on the new config parameter
v2-> v3:
Some more comments from Sam Ravnborg and Peter Zijlstra.
Defined PROVE_LOCKING_SMALL as invisible and moved the selection to
arch/sparc/Kconfig.
v1-> v2:
As suggested by Peter Zijlstra, keeping the default as is.
Introduced new config variable CONFIG_PROVE_LOCKING_SMALL
to handle sparc specific case.
v0:
Initial revision.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Babu Moger [Wed, 2 Nov 2016 16:36:33 +0000 (09:36 -0700)]
lockdep: Limit static allocations if PROVE_LOCKING_SMALL is defined
Reduce the size of data structure for lockdep entries by half if
PROVE_LOCKING_SMALL if defined. This is used only for sparc.
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Babu Moger [Wed, 2 Nov 2016 16:36:32 +0000 (09:36 -0700)]
config: Adding the new config parameter CONFIG_PROVE_LOCKING_SMALL for sparc
This new config parameter limits the space used for "Lock debugging:
prove locking correctness" by about 4MB. The current sparc systems have
the limitation of 32MB size for kernel size including .text, .data and
.bss sections. With PROVE_LOCKING feature, the kernel size could grow
beyond this limit and causing system boot-up issues. With this option,
kernel limits the size of the entries of lock_chains, stack_trace etc.,
so that kernel fits in required size limit. This is not visible to user
and only used for sparc.
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tushar Dave [Mon, 17 Oct 2016 20:57:00 +0000 (13:57 -0700)]
sunbmac: Fix compiler warning
sunbmac uses '__u32' for dma handle while invoking kernel DMA APIs,
instead of using dma_addr_t. This hasn't caused any 'incompatible
pointer type' warning on SPARC because until now dma_addr_t is of
type u32. However, recent changes in SPARC ATU (iommu) enables 64bit
DMA and therefore dma_addr_t becomes of type u64. This makes
'incompatible pointer type' warnings inevitable.
e.g.
drivers/net/ethernet/sun/sunbmac.c: In function ‘bigmac_ether_init’:
drivers/net/ethernet/sun/sunbmac.c:1166: warning: passing argument 3 of ‘dma_alloc_coherent’ from incompatible pointer type
./include/linux/dma-mapping.h:445: note: expected ‘dma_addr_t *’ but argument is of type ‘__u32 *’
This patch resolves above compiler warning.
Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Reviewed-by: chris hyser <chris.hyser@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tushar Dave [Mon, 17 Oct 2016 20:56:59 +0000 (13:56 -0700)]
sunqe: Fix compiler warnings
sunqe uses '__u32' for dma handle while invoking kernel DMA APIs,
instead of using dma_addr_t. This hasn't caused any 'incompatible
pointer type' warning on SPARC because until now dma_addr_t is of
type u32. However, recent changes in SPARC ATU (iommu) enables 64bit
DMA and therefore dma_addr_t becomes of type u64. This makes
'incompatible pointer type' warnings inevitable.
e.g.
drivers/net/ethernet/sun/sunqe.c: In function ‘qec_ether_init’:
drivers/net/ethernet/sun/sunqe.c:883: warning: passing argument 3 of ‘dma_alloc_coherent’ from incompatible pointer type
./include/linux/dma-mapping.h:445: note: expected ‘dma_addr_t *’ but argument is of type ‘__u32 *’
drivers/net/ethernet/sun/sunqe.c:885: warning: passing argument 3 of ‘dma_alloc_coherent’ from incompatible pointer type
./include/linux/dma-mapping.h:445: note: expected ‘dma_addr_t *’ but argument is of type ‘__u32 *’
This patch resolves above compiler warnings.
Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Reviewed-by: chris hyser <chris.hyser@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 18 Nov 2016 19:17:10 +0000 (11:17 -0800)]
Merge branch 'sun4v-64bit-DMA'
Tushar Dave says:
====================
sparc: Enable sun4v hypervisor PCI IOMMU v2 APIs and ATU
ATU (Address Translation Unit) is a new IOMMU in SPARC supported with
sun4v hypervisor PCI IOMMU v2 APIs.
Current SPARC IOMMU supports only 32bit address ranges and one TSB
per PCIe root complex that has a 2GB per root complex DVMA space
limit. The limit has become a scalability bottleneck nowadays that
a typical 10G/40G NIC can consume 500MB DVMA space per instance.
When DVMA resource is exhausted, devices will not be usable
since the driver can't allocate DVMA.
For example, we recently experienced legacy IOMMU limitation while
using i40e driver in system with large number of CPUs (e.g. 128).
Four ports of i40e, each request 128 QP (Queue Pairs). Each queue has
512 (default) descriptors. So considering only RX queues (because RX
premap DMA buffers), i40e takes 4*128*512 number of DMA entries in
IOMMU table. Legacy IOMMU can have at max (2G/8K)- 1 entries available
in table. So bringing up four instance of i40e alone saturate existing
IOMMU resource.
ATU removes bottleneck by allowing guest os to create IOTSB of size
32G (or more) with 64bit address ranges available in ATU HW. 32G is
more than enough DVMA space to be shared by all PCIe devices under
root complex contrast to 2G space provided by legacy IOMMU.
ATU allows PCIe devices to use 64bit DMA addressing. Devices
which choose to use 32bit DMA mask will continue to work with the
existing legacy IOMMU.
The patch set is tested on sun4v (T1000, T2000, T3, T4, T5, T7, S7)
and sun4u SPARC.
Thanks.
-Tushar
v2->v3:
- Patch #5 addresses comment by Joe Perches.
-- use %s, __func__ instead of embedding the function name.
v1->v2:
- Patch #2 addresses comments by Dave M.
-- use page allocator to allocate IOTSB.
-- use true/false with boolean variables.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tushar Dave [Fri, 28 Oct 2016 17:12:45 +0000 (10:12 -0700)]
sparc64: Enable 64-bit DMA
ATU 64bit addressing allows PCIe devices with 64bit DMA capabilities
to use ATU for 64bit DMA.
Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Reviewed-by: chris hyser <chris.hyser@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tushar Dave [Fri, 28 Oct 2016 17:12:44 +0000 (10:12 -0700)]
sparc64: Enable sun4v dma ops to use IOMMU v2 APIs
Add Hypervisor IOMMU v2 APIs pci_iotsb_map(), pci_iotsb_demap() and
enable sun4v dma ops to use IOMMU v2 API for all PCIe devices with
64bit DMA mask.
Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Reviewed-by: chris hyser <chris.hyser@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tushar Dave [Fri, 28 Oct 2016 17:12:43 +0000 (10:12 -0700)]
sparc64: Bind PCIe devices to use IOMMU v2 service
In order to use Hypervisor (HV) IOMMU v2 API for map/demap, each PCIe
device has to be bound to IOTSB using HV API pci_iotsb_bind().
Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Reviewed-by: chris hyser <chris.hyser@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tushar Dave [Fri, 28 Oct 2016 17:12:42 +0000 (10:12 -0700)]
sparc64: Initialize iommu_map_table and iommu_pool
Like legacy IOMMU, use common iommu_map_table and iommu_pool for ATU.
This change initializes iommu_map_table and iommu_pool for ATU.
Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Reviewed-by: chris hyser <chris.hyser@oracle.com>
Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tushar Dave [Fri, 28 Oct 2016 17:12:41 +0000 (10:12 -0700)]
sparc64: Add ATU (new IOMMU) support
ATU (Address Translation Unit) is a new IOMMU in SPARC supported with
Hypervisor IOMMU v2 APIs.
Current SPARC IOMMU supports only 32bit address ranges and one TSB
per PCIe root complex that has a 2GB per root complex DVMA space
limit. The limit has become a scalability bottleneck nowadays that
a typical 10G/40G NIC can consume 300MB-500MB DVMA space per
instance. When DVMA resource is exhausted, devices will not be usable
since the driver can't allocate DVMA.
ATU removes bottleneck by allowing guest os to create IOTSB of size
32G (or more) with 64bit address ranges available in ATU HW. 32G is
more than enough DVMA space to be shared by all PCIe devices under
root complex contrast to 2G space provided by legacy IOMMU.
ATU allows PCIe devices to use 64bit DMA addressing. Devices
which choose to use 32bit DMA mask will continue to work with the
existing legacy IOMMU.
Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Reviewed-by: chris hyser <chris.hyser@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dave Kleikamp [Fri, 28 Oct 2016 17:12:40 +0000 (10:12 -0700)]
sparc64: Add FORCE_MAX_ZONEORDER and default to 13
This change allows ATU (new IOMMU) in SPARC systems to request
large (32M) contiguous memory during boot for creating IOTSB backing
store.
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sabrina Dubroca [Fri, 18 Nov 2016 14:50:39 +0000 (15:50 +0100)]
rtnetlink: fix FDB size computation
Add missing NDA_VLAN attribute's size.
Fixes:
1e53d5bb8878 ("net: Pass VLAN ID to rtnl_fdb_notify.")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Hajnoczi [Fri, 18 Nov 2016 09:41:46 +0000 (09:41 +0000)]
netns: fix get_net_ns_by_fd(int pid) typo
The argument to get_net_ns_by_fd() is a /proc/$PID/ns/net file
descriptor not a pid. Fix the typo.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Rami Rosen <roszenrami@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 18 Nov 2016 19:00:27 +0000 (14:00 -0500)]
Merge tag 'mac80211-for-davem-2016-11-18' of git://git./linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
A few more bugfixes:
* limit # of scan results stored in memory - this is a long-standing bug
Jouni and I only noticed while discussing other things in Santa Fe
* revert AP_LINK_PS patch that was causing issues (Felix)
* various A-MSDU/A-MPDU fixes for TXQ code (Felix)
* interoperability workaround for peers with broken VHT capabilities
(Filip Matusiak)
* add bitrate definition for a VHT MCS that's supposed to be invalid
but gets used by some hardware anyway (Thomas Pedersen)
* beacon timer fix in hwsim (Benjamin Beichler)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Thu, 17 Nov 2016 23:55:26 +0000 (15:55 -0800)]
af_unix: conditionally use freezable blocking calls in read
Commit
2b15af6f95 ("af_unix: use freezable blocking calls in read")
converts schedule_timeout() to its freezable version, it was probably
correct at that time, but later, commit
2b514574f7e8
("net: af_unix: implement splice for stream af_unix sockets") breaks
the strong requirement for a freezable sleep, according to
commit
0f9548ca1091:
We shouldn't try_to_freeze if locks are held. Holding a lock can cause a
deadlock if the lock is later acquired in the suspend or hibernate path
(e.g. by dpm). Holding a lock can also cause a deadlock in the case of
cgroup_freezer if a lock is held inside a frozen cgroup that is later
acquired by a process outside that group.
The pipe_lock is still held at that point.
So use freezable version only for the recvmsg call path, avoid impact for
Android.
Fixes:
2b514574f7e8 ("net: af_unix: implement splice for stream af_unix sockets")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Colin Cross <ccross@android.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 18 Nov 2016 18:48:54 +0000 (13:48 -0500)]
Merge branch 'cpsw-fixes'
Johan Hovold says:
====================
net: cpsw: fix leaks and probe deferral
This series fixes as number of leaks and issues in the cpsw probe-error
and driver-unbind paths, some which specifically prevented deferred
probing.
v2
- Keep platform device runtime-resumed throughout probe instead of
resuming in the probe error path as suggested by Grygorii (patch
1/7).
- Runtime-resume platform device before registering any children in
order to make sure it is synchronously suspended after deregistering
children in the error path (patch 3/7).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hovold [Thu, 17 Nov 2016 16:40:04 +0000 (17:40 +0100)]
net: ethernet: ti: cpsw: fix fixed-link phy probe deferral
Make sure to propagate errors from of_phy_register_fixed_link() which
can fail with -EPROBE_DEFER.
Fixes:
1f71e8c96fc6 ("drivers: net: cpsw: Add support for fixed-link
PHY")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hovold [Thu, 17 Nov 2016 16:40:03 +0000 (17:40 +0100)]
net: ethernet: ti: cpsw: add missing sanity check
Make sure to check for allocation failures before dereferencing a
NULL-pointer during probe.
Fixes:
649a1688c960 ("net: ethernet: ti: cpsw: create common struct to
hold shared driver data")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hovold [Thu, 17 Nov 2016 16:40:02 +0000 (17:40 +0100)]
net: ethernet: ti: cpsw: fix secondary-emac probe error path
Make sure to deregister the primary device in case the secondary emac
fails to probe.
kernel BUG at /home/johan/work/omicron/src/linux/net/core/dev.c:7743!
...
[<
c05b3dec>] (free_netdev) from [<
c04fe6c0>] (cpsw_probe+0x9cc/0xe50)
[<
c04fe6c0>] (cpsw_probe) from [<
c047b28c>] (platform_drv_probe+0x5c/0xc0)
Fixes:
d9ba8f9e6298 ("driver: net: ethernet: cpsw: dual emac interface
implementation")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hovold [Thu, 17 Nov 2016 16:40:01 +0000 (17:40 +0100)]
net: ethernet: ti: cpsw: fix of_node and phydev leaks
Make sure to drop references taken and deregister devices registered
during probe on probe errors (including deferred probe) and driver
unbind.
Specifically, PHY of-node references were never released and fixed-link
PHY devices were never deregistered.
Fixes:
9e42f715264f ("drivers: net: cpsw: add phy-handle parsing")
Fixes:
1f71e8c96fc6 ("drivers: net: cpsw: Add support for fixed-link
PHY")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hovold [Thu, 17 Nov 2016 16:40:00 +0000 (17:40 +0100)]
net: ethernet: ti: cpsw: fix deferred probe
Make sure to deregister all child devices also on probe errors to avoid
leaks and to fix probe deferral:
cpsw
4a100000.ethernet: omap_device: omap_device_enable() called from invalid state 1
cpsw
4a100000.ethernet: use pm_runtime_put_sync_suspend() in driver?
cpsw: probe of
4a100000.ethernet failed with error -22
Add generic helper to undo the effects of cpsw_probe_dt(), which will
also be used in a follow-on patch to fix further leaks that have been
introduced more recently.
Note that the platform device is now runtime-resumed before registering
any child devices in order to make sure that it is synchronously
suspended after having deregistered the children in the error path.
Fixes:
1fb19aa730e4 ("net: cpsw: Add parent<->child relation support
between cpsw and mdio")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hovold [Thu, 17 Nov 2016 16:39:59 +0000 (17:39 +0100)]
net: ethernet: ti: cpsw: fix mdio device reference leak
Make sure to drop the reference taken by of_find_device_by_node() when
looking up an mdio device from a phy_id property during probe.
Fixes:
549985ee9c72 ("cpsw: simplify the setup of the register
pointers")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hovold [Thu, 17 Nov 2016 16:39:58 +0000 (17:39 +0100)]
net: ethernet: ti: cpsw: fix bad register access in probe error path
Make sure to keep the platform device runtime-resumed throughout probe
to avoid accessing the CPSW registers in the error path (e.g. for
deferred probe) with clocks disabled:
Unhandled fault: external abort on non-linefetch (0x1008) at 0xd0872d08
...
[<
c04fabcc>] (cpsw_ale_control_set) from [<
c04fb8b4>] (cpsw_ale_destroy+0x2c/0x44)
[<
c04fb8b4>] (cpsw_ale_destroy) from [<
c04fea58>] (cpsw_probe+0xbd0/0x10c4)
[<
c04fea58>] (cpsw_probe) from [<
c047b2a0>] (platform_drv_probe+0x5c/0xc0)
Fixes:
df828598a755 ("netdev: driver: ethernet: Add TI CPSW driver")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeremy Linton [Thu, 17 Nov 2016 15:14:25 +0000 (09:14 -0600)]
net: sky2: Fix shutdown crash
The sky2 frequently crashes during machine shutdown with:
sky2_get_stats+0x60/0x3d8 [sky2]
dev_get_stats+0x68/0xd8
rtnl_fill_stats+0x54/0x140
rtnl_fill_ifinfo+0x46c/0xc68
rtmsg_ifinfo_build_skb+0x7c/0xf0
rtmsg_ifinfo.part.22+0x3c/0x70
rtmsg_ifinfo+0x50/0x5c
netdev_state_change+0x4c/0x58
linkwatch_do_dev+0x50/0x88
__linkwatch_run_queue+0x104/0x1a4
linkwatch_event+0x30/0x3c
process_one_work+0x140/0x3e0
worker_thread+0x60/0x44c
kthread+0xdc/0xf0
ret_from_fork+0x10/0x50
This is caused by the sky2 being called after it has been shutdown.
A previous thread about this can be found here:
https://lkml.org/lkml/2016/4/12/410
An alternative fix is to assure that IFF_UP gets cleared by
calling dev_close() during shutdown. This is similar to what the
bnx2/tg3/xgene and maybe others are doing to assure that the driver
isn't being called following _shutdown().
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 18 Nov 2016 16:56:47 +0000 (08:56 -0800)]
Merge tag 'sound-4.9-rc6' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Three trivial fixes:
A regression fix for ASRock mobo, a use-after-free fix at hot-unplug
of USB-audio, and a quirk for new Thinkpad models"
* tag 'sound-4.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: usb-audio: Fix use-after-free of usb_device at disconnect
ALSA: hda - Fix mic regression by ASRock mobo fixup
ALSA: hda - add a new condition to check if it is thinkpad
Linus Torvalds [Fri, 18 Nov 2016 16:47:47 +0000 (08:47 -0800)]
Merge tag 'gpio-v4.9-4' of git://git./linux/kernel/git/linusw/linux-gpio
Pull GPIO fixes from Linus Walleij:
"These are hopefully the last GPIO fixes for v4.9. The most important
is that it fixes the UML randconfig builds that have been nagging me
for some time and me being confused about where the problem was really
sitting, now this fix give this nice feeling that everything is solid
and builds fine.
Summary:
- Finally, after being puzzled by a bunch of recurrent UML build
failures on randconfigs from the build robot, Keno Fischer nailed
it: GPIO_DEVRES is optional and depends on HAS_IOMEM even though
many users just unconditionally rely on it to be available. And it
*should* be available: garbage collection is nice for this and it
*certainly* has nothing to do with having IOMEM. So we got rid of
it, and now the UML builds should JustWork(TM).
- Do not call .get_direction() on sleeping GPIO chips on the fastpath
when locking GPIOs for interrupts: it is done from atomic context,
no way.
- Some driver fixes"
* tag 'gpio-v4.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: Remove GPIO_DEVRES option
gpio: tc3589x: fix up .get_direction()
gpio: do not double-check direction on sleeping chips
gpio: pca953x: Move memcpy into mutex lock for set multiple
gpio: pca953x: Fix corruption of other gpios in set_multiple.