Rafael J. Wysocki [Mon, 24 Oct 2016 21:20:25 +0000 (23:20 +0200)]
cpufreq: intel_pstate: Always set max P-state in performance mode
The only times at which intel_pstate checks the policy set for
a given CPU is the initialization of that CPU and updates of its
policy settings from cpufreq when intel_pstate_set_policy() is
invoked.
That is insufficient, however, because intel_pstate uses the same
P-state selection function for all CPUs regardless of the policy
setting for each of them and the P-state limits are shared between
them. Thus if the policy is set to "performance" for a particular
CPU, it may not behave as expected if the cpufreq settings are
changed subsequently for another CPU.
That can be easily demonstrated by writing "performance" to
scaling_governor for all CPUs and then switching it to "powersave"
for one of them in which case all of the CPUs will behave as though
their scaling_governor were all "powersave" (even though the policy
still appears to be "performance" for the remaining CPUs).
Fix this problem by modifying intel_pstate_adjust_busy_pstate() to
always set the P-state to the maximum allowed by the current limits
for all CPUs whose policy is set to "performance".
Note that it still is recommended to always change the policy setting
in the same way for all CPUs even with this fix applied to avoid
confusion.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Rafael J. Wysocki [Wed, 19 Oct 2016 00:57:22 +0000 (02:57 +0200)]
cpufreq: intel_pstate: Set P-state upfront in performance mode
After commit
a4675fbc4a7a (cpufreq: intel_pstate: Replace timers with
utilization update callbacks) the cpufreq governor callbacks may not
be invoked on NOHZ_FULL CPUs and, in particular, switching to the
"performance" policy via sysfs may not have any effect on them. That
is a problem, because it usually is desirable to squeeze the last
bit of performance out of those CPUs, so work around it by setting
the maximum P-state (within the limits) in intel_pstate_set_policy()
upfront when the policy is CPUFREQ_POLICY_PERFORMANCE.
Fixes:
a4675fbc4a7a (cpufreq: intel_pstate: Replace timers with utilization update callbacks)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Sergey Senozhatsky [Mon, 17 Oct 2016 15:41:12 +0000 (00:41 +0900)]
cpufreq: fix overflow in cpufreq_table_find_index_dl()
'best' is always less or equals to 'pos', so `best - pos' returns
a negative value which is then getting casted to `unsigned int'
and passed to __cpufreq_driver_target()->acpi_cpufreq_target()
for policy->freq_table selection. This results in
BUG: unable to handle kernel paging request at
ffff881019b469f8
IP: [<
ffffffffa00356c1>] acpi_cpufreq_target+0x4f/0x190 [acpi_cpufreq]
PGD
267f067
PUD 0
Oops: 0000 [#1] PREEMPT SMP
CPU: 6 PID: 70 Comm: kworker/6:1 Not tainted 4.9.0-rc1-next-
20161017-dbg-dirty
Workqueue: events dbs_work_handler
task:
ffff88041b808000 task.stack:
ffff88041b810000
RIP: 0010:[<
ffffffffa00356c1>] [<
ffffffffa00356c1>] acpi_cpufreq_target+0x4f/0x190 [acpi_cpufreq]
RSP: 0018:
ffff88041b813c60 EFLAGS:
00010282
RAX:
ffff880419b46a00 RBX:
ffff88041b848400 RCX:
ffff880419b20f80
RDX:
00000000001dff38 RSI:
00000000ffffffff RDI:
ffff88041b848400
RBP:
ffff88041b813cb0 R08:
0000000000000006 R09:
0000000000000040
R10:
ffffffff8207f9e0 R11:
ffffffff8173595b R12:
0000000000000000
R13:
ffff88041f1dff38 R14:
0000000000262900 R15:
0000000bfffffff4
FS:
0000000000000000(0000) GS:
ffff88041f000000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
ffff881019b469f8 CR3:
000000041a2d3000 CR4:
00000000001406e0
Stack:
ffff88041b813cb0 ffffffff813347f9 ffff88041b813ca0 ffffffff81334663
ffff88041f1d4bc0 ffff88041b848400 0000000000000000 0000000000000000
0000000000262900 0000000000000000 ffff88041b813d00 ffffffff813355dc
Call Trace:
[<
ffffffff813347f9>] ? cpufreq_freq_transition_begin+0xf1/0xfc
[<
ffffffff81334663>] ? get_cpu_idle_time+0x97/0xa6
[<
ffffffff813355dc>] __cpufreq_driver_target+0x3b6/0x44e
[<
ffffffff81336ca3>] cs_dbs_timer+0x11a/0x135
[<
ffffffff81336fda>] dbs_work_handler+0x39/0x62
[<
ffffffff81057823>] process_one_work+0x280/0x4a5
[<
ffffffff81058719>] worker_thread+0x24f/0x397
[<
ffffffff810584ca>] ? rescuer_thread+0x30b/0x30b
[<
ffffffff81418380>] ? nl80211_get_key+0x29/0x36a
[<
ffffffff8105d2b7>] kthread+0xfc/0x104
[<
ffffffff8107ceea>] ? put_lock_stats.isra.9+0xe/0x20
[<
ffffffff8105d1bb>] ? kthread_create_on_node+0x3f/0x3f
[<
ffffffff814b2092>] ret_from_fork+0x22/0x30
Code: 56 4d 6b ff 0c 41 55 41 54 53 48 83 ec 28 48 8b 15 ad 1e 00 00 44 8b 41
08 48 8b 87 c8 00 00 00 49 89 d5 4e 03 2c c5 80 b2 78 81 <46> 8b 74 38 04 45
3b 75 00 75 11 31 c0 83 39 00 0f 84 1c 01 00
RIP [<
ffffffffa00356c1>] acpi_cpufreq_target+0x4f/0x190 [acpi_cpufreq]
RSP <
ffff88041b813c60>
CR2:
ffff881019b469f8
---[ end trace
16d9fc7a17897d37 ]---
[ rjw: In some cases this bug may also cause incorrect frequencies to
be selected by cpufreq governors. ]
Fixes:
899bb6642f2a (cpufreq: skip invalid entries when searching the frequency)
Link: http://marc.info/?l=linux-kernel&m=147672030714331&w=2
Reported-and-tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reported-and-tested-by: Jörg Otte <jrg.otte@gmail.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 4.8+ <stable@vger.kernel.org> # 4.8+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Hoan Tran [Thu, 13 Oct 2016 17:33:35 +0000 (10:33 -0700)]
cpufreq: CPPC: Correct desired_perf calculation
The desired_perf is an abstract performance number. Its value should
be in the range of [lowest perf, highest perf] of CPPC.
The correct calculation is
desired_perf = freq * cppc_highest_perf / cppc_dmi_max_khz
And cppc_cpufreq_set_target() returns if desired_perf is exactly
the same with the old perf.
Signed-off-by: Hoan Tran <hotran@apm.com>
Reviewed-by: Prashanth Prakash <pprakash@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Rafael J. Wysocki [Wed, 12 Oct 2016 19:47:03 +0000 (21:47 +0200)]
cpufreq: conservative: Fix next frequency selection
Commit
d352cf47d93e (cpufreq: conservative: Do not use transition
notifications) overlooked the case when the "frequency step" used
by the conservative governor is small relative to the distances
between the available frequencies and broke the algorithm by
using policy->cur instead of the previously requested frequency
when computing the next one.
As a result, the governor may not be able to go outside of a narrow
range between two consecutive available frequencies.
Fix the problem by making the governor save the previously requested
frequency and select the next one relative that value (unless it is
out of range, in which case policy->cur will be used instead).
Fixes:
d352cf47d93e (cpufreq: conservative: Do not use transition notifications)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=177171
Reported-and-tested-by: Aleksey Rybalkin <aleksey@rybalkin.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 4.8+ <stable@vger.kernel.org> # 4.8+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Aaro Koskinen [Wed, 12 Oct 2016 03:15:05 +0000 (08:45 +0530)]
cpufreq: skip invalid entries when searching the frequency
Skip invalid entries when searching the frequency. This fixes cpufreq
at least on loongson2 MIPS board.
Fixes:
da0c6dc00c69 (cpufreq: Handle sorted frequency tables more efficiently)
Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 4.8+ <stable@vger.kernel.org> # 4.8+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Rafael J. Wysocki [Tue, 11 Oct 2016 21:07:38 +0000 (23:07 +0200)]
cpufreq: intel_pstate: Fix struct pstate_adjust_policy kerneldoc
It looks like the name of struct pstate_adjust_policy was updated
without updating its kerneldoc comment accordingly, so fix that
mistake.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Rafael J. Wysocki [Thu, 6 Oct 2016 12:07:51 +0000 (14:07 +0200)]
cpufreq: intel_pstate: Proportional algorithm for Atom
The PID algorithm used by the intel_pstate driver tends to drive
performance to the minimum for workloads with utilization below the
setpoint, which is undesirable, so replace it with a modified
"proportional" algorithm on Atom.
The new algorithm will set the new P-state to be 1.25 times the
available maximum times the (frequency-invariant) utilization during
the previous sampling period except when the target P-state computed
this way is lower than the average P-state during the previous
sampling period. In the latter case, it will increase the target by
50% of the difference between it and the average P-state to prevent
performance from dropping down too fast in some cases.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Rafael J. Wysocki [Fri, 30 Sep 2016 01:13:34 +0000 (03:13 +0200)]
cpufreq: intel_pstate: Clarify comment in get_target_pstate_use_performance()
Make the comment explaining the meaning of the perf_scaled variable
in get_target_pstate_use_performance() more straightforward.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Srinivas Pandruvada [Sat, 8 Oct 2016 19:42:38 +0000 (12:42 -0700)]
cpufreq: intel_pstate: Fix unsafe HWP MSR access
This is a requirement that MSR MSR_PM_ENABLE must be set to 0x01 before
reading MSR_HWP_CAPABILITIES on a given CPU. If cpufreq init() is
scheduled on a CPU which is not same as policy->cpu or migrates to a
different CPU before calling msr read for MSR_HWP_CAPABILITIES, it
is possible that MSR_PM_ENABLE was not to set to 0x01 on that CPU.
This will cause GP fault. So like other places in this path
rdmsrl_on_cpu should be used instead of rdmsrl.
Moreover the scope of MSR_HWP_CAPABILITIES is on per thread basis, so it
should be read from the same CPU, for which MSR MSR_HWP_REQUEST is
getting set.
dmesg dump or warning:
[ 22.014488] WARNING: CPU: 139 PID: 1 at arch/x86/mm/extable.c:50 ex_handler_rdmsr_unsafe+0x68/0x70
[ 22.014492] unchecked MSR access error: RDMSR from 0x771
[ 22.014493] Modules linked in:
[ 22.014507] CPU: 139 PID: 1 Comm: swapper/0 Not tainted 4.7.5+ #1
...
...
[ 22.014516] Call Trace:
[ 22.014542] [<
ffffffff813d7dd1>] dump_stack+0x63/0x82
[ 22.014558] [<
ffffffff8107bc8b>] __warn+0xcb/0xf0
[ 22.014561] [<
ffffffff8107bcff>] warn_slowpath_fmt+0x4f/0x60
[ 22.014563] [<
ffffffff810676f8>] ex_handler_rdmsr_unsafe+0x68/0x70
[ 22.014564] [<
ffffffff810677d9>] fixup_exception+0x39/0x50
[ 22.014604] [<
ffffffff8102e400>] do_general_protection+0x80/0x150
[ 22.014610] [<
ffffffff817f9ec8>] general_protection+0x28/0x30
[ 22.014635] [<
ffffffff81687940>] ? get_target_pstate_use_performance+0xb0/0xb0
[ 22.014642] [<
ffffffff810600c7>] ? native_read_msr+0x7/0x40
[ 22.014657] [<
ffffffff81688123>] intel_pstate_hwp_set+0x23/0x130
[ 22.014660] [<
ffffffff81688406>] intel_pstate_set_policy+0x1b6/0x340
[ 22.014662] [<
ffffffff816829bb>] cpufreq_set_policy+0xeb/0x2c0
[ 22.014664] [<
ffffffff81682f39>] cpufreq_init_policy+0x79/0xe0
[ 22.014666] [<
ffffffff81682cb0>] ? cpufreq_update_policy+0x120/0x120
[ 22.014669] [<
ffffffff816833a6>] cpufreq_online+0x406/0x820
[ 22.014671] [<
ffffffff8168381f>] cpufreq_add_dev+0x5f/0x90
[ 22.014717] [<
ffffffff81530ac8>] subsys_interface_register+0xb8/0x100
[ 22.014719] [<
ffffffff816821bc>] cpufreq_register_driver+0x14c/0x210
[ 22.014749] [<
ffffffff81fe1d90>] intel_pstate_init+0x39d/0x4d5
[ 22.014751] [<
ffffffff81fe13f2>] ? cpufreq_gov_dbs_init+0x12/0x12
Cc: 4.3+ <stable@vger.kernel.org> # 4.3+
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Rafael J. Wysocki [Sat, 1 Oct 2016 23:42:33 +0000 (01:42 +0200)]
Merge branch 'pm-cpufreq-sched' into pm-cpufreq
Colin Ian King [Sun, 25 Sep 2016 21:40:13 +0000 (14:40 -0700)]
cpufreq: st: add missing \n to end of dev_err message
Trival fix, dev_err message is missing a \n, so add it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Colin Ian King [Sun, 25 Sep 2016 21:35:43 +0000 (14:35 -0700)]
cpufreq: kirkwood: add missing \n to end of dev_err messages
Trival fix, dev_err messages are missing a \n, so add it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Hoan Tran [Wed, 14 Sep 2016 23:08:28 +0000 (16:08 -0700)]
cpufreq: CPPC: Avoid overflow when calculating desired_perf
This patch fixes overflow issue when calculating the desired_perf.
Fixes:
ad38677df44b (cpufreq: CPPC: Force reporting values in KHz to fix user space interface)
Signed-off-by: Hoan Tran <hotran@apm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Dave Gerlach [Wed, 14 Sep 2016 20:41:37 +0000 (15:41 -0500)]
cpufreq: ti: Use generic platdev driver
Now that the cpufreq-dt-platdev is used to create the cpufreq-dt platform
device for all OMAP platforms and the platform code that did it
before has been removed, add ti,am33xx and ti,dra7xx to the machine list
in cpufreq-dt-platdev which had relied on the removed platform code to do
this previously.
Fixes:
7694ca6e1d6f (cpufreq: omap: Use generic platdev driver)
Signed-off-by: Dave Gerlach <d-gerlach@ti.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 4.7+ <stable@vger.kernel.org> # 4.7+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Srinivas Pandruvada [Wed, 14 Sep 2016 00:41:33 +0000 (17:41 -0700)]
cpufreq: intel_pstate: Add io_boost trace
Add io_boost percent to current pstate_sample tracepoint.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Rafael J. Wysocki [Wed, 14 Sep 2016 00:28:13 +0000 (02:28 +0200)]
cpufreq: intel_pstate: Use IOWAIT flag in Atom algorithm
Modify the P-state selection algorithm for Atom processors to use
the new SCHED_CPUFREQ_IOWAIT flag instead of the questionable
get_cpu_iowait_time_us() function.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Rafael J. Wysocki [Fri, 9 Sep 2016 22:00:31 +0000 (00:00 +0200)]
cpufreq: schedutil: Add iowait boosting
Modify the schedutil cpufreq governor to boost the CPU
frequency if the SCHED_CPUFREQ_IOWAIT flag is passed to
it via cpufreq_update_util().
If that happens, the frequency is set to the maximum during
the first update after receiving the SCHED_CPUFREQ_IOWAIT flag
and then the boost is reduced by half during each following update.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Looks-good-to: Steve Muckle <smuckle@linaro.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Rafael J. Wysocki [Fri, 9 Sep 2016 21:59:33 +0000 (23:59 +0200)]
cpufreq / sched: SCHED_CPUFREQ_IOWAIT flag to indicate iowait condition
Testing indicates that it is possible to improve performace
significantly without increasing energy consumption too much by
teaching cpufreq governors to bump up the CPU performance level if
the in_iowait flag is set for the task in enqueue_task_fair().
For this purpose, define a new cpufreq_update_util() flag
SCHED_CPUFREQ_IOWAIT and modify enqueue_task_fair() to pass that
flag to cpufreq_update_util() in the in_iowait case. That generally
requires cpufreq_update_util() to be called directly from there,
because update_load_avg() may not be invoked in that case.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Looks-good-to: Steve Muckle <smuckle@linaro.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Rafael J. Wysocki [Tue, 13 Sep 2016 00:53:37 +0000 (02:53 +0200)]
Merge tag 'samsung-defconfig-schedutil-4.9' of git://git./linux/kernel/git/krzk/linux into pm-cpufreq-sched
The schedutil cpufreq governor will be switched from tristate to bool. Fix
defconfigs.
* tag 'samsung-defconfig-schedutil-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux:
ARM: multi_v7_defconfig: Don't attempt to enable schedutil governor as module
ARM: exynos_defconfig: Don't attempt to enable schedutil governor as module
Al Stone [Wed, 20 Jul 2016 21:10:04 +0000 (15:10 -0600)]
cpufreq: CPPC: Force reporting values in KHz to fix user space interface
When CPPC is being used by ACPI on arm64, user space tools such as
cpupower report CPU frequency values from sysfs that are incorrect.
What the driver was doing was reporting the values given by ACPI tables
in whatever scale was used to provide them. However, the ACPI spec
defines the CPPC values as unitless abstract numbers. Internal kernel
structures such as struct perf_cap, in contrast, expect these values
to be in KHz. When these struct values get reported via sysfs, the
user space tools also assume they are in KHz, causing them to report
incorrect values (for example, reporting a CPU frequency of 1MHz when
it should be 1.8GHz).
The downside is that this approach has some assumptions:
(1) It relies on SMBIOS3 being used, *and* that the Max Frequency
value for a processor is set to a non-zero value.
(2) It assumes that all processors run at the same speed, or that
the CPPC values have all been scaled to reflect relative speed.
This patch retrieves the largest CPU Max Frequency from a type 4 DMI
record that it can find. This may not be an issue, however, as a
sampling of DMI data on x86 and arm64 indicates there is often only
one such record regardless. Since CPPC is relatively new, it is
unclear if the ACPI ASL will always be written to reflect any sort
of relative performance of processors of differing speeds.
(3) It assumes that performance and frequency both scale linearly.
For arm64 servers, this may be sufficient, but it does rely on
firmware values being set correctly. Hence, other approaches will
be considered in the future.
This has been tested on three arm64 servers, with and without DMI, with
and without CPPC support.
Signed-off-by: Al Stone <ahs3@redhat.com>
Signed-off-by: Prashanth Prakash <pprakash@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Viresh Kumar [Mon, 12 Sep 2016 06:37:05 +0000 (12:07 +0530)]
cpufreq: create link to policy only for registered CPUs
If a cpufreq driver is registered very early in the boot stage (e.g.
registered from postcore_initcall()), then cpufreq core may generate
kernel warnings for it.
In this case, the CPUs are brought online, then the cpufreq driver is
registered, and then the CPU topology devices are registered. However,
by the time cpufreq_add_dev() gets called, the cpu device isn't stored
in the per-cpu variable (cpu_sys_devices,) which is read by
get_cpu_device().
So the cpufreq core fails to get device for the CPU, for which
cpufreq_add_dev() was called in the first place and we will hit a
WARN_ON(!cpu_dev).
Even if we reuse the 'dev' parameter passed to cpufreq_add_dev() to
avoid that warning, there might be other CPUs online that share the
policy with the cpu for which cpufreq_add_dev() is called. Eventually
get_cpu_device() will return NULL for them as well, and we will hit the
same WARN_ON() again.
In order to fix these issues, change cpufreq core to create links to the
policy for a cpu only when cpufreq_add_dev() is called for that CPU.
Reuse the 'real_cpus' mask to track that as well.
Note that cpufreq_remove_dev() already handles removal of the links for
individual CPUs and cpufreq_add_dev() has aligned with that now.
Reported-by: Russell King <rmk+kernel@arm.linux.org.uk>
Tested-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Julia Lawall [Sun, 11 Sep 2016 13:06:01 +0000 (15:06 +0200)]
intel_pstate: constify local structures
For structure types defined in the same file or local header files, find
top-level static structure declarations that have the following
properties:
1. Never reassigned.
2. Address never taken
3. Not passed to a top-level macro call
4. No pointer or array-typed field passed to a function or stored in a
variable.
Declare structures having all of these properties as const.
Done using Coccinelle.
Based on a suggestion by Joe Perches <joe@perches.com>.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Viresh Kumar [Fri, 9 Sep 2016 11:18:08 +0000 (16:48 +0530)]
cpufreq: dt: Support governor tunables per policy
The cpufreq-dt driver is also used for systems with multiple
clock/voltage domains for CPUs, i.e. multiple cpufreq policies in a
system.
And in such cases the platform users may want to enable "governor
tunables per policy". Support that via platform data, as not all users
of the driver would want that behavior.
Reported-by: Juri Lelli <Juri.Lelli@arm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Viresh Kumar [Fri, 9 Sep 2016 11:18:07 +0000 (16:48 +0530)]
cpufreq: dt: Update kconfig description
The cpufreq DT driver also supports systems that have multiple
clock/voltage domains for CPUs, i.e. multiple policy systems.
The description of the Kconfig entry was never updated after the driver
was modified to support such systems, fix it.
Reported-by: Juri Lelli <Juri.Lelli@arm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Viresh Kumar [Fri, 9 Sep 2016 11:18:06 +0000 (16:48 +0530)]
cpufreq: dt: Remove unused code
This is leftover from an earlier patch which removed the usage of
platform data but forgot to remove this line. Remove it now.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Jean Delvare [Fri, 9 Sep 2016 11:05:24 +0000 (13:05 +0200)]
MAINTAINERS: Add Documentation/cpu-freq/
I am told the cpufreq documentation updates should go to the PM list.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Geert Uytterhoeven [Tue, 6 Sep 2016 12:18:20 +0000 (14:18 +0200)]
cpufreq: dt: Add support for r8a7792
Add the compatible string for supporting the generic cpufreq driver on
the Renesas R-Car V2H (r8a7792) SoC.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Rafael J. Wysocki [Mon, 12 Sep 2016 12:49:29 +0000 (14:49 +0200)]
Merge back earlier cpufreq material for v4.9.
Jean Delvare [Thu, 8 Sep 2016 21:05:07 +0000 (23:05 +0200)]
cpufreq-stats: Minor documentation fix
The cpufreq-stats code can no longer be built as a module, so it now
appears with square brackets in menuconfig.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Fixes:
1aefc75b2449 (cpufreq: stats: Make the stats code non-modular)
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Linus Torvalds [Sun, 4 Sep 2016 21:31:46 +0000 (14:31 -0700)]
Linux 4.8-rc5
Linus Torvalds [Sun, 4 Sep 2016 15:45:41 +0000 (08:45 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fix from Thomas Gleixner:
"A single fix for an AMD erratum so machines without a BIOS fix work"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/AMD: Apply erratum 665 on machines without a BIOS fix
Linus Torvalds [Sun, 4 Sep 2016 15:43:45 +0000 (08:43 -0700)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
"Two fixlet from the timers departement:
- A fix for scheduler stalls in the tick idle code affecting
NOHZ_FULL kernels
- A trivial compile fix"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick/nohz: Fix softlockup on scheduler stalls in kvm guest
clocksource/drivers/atmel-pit: Fix compilation error
Linus Torvalds [Sun, 4 Sep 2016 00:29:58 +0000 (17:29 -0700)]
Merge tag 'dm-4.8-fixes-4' of git://git./linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- a stable fix in both DM crypt and DM log-writes for too large bios
(as generated by bcache)
- two other stable fixes for DM log-writes
- a stable fix for a DM crypt bug that could result in freeing pointers
from uninitialized memory in the tfm allocation error path
- a DM bufio cleanup to discontinue using create_singlethread_workqueue()
* tag 'dm-4.8-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm bufio: remove use of deprecated create_singlethread_workqueue()
dm crypt: fix free of bad values after tfm allocation failure
dm crypt: fix error with too large bios
dm log writes: fix check of kthread_run() return value
dm log writes: fix bug with too large bios
dm log writes: move IO accounting earlier to fix error path
Linus Torvalds [Sat, 3 Sep 2016 19:40:45 +0000 (12:40 -0700)]
Merge branch 'for-linus-4.8' of git://git./linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
"I'm still prepping a set of fixes for btrfs fsync, just nailing down a
hard to trigger memory corruption. For now, these are tested and ready."
* 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
btrfs: fix one bug that process may endlessly wait for ticket in wait_reserve_ticket()
Btrfs: fix endless loop in balancing block groups
Btrfs: kill invalid ASSERT() in process_all_refs()
Linus Torvalds [Sat, 3 Sep 2016 19:31:37 +0000 (12:31 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
"arm64 and arm/perf fixes:
- arm64 fix: debug exception unmasking on the CPU resume path
- ARM PMU fixes: memory leak on error path and NULL pointer
dereference"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: kernel: Fix unmasked debug exceptions when restoring mdscr_el1
drivers/perf: arm_pmu: Fix NULL pointer dereference during probe
drivers/perf: arm_pmu: Fix leak in error path
Linus Torvalds [Sat, 3 Sep 2016 18:38:43 +0000 (11:38 -0700)]
Merge tag 'char-misc-4.8-rc5' of git://git./linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are a number of small driver fixes for 4.8-rc5.
The largest thing here is deleting an obsolete driver,
drivers/misc/bh1780gli.c, as the functionality of it was replaced by
an iio driver a while ago.
The other fixes are things that have been reported, or reverts of
broken stuff (the binder change). All of these changes have been in
linux-next for a while with no reported issues"
* tag 'char-misc-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
thunderbolt: Don't declare Falcon Ridge unsupported
thunderbolt: Add support for INTEL_FALCON_RIDGE_2C controller.
thunderbolt: Fix resume quirk for Falcon Ridge 4C.
lkdtm: Mark lkdtm_rodata_do_nothing() notrace
mei: me: disable driver on SPT SPS firmware
Revert "android: binder: fix dangling pointer comparison"
drivers/iio/light/Kconfig: SENSORS_BH1780 cleanup
android: binder: fix dangling pointer comparison
misc: delete bh1780 driver
Linus Torvalds [Sat, 3 Sep 2016 18:36:55 +0000 (11:36 -0700)]
Merge tag 'driver-core-4.8-rc5' of git://git./linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
"Here are three small fixes for 4.8-rc5.
One for sysfs, one for kernfs, and one documentation fix, all for
reported issues. All of these have been in linux-next for a while"
* tag 'driver-core-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
sysfs: correctly handle read offset on PREALLOC attrs
documentation: drivers/core/of: fix name of of_node symlink
kernfs: don't depend on d_find_any_alias() when generating notifications
Linus Torvalds [Sat, 3 Sep 2016 18:33:33 +0000 (11:33 -0700)]
Merge tag 'staging-4.8-rc5' of git://git./linux/kernel/git/gregkh/staging
Pull staging/IIO driver fixes from Greg KH:
"Here are a number of small fixes for staging and IIO drivers that
resolve reported problems.
Full details are in the shortlog. All of these have been in
linux-next with no reported issues"
* tag 'staging-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (35 commits)
arm: dts: rockchip: add reset node for the exist saradc SoCs
arm64: dts: rockchip: add reset saradc node for rk3368 SoCs
iio: adc: rockchip_saradc: reset saradc controller before programming it
iio: accel: kxsd9: Fix raw read return
iio: adc: ti_am335x_adc: Increase timeout value waiting for ADC sample
iio: adc: ti_am335x_adc: Protect FIFO1 from concurrent access
include/linux: fix excess fence.h kernel-doc notation
staging: wilc1000: correctly check if associatedsta has not been found
staging: wilc1000: NULL dereference on error
staging: wilc1000: txq_event: Fix coding error
MAINTAINERS: Add file patterns for ion device tree bindings
MAINTAINERS: Update maintainer entry for wilc1000
iio: chemical: atlas-ph-sensor: fix typo in val assignment
iio: fix sched WARNING "do not call blocking ops when !TASK_RUNNING"
staging: comedi: ni_mio_common: fix AO inttrig backwards compatibility
staging: comedi: dt2811: fix a precedence bug
staging: comedi: adv_pci1760: Do not return EINVAL for CMDF_ROUND_DOWN.
staging: comedi: ni_mio_common: fix wrong insn_write handler
staging: comedi: comedi_test: fix timer race conditions
staging: comedi: daqboard2000: bug fix board type matching code
...
Linus Torvalds [Sat, 3 Sep 2016 18:29:31 +0000 (11:29 -0700)]
Merge tag 'tty-4.8-rc5' of git://git./linux/kernel/git/gregkh/tty
Pull serial driver fixes from Greg KH:
"Here are some small serial driver fixes for 4.8-rc5. One fixes an
oft-reported build issue with the fintek driver, another reverts a
patch that was causing problems, one fixes a crash, and some new
device ids were added.
All of these have been in linux-next for a while"
* tag 'tty-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: 8250: added acces i/o products quad and octal serial cards
serial: 8250_mid: fix divide error bug if baud rate is 0
Revert "tty/serial/8250: use mctrl_gpio helpers"
8250/fintek: rename IRQ_MODE macro
Linus Torvalds [Sat, 3 Sep 2016 18:24:23 +0000 (11:24 -0700)]
Merge tag 'usb-4.8-rc5' of git://git./linux/kernel/git/gregkh/usb
Pull USB/PHY fixes from Greg KH:
"Here are some USB and PHY driver fixes for 4.8-rc5
Nothing major, lots of little fixes for reported bugs, and a build fix
for a missing .h file that the phy drivers needed. All of these have
been in linux-next for a while with no reported issues"
* tag 'usb-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (24 commits)
usb: musb: Fix locking errors for host only mode
usb: dwc3: gadget: always decrement by 1
usb: dwc3: debug: fix ep name on trace output
usb: gadget: udc: core: don't starve DMA resources
USB: serial: option: add WeTelecom 0x6802 and 0x6803 products
USB: avoid left shift by -1
USB: fix typo in wMaxPacketSize validation
usb: gadget: Add the gserial port checking in gs_start_tx()
usb: dwc3: gadget: don't rely on jiffies while holding spinlock
usb: gadget: fsl_qe_udc: signedness bug in qe_get_frame()
usb: gadget: function: f_rndis: socket buffer may be NULL
usb: gadget: function: f_eem: socket buffer may be NULL
usb: renesas_usbhs: gadget: fix return value check in usbhs_mod_gadget_probe()
usb: dwc2: Add reset control to dwc2
usb: dwc3: core: allow device to runtime_suspend several times
usb: dwc3: pci: runtime_resume child device
USB: serial: option: add WeTelecom WM-D200
usb: chipidea: udc: don't touch DP when controller is in host mode
USB: serial: mos7840: fix non-atomic allocation in write path
USB: serial: mos7720: fix non-atomic allocation in write path
...
Linus Torvalds [Sat, 3 Sep 2016 18:02:50 +0000 (11:02 -0700)]
devpts: return NULL pts 'priv' entry for non-devpts nodes
In commit
8ead9dd54716 ("devpts: more pty driver interface cleanups") I
made devpts_get_priv() just return the dentry->fs_data directly. And
because I thought it wouldn't happen, I added a warning if you ever saw
a pts node that wasn't on devpts.
And no, that warning never triggered under any actual real use, but you
can trigger it by creating nonsensical pts nodes by hand.
So just revert the warning, and make devpts_get_priv() return NULL for
that case like it used to.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: stable@vger.kernel.org # 4.6+
Cc: Eric W Biederman" <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 3 Sep 2016 04:05:38 +0000 (21:05 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A collection of fixes for the nvme over fabrics code"
* 'for-linus' of git://git.kernel.dk/linux-block:
nvme-rdma: Get rid of redundant defines
nvme-rdma: Get rid of duplicate variable
nvme: fabrics drivers don't need the nvme-pci driver
nvme-fabrics: get a reference when reusing a nvme_host structure
nvme-fabrics: change NQN UUID to big-endian format
nvme-loop: set sqsize to 0-based value, per spec
nvme-rdma: fix sqsize/hsqsize per spec
fabrics: define admin sqsize min default, per spec
nvmet-rdma: +1 to *queue_size from hsqsize/hrqsize
nvmet-rdma: Fix use after free
nvme-rdma: initialize ret to zero to avoid returning garbage
Linus Torvalds [Fri, 2 Sep 2016 22:33:54 +0000 (15:33 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/linux-security
Pull TPM bugfix from James Morris.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
tpm: invalid self test error message
Jarkko Sakkinen [Thu, 1 Sep 2016 23:36:58 +0000 (02:36 +0300)]
tpm: invalid self test error message
The driver emits invalid self test error message even though the init
succeeds.
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Fixes:
cae8b441fc20 ("tpm: Factor out common startup code")
Reviewed-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Linus Torvalds [Fri, 2 Sep 2016 22:16:04 +0000 (15:16 -0700)]
Merge tag 'acpi-4.8-rc5' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fixes ffrom Rafael Wysocki:
"Two stable-candidate fixes for the ACPI early device probing code
added during the 4.4 cycle, one fixing a typo in a stub macro used
when CONFIG_ACPI is unset and one that prevents sleeping functions
from being called under a spinlock (Lorenzo Pieralisi)"
* tag 'acpi-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / drivers: replace acpi_probe_lock spinlock with mutex
ACPI / drivers: fix typo in ACPI_DECLARE_PROBE_ENTRY macro
Linus Torvalds [Fri, 2 Sep 2016 22:07:41 +0000 (15:07 -0700)]
Merge tag 'pm-4.8-rc5' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"This includes a stable-candidate cpufreq-dt driver problem fix and
annotations of tracepoints in the runtime PM framework.
Specifics:
- Fix the definition of the cpufreq-dt driver's machines table
introduced during the 4.7 cycle that should be NULL-terminated, but
the termination entry is missing from it (Wei Yongjun).
- Annotate tracepoints in the runtime PM framework's core so as to
allow the functions containing them to be called from the idle code
path without causing RCU to complain about illegal usage (Paul
McKenney)"
* tag 'pm-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / runtime: Add _rcuidle suffix to allow rpm_idle() use from idle
PM / runtime: Add _rcuidle suffix to allow rpm_resume() to be called from idle
cpufreq: dt: Add terminate entry for of_device_id tables
Rafael J. Wysocki [Fri, 2 Sep 2016 20:38:30 +0000 (22:38 +0200)]
Merge branches 'pm-cpufreq-fixes' and 'pm-core-fixes'
* pm-cpufreq-fixes:
cpufreq: dt: Add terminate entry for of_device_id tables
* pm-core-fixes:
PM / runtime: Add _rcuidle suffix to allow rpm_idle() use from idle
PM / runtime: Add _rcuidle suffix to allow rpm_resume() to be called from idle
Lorenzo Pieralisi [Tue, 16 Aug 2016 15:59:53 +0000 (16:59 +0100)]
ACPI / drivers: replace acpi_probe_lock spinlock with mutex
Commit
e647b532275b ("ACPI: Add early device probing infrastructure")
introduced code that allows inserting driver specific
struct acpi_probe_entry probe entries into ACPI linker sections
(one per-subsystem, eg irqchip, clocksource) that are then walked
to retrieve the data and function hooks required to probe the
respective kernel components.
Probing for all entries in a section is triggered through
the __acpi_probe_device_table() function, that in turn, according
to the table ID a given probe entry reports parses the table
with the function retrieved from the respective section structures
(ie struct acpi_probe_entry). Owing to the current ACPI table
parsing implementation, the __acpi_probe_device_table() function
has to share global variables with the acpi_match_madt() function, so
in order to guarantee mutual exclusion locking is required
between the two functions.
Current kernel code implements the locking through the acpi_probe_lock
spinlock; this has the side effect of requiring all code called
within the lock (ie struct acpi_probe_entry.probe_{table/subtbl} hooks)
not to sleep.
However, kernel subsystems that make use of the early probing
infrastructure are relying on kernel APIs that may sleep (eg
irq_domain_alloc_fwnode(), among others) in the function calls
pointed at by struct acpi_probe_entry.{probe_table/subtbl} entries
(eg gic_v2_acpi_init()), which is a bug.
Since __acpi_probe_device_table() is called from context
that is allowed to sleep the acpi_probe_lock spinlock can be replaced
with a mutex; this fixes the issue whilst still guaranteeing
mutual exclusion.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Fixes:
e647b532275b (ACPI: Add early device probing infrastructure)
Cc: 4.4+ <stable@vger.kernel.org> # 4.4+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Lorenzo Pieralisi [Tue, 16 Aug 2016 15:59:52 +0000 (16:59 +0100)]
ACPI / drivers: fix typo in ACPI_DECLARE_PROBE_ENTRY macro
When the ACPI_DECLARE_PROBE_ENTRY macro was added in
commit
e647b532275b ("ACPI: Add early device probing infrastructure"),
a stub macro adding an unused entry was added for the !CONFIG_ACPI
Kconfig option case to make sure kernel code making use of the
macro did not require to be guarded within CONFIG_ACPI in order to
be compiled.
The stub macro was never used since all kernel code that defines
ACPI_DECLARE_PROBE_ENTRY entries is currently guarded within
CONFIG_ACPI; it contains a typo that should be nonetheless fixed.
Fix the typo in the stub (ie !CONFIG_ACPI) ACPI_DECLARE_PROBE_ENTRY()
macro so that it can actually be used if needed.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Fixes:
e647b532275b (ACPI: Add early device probing infrastructure)
Cc: 4.4+ <stable@vger.kernel.org> # 4.4+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Emanuel Czirai [Fri, 2 Sep 2016 05:35:50 +0000 (07:35 +0200)]
x86/AMD: Apply erratum 665 on machines without a BIOS fix
AMD F12h machines have an erratum which can cause DIV/IDIV to behave
unpredictably. The workaround is to set MSRC001_1029[31] but sometimes
there is no BIOS update containing that workaround so let's do it
ourselves unconditionally. It is simple enough.
[ Borislav: Wrote commit message. ]
Signed-off-by: Emanuel Czirai <icanrealizeum@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Yaowu Xu <yaowu@google.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20160902053550.18097-1-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Steven Rostedt [Wed, 25 May 2016 17:47:26 +0000 (13:47 -0400)]
x86/paravirt: Do not trace _paravirt_ident_*() functions
Łukasz Daniluk reported that on a RHEL kernel that his machine would lock up
after enabling function tracer. I asked him to bisect the functions within
available_filter_functions, which he did and it came down to three:
_paravirt_nop(), _paravirt_ident_32() and _paravirt_ident_64()
It was found that this is only an issue when noreplace-paravirt is added
to the kernel command line.
This means that those functions are most likely called within critical
sections of the funtion tracer, and must not be traced.
In newer kenels _paravirt_nop() is defined within gcc asm(), and is no
longer an issue. But both _paravirt_ident_{32,64}() causes the
following splat when they are traced:
mm/pgtable-generic.c:33: bad pmd
ffff8800d2435150(
0000000001d00054)
mm/pgtable-generic.c:33: bad pmd
ffff8800d3624190(
0000000001d00070)
mm/pgtable-generic.c:33: bad pmd
ffff8800d36a5110(
0000000001d00054)
mm/pgtable-generic.c:33: bad pmd
ffff880118eb1450(
0000000001d00054)
NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [systemd-journal:469]
Modules linked in: e1000e
CPU: 2 PID: 469 Comm: systemd-journal Not tainted 4.6.0-rc4-test+ #513
Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
task:
ffff880118f740c0 ti:
ffff8800d4aec000 task.ti:
ffff8800d4aec000
RIP: 0010:[<
ffffffff81134148>] [<
ffffffff81134148>] queued_spin_lock_slowpath+0x118/0x1a0
RSP: 0018:
ffff8800d4aefb90 EFLAGS:
00000246
RAX:
0000000000000000 RBX:
0000000000000000 RCX:
ffff88011eb16d40
RDX:
ffffffff82485760 RSI:
000000001f288820 RDI:
ffffea0000008030
RBP:
ffff8800d4aefb90 R08:
00000000000c0000 R09:
0000000000000000
R10:
ffffffff821c8e0e R11:
0000000000000000 R12:
ffff880000200fb8
R13:
00007f7a4e3f7000 R14:
ffffea000303f600 R15:
ffff8800d4b562e0
FS:
00007f7a4e3d7840(0000) GS:
ffff88011eb00000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
00007f7a4e3f7000 CR3:
00000000d3e71000 CR4:
00000000001406e0
Call Trace:
_raw_spin_lock+0x27/0x30
handle_pte_fault+0x13db/0x16b0
handle_mm_fault+0x312/0x670
__do_page_fault+0x1b1/0x4e0
do_page_fault+0x22/0x30
page_fault+0x28/0x30
__vfs_read+0x28/0xe0
vfs_read+0x86/0x130
SyS_read+0x46/0xa0
entry_SYSCALL_64_fastpath+0x1e/0xa8
Code: 12 48 c1 ea 0c 83 e8 01 83 e2 30 48 98 48 81 c2 40 6d 01 00 48 03 14 c5 80 6a 5d 82 48 89 0a 8b 41 08 85 c0 75 09 f3 90 8b 41 08 <85> c0 74 f7 4c 8b 09 4d 85 c9 74 08 41 0f 18 09 eb 02 f3 90 8b
Reported-by: Łukasz Daniluk <lukasz.daniluk@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 2 Sep 2016 16:32:15 +0000 (09:32 -0700)]
Merge branch 'overlayfs-linus' of git://git./linux/kernel/git/mszeredi/vfs
Pull overlayfs fixes from Miklos Szeredi:
"Most of this is regression fixes for posix acl behavior introduced in
4.8-rc1 (these were caught by the pjd-fstest suite). The are also
miscellaneous fixes marked as stable material and cleanups.
Other than overlayfs code, it touches <linux/fs.h> to add a constant
with which to disable posix acl caching. No changes needed to the
actual caching code, it automatically does the right thing, although
later we may want to optimize this case.
I'm now testing overlayfs with the following test suites to catch
regressions:
- unionmount-testsuite
- xfstests
- pjd-fstest"
* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: update doc
ovl: listxattr: use strnlen()
ovl: Switch to generic_getxattr
ovl: copyattr after setting POSIX ACL
ovl: Switch to generic_removexattr
ovl: Get rid of ovl_xattr_noacl_handlers array
ovl: Fix OVL_XATTR_PREFIX
ovl: fix spelling mistake: "directries" -> "directories"
ovl: don't cache acl on overlay layer
ovl: use cached acl on underlying layer
ovl: proper cleanup of workdir
ovl: remove posix_acl_default from workdir
ovl: handle umask and posix_acl_default correctly on creation
ovl: don't copy up opaqueness
James Morse [Fri, 26 Aug 2016 15:03:42 +0000 (16:03 +0100)]
arm64: kernel: Fix unmasked debug exceptions when restoring mdscr_el1
Changes to make the resume from cpu_suspend() code behave more like
secondary boot caused debug exceptions to be unmasked early by
__cpu_setup(). We then go on to restore mdscr_el1 in cpu_do_resume(),
potentially taking break or watch points based on uninitialised registers.
Mask debug exceptions in cpu_do_resume(), which is specific to resume
from cpu_suspend(). Debug exceptions will be restored to their original
state by local_dbg_restore() in cpu_suspend(), which runs after
hw_breakpoint_restore() has re-initialised the other registers.
Reported-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Fixes:
cabe1c81ea5b ("arm64: Change cpu_resume() to enable mmu early then access sleep_sp by va")
Cc: <stable@vger.kernel.org> # 4.7+
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Stefan Wahren [Sat, 27 Aug 2016 16:19:50 +0000 (16:19 +0000)]
drivers/perf: arm_pmu: Fix NULL pointer dereference during probe
Patch
7f1d642fbb5c ("drivers/perf: arm-pmu: Fix handling of SPI lacking
interrupt-affinity property") unintended also fixes perf_event support
for bcm2835 which doesn't have PMU interrupts. Unfortunately this change
introduce a NULL pointer dereference on bcm2835, because irq_is_percpu
always expected to be called with a valid IRQ. So fix this regression
by validating the IRQ before.
Tested-by: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Fixes:
7f1d642fbb5c ("drivers/perf: arm-pmu: Fix handling of SPI lacking "interrupt-affinity" property")
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Stefan Wahren [Sat, 27 Aug 2016 16:19:49 +0000 (16:19 +0000)]
drivers/perf: arm_pmu: Fix leak in error path
In case of a IRQ type mismatch in of_pmu_irq_cfg() the
device node for interrupt affinity isn't freed. So fix this
issue by calling of_node_put().
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Fixes:
fa8ad7889d83 ("arm: perf: factor arm_pmu core out to drivers")
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Linus Torvalds [Fri, 2 Sep 2016 14:58:31 +0000 (07:58 -0700)]
Merge tag 'dmaengine-fix-4.8-rc5' of git://git.infradead.org/users/vkoul/slave-dma
Pull dmaengine fixes from Vinod Koul:
"The fixes this time are all in drivers:
- possible NULL dereference in img-mdc
- correct device identity for free_irq in at_xdmac
- missing of_node_put() in fsl probe
- fix debug log and hotchain corner case for pxa-dma
- fix checking hardware bits in isr in usb dmac"
* tag 'dmaengine-fix-4.8-rc5' of git://git.infradead.org/users/vkoul/slave-dma:
dmaengine: img-mdc: fix a possible NULL dereference
dmaengine: at_xdmac: fix to pass correct device identity to free_irq()
dmaengine: fsl_raid: add missing of_node_put() in fsl_re_probe()
dmaengine: pxa_dma: fix debug message
dmaengine: pxa_dma: fix hotchain corner case
dmaengine: usb-dmac: check CHCR.DE bit in usb_dmac_isr_channel()
Linus Torvalds [Fri, 2 Sep 2016 14:53:00 +0000 (07:53 -0700)]
Merge tag 'drm-fixes-for-4.8-rc5' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Contains fixes for imx, amdgpu, vc4, msm and one nouveau ACPI fix"
* tag 'drm-fixes-for-4.8-rc5' of git://people.freedesktop.org/~airlied/linux:
drm/amdgpu: record error code when ring test failed
drm/amd/amdgpu: compute ring test fail during S4 on CI
drm/amd/amdgpu: sdma resume fail during S4 on CI
drm/nouveau/acpi: use DSM if bridge does not support D3cold
drm/imx: fix crtc vblank state regression
drm/imx: Add active plane reconfiguration support
drm/msm: protect against faults from copy_from_user() in submit ioctl
drm/msm: fix use of copy_from_user() while holding spinlock
drm/vc4: Fix oops when userspace hands in a bad BO.
drm/vc4: Fix overflow mem unreferencing when the binner runs dry.
drm/vc4: Free hang state before destroying BO cache.
drm/vc4: Fix handling of a pm_runtime_get_sync() success case.
drm/vc4: Use drm_malloc_ab to fix large rendering jobs.
drm/vc4: Use drm_free_large() on handles to match its allocation.
Wanpeng Li [Fri, 2 Sep 2016 06:38:23 +0000 (14:38 +0800)]
tick/nohz: Fix softlockup on scheduler stalls in kvm guest
tick_nohz_start_idle() is prevented to be called if the idle tick can't
be stopped since commit
1f3b0f8243cb934 ("tick/nohz: Optimize nohz idle
enter"). As a result, after suspend/resume the host machine, full dynticks
kvm guest will softlockup:
NMI watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [swapper/0:0]
Call Trace:
default_idle+0x31/0x1a0
arch_cpu_idle+0xf/0x20
default_idle_call+0x2a/0x50
cpu_startup_entry+0x39b/0x4d0
rest_init+0x138/0x140
? rest_init+0x5/0x140
start_kernel+0x4c1/0x4ce
? set_init_arg+0x55/0x55
? early_idt_handler_array+0x120/0x120
x86_64_start_reservations+0x24/0x26
x86_64_start_kernel+0x142/0x14f
In addition, cat /proc/stat | grep cpu in guest or host:
cpu 398 16 5049 15754 5490 0 1 46 0 0
cpu0 206 5 450 0 0 0 1 14 0 0
cpu1 81 0 3937 3149 1514 0 0 9 0 0
cpu2 45 6 332 6052 2243 0 0 11 0 0
cpu3 65 2 328 6552 1732 0 0 11 0 0
The idle and iowait states are weird 0 for cpu0(housekeeping).
The bug is present in both guest and host kernels, and they both have
cpu0's idle and iowait states issue, however, host kernel's suspend/resume
path etc will touch watchdog to avoid the softlockup.
- The watchdog will not be touched in tick_nohz_stop_idle path (need be
touched since the scheduler stall is expected) if idle_active flags are
not detected.
- The idle and iowait states will not be accounted when exit idle loop
(resched or interrupt) if idle start time and idle_active flags are
not set.
This patch fixes it by reverting commit
1f3b0f8243cb934 since can't stop
idle tick doesn't mean can't be idle.
Fixes:
1f3b0f8243cb934 ("tick/nohz: Optimize nohz idle enter")
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Sanjeev Yadav<sanjeev.yadav@spreadtrum.com>
Cc: Gaurav Jindal<gaurav.jindal@spreadtrum.com>
Cc: stable@vger.kernel.org
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lkml.kernel.org/r/1472798303-4154-1-git-send-email-wanpeng.li@hotmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Dave Airlie [Fri, 2 Sep 2016 05:55:15 +0000 (15:55 +1000)]
Merge tag 'drm-vc4-fixes-2016-08-29' of https://github.com/anholt/linux into drm-fixes
This pull request brings in fixes for VC4 3D in 4.8, most of which are
covered by testcases.
* tag 'drm-vc4-fixes-2016-08-29' of https://github.com/anholt/linux:
drm/vc4: Fix oops when userspace hands in a bad BO.
drm/vc4: Fix overflow mem unreferencing when the binner runs dry.
drm/vc4: Free hang state before destroying BO cache.
drm/vc4: Fix handling of a pm_runtime_get_sync() success case.
drm/vc4: Use drm_malloc_ab to fix large rendering jobs.
drm/vc4: Use drm_free_large() on handles to match its allocation.
Dave Airlie [Fri, 2 Sep 2016 05:48:38 +0000 (15:48 +1000)]
Merge tag 'imx-drm-fixes-2016-08-30' of git://git.pengutronix.de/git/pza/linux into drm-fixes
imx-drm atomic modeset regression fixes
- add active plane reconfiguration support
- add back crtc vblank state reporting
* tag 'imx-drm-fixes-2016-08-30' of git://git.pengutronix.de/git/pza/linux:
drm/imx: fix crtc vblank state regression
drm/imx: Add active plane reconfiguration support
Linus Torvalds [Fri, 2 Sep 2016 03:32:18 +0000 (20:32 -0700)]
Merge tag 'clk-fixes-for-linus' of git://git./linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"A collection of small fixes for various SoC vendor clk drivers"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: rockchip: mark aclk_emmc_noc as a critical clock on rk3399
clk: tegra: remove TEGRA_PLL_USE_LOCK for PLLD/PLLD2
clk: rockchip: fix incorrect GATE bits for {c, g}pll_aclk_perihp_src on rk3399
clk: rockchip: fix incorrect aclk_emmc source gate bits on rk3399
clk: renesas: r8a7795: Fix SD clocks
clk: rockchip: fix rk3399 aclk_vio gate bit
clk: sunxi-ng: Fix inverted test condition in ccu_helper_wait_for_lock
Linus Torvalds [Fri, 2 Sep 2016 01:23:22 +0000 (18:23 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge fixes from Andrew Morton:
"14 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
rapidio/tsi721: fix incorrect detection of address translation condition
rapidio/documentation/mport_cdev: add missing parameter description
kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd
MAINTAINERS: Vladimir has moved
mm, mempolicy: task->mempolicy must be NULL before dropping final reference
printk/nmi: avoid direct printk()-s from __printk_nmi_flush()
treewide: remove references to the now unnecessary DEFINE_PCI_DEVICE_TABLE
drivers/scsi/wd719x.c: remove last declaration using DEFINE_PCI_DEVICE_TABLE
mm, vmscan: only allocate and reclaim from zones with pages managed by the buddy allocator
lib/test_hash.c: fix warning in preprocessor symbol evaluation
lib/test_hash.c: fix warning in two-dimensional array init
kconfig: tinyconfig: provide whole choice blocks to avoid warnings
kexec: fix double-free when failing to relocate the purgatory
mm, oom: prevent premature OOM killer invocation for high order request
Alexandre Bounine [Thu, 1 Sep 2016 23:15:18 +0000 (16:15 -0700)]
rapidio/tsi721: fix incorrect detection of address translation condition
Fix incorrect condition to identify involvment of a address translation
mechanism.
This bug results in NULL pointer kernel crash dump in cases when mapping
of inbound RapidIO address range is requested within existing aprture.
Link: http://lkml.kernel.org/r/20160901173144.2983-1-alexandre.bounine@idt.com
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Cc: <stable@vger.kernel.org> [4.6+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexandre Bounine [Thu, 1 Sep 2016 23:15:15 +0000 (16:15 -0700)]
rapidio/documentation/mport_cdev: add missing parameter description
Add missing description for rio_mport_cdev driver parameter
'dma_timeout'.
This patch is applicable to kernel versions starting from v4.6.
Link: http://lkml.kernel.org/r/20160901173104.2928-1-alexandre.bounine@idt.com
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Thu, 1 Sep 2016 23:15:13 +0000 (16:15 -0700)]
kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd
Commit
fec1d0115240 ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal
exit") has caused a subtle regression in nscd which uses
CLONE_CHILD_CLEARTID to clear the nscd_certainly_running flag in the
shared databases, so that the clients are notified when nscd is
restarted. Now, when nscd uses a non-persistent database, clients that
have it mapped keep thinking the database is being updated by nscd, when
in fact nscd has created a new (anonymous) one (for non-persistent
databases it uses an unlinked file as backend).
The original proposal for the CLONE_CHILD_CLEARTID change claimed
(https://lkml.org/lkml/2006/10/25/233):
: The NPTL library uses the CLONE_CHILD_CLEARTID flag on clone() syscalls
: on behalf of pthread_create() library calls. This feature is used to
: request that the kernel clear the thread-id in user space (at an address
: provided in the syscall) when the thread disassociates itself from the
: address space, which is done in mm_release().
:
: Unfortunately, when a multi-threaded process incurs a core dump (such as
: from a SIGSEGV), the core-dumping thread sends SIGKILL signals to all of
: the other threads, which then proceed to clear their user-space tids
: before synchronizing in exit_mm() with the start of core dumping. This
: misrepresents the state of process's address space at the time of the
: SIGSEGV and makes it more difficult for someone to debug NPTL and glibc
: problems (misleading him/her to conclude that the threads had gone away
: before the fault).
:
: The fix below is to simply avoid the CLONE_CHILD_CLEARTID action if a
: core dump has been initiated.
The resulting patch from Roland (https://lkml.org/lkml/2006/10/26/269)
seems to have a larger scope than the original patch asked for. It
seems that limitting the scope of the check to core dumping should work
for SIGSEGV issue describe above.
[Changelog partly based on Andreas' description]
Fixes:
fec1d0115240 ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal exit")
Link: http://lkml.kernel.org/r/1471968749-26173-1-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Tested-by: William Preston <wpreston@suse.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Andreas Schwab <schwab@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vladimir Davydov [Thu, 1 Sep 2016 23:15:09 +0000 (16:15 -0700)]
MAINTAINERS: Vladimir has moved
vdavydov@{parallels,virtuozzo}.com will bounce from now on.
Link: http://lkml.kernel.org/r/20160831180752.GB10353@esperanza
Signed-off-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Rientjes [Thu, 1 Sep 2016 23:15:07 +0000 (16:15 -0700)]
mm, mempolicy: task->mempolicy must be NULL before dropping final reference
KASAN allocates memory from the page allocator as part of
kmem_cache_free(), and that can reference current->mempolicy through any
number of allocation functions. It needs to be NULL'd out before the
final reference is dropped to prevent a use-after-free bug:
BUG: KASAN: use-after-free in alloc_pages_current+0x363/0x370 at addr
ffff88010b48102c
CPU: 0 PID: 15425 Comm: trinity-c2 Not tainted 4.8.0-rc2+ #140
...
Call Trace:
dump_stack
kasan_object_err
kasan_report_error
__asan_report_load2_noabort
alloc_pages_current <-- use after free
depot_save_stack
save_stack
kasan_slab_free
kmem_cache_free
__mpol_put <-- free
do_exit
This patch sets current->mempolicy to NULL before dropping the final
reference.
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1608301442180.63329@chino.kir.corp.google.com
Fixes:
cd11016e5f52 ("mm, kasan: stackdepot implementation. Enable stackdepot for SLAB")
Signed-off-by: David Rientjes <rientjes@google.com>
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: <stable@vger.kernel.org> [4.6+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sergey Senozhatsky [Thu, 1 Sep 2016 23:15:04 +0000 (16:15 -0700)]
printk/nmi: avoid direct printk()-s from __printk_nmi_flush()
__printk_nmi_flush() can be called from nmi_panic(), therefore it has to
test whether it's executed in NMI context and thus must route the
messages through deferred printk() or via direct printk().
This is to avoid potential deadlocks, as described in commit
cf9b1106c81c ("printk/nmi: flush NMI messages on the system panic").
However there remain two places where __printk_nmi_flush() does
unconditional direct printk() calls:
- pr_err("printk_nmi_flush: internal error ...")
- pr_cont("\n")
Factor out print_nmi_seq_line() parts into a new printk_nmi_flush_line()
function, which takes care of in_nmi(), and use it in
__printk_nmi_flush() for printing and error-reporting.
Link: http://lkml.kernel.org/r/20160830161354.581-1-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Thu, 1 Sep 2016 23:15:01 +0000 (16:15 -0700)]
treewide: remove references to the now unnecessary DEFINE_PCI_DEVICE_TABLE
It's been eliminated from the sources, remove it from everywhere else.
Link: http://lkml.kernel.org/r/076eff466fd7edb550c25c8b25d76924ca0eba62.1472660229.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Thu, 1 Sep 2016 23:14:58 +0000 (16:14 -0700)]
drivers/scsi/wd719x.c: remove last declaration using DEFINE_PCI_DEVICE_TABLE
Convert it to the preferred const struct pci_device_id instead.
Link: http://lkml.kernel.org/r/95c5e4100c3cd4eda643624f5b70e8d7abceb86c.1472660229.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Thu, 1 Sep 2016 23:14:55 +0000 (16:14 -0700)]
mm, vmscan: only allocate and reclaim from zones with pages managed by the buddy allocator
Firmware Assisted Dump (FA_DUMP) on ppc64 reserves substantial amounts
of memory when booting a secondary kernel. Srikar Dronamraju reported
that multiple nodes may have no memory managed by the buddy allocator
but still return true for populated_zone().
Commit
1d82de618ddd ("mm, vmscan: make kswapd reclaim in terms of
nodes") was reported to cause kswapd to spin at 100% CPU usage when
fadump was enabled. The old code happened to deal with the situation of
a populated node with zero free pages by co-incidence but the current
code tries to reclaim populated zones without realising that is
impossible.
We cannot just convert populated_zone() as many existing users really
need to check for present_pages. This patch introduces a managed_zone()
helper and uses it in the few cases where it is critical that the check
is made for managed pages -- zonelist construction and page reclaim.
Link: http://lkml.kernel.org/r/20160831195104.GB8119@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Geert Uytterhoeven [Thu, 1 Sep 2016 23:14:53 +0000 (16:14 -0700)]
lib/test_hash.c: fix warning in preprocessor symbol evaluation
Some versions of gcc don't like tests for the value of an undefined
preprocessor symbol, even in the #else branch of an #ifndef:
lib/test_hash.c:224:7: warning: "HAVE_ARCH__HASH_32" is not defined [-Wundef]
#elif HAVE_ARCH__HASH_32 != 1
^
lib/test_hash.c:229:7: warning: "HAVE_ARCH_HASH_32" is not defined [-Wundef]
#elif HAVE_ARCH_HASH_32 != 1
^
lib/test_hash.c:234:7: warning: "HAVE_ARCH_HASH_64" is not defined [-Wundef]
#elif HAVE_ARCH_HASH_64 != 1
^
Seen with gcc 4.9, not seen with 4.1.2.
Change the logic to only check the value inside an #ifdef to fix this.
Fixes:
468a9428521e7d00 ("<linux/hash.h>: Add support for architecture-specific functions")
Link: http://lkml.kernel.org/r/20160829214952.1334674-4-arnd@arndb.de
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: George Spelvin <linux@sciencehorizons.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Geert Uytterhoeven [Thu, 1 Sep 2016 23:14:50 +0000 (16:14 -0700)]
lib/test_hash.c: fix warning in two-dimensional array init
lib/test_hash.c: In function 'test_hash_init':
lib/test_hash.c:146:2: warning: missing braces around initializer [-Wmissing-braces]
Fixes:
468a9428521e7d00 ("<linux/hash.h>: Add support for architecture-specific functions")
Link: http://lkml.kernel.org/r/20160829214952.1334674-3-arnd@arndb.de
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: George Spelvin <linux@sciencehorizons.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arnd Bergmann [Thu, 1 Sep 2016 23:14:47 +0000 (16:14 -0700)]
kconfig: tinyconfig: provide whole choice blocks to avoid warnings
Using "make tinyconfig" produces a couple of annoying warnings that show
up for build test machines all the time:
.config:966:warning: override: NOHIGHMEM changes choice state
.config:965:warning: override: SLOB changes choice state
.config:963:warning: override: KERNEL_XZ changes choice state
.config:962:warning: override: CC_OPTIMIZE_FOR_SIZE changes choice state
.config:933:warning: override: SLOB changes choice state
.config:930:warning: override: CC_OPTIMIZE_FOR_SIZE changes choice state
.config:870:warning: override: SLOB changes choice state
.config:868:warning: override: KERNEL_XZ changes choice state
.config:867:warning: override: CC_OPTIMIZE_FOR_SIZE changes choice state
I've made a previous attempt at fixing them and we discussed a number of
alternatives.
I tried changing the Makefile to use "merge_config.sh -n
$(fragment-list)" but couldn't get that to work properly.
This is yet another approach, based on the observation that we do want
to see a warning for conflicting 'choice' options, and that we can
simply make them non-conflicting by listing all other options as
disabled. This is a trivial patch that we can apply independent of
plans for other changes.
Link: http://lkml.kernel.org/r/20160829214952.1334674-2-arnd@arndb.de
Link: https://storage.kernelci.org/mainline/v4.7-rc6/x86-tinyconfig/build.log
https://patchwork.kernel.org/patch/
9212749/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thiago Jung Bauermann [Thu, 1 Sep 2016 23:14:44 +0000 (16:14 -0700)]
kexec: fix double-free when failing to relocate the purgatory
If kexec_apply_relocations fails, kexec_load_purgatory frees pi->sechdrs
and pi->purgatory_buf. This is redundant, because in case of error
kimage_file_prepare_segments calls kimage_file_post_load_cleanup, which
will also free those buffers.
This causes two warnings like the following, one for pi->sechdrs and the
other for pi->purgatory_buf:
kexec-bzImage64: Loading purgatory failed
------------[ cut here ]------------
WARNING: CPU: 1 PID: 2119 at mm/vmalloc.c:1490 __vunmap+0xc1/0xd0
Trying to vfree() nonexistent vm area (
ffffc90000e91000)
Modules linked in:
CPU: 1 PID: 2119 Comm: kexec Not tainted 4.8.0-rc3+ #5
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
dump_stack+0x4d/0x65
__warn+0xcb/0xf0
warn_slowpath_fmt+0x4f/0x60
? find_vmap_area+0x19/0x70
? kimage_file_post_load_cleanup+0x47/0xb0
__vunmap+0xc1/0xd0
vfree+0x2e/0x70
kimage_file_post_load_cleanup+0x5e/0xb0
SyS_kexec_file_load+0x448/0x680
? putname+0x54/0x60
? do_sys_open+0x190/0x1f0
entry_SYSCALL_64_fastpath+0x13/0x8f
---[ end trace
158bb74f5950ca2b ]---
Fix by setting pi->sechdrs an pi->purgatory_buf to NULL, since vfree
won't try to free a NULL pointer.
Link: http://lkml.kernel.org/r/1472083546-23683-1-git-send-email-bauerman@linux.vnet.ibm.com
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Thu, 1 Sep 2016 23:14:41 +0000 (16:14 -0700)]
mm, oom: prevent premature OOM killer invocation for high order request
There have been several reports about pre-mature OOM killer invocation
in 4.7 kernel when order-2 allocation request (for the kernel stack)
invoked OOM killer even during basic workloads (light IO or even kernel
compile on some filesystems). In all reported cases the memory is
fragmented and there are no order-2+ pages available. There is usually
a large amount of slab memory (usually dentries/inodes) and further
debugging has shown that there are way too many unmovable blocks which
are skipped during the compaction. Multiple reporters have confirmed
that the current linux-next which includes [1] and [2] helped and OOMs
are not reproducible anymore.
A simpler fix for the late rc and stable is to simply ignore the
compaction feedback and retry as long as there is a reclaim progress and
we are not getting OOM for order-0 pages. We already do that for
CONFING_COMPACTION=n so let's reuse the same code when compaction is
enabled as well.
[1] http://lkml.kernel.org/r/
20160810091226.6709-1-vbabka@suse.cz
[2] http://lkml.kernel.org/r/
f7a9ea9d-bb88-bfd6-e340-
3a933559305a@suse.cz
Fixes:
0a0337e0d1d1 ("mm, oom: rework oom detection")
Link: http://lkml.kernel.org/r/20160823074339.GB23577@dhcp22.suse.cz
Signed-off-by: Michal Hocko <mhocko@suse.com>
Tested-by: Olaf Hering <olaf@aepfle.de>
Tested-by: Ralf-Peter Rohbeck <Ralf-Peter.Rohbeck@quantum.com>
Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Arkadiusz Miskiewicz <a.miskiewicz@gmail.com>
Cc: Ralf-Peter Rohbeck <Ralf-Peter.Rohbeck@quantum.com>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org> [4.7.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Chris Mason [Fri, 2 Sep 2016 00:29:34 +0000 (17:29 -0700)]
Merge tag 'for-chris' of git://git./linux/kernel/git/kdave/linux into for-linus-4.8
Linus Torvalds [Thu, 1 Sep 2016 22:55:56 +0000 (15:55 -0700)]
Merge branch 'stable-4.8' of git://git.infradead.org/users/pcmoore/audit
Pull audit fixes from Paul Moore:
"Two small patches to fix some bugs with the audit-by-executable
functionality we introduced back in v4.3 (both patches are marked
for the stable folks)"
* 'stable-4.8' of git://git.infradead.org/users/pcmoore/audit:
audit: fix exe_file access in audit_exe_compare
mm: introduce get_task_exe_file
Linus Torvalds [Thu, 1 Sep 2016 22:33:16 +0000 (15:33 -0700)]
Merge tag 'xfs-iomap-for-linus-4.8-rc5' of git://git./linux/kernel/git/dgc/linux-xfs
Pull xfs and iomap fixes from Dave Chinner:
"Most of these changes are small regression fixes that address problems
introduced in the 4.8-rc1 window. The two fixes that aren't (IO
completion fix and superblock inprogress check) are fixes for problems
introduced some time ago and need to be pushed back to stable kernels.
Changes in this update:
- iomap FIEMAP_EXTENT_MERGED usage fix
- additional mount-time feature restrictions
- rmap btree query fixes
- freeze/unmount io completion workqueue fix
- memory corruption fix for deferred operations handling"
* tag 'xfs-iomap-for-linus-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
xfs: track log done items directly in the deferred pending work item
iomap: don't set FIEMAP_EXTENT_MERGED for extent based filesystems
xfs: prevent dropping ioend completions during buftarg wait
xfs: fix superblock inprogress check
xfs: simple btree query range should look right if LE lookup fails
xfs: fix some key handling problems in _btree_simple_query_range
xfs: don't log the entire end of the AGF
xfs: disallow mounting of realtime + rmap filesystems
xfs: don't perform lookups on zero-height btrees
Wang Xiaoguang [Wed, 31 Aug 2016 11:46:16 +0000 (19:46 +0800)]
btrfs: fix one bug that process may endlessly wait for ticket in wait_reserve_ticket()
If can_overcommit() in btrfs_calc_reclaim_metadata_size() returns true,
btrfs_async_reclaim_metadata_space() will not reclaim metadata space, just
return directly and also forget to wake up process which are waiting for
their tickets, so these processes will wait endlessly.
Fstests case generic/172 with mount option "-o compress=lzo" have revealed
this bug in my test machine. Here if we have tickets to handle, we must
handle them first.
Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Liu Bo [Wed, 31 Aug 2016 23:43:33 +0000 (16:43 -0700)]
Btrfs: fix endless loop in balancing block groups
Qgroup function may overwrite the saved error 'err' with 0
in case quota is not enabled, and this ends up with a
endless loop in balance because we keep going back to balance
the same block group.
It really should use 'ret' instead.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Wed, 24 Aug 2016 15:57:52 +0000 (11:57 -0400)]
Btrfs: kill invalid ASSERT() in process_all_refs()
Suppose you have the following tree in snap1 on a file system mounted with -o
inode_cache so that inode numbers are recycled
└── [ 258] a
└── [ 257] b
and then you remove b, rename a to c, and then re-create b in c so you have the
following tree
└── [ 258] c
└── [ 257] b
and then you try to do an incremental send you will hit
ASSERT(pending_move == 0);
in process_all_refs(). This is because we assume that any recycling of inodes
will not have a pending change in our path, which isn't the case. This is the
case for the DELETE side, since we want to remove the old file using the old
path, but on the create side we could have a pending move and need to do the
normal pending rename dance. So remove this ASSERT() and put a comment about
why we ignore pending_move. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Miklos Szeredi [Thu, 1 Sep 2016 09:12:00 +0000 (11:12 +0200)]
ovl: update doc
Some of the documented quirks no longer apply.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Thu, 1 Sep 2016 09:12:00 +0000 (11:12 +0200)]
ovl: listxattr: use strnlen()
Be defensive about what underlying fs provides us in the returned xattr
list buffer. If it's not properly null terminated, bail out with a warning
insead of BUG.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org>
Andreas Gruenbacher [Mon, 22 Aug 2016 15:52:55 +0000 (17:52 +0200)]
ovl: Switch to generic_getxattr
Now that overlayfs has xattr handlers for iop->{set,remove}xattr, use
those same handlers for iop->getxattr as well.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Thu, 1 Sep 2016 09:12:00 +0000 (11:12 +0200)]
ovl: copyattr after setting POSIX ACL
Setting POSIX acl may also modify the file mode, so need to copy that up to
the overlay inode.
Reported-by: Eryu Guan <eguan@redhat.com>
Fixes:
d837a49bd57f ("ovl: fix POSIX ACL setting")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Andreas Gruenbacher [Mon, 22 Aug 2016 15:22:11 +0000 (17:22 +0200)]
ovl: Switch to generic_removexattr
Commit
d837a49bd57f ("ovl: fix POSIX ACL setting") switches from
iop->setxattr from ovl_setxattr to generic_setxattr, so switch from
ovl_removexattr to generic_removexattr as well. As far as permission
checking goes, the same rules should apply in either case.
While doing that, rename ovl_setxattr to ovl_xattr_set to indicate that
this is not an iop->setxattr implementation and remove the unused inode
argument.
Move ovl_other_xattr_set above ovl_own_xattr_set so that they match the
order of handlers in ovl_xattr_handlers.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Fixes:
d837a49bd57f ("ovl: fix POSIX ACL setting")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Andreas Gruenbacher [Mon, 22 Aug 2016 14:36:49 +0000 (16:36 +0200)]
ovl: Get rid of ovl_xattr_noacl_handlers array
Use an ordinary #ifdef to conditionally include the POSIX ACL handlers
in ovl_xattr_handlers, like the other filesystems do. Flag the code
that is now only used conditionally with __maybe_unused.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Andreas Gruenbacher [Mon, 22 Aug 2016 15:59:22 +0000 (17:59 +0200)]
ovl: Fix OVL_XATTR_PREFIX
Make sure ovl_own_xattr_handler only matches attribute names starting
with "overlay.", not "overlayXXX".
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Fixes:
d837a49bd57f ("ovl: fix POSIX ACL setting")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Colin Ian King [Thu, 18 Aug 2016 15:58:35 +0000 (16:58 +0100)]
ovl: fix spelling mistake: "directries" -> "directories"
Trivial fix to spelling mistake in pr_err message.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Thu, 1 Sep 2016 09:11:59 +0000 (11:11 +0200)]
ovl: don't cache acl on overlay layer
Some operations (setxattr/chmod) can make the cached acl stale. We either
need to clear overlay's acl cache for the affected inode or prevent acl
caching on the overlay altogether. Preventing caching has the following
advantages:
- no double caching, less memory used
- overlay cache doesn't go stale when fs clears it's own cache
Possible disadvantage is performance loss. If that becomes a problem
get_acl() can be optimized for overlayfs.
This patch disables caching by pre setting i_*acl to a value that
- has bit 0 set, so is_uncached_acl() will return true
- is not equal to ACL_NOT_CACHED, so get_acl() will not overwrite it
The constant -3 was chosen for this purpose.
Fixes:
39a25b2b3762 ("ovl: define ->get_acl() for overlay inodes")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Thu, 1 Sep 2016 09:11:59 +0000 (11:11 +0200)]
ovl: use cached acl on underlying layer
Instead of calling ->get_acl() directly, use get_acl() to get the cached
value.
We will have the acl cached on the underlying inode anyway, because we do
permission checking on the both the overlay and the underlying fs.
So, since we already have double caching, this improves performance without
any cost.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Thu, 1 Sep 2016 09:11:59 +0000 (11:11 +0200)]
ovl: proper cleanup of workdir
When mounting overlayfs it needs a clean "work" directory under the
supplied workdir.
Previously the mount code removed this directory if it already existed and
created a new one. If the removal failed (e.g. directory was not empty)
then it fell back to a read-only mount not using the workdir.
While this has never been reported, it is possible to get a non-empty
"work" dir from a previous mount of overlayfs in case of crash in the
middle of an operation using the work directory.
In this case the left over state should be discarded and the overlay
filesystem will be consistent, guaranteed by the atomicity of operations on
moving to/from the workdir to the upper layer.
This patch implements cleaning out any files left in workdir. It is
implemented using real recursion for simplicity, but the depth is limited
to 2, because the worst case is that of a directory containing whiteouts
under "work".
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org>
Miklos Szeredi [Thu, 1 Sep 2016 09:11:59 +0000 (11:11 +0200)]
ovl: remove posix_acl_default from workdir
Clear out posix acl xattrs on workdir and also reset the mode after
creation so that an inherited sgid bit is cleared.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org>
Miklos Szeredi [Thu, 1 Sep 2016 09:11:59 +0000 (11:11 +0200)]
ovl: handle umask and posix_acl_default correctly on creation
Setting MS_POSIXACL in sb->s_flags has the side effect of passing mode to
create functions without masking against umask.
Another problem when creating over a whiteout is that the default posix acl
is not inherited from the parent dir (because the real parent dir at the
time of creation is the work directory).
Fix these problems by:
a) If upper fs does not have MS_POSIXACL, then mask mode with umask.
b) If creating over a whiteout, call posix_acl_create() to get the
inherited acls. After creation (but before moving to the final
destination) set these acls on the created file. posix_acl_create() also
updates the file creation mode as appropriate.
Fixes:
39a25b2b3762 ("ovl: define ->get_acl() for overlay inodes")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Steve Muckle [Fri, 26 Aug 2016 18:40:47 +0000 (11:40 -0700)]
cpufreq / sched: ignore SMT when determining max cpu capacity
PELT does not consider SMT when scaling its utilization values via
arch_scale_cpu_capacity(). The value in rq->cpu_capacity_orig does
take SMT into consideration though and therefore may be smaller than
the utilization reported by PELT.
On an Intel i7-3630QM for example rq->cpu_capacity_orig is 589 but
util_avg scales up to 1024. This means that a 50% utilized CPU will show
up in schedutil as ~86% busy.
Fix this by using the same CPU scaling value in schedutil as that which
is used by PELT.
Signed-off-by: Steve Muckle <smuckle@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Rafael J. Wysocki [Wed, 31 Aug 2016 01:11:31 +0000 (03:11 +0200)]
cpufreq: Drop unnecessary check from cpufreq_policy_alloc()
Since cpufreq_policy_alloc() doesn't use its dev variable for
anything useful, drop that variable from there along with the
NULL check against it.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Dave Airlie [Wed, 31 Aug 2016 20:34:09 +0000 (06:34 +1000)]
Merge branch 'msm-fixes-4.8' of git://people.freedesktop.org/~robclark/linux into drm-fixes
copy from user fixes.
* 'msm-fixes-4.8' of git://people.freedesktop.org/~robclark/linux:
drm/msm: protect against faults from copy_from_user() in submit ioctl
drm/msm: fix use of copy_from_user() while holding spinlock
Mateusz Guzik [Tue, 23 Aug 2016 14:20:39 +0000 (16:20 +0200)]
audit: fix exe_file access in audit_exe_compare
Prior to the change the function would blindly deference mm, exe_file
and exe_file->f_inode, each of which could have been NULL or freed.
Use get_task_exe_file to safely obtain stable exe_file.
Signed-off-by: Mateusz Guzik <mguzik@redhat.com>
Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Richard Guy Briggs <rgb@redhat.com>
Cc: <stable@vger.kernel.org> # 4.3.x
Signed-off-by: Paul Moore <paul@paul-moore.com>