Rafael J. Wysocki [Mon, 14 Mar 2016 13:22:34 +0000 (14:22 +0100)]
Merge branch 'pm-tools'
* pm-tools:
tools/power turbostat: bugfix: TDP MSRs print bits fixing
tools/power turbostat: correct output for MSR_NHM_SNB_PKG_CST_CFG_CTL dump
tools/power turbostat: call __cpuid() instead of __get_cpuid()
tools/power turbostat: indicate SMX and SGX support
tools/power turbostat: detect and work around syscall jitter
tools/power turbostat: show GFX%rc6
tools/power turbostat: show GFXMHz
tools/power turbostat: show IRQs per CPU
tools/power turbostat: make fewer systems calls
tools/power turbostat: fix compiler warnings
tools/power turbostat: add --out option for saving output in a file
tools/power turbostat: re-name "%Busy" field to "Busy%"
tools/power turbostat: Intel Xeon x200: fix turbo-ratio decoding
tools/power turbostat: Intel Xeon x200: fix erroneous bclk value
tools/power turbostat: allow sub-sec intervals
tools/power turbostat: Decode MSR_MISC_PWR_MGMT
tools/power turbostat: decode HWP registers
x86 msr-index: Simplify syntax for HWP fields
tools/power turbostat: CPUID(0x16) leaf shows base, max, and bus frequency
tools/power turbostat: decode more CPUID fields
Rafael J. Wysocki [Mon, 14 Mar 2016 13:22:22 +0000 (14:22 +0100)]
Merge branches 'pm-cpuidle', 'pm-sleep' and 'pm-domains'
* pm-cpuidle:
cpuidle: menu: help gcc generate slightly better code
cpuidle: menu: avoid expensive square root computation
* pm-sleep:
PM / suspend: replacing printk
PM/freezer: y2038, use boottime to compare tstamps
PM / sleep: declare __tracedata symbols as char[] rather than char
* pm-domains:
PM / Domains: Fix potential NULL pointer dereference
PM / Domains: Fix removal of a subdomain
PM / Domains: Propagate start and restore errors during runtime resume
PM / Domains: Join state name and index in debugfs output
PM / Domains: Restore alignment of slaves in debugfs output
PM / Domains: remove old power on/off latencies
ARM: imx6: pm: declare pm domain latency on power_state struct
PM / Domains: Support for multiple states
Rafael J. Wysocki [Mon, 14 Mar 2016 13:22:03 +0000 (14:22 +0100)]
Merge branch 'pm-cpufreq'
* pm-cpufreq: (94 commits)
intel_pstate: Do not skip samples partially
intel_pstate: Remove freq calculation from intel_pstate_calc_busy()
intel_pstate: Move intel_pstate_calc_busy() into get_target_pstate_use_performance()
intel_pstate: Optimize calculation for max/min_perf_adj
intel_pstate: Remove extra conversions in pid calculation
cpufreq: Move scheduler-related code to the sched directory
Revert "cpufreq: postfix policy directory with the first CPU in related_cpus"
cpufreq: Reduce cpufreq_update_util() overhead a bit
cpufreq: Select IRQ_WORK if CPU_FREQ_GOV_COMMON is set
cpufreq: Remove 'policy->governor_enabled'
cpufreq: Rename __cpufreq_governor() to cpufreq_governor()
cpufreq: Relocate handle_update() to kill its declaration
cpufreq: governor: Drop unnecessary checks from show() and store()
cpufreq: governor: Fix race in dbs_update_util_handler()
cpufreq: governor: Make gov_set_update_util() static
cpufreq: governor: Narrow down the dbs_data_mutex coverage
cpufreq: governor: Make dbs_data_mutex static
cpufreq: governor: Relocate definitions of tuners structures
cpufreq: governor: Move per-CPU data to the common code
cpufreq: governor: Make governor private data per-policy
...
Rafael J. Wysocki [Mon, 14 Mar 2016 13:21:55 +0000 (14:21 +0100)]
Merge branch 'pm-opp'
* pm-opp:
PM / OPP: Rename structures for clarity
PM / OPP: Fix incorrect comments
PM / OPP: Initialize regulator pointer to an error value
PM / OPP: Initialize u_volt_min/max to a valid value
PM / OPP: Fix NULL pointer dereference crash when disabling OPPs
PM / OPP: Add dev_pm_opp_set_rate()
PM / OPP: Manage device clk
PM / OPP: Parse clock-latency and voltage-tolerance for v1 bindings
PM / OPP: Introduce dev_pm_opp_get_max_transition_latency()
PM / OPP: Introduce dev_pm_opp_get_max_volt_latency()
PM / OPP: Disable OPPs that aren't supported by the regulator
PM / OPP: get/put regulators from OPP core
Rafael J. Wysocki [Mon, 14 Mar 2016 13:21:45 +0000 (14:21 +0100)]
Merge branch 'powercap'
* powercap:
powercap/rapl: track lead cpu per package
powercap/rapl: add package reference per domain
powercap/rapl: reduce ipi calls
cpumask: export cpumask_any_but
Rafael J. Wysocki [Mon, 14 Mar 2016 13:21:39 +0000 (14:21 +0100)]
Merge branch 'device-properties'
* device-properties:
device property: fix for a case of use-after-free
Rafael J. Wysocki [Mon, 14 Mar 2016 13:21:23 +0000 (14:21 +0100)]
Merge branches 'acpi-ec', 'acpi-fan', 'acpi-video' and 'acpi-misc'
* acpi-ec:
ACPI / EC: Deny write access unless requested by module param
* acpi-fan:
ACPI / fan: Make struct dev_pm_ops const
* acpi-video:
ACPI / video: remove unused device_decode array
* acpi-misc:
ACPI / util: remove redundant check if element is NULL
ACPI: Add acpi_force_32bit_fadt_addr option to force 32 bit FADT addresses
drivers/acpi: make pmic/intel_pmic_crc.c explicitly non-modular
drivers/acpi: make apei/ghes.c more explicitly non-modular
drivers/acpi: make bgrt driver explicitly non-modular
Rafael J. Wysocki [Mon, 14 Mar 2016 13:20:57 +0000 (14:20 +0100)]
Merge branches 'acpi-pci', 'acpi-soc' and 'pnp'
* acpi-pci:
x86/ACPI/PCI: Recognize that Interrupt Line 255 means "not connected"
* acpi-soc:
i2c: designware: Add device HID for future AMD I2C controller
* pnp:
PNP / ACPI: add ACPI_RESOURCE_TYPE_SERIAL_BUS as a valid type
Rafael J. Wysocki [Mon, 14 Mar 2016 13:20:33 +0000 (14:20 +0100)]
Merge branches 'acpi-processor' and 'acpi-cppc'
* acpi-processor:
ACPI / sleep: move acpi_processor_sleep to sleep.c
ACPI / processor : add support for ACPI0010 processor container
ACPI / processor_idle: replace PREFIX with pr_fmt
* acpi-cppc:
ACPI / CPPC: use MRTT/MPAR to decide if/when a req can be sent
ACPI / CPPC: replace writeX/readX to PCC with relaxed version
mailbox: pcc: optimized pcc_send_data
ACPI / CPPC: optimized cpc_read and cpc_write
ACPI / CPPC: Optimize PCC Read Write operations
Rafael J. Wysocki [Mon, 14 Mar 2016 13:20:13 +0000 (14:20 +0100)]
Merge branches 'acpi-scan', 'acpi-osl' and 'acpi-apei'
* acpi-scan:
ACPI / scan: AMBA bus probing support
ACPI: introduce a function to find the first physical device
* acpi-osl:
ACPI / OSL: Add support to install tables via initrd
ACPI / OSL: Clean up initrd table override code
* acpi-apei:
ACPI / APEI: ERST: Fixed leaked resources in erst_init
ACPI / APEI: Fix leaked resources
Rafael J. Wysocki [Mon, 14 Mar 2016 13:19:52 +0000 (14:19 +0100)]
Merge branch 'acpica'
* acpica:
ACPICA / Interpreter: Fix a regression triggered because of wrong Linux ECDT support
ACPICA: Utilities: Update trace mechinism for acquire_object
ACPICA: Namespace: Rename acpi_gbl_reg_methods_enabled to acpi_gbl_namespace_initialized
ACPICA: Namespace: Ensure \_SB._INI executed before any _REG
ACPICA: ACPICA: Tune _REG evaluations order in the initialization steps
ACPICA: Tables: make default region accessible during the table load
ACPICA: ACPI 6.0/iASL: Add support for the External AML opcode
ACPICA: Remove unnecessary arguments to ACPI_INFO
ACPICA: debugger: dbconvert: free pld_info on error return path
ACPICA: iASL: Update to use internal acpi_ut_strtoul64 function
ACPICA: iASL: Fix some typos with the name strtoul64
ACPICA: Remove incorrect "static" from a global structure
ACPICA: aclocal: Put parens around some definitions.
Linus Torvalds [Mon, 14 Mar 2016 04:28:54 +0000 (21:28 -0700)]
Linux 4.5
Rafael J. Wysocki [Mon, 14 Mar 2016 01:13:05 +0000 (02:13 +0100)]
Merge branch 'turbostat' of git://git./linux/kernel/git/lenb/linux into pm-tools
Pull turbostat updates for 4.6 from Len Brown.
* 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
tools/power turbostat: bugfix: TDP MSRs print bits fixing
tools/power turbostat: correct output for MSR_NHM_SNB_PKG_CST_CFG_CTL dump
tools/power turbostat: call __cpuid() instead of __get_cpuid()
tools/power turbostat: indicate SMX and SGX support
tools/power turbostat: detect and work around syscall jitter
tools/power turbostat: show GFX%rc6
tools/power turbostat: show GFXMHz
tools/power turbostat: show IRQs per CPU
tools/power turbostat: make fewer systems calls
tools/power turbostat: fix compiler warnings
tools/power turbostat: add --out option for saving output in a file
tools/power turbostat: re-name "%Busy" field to "Busy%"
tools/power turbostat: Intel Xeon x200: fix turbo-ratio decoding
tools/power turbostat: Intel Xeon x200: fix erroneous bclk value
tools/power turbostat: allow sub-sec intervals
tools/power turbostat: Decode MSR_MISC_PWR_MGMT
tools/power turbostat: decode HWP registers
x86 msr-index: Simplify syntax for HWP fields
tools/power turbostat: CPUID(0x16) leaf shows base, max, and bus frequency
tools/power turbostat: decode more CPUID fields
Linus Torvalds [Sun, 13 Mar 2016 20:04:46 +0000 (13:04 -0700)]
Merge branch 'upstream' of git://git.linux-mips.org/ralf/upstream-linus
Pull MIPS fixes from Ralf Baechle:
"Another round of MIPS fixes for 4.5:
- Fix JZ4780 build with DEBUG_ZBOOT and MACH_JZ4780
- Fix build with DEBUG_ZBOOT and MACH_JZ4780
- Fix issue with uninitialised temp_foreign_map
- Fix awk regex compile failure with certain versions of awk. At
this time, the sole user, ld-ifversion, is only used on MIPS"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
MIPS: smp.c: Fix uninitialised temp_foreign_map
MIPS: Fix build error when SMP is used without GIC
ld-version: Fix awk regex compile failure
MIPS: Fix build with DEBUG_ZBOOT and MACH_JZ4780
James Hogan [Fri, 4 Mar 2016 10:10:51 +0000 (10:10 +0000)]
MIPS: smp.c: Fix uninitialised temp_foreign_map
When calculate_cpu_foreign_map() recalculates the cpu_foreign_map
cpumask it uses the local variable temp_foreign_map without initialising
it to zero. Since the calculation only ever sets bits in this cpumask
any existing bits at that memory location will remain set and find their
way into cpu_foreign_map too. This could potentially lead to cache
operations suboptimally doing smp calls to multiple VPEs in the same
core, even though the VPEs share primary caches.
Therefore initialise temp_foreign_map using cpumask_clear() before use.
Fixes:
cccf34e9411c ("MIPS: c-r4k: Fix cache flushing for MT cores")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/12759/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Hauke Mehrtens [Sun, 6 Mar 2016 21:28:56 +0000 (22:28 +0100)]
MIPS: Fix build error when SMP is used without GIC
The MIPS_GIC_IPI should only be selected when MIPS_GIC is also
selected, otherwise it results in a compile error. smp-gic.c uses some
functions from include/linux/irqchip/mips-gic.h like
plat_ipi_call_int_xlate() which are only added to the header file when
MIPS_GIC is set. The Lantiq SoC does not use the GIC, but supports SMP.
The calls top the functions from smp-gic.c are already protected by
some #ifdefs
The first part of this was introduced in commit
72e20142b2bf ("MIPS:
Move GIC IPI functions out of smp-cmp.c")
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: stable@vger.kernel.org # v3.15+
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/12774/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
James Hogan [Tue, 8 Mar 2016 16:47:53 +0000 (16:47 +0000)]
ld-version: Fix awk regex compile failure
The ld-version.sh script fails on some versions of awk with the
following error, resulting in build failures for MIPS:
awk: scripts/ld-version.sh: line 4: regular expression compile failed (missing '(')
This is due to the regular expression ".*)", meant to strip off the
beginning of the ld version string up to the close bracket, however
brackets have a meaning in regular expressions, so lets escape it so
that awk doesn't expect a corresponding open bracket.
Fixes:
ccbef1674a15 ("Kbuild, lto: add ld-version and ld-ifversion ...")
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Tested-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
Cc: Michal Marek <mmarek@suse.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: linux-mips@linux-mips.org
Cc: linux-kbuild@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org # 4.4.x-
Patchwork: https://patchwork.linux-mips.org/patch/12838/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Aaro Koskinen [Mon, 7 Mar 2016 22:06:36 +0000 (00:06 +0200)]
MIPS: Fix build with DEBUG_ZBOOT and MACH_JZ4780
Ingenic SoC declares ZBOOT support, but debug definitions are missing
for MACH_JZ4780 resulting in a build failure when DEBUG_ZBOOT is set.
The UART addresses are same as with JZ4740, so fix by covering JZ4780
with those as well.
Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/12830/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Chen Yu [Sun, 13 Dec 2015 13:09:31 +0000 (21:09 +0800)]
tools/power turbostat: bugfix: TDP MSRs print bits fixing
MSR_CONFIG_TDP_NOMINAL:
should print all 8 bits of base_ratio (bit 0:7) 0xFF
MSR_CONFIG_TDP_LEVEL_1:
should print all 15 bits of PKG_MIN_PWR_LVL1 (bit 48:62) 0x7FFF
should print all 15 bits of PKG_MAX_PWR_LVL1 (bit 32:46) 0x7FFF
should print all 8 bits of LVL1_RATIO (bit 16:23) 0xFF
should print all 15 bits of PKG_TDP_LVL1 (bit 0:14) 0x7FFF
And the same modification to MSR_CONFIG_TDP_LEVEL_2.
MSR_TURBO_ACTIVATION_RATIO:
should print all 8 bits of MAX_NON_TURBO_RATIO (bit 0:7) 0xFF
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Sun, 13 Mar 2016 07:21:22 +0000 (03:21 -0400)]
tools/power turbostat: correct output for MSR_NHM_SNB_PKG_CST_CFG_CTL dump
MSR_NHM_SNB_PKG_CST_CFG_CTL: 0x1e008008 (...pkg-cstate-limit=0: unlimited)
should print as
MSR_NHM_SNB_PKG_CST_CFG_CTL: 0x1e008008 (...pkg-cstate-limit=8: unlimited)
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Sun, 13 Mar 2016 07:14:35 +0000 (03:14 -0400)]
tools/power turbostat: call __cpuid() instead of __get_cpuid()
turbostat already checks whether calling each cpuid leavf is legal,
and it doesn't look at the function return value,
so call the simpler gcc intrinsic __cpuid() instead of __get_cpuid().
syntax only, no functional change
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Fri, 11 Mar 2016 18:26:03 +0000 (13:26 -0500)]
tools/power turbostat: indicate SMX and SGX support
SGX presence is related to a SKL power workaround,
so lets show when that is enabled.
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Sat, 27 Feb 2016 08:11:29 +0000 (03:11 -0500)]
tools/power turbostat: detect and work around syscall jitter
The accuracy of Bzy_Mhz and Busy% depend on reading
the TSC, APERF, and MPERF close together in time.
When there is a very short measurement interval,
or a large system is profoundly idle, the changes
in APERF and MPERF may be very small.
They can be small enough that an expensive interrupt
between reading APERF and MPERF can cause the APERF/MPERF
ratio to become inaccurate, resulting in invalid
calculation and display of Bzy_MHz.
A dummy APERF read of APERF makes this problem
much more rare. Apparently this 1st systemn call
after exiting a long stretch of idle is when we
typically see expensive timer interrupts that cause
large jitter.
For the cases that dummy APERF read fails to prevent,
we compare the latency of the APERF and MPERF reads.
If they differ by more than 2x, we re-issue them.
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Sat, 27 Feb 2016 06:28:12 +0000 (01:28 -0500)]
tools/power turbostat: show GFX%rc6
The column "GFX%c6" show the percentage of time the GPU
is in the "render C6" state, rc6. Deep package C-states on several
systems depend on the GPU being in RC6.
This information comes from the counter
/sys/class/drm/card0/power/rc6_residency_ms,
as read before and after the measurement interval.
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Sat, 27 Feb 2016 05:37:54 +0000 (00:37 -0500)]
tools/power turbostat: show GFXMHz
Under the column "GFXMHz", show a snapshot of this attribute:
/sys/class/graphics/fb0/device/drm/card0/gt_cur_freq_mhz
This is an instantaneous snapshot of what sysfs presents
at the end of the measurement interval. turbostat does
not average or otherwise perform any math on this value.
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Sat, 27 Feb 2016 04:48:05 +0000 (23:48 -0500)]
tools/power turbostat: show IRQs per CPU
The new IRQ column shows how many interrupts have occurred on each CPU
during the measurement inteval. This information comes from
the difference between /proc/interrupts shapshots made before
and after the measurement interval.
The first row, the system summary, shows the sum of the IRQS
for all CPUs during that interval.
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Sat, 27 Feb 2016 01:51:02 +0000 (20:51 -0500)]
tools/power turbostat: make fewer systems calls
skip the open(2)/close(2) on each msr read
by keeping the /dev/cpu/*/msr files open.
The remaining read(2) is generally far fewer cycles
than the removed open(2) system call.
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Wed, 24 Feb 2016 23:16:40 +0000 (18:16 -0500)]
tools/power turbostat: fix compiler warnings
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Sun, 14 Feb 2016 04:36:17 +0000 (23:36 -0500)]
tools/power turbostat: add --out option for saving output in a file
By default...
Turbostat --debug gconfiguration info goes to stderr.
In FORK mode, turbostat statistics go to stderr.
In PERIODIC mode, turbostat statistics go to stdout.
These defaults do not change, but an option "--out file"
will send all output above only to the specified file.
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Sun, 14 Feb 2016 04:41:53 +0000 (23:41 -0500)]
tools/power turbostat: re-name "%Busy" field to "Busy%"
some tools processing turbostat output
have difficulty with items that begin with %...
Reported-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Hubert Chrzaniuk [Wed, 10 Feb 2016 13:55:22 +0000 (14:55 +0100)]
tools/power turbostat: Intel Xeon x200: fix turbo-ratio decoding
Following changes have been made:
- changed MSR_NHM_TURBO_RATIO_LIMIT to MSR_TURBO_RATIO_LIMIT in debug print
for consistency with Developer Manual
- updated definition of bitfields in MSR_TURBO_RATIO_LIMIT and appropriate
parsing code
- added x200 to list of architectures that do not support Nahlem compatible
definition of MSR_TURBO_RATIO_LIMIT register (x200 has the register but
bits definition is custom)
- fixed typo in code that parses MSR_TURBO_RATIO_LIMIT
(logical instead of bitwise operator)
- changed MSR_TURBO_RATIO_LIMIT parsing algorithm so the print out had the
same order as implementations for other platforms
Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Chrzaniuk, Hubert [Wed, 10 Feb 2016 15:35:17 +0000 (16:35 +0100)]
tools/power turbostat: Intel Xeon x200: fix erroneous bclk value
x200 does not enable any way to programmatically obtain bus clock
speed. Bclk for the architecture has a fixed value of 100 MHz.
At the same time x200 cannot be included in has_snb_msrs since
it does not support C7 idle state.
prior to this patch, MHz values reported on this chip
were erroneously calculated using bclk of 133MHz,
causing MHz values to be reported 33% higher than actual.
Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Sat, 13 Feb 2016 03:44:48 +0000 (22:44 -0500)]
tools/power turbostat: allow sub-sec intervals
turbostat -i interval_sec
will sample and display statistics every interval_sec.
interval_sec used to be a whole number of seconds,
but now we accept a decimal, as small as 0.001 sec (1 ms).
Signed-off-by: Len Brown <len.brown@intel.com>
Linus Torvalds [Sun, 13 Mar 2016 04:18:54 +0000 (20:18 -0800)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block merge fix from Jens Axboe.
This fixes the block segment counting bug and resulting sg overrun
reported by Kent Overstreet, introduced with the last block pull.
* 'for-linus' of git://git.kernel.dk/linux-block:
block: don't optimize for non-cloned bio in bio_get_last_bvec()
Linus Torvalds [Sun, 13 Mar 2016 04:09:25 +0000 (20:09 -0800)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"This fixes 3 FPU handling related bugs, an EFI boot crash and a
runtime warning.
The EFI fix arrived late but I didn't want to delay it to after v4.5
because the effects are pretty bad for the systems that are affected
by it"
[ Actually, I don't think the EFI fix really matters yet, because we
haven't switched to the separate EFI page tables in mainline yet ]
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/efi: Fix boot crash by always mapping boot service regions into new EFI page tables
x86/fpu: Fix eager-FPU handling on legacy FPU machines
x86/delay: Avoid preemptible context checks in delay_mwaitx()
x86/fpu: Revert ("x86/fpu: Disable AVX when eagerfpu is off")
x86/fpu: Fix 'no387' regression
Linus Torvalds [Sun, 13 Mar 2016 01:14:07 +0000 (17:14 -0800)]
Merge git://git./linux/kernel/git/nab/target-pending
Pull SCSI target bug fix from Nicholas Bellinger:
"Here is an outstanding target-core bug-fix for v4.5 code."
This patch addresses a recent Task Management (TMR) regression related
to larger set of multi-port LUN_RESET bug-fixes in v4.5-rc5.
It drops a left-over target_put_sess_cmd() of se_cmd->cmd_kref within
ABORT_TASK failure path, once a se_cmd descriptor has already
completed posting response to fabric driver logic, and must be skipped
during normal ABORT_TASK se_cmd->tag lookup"
* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
target: Drop incorrect ABORT_TASK put for completed commands
Ming Lei [Sat, 12 Mar 2016 14:56:19 +0000 (22:56 +0800)]
block: don't optimize for non-cloned bio in bio_get_last_bvec()
For !BIO_CLONED bio, we can use .bi_vcnt safely, but it
doesn't mean we can just simply return .bi_io_vec[.bi_vcnt - 1]
because the start postion may have been moved in the middle of
the bvec, such as splitting in the middle of bvec.
Fixes:
7bcd79ac50d9(block: bio: introduce helpers to get the 1st and last bvec)
Cc: stable@vger.kernel.org
Reported-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Matt Fleming [Fri, 11 Mar 2016 11:19:23 +0000 (11:19 +0000)]
x86/efi: Fix boot crash by always mapping boot service regions into new EFI page tables
Some machines have EFI regions in page zero (physical address
0x00000000) and historically that region has been added to the e820
map via trim_bios_range(), and ultimately mapped into the kernel page
tables. It was not mapped via efi_map_regions() as one would expect.
Alexis reports that with the new separate EFI page tables some boot
services regions, such as page zero, are not mapped. This triggers an
oops during the SetVirtualAddressMap() runtime call.
For the EFI boot services quirk on x86 we need to memblock_reserve()
boot services regions until after SetVirtualAddressMap(). Doing that
while respecting the ownership of regions that may have already been
reserved by the kernel was the motivation behind this commit:
7d68dc3f1003 ("x86, efi: Do not reserve boot services regions within reserved areas")
That patch was merged at a time when the EFI runtime virtual mappings
were inserted into the kernel page tables as described above, and the
trick of setting ->numpages (and hence the region size) to zero to
track regions that should not be freed in efi_free_boot_services()
meant that we never mapped those regions in efi_map_regions(). Instead
we were relying solely on the existing kernel mappings.
Now that we have separate page tables we need to make sure the EFI
boot services regions are mapped correctly, even if someone else has
already called memblock_reserve(). Instead of stashing a tag in
->numpages, set the EFI_MEMORY_RUNTIME bit of ->attribute. Since it
generally makes no sense to mark a boot services region as required at
runtime, it's pretty much guaranteed the firmware will not have
already set this bit.
For the record, the specific circumstances under which Alexis
triggered this bug was that an EFI runtime driver on his machine was
responding to the EVT_SIGNAL_VIRTUAL_ADDRESS_CHANGE event during
SetVirtualAddressMap().
The event handler for this driver looks like this,
sub rsp,0x28
lea rdx,[rip+0x2445] # 0xaa948720
mov ecx,0x4
call func_aa9447c0 ; call to ConvertPointer(4, & 0xaa948720)
mov r11,QWORD PTR [rip+0x2434] # 0xaa948720
xor eax,eax
mov BYTE PTR [r11+0x1],0x1
add rsp,0x28
ret
Which is pretty typical code for an EVT_SIGNAL_VIRTUAL_ADDRESS_CHANGE
handler. The "mov r11, QWORD PTR [rip+0x2424]" was the faulting
instruction because ConvertPointer() was being called to convert the
address 0x0000000000000000, which when converted is left unchanged and
remains 0x0000000000000000.
The output of the oops trace gave the impression of a standard NULL
pointer dereference bug, but because we're accessing physical
addresses during ConvertPointer(), it wasn't. EFI boot services code
is stored at that address on Alexis' machine.
Reported-by: Alexis Murzeau <amurzeau@gmail.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Raphael Hertzog <hertzog@debian.org>
Cc: Roger Shimizu <rogershimizu@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1457695163-29632-2-git-send-email-matt@codeblueprint.co.uk
Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=815125
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Borislav Petkov [Fri, 11 Mar 2016 11:32:06 +0000 (12:32 +0100)]
x86/fpu: Fix eager-FPU handling on legacy FPU machines
i486 derived cores like Intel Quark support only the very old,
legacy x87 FPU (FSAVE/FRSTOR, CPUID bit FXSR is not set), and
our FPU code wasn't handling the saving and restoring there
properly in the 'eagerfpu' case.
So after we made eagerfpu the default for all CPU types:
58122bf1d856 x86/fpu: Default eagerfpu=on on all CPUs
these old FPU designs broke. First, Andy Shevchenko reported a splat:
WARNING: CPU: 0 PID: 823 at arch/x86/include/asm/fpu/internal.h:163 fpu__clear+0x8c/0x160
which was us trying to execute FXRSTOR on those machines even though
they don't support it.
After taking care of that, Bryan O'Donoghue reported that a simple FPU
test still failed because we weren't initializing the FPU state properly
on those machines.
Take care of all that.
Reported-and-tested-by: Bryan O'Donoghue <pure.logic@nexus-software.ie>
Reported-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20160311113206.GD4312@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Linus Torvalds [Sat, 12 Mar 2016 00:34:18 +0000 (16:34 -0800)]
Merge tag 'for-linus-
20160311' of git://git.infradead.org/linux-mtd
Pull MTD fixes from Brian Norris:
"Late MTD fix for v4.5:
- A simple error code handling fix for the NAND ECC test; this was a
regression in v4.5-rc1
- A MAINTAINERS update, which might as well go in ASAP"
* tag 'for-linus-
20160311' of git://git.infradead.org/linux-mtd:
MAINTAINERS: add a maintainer for the NAND subsystem
mtd: nand: tests: fix regression introduced in mtd_nandectest
Linus Torvalds [Sat, 12 Mar 2016 00:19:23 +0000 (16:19 -0800)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm/i915 fixes from Dave Airlie:
"Just two i915 regression fixes, that should be it from me"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/i915: Actually retry with bit-banging after GMBUS timeout
drm/i915: Fix bogus dig_port_map[] assignment for pre-HSW
Matthew Dawson [Fri, 11 Mar 2016 21:08:07 +0000 (13:08 -0800)]
mm/mempool: avoid KASAN marking mempool poison checks as use-after-free
When removing an element from the mempool, mark it as unpoisoned in KASAN
before verifying its contents for SLUB/SLAB debugging. Otherwise KASAN
will flag the reads checking the element use-after-free writes as
use-after-free reads.
Signed-off-by: Matthew Dawson <matthew@mjdsystems.ca>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 11 Mar 2016 20:35:54 +0000 (12:35 -0800)]
Merge tag 'armsoc-fixes' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
"Two more fixes for 4.5:
- One is a fix for OMAP that is urgently needed to avoid DRA7xx chips
from premature aging, by always keeping the Ethernet clock enabled.
- The other solves a I/O memory layout issue on Armada, where SROM
and PCI memory windows were conflicting in some configurations"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: mvebu: fix overlap of Crypto SRAM with PCIe memory window
ARM: dts: dra7: do not gate cpsw clock due to errata i877
ARM: OMAP2+: hwmod: Introduce ti,no-idle dt property
Linus Torvalds [Fri, 11 Mar 2016 20:32:02 +0000 (12:32 -0800)]
Merge tag 'media/v4.5-5' of git://git./linux/kernel/git/mchehab/linux-media
Pull media fix from Mauro Carvalho Chehab:
"One last time fix: It adds a code that prevents some media tools like
media-ctl to hide some entities that have their IDs out of the range
expected by those apps"
* tag 'media/v4.5-5' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
[media] media-device: map new functions into old types for legacy API
Thomas Petazzoni [Tue, 8 Mar 2016 15:59:57 +0000 (16:59 +0100)]
ARM: mvebu: fix overlap of Crypto SRAM with PCIe memory window
When the Crypto SRAM mappings were added to the Device Tree files
describing the Armada XP boards in commit
c466d997bb16 ("ARM: mvebu:
define crypto SRAM ranges for all armada-xp boards"), the fact that
those mappings were overlaping with the PCIe memory aperture was
overlooked. Due to this, we currently have for all Armada XP platforms
a situation that looks like this:
Memory mapping on Armada XP boards with internal registers at
0xf1000000:
- 0x00000000 -> 0xf0000000 3.75G RAM
- 0xf0000000 -> 0xf1000000 16M NOR flashes (AXP GP / AXP DB)
- 0xf1000000 -> 0xf1100000 1M internal registers
- 0xf8000000 -> 0xffe0000 126M PCIe memory aperture
- 0xf8100000 -> 0xf8110000 64KB Crypto SRAM #0 => OVERLAPS WITH PCIE !
- 0xf8110000 -> 0xf8120000 64KB Crypto SRAM #1 => OVERLAPS WITH PCIE !
- 0xffe00000 -> 0xfff00000 1M PCIe I/O aperture
- 0xfff0000 -> 0xffffffff 1M BootROM
The overlap means that when PCIe devices are added, depending on their
memory window needs, they might or might not be mapped into the
physical address space. Indeed, they will not be mapped if the area
allocated in the PCIe memory aperture by the PCI core overlaps with
one of the Crypto SRAM. Typically, a Intel IGB PCIe NIC that needs 8MB
of PCIe memory will see its PCIe memory window allocated from
0xf80000000 for 8MB, which overlaps with the Crypto SRAM windows. Due
to this, the PCIe window is not created, and any attempt to access the
PCIe window makes the kernel explode:
[ 3.302213] igb: Copyright (c) 2007-2014 Intel Corporation.
[ 3.307841] pci 0000:00:09.0: enabling device (0140 -> 0143)
[ 3.313539] mvebu_mbus: cannot add window '4:f8', conflicts with another window
[ 3.320870] mvebu-pcie soc:pcie-controller: Could not create MBus window at [mem 0xf8000000-0xf87fffff]: -22
[ 3.330811] Unhandled fault: external abort on non-linefetch (0x1008) at 0xf08c0018
This problem does not occur on Armada 370 boards, because we use the
following memory mapping (for boards that have internal registers at
0xf1000000):
- 0x00000000 -> 0xf0000000 3.75G RAM
- 0xf0000000 -> 0xf1000000 16M NOR flashes (AXP GP / AXP DB)
- 0xf1000000 -> 0xf1100000 1M internal registers
- 0xf1100000 -> 0xf1110000 64KB Crypto SRAM #0 => OK !
- 0xf8000000 -> 0xffe0000 126M PCIe memory
- 0xffe00000 -> 0xfff00000 1M PCIe I/O
- 0xfff0000 -> 0xffffffff 1M BootROM
Obviously, the solution is to align the location of the Crypto SRAM
mappings of Armada XP to be similar with the ones on Armada 370, i.e
have them between the "internal registers" area and the beginning of
the PCIe aperture.
However, we have a special case with the OpenBlocks AX3-4 platform,
which has a 128 MB NOR flash. Currently, this NOR flash is mapped from
0xf0000000 to 0xf8000000. This is possible because on OpenBlocks
AX3-4, the internal registers are not at 0xf1000000. And this explains
why the Crypto SRAM mappings were not configured at the same place on
Armada XP.
Hence, the solution is two-fold:
(1) Move the NOR flash mapping on Armada XP OpenBlocks AX3-4 from
0xe8000000 to 0xf0000000. This frees the 0xf0000000 ->
0xf80000000 space.
(2) Move the Crypto SRAM mappings on Armada XP to be similar to
Armada 370 (except of course that Armada XP has two Crypto SRAM
and not one).
After this patch, the memory mapping on Armada XP boards with
registers at 0xf1 is:
- 0x00000000 -> 0xf0000000 3.75G RAM
- 0xf0000000 -> 0xf1000000 16M NOR flashes (AXP GP / AXP DB)
- 0xf1000000 -> 0xf1100000 1M internal registers
- 0xf1100000 -> 0xf1110000 64KB Crypto SRAM #0
- 0xf1110000 -> 0xf1120000 64KB Crypto SRAM #1
- 0xf8000000 -> 0xffe0000 126M PCIe memory
- 0xffe00000 -> 0xfff00000 1M PCIe I/O
- 0xfff0000 -> 0xffffffff 1M BootROM
And the memory mapping for the special case of the OpenBlocks AX3-4
(internal registers at 0xd0000000, NOR of 128 MB):
- 0x00000000 -> 0xc0000000 3G RAM
- 0xd0000000 -> 0xd1000000 1M internal registers
- 0xe800000 -> 0xf0000000 128M NOR flash
- 0xf1100000 -> 0xf1110000 64KB Crypto SRAM #0
- 0xf1110000 -> 0xf1120000 64KB Crypto SRAM #1
- 0xf8000000 -> 0xffe0000 126M PCIe memory
- 0xffe00000 -> 0xfff00000 1M PCIe I/O
- 0xfff0000 -> 0xffffffff 1M BootROM
Fixes:
c466d997bb16 ("ARM: mvebu: define crypto SRAM ranges for all armada-xp boards")
Reported-by: Phil Sutter <phil@nwl.cc>
Cc: Phil Sutter <phil@nwl.cc>
Cc: <stable@vger.kernel.org>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
Linus Torvalds [Fri, 11 Mar 2016 18:57:18 +0000 (10:57 -0800)]
Merge tag 'dmaengine-fix-4.5' of git://git.infradead.org/users/vkoul/slave-dma
Pull dmaengine fixes from Vinod Koul:
"Two fixes showed up in last few days, and they should be included in
4.5. Summary:
Two more late fixes to drivers, nothing major here:
- A memory leak fix in fsdma unmap the dma descriptors on freeup
- A fix in xdmac driver for residue calculation of dma descriptor"
* tag 'dmaengine-fix-4.5' of git://git.infradead.org/users/vkoul/slave-dma:
dmaengine: at_xdmac: fix residue computation
dmaengine: fsldma: fix memory leak
Linus Torvalds [Fri, 11 Mar 2016 18:45:03 +0000 (10:45 -0800)]
Merge tag 'pm+acpi-4.5-final' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management and ACPI fixes from Rafael Wysocki:
"Two more fixes for issues introduced recently, one in the generic
device properties framework and one in ACPICA.
Specifics:
- Revert a recent ACPICA commit that has been reverted upstream,
because it caused problems to happen on user systems and the
problem it attempted to address will not be relevant any more after
upcoming ACPI specification changes (Bob Moore).
- Fix crash in the generic device properties framework introduced by
a recent change that forgot to check pointers against error values
in addition to checking them against NULL (Heikki Krogerus)"
* tag 'pm+acpi-4.5-final' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
device property: fwnode->secondary may contain ERR_PTR(-ENODEV)
ACPICA: Revert "Parser: Fix for SuperName method invocation"
Linus Torvalds [Fri, 11 Mar 2016 18:21:32 +0000 (10:21 -0800)]
Merge tag 'xfs-for-linus-4.5-rc7' of git://git./linux/kernel/git/dgc/linux-xfs
Pull xfs fixes from Dave Chinner:
"This is a fix for a regression introduced in 4.5-rc1 by the new torn
log write detection code. The regression only affects people moving a
clean filesystem between machines/kernels of different architecture
(such as changing between 32 bit and 64 bit kernels), but this is the
recommended (and only!) safe way to migrate a filesystem between
architectures so we really need to ensure it works.
The changes are larger than I'd prefer right at the end of the release
cycle, but the majority of the change is just factoring code to enable
the detection of a clean log at the correct time to avoid this issue.
Changes:
- Only perform torn log write detection on dirty logs. This prevents
failures being detected due to a clean filesystem being moved
between machines or kernels of different architectures (e.g. 32 ->
64 bit, BE -> LE, etc). This fixes a regression introduced by the
torn log write detection in 4.5-rc1"
* tag 'xfs-for-linus-4.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
xfs: only run torn log write detection on dirty logs
xfs: refactor in-core log state update to helper
xfs: refactor unmount record detection into helper
xfs: separate log head record discovery from verification
Linus Torvalds [Fri, 11 Mar 2016 18:13:49 +0000 (10:13 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
"A couple of fixes: Fix for my dumb braino in ncpfs and a long-standing
breakage on recovery from failed rename() in jffs2"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
jffs2: reduce the breakage on recovery from halfway failed rename()
ncpfs: fix a braino in OOM handling in ncp_fill_cache()
Rafael J. Wysocki [Fri, 11 Mar 2016 13:22:54 +0000 (14:22 +0100)]
Merge branches 'device-properties-fixes' and 'acpica-fixes'
* device-properties-fixes:
device property: fwnode->secondary may contain ERR_PTR(-ENODEV)
* acpica-fixes:
ACPICA: Revert "Parser: Fix for SuperName method invocation"
Ville Syrjälä [Mon, 7 Mar 2016 15:56:57 +0000 (17:56 +0200)]
drm/i915: Actually retry with bit-banging after GMBUS timeout
After the GMBUS transfer times out, we set force_bit=1 and
return -EAGAIN expecting the i2c core to call the .master_xfer
hook again so that we will retry the same transfer via bit-banging.
This is in case the gmbus hardware is somehow faulty.
Unfortunately we left adapter->retries to 0, meaning the i2c core
didn't actually do the retry. Let's tell the core we want one retry
when we return -EAGAIN.
Note that i2c-algo-bit also uses this retry count for some internal
retries, so we'll end up increasing those a bit as well.
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: drm-intel-fixes@lists.freedesktop.org
Fixes:
bffce907d640 ("drm/i915: abstract i2c bit banging fallback in gmbus xfer")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1457366220-29409-2-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
(cherry picked from commit
8b1f165a4a8f64c28cf42d10e1f4d3b451dedc51)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Josh Hunt [Tue, 8 Mar 2016 15:52:12 +0000 (10:52 -0500)]
ACPI / APEI: ERST: Fixed leaked resources in erst_init
erst_init currently leaks resources allocated from its call to
apei_resources_init(). The data allocated there gets copied
into apei_resources_all and can be freed when we're done with it.
Signed-off-by: Josh Hunt <johunt@akamai.com>
Reviewed-by: Chen, Gong <gong.chen@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Josh Hunt [Tue, 8 Mar 2016 15:52:11 +0000 (10:52 -0500)]
ACPI / APEI: Fix leaked resources
We leak the NVS and arch resources (if used), in apei_resources_request.
They are allocated to make sure we exclude them from the APEI resources,
but they are never freed at the end of the function. Free them now.
Signed-off-by: Josh Hunt <johunt@akamai.com>
Reviewed-by: Chen, Gong <gong.chen@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Rafael J. Wysocki [Thu, 10 Mar 2016 22:45:19 +0000 (23:45 +0100)]
intel_pstate: Do not skip samples partially
If the current value of MPERF or the current value of TSC is the
same as the previous one, respectively, intel_pstate_sample() bails
out early and skips the sample.
However, intel_pstate_adjust_busy_pstate() is still called in that
case which is not correct, so modify intel_pstate_sample() to
return a bool value indicating whether or not the sample has been
taken and use it to decide whether or not to call
intel_pstate_adjust_busy_pstate().
While at it, remove redundant parentheses from the MPERF/TSC
check in intel_pstate_sample().
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Philippe Longepe [Sun, 6 Mar 2016 07:34:06 +0000 (08:34 +0100)]
intel_pstate: Remove freq calculation from intel_pstate_calc_busy()
Use a helper function to compute the average pstate and call it only
where it is needed (only when tracing or in intel_pstate_get).
Signed-off-by: Philippe Longepe <philippe.longepe@linux.intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Philippe Longepe [Sun, 6 Mar 2016 07:34:05 +0000 (08:34 +0100)]
intel_pstate: Move intel_pstate_calc_busy() into get_target_pstate_use_performance()
The cpu_load algorithm doesn't need to invoke intel_pstate_calc_busy(),
so move that call from intel_pstate_sample() to
get_target_pstate_use_performance().
Signed-off-by: Philippe Longepe <philippe.longepe@linux.intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Philippe Longepe [Sun, 6 Mar 2016 07:34:04 +0000 (08:34 +0100)]
intel_pstate: Optimize calculation for max/min_perf_adj
mul_fp(int_tofp(A), B) expands to:
((A << FRAC_BITS) * B) >> FRAC_BITS, so the same result can be obtained
via simple multiplication A * B. Apply this observation to
max_perf * limits->max_perf and max_perf * limits->min_perf in
intel_pstate_get_min_max()."
Signed-off-by: Philippe Longepe <philippe.longepe@linux.intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Philippe Longepe [Tue, 8 Mar 2016 09:31:14 +0000 (10:31 +0100)]
intel_pstate: Remove extra conversions in pid calculation
pid->setpoint and pid->deadband can be initialized in fixed point, so we
can avoid the int_tofp in pid_calc.
Signed-off-by: Philippe Longepe <philippe.longepe@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Xiangliang Yu [Thu, 10 Mar 2016 11:34:52 +0000 (19:34 +0800)]
i2c: designware: Add device HID for future AMD I2C controller
Add device HID AMDI0010 to match the AMD ACPI Vendor ID (AMDI) that
was registered in http://www.uefi.org/acpi_id_list, and the I2C
controller on future AMD paltform will use the HID instead of AMD0010.
Signed-off-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Heikki Krogerus [Thu, 10 Mar 2016 11:03:18 +0000 (13:03 +0200)]
device property: fix for a case of use-after-free
In device_remove_property_set(), the secondary fwnode needs
to be cleared before the pset is freed. This fixes a
use-after-free when a property set is providing the primary
fwnode.
As a result of the fix, the primary fwnode may end up
containing ERR_PTR(-ENODEV), so also adding checks for it to
the property handling code.
Reported-by: John Youn <John.Youn@synopsys.com>
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Lv Zheng [Thu, 10 Mar 2016 02:54:29 +0000 (10:54 +0800)]
ACPICA / Interpreter: Fix a regression triggered because of wrong Linux ECDT support
It is reported that the following commit triggers regressions:
Linux commit:
efaed9be998b5ae0afb7458e057e5f4402b43fa0
ACPICA commit:
31178590dde82368fdb0f6b0e466b6c0add96c57
Subject: ACPICA: Events: Enhance acpi_ev_execute_reg_method() to
ensure no _REG evaluations can happen during OS early boot
stages
This is because that the ECDT support is not corrected in Linux, and Linux
requires to execute _REG for ECDT (though this sounds so wrong), we need to
ensure acpi_gbl_namespace_initialized is set before ECDT probing in order
for _REG to be executed. Since we have to move
"acpi_gbl_namespace_initialized = TRUE" to the initialization step
happening before ECDT probing, acpi_load_tables() is the best candidate for
now. Thus this patch fixes the regression by doing so.
But if the ECDT support is fixed, Linux will not execute _REG for ECDT, and
ECDT probing will happen before acpi_load_tables(). At that time, we still
want to ensure acpi_gbl_namespace_initialized is set after executing
acpi_ns_initialize_objects() (under the condition of
acpi_gbl_group_module_level_code = FALSE), this patch also moves
acpi_ns_initialize_objects() to acpi_load_tables() accordingly.
Since acpi_ns_initialize_objects() doesn't seem to be skippable, this
patch also removes ACPI_NO_OBJECT_INIT for the one invoked in
acpi_load_tables(). And since the default region handlers should always be
installed before loading the tables, this patch also removes useless
acpi_gbl_group_module_level_code check accordingly. Reported by Chris
Bainbridge, Fixed by Lv Zheng.
Fixes:
efaed9be998b (ACPICA: Events: Enhance acpi_ev_execute_reg_method() to ensure no _REG evaluations can happen during OS early boot stages)
Reported-and-tested-by: Chris Bainbridge <chris.bainbridge@gmail.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Rafael J. Wysocki [Thu, 10 Mar 2016 19:46:03 +0000 (20:46 +0100)]
Merge branch 'pm-cpufreq-governor' into pm-cpufreq
Rafael J. Wysocki [Thu, 10 Mar 2016 19:44:47 +0000 (20:44 +0100)]
cpufreq: Move scheduler-related code to the sched directory
Create cpufreq.c under kernel/sched/ and move the cpufreq code
related to the scheduler to that file and to sched.h.
Redefine cpufreq_update_util() as a static inline function to avoid
function calls at its call sites in the scheduler code (as suggested
by Peter Zijlstra).
Also move the definition of struct update_util_data and declaration
of cpufreq_set_update_util_data() from include/linux/cpufreq.h to
include/linux/sched.h.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Boris BREZILLON [Wed, 9 Mar 2016 20:57:24 +0000 (21:57 +0100)]
MAINTAINERS: add a maintainer for the NAND subsystem
Add myself as the maintainer of the NAND subsystem.
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Acked-by: Richard Weinberger <richard@nod.at>
Acked-by: Brian Norris <computersforpeace@gmail.com>
Acked-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Linus Torvalds [Thu, 10 Mar 2016 18:42:15 +0000 (10:42 -0800)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"A few simple fixes for ARM, x86, PPC and generic code.
The x86 MMU fix is a bit larger because the surrounding code needed a
cleanup, but nothing worrisome"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: MMU: fix reserved bit check for ept=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0
KVM: MMU: fix ept=0/pte.u=1/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo
kvm: cap halt polling at exactly halt_poll_ns
KVM: s390: correct fprs on SIGP (STOP AND) STORE STATUS
KVM: VMX: disable PEBS before a guest entry
KVM: PPC: Book3S HV: Sanitize special-purpose register values on guest exit
Linus Torvalds [Thu, 10 Mar 2016 18:39:04 +0000 (10:39 -0800)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"I thought we were done for 4.5, but then the 64k-page chaps came
crawling out of the woodwork. *sigh*
The vmemmap fix I sent for -rc7 caused a regression with 64k pages and
sparsemem and at some point during the release cycle the new hugetlb
code using contiguous ptes started failing the libhugetlbfs tests with
64k pages enabled.
So here are a couple of patches that fix the vmemmap alignment and
disable the new hugetlb page sizes whilst a proper fix is being
developed:
- Temporarily disable huge pages built using contiguous ptes
- Ensure vmemmap region is sufficiently aligned for sparsemem
sections"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: hugetlb: partial revert of
66b3923a1a0f
arm64: account for sparsemem section alignment when choosing vmemmap offset
Linus Torvalds [Thu, 10 Mar 2016 18:36:07 +0000 (10:36 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
"Three bug fixes:
- The fix for the page table corruption (CVE-2016-2143)
- The diagnose statistics introduced a regression for the dasd diag
driver
- Boot crash on systems without the set-program-parameters facility"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/mm: four page table levels vs. fork
s390/cpumf: Fix lpp detection
s390/dasd: fix diag 0x250 inline assembly
Mauro Carvalho Chehab [Sat, 5 Mar 2016 10:13:39 +0000 (07:13 -0300)]
[media] media-device: map new functions into old types for legacy API
The legacy media controller userspace API exposes entity types that
carry both type and function information. The new API replaces the type
with a function. It preserves backward compatibility by defining legacy
functions for the existing types and using them in drivers.
This works fine, as long as newer entity functions won't be added.
Unfortunately, some tools, like media-ctl with --print-dot argument
rely on the now legacy MEDIA_ENT_T_V4L2_SUBDEV and MEDIA_ENT_T_DEVNODE
numeric ranges to identify what entities will be shown.
Also, if the entity doesn't match those ranges, it will ignore the
major/minor information on devnodes, and won't be getting the devnode
name via udev or sysfs.
As we're now adding devices outside the old range, the legacy ioctl
needs to map the new entity functions into a type at the old range,
or otherwise we'll have a regression.
Detected on all released media-ctl versions (e. g. versions <= 1.10).
Fix this by deriving the type from the function to emulate the legacy
API if the function isn't in the legacy functions range.
Reported-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Ludovic Desroches [Thu, 10 Mar 2016 09:17:55 +0000 (10:17 +0100)]
dmaengine: at_xdmac: fix residue computation
When computing the residue we need two pieces of information: the current
descriptor and the remaining data of the current descriptor. To get
that information, we need to read consecutively two registers but we
can't do it in an atomic way. For that reason, we have to check manually
that current descriptor has not changed.
Signed-off-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Suggested-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Reported-by: David Engraf <david.engraf@sysgo.com>
Tested-by: David Engraf <david.engraf@sysgo.com>
Fixes:
e1f7c9eee707 ("dmaengine: at_xdmac: creation of the atmel
eXtended DMA Controller driver")
Cc: stable@vger.kernel.org #4.1 and later
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Borislav Petkov [Wed, 9 Mar 2016 20:56:22 +0000 (21:56 +0100)]
x86/delay: Avoid preemptible context checks in delay_mwaitx()
We do use this_cpu_ptr(&cpu_tss) as a cacheline-aligned, seldomly
accessed per-cpu var as the MONITORX target in delay_mwaitx(). However,
when called in preemptible context, this_cpu_ptr -> smp_processor_id() ->
debug_smp_processor_id() fires:
BUG: using smp_processor_id() in preemptible [
00000000] code: udevd/312
caller is delay_mwaitx+0x40/0xa0
But we don't care about that check - we only need cpu_tss as a MONITORX
target and it doesn't really matter which CPU's var we're touching as
we're going idle anyway. Fix that.
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: spg_linux_kernel@amd.com
Link: http://lkml.kernel.org/r/20160309205622.GG6564@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Paolo Bonzini [Wed, 9 Mar 2016 13:28:02 +0000 (14:28 +0100)]
KVM: MMU: fix reserved bit check for ept=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0
KVM has special logic to handle pages with pte.u=1 and pte.w=0 when
CR0.WP=1. These pages' SPTEs flip continuously between two states:
U=1/W=0 (user and supervisor reads allowed, supervisor writes not allowed)
and U=0/W=1 (supervisor reads and writes allowed, user writes not allowed).
When SMEP is in effect, however, U=0 will enable kernel execution of
this page. To avoid this, KVM also sets NX=1 in the shadow PTE together
with U=0, making the two states U=1/W=0/NX=gpte.NX and U=0/W=1/NX=1.
When guest EFER has the NX bit cleared, the reserved bit check thinks
that the latter state is invalid; teach it that the smep_andnot_wp case
will also use the NX bit of SPTEs.
Cc: stable@vger.kernel.org
Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.inel.com>
Fixes:
c258b62b264fdc469b6d3610a907708068145e3b
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 8 Mar 2016 11:13:39 +0000 (12:13 +0100)]
KVM: MMU: fix ept=0/pte.u=1/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo
Yes, all of these are needed. :) This is admittedly a bit odd, but
kvm-unit-tests access.flat tests this if you run it with "-cpu host"
and of course ept=0.
KVM runs the guest with CR0.WP=1, so it must handle supervisor writes
specially when pte.u=1/pte.w=0/CR0.WP=0. Such writes cause a fault
when U=1 and W=0 in the SPTE, but they must succeed because CR0.WP=0.
When KVM gets the fault, it sets U=0 and W=1 in the shadow PTE and
restarts execution. This will still cause a user write to fault, while
supervisor writes will succeed. User reads will fault spuriously now,
and KVM will then flip U and W again in the SPTE (U=1, W=0). User reads
will be enabled and supervisor writes disabled, going back to the
originary situation where supervisor writes fault spuriously.
When SMEP is in effect, however, U=0 will enable kernel execution of
this page. To avoid this, KVM also sets NX=1 in the shadow PTE together
with U=0. If the guest has not enabled NX, the result is a continuous
stream of page faults due to the NX bit being reserved.
The fix is to force EFER.NX=1 even if the CPU is taking care of the EFER
switch. (All machines with SMEP have the CPU_LOAD_IA32_EFER vm-entry
control, so they do not use user-return notifiers for EFER---if they did,
EFER.NX would be forced to the same value as the host).
There is another bug in the reserved bit check, which I've split to a
separate patch for easier application to stable kernels.
Cc: stable@vger.kernel.org
Cc: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Fixes:
f6577a5fa15d82217ca73c74cd2dcbc0f6c781dd
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Yu-cheng Yu [Thu, 10 Mar 2016 00:28:54 +0000 (16:28 -0800)]
x86/fpu: Revert ("x86/fpu: Disable AVX when eagerfpu is off")
Leonid Shatz noticed that the SDM interpretation of the following
recent commit:
394db20ca240741 ("x86/fpu: Disable AVX when eagerfpu is off")
... is incorrect and that the original behavior of the FPU code was correct.
Because AVX is not stated in CR0 TS bit description, it was mistakenly
believed to be not supported for lazy context switch. This turns out
to be false:
Intel Software Developer's Manual Vol. 3A, Sec. 2.5 Control Registers:
'TS Task Switched bit (bit 3 of CR0) -- Allows the saving of the x87 FPU/
MMX/SSE/SSE2/SSE3/SSSE3/SSE4 context on a task switch to be delayed until
an x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 instruction is actually executed
by the new task.'
Intel Software Developer's Manual Vol. 2A, Sec. 2.4 Instruction Exception
Specification:
'AVX instructions refer to exceptions by classes that include #NM
"Device Not Available" exception for lazy context switch.'
So revert the commit.
Reported-by: Leonid Shatz <leonid.shatz@ravellosystems.com>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1457569734-3785-1-git-send-email-yu-cheng.yu@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Martin Schwidefsky [Mon, 15 Feb 2016 13:46:49 +0000 (14:46 +0100)]
s390/mm: four page table levels vs. fork
The fork of a process with four page table levels is broken since
git commit
6252d702c5311ce9 "[S390] dynamic page tables."
All new mm contexts are created with three page table levels and
an asce limit of 4TB. If the parent has four levels dup_mmap will
add vmas to the new context which are outside of the asce limit.
The subsequent call to copy_page_range will walk the three level
page table structure of the new process with non-zero pgd and pud
indexes. This leads to memory clobbers as the pgd_index *and* the
pud_index is added to the mm->pgd pointer without a pgd_deref
in between.
The init_new_context() function is selecting the number of page
table levels for a new context. The function is used by mm_init()
which in turn is called by dup_mm() and mm_alloc(). These two are
used by fork() and exec(). The init_new_context() function can
distinguish the two cases by looking at mm->context.asce_limit,
for fork() the mm struct has been copied and the number of page
table levels may not change. For exec() the mm_alloc() function
set the new mm structure to zero, in this case a three-level page
table is created as the temporary stack space is located at
STACK_TOP_MAX = 4TB.
This fixes CVE-2016-2143.
Reported-by: Marcin Kościelnicki <koriakin@0x04.net>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Linus Torvalds [Thu, 10 Mar 2016 04:24:23 +0000 (20:24 -0800)]
Merge tag 'spi-fix-v4.5-rc7' of git://git./linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A few driver specific fixes for the Rockchip and i.MX SPI controllers,
especially for the i.MX they're annoying bugs if you run into them"
* tag 'spi-fix-v4.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: imx: fix spi resource leak with dma transfer
spi: imx: allow only WML aligned transfers to use DMA
spi: rockchip: add missing spi_master_put
spi: rockchip: disable runtime pm when in err case
Mark Brown [Thu, 10 Mar 2016 03:42:24 +0000 (10:42 +0700)]
Merge remote-tracking branch 'spi/fix/rockchip' into spi-linus
Mark Brown [Thu, 10 Mar 2016 03:42:22 +0000 (10:42 +0700)]
Merge remote-tracking branch 'spi/fix/imx' into spi-linus
Linus Torvalds [Thu, 10 Mar 2016 03:33:05 +0000 (19:33 -0800)]
Merge tag 'ext4_for_linus_stable' of git://git./linux/kernel/git/tytso/ext4
Pull ext4 fix from Ted Ts'o:
"This fixes a regression which crept in v4.5-rc5"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: iterate over buffer heads correctly in move_extent_per_page()
Linus Torvalds [Thu, 10 Mar 2016 03:12:37 +0000 (19:12 -0800)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"A few imx fixes I missed from a couple of weeks ago, they still aren't
that big and fix some regression and a fail to boot problem.
Other than that, a couple of regression fixes for radeon/amdgpu, one
regression fix for vmwgfx and one regression fix for tda998x"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
Revert "drm/radeon/pm: adjust display configuration after powerstate"
drm/amdgpu/dp: add back special handling for NUTMEG
drm/radeon/dp: add back special handling for NUTMEG
drm/i2c: tda998x: Choose between atomic or non atomic dpms helper
drm/vmwgfx: Add back ->detect() and ->fill_modes()
drm/radeon: Fix error handling in radeon_flip_work_func.
drm/amdgpu: Fix error handling in amdgpu_flip_work_func.
drm/imx: Add missing DRM_FORMAT_RGB565 to ipu_plane_formats
drm/imx: notify DRM core about CRTC vblank state
gpu: ipu-v3: Reset IPU before activating IRQ
gpu: ipu-v3: Do not bail out on missing optional port nodes
Linus Torvalds [Thu, 10 Mar 2016 03:01:58 +0000 (19:01 -0800)]
Merge tag 'trace-fixes-v4.5-rc7' of git://git./linux/kernel/git/rostedt/linux-trace
Pull tracing fix from Steven Rostedt:
"I previously sent a fix that prevents all trace events from being
called if the current cpu is offline.
But I forgot that in 3.18, we added lockdep checks to test RCU usage
even when the event is disabled. Although there cannot be any bug
when a cpu is going offline, we now get false warnings triggered by
the added checks of the event being disabled.
I removed the check from the tracepoint code itself, and added it to
the condition section (which is "1" for 'no condition'). This way the
online cpu check will get checked in all the right locations"
* tag 'trace-fixes-v4.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix check for cpu online when event is disabled
Eryu Guan [Sun, 21 Feb 2016 23:38:44 +0000 (18:38 -0500)]
ext4: iterate over buffer heads correctly in move_extent_per_page()
In commit
bcff24887d00 ("ext4: don't read blocks from disk after extents
being swapped") bh is not updated correctly in the for loop and wrong
data has been written to disk. generic/324 catches this on sub-page
block size ext4.
Fixes:
bcff24887d00 ("ext4: don't read blocks from disk after extentsbeing swapped")
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Linus Torvalds [Thu, 10 Mar 2016 02:27:52 +0000 (18:27 -0800)]
Merge branch 'akpm' (patches from Andrew)
Merge fixes from Andrew Morton:
"13 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
dma-mapping: avoid oops when parameter cpu_addr is null
mm/hugetlb: use EOPNOTSUPP in hugetlb sysctl handlers
memremap: check pfn validity before passing to pfn_to_page()
mm, thp: fix migration of PTE-mapped transparent huge pages
dax: check return value of dax_radix_entry()
ocfs2: fix return value from ocfs2_page_mkwrite()
arm64: kasan: clear stale stack poison
sched/kasan: remove stale KASAN poison after hotplug
kasan: add functions to clear stack poison
mm: fix mixed zone detection in devm_memremap_pages
list: kill list_force_poison()
mm: __delete_from_page_cache show Bad page if mapped
mm/hugetlb: hugetlb_no_page: rate-limit warning message
Zhen Lei [Wed, 9 Mar 2016 22:08:38 +0000 (14:08 -0800)]
dma-mapping: avoid oops when parameter cpu_addr is null
To keep consistent with kfree, which tolerate ptr is NULL. We do this
because sometimes we may use goto statement, so that success and failure
case can share parts of the code. But unfortunately, dma_free_coherent
called with parameter cpu_addr is null will cause oops, such as showed
below:
Unable to handle kernel paging request at virtual address
ffffffc020d3b2b8
pgd =
ffffffc083a61000
[
ffffffc020d3b2b8] *pgd=
0000000000000000, *pud=
0000000000000000
CPU: 4 PID: 1489 Comm: malloc_dma_1 Tainted: G O 4.1.12 #1
Hardware name: ARM64 (DT)
PC is at __dma_free_coherent.isra.10+0x74/0xc8
LR is at __dma_free+0x9c/0xb0
Process malloc_dma_1 (pid: 1489, stack limit = 0xffffffc0837fc020)
[...]
Call trace:
__dma_free_coherent.isra.10+0x74/0xc8
__dma_free+0x9c/0xb0
malloc_dma+0x104/0x158 [dma_alloc_coherent_mtmalloc]
kthread+0xec/0xfc
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Stancek [Wed, 9 Mar 2016 22:08:35 +0000 (14:08 -0800)]
mm/hugetlb: use EOPNOTSUPP in hugetlb sysctl handlers
Replace ENOTSUPP with EOPNOTSUPP. If hugepages are not supported, this
value is propagated to userspace. EOPNOTSUPP is part of uapi and is
widely supported by libc libraries.
It gives nicer message to user, rather than:
# cat /proc/sys/vm/nr_hugepages
cat: /proc/sys/vm/nr_hugepages: Unknown error 524
And also LTP's proc01 test was failing because this ret code (524)
was unexpected:
proc01 1 TFAIL : proc01.c:396: read failed: /proc/sys/vm/nr_hugepages: errno=???(524): Unknown error 524
proc01 2 TFAIL : proc01.c:396: read failed: /proc/sys/vm/nr_hugepages_mempolicy: errno=???(524): Unknown error 524
proc01 3 TFAIL : proc01.c:396: read failed: /proc/sys/vm/nr_overcommit_hugepages: errno=???(524): Unknown error 524
Signed-off-by: Jan Stancek <jstancek@redhat.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ard Biesheuvel [Wed, 9 Mar 2016 22:08:32 +0000 (14:08 -0800)]
memremap: check pfn validity before passing to pfn_to_page()
In memremap's helper function try_ram_remap(), we dereference a struct
page pointer that was derived from a PFN that is known to be covered by
a 'System RAM' iomem region, and is thus assumed to be a 'valid' PFN,
i.e., a PFN that has a struct page associated with it and is covered by
the kernel direct mapping.
However, the assumption that there is a 1:1 relation between the System
RAM iomem region and the kernel direct mapping is not universally valid
on all architectures, and on ARM and arm64, 'System RAM' may include
regions for which pfn_valid() returns false.
Generally speaking, both __va() and pfn_to_page() should only ever be
called on PFNs/physical addresses for which pfn_valid() returns true, so
add that check to try_ram_remap().
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Wed, 9 Mar 2016 22:08:30 +0000 (14:08 -0800)]
mm, thp: fix migration of PTE-mapped transparent huge pages
We don't have native support of THP migration, so we have to split huge
page into small pages in order to migrate it to different node. This
includes PTE-mapped huge pages.
I made mistake in refcounting patchset: we don't actually split
PTE-mapped huge page in queue_pages_pte_range(), if we step on head
page.
The result is that the head page is queued for migration, but none of
tail pages: putting head page on queue takes pin on the page and any
subsequent attempts of split_huge_pages() would fail and we skip queuing
tail pages.
unmap_and_move_huge_page() will eventually split the huge pages, but
only one of 512 pages would get migrated.
Let's fix the situation.
Fixes:
248db92da13f2507 ("migrate_pages: try to split pages on queuing")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ross Zwisler [Wed, 9 Mar 2016 22:08:27 +0000 (14:08 -0800)]
dax: check return value of dax_radix_entry()
dax_pfn_mkwrite() previously wasn't checking the return value of the
call to dax_radix_entry(), which was a mistake.
Instead, capture this return value and return the appropriate VM_FAULT_
value.
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Kara [Wed, 9 Mar 2016 22:08:24 +0000 (14:08 -0800)]
ocfs2: fix return value from ocfs2_page_mkwrite()
ocfs2_page_mkwrite() could mistakenly return error code instead of
mkwrite status value. Fix it.
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mark Rutland [Wed, 9 Mar 2016 22:08:21 +0000 (14:08 -0800)]
arm64: kasan: clear stale stack poison
Functions which the compiler has instrumented for KASAN place poison on
the stack shadow upon entry and remove this poison prior to returning.
In the case of cpuidle, CPUs exit the kernel a number of levels deep in
C code. Any instrumented functions on this critical path will leave
portions of the stack shadow poisoned.
If CPUs lose context and return to the kernel via a cold path, we
restore a prior context saved in __cpu_suspend_enter are forgotten, and
we never remove the poison they placed in the stack shadow area by
functions calls between this and the actual exit of the kernel.
Thus, (depending on stackframe layout) subsequent calls to instrumented
functions may hit this stale poison, resulting in (spurious) KASAN
splats to the console.
To avoid this, clear any stale poison from the idle thread for a CPU
prior to bringing a CPU online.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mark Rutland [Wed, 9 Mar 2016 22:08:18 +0000 (14:08 -0800)]
sched/kasan: remove stale KASAN poison after hotplug
Functions which the compiler has instrumented for KASAN place poison on
the stack shadow upon entry and remove this poision prior to returning.
In the case of CPU hotplug, CPUs exit the kernel a number of levels deep
in C code. Any instrumented functions on this critical path will leave
portions of the stack shadow poisoned.
When a CPU is subsequently brought back into the kernel via a different
path, depending on stackframe, layout calls to instrumented functions
may hit this stale poison, resulting in (spurious) KASAN splats to the
console.
To avoid this, clear any stale poison from the idle thread for a CPU
prior to bringing a CPU online.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mark Rutland [Wed, 9 Mar 2016 22:08:15 +0000 (14:08 -0800)]
kasan: add functions to clear stack poison
Functions which the compiler has instrumented for ASAN place poison on
the stack shadow upon entry and remove this poison prior to returning.
In some cases (e.g. hotplug and idle), CPUs may exit the kernel a
number of levels deep in C code. If there are any instrumented
functions on this critical path, these will leave portions of the idle
thread stack shadow poisoned.
If a CPU returns to the kernel via a different path (e.g. a cold
entry), then depending on stack frame layout subsequent calls to
instrumented functions may use regions of the stack with stale poison,
resulting in (spurious) KASAN splats to the console.
Contemporary GCCs always add stack shadow poisoning when ASAN is
enabled, even when asked to not instrument a function [1], so we can't
simply annotate functions on the critical path to avoid poisoning.
Instead, this series explicitly removes any stale poison before it can
be hit. In the common hotplug case we clear the entire stack shadow in
common code, before a CPU is brought online.
On architectures which perform a cold return as part of cpu idle may
retain an architecture-specific amount of stack contents. To retain the
poison for this retained context, the arch code must call the core KASAN
code, passing a "watermark" stack pointer value beyond which shadow will
be cleared. Architectures which don't perform a cold return as part of
idle do not need any additional code.
This patch (of 3):
Functions which the compiler has instrumented for KASAN place poison on
the stack shadow upon entry and remove this poision prior to returning.
In some cases (e.g. hotplug and idle), CPUs may exit the kernel a number
of levels deep in C code. If there are any instrumented functions on this
critical path, these will leave portions of the stack shadow poisoned.
If a CPU returns to the kernel via a different path (e.g. a cold entry),
then depending on stack frame layout subsequent calls to instrumented
functions may use regions of the stack with stale poison, resulting in
(spurious) KASAN splats to the console.
To avoid this, we must clear stale poison from the stack prior to
instrumented functions being called. This patch adds functions to the
KASAN core for removing poison from (portions of) a task's stack. These
will be used by subsequent patches to avoid problems with hotplug and
idle.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dan Williams [Wed, 9 Mar 2016 22:08:13 +0000 (14:08 -0800)]
mm: fix mixed zone detection in devm_memremap_pages
The check for whether we overlap "System RAM" needs to be done at
section granularity. For example a system with the following mapping:
100000000-
37bffffff : System RAM
37c000000-
837ffffff : Persistent Memory
...is unable to use devm_memremap_pages() as it would result in two
zones colliding within a given section.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dan Williams [Wed, 9 Mar 2016 22:08:10 +0000 (14:08 -0800)]
list: kill list_force_poison()
Given we have uninitialized list_heads being passed to list_add() it
will always be the case that those uninitialized values randomly trigger
the poison value. Especially since a list_add() operation will seed the
stack with the poison value for later stack allocations to trip over.
For example, see these two false positive reports:
list_add attempted on force-poisoned entry
WARNING: at lib/list_debug.c:34
[..]
NIP [
c00000000043c390] __list_add+0xb0/0x150
LR [
c00000000043c38c] __list_add+0xac/0x150
Call Trace:
__list_add+0xac/0x150 (unreliable)
__down+0x4c/0xf8
down+0x68/0x70
xfs_buf_lock+0x4c/0x150 [xfs]
list_add attempted on force-poisoned entry(
0000000000000500),
new->next ==
d0000000059ecdb0, new->prev ==
0000000000000500
WARNING: at lib/list_debug.c:33
[..]
NIP [
c00000000042db78] __list_add+0xa8/0x140
LR [
c00000000042db74] __list_add+0xa4/0x140
Call Trace:
__list_add+0xa4/0x140 (unreliable)
rwsem_down_read_failed+0x6c/0x1a0
down_read+0x58/0x60
xfs_log_commit_cil+0x7c/0x600 [xfs]
Fixes: commit
5c2c2587b132 ("mm, dax, pmem: introduce {get|put}_dev_pagemap() for dax-gup")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Eryu Guan <eguan@redhat.com>
Tested-by: Eryu Guan <eguan@redhat.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Wed, 9 Mar 2016 22:08:07 +0000 (14:08 -0800)]
mm: __delete_from_page_cache show Bad page if mapped
Commit
e1534ae95004 ("mm: differentiate page_mapped() from
page_mapcount() for compound pages") changed the famous
BUG_ON(page_mapped(page)) in __delete_from_page_cache() to
VM_BUG_ON_PAGE(page_mapped(page)): which gives us more info when
CONFIG_DEBUG_VM=y, but nothing at all when not.
Although it has not usually been very helpul, being hit long after the
error in question, we do need to know if it actually happens on users'
systems; but reinstating a crash there is likely to be opposed :)
In the non-debug case, pr_alert("BUG: Bad page cache") plus dump_page(),
dump_stack(), add_taint() - I don't really believe LOCKDEP_NOW_UNRELIABLE,
but that seems to be the standard procedure now. Move that, or the
VM_BUG_ON_PAGE(), up before the deletion from tree: so that the
unNULLified page->mapping gives a little more information.
If the inode is being evicted (rather than truncated), it won't have any
vmas left, so it's safe(ish) to assume that the raised mapcount is
erroneous, and we can discount it from page_count to avoid leaking the
page (I'm less worried by leaking the occasional 4kB, than losing a
potential 2MB page with each 4kB page leaked).
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Geoffrey Thomas [Wed, 9 Mar 2016 22:08:04 +0000 (14:08 -0800)]
mm/hugetlb: hugetlb_no_page: rate-limit warning message
The warning message "killed due to inadequate hugepage pool" simply
indicates that SIGBUS was sent, not that the process was forcibly killed.
If the process has a signal handler installed does not fix the problem,
this message can rapidly spam the kernel log.
On my amd64 dev machine that does not have hugepages configured, I can
reproduce the repeated warnings easily by setting vm.nr_hugepages=2 (i.e.,
4 megabytes of huge pages) and running something that sets a signal
handler and forks, like
#include <sys/mman.h>
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>
sig_atomic_t counter = 10;
void handler(int signal)
{
if (counter-- == 0)
exit(0);
}
int main(void)
{
int status;
char *addr = mmap(NULL, 4 *
1048576, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
if (addr == MAP_FAILED) {perror("mmap"); return 1;}
*addr = 'x';
switch (fork()) {
case -1:
perror("fork"); return 1;
case 0:
signal(SIGBUS, handler);
*addr = 'x';
break;
default:
*addr = 'x';
wait(&status);
if (WIFSIGNALED(status)) {
psignal(WTERMSIG(status), "child");
}
break;
}
}
Signed-off-by: Geoffrey Thomas <geofft@ldpreload.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lv Zheng [Wed, 2 Mar 2016 06:16:25 +0000 (14:16 +0800)]
ACPI / OSL: Add support to install tables via initrd
This patch adds support to install tables from initrd.
If a table in the initrd wasn't used by the override mechanism,
the table would be installed after initializing all RSDT/XSDT
tables.
Link: https://lkml.org/lkml/2014/2/28/368
Reported-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Lv Zheng [Wed, 2 Mar 2016 06:16:17 +0000 (14:16 +0800)]
ACPI / OSL: Clean up initrd table override code
This patch cleans up the initrd table override code by merging
redundant logics and re-ordering code blocks.
No functional changes.
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Harb Abdulhamid [Tue, 1 Mar 2016 19:31:45 +0000 (13:31 -0600)]
PNP / ACPI: add ACPI_RESOURCE_TYPE_SERIAL_BUS as a valid type
An error message is printed for resources of type 19, which is a valid
supported resource type. The Firmware Test Suite tool (fwts) reports
this as a test failure. This change fixes the false test failures
for ASL that use type 19 (ACPI_RESOURCE_TYPE_SERIAL_BUS) resources.
Signed-off-by: Harb Abdulhamid <harba@codeaurora.org>
Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Colin Ian King [Sun, 28 Feb 2016 20:31:49 +0000 (20:31 +0000)]
ACPI / util: remove redundant check if element is NULL
element is &package->package.elements[i] which can never be NULL
so the check to see if it is NULL is redundant and can be removed.
Detected with static analysis by CoverityScan
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Colin Ian King [Thu, 21 Jan 2016 17:05:47 +0000 (17:05 +0000)]
ACPI: Add acpi_force_32bit_fadt_addr option to force 32 bit FADT addresses
Some HP laptops seem to have invalid 64 bit FADT X_PM* addresses
which are causing various boot issues. In these cases, it would
be useful to force ACPI to use the valid legacy 32 bit equivalent
PM addresses. Add a acpi_force_32bit_fadt_addr to set the ACPICA
acpi_gbl_use32_bit_fadt_addresses to TRUE to force this override.
Link: https://bugs.launchpad.net/bugs/1529381
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>