git.stricted.de - GitHub/moto-9609/android_kernel_motorola

Merge branch 'pm-tools'

* pm-tools:
  tools/power turbostat: bugfix: TDP MSRs print bits fixing
  tools/power turbostat: correct output for MSR_NHM_SNB_PKG_CST_CFG_CTL dump
  tools/power turbostat: call __cpuid() instead of __get_cpuid()
  tools/power turbostat: indicate SMX and SGX support
  tools/power turbostat: detect and work around syscall jitter
  tools/power turbostat: show GFX%rc6
  tools/power turbostat: show GFXMHz
  tools/power turbostat: show IRQs per CPU
  tools/power turbostat: make fewer systems calls
  tools/power turbostat: fix compiler warnings
  tools/power turbostat: add --out option for saving output in a file
  tools/power turbostat: re-name "%Busy" field to "Busy%"
  tools/power turbostat: Intel Xeon x200: fix turbo-ratio decoding
  tools/power turbostat: Intel Xeon x200: fix erroneous bclk value
  tools/power turbostat: allow sub-sec intervals
  tools/power turbostat: Decode MSR_MISC_PWR_MGMT
  tools/power turbostat: decode HWP registers
  x86 msr-index: Simplify syntax for HWP fields
  tools/power turbostat: CPUID(0x16) leaf shows base, max, and bus frequency
  tools/power turbostat: decode more CPUID fields

Merge branches 'pm-cpuidle', 'pm-sleep' and 'pm-domains'

* pm-cpuidle:
  cpuidle: menu: help gcc generate slightly better code
  cpuidle: menu: avoid expensive square root computation

* pm-sleep:
  PM / suspend: replacing printk
  PM/freezer: y2038, use boottime to compare tstamps
  PM / sleep: declare __tracedata symbols as char[] rather than char

* pm-domains:
  PM / Domains: Fix potential NULL pointer dereference
  PM / Domains: Fix removal of a subdomain
  PM / Domains: Propagate start and restore errors during runtime resume
  PM / Domains: Join state name and index in debugfs output
  PM / Domains: Restore alignment of slaves in debugfs output
  PM / Domains: remove old power on/off latencies
  ARM: imx6: pm: declare pm domain latency on power_state struct
  PM / Domains: Support for multiple states

Merge branch 'pm-cpufreq'

* pm-cpufreq: (94 commits)
  intel_pstate: Do not skip samples partially
  intel_pstate: Remove freq calculation from intel_pstate_calc_busy()
  intel_pstate: Move intel_pstate_calc_busy() into get_target_pstate_use_performance()
  intel_pstate: Optimize calculation for max/min_perf_adj
  intel_pstate: Remove extra conversions in pid calculation
  cpufreq: Move scheduler-related code to the sched directory
  Revert "cpufreq: postfix policy directory with the first CPU in related_cpus"
  cpufreq: Reduce cpufreq_update_util() overhead a bit
  cpufreq: Select IRQ_WORK if CPU_FREQ_GOV_COMMON is set
  cpufreq: Remove 'policy->governor_enabled'
  cpufreq: Rename __cpufreq_governor() to cpufreq_governor()
  cpufreq: Relocate handle_update() to kill its declaration
  cpufreq: governor: Drop unnecessary checks from show() and store()
  cpufreq: governor: Fix race in dbs_update_util_handler()
  cpufreq: governor: Make gov_set_update_util() static
  cpufreq: governor: Narrow down the dbs_data_mutex coverage
  cpufreq: governor: Make dbs_data_mutex static
  cpufreq: governor: Relocate definitions of tuners structures
  cpufreq: governor: Move per-CPU data to the common code
  cpufreq: governor: Make governor private data per-policy
  ...

Merge branch 'pm-opp'

* pm-opp:
  PM / OPP: Rename structures for clarity
  PM / OPP: Fix incorrect comments
  PM / OPP: Initialize regulator pointer to an error value
  PM / OPP: Initialize u_volt_min/max to a valid value
  PM / OPP: Fix NULL pointer dereference crash when disabling OPPs
  PM / OPP: Add dev_pm_opp_set_rate()
  PM / OPP: Manage device clk
  PM / OPP: Parse clock-latency and voltage-tolerance for v1 bindings
  PM / OPP: Introduce dev_pm_opp_get_max_transition_latency()
  PM / OPP: Introduce dev_pm_opp_get_max_volt_latency()
  PM / OPP: Disable OPPs that aren't supported by the regulator
  PM / OPP: get/put regulators from OPP core

Merge branch 'powercap'

* powercap:
  powercap/rapl: track lead cpu per package
  powercap/rapl: add package reference per domain
  powercap/rapl: reduce ipi calls
  cpumask: export cpumask_any_but

Merge branch 'device-properties'

* device-properties:
device property: fix for a case of use-after-free

Merge branches 'acpi-ec', 'acpi-fan', 'acpi-video' and 'acpi-misc'

* acpi-ec:
  ACPI / EC: Deny write access unless requested by module param

* acpi-fan:
  ACPI / fan: Make struct dev_pm_ops const

* acpi-video:
  ACPI / video: remove unused device_decode array

* acpi-misc:
  ACPI / util: remove redundant check if element is NULL
  ACPI: Add acpi_force_32bit_fadt_addr option to force 32 bit FADT addresses
  drivers/acpi: make pmic/intel_pmic_crc.c explicitly non-modular
  drivers/acpi: make apei/ghes.c more explicitly non-modular
  drivers/acpi: make bgrt driver explicitly non-modular

Merge branches 'acpi-pci', 'acpi-soc' and 'pnp'

* acpi-pci:
  x86/ACPI/PCI: Recognize that Interrupt Line 255 means "not connected"

* acpi-soc:
  i2c: designware: Add device HID for future AMD I2C controller

* pnp:
  PNP / ACPI: add ACPI_RESOURCE_TYPE_SERIAL_BUS as a valid type

Merge branches 'acpi-processor' and 'acpi-cppc'

* acpi-processor:
  ACPI / sleep: move acpi_processor_sleep to sleep.c
  ACPI / processor : add support for ACPI0010 processor container
  ACPI / processor_idle: replace PREFIX with pr_fmt

* acpi-cppc:
  ACPI / CPPC: use MRTT/MPAR to decide if/when a req can be sent
  ACPI / CPPC: replace writeX/readX to PCC with relaxed version
  mailbox: pcc: optimized pcc_send_data
  ACPI / CPPC: optimized cpc_read and cpc_write
  ACPI / CPPC: Optimize PCC Read Write operations

Merge branches 'acpi-scan', 'acpi-osl' and 'acpi-apei'

* acpi-scan:
  ACPI / scan: AMBA bus probing support
  ACPI: introduce a function to find the first physical device

* acpi-osl:
  ACPI / OSL: Add support to install tables via initrd
  ACPI / OSL: Clean up initrd table override code

* acpi-apei:
  ACPI / APEI: ERST: Fixed leaked resources in erst_init
  ACPI / APEI: Fix leaked resources

Merge branch 'acpica'

* acpica:
  ACPICA / Interpreter: Fix a regression triggered because of wrong Linux ECDT support
  ACPICA: Utilities: Update trace mechinism for acquire_object
  ACPICA: Namespace: Rename acpi_gbl_reg_methods_enabled to acpi_gbl_namespace_initialized
  ACPICA: Namespace: Ensure \_SB._INI executed before any _REG
  ACPICA: ACPICA: Tune _REG evaluations order in the initialization steps
  ACPICA: Tables: make default region accessible during the table load
  ACPICA: ACPI 6.0/iASL: Add support for the External AML opcode
  ACPICA: Remove unnecessary arguments to ACPI_INFO
  ACPICA: debugger: dbconvert: free pld_info on error return path
  ACPICA: iASL: Update to use internal acpi_ut_strtoul64 function
  ACPICA: iASL: Fix some typos with the name strtoul64
  ACPICA: Remove incorrect "static" from a global structure
  ACPICA: aclocal: Put parens around some definitions.

Linux 4.5

Merge branch 'turbostat' of git://git./linux/kernel/git/lenb/linux into pm-tools

Pull turbostat updates for 4.6 from Len Brown.

* 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
  tools/power turbostat: bugfix: TDP MSRs print bits fixing
  tools/power turbostat: correct output for MSR_NHM_SNB_PKG_CST_CFG_CTL dump
  tools/power turbostat: call __cpuid() instead of __get_cpuid()
  tools/power turbostat: indicate SMX and SGX support
  tools/power turbostat: detect and work around syscall jitter
  tools/power turbostat: show GFX%rc6
  tools/power turbostat: show GFXMHz
  tools/power turbostat: show IRQs per CPU
  tools/power turbostat: make fewer systems calls
  tools/power turbostat: fix compiler warnings
  tools/power turbostat: add --out option for saving output in a file
  tools/power turbostat: re-name "%Busy" field to "Busy%"
  tools/power turbostat: Intel Xeon x200: fix turbo-ratio decoding
  tools/power turbostat: Intel Xeon x200: fix erroneous bclk value
  tools/power turbostat: allow sub-sec intervals
  tools/power turbostat: Decode MSR_MISC_PWR_MGMT
  tools/power turbostat: decode HWP registers
  x86 msr-index: Simplify syntax for HWP fields
  tools/power turbostat: CPUID(0x16) leaf shows base, max, and bus frequency
  tools/power turbostat: decode more CPUID fields

Merge branch 'upstream' of git://git.linux-mips.org/ralf/upstream-linus

Pull MIPS fixes from Ralf Baechle:
"Another round of MIPS fixes for 4.5:

   - Fix JZ4780 build with DEBUG_ZBOOT and MACH_JZ4780
   - Fix build with DEBUG_ZBOOT and MACH_JZ4780
   - Fix issue with uninitialised temp_foreign_map
   - Fix awk regex compile failure with certain versions of awk.  At
     this time, the sole user, ld-ifversion, is only used on MIPS"

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
  MIPS: smp.c: Fix uninitialised temp_foreign_map
  MIPS: Fix build error when SMP is used without GIC
  ld-version: Fix awk regex compile failure
  MIPS: Fix build with DEBUG_ZBOOT and MACH_JZ4780

MIPS: smp.c: Fix uninitialised temp_foreign_map

When calculate_cpu_foreign_map() recalculates the cpu_foreign_map
cpumask it uses the local variable temp_foreign_map without initialising
it to zero. Since the calculation only ever sets bits in this cpumask
any existing bits at that memory location will remain set and find their
way into cpu_foreign_map too. This could potentially lead to cache
operations suboptimally doing smp calls to multiple VPEs in the same
core, even though the VPEs share primary caches.

Therefore initialise temp_foreign_map using cpumask_clear() before use.

Fixes: cccf34e9411c ("MIPS: c-r4k: Fix cache flushing for MT cores")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/12759/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

MIPS: Fix build error when SMP is used without GIC

The MIPS_GIC_IPI should only be selected when MIPS_GIC is also
selected, otherwise it results in a compile error. smp-gic.c uses some
functions from include/linux/irqchip/mips-gic.h like
plat_ipi_call_int_xlate() which are only added to the header file when
MIPS_GIC is set. The Lantiq SoC does not use the GIC, but supports SMP.
The calls top the functions from smp-gic.c are already protected by
some #ifdefs

The first part of this was introduced in commit 72e20142b2bf ("MIPS:
Move GIC IPI functions out of smp-cmp.c")

Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: stable@vger.kernel.org # v3.15+
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/12774/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

ld-version: Fix awk regex compile failure

The ld-version.sh script fails on some versions of awk with the
following error, resulting in build failures for MIPS:

awk: scripts/ld-version.sh: line 4: regular expression compile failed (missing '(')

This is due to the regular expression ".*)", meant to strip off the
beginning of the ld version string up to the close bracket, however
brackets have a meaning in regular expressions, so lets escape it so
that awk doesn't expect a corresponding open bracket.

Fixes: ccbef1674a15 ("Kbuild, lto: add ld-version and ld-ifversion ...")
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Tested-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
Cc: Michal Marek <mmarek@suse.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: linux-mips@linux-mips.org
Cc: linux-kbuild@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org # 4.4.x-
Patchwork: https://patchwork.linux-mips.org/patch/12838/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

MIPS: Fix build with DEBUG_ZBOOT and MACH_JZ4780

Ingenic SoC declares ZBOOT support, but debug definitions are missing
for MACH_JZ4780 resulting in a build failure when DEBUG_ZBOOT is set.
The UART addresses are same as with JZ4740, so fix by covering JZ4780
with those as well.

Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/12830/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

tools/power turbostat: bugfix: TDP MSRs print bits fixing

MSR_CONFIG_TDP_NOMINAL:
should print all 8 bits of base_ratio (bit 0:7) 0xFF

MSR_CONFIG_TDP_LEVEL_1:
should print all 15 bits of PKG_MIN_PWR_LVL1 (bit 48:62) 0x7FFF
should print all 15 bits of PKG_MAX_PWR_LVL1 (bit 32:46) 0x7FFF
should print all 8 bits of LVL1_RATIO (bit 16:23) 0xFF
should print all 15 bits of PKG_TDP_LVL1 (bit 0:14) 0x7FFF

And the same modification to MSR_CONFIG_TDP_LEVEL_2.

MSR_TURBO_ACTIVATION_RATIO:
should print all 8 bits of MAX_NON_TURBO_RATIO (bit 0:7) 0xFF

Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

tools/power turbostat: correct output for MSR_NHM_SNB_PKG_CST_CFG_CTL dump

MSR_NHM_SNB_PKG_CST_CFG_CTL: 0x1e008008 (...pkg-cstate-limit=0: unlimited)
should print as
MSR_NHM_SNB_PKG_CST_CFG_CTL: 0x1e008008 (...pkg-cstate-limit=8: unlimited)

Signed-off-by: Len Brown <len.brown@intel.com>

tools/power turbostat: call __cpuid() instead of __get_cpuid()

turbostat already checks whether calling each cpuid leavf is legal,
and it doesn't look at the function return value,
so call the simpler gcc intrinsic __cpuid() instead of __get_cpuid().

syntax only, no functional change

Signed-off-by: Len Brown <len.brown@intel.com>

tools/power turbostat: indicate SMX and SGX support

SGX presence is related to a SKL power workaround,
so lets show when that is enabled.

Signed-off-by: Len Brown <len.brown@intel.com>

tools/power turbostat: detect and work around syscall jitter

The accuracy of Bzy_Mhz and Busy% depend on reading
the TSC, APERF, and MPERF close together in time.

When there is a very short measurement interval,
or a large system is profoundly idle, the changes
in APERF and MPERF may be very small.
They can be small enough that an expensive interrupt
between reading APERF and MPERF can cause the APERF/MPERF
ratio to become inaccurate, resulting in invalid
calculation and display of Bzy_MHz.

A dummy APERF read of APERF makes this problem
much more rare. Apparently this 1st systemn call
after exiting a long stretch of idle is when we
typically see expensive timer interrupts that cause
large jitter.

For the cases that dummy APERF read fails to prevent,
we compare the latency of the APERF and MPERF reads.
If they differ by more than 2x, we re-issue them.

Signed-off-by: Len Brown <len.brown@intel.com>

tools/power turbostat: show GFX%rc6

The column "GFX%c6" show the percentage of time the GPU
is in the "render C6" state, rc6. Deep package C-states on several
systems depend on the GPU being in RC6.

This information comes from the counter
/sys/class/drm/card0/power/rc6_residency_ms,
as read before and after the measurement interval.

Signed-off-by: Len Brown <len.brown@intel.com>

tools/power turbostat: show GFXMHz

Under the column "GFXMHz", show a snapshot of this attribute:
/sys/class/graphics/fb0/device/drm/card0/gt_cur_freq_mhz

This is an instantaneous snapshot of what sysfs presents
at the end of the measurement interval. turbostat does
not average or otherwise perform any math on this value.

Signed-off-by: Len Brown <len.brown@intel.com>

tools/power turbostat: show IRQs per CPU

The new IRQ column shows how many interrupts have occurred on each CPU
during the measurement inteval. This information comes from
the difference between /proc/interrupts shapshots made before
and after the measurement interval.

The first row, the system summary, shows the sum of the IRQS
for all CPUs during that interval.

Signed-off-by: Len Brown <len.brown@intel.com>

tools/power turbostat: make fewer systems calls

skip the open(2)/close(2) on each msr read
by keeping the /dev/cpu/*/msr files open.

The remaining read(2) is generally far fewer cycles
than the removed open(2) system call.

Signed-off-by: Len Brown <len.brown@intel.com>

tools/power turbostat: fix compiler warnings

Signed-off-by: Len Brown <len.brown@intel.com>

tools/power turbostat: add --out option for saving output in a file

By default...

Turbostat --debug gconfiguration info goes to stderr.

In FORK mode, turbostat statistics go to stderr.

In PERIODIC mode, turbostat statistics go to stdout.

These defaults do not change, but an option "--out file"
will send all output above only to the specified file.

Signed-off-by: Len Brown <len.brown@intel.com>

tools/power turbostat: re-name "%Busy" field to "Busy%"

some tools processing turbostat output
have difficulty with items that begin with %...

Reported-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

tools/power turbostat: Intel Xeon x200: fix turbo-ratio decoding

Following changes have been made:
- changed MSR_NHM_TURBO_RATIO_LIMIT to MSR_TURBO_RATIO_LIMIT in debug print
  for consistency with Developer Manual
- updated definition of bitfields in MSR_TURBO_RATIO_LIMIT and appropriate
  parsing code
- added x200 to list of architectures that do not support Nahlem compatible
  definition of MSR_TURBO_RATIO_LIMIT register (x200 has the register but
  bits definition is custom)
- fixed typo in code that parses MSR_TURBO_RATIO_LIMIT
  (logical instead of bitwise operator)
- changed MSR_TURBO_RATIO_LIMIT parsing algorithm so the print out had the
  same order as implementations for other platforms

Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

tools/power turbostat: Intel Xeon x200: fix erroneous bclk value

x200 does not enable any way to programmatically obtain bus clock
speed. Bclk for the architecture has a fixed value of 100 MHz.
At the same time x200 cannot be included in has_snb_msrs since
it does not support C7 idle state.

prior to this patch, MHz values reported on this chip
were erroneously calculated using bclk of 133MHz,
causing MHz values to be reported 33% higher than actual.

Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

tools/power turbostat: allow sub-sec intervals

turbostat -i interval_sec

will sample and display statistics every interval_sec.
interval_sec used to be a whole number of seconds,
but now we accept a decimal, as small as 0.001 sec (1 ms).

Signed-off-by: Len Brown <len.brown@intel.com>

Merge branch 'for-linus' of git://git.kernel.dk/linux-block

Pull block merge fix from Jens Axboe.

This fixes the block segment counting bug and resulting sg overrun
reported by Kent Overstreet, introduced with the last block pull.

* 'for-linus' of git://git.kernel.dk/linux-block:
block: don't optimize for non-cloned bio in bio_get_last_bvec()

Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:
"This fixes 3 FPU handling related bugs, an EFI boot crash and a
  runtime warning.

  The EFI fix arrived late but I didn't want to delay it to after v4.5
  because the effects are pretty bad for the systems that are affected
  by it"

[ Actually, I don't think the EFI fix really matters yet, because we
  haven't switched to the separate EFI page tables in mainline yet ]

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/efi: Fix boot crash by always mapping boot service regions into new EFI page tables
  x86/fpu: Fix eager-FPU handling on legacy FPU machines
  x86/delay: Avoid preemptible context checks in delay_mwaitx()
  x86/fpu: Revert ("x86/fpu: Disable AVX when eagerfpu is off")
  x86/fpu: Fix 'no387' regression

Merge git://git./linux/kernel/git/nab/target-pending

Pull SCSI target bug fix from Nicholas Bellinger:
"Here is an outstanding target-core bug-fix for v4.5 code."

  This patch addresses a recent Task Management (TMR) regression related
  to larger set of multi-port LUN_RESET bug-fixes in v4.5-rc5.

  It drops a left-over target_put_sess_cmd() of se_cmd->cmd_kref within
  ABORT_TASK failure path, once a se_cmd descriptor has already
  completed posting response to fabric driver logic, and must be skipped
  during normal ABORT_TASK se_cmd->tag lookup"

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  target: Drop incorrect ABORT_TASK put for completed commands

block: don't optimize for non-cloned bio in bio_get_last_bvec()

For !BIO_CLONED bio, we can use .bi_vcnt safely, but it
doesn't mean we can just simply return .bi_io_vec[.bi_vcnt - 1]
because the start postion may have been moved in the middle of
the bvec, such as splitting in the middle of bvec.

Fixes: 7bcd79ac50d9(block: bio: introduce helpers to get the 1st and last bvec)
Cc: stable@vger.kernel.org
Reported-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

x86/efi: Fix boot crash by always mapping boot service regions into new EFI page tables

Some machines have EFI regions in page zero (physical address
0x00000000) and historically that region has been added to the e820
map via trim_bios_range(), and ultimately mapped into the kernel page
tables. It was not mapped via efi_map_regions() as one would expect.

Alexis reports that with the new separate EFI page tables some boot
services regions, such as page zero, are not mapped. This triggers an
oops during the SetVirtualAddressMap() runtime call.

For the EFI boot services quirk on x86 we need to memblock_reserve()
boot services regions until after SetVirtualAddressMap(). Doing that
while respecting the ownership of regions that may have already been
reserved by the kernel was the motivation behind this commit:

  7d68dc3f1003 ("x86, efi: Do not reserve boot services regions within reserved areas")

That patch was merged at a time when the EFI runtime virtual mappings
were inserted into the kernel page tables as described above, and the
trick of setting ->numpages (and hence the region size) to zero to
track regions that should not be freed in efi_free_boot_services()
meant that we never mapped those regions in efi_map_regions(). Instead
we were relying solely on the existing kernel mappings.

Now that we have separate page tables we need to make sure the EFI
boot services regions are mapped correctly, even if someone else has
already called memblock_reserve(). Instead of stashing a tag in
->numpages, set the EFI_MEMORY_RUNTIME bit of ->attribute. Since it
generally makes no sense to mark a boot services region as required at
runtime, it's pretty much guaranteed the firmware will not have
already set this bit.

For the record, the specific circumstances under which Alexis
triggered this bug was that an EFI runtime driver on his machine was
responding to the EVT_SIGNAL_VIRTUAL_ADDRESS_CHANGE event during
SetVirtualAddressMap().

The event handler for this driver looks like this,

  sub rsp,0x28
  lea rdx,[rip+0x2445] # 0xaa948720
  mov ecx,0x4
  call func_aa9447c0  ; call to ConvertPointer(4, & 0xaa948720)
  mov r11,QWORD PTR [rip+0x2434] # 0xaa948720
  xor eax,eax
  mov BYTE PTR [r11+0x1],0x1
  add rsp,0x28
  ret

Which is pretty typical code for an EVT_SIGNAL_VIRTUAL_ADDRESS_CHANGE
handler. The "mov r11, QWORD PTR [rip+0x2424]" was the faulting
instruction because ConvertPointer() was being called to convert the
address 0x0000000000000000, which when converted is left unchanged and
remains 0x0000000000000000.

The output of the oops trace gave the impression of a standard NULL
pointer dereference bug, but because we're accessing physical
addresses during ConvertPointer(), it wasn't. EFI boot services code
is stored at that address on Alexis' machine.

Reported-by: Alexis Murzeau <amurzeau@gmail.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Raphael Hertzog <hertzog@debian.org>
Cc: Roger Shimizu <rogershimizu@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1457695163-29632-2-git-send-email-matt@codeblueprint.co.uk
Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=815125
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/fpu: Fix eager-FPU handling on legacy FPU machines

i486 derived cores like Intel Quark support only the very old,
legacy x87 FPU (FSAVE/FRSTOR, CPUID bit FXSR is not set), and
our FPU code wasn't handling the saving and restoring there
properly in the 'eagerfpu' case.

So after we made eagerfpu the default for all CPU types:

58122bf1d856 x86/fpu: Default eagerfpu=on on all CPUs

these old FPU designs broke. First, Andy Shevchenko reported a splat:

WARNING: CPU: 0 PID: 823 at arch/x86/include/asm/fpu/internal.h:163 fpu__clear+0x8c/0x160

which was us trying to execute FXRSTOR on those machines even though
they don't support it.

After taking care of that, Bryan O'Donoghue reported that a simple FPU
test still failed because we weren't initializing the FPU state properly
on those machines.

Take care of all that.

Reported-and-tested-by: Bryan O'Donoghue <pure.logic@nexus-software.ie>
Reported-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20160311113206.GD4312@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge tag 'for-linus-20160311' of git://git.infradead.org/linux-mtd

Pull MTD fixes from Brian Norris:
"Late MTD fix for v4.5:

   - A simple error code handling fix for the NAND ECC test; this was a
     regression in v4.5-rc1

   - A MAINTAINERS update, which might as well go in ASAP"

* tag 'for-linus-20160311' of git://git.infradead.org/linux-mtd:
  MAINTAINERS: add a maintainer for the NAND subsystem
  mtd: nand: tests: fix regression introduced in mtd_nandectest

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm/i915 fixes from Dave Airlie:
"Just two i915 regression fixes, that should be it from me"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/i915: Actually retry with bit-banging after GMBUS timeout
drm/i915: Fix bogus dig_port_map[] assignment for pre-HSW

mm/mempool: avoid KASAN marking mempool poison checks as use-after-free

When removing an element from the mempool, mark it as unpoisoned in KASAN
before verifying its contents for SLUB/SLAB debugging. Otherwise KASAN
will flag the reads checking the element use-after-free writes as
use-after-free reads.

Signed-off-by: Matthew Dawson <matthew@mjdsystems.ca>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'armsoc-fixes' of git://git./linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
"Two more fixes for 4.5:

   - One is a fix for OMAP that is urgently needed to avoid DRA7xx chips
     from premature aging, by always keeping the Ethernet clock enabled.

   - The other solves a I/O memory layout issue on Armada, where SROM
     and PCI memory windows were conflicting in some configurations"

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: mvebu: fix overlap of Crypto SRAM with PCIe memory window
  ARM: dts: dra7: do not gate cpsw clock due to errata i877
  ARM: OMAP2+: hwmod: Introduce ti,no-idle dt property

Merge tag 'media/v4.5-5' of git://git./linux/kernel/git/mchehab/linux-media

Pull media fix from Mauro Carvalho Chehab:
"One last time fix: It adds a code that prevents some media tools like
  media-ctl to hide some entities that have their IDs out of the range
  expected by those apps"

* tag 'media/v4.5-5' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  [media] media-device: map new functions into old types for legacy API

ARM: mvebu: fix overlap of Crypto SRAM with PCIe memory window

When the Crypto SRAM mappings were added to the Device Tree files
describing the Armada XP boards in commit c466d997bb16 ("ARM: mvebu:
define crypto SRAM ranges for all armada-xp boards"), the fact that
those mappings were overlaping with the PCIe memory aperture was
overlooked. Due to this, we currently have for all Armada XP platforms
a situation that looks like this:

Memory mapping on Armada XP boards with internal registers at
0xf1000000:

- 0x00000000 -> 0xf0000000 3.75G RAM
- 0xf0000000 -> 0xf1000000 16M NOR flashes (AXP GP / AXP DB)
- 0xf1000000 -> 0xf1100000 1M internal registers
- 0xf8000000 -> 0xffe0000 126M PCIe memory aperture
- 0xf8100000 -> 0xf8110000 64KB Crypto SRAM #0 => OVERLAPS WITH PCIE !
- 0xf8110000 -> 0xf8120000 64KB Crypto SRAM #1 => OVERLAPS WITH PCIE !
- 0xffe00000 -> 0xfff00000 1M PCIe I/O aperture
- 0xfff0000  -> 0xffffffff 1M BootROM

The overlap means that when PCIe devices are added, depending on their
memory window needs, they might or might not be mapped into the
physical address space. Indeed, they will not be mapped if the area
allocated in the PCIe memory aperture by the PCI core overlaps with
one of the Crypto SRAM. Typically, a Intel IGB PCIe NIC that needs 8MB
of PCIe memory will see its PCIe memory window allocated from
0xf80000000 for 8MB, which overlaps with the Crypto SRAM windows. Due
to this, the PCIe window is not created, and any attempt to access the
PCIe window makes the kernel explode:

[    3.302213] igb: Copyright (c) 2007-2014 Intel Corporation.
[    3.307841] pci 0000:00:09.0: enabling device (0140 -> 0143)
[    3.313539] mvebu_mbus: cannot add window '4:f8', conflicts with another window
[    3.320870] mvebu-pcie soc:pcie-controller: Could not create MBus window at [mem 0xf8000000-0xf87fffff]: -22
[    3.330811] Unhandled fault: external abort on non-linefetch (0x1008) at 0xf08c0018

This problem does not occur on Armada 370 boards, because we use the
following memory mapping (for boards that have internal registers at
0xf1000000):

- 0x00000000 -> 0xf0000000 3.75G RAM
- 0xf0000000 -> 0xf1000000 16M NOR flashes (AXP GP / AXP DB)
- 0xf1000000 -> 0xf1100000 1M internal registers
- 0xf1100000 -> 0xf1110000 64KB Crypto SRAM #0 => OK !
- 0xf8000000 -> 0xffe0000 126M PCIe memory
- 0xffe00000 -> 0xfff00000 1M PCIe I/O
- 0xfff0000  -> 0xffffffff 1M BootROM

Obviously, the solution is to align the location of the Crypto SRAM
mappings of Armada XP to be similar with the ones on Armada 370, i.e
have them between the "internal registers" area and the beginning of
the PCIe aperture.

However, we have a special case with the OpenBlocks AX3-4 platform,
which has a 128 MB NOR flash. Currently, this NOR flash is mapped from
0xf0000000 to 0xf8000000. This is possible because on OpenBlocks
AX3-4, the internal registers are not at 0xf1000000. And this explains
why the Crypto SRAM mappings were not configured at the same place on
Armada XP.

Hence, the solution is two-fold:

(1) Move the NOR flash mapping on Armada XP OpenBlocks AX3-4 from
     0xe8000000 to 0xf0000000. This frees the 0xf0000000 ->
     0xf80000000 space.

(2) Move the Crypto SRAM mappings on Armada XP to be similar to
     Armada 370 (except of course that Armada XP has two Crypto SRAM
     and not one).

After this patch, the memory mapping on Armada XP boards with
registers at 0xf1 is:

- 0x00000000 -> 0xf0000000 3.75G RAM
- 0xf0000000 -> 0xf1000000 16M NOR flashes (AXP GP / AXP DB)
- 0xf1000000 -> 0xf1100000 1M internal registers
- 0xf1100000 -> 0xf1110000 64KB Crypto SRAM #0
- 0xf1110000 -> 0xf1120000 64KB Crypto SRAM #1
- 0xf8000000 -> 0xffe0000 126M PCIe memory
- 0xffe00000 -> 0xfff00000 1M PCIe I/O
- 0xfff0000  -> 0xffffffff 1M BootROM

And the memory mapping for the special case of the OpenBlocks AX3-4
(internal registers at 0xd0000000, NOR of 128 MB):

- 0x00000000 -> 0xc0000000 3G RAM
- 0xd0000000 -> 0xd1000000 1M internal registers
- 0xe800000  -> 0xf0000000 128M NOR flash
- 0xf1100000 -> 0xf1110000 64KB Crypto SRAM #0
- 0xf1110000 -> 0xf1120000 64KB Crypto SRAM #1
- 0xf8000000 -> 0xffe0000 126M PCIe memory
- 0xffe00000 -> 0xfff00000 1M PCIe I/O
- 0xfff0000  -> 0xffffffff 1M BootROM

Fixes: c466d997bb16 ("ARM: mvebu: define crypto SRAM ranges for all armada-xp boards")
Reported-by: Phil Sutter <phil@nwl.cc>
Cc: Phil Sutter <phil@nwl.cc>
Cc: <stable@vger.kernel.org>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Olof Johansson <olof@lixom.net>

Merge tag 'dmaengine-fix-4.5' of git://git.infradead.org/users/vkoul/slave-dma

Pull dmaengine fixes from Vinod Koul:
"Two fixes showed up in last few days, and they should be included in
  4.5.  Summary:

  Two more late fixes to drivers, nothing major here:

   - A memory leak fix in fsdma unmap the dma descriptors on freeup

   - A fix in xdmac driver for residue calculation of dma descriptor"

* tag 'dmaengine-fix-4.5' of git://git.infradead.org/users/vkoul/slave-dma:
  dmaengine: at_xdmac: fix residue computation
  dmaengine: fsldma: fix memory leak

Merge tag 'pm+acpi-4.5-final' of git://git./linux/kernel/git/rafael/linux-pm

Pull power management and ACPI fixes from Rafael Wysocki:
"Two more fixes for issues introduced recently, one in the generic
  device properties framework and one in ACPICA.

  Specifics:

   - Revert a recent ACPICA commit that has been reverted upstream,
     because it caused problems to happen on user systems and the
     problem it attempted to address will not be relevant any more after
     upcoming ACPI specification changes (Bob Moore).

   - Fix crash in the generic device properties framework introduced by
     a recent change that forgot to check pointers against error values
     in addition to checking them against NULL (Heikki Krogerus)"

* tag 'pm+acpi-4.5-final' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  device property: fwnode->secondary may contain ERR_PTR(-ENODEV)
  ACPICA: Revert "Parser: Fix for SuperName method invocation"

Merge tag 'xfs-for-linus-4.5-rc7' of git://git./linux/kernel/git/dgc/linux-xfs

Pull xfs fixes from Dave Chinner:
"This is a fix for a regression introduced in 4.5-rc1 by the new torn
  log write detection code.  The regression only affects people moving a
  clean filesystem between machines/kernels of different architecture
  (such as changing between 32 bit and 64 bit kernels), but this is the
  recommended (and only!) safe way to migrate a filesystem between
  architectures so we really need to ensure it works.

  The changes are larger than I'd prefer right at the end of the release
  cycle, but the majority of the change is just factoring code to enable
  the detection of a clean log at the correct time to avoid this issue.

  Changes:

   - Only perform torn log write detection on dirty logs.  This prevents
     failures being detected due to a clean filesystem being moved
     between machines or kernels of different architectures (e.g.  32 ->
     64 bit, BE -> LE, etc).  This fixes a regression introduced by the
     torn log write detection in 4.5-rc1"

* tag 'xfs-for-linus-4.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
  xfs: only run torn log write detection on dirty logs
  xfs: refactor in-core log state update to helper
  xfs: refactor unmount record detection into helper
  xfs: separate log head record discovery from verification

Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs

Pull vfs fixes from Al Viro:
"A couple of fixes: Fix for my dumb braino in ncpfs and a long-standing
  breakage on recovery from failed rename() in jffs2"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  jffs2: reduce the breakage on recovery from halfway failed rename()
  ncpfs: fix a braino in OOM handling in ncp_fill_cache()

Merge branches 'device-properties-fixes' and 'acpica-fixes'

* device-properties-fixes:
device property: fwnode->secondary may contain ERR_PTR(-ENODEV)

* acpica-fixes:
ACPICA: Revert "Parser: Fix for SuperName method invocation"

drm/i915: Actually retry with bit-banging after GMBUS timeout

After the GMBUS transfer times out, we set force_bit=1 and
return -EAGAIN expecting the i2c core to call the .master_xfer
hook again so that we will retry the same transfer via bit-banging.
This is in case the gmbus hardware is somehow faulty.

Unfortunately we left adapter->retries to 0, meaning the i2c core
didn't actually do the retry. Let's tell the core we want one retry
when we return -EAGAIN.

Note that i2c-algo-bit also uses this retry count for some internal
retries, so we'll end up increasing those a bit as well.

Cc: Jani Nikula <jani.nikula@intel.com>
Cc: drm-intel-fixes@lists.freedesktop.org
Fixes: bffce907d640 ("drm/i915: abstract i2c bit banging fallback in gmbus xfer")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1457366220-29409-2-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
(cherry picked from commit 8b1f165a4a8f64c28cf42d10e1f4d3b451dedc51)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>

ACPI / APEI: ERST: Fixed leaked resources in erst_init

erst_init currently leaks resources allocated from its call to
apei_resources_init(). The data allocated there gets copied
into apei_resources_all and can be freed when we're done with it.

Signed-off-by: Josh Hunt <johunt@akamai.com>
Reviewed-by: Chen, Gong <gong.chen@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

ACPI / APEI: Fix leaked resources

We leak the NVS and arch resources (if used), in apei_resources_request.
They are allocated to make sure we exclude them from the APEI resources,
but they are never freed at the end of the function. Free them now.

Signed-off-by: Josh Hunt <johunt@akamai.com>
Reviewed-by: Chen, Gong <gong.chen@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

intel_pstate: Do not skip samples partially

If the current value of MPERF or the current value of TSC is the
same as the previous one, respectively, intel_pstate_sample() bails
out early and skips the sample.

However, intel_pstate_adjust_busy_pstate() is still called in that
case which is not correct, so modify intel_pstate_sample() to
return a bool value indicating whether or not the sample has been
taken and use it to decide whether or not to call
intel_pstate_adjust_busy_pstate().

While at it, remove redundant parentheses from the MPERF/TSC
check in intel_pstate_sample().

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>

intel_pstate: Remove freq calculation from intel_pstate_calc_busy()

Use a helper function to compute the average pstate and call it only
where it is needed (only when tracing or in intel_pstate_get).

Signed-off-by: Philippe Longepe <philippe.longepe@linux.intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

intel_pstate: Move intel_pstate_calc_busy() into get_target_pstate_use_performance()

The cpu_load algorithm doesn't need to invoke intel_pstate_calc_busy(),
so move that call from intel_pstate_sample() to
get_target_pstate_use_performance().

Signed-off-by: Philippe Longepe <philippe.longepe@linux.intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

intel_pstate: Optimize calculation for max/min_perf_adj

mul_fp(int_tofp(A), B) expands to:
((A << FRAC_BITS) * B) >> FRAC_BITS, so the same result can be obtained
via simple multiplication A * B. Apply this observation to
max_perf * limits->max_perf and max_perf * limits->min_perf in
intel_pstate_get_min_max()."

Signed-off-by: Philippe Longepe <philippe.longepe@linux.intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

intel_pstate: Remove extra conversions in pid calculation

pid->setpoint and pid->deadband can be initialized in fixed point, so we
can avoid the int_tofp in pid_calc.

Signed-off-by: Philippe Longepe <philippe.longepe@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

i2c: designware: Add device HID for future AMD I2C controller

Add device HID AMDI0010 to match the AMD ACPI Vendor ID (AMDI) that
was registered in http://www.uefi.org/acpi_id_list, and the I2C
controller on future AMD paltform will use the HID instead of AMD0010.

Signed-off-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

device property: fix for a case of use-after-free

In device_remove_property_set(), the secondary fwnode needs
to be cleared before the pset is freed. This fixes a
use-after-free when a property set is providing the primary
fwnode.

As a result of the fix, the primary fwnode may end up
containing ERR_PTR(-ENODEV), so also adding checks for it to
the property handling code.

Reported-by: John Youn <John.Youn@synopsys.com>
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

ACPICA / Interpreter: Fix a regression triggered because of wrong Linux ECDT support

It is reported that the following commit triggers regressions:
Linux commit: efaed9be998b5ae0afb7458e057e5f4402b43fa0
ACPICA commit: 31178590dde82368fdb0f6b0e466b6c0add96c57
Subject: ACPICA: Events: Enhance acpi_ev_execute_reg_method() to
ensure no _REG evaluations can happen during OS early boot
stages

This is because that the ECDT support is not corrected in Linux, and Linux
requires to execute _REG for ECDT (though this sounds so wrong), we need to
ensure acpi_gbl_namespace_initialized is set before ECDT probing in order
for _REG to be executed. Since we have to move
"acpi_gbl_namespace_initialized = TRUE" to the initialization step
happening before ECDT probing, acpi_load_tables() is the best candidate for
now. Thus this patch fixes the regression by doing so.

But if the ECDT support is fixed, Linux will not execute _REG for ECDT, and
ECDT probing will happen before acpi_load_tables(). At that time, we still
want to ensure acpi_gbl_namespace_initialized is set after executing
acpi_ns_initialize_objects() (under the condition of
acpi_gbl_group_module_level_code = FALSE), this patch also moves
acpi_ns_initialize_objects() to acpi_load_tables() accordingly.

Since acpi_ns_initialize_objects() doesn't seem to be skippable, this
patch also removes ACPI_NO_OBJECT_INIT for the one invoked in
acpi_load_tables(). And since the default region handlers should always be
installed before loading the tables, this patch also removes useless
acpi_gbl_group_module_level_code check accordingly. Reported by Chris
Bainbridge, Fixed by Lv Zheng.

Fixes: efaed9be998b (ACPICA: Events: Enhance acpi_ev_execute_reg_method() to ensure no _REG evaluations can happen during OS early boot stages)
Reported-and-tested-by: Chris Bainbridge <chris.bainbridge@gmail.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Merge branch 'pm-cpufreq-governor' into pm-cpufreq

cpufreq: Move scheduler-related code to the sched directory

Create cpufreq.c under kernel/sched/ and move the cpufreq code
related to the scheduler to that file and to sched.h.

Redefine cpufreq_update_util() as a static inline function to avoid
function calls at its call sites in the scheduler code (as suggested
by Peter Zijlstra).

Also move the definition of struct update_util_data and declaration
of cpufreq_set_update_util_data() from include/linux/cpufreq.h to
include/linux/sched.h.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

MAINTAINERS: add a maintainer for the NAND subsystem

Add myself as the maintainer of the NAND subsystem.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Acked-by: Richard Weinberger <richard@nod.at>
Acked-by: Brian Norris <computersforpeace@gmail.com>
Acked-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>

Merge tag 'for-linus' of git://git./virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
"A few simple fixes for ARM, x86, PPC and generic code.

  The x86 MMU fix is a bit larger because the surrounding code needed a
  cleanup, but nothing worrisome"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: MMU: fix reserved bit check for ept=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0
  KVM: MMU: fix ept=0/pte.u=1/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo
  kvm: cap halt polling at exactly halt_poll_ns
  KVM: s390: correct fprs on SIGP (STOP AND) STORE STATUS
  KVM: VMX: disable PEBS before a guest entry
  KVM: PPC: Book3S HV: Sanitize special-purpose register values on guest exit

Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux

Pull arm64 fixes from Will Deacon:
"I thought we were done for 4.5, but then the 64k-page chaps came
  crawling out of the woodwork.  *sigh*

  The vmemmap fix I sent for -rc7 caused a regression with 64k pages and
  sparsemem and at some point during the release cycle the new hugetlb
  code using contiguous ptes started failing the libhugetlbfs tests with
  64k pages enabled.

  So here are a couple of patches that fix the vmemmap alignment and
  disable the new hugetlb page sizes whilst a proper fix is being
  developed:

   - Temporarily disable huge pages built using contiguous ptes

   - Ensure vmemmap region is sufficiently aligned for sparsemem
     sections"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: hugetlb: partial revert of 66b3923a1a0f
  arm64: account for sparsemem section alignment when choosing vmemmap offset

Merge branch 'for-linus' of git://git./linux/kernel/git/s390/linux

Pull s390 fixes from Martin Schwidefsky:
"Three bug fixes:
   - The fix for the page table corruption (CVE-2016-2143)
   - The diagnose statistics introduced a regression for the dasd diag
     driver
   - Boot crash on systems without the set-program-parameters facility"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/mm: four page table levels vs. fork
  s390/cpumf: Fix lpp detection
  s390/dasd: fix diag 0x250 inline assembly

[media] media-device: map new functions into old types for legacy API

The legacy media controller userspace API exposes entity types that
carry both type and function information. The new API replaces the type
with a function. It preserves backward compatibility by defining legacy
functions for the existing types and using them in drivers.

This works fine, as long as newer entity functions won't be added.

Unfortunately, some tools, like media-ctl with --print-dot argument
rely on the now legacy MEDIA_ENT_T_V4L2_SUBDEV and MEDIA_ENT_T_DEVNODE
numeric ranges to identify what entities will be shown.

Also, if the entity doesn't match those ranges, it will ignore the
major/minor information on devnodes, and won't be getting the devnode
name via udev or sysfs.

As we're now adding devices outside the old range, the legacy ioctl
needs to map the new entity functions into a type at the old range,
or otherwise we'll have a regression.

Detected on all released media-ctl versions (e. g. versions <= 1.10).

Fix this by deriving the type from the function to emulate the legacy
API if the function isn't in the legacy functions range.

Reported-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>

dmaengine: at_xdmac: fix residue computation

When computing the residue we need two pieces of information: the current
descriptor and the remaining data of the current descriptor. To get
that information, we need to read consecutively two registers but we
can't do it in an atomic way. For that reason, we have to check manually
that current descriptor has not changed.

Signed-off-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Suggested-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Reported-by: David Engraf <david.engraf@sysgo.com>
Tested-by: David Engraf <david.engraf@sysgo.com>
Fixes: e1f7c9eee707 ("dmaengine: at_xdmac: creation of the atmel
eXtended DMA Controller driver")
Cc: stable@vger.kernel.org #4.1 and later
Signed-off-by: Vinod Koul <vinod.koul@intel.com>

x86/delay: Avoid preemptible context checks in delay_mwaitx()

We do use this_cpu_ptr(&cpu_tss) as a cacheline-aligned, seldomly
accessed per-cpu var as the MONITORX target in delay_mwaitx(). However,
when called in preemptible context, this_cpu_ptr -> smp_processor_id() ->
debug_smp_processor_id() fires:

BUG: using smp_processor_id() in preemptible [00000000] code: udevd/312
caller is delay_mwaitx+0x40/0xa0

But we don't care about that check - we only need cpu_tss as a MONITORX
target and it doesn't really matter which CPU's var we're touching as
we're going idle anyway. Fix that.

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: spg_linux_kernel@amd.com
Link: http://lkml.kernel.org/r/20160309205622.GG6564@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>

KVM: MMU: fix reserved bit check for ept=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0

KVM has special logic to handle pages with pte.u=1 and pte.w=0 when
CR0.WP=1. These pages' SPTEs flip continuously between two states:
U=1/W=0 (user and supervisor reads allowed, supervisor writes not allowed)
and U=0/W=1 (supervisor reads and writes allowed, user writes not allowed).

When SMEP is in effect, however, U=0 will enable kernel execution of
this page. To avoid this, KVM also sets NX=1 in the shadow PTE together
with U=0, making the two states U=1/W=0/NX=gpte.NX and U=0/W=1/NX=1.
When guest EFER has the NX bit cleared, the reserved bit check thinks
that the latter state is invalid; teach it that the smep_andnot_wp case
will also use the NX bit of SPTEs.

Cc: stable@vger.kernel.org
Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.inel.com>
Fixes: c258b62b264fdc469b6d3610a907708068145e3b
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: MMU: fix ept=0/pte.u=1/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo

Yes, all of these are needed. :) This is admittedly a bit odd, but
kvm-unit-tests access.flat tests this if you run it with "-cpu host"
and of course ept=0.

KVM runs the guest with CR0.WP=1, so it must handle supervisor writes
specially when pte.u=1/pte.w=0/CR0.WP=0.  Such writes cause a fault
when U=1 and W=0 in the SPTE, but they must succeed because CR0.WP=0.
When KVM gets the fault, it sets U=0 and W=1 in the shadow PTE and
restarts execution.  This will still cause a user write to fault, while
supervisor writes will succeed.  User reads will fault spuriously now,
and KVM will then flip U and W again in the SPTE (U=1, W=0).  User reads
will be enabled and supervisor writes disabled, going back to the
originary situation where supervisor writes fault spuriously.

When SMEP is in effect, however, U=0 will enable kernel execution of
this page.  To avoid this, KVM also sets NX=1 in the shadow PTE together
with U=0.  If the guest has not enabled NX, the result is a continuous
stream of page faults due to the NX bit being reserved.

The fix is to force EFER.NX=1 even if the CPU is taking care of the EFER
switch.  (All machines with SMEP have the CPU_LOAD_IA32_EFER vm-entry
control, so they do not use user-return notifiers for EFER---if they did,
EFER.NX would be forced to the same value as the host).

There is another bug in the reserved bit check, which I've split to a
separate patch for easier application to stable kernels.

Cc: stable@vger.kernel.org
Cc: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Fixes: f6577a5fa15d82217ca73c74cd2dcbc0f6c781dd
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

x86/fpu: Revert ("x86/fpu: Disable AVX when eagerfpu is off")

Leonid Shatz noticed that the SDM interpretation of the following
recent commit:

  394db20ca240741 ("x86/fpu: Disable AVX when eagerfpu is off")

... is incorrect and that the original behavior of the FPU code was correct.

Because AVX is not stated in CR0 TS bit description, it was mistakenly
believed to be not supported for lazy context switch. This turns out
to be false:

  Intel Software Developer's Manual Vol. 3A, Sec. 2.5 Control Registers:

   'TS Task Switched bit (bit 3 of CR0) -- Allows the saving of the x87 FPU/
    MMX/SSE/SSE2/SSE3/SSSE3/SSE4 context on a task switch to be delayed until
    an x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 instruction is actually executed
    by the new task.'

  Intel Software Developer's Manual Vol. 2A, Sec. 2.4 Instruction Exception
  Specification:

   'AVX instructions refer to exceptions by classes that include #NM
    "Device Not Available" exception for lazy context switch.'

So revert the commit.

Reported-by: Leonid Shatz <leonid.shatz@ravellosystems.com>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1457569734-3785-1-git-send-email-yu-cheng.yu@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

s390/mm: four page table levels vs. fork

The fork of a process with four page table levels is broken since
git commit 6252d702c5311ce9 "[S390] dynamic page tables."

All new mm contexts are created with three page table levels and
an asce limit of 4TB. If the parent has four levels dup_mmap will
add vmas to the new context which are outside of the asce limit.
The subsequent call to copy_page_range will walk the three level
page table structure of the new process with non-zero pgd and pud
indexes. This leads to memory clobbers as the pgd_index *and* the
pud_index is added to the mm->pgd pointer without a pgd_deref
in between.

The init_new_context() function is selecting the number of page
table levels for a new context. The function is used by mm_init()
which in turn is called by dup_mm() and mm_alloc(). These two are
used by fork() and exec(). The init_new_context() function can
distinguish the two cases by looking at mm->context.asce_limit,
for fork() the mm struct has been copied and the number of page
table levels may not change. For exec() the mm_alloc() function
set the new mm structure to zero, in this case a three-level page
table is created as the temporary stack space is located at
STACK_TOP_MAX = 4TB.

This fixes CVE-2016-2143.

Reported-by: Marcin Kościelnicki <koriakin@0x04.net>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

Merge tag 'spi-fix-v4.5-rc7' of git://git./linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
"A few driver specific fixes for the Rockchip and i.MX SPI controllers,
  especially for the i.MX they're annoying bugs if you run into them"

* tag 'spi-fix-v4.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: imx: fix spi resource leak with dma transfer
  spi: imx: allow only WML aligned transfers to use DMA
  spi: rockchip: add missing spi_master_put
  spi: rockchip: disable runtime pm when in err case

Merge remote-tracking branch 'spi/fix/rockchip' into spi-linus

Merge remote-tracking branch 'spi/fix/imx' into spi-linus

Merge tag 'ext4_for_linus_stable' of git://git./linux/kernel/git/tytso/ext4

Pull ext4 fix from Ted Ts'o:
"This fixes a regression which crept in v4.5-rc5"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: iterate over buffer heads correctly in move_extent_per_page()

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
"A few imx fixes I missed from a couple of weeks ago, they still aren't
  that big and fix some regression and a fail to boot problem.

  Other than that, a couple of regression fixes for radeon/amdgpu, one
  regression fix for vmwgfx and one regression fix for tda998x"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  Revert "drm/radeon/pm: adjust display configuration after powerstate"
  drm/amdgpu/dp: add back special handling for NUTMEG
  drm/radeon/dp: add back special handling for NUTMEG
  drm/i2c: tda998x: Choose between atomic or non atomic dpms helper
  drm/vmwgfx: Add back ->detect() and ->fill_modes()
  drm/radeon: Fix error handling in radeon_flip_work_func.
  drm/amdgpu: Fix error handling in amdgpu_flip_work_func.
  drm/imx: Add missing DRM_FORMAT_RGB565 to ipu_plane_formats
  drm/imx: notify DRM core about CRTC vblank state
  gpu: ipu-v3: Reset IPU before activating IRQ
  gpu: ipu-v3: Do not bail out on missing optional port nodes

Merge tag 'trace-fixes-v4.5-rc7' of git://git./linux/kernel/git/rostedt/linux-trace

Pull tracing fix from Steven Rostedt:
"I previously sent a fix that prevents all trace events from being
  called if the current cpu is offline.

  But I forgot that in 3.18, we added lockdep checks to test RCU usage
  even when the event is disabled.  Although there cannot be any bug
  when a cpu is going offline, we now get false warnings triggered by
  the added checks of the event being disabled.

  I removed the check from the tracepoint code itself, and added it to
  the condition section (which is "1" for 'no condition').  This way the
  online cpu check will get checked in all the right locations"

* tag 'trace-fixes-v4.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix check for cpu online when event is disabled

ext4: iterate over buffer heads correctly in move_extent_per_page()

In commit bcff24887d00 ("ext4: don't read blocks from disk after extents
being swapped") bh is not updated correctly in the for loop and wrong
data has been written to disk. generic/324 catches this on sub-page
block size ext4.

Fixes: bcff24887d00 ("ext4: don't read blocks from disk after extentsbeing swapped")
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Merge branch 'akpm' (patches from Andrew)

Merge fixes from Andrew Morton:
"13 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  dma-mapping: avoid oops when parameter cpu_addr is null
  mm/hugetlb: use EOPNOTSUPP in hugetlb sysctl handlers
  memremap: check pfn validity before passing to pfn_to_page()
  mm, thp: fix migration of PTE-mapped transparent huge pages
  dax: check return value of dax_radix_entry()
  ocfs2: fix return value from ocfs2_page_mkwrite()
  arm64: kasan: clear stale stack poison
  sched/kasan: remove stale KASAN poison after hotplug
  kasan: add functions to clear stack poison
  mm: fix mixed zone detection in devm_memremap_pages
  list: kill list_force_poison()
  mm: __delete_from_page_cache show Bad page if mapped
  mm/hugetlb: hugetlb_no_page: rate-limit warning message

dma-mapping: avoid oops when parameter cpu_addr is null

To keep consistent with kfree, which tolerate ptr is NULL.  We do this
because sometimes we may use goto statement, so that success and failure
case can share parts of the code.  But unfortunately, dma_free_coherent
called with parameter cpu_addr is null will cause oops, such as showed
below:

  Unable to handle kernel paging request at virtual address ffffffc020d3b2b8
  pgd = ffffffc083a61000
  [ffffffc020d3b2b8] *pgd=0000000000000000, *pud=0000000000000000
  CPU: 4 PID: 1489 Comm: malloc_dma_1 Tainted: G           O    4.1.12 #1
  Hardware name: ARM64 (DT)
  PC is at __dma_free_coherent.isra.10+0x74/0xc8
  LR is at __dma_free+0x9c/0xb0
  Process malloc_dma_1 (pid: 1489, stack limit = 0xffffffc0837fc020)
  [...]
  Call trace:
    __dma_free_coherent.isra.10+0x74/0xc8
    __dma_free+0x9c/0xb0
    malloc_dma+0x104/0x158 [dma_alloc_coherent_mtmalloc]
    kthread+0xec/0xfc

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/hugetlb: use EOPNOTSUPP in hugetlb sysctl handlers

Replace ENOTSUPP with EOPNOTSUPP.  If hugepages are not supported, this
value is propagated to userspace.  EOPNOTSUPP is part of uapi and is
widely supported by libc libraries.

It gives nicer message to user, rather than:

  # cat /proc/sys/vm/nr_hugepages
  cat: /proc/sys/vm/nr_hugepages: Unknown error 524

And also LTP's proc01 test was failing because this ret code (524)
was unexpected:

  proc01      1  TFAIL  :  proc01.c:396: read failed: /proc/sys/vm/nr_hugepages: errno=???(524): Unknown error 524
  proc01      2  TFAIL  :  proc01.c:396: read failed: /proc/sys/vm/nr_hugepages_mempolicy: errno=???(524): Unknown error 524
  proc01      3  TFAIL  :  proc01.c:396: read failed: /proc/sys/vm/nr_overcommit_hugepages: errno=???(524): Unknown error 524

Signed-off-by: Jan Stancek <jstancek@redhat.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memremap: check pfn validity before passing to pfn_to_page()

In memremap's helper function try_ram_remap(), we dereference a struct
page pointer that was derived from a PFN that is known to be covered by
a 'System RAM' iomem region, and is thus assumed to be a 'valid' PFN,
i.e., a PFN that has a struct page associated with it and is covered by
the kernel direct mapping.

However, the assumption that there is a 1:1 relation between the System
RAM iomem region and the kernel direct mapping is not universally valid
on all architectures, and on ARM and arm64, 'System RAM' may include
regions for which pfn_valid() returns false.

Generally speaking, both __va() and pfn_to_page() should only ever be
called on PFNs/physical addresses for which pfn_valid() returns true, so
add that check to try_ram_remap().

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm, thp: fix migration of PTE-mapped transparent huge pages

We don't have native support of THP migration, so we have to split huge
page into small pages in order to migrate it to different node. This
includes PTE-mapped huge pages.

I made mistake in refcounting patchset: we don't actually split
PTE-mapped huge page in queue_pages_pte_range(), if we step on head
page.

The result is that the head page is queued for migration, but none of
tail pages: putting head page on queue takes pin on the page and any
subsequent attempts of split_huge_pages() would fail and we skip queuing
tail pages.

unmap_and_move_huge_page() will eventually split the huge pages, but
only one of 512 pages would get migrated.

Let's fix the situation.

Fixes: 248db92da13f2507 ("migrate_pages: try to split pages on queuing")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

dax: check return value of dax_radix_entry()

dax_pfn_mkwrite() previously wasn't checking the return value of the
call to dax_radix_entry(), which was a mistake.

Instead, capture this return value and return the appropriate VM_FAULT_
value.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ocfs2: fix return value from ocfs2_page_mkwrite()

ocfs2_page_mkwrite() could mistakenly return error code instead of
mkwrite status value. Fix it.

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

arm64: kasan: clear stale stack poison

Functions which the compiler has instrumented for KASAN place poison on
the stack shadow upon entry and remove this poison prior to returning.

In the case of cpuidle, CPUs exit the kernel a number of levels deep in
C code. Any instrumented functions on this critical path will leave
portions of the stack shadow poisoned.

If CPUs lose context and return to the kernel via a cold path, we
restore a prior context saved in __cpu_suspend_enter are forgotten, and
we never remove the poison they placed in the stack shadow area by
functions calls between this and the actual exit of the kernel.

Thus, (depending on stackframe layout) subsequent calls to instrumented
functions may hit this stale poison, resulting in (spurious) KASAN
splats to the console.

To avoid this, clear any stale poison from the idle thread for a CPU
prior to bringing a CPU online.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

sched/kasan: remove stale KASAN poison after hotplug

Functions which the compiler has instrumented for KASAN place poison on
the stack shadow upon entry and remove this poision prior to returning.

In the case of CPU hotplug, CPUs exit the kernel a number of levels deep
in C code. Any instrumented functions on this critical path will leave
portions of the stack shadow poisoned.

When a CPU is subsequently brought back into the kernel via a different
path, depending on stackframe, layout calls to instrumented functions
may hit this stale poison, resulting in (spurious) KASAN splats to the
console.

To avoid this, clear any stale poison from the idle thread for a CPU
prior to bringing a CPU online.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kasan: add functions to clear stack poison

Functions which the compiler has instrumented for ASAN place poison on
the stack shadow upon entry and remove this poison prior to returning.

In some cases (e.g. hotplug and idle), CPUs may exit the kernel a
number of levels deep in C code.  If there are any instrumented
functions on this critical path, these will leave portions of the idle
thread stack shadow poisoned.

If a CPU returns to the kernel via a different path (e.g. a cold
entry), then depending on stack frame layout subsequent calls to
instrumented functions may use regions of the stack with stale poison,
resulting in (spurious) KASAN splats to the console.

Contemporary GCCs always add stack shadow poisoning when ASAN is
enabled, even when asked to not instrument a function [1], so we can't
simply annotate functions on the critical path to avoid poisoning.

Instead, this series explicitly removes any stale poison before it can
be hit.  In the common hotplug case we clear the entire stack shadow in
common code, before a CPU is brought online.

On architectures which perform a cold return as part of cpu idle may
retain an architecture-specific amount of stack contents.  To retain the
poison for this retained context, the arch code must call the core KASAN
code, passing a "watermark" stack pointer value beyond which shadow will
be cleared.  Architectures which don't perform a cold return as part of
idle do not need any additional code.

This patch (of 3):

Functions which the compiler has instrumented for KASAN place poison on
the stack shadow upon entry and remove this poision prior to returning.

In some cases (e.g.  hotplug and idle), CPUs may exit the kernel a number
of levels deep in C code.  If there are any instrumented functions on this
critical path, these will leave portions of the stack shadow poisoned.

If a CPU returns to the kernel via a different path (e.g.  a cold entry),
then depending on stack frame layout subsequent calls to instrumented
functions may use regions of the stack with stale poison, resulting in
(spurious) KASAN splats to the console.

To avoid this, we must clear stale poison from the stack prior to
instrumented functions being called.  This patch adds functions to the
KASAN core for removing poison from (portions of) a task's stack.  These
will be used by subsequent patches to avoid problems with hotplug and
idle.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: fix mixed zone detection in devm_memremap_pages

The check for whether we overlap "System RAM" needs to be done at
section granularity.  For example a system with the following mapping:

    100000000-37bffffff : System RAM
    37c000000-837ffffff : Persistent Memory

...is unable to use devm_memremap_pages() as it would result in two
zones colliding within a given section.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

list: kill list_force_poison()

Given we have uninitialized list_heads being passed to list_add() it
will always be the case that those uninitialized values randomly trigger
the poison value.  Especially since a list_add() operation will seed the
stack with the poison value for later stack allocations to trip over.

For example, see these two false positive reports:

  list_add attempted on force-poisoned entry
  WARNING: at lib/list_debug.c:34
  [..]
  NIP [c00000000043c390] __list_add+0xb0/0x150
  LR [c00000000043c38c] __list_add+0xac/0x150
  Call Trace:
    __list_add+0xac/0x150 (unreliable)
    __down+0x4c/0xf8
    down+0x68/0x70
    xfs_buf_lock+0x4c/0x150 [xfs]

  list_add attempted on force-poisoned entry(0000000000000500),
   new->next == d0000000059ecdb0, new->prev == 0000000000000500
  WARNING: at lib/list_debug.c:33
  [..]
  NIP [c00000000042db78] __list_add+0xa8/0x140
  LR [c00000000042db74] __list_add+0xa4/0x140
  Call Trace:
    __list_add+0xa4/0x140 (unreliable)
    rwsem_down_read_failed+0x6c/0x1a0
    down_read+0x58/0x60
    xfs_log_commit_cil+0x7c/0x600 [xfs]

Fixes: commit 5c2c2587b132 ("mm, dax, pmem: introduce {get|put}_dev_pagemap() for dax-gup")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Eryu Guan <eguan@redhat.com>
Tested-by: Eryu Guan <eguan@redhat.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: __delete_from_page_cache show Bad page if mapped

Commit e1534ae95004 ("mm: differentiate page_mapped() from
page_mapcount() for compound pages") changed the famous
BUG_ON(page_mapped(page)) in __delete_from_page_cache() to
VM_BUG_ON_PAGE(page_mapped(page)): which gives us more info when
CONFIG_DEBUG_VM=y, but nothing at all when not.

Although it has not usually been very helpul, being hit long after the
error in question, we do need to know if it actually happens on users'
systems; but reinstating a crash there is likely to be opposed :)

In the non-debug case, pr_alert("BUG: Bad page cache") plus dump_page(),
dump_stack(), add_taint() - I don't really believe LOCKDEP_NOW_UNRELIABLE,
but that seems to be the standard procedure now. Move that, or the
VM_BUG_ON_PAGE(), up before the deletion from tree: so that the
unNULLified page->mapping gives a little more information.

If the inode is being evicted (rather than truncated), it won't have any
vmas left, so it's safe(ish) to assume that the raised mapcount is
erroneous, and we can discount it from page_count to avoid leaking the
page (I'm less worried by leaking the occasional 4kB, than losing a
potential 2MB page with each 4kB page leaked).

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/hugetlb: hugetlb_no_page: rate-limit warning message

The warning message "killed due to inadequate hugepage pool" simply
indicates that SIGBUS was sent, not that the process was forcibly killed.
If the process has a signal handler installed does not fix the problem,
this message can rapidly spam the kernel log.

On my amd64 dev machine that does not have hugepages configured, I can
reproduce the repeated warnings easily by setting vm.nr_hugepages=2 (i.e.,
4 megabytes of huge pages) and running something that sets a signal
handler and forks, like

  #include <sys/mman.h>
  #include <signal.h>
  #include <stdlib.h>
  #include <unistd.h>

  sig_atomic_t counter = 10;
  void handler(int signal)
  {
      if (counter-- == 0)
         exit(0);
  }

  int main(void)
  {
      int status;
      char *addr = mmap(NULL, 4 * 1048576, PROT_READ | PROT_WRITE,
              MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
      if (addr == MAP_FAILED) {perror("mmap"); return 1;}
      *addr = 'x';
      switch (fork()) {
         case -1:
            perror("fork"); return 1;
         case 0:
            signal(SIGBUS, handler);
            *addr = 'x';
            break;
         default:
            *addr = 'x';
            wait(&status);
            if (WIFSIGNALED(status)) {
               psignal(WTERMSIG(status), "child");
            }
            break;
      }
  }

Signed-off-by: Geoffrey Thomas <geofft@ldpreload.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ACPI / OSL: Add support to install tables via initrd

This patch adds support to install tables from initrd.

If a table in the initrd wasn't used by the override mechanism,
the table would be installed after initializing all RSDT/XSDT
tables.

Link: https://lkml.org/lkml/2014/2/28/368
Reported-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

ACPI / OSL: Clean up initrd table override code

This patch cleans up the initrd table override code by merging
redundant logics and re-ordering code blocks.

No functional changes.

Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

PNP / ACPI: add ACPI_RESOURCE_TYPE_SERIAL_BUS as a valid type

An error message is printed for resources of type 19, which is a valid
supported resource type. The Firmware Test Suite tool (fwts) reports
this as a test failure. This change fixes the false test failures
for ASL that use type 19 (ACPI_RESOURCE_TYPE_SERIAL_BUS) resources.

Signed-off-by: Harb Abdulhamid <harba@codeaurora.org>
Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

ACPI / util: remove redundant check if element is NULL

element is &package->package.elements[i] which can never be NULL
so the check to see if it is NULL is redundant and can be removed.

Detected with static analysis by CoverityScan

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

ACPI: Add acpi_force_32bit_fadt_addr option to force 32 bit FADT addresses

Some HP laptops seem to have invalid 64 bit FADT X_PM* addresses
which are causing various boot issues. In these cases, it would
be useful to force ACPI to use the valid legacy 32 bit equivalent
PM addresses. Add a acpi_force_32bit_fadt_addr to set the ACPICA
acpi_gbl_use32_bit_fadt_addresses to TRUE to force this override.

Link: https://bugs.launchpad.net/bugs/1529381
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>