Greg Kurz [Fri, 12 Dec 2014 11:37:40 +0000 (12:37 +0100)]
powerpc/powernv: Ignore smt-enabled on Power8 and later
Starting with POWER8, the subcore logic relies on all threads of a core
being booted so that they can participate in split mode switches. So on
those machines we ignore the smt_enabled_at_boot setting (smt-enabled on
the kernel command line).
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
[mpe: Update comment and change log to be more precise]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael S. Tsirkin [Sun, 14 Dec 2014 16:52:51 +0000 (18:52 +0200)]
powerpc/uaccess: Allow get_user() with bitwise types
At the moment, if p and x are both of the same bitwise type
(eg. __le32), get_user(x, p) produces a sparse warning.
This is because *p is loaded into a long then cast back to typeof(*p).
When typeof(*p) is a bitwise type (which is uncommon), such a cast needs
__force, otherwise sparse produces a warning.
For non-bitwise types __force should have no effect, and should not hide
any legitimate errors.
Note that we are casting to typeof(*p) not typeof(x). Even with the
cast, if x and *p are of different types we should get the warning, so I
think we are not loosing the ability to detect any actual errors.
virtio would like to use bitwise types with get_user() so fix these
spurious warnings by adding __force.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
[mpe: Fill in changelog with more details]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Benjamin Herrenschmidt [Thu, 11 Dec 2014 01:39:49 +0000 (12:39 +1100)]
powerpc/powernv: Expose OPAL firmware symbol map
Newer versions of OPAL will provide this, so let's expose it to user
space so tools like perf can use it to properly decode samples in
firmware space.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Shreyas B. Prabhu [Tue, 9 Dec 2014 18:56:53 +0000 (00:26 +0530)]
powernv/powerpc: Add winkle support for offline cpus
Winkle is a deep idle state supported in power8 chips. A core enters
winkle when all the threads of the core enter winkle. In this state
power supply to the entire chiplet i.e core, private L2 and private L3
is turned off. As a result it gives higher powersavings compared to
sleep.
But entering winkle results in a total hypervisor state loss. Hence the
hypervisor context has to be preserved before entering winkle and
restored upon wake up.
Power-on Reset Engine (PORE) is a dedicated engine which is responsible
for powering on the chiplet during wake up. It can be programmed to
restore the register contests of a few specific registers. This patch
uses PORE to restore register state wherever possible and uses stack to
save and restore rest of the necessary registers.
With hypervisor state restore things fall under three categories-
per-core state, per-subcore state and per-thread state. To manage this,
extend the infrastructure introduced for sleep. Mainly we add a paca
variable subcore_sibling_mask. Using this and the core_idle_state we can
distingush first thread in core and subcore.
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Shreyas B. Prabhu [Tue, 9 Dec 2014 18:56:52 +0000 (00:26 +0530)]
powernv/cpuidle: Redesign idle states management
Deep idle states like sleep and winkle are per core idle states. A core
enters these states only when all the threads enter either the
particular idle state or a deeper one. There are tasks like fastsleep
hardware bug workaround and hypervisor core state save which have to be
done only by the last thread of the core entering deep idle state and
similarly tasks like timebase resync, hypervisor core register restore
that have to be done only by the first thread waking up from these
state.
The current idle state management does not have a way to distinguish the
first/last thread of the core waking/entering idle states. Tasks like
timebase resync are done for all the threads. This is not only is
suboptimal, but can cause functionality issues when subcores and kvm is
involved.
This patch adds the necessary infrastructure to track idle states of
threads in a per-core structure. It uses this info to perform tasks like
fastsleep workaround and timebase resync only once per core.
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Originally-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: linux-pm@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Shreyas B. Prabhu [Tue, 9 Dec 2014 18:56:51 +0000 (00:26 +0530)]
powerpc/powernv: Enable Offline CPUs to enter deep idle states
The secondary threads should enter deep idle states so as to gain maximum
powersavings when the entire core is offline. To do so the offline path
must be made aware of the available deepest idle state. Hence probe the
device tree for the possible idle states in powernv core code and
expose the deepest idle state through flags.
Since the device tree is probed by the cpuidle driver as well, move
the parameters required to discover the idle states into an appropriate
common place to both the driver and the powernv core code.
Another point is that fastsleep idle state may require workarounds in
the kernel to function properly. This workaround is introduced in the
subsequent patches. However neither the cpuidle driver or the hotplug
path need be bothered about this workaround.
They will be taken care of by the core powernv code.
Originally-by: Srivatsa S. Bhat <srivatsa@mit.edu>
Signed-off-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Reviewed-by: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: linux-pm@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Paul Mackerras [Tue, 9 Dec 2014 18:56:50 +0000 (00:26 +0530)]
powerpc/powernv: Switch off MMU before entering nap/sleep/rvwinkle mode
Currently, when going idle, we set the flag indicating that we are in
nap mode (paca->kvm_hstate.hwthread_state) and then execute the nap
(or sleep or rvwinkle) instruction, all with the MMU on. This is bad
for two reasons: (a) the architecture specifies that those instructions
must be executed with the MMU off, and in fact with only the SF, HV, ME
and possibly RI bits set, and (b) this introduces a race, because as
soon as we set the flag, another thread can switch the MMU to a guest
context. If the race is lost, this thread will typically start looping
on relocation-on ISIs at 0xc...4400.
This fixes it by setting the MSR as required by the architecture before
setting the flag or executing the nap/sleep/rvwinkle instruction.
Cc: stable@vger.kernel.org
[ shreyas@linux.vnet.ibm.com: Edited to handle LE ]
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Neelesh Gupta [Sat, 13 Dec 2014 18:01:05 +0000 (23:31 +0530)]
i2c: Driver to expose PowerNV platform i2c busses
The patch exposes the available i2c busses on the PowerNV platform
to the kernel and implements the bus driver to support i2c and
smbus commands.
The driver uses the platform device infrastructure to probe the busses
on the platform and registers them with the i2c driver framework.
Signed-off-by: Neelesh Gupta <neelegup@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Wolfram Sang <wsa@the-dreams.de> (I2C part, excluding the bindings)
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Richard Guy Briggs [Tue, 9 Dec 2014 20:37:07 +0000 (15:37 -0500)]
powerpc: add little endian flag to syscall_get_arch()
Since both ppc and ppc64 have LE variants which are now reported by uname, add
that flag (__AUDIT_ARCH_LE) to syscall_get_arch() and add AUDIT_ARCH_PPC64LE
variant.
Without this, perf trace and auditctl fail.
Mainline kernel reports ppc64le (per
a058801) but there is no matching
AUDIT_ARCH_PPC64LE.
Since 32-bit PPC LE is not supported by audit, don't advertise it in
AUDIT_ARCH_PPC* variants.
See:
https://www.redhat.com/archives/linux-audit/2014-August/msg00082.html
https://www.redhat.com/archives/linux-audit/2014-December/msg00004.html
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Sukadev Bhattiprolu [Wed, 10 Dec 2014 06:43:34 +0000 (01:43 -0500)]
power/perf/hv-24x7: Use kmem_cache_free() instead of kfree
Use kmem_cache_free() to free a buffer allocated with kmem_cache_alloc().
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
sukadev@linux.vnet.ibm.com [Wed, 10 Dec 2014 22:29:13 +0000 (14:29 -0800)]
powerpc/perf/hv-24x7: Use per-cpu page buffer
The 24x7 counters are continuously running and not updated on an
interrupt. So we record the event counts when stopping the event or
deleting it.
But to "read" a single counter in 24x7, we allocate a page and pass it
into the hypervisor (The HV returns the page full of counters from which
we extract the specific counter for this event).
We allocate a page using GFP_USER and when deleting the event, we end up
with the following warning because we are blocking in interrupt context.
[ 698.641709] BUG: scheduling while atomic: swapper/0/0/0x10010000
We could use GFP_ATOMIC but that could result in failures. Pre-allocate
a buffer so we don't have to allocate in interrupt context. Further as
Michael Ellerman suggested, use Per-CPU buffer so we only need to
allocate once per CPU.
Cc: stable@vger.kernel.org
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Ian Munsie [Mon, 8 Dec 2014 08:18:01 +0000 (19:18 +1100)]
cxl: Unmap MMIO regions when detaching a context
If we need to force detach a context (e.g. due to EEH or simply force
unbinding the driver) we should prevent the userspace contexts from
being able to access the Problem State Area MMIO region further, which
they may have mapped with mmap().
This patch unmaps any mapped MMIO regions when detaching a userspace
context.
Cc: stable@vger.kernel.org
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Ian Munsie [Mon, 8 Dec 2014 08:17:56 +0000 (19:17 +1100)]
cxl: Add timeout to process element commands
In the event that something goes wrong in the hardware and it is unable
to complete a process element comment we would end up polling forever,
effectively making the associated process unkillable.
This patch adds a timeout to the process element command code path, so
that we will give up if the hardware does not respond in a reasonable
time.
Cc: stable@vger.kernel.org
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Ian Munsie [Mon, 8 Dec 2014 08:17:55 +0000 (19:17 +1100)]
cxl: Change contexts_lock to a mutex to fix sleep while atomic bug
We had a known sleep while atomic bug if a CXL device was forcefully
unbound while it was in use. This could occur as a result of EEH, or
manually induced with something like this while the device was in use:
echo 0000:01:00.0 > /sys/bus/pci/drivers/cxl-pci/unbind
The issue was that in this code path we iterated over each context and
forcefully detached it with the contexts_lock spin lock held, however
the detach also needed to take the spu_mutex, and call schedule.
This patch changes the contexts_lock to a mutex so that we are not in
atomic context while doing the detach, thereby avoiding the sleep while
atomic.
Also delete the related TODO comment, which suggested an alternate
solution which turned out to not be workable.
Cc: stable@vger.kernel.org
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Anton Blanchard [Mon, 8 Dec 2014 23:58:19 +0000 (10:58 +1100)]
powerpc: Secondary CPUs must set cpu_callin_map after setting active and online
I have a busy ppc64le KVM box where guests sometimes hit the infamous
"kernel BUG at kernel/smpboot.c:134!" issue during boot:
BUG_ON(td->cpu != smp_processor_id());
Basically a per CPU hotplug thread scheduled on the wrong CPU. The oops
output confirms it:
CPU: 0
Comm: watchdog/130
The problem is that we aren't ensuring the CPU active and online bits are set
before allowing the master to continue on. The master unparks the secondary
CPUs kthreads and the scheduler looks for a CPU to run on. It calls
select_task_rq and realises the suggested CPU is not in the cpus_allowed
mask. It then ends up in select_fallback_rq, and since the active and
online bits aren't set we choose some other CPU to run on.
Cc: stable@vger.kernel.org
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Paul Mackerras [Wed, 3 Dec 2014 03:48:40 +0000 (14:48 +1100)]
powerpc/powernv: Return to cpu offline loop when finished in KVM guest
When a secondary hardware thread has finished running a KVM guest, we
currently put that thread into nap mode using a nap instruction in
the KVM code. This changes the code so that instead of doing a nap
instruction directly, we instead cause the call to power7_nap() that
put the thread into nap mode to return. The reason for doing this is
to avoid having the KVM code having to know what low-power mode to
put the thread into.
In the case of a secondary thread used to run a KVM guest, the thread
will be offline from the point of view of the host kernel, and the
relevant power7_nap() call is the one in pnv_smp_cpu_disable().
In this case we don't want to clear pending IPIs in the offline loop
in that function, since that might cause us to miss the wakeup for
the next time the thread needs to run a guest. To tell whether or
not to clear the interrupt, we use the SRR1 value returned from
power7_nap(), and check if it indicates an external interrupt. We
arrange that the return from power7_nap() when we have finished running
a guest returns 0, so pending interrupts don't get flushed in that
case.
Note that it is important a secondary thread that has finished
executing in the guest, or that didn't have a guest to run, should
not return to power7_nap's caller while the kvm_hstate.hwthread_req
flag in the PACA is non-zero, because the return from power7_nap
will reenable the MMU, and the MMU might still be in guest context.
In this situation we spin at low priority in real mode waiting for
hwthread_req to become zero.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Mahesh Salgaonkar [Fri, 5 Dec 2014 04:31:15 +0000 (10:01 +0530)]
powerpc/book3s: Fix partial invalidation of TLBs in MCE code.
The existing MCE code calls flush_tlb hook with IS=0 (single page) resulting
in partial invalidation of TLBs which is not right. This patch fixes
that by passing IS=0xc00 to invalidate whole TLB for successful recovery
from TLB and ERAT errors.
Cc: stable@vger.kernel.org
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Aneesh Kumar K.V [Thu, 4 Dec 2014 05:30:14 +0000 (11:00 +0530)]
powerpc/mm: don't do tlbie for updatepp request with NO HPTE fault
upatepp can get called for a nohpte fault when we find from the linux
page table that the translation was hashed before. In that case
we are sure that there is no existing translation, hence we could
avoid doing tlbie.
We could possibly race with a parallel fault filling the TLB. But
that should be ok because updatepp is only ever relaxing permissions.
We also look at linux pte permission bits when filling hash pte
permission bits. We also hold the linux pte busy bits while
inserting/updating a hashpte entry, hence a paralle update of
linux pte is not possible. On the other hand mprotect involves
ptep_modify_prot_start which cause a hpte invalidate and not updatepp.
Performance number:
We use randbox_access_bench written by Anton.
Kernel with THP disabled and smaller hash page table size.
86.60% random_access_b [kernel.kallsyms] [k] .native_hpte_updatepp
2.10% random_access_b random_access_bench [.] doit
1.99% random_access_b [kernel.kallsyms] [k] .do_raw_spin_lock
1.85% random_access_b [kernel.kallsyms] [k] .native_hpte_insert
1.26% random_access_b [kernel.kallsyms] [k] .native_flush_hash_range
1.18% random_access_b [kernel.kallsyms] [k] .__delay
0.69% random_access_b [kernel.kallsyms] [k] .native_hpte_remove
0.37% random_access_b [kernel.kallsyms] [k] .clear_user_page
0.34% random_access_b [kernel.kallsyms] [k] .__hash_page_64K
0.32% random_access_b [kernel.kallsyms] [k] fast_exception_return
0.30% random_access_b [kernel.kallsyms] [k] .hash_page_mm
With Fix:
27.54% random_access_b random_access_bench [.] doit
22.90% random_access_b [kernel.kallsyms] [k] .native_hpte_insert
5.76% random_access_b [kernel.kallsyms] [k] .native_hpte_remove
5.20% random_access_b [kernel.kallsyms] [k] fast_exception_return
5.12% random_access_b [kernel.kallsyms] [k] .__hash_page_64K
4.80% random_access_b [kernel.kallsyms] [k] .hash_page_mm
3.31% random_access_b [kernel.kallsyms] [k] data_access_common
1.84% random_access_b [kernel.kallsyms] [k] .trace_hardirqs_on_caller
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Mon, 1 Dec 2014 05:54:13 +0000 (16:54 +1100)]
powerpc/xmon: Cleanup the breakpoint flags
Drop BP_IABR_TE, which though used, does not do anything useful. Rename
BP_IABR to BP_CIABR. Renumber the flags.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Anshuman Khandual [Fri, 28 Nov 2014 04:36:42 +0000 (10:06 +0530)]
powerpc/xmon: Enable HW instruction breakpoint on POWER8
This patch enables support for hardware instruction breakpoint in xmon
on POWER8 platform with the help of a new register called the CIABR
(Completed Instruction Address Breakpoint Register). With this patch, a
single hardware instruction breakpoint can be added and cleared during
any active xmon debug session. The hardware based instruction breakpoint
mechanism works correctly with the existing TRAP based instruction
breakpoint available on xmon.
There are no powerpc CPU with CPU_FTR_IABR feature any more. This patch
has re-purposed all the existing IABR related code to work with CIABR
register based HW instruction breakpoint.
This has one odd feature, which is that when we hit a breakpoint xmon
doesn't tell us we have hit the breakpoint. This is because xmon is
expecting bp->address == regs->nip. Because CIABR fires on completition
regs->nip points to the instruction after the breakpoint. We could fix
that, but it would then confuse other parts of the xmon code which think
we need to emulate the instruction. [mpe]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Michael Ellerman [Tue, 2 Dec 2014 03:19:20 +0000 (14:19 +1100)]
Merge remote-tracking branch 'benh/next' into next
Merge updates collected & acked by Ben. A few EEH patches from Gavin,
some mm updates from Aneesh and a few odds and ends.
Aneesh Kumar K.V [Sun, 2 Nov 2014 15:45:28 +0000 (21:15 +0530)]
powerpc/mm/thp: Use tlbiel if possible
If we know that user address space has never executed on other cpus
we could use tlbiel.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Aneesh Kumar K.V [Sun, 2 Nov 2014 15:45:27 +0000 (21:15 +0530)]
powerpc/mm/thp: Remove code duplication
Rename invalidate_old_hpte to flush_hash_hugepage and use that in
other places.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
James Yang [Fri, 14 Nov 2014 18:32:24 +0000 (12:32 -0600)]
powerpc/mm/hugetlb: Sanity check gigantic hugepage count
Limit the number of gigantic hugepages specified by the
hugepages= parameter to MAX_NUMBER_GPAGES.
Signed-off-by: James Yang <James.Yang@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Jiang Lu [Fri, 21 Nov 2014 03:23:19 +0000 (11:23 +0800)]
powerpc/oprofile: Disable pagefaults during user stack read
A page fault occurred during reading user stack in oprofile backtrace
would lead following calltrace:
WARNING: at linux/kernel/smp.c:210
Modules linked in:
CPU: 5 PID: 736 Comm: sh Tainted: G W 3.14.23-WR7.0.0.0_standard #1
task:
c0000000f6208bc0 ti:
c00000007c72c000 task.ti:
c00000007c72c000
NIP:
c0000000000ed6e4 LR:
c0000000000ed5b8 CTR:
0000000000000000
REGS:
c00000007c72f050 TRAP: 0700 Tainted: G W (3.14.23-WR7.0.0
tandard)
MSR:
0000000080021000 <CE,ME> CR:
48222482 XER:
00000000
SOFTE: 0
GPR00:
c0000000000ed5b8 c00000007c72f2d0 c0000000010aa048 0000000000000005
GPR04:
c000000000fdb820 c00000007c72f410 0000000000000001 0000000000000005
GPR08:
c0000000010b5768 c000000000f8a048 0000000000000001 0000000000000000
GPR12:
0000000048222482 c00000000fffe580 0000000022222222 0000000010129664
GPR16:
0000000010143cc0 0000000000000000 0000000044444444 0000000000000000
GPR20:
c00000007c7221d8 c0000000f638e3c8 000003f15a20120d 0000000000000001
GPR24:
000000005a20120d c00000007c722000 c00000007cdedda8 00003fffef23b160
GPR28:
0000000000000001 c00000007c72f410 c000000000fdb820 0000000000000006
NIP [
c0000000000ed6e4] .smp_call_function_single+0x18c/0x248
LR [
c0000000000ed5b8] .smp_call_function_single+0x60/0x248
Call Trace:
[
c00000007c72f2d0] [
c0000000000ed5b8] .smp_call_function_single+0x60/0x248 (unreliable)
[
c00000007c72f3a0] [
c000000000030810] .__flush_tlb_page+0x164/0x1b0
[
c00000007c72f460] [
c00000000002e054] .ptep_set_access_flags+0xb8/0x168
[
c00000007c72f500] [
c0000000001ad3d8] .handle_mm_fault+0x4a8/0xbac
[
c00000007c72f5e0] [
c000000000bb3238] .do_page_fault+0x3b8/0x868
[
c00000007c72f810] [
c00000000001e1d0] storage_fault_common+0x20/0x44
Exception: 301 at .__copy_tofrom_user_base+0x54/0x5b0
LR = .op_powerpc_backtrace+0x190/0x20c
[
c00000007c72fb00] [
c000000000a2ec34] .op_powerpc_backtrace+0x204/0x20c (unreliable)
[
c00000007c72fbc0] [
c000000000a2b5fc] .oprofile_add_ext_sample+0xe8/0x118
[
c00000007c72fc70] [
c000000000a2eee0] .fsl_emb_handle_interrupt+0x20c/0x27c
[
c00000007c72fd30] [
c000000000a2e440] .op_handle_interrupt+0x44/0x58
[
c00000007c72fdb0] [
c000000000016d68] .performance_monitor_exception+0x74/0x90
[
c00000007c72fe30] [
c00000000001d8b4] exc_0x260_common+0xfc/0x100
performance_monitor_exception() is executed in a context with interrupt
disabled and preemption enabled. When there is a user space page fault
happened, do_page_fault() invoke in_atomic() to decide whether kernel
should handle such page fault. in_atomic() only check preempt_count.
So need call pagefault_disable() to disable preemption before reading
user stack.
Signed-off-by: Jiang Lu <lu.jiang@windriver.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Aneesh Kumar K.V [Mon, 3 Nov 2014 14:51:34 +0000 (20:21 +0530)]
powerpc/mm: Check for matching hpte without taking hpte lock
With smaller hash page table config, we would end up in situation
where we would be replacing hash page table slot frequently. In
such config, we will find the hpte to be not matching, and we
can do that check without holding the hpte lock. We need to
recheck the hpte again after holding lock.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Greg Kurz [Tue, 25 Nov 2014 16:10:06 +0000 (17:10 +0100)]
powerpc: Drop useless warning in eeh_init()
This is what we get in dmesg when booting a pseries guest and
the hypervisor doesn't provide EEH support.
[ 0.166655] EEH functionality not supported
[ 0.166778] eeh_init: Failed to call platform init function (-22)
Since both powernv_eeh_init() and pseries_eeh_init() already complain when
hitting an error, it is not needed to print more (especially such an
uninformative message).
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Mahesh Salgaonkar [Mon, 24 Nov 2014 16:29:26 +0000 (21:59 +0530)]
powerpc/powernv: Cleanup unused MCE definitions/declarations.
Cleanup OpalMCE_* definitions/declarations and other related code which
is not used anymore.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Acked-by: Benjamin Herrrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Gavin Shan [Sat, 22 Nov 2014 10:58:09 +0000 (21:58 +1100)]
powerpc/eeh: Dump PHB diag-data early
On PowerNV platform, PHB diag-data is dumped after stopping device
drivers. In case of recursive EEH errors, the kernel is usually
crashed before dumping PHB diag-data for the second EEH error. It's
hard to locate the root cause of the second EEH error without PHB
diag-data.
The patch adds one more EEH option "eeh=early_log", which helps
dumping PHB diag-data immediately once frozen PE is detected, in
order to get the PHB diag-data for the second EEH error.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Gavin Shan [Thu, 13 Nov 2014 23:47:30 +0000 (10:47 +1100)]
powerpc/eeh: Recover EEH error on ownership change for BCM5719
In PCI passthrou scenario, we need simulate EEH recovery for Emulex
adapters when their ownership changes, as we did in commit
5cfb20b96
("powerpc/eeh: Emulate EEH recovery for VFIO devices"). Broadcom
BCM5719 adpaters are facing same problem and needs same cure.
Reported-by: Rajeshkumar Subramanian <rajeshkumars@in.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Gavin Shan [Thu, 13 Nov 2014 23:47:29 +0000 (10:47 +1100)]
powerpc/eeh: Set EEH_PE_RESET on PE reset
The patch introduces additional flag EEH_PE_RESET to indicate the
corresponding PE is under reset. In turn, the PE retrieval bakcend
on PowerNV platform can return unfrozen state for the EEH core to
moving forward. Flag EEH_PE_CFG_BLOCKED isn't the correct one for
the purpose.
In PCI passthrou case, the problem is more worse: Guest doesn't
recover 6th EEH error. The PE is left in isolated (frozen) and
config blocked state on Broadcom adapters. We can't retrieve the
PE's state correctly any more, even from the host side via sysfs
/sys/bus/pci/devices/xxx/eeh_pe_state.
Reported-by: Rajeshkumar Subramanian <rajeshkumars@in.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Gavin Shan [Thu, 13 Nov 2014 23:47:28 +0000 (10:47 +1100)]
powerpc/eeh: Refactor eeh_reset_pe()
The patch refactors eeh_reset_pe() in order for:
* Varied return values for different failure cases.
* Replace pr_err() with pr_warn() and print function name.
* Coding style cleanup.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Michael Ellerman [Tue, 18 Nov 2014 05:47:35 +0000 (16:47 +1100)]
powerpc: Remove more traces of bootmem
Although we are now selecting NO_BOOTMEM, we still have some traces of
bootmem lying around. That is because even with NO_BOOTMEM there is
still a shim that converts bootmem calls into memblock calls, but
ultimately we want to remove all traces of bootmem.
Most of the patch is conversions from alloc_bootmem() to
memblock_virt_alloc(). In general a call such as:
p = (struct foo *)alloc_bootmem(x);
Becomes:
p = memblock_virt_alloc(x, 0);
We don't need the cast because memblock_virt_alloc() returns a void *.
The alignment value of zero tells memblock to use the default alignment,
which is SMP_CACHE_BYTES, the same value alloc_bootmem() uses.
We remove a number of NULL checks on the result of
memblock_virt_alloc(). That is because memblock_virt_alloc() will panic
if it can't allocate, in exactly the same way as alloc_bootmem(), so the
NULL checks are and always have been redundant.
The memory returned by memblock_virt_alloc() is already zeroed, so we
remove several memsets of the result of memblock_virt_alloc().
Finally we convert a few uses of __alloc_bootmem(x, y, MAX_DMA_ADDRESS)
to just plain memblock_virt_alloc(). We don't use memblock_alloc_base()
because MAX_DMA_ADDRESS is ~0ul on powerpc, so limiting the allocation
to that is pointless, 16XB ought to be enough for anyone.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Li Zhong [Mon, 17 Nov 2014 02:52:30 +0000 (10:52 +0800)]
powerpc/pseries: Initialise nvram_pstore_info's buf_lock
nvram_pstore_info's buf_lock is not initialized before registering,
which is clearly incorrect.
It causes some strange behavior when trying to obtain the lock during
kdump process.
On a UP configuration, the console stopped for a couple of seconds, then
"lockup suspected" warning printed out, but then it continued to run.
So try lock fails, and lockup reported, but then arch_spin_lock()
passes.
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
[mpe: Edited changelog]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Tue, 18 Nov 2014 05:58:15 +0000 (16:58 +1100)]
Merge remote-tracking branch 'scottwood/next' into next
Scott says:
"Highlights include a bunch of 8xx optimizations, device tree bindings
for Freescale BMan, QMan, and FMan datapath components, misc device tree
updates, and inbound rio window support."
Michael Neuling [Fri, 14 Nov 2014 07:09:28 +0000 (18:09 +1100)]
cxl: Name interrupts in /proc/interrupt
Currently all interrupts generated by cxl are named "cxl". This is not very
informative as we can't distinguish between cards, AFUs, error interrupts, user
contexts and user interrupts numbers. Being able to distinguish them is useful
for setting affinity.
This patch gives each of these names in /proc/interrupts.
A two card CAPI system, with afu0.0 having 2 active contexts each with 4 user
IRQs each, will now look like this:
% grep cxl /proc/interrupts
444: 0 OPAL ICS 141312 Level cxl-card1-err
445: 0 OPAL ICS 141313 Level cxl-afu1.0-err
446: 0 OPAL ICS 141314 Level cxl-afu1.0
462: 0 OPAL ICS 2052 Level cxl-afu0.0-pe0-1
463: 75517 OPAL ICS 2053 Level cxl-afu0.0-pe0-2
468: 0 OPAL ICS 2054 Level cxl-afu0.0-pe0-3
469: 0 OPAL ICS 2055 Level cxl-afu0.0-pe0-4
470: 0 OPAL ICS 2056 Level cxl-afu0.0-pe1-1
471: 75506 OPAL ICS 2057 Level cxl-afu0.0-pe1-2
472: 0 OPAL ICS 2058 Level cxl-afu0.0-pe1-3
473: 0 OPAL ICS 2059 Level cxl-afu0.0-pe1-4
502: 1066 OPAL ICS 2050 Level cxl-afu0.0
514: 0 OPAL ICS 2048 Level cxl-card0-err
515: 0 OPAL ICS 2049 Level cxl-afu0.0-err
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Ian Munsie [Fri, 14 Nov 2014 06:37:50 +0000 (17:37 +1100)]
cxl: Return error to PSL if IRQ demultiplexing fails & print clearer warning
If an AFU has a hardware bug that causes it to acknowledge a context
terminate or remove while that context has outstanding transactions, it
is possible for the kernel to receive an interrupt for that context
after we have removed it from the context list.
The kernel will not be able to demultiplex the interrupt (or worse - if
we have already reallocated the process handle we could mis-attribute it
to the new context), and printed a big scary warning.
It did not acknowledge the interrupt, which would effectively halt
further translation fault processing on the PSL.
This patch makes the warning clearer about the likely cause of the issue
(i.e. hardware bug) to make it obvious to future AFU designers of what
needs to be fixed. It also prints out the process handle which can then
be matched up with hardware and software traces for debugging.
It also acknowledges the interrupt to the PSL with either an address
error or acknowledge, so that the PSL can continue with other
translations.
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Prabhakar Kushwaha [Fri, 31 Jan 2014 09:40:14 +0000 (15:10 +0530)]
powerpc/config: Enable memory driver
As Freescale IFC controller has been moved to driver to driver/memory.
So enable memory driver in powerpc config
Signed-off-by: Prabhakar Kushwaha <prabhakar@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Neelesh Gupta [Tue, 14 Oct 2014 08:38:36 +0000 (14:08 +0530)]
rtc/tpo: Driver to support rtc and wakeup on PowerNV platform
The patch implements the OPAL rtc driver that binds with the rtc
driver subsystem. The driver uses the platform device infrastructure
to probe the rtc device and register it to rtc class framework. The
'wakeup' is supported depending upon the property 'has-tpo' present
in the OF node. It provides a way to load the generic rtc driver in
in the absence of an OPAL driver.
The patch also moves the existing OPAL rtc get/set time interfaces to the
new driver and exposes the necessary OPAL calls using EXPORT_SYMBOL_GPL.
Test results:
-------------
Host:
[root@tul169p1 ~]# ls -l /sys/class/rtc/
total 0
lrwxrwxrwx 1 root root 0 Oct 14 03:07 rtc0 -> ../../devices/opal-rtc/rtc/rtc0
[root@tul169p1 ~]# cat /sys/devices/opal-rtc/rtc/rtc0/time
08:10:07
[root@tul169p1 ~]# echo `date '+%s' -d '+ 2 minutes'` > /sys/class/rtc/rtc0/wakealarm
[root@tul169p1 ~]# cat /sys/class/rtc/rtc0/wakealarm
1413274345
[root@tul169p1 ~]#
FSP:
$ smgr mfgState
standby
$ rtim timeofday
System time is valid: 2014/10/14 08:12:04.225115
$ smgr mfgState
ipling
$
CC: devicetree@vger.kernel.org
CC: tglx@linutronix.de
CC: rtc-linux@googlegroups.com
CC: a.zummo@towertech.it
Signed-off-by: Neelesh Gupta <neelegup@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Vineeth Vijayan [Fri, 14 Nov 2014 09:12:05 +0000 (14:42 +0530)]
powerpc: Use generic PIE randomization
Back in 2009 we merged
501cb16d3cfd "Randomise PIEs", which added support for
randomizing PIE (Position Independent Executable) binaries.
That commit added randomize_et_dyn(), which correctly randomized the addresses,
but failed to honor PF_RANDOMIZE. That means it was not possible to disable PIE
randomization via the personality flag, or /proc/sys/kernel/randomize_va_space.
Since then there has been generic support for PIE randomization added to
binfmt_elf.c, selectable via ARCH_BINFMT_ELF_RANDOMIZE_PIE.
Enabling that allows us to drop randomize_et_dyn(), which means we start
honoring PF_RANDOMIZE correctly.
It also causes a fairly major change to how we layout PIE binaries.
Currently we will place the binary at 512MB-520MB for 32 bit binaries, or
512MB-1.5GB for 64 bit binaries, eg:
$ cat /proc/$$/maps
4e550000-
4e580000 r-xp
00000000 08:02 129813 /bin/dash
4e580000-
4e590000 rw-p
00020000 08:02 129813 /bin/dash
10014110000-
10014140000 rw-p
00000000 00:00 0 [heap]
3fffaa3f0000-
3fffaa5a0000 r-xp
00000000 08:02 921 /lib/powerpc64le-linux-gnu/libc-2.19.so
3fffaa5a0000-
3fffaa5b0000 rw-p
001a0000 08:02 921 /lib/powerpc64le-linux-gnu/libc-2.19.so
3fffaa5c0000-
3fffaa5d0000 rw-p
00000000 00:00 0
3fffaa5d0000-
3fffaa5f0000 r-xp
00000000 00:00 0 [vdso]
3fffaa5f0000-
3fffaa620000 r-xp
00000000 08:02 1246 /lib/powerpc64le-linux-gnu/ld-2.19.so
3fffaa620000-
3fffaa630000 rw-p
00020000 08:02 1246 /lib/powerpc64le-linux-gnu/ld-2.19.so
3ffffc340000-
3ffffc370000 rw-p
00000000 00:00 0 [stack]
With this commit applied we don't do any special randomisation for the binary,
and instead rely on mmap randomisation. This means the binary ends up at high
addresses, eg:
$ cat /proc/$$/maps
3fff99820000-
3fff999d0000 r-xp
00000000 08:02 921 /lib/powerpc64le-linux-gnu/libc-2.19.so
3fff999d0000-
3fff999e0000 rw-p
001a0000 08:02 921 /lib/powerpc64le-linux-gnu/libc-2.19.so
3fff999f0000-
3fff99a00000 rw-p
00000000 00:00 0
3fff99a00000-
3fff99a20000 r-xp
00000000 00:00 0 [vdso]
3fff99a20000-
3fff99a50000 r-xp
00000000 08:02 1246 /lib/powerpc64le-linux-gnu/ld-2.19.so
3fff99a50000-
3fff99a60000 rw-p
00020000 08:02 1246 /lib/powerpc64le-linux-gnu/ld-2.19.so
3fff99a60000-
3fff99a90000 r-xp
00000000 08:02 129813 /bin/dash
3fff99a90000-
3fff99aa0000 rw-p
00020000 08:02 129813 /bin/dash
3fffc3de0000-
3fffc3e10000 rw-p
00000000 00:00 0 [stack]
3fffc55e0000-
3fffc5610000 rw-p
00000000 00:00 0 [heap]
Although this should be OK, it's possible it might break badly written
binaries that make assumptions about the address space layout.
Signed-off-by: Vineeth Vijayan <vvijayan@mvista.com>
[mpe: Rewrite changelog]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Gavin Shan [Wed, 12 Nov 2014 02:36:11 +0000 (13:36 +1100)]
powerpc/powernv: Fix potential zero devisor
If there're no PHBs under P5IOC2 HUB device tree node, we should
bail early to avoid zero devisor and allocating TCE tables.
Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Gavin Shan [Wed, 12 Nov 2014 02:36:10 +0000 (13:36 +1100)]
powerpc/powernv: Bail upon invalid master PE
When freezing compound PEs in pnv_ioda_freeze_pe(), we should bail
upon illegal master PE. We needn't freeze slave PE because it should
have been put into frozen state by hardware.
Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Gavin Shan [Wed, 12 Nov 2014 02:36:09 +0000 (13:36 +1100)]
powerpc/powernv: Simplify pnv_ioda_configure_pe()
Nested if statements are always bad and the patch avoids one by
checking PHB type and bail in advance if necessary.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Gavin Shan [Wed, 12 Nov 2014 02:36:08 +0000 (13:36 +1100)]
powerpc/powernv: Set PELTV for compound PEs
Commit
262af55 ("powerpc/powernv: Enable M64 aperatus for PHB3")
introduced compound PEs in order to support M64 aperatus on PHB3.
However, we never configured PELTV for compound PEs. The patch
fixes that by: parent PE can freeze all child compound PEs. Any
compound PE affects the group.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Gavin Shan [Wed, 12 Nov 2014 02:36:07 +0000 (13:36 +1100)]
powerpc/powernv: Initialize M64 PE in time
The patch initializes PE instance when reserving PE number to
keep consistent things as we did before. Also, it replaces the
iteration on bridge's windows with the prefered way.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Gavin Shan [Wed, 12 Nov 2014 02:36:06 +0000 (13:36 +1100)]
powerpc/powernv: Rename alloc_m64_pe() to reserve_m64_pe()
The patch renames alloc_m64_pe() to reserve_m64_pe() to reflect
its real usage: We reserve PE numbers for M64 segments in advance
and then pick up the reserved PE numbers when building the mapping
between PE numbers and M64 segments.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Gavin Shan [Wed, 12 Nov 2014 02:36:05 +0000 (13:36 +1100)]
powerpc/powernv: Fix condition to remove M64
The M64 resource should be removed if we don't have hook to
initialize it, or (not and) fail to do that.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Gavin Shan [Wed, 12 Nov 2014 02:36:04 +0000 (13:36 +1100)]
powerpc/powernv: Check PHB type in advance
The patch checks PHB type a bit early to save a bit cycles
for P7 because we don't support M64 for P7IOC no matter what
OPAL firmware we have.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Aneesh Kumar K.V [Wed, 5 Nov 2014 16:27:41 +0000 (21:57 +0530)]
powerpc/mm: Switch to generic RCU get_user_pages_fast
This patch switch the ppc arch to use the generic RCU based
gup implementation.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Aneesh Kumar K.V [Wed, 5 Nov 2014 16:27:40 +0000 (21:57 +0530)]
mm: Update generic gup implementation to handle hugepage directory
Update generic gup implementation with powerpc specific details.
On powerpc at pmd level we can have hugepte, normal pmd pointer
or a pointer to the hugepage directory.
Tested-by: Steve Capper <steve.capper@linaro.org>
Acked-by: Steve Capper <steve.capper@linaro.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Aneesh Kumar K.V [Wed, 5 Nov 2014 16:27:39 +0000 (21:57 +0530)]
powerpc/mm: Add missing pmd accessors
This patch add documentation and missing accessors.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Aneesh Kumar K.V [Sun, 2 Nov 2014 14:32:42 +0000 (20:02 +0530)]
powerpc: Disable CPU_FTR_TM if TM is disabled by firmware
Firmware is allowed to communicate to us via the "ibm,pa-features" property
that TM (Transactional Memory) support is disabled.
Currently this doesn't happen on any platform we're aware of, but we should
honor it anyway.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Martijn de Gouw [Tue, 5 Aug 2014 13:52:32 +0000 (15:52 +0200)]
powerpc/fsl-rio: add support for mapping inbound windows
Add support for mapping and unmapping of inbound rapidio windows. This
allows for drivers to open up a part of local memory on the rapidio
network. Also applications can use this and tranfer blocks of data
over the network.
Signed-off-by: Martijn de Gouw <martijn.de.gouw@prodrive-technologies.com>
[scottwood@freescale.com: updated commit message based on review]
Signed-off-by: Scott Wood <scottwood@freescale.com>
Scott Wood [Fri, 7 Nov 2014 02:56:07 +0000 (20:56 -0600)]
powerpc/fsl: Update fman dt binding with clock name and qbman link
The clock name "fmanclk" was given in the example, but not specified
in the binding itself. Made clock-names mandatory as otherwise there's
not much point having it.
Added a reference to the fsl,qman and fsl,bman properties proposed
in http://patchwork.ozlabs.org/patch/407034/ and
http://patchwork.ozlabs.org/patch/407035/
Signed-off-by: Scott Wood <scottwood@freescale.com>
Igal Liberman [Wed, 17 Sep 2014 11:08:30 +0000 (14:08 +0300)]
powerpc/fsl: Frame Manager Device Tree binding document
The Frame Manager (FMan) combines the Ethernet network interfaces with
packet distribution logic to provide intelligent distribution and
queuing decisions for incoming traffic at line rate.
This binding document describes Freescale's Frame Manager hardware
attributes that are used by the Frame Manager driver for its basic
initialization and configuration.
Signed-off-by: Igal Liberman <Igal.Liberman@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Emil Medve [Wed, 5 Nov 2014 15:18:54 +0000 (09:18 -0600)]
dt/bindings: Introduce the FSL QorIQ DPAA QMan portal(s)
Portals are memory mapped interfaces to QMan that allow low-latency,
lock-less interaction by software running on processor cores,
accelerators and network interfaces with the QMan
Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
Change-Id: I29764fa8093b5ce65460abc879446795c50d7185
Signed-off-by: Scott Wood <scottwood@freescale.com>
Emil Medve [Wed, 5 Nov 2014 15:18:53 +0000 (09:18 -0600)]
dt/bindings: Introduce the FSL QorIQ DPAA QMan
The Queue Manager is part of the Data-Path Acceleration Architecture
(DPAA). QMan supports queuing and QoS scheduling of frames to CPUs,
network interfaces and DPAA logic modules, maintains packet ordering
within flows. Besides providing flow-level queuing, is also
responsible for congestion management functions such as RED/WRED,
congestion notifications and tail discards. This binding covers the
CCSR space programming model
Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
Change-Id: I3acb223893e42003d6c9dc061db568ec0b10d29b
Signed-off-by: Scott Wood <scottwood@freescale.com>
Emil Medve [Wed, 5 Nov 2014 15:18:52 +0000 (09:18 -0600)]
dt/bindings: Introduce the FSL QorIQ DPAA BMan portal(s)
Portals are memory mapped interfaces to BMan that allow low-latency,
lock-less interaction by software running on processor cores,
accelerators and network interfaces with the BMan
Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
Change-Id: I6d245ffc14ba3d0e91d403ac7c3b91b75a9e6a95
Signed-off-by: Scott Wood <scottwood@freescale.com>
Emil Medve [Wed, 5 Nov 2014 15:18:51 +0000 (09:18 -0600)]
dt/bindings: Introduce the FSL QorIQ DPAA BMan
The Buffer Manager is part of the Data-Path Acceleration Architecture
(DPAA). BMan supports hardware allocation and deallocation of buffers
belonging to pools originally created by software with configurable
depletion thresholds. This binding covers the CCSR space programming
model
Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
Change-Id: I3ec479bfb3c91951e96902f091f5d7d2adbef3b2
Signed-off-by: Scott Wood <scottwood@freescale.com>
Michael Ellerman [Wed, 12 Nov 2014 05:54:54 +0000 (16:54 +1100)]
powerpc/xmon: Fix build when 4xx=y and 44x=n
dump_tlb_44x() is only defined when 44x=y, but the ifdef in xmon.c
checks for 4xx, leading to a build failure:
arch/powerpc/xmon/xmon.c:912:4: error: implicit declaration of function 'dump_tlb_44x'
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Wed, 12 Nov 2014 05:33:35 +0000 (16:33 +1100)]
Merge branch 'topic/opal-ipmi' into next
Boqun Feng [Tue, 11 Nov 2014 04:50:22 +0000 (12:50 +0800)]
powerpc: Fix comment typos in arch/powerpc/include/asm/bitops.h
In arch/powerpc/include/asm/bitops.h, the comments about bit numbers in
large (> 1 word) bitmaps have two typos:
- On ppc64 system, the LSB of the 4th word should be bit 192 rather than
196, because if it's bit 196, bit 192-195 will be missing in the
bitmap.
- On ppc32 system, the LSB of the second word should be bit 32 rather
than 31, because bit 31 is already in the first word.
This patch fixes these typos.
Signed-off-by: Boqun Feng <boqun.feng@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Jeremy Kerr [Thu, 6 Nov 2014 03:38:27 +0000 (11:38 +0800)]
powerpc/powernv: Add OPAL IPMI interface
Recent OPAL firmare adds a couple of functions to send and receive IPMI
messages:
https://github.com/open-power/skiboot/commit/
b2a374da
This change updates the token list and wrappers to suit, and adds the
platform devices for any IPMI interfaces.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Paul Mackerras [Mon, 3 Nov 2014 04:46:43 +0000 (15:46 +1100)]
powerpc: Fix compilation of emulate_step()
Commit
be96f63375a1 ("powerpc: Split out instruction analysis
part of emulate_step()") added some calls to do_fp_load()
and do_fp_store(), which fail to compile on configs with
CONFIG_PPC_FPU=n and CONFIG_PPC_EMULATE_SSTEP=y. This fixes
the compile by adding #ifdef CONFIG_PPC_FPU around the code
that calls these functions.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Suresh E. Warrier [Mon, 3 Nov 2014 04:46:42 +0000 (15:46 +1100)]
powerpc: Save/restore PPR for KVM hypercalls
The system call FLIH (first-level interrupt handler) at 0xc00
unconditionally sets hardware priority to medium. For hypercalls, this
means we lose guest OS priority. The front end (do_kvm_0x**) to the
KVM interrupt handler always assumes that PPR priority is saved in
PACA exception save area, so it copies this to the kvm_hstate
structure. For hypercalls, this would be the saved priority from any
previous exception. Eventually, the guest gets resumed with an
incorrect priority.
The fix is to save the PPR priority in PACA exception save area before
switching HMT priorities in the FLIH so that existing code described above
in the KVM interrupt handler can copy it from there into the VCPU's saved
context.
Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
[mpe: Dropped HMT_MEDIUM_PPR_DISCARD and reworded comment]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Gavin Shan [Wed, 12 Nov 2014 02:29:28 +0000 (13:29 +1100)]
powerpc/mm: Use PAGE_FACTOR
PAGE_FACTOR was defined to reflect the difference between configured
page size and fixed 4KB page size. Replace (PAGE_SHIFT - HW_PAGE_SHIFT)
with PAGE_FACTOR.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Anton Blanchard [Mon, 10 Nov 2014 22:12:28 +0000 (09:12 +1100)]
powerpc: Fix bad NULL pointer check in udbg_uart_getc_poll()
We have some code in udbg_uart_getc_poll() that tries to protect
against a NULL udbg_uart_in, but gets it all wrong.
Found with the LLVM static analyzer (scan-build).
Fixes:
309257484cc1 ("powerpc: Cleanup udbg_16550 and add support for LPC PIO-only UARTs")
Signed-off-by: Anton Blanchard <anton@samba.org>
[mpe: Add some newlines for readability while we're here]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Julia Lawall [Fri, 8 Aug 2014 10:07:48 +0000 (12:07 +0200)]
powerpc/gamecube/wii: delete unneeded test before of_node_put
Simplify the error path to avoid calling of_node_put when it is not needed.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Julia Lawall [Fri, 8 Aug 2014 10:07:47 +0000 (12:07 +0200)]
powerpc/pseries: delete unneeded test before of_node_put
Of_node_put supports NULL as its argument, so the initial test is not
necessary.
Suggested by Uwe Kleine-König.
The semantic patch that fixes this problem is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@@
expression e;
@@
-if (e)
of_node_put(e);
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Julia Lawall [Fri, 8 Aug 2014 10:07:46 +0000 (12:07 +0200)]
powerpc/mpc5xxx: delete unneeded test before of_node_put
Of_node_put supports NULL as its argument, so the initial test is not
necessary.
Suggested by Uwe Kleine-König.
The semantic patch that fixes this problem is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@@
expression e;
@@
-if (e)
of_node_put(e);
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Julia Lawall [Fri, 8 Aug 2014 10:07:45 +0000 (12:07 +0200)]
powerpc/fsl: fsl_soc: delete unneeded test before of_node_put
Of_node_put supports NULL as its argument, so the initial test is not
necessary.
Suggested by Uwe Kleine-König.
The semantic patch that fixes this problem is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@@
expression e;
@@
-if (e)
of_node_put(e);
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Julia Lawall [Fri, 8 Aug 2014 10:07:44 +0000 (12:07 +0200)]
powerpc/4xx/cpm: delete unneeded test before of_node_put
Simplify the error path to avoid calling of_node_put when it is not needed.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Anton Blanchard [Sun, 2 Nov 2014 21:34:01 +0000 (08:34 +1100)]
powerpc/pseries: Quieten relocation on exceptions warning
The H_SET_MODE hcall returns H_P2 if a function is not implemented
and all callers should handle this case.
The call to enable relocation on exceptions currently prints an error
message if the feature is not implemented. While H_SET_MODE was
first introduced on POWER8 (which has relocation on exceptions), it
has been now added on some POWER7 configurations (which does not).
Check for H_P2 and print an informational message instead.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Anton Blanchard [Sun, 2 Nov 2014 21:18:06 +0000 (08:18 +1100)]
powerpc/pseries: Quieten ibm,pcie-link-speed-stats warning
The ibm,pcie-link-speed-stats isn't mandatory, so we shouldn't print
a high priority error message when missing. One example where we see
this is QEMU.
Reduce it to pr_debug.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Anton Blanchard [Fri, 31 Oct 2014 03:47:27 +0000 (14:47 +1100)]
powerpc: LLVM complains about forward declaration of struct rtas_sensors
Move the declaration up to silence the warning.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Anton Blanchard [Fri, 31 Oct 2014 03:47:26 +0000 (14:47 +1100)]
powerpc: Remove double braces in alignment code.
Looks like I introduced this when adding LE support.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Anton Blanchard [Fri, 31 Oct 2014 03:47:25 +0000 (14:47 +1100)]
powerpc: Don't use local named register variable in current_thread_info
LLVM doesn't support local named register variables and is unlikely
to. current_thread_info is using one, fix it by moving it out and
calling it __current_r1().
I gave it a bit of an obscure name because we don't want anyone else
using it - they should use current_stack_pointer(). This specific
case is performance critical and we can't afford to call a function
to get it. Furthermore it isn't important to know exactly where in
the stack we are since we mask the lower bits.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Mon, 27 Oct 2014 09:56:14 +0000 (20:56 +1100)]
powerpc: Remove unused vgacon_remap_base & fix build break
The build is broken with CONFIG_PPC32=y, CONFIG_FB_VGA16=y and
CONFIG_VGA_CONSOLE=n.
The problem is that vgacon_remap_base is not defined. It's used in:
#define VGA_MAP_MEM(x,s) (x + vgacon_remap_base)
Which is used in the vga16fb.c code.
Digging down it seems vgacon_remap_base is never initialised. It used to
be, back in arch/ppc (pplus.c and prep_setup.c), but none of that code
ever made it to arch/powerpc.
So given it's been unused for >6 years, remove it.
Whether vga16fb.c works on 32-bit is another question, but this patch
shouldn't affect it.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Anton Blanchard [Thu, 30 Oct 2014 04:43:43 +0000 (15:43 +1100)]
powerpc/jump_label: Use HAVE_JUMP_LABEL
Commit
d4fe0965e208 ("powerpc/jump_label: use HAVE_JUMP_LABEL?")
missed a few conversions. Change the remaining uses of
CONFIG_JUMP_LABEL to HAVE_JUMP_LABEL.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Simon Kagstrom [Tue, 28 Oct 2014 11:19:00 +0000 (12:19 +0100)]
powerpc/boot: Parse chosen/cmdline-timeout parameter
On some platforms a 5 second timeout during boot might be quite long, so
make it configurable. Run the loop at least once to let the user stop
the boot by holding a key pressed. If the timeout is set to 0, don't
wait for input, which can be used as a workaround if the boot hangs on
random data coming in on the serial port.
Signed-off-by: Simon Kagstrom <simon.kagstrom@netinsight.net>
[mpe: Changelog wording & whitespace]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Wed, 6 Aug 2014 08:26:28 +0000 (18:26 +1000)]
powerpc: Remove unused CPU_FTRS_A2
In commit
fb5a515704d7 "Remove platforms/wsp and associated pieces" we
removed the last user of CPU_FTRS_A2, so we should remove it too.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Thu, 23 Oct 2014 05:35:14 +0000 (16:35 +1100)]
powerpc: Remove CPU_FTR_HVMODE from CPU_FTRS_ALWAYS
We potentially clear CPU_FTR_HVMODE at runtime in __init_hvmode_206(),
so we must make sure it's not set in CPU_FTRS_ALWAYS.
This doesn't hurt us in practice at the moment, because we don't support
compiling only for CPUs that support CPU_FTR_HVMODE.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Jiri Slaby [Tue, 29 Apr 2014 07:24:06 +0000 (09:24 +0200)]
powerpc/ftrace: Fix obsolete comment
CONFIG_MCOUNT is not defined anymore, the corresponding #ifdef there
is CONFIG_FUNCTION_TRACER.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Kyle McMartin [Sat, 25 May 2013 16:54:25 +0000 (12:54 -0400)]
powerpc: Remove unused devm_ioremap_prot()
Added in 2008, but has never had any in-tree users, and no other
architectures provide it.
Signed-off-by: Kyle McMartin <kyle@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Anton Blanchard [Wed, 17 Sep 2014 07:07:04 +0000 (17:07 +1000)]
powerpc/ftrace: simplify prepare_ftrace_return
Instead of passing in the stack address of the link register
to be modified, just pass in the old value and return the
new value and rely on ftrace_graph_caller to do the
modification.
This removes the exception handling around the stack update -
it isn't needed and we weren't consistent about it. Later on
we would do an unprotected modification:
if (!ftrace_graph_entry(&trace)) {
*parent = old;
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Anton Blanchard [Wed, 17 Sep 2014 07:07:03 +0000 (17:07 +1000)]
powerpc/ftrace: Remove mod_return_to_handler
mod_return_to_handler is the same as return_to_handler, except
it handles the change of the TOC (r2). Add this into
return_to_handler and remove mod_return_to_handler.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Anton Blanchard [Wed, 17 Sep 2014 12:15:37 +0000 (22:15 +1000)]
powerpc: make __ffs return unsigned long
I'm seeing a build warning in mm/nobootmem.c after removing
bootmem:
mm/nobootmem.c: In function '__free_pages_memory':
include/linux/kernel.h:713:17: warning: comparison of distinct pointer types lacks a cast [enabled by default]
(void) (&_min1 == &_min2); \
^
mm/nobootmem.c:90:11: note: in expansion of macro 'min'
order = min(MAX_ORDER - 1UL, __ffs(start));
^
The rest of the worlds seems to define __ffs as returning unsigned long,
so lets do that.
Signed-off-by: Anton Blanchard <anton@samba.org>
Tested-by: Emil Medve <Emilian.Medve@Freescale.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Anton Blanchard [Wed, 17 Sep 2014 12:15:36 +0000 (22:15 +1000)]
powerpc: Move sparse_init() into initmem_init
We did part of sparse initialisation in setup_arch and part in
initmem_init. Put them together.
Signed-off-by: Anton Blanchard <anton@samba.org>
Tested-by: Emil Medve <Emilian.Medve@Freescale.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Anton Blanchard [Wed, 17 Sep 2014 12:15:35 +0000 (22:15 +1000)]
powerpc: Remove superfluous bootmem includes
Lots of places included bootmem.h even when not using bootmem.
Signed-off-by: Anton Blanchard <anton@samba.org>
Tested-by: Emil Medve <Emilian.Medve@Freescale.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Anton Blanchard [Wed, 17 Sep 2014 12:15:34 +0000 (22:15 +1000)]
powerpc: Remove some old bootmem related comments
Now bootmem is gone from powerpc we can remove comments mentioning it.
Signed-off-by: Anton Blanchard <anton@samba.org>
Tested-by: Emil Medve <Emilian.Medve@Freescale.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Anton Blanchard [Wed, 17 Sep 2014 12:15:33 +0000 (22:15 +1000)]
powerpc: Remove bootmem allocator
At the moment we transition from the memblock alloctor to the bootmem
allocator. Gitting rid of the bootmem allocator removes a bunch of
complicated code (most of which I owe the dubious honour of being
responsible for writing).
Signed-off-by: Anton Blanchard <anton@samba.org>
Tested-by: Emil Medve <Emilian.Medve@Freescale.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Emil Medve [Thu, 6 Nov 2014 15:48:13 +0000 (09:48 -0600)]
powerpc/dts: Add node(s) for the platform PLL
Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
Change-Id: If76cd705a01813abe53396c1486bc13c4289ee92
Signed-off-by: Scott Wood <scottwood@freescale.com>
Emil Medve [Thu, 6 Nov 2014 15:48:12 +0000 (09:48 -0600)]
dt/bindings: qoriq-clock: Add binding for the platform PLL
Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
Change-Id: I7950afa9650d15ec7ce2cca89bb2a1e38586d4a5
Signed-off-by: Scott Wood <scottwood@freescale.com>
Emil Medve [Thu, 6 Nov 2014 15:48:11 +0000 (09:48 -0600)]
powerpc/dts: Factorize the clock control node
Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
Change-Id: I25ce24a25862b4ca460164159867abefe00ccdd1
Signed-off-by: Scott Wood <scottwood@freescale.com>
Hongtao Jia [Wed, 5 Nov 2014 06:59:53 +0000 (14:59 +0800)]
powerpc: Add INA220 to device tree for supported boards
Including: P3041DS P5020DS P5040DS B4QDS
Signed-off-by: Jia Hongtao <hongtao.jia@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Hongtao Jia [Wed, 5 Nov 2014 06:59:52 +0000 (14:59 +0800)]
powerpc: Add ADT7461 to device tree for supported boards
Including: T104xRDB T208xQDS B4QDS
Signed-off-by: Jia Hongtao <hongtao.jia@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Igal Liberman [Thu, 30 Oct 2014 09:15:47 +0000 (11:15 +0200)]
powerpc/fsl: Added rcw registers to global utility registers
The RCW registers are required for the future clock binding implementation.
Signed-off-by: Igal Liberman <Igal.Liberman@freescale.com>
Change-Id: Ic36dd8bc2959aa7f97fb6fd7bbb8420822fef0a9
Signed-off-by: Scott Wood <scottwood@freescale.com>
Ashish Kumar [Tue, 7 Oct 2014 12:34:36 +0000 (18:04 +0530)]
powerpc/mpc85xx: Remove SPI and NAND partition from bsc9131rdb.dtsi
* Run "mtdparts default" on u-boot to create dynamic partitions
* Or use dynamic mtd partition with the help of bootargs in u-boot
Append bootargs with:
"mtdparts=
ff800000.flash:1m(nand_uboot),512K(nand_dtb),8m(nand_kernel),-(fs);\
spiff707000.0:1m(spi_uboot),4m(spi_kernel),512k(spi_dtb),-(fs)'"
Signed-off-by: Ashish Kumar <Ashish.Kumar@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Paul Bolle [Wed, 24 Sep 2014 08:06:19 +0000 (10:06 +0200)]
powerpc/8xx: Remove Kconfig symbol FADS
Commit
39eb56da2b53 ("pcmcia: Remove m8xx_pcmcia driver") removed the
only driver that used CONFIG_FADS. Setting the Kconfig symbol FADS is
pointless since that commit. Remove it.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Scott Wood <scottwood@freescale.com>
LEROY Christophe [Fri, 19 Sep 2014 08:36:10 +0000 (10:36 +0200)]
powerpc/8xx: Invalidate non present TLB as early as possible
8xx sometimes need to load a invalid/non-present TLBs in
it DTLB asm handler.
These must be invalidated separaly as linux mm doesn't.
Commit
5efab4a02c89c252fb4cce097aafde5f8208dbfe was invalidating them in
arch/powerpc/mm/fault.c.
This patch does the invalidation earlier in order to free the TLB as soon as
possible. This also has the advantage of removing some 8xx specific code from
fault.c
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>