GitHub/LineageOS/android_kernel_motorola_exynos9610.git
10 years agopowerpc/eeh: Export eeh_iommu_group_to_pe()
Gavin Shan [Thu, 7 Aug 2014 02:47:16 +0000 (12:47 +1000)]
powerpc/eeh: Export eeh_iommu_group_to_pe()

The function is used by VFIO driver, which might be built as a
dynamic module. So it should be exported.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc/eeh: Add missing #ifdef CONFIG_IOMMU_API
Benjamin Herrenschmidt [Tue, 5 Aug 2014 08:52:59 +0000 (18:52 +1000)]
powerpc/eeh: Add missing #ifdef CONFIG_IOMMU_API

Some new functions are exposed for use by the IOMMU code but
won't build when CONFIG_IOMMU_API isn't set, so shield them
appropriately.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc: Reduce scariness of interrupt frames in stack traces
Paul Mackerras [Thu, 12 Jun 2014 06:53:08 +0000 (16:53 +1000)]
powerpc: Reduce scariness of interrupt frames in stack traces

Some people see things like "Exception: 501" in stack traces in dmesg
and assume that means that something has gone badly wrong, when in
fact "Exception: 501" just means a device interrupt was taken.
This changes "Exception" to "interrupt" to make it clearer that we
are just recording the fact of a change in control flow rather than
some error condition.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc: start loop at section start of start in vmemmap_populated()
Li Zhong [Wed, 11 Jun 2014 08:23:39 +0000 (16:23 +0800)]
powerpc: start loop at section start of start in vmemmap_populated()

vmemmap_populated() checks whether the [start, start + page_size) has valid
pfn numbers, to know whether a vmemmap mapping has been created that includes
this range.

Some range before end might not be checked by this loop:
  sec11start......start11..sec11end/sec12start..end....start12..sec12end
as the above, for start11(section 11), it checks [sec11start, sec11end), and
loop ends as the next start(start12) is bigger than end. However,
[sec11end/sec12start, end) is not checked here.

So before the loop, adjust the start to be the start of the section, so we don't miss ranges like the above.

After we adjust start to be the start of the section, it also means it's
aligned with vmemmap as of the sizeof struct page, so we could use
page_to_pfn directly in the loop.

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Acked-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc: implement vmemmap_free()
Li Zhong [Wed, 11 Jun 2014 08:23:38 +0000 (16:23 +0800)]
powerpc: implement vmemmap_free()

vmemmap_free() does the opposite of vmemap_populate().
This patch also puts vmemmap_free() and vmemmap_list_free() into
 CONFIG_MEMMORY_HOTPLUG.

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Acked-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc: implement vmemmap_remove_mapping() for BOOK3S
Li Zhong [Wed, 11 Jun 2014 08:23:37 +0000 (16:23 +0800)]
powerpc: implement vmemmap_remove_mapping() for BOOK3S

This is to be called in vmemmap_free(), leave the implementation on BOOK3E
empty as before.

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Acked-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc: implement vmemmap_list_free()
Li Zhong [Wed, 11 Jun 2014 08:23:36 +0000 (16:23 +0800)]
powerpc: implement vmemmap_list_free()

This patch implements vmemmap_list_free() for vmemmap_free().

The freed entries will be removed from vmemmap_list, and form a freed list,
with next as the header. The next position in the last allocated page is kept
at the list tail.

When allocation, if there are freed entries left, get it from the freed list;
if no freed entries left, get it like before from the last allocated pages.

With this change, realmode_pfn_to_page() also needs to be changed to walk
all the entries in the vmemmap_list, as the virt_addr of the entries might not
be stored in order anymore.

It helps to reuse the memory when continuous doing memory hot-plug/remove
operations, but didn't reclaim the pages already allocated, so the memory usage
will only increase, but won't exceed the value for the largest memory
configuration.

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Acked-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc: Fail remap_4k_pfn() if PFN doesn't fit inside PTE
Madhusudanan Kandasamy [Thu, 10 Jul 2014 15:15:13 +0000 (20:45 +0530)]
powerpc: Fail remap_4k_pfn() if PFN doesn't fit inside PTE

remap_4k_pfn() silently truncates upper bits of input 4K PFN
if it cannot be contained in PTE. This leads invalid memory mapping and could
result in a system crash when the memory is accessed. This patch fails
remap_4k_pfn() and returns -EINVAL if the input 4K PFN cannot be contained in
PTE.

V3 : Added parentheses to protect 'pfn' and entire macro as suggested by Brian.
V2 : Rewritten to avoid helper function as suggested by Stephen Rothwell.

Signed-off-by: Madhusudanan Kandasamy <kmadhu@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc/book3s: Fix endianess issue for HMI handling on napping cpus.
Mahesh Salgaonkar [Thu, 31 Jul 2014 12:47:52 +0000 (18:17 +0530)]
powerpc/book3s: Fix endianess issue for HMI handling on napping cpus.

(NOTE: This patch depends on upstream HMI handling patchset at
 https://lists.ozlabs.org/pipermail/linuxppc-dev/2014-July/119731.html)

The current HMI handling on napping cpus does not take care of endianess
issue. On LE host kernel when we wake up from nap due to HMI interrupt we
would checkstop while jumping into opal call. There is a similar issue in
case of fast sleep wakeup where the code invokes opal_resync_tb opal call
without handling LE issue. This patch fixes that as well.

With this patch applied, HMIs handling on LE host kernel works fine.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc/book3s: handle HMIs for cpus in nap mode.
Mahesh Salgaonkar [Tue, 29 Jul 2014 13:10:13 +0000 (18:40 +0530)]
powerpc/book3s: handle HMIs for cpus in nap mode.

HMIs are thread specific and can come while thread is in sleep/nap mode.
Hence with SMT=off mode we can receive HMIs on sleeping threads. For
interrupt received in nap mode, cpu wakes up at system reset vector, clears
the interrupt and go back to nap mode again. But HMIs are sticky and they
keep happening until we clear reason bits from HMER. Hence add a special
check for HMI in reset vector (through power7_wakeup_* functions) and
invoke opal call to handle HMI.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc/powernv: Invoke opal call to handle hmi.
Mahesh Salgaonkar [Tue, 29 Jul 2014 13:10:07 +0000 (18:40 +0530)]
powerpc/powernv: Invoke opal call to handle hmi.

When we hit the HMI in Linux, invoke opal call to handle/recover from HMI
errors in real mode and then in virtual mode during check_irq_replay()
invoke opal_poll_events()/opal_do_notifier() to retrieve HMI event from
OPAL and act accordingly.

Now that we are ready to handle HMI interrupt directly in linux, remove
the HMI interrupt registration with firmware.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc/book3s: Add basic infrastructure to handle HMI in Linux.
Mahesh Salgaonkar [Tue, 29 Jul 2014 13:10:01 +0000 (18:40 +0530)]
powerpc/book3s: Add basic infrastructure to handle HMI in Linux.

Handle Hypervisor Maintenance Interrupt (HMI) in Linux. This patch implements
basic infrastructure to handle HMI in Linux host. The design is to invoke
opal handle hmi in real mode for recovery and set irq_pending when we hit HMI.
During check_irq_replay pull opal hmi event and print hmi info on console.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc/iommu: Fix comments with it_page_shift
Alexey Kardashevskiy [Tue, 15 Jul 2014 09:24:25 +0000 (19:24 +1000)]
powerpc/iommu: Fix comments with it_page_shift

There is a couple of commented debug prints which still use
IOMMU_PAGE_SHIFT() which is not defined for POWERPC anymore, replace
them with it_page_shift.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc/powernv: Handle compound PE in config accessors
Gavin Shan [Mon, 21 Jul 2014 04:42:35 +0000 (14:42 +1000)]
powerpc/powernv: Handle compound PE in config accessors

The PCI config accessors check for PE frozen state and clear it if
EEH isn't functional. The patch handles compound PE in config accessors
if PHB supports it. For consistency, all PEs will be put into frozen
state if any one in compound group gets frozen by hardware.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc/powernv: Handle compound PE for EEH
Gavin Shan [Mon, 21 Jul 2014 04:42:34 +0000 (14:42 +1000)]
powerpc/powernv: Handle compound PE for EEH

The patch handles compound PE for EEH backend. If one specific
PE in compound group has been frozen, we enforces to freeze
all PEs in the group. If we're enable DMA or MMIO for one PE
in compound group, DMA or MMIO of all PEs in the group will be
enabled.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc/powernv: Handle compound PE
Gavin Shan [Mon, 21 Jul 2014 04:42:33 +0000 (14:42 +1000)]
powerpc/powernv: Handle compound PE

The patch introduces 3 PHB callbacks: compound PE state retrieval,
force freezing and unfreezing compound PE. The PCI config accessors
and PowerNV EEH backend can use them in subsequent patches.

We don't export the capability of compound PE to EEH core, which
helps avoiding more complexity to EEH core.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc/powernv: Split ioda_eeh_get_state()
Gavin Shan [Mon, 21 Jul 2014 04:42:32 +0000 (14:42 +1000)]
powerpc/powernv: Split ioda_eeh_get_state()

Function ioda_eeh_get_state() is used to fetch EEH state for PHB
or PE. We're going to support compound PE and the function becomes
more complicated with that. The patch splits the function into two
functions for PHB and PE cases separately to improve readability.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc/powernv: Allow to freeze PE
Gavin Shan [Mon, 21 Jul 2014 04:42:31 +0000 (14:42 +1000)]
powerpc/powernv: Allow to freeze PE

The patch synchronizes header file with firmware to have new OPAL
API opal_pci_eeh_freeze_set(), which is used to freeze the specified
PE in order to support "compound" PE.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc/powernv: Enable M64 aperatus for PHB3
Guo Chao [Mon, 21 Jul 2014 04:42:30 +0000 (14:42 +1000)]
powerpc/powernv: Enable M64 aperatus for PHB3

This patch enables M64 aperatus for PHB3.

We already had platform hook (ppc_md.pcibios_window_alignment) to affect
the PCI resource assignment done in PCI core so that each PE's M32 resource
was built on basis of M32 segment size. Similarly, we're using that for
M64 assignment on basis of M64 segment size.

   * We're using last M64 BAR to cover M64 aperatus, and it's shared by all
     256 PEs.
   * We don't support P7IOC yet. However, some function callbacks are added
     to (struct pnv_phb) so that we can reuse them on P7IOC in future.
   * PE, corresponding to PCI bus with large M64 BAR device attached, might
     span multiple M64 segments. We introduce "compound" PE to cover the case.
     The compound PE is a list of PEs and the master PE is used as before.
     The slave PEs are just for MMIO isolation.

Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc/eeh: Aux PE data for error log
Gavin Shan [Thu, 17 Jul 2014 04:41:43 +0000 (14:41 +1000)]
powerpc/eeh: Aux PE data for error log

The patch allows PE (struct eeh_pe) instance to have auxillary data,
whose size is configurable on basis of platform. For PowerNV, the
auxillary data will be used to cache PHB diag-data for that PE
(frozen PE or fenced PHB). In turn, we can retrieve the diag-data
at any later points.

It's useful for the case of VFIO PCI devices where the error log
should be cached, and then be retrieved by the guest at later point.
Also, it can avoid PHB diag-data overwritting if another frozen PE
reported and the previous diag-data isn't fetched by guest.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc/eeh: Make diag-data not endian dependent
Gavin Shan [Thu, 17 Jul 2014 04:41:42 +0000 (14:41 +1000)]
powerpc/eeh: Make diag-data not endian dependent

It's followup of commit ddf0322a ("powerpc/powernv: Fix endianness
problems in EEH"). The patch helps to get non-endian-dependent
diag-data.

Cc: Guo Chao <yan@linux.vnet.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc/eeh: Replace pr_warning() with pr_warn()
Gavin Shan [Thu, 17 Jul 2014 04:41:41 +0000 (14:41 +1000)]
powerpc/eeh: Replace pr_warning() with pr_warn()

pr_warn() is equal to pr_warning(), but the former is a bit more
formal according to commit fc62f2f ("kernel.h: add pr_warn for
symmetry to dev_warn, netdev_warn").

The patch replaces pr_warning() with pr_warn().

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc/eeh: Reduce lines of log dump
Gavin Shan [Thu, 17 Jul 2014 04:41:40 +0000 (14:41 +1000)]
powerpc/eeh: Reduce lines of log dump

The patch prints 4 PCIE or AER config registers each line, which
is part of the EEH log so that it looks a bit more compact.

Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc/eeh: Selectively enable IO for error log
Gavin Shan [Thu, 17 Jul 2014 04:41:39 +0000 (14:41 +1000)]
powerpc/eeh: Selectively enable IO for error log

According to the experiment I did, PCI config access is blocked
on P7IOC frozen PE by hardware, but PHB3 doesn't do that. That
means we always get 0xFF's while dumping PCI config space of the
frozen PE on P7IOC. We don't have the problem on PHB3. So we have
to enable I/O prioir to collecting error log. Otherwise, meaningless
0xFF's are always returned.

The patch fixes it by EEH flag (EEH_ENABLE_IO_FOR_LOG), which is
selectively set to indicate the case for: P7IOC on PowerNV platform,
pSeries platform.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc/eeh: Refactor EEH flag accessors
Gavin Shan [Thu, 17 Jul 2014 04:41:38 +0000 (14:41 +1000)]
powerpc/eeh: Refactor EEH flag accessors

There are multiple global EEH flags. Almost each flag has its own
accessor, which doesn't make sense. The patch refactors EEH flag
accessors so that they look unified:

  eeh_add_flag():   Add EEH flag
  eeh_clear_flag(): Clear EEH flag
  eeh_has_flag():   Check if one specific flag has been set
  eeh_enabled():    Check if EEH functionality has been enabled

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc/eeh: Fetch IOMMU table in reliable way
Gavin Shan [Tue, 15 Jul 2014 07:00:56 +0000 (17:00 +1000)]
powerpc/eeh: Fetch IOMMU table in reliable way

Function eeh_iommu_group_to_pe() iterates each PCI device to check
the binding IOMMU group with get_iommu_table_base(), which possibly
fetches pdev->dev.archdata.dma_data.dma_offset. It's (0x1 << 59)
for "bypass" cases.

The patch fixes the issue by iterating devices hooked to the IOMMU
group and fetch IOMMU table there.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc/powernv: Fix IOMMU table for VFIO dev
Gavin Shan [Tue, 15 Jul 2014 07:00:55 +0000 (17:00 +1000)]
powerpc/powernv: Fix IOMMU table for VFIO dev

On PHB3, PCI devices can bypass IOMMU for DMA access. If we pass
through one PCI device, whose hose driver ever enable the bypass
mode, pdev->dev.archdata.dma_data.iommu_table_base isn't IOMMU
table. However, EEH needs access the IOMMU table when the device
is owned by guest.

The patch fixes pdev->dev.archdata.dma_data.iommu_table when
passing through the device to guest in pnv_pci_ioda2_set_bypass().

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc/eeh: Wrong place to call pci_get_slot()
Mike Qiu [Tue, 15 Jul 2014 05:42:22 +0000 (01:42 -0400)]
powerpc/eeh: Wrong place to call pci_get_slot()

pci_get_slot() is called with hold of PCI bus semaphore and it's not
safe to be called in interrupt context. However, we possibly checks
EEH error and calls the function in interrupt context. To avoid using
pci_get_slot(), we turn into device tree for fetching location code.
Otherwise, we might run into WARN_ON() as following messages indicate:

 WARNING: at drivers/pci/search.c:223
 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.0-rc3+ #72
 task: c000000001367af0 ti: c000000001444000 task.ti: c000000001444000
 NIP: c000000000497b70 LR: c000000000037530 CTR: 000000003003d114
 REGS: c000000001446fa0 TRAP: 0700   Not tainted  (3.16.0-rc3+)
 MSR: 9000000000029032 <SF,HV,EE,ME,IR,DR,RI>  CR: 48002422  XER: 20000000
 CFAR: c00000000003752c SOFTE: 0
   :
 NIP [c000000000497b70] .pci_get_slot+0x40/0x110
 LR [c000000000037530] .eeh_pe_loc_get+0x150/0x190
 Call Trace:
   .of_get_property+0x30/0x60 (unreliable)
   .eeh_pe_loc_get+0x150/0x190
   .eeh_dev_check_failure+0x1b4/0x550
   .eeh_check_failure+0x90/0xf0
   .lpfc_sli_check_eratt+0x504/0x7c0 [lpfc]
   .lpfc_poll_eratt+0x64/0x100 [lpfc]
   .call_timer_fn+0x64/0x190
   .run_timer_softirq+0x2cc/0x3e0

Cc: stable@vger.kernel.org
Signed-off-by: Mike Qiu <qiudayu@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc: Fix wrong defintion in boot/io.h
Lucas Tanure [Thu, 24 Jul 2014 13:24:06 +0000 (10:24 -0300)]
powerpc: Fix wrong defintion in boot/io.h

Fix wrong __IO_H definition in boot/io.h

Reported-by: Fernando Silveira <fsilveira@gmail.com>
Signed-off-by: Lucas Tanure <tanure@linux.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc/pci: Reorder pci bus/bridge unregistration during PHB removal
Tyrel Datwyler [Tue, 29 Jul 2014 17:48:13 +0000 (13:48 -0400)]
powerpc/pci: Reorder pci bus/bridge unregistration during PHB removal

Commit bcdde7e made __sysfs_remove_dir() recursive and introduced a BUG_ON
during PHB removal while attempting to delete the power managment attribute
group of the bus. This is a result of tearing the bridge and bus devices down
out of order in remove_phb_dynamic. Since, the the bus resides below the bridge
in the sysfs device tree it should be torn down first.

This patch simply moves the device_unregister call for the PHB bridge device
after the device_unregister call for the PHB bus.

Fixes: bcdde7e221a8 ("sysfs: make __sysfs_remove_dir() recursive")
Cc: stable@vger.kernel.org
Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc/powernv: Update dev->dma_mask in pci_set_dma_mask() path
Brian W Hart [Thu, 31 Jul 2014 19:24:37 +0000 (14:24 -0500)]
powerpc/powernv: Update dev->dma_mask in pci_set_dma_mask() path

powerpc defines various machine-specific routines for handling
pci_set_dma_mask().  The routines for machine "PowerNV" may neglect
to set dev->dma_mask.  This could confuse anyone (e.g. drivers) that
consult dev->dma_mask to find the current mask.  Set the dma_mask in
the PowerNV leaf routine.

Signed-off-by: Brian W. Hart <hartb@linux.vnet.ibm.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc/64e: Add __ref to early_alloc_pgtable()
Scott Wood [Sat, 2 Aug 2014 03:07:40 +0000 (22:07 -0500)]
powerpc/64e: Add __ref to early_alloc_pgtable()

This silences a section mismatch warning.  early_alloc_pgtable() is
called from map_kernel_page() which cannot be __init, but only when
slab_is_available() returns false which can only happen during early
boot.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc/cpuidle: Fix parsing of idle state flags from device-tree
Vaidyanathan Srinivasan [Sun, 3 Aug 2014 07:53:08 +0000 (13:23 +0530)]
powerpc/cpuidle: Fix parsing of idle state flags from device-tree

Flags from device-tree need to be parsed with accessors for
interpreting correct value in little-endian.

Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Reviewed-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc/cpufreq: Add pr_warn() on OPAL firmware failures
Vaidyanathan Srinivasan [Sun, 3 Aug 2014 09:24:05 +0000 (14:54 +0530)]
powerpc/cpufreq: Add pr_warn() on OPAL firmware failures

Cpufreq depends on platform firmware to implement PStates.  In case of
platform firmware failure, cpufreq should not panic host kernel with
BUG_ON().  Less severe pr_warn() will suffice.

Add firmware_has_feature(FW_FEATURE_OPALv3) check to
skip probing for device-tree on non-powernv platforms.

Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Acked-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc/mm/numa: Fix break placement
Andrey Utkin [Mon, 4 Aug 2014 20:13:10 +0000 (23:13 +0300)]
powerpc/mm/numa: Fix break placement

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=81631
Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Andrey Utkin <andrey.krieger.utkin@gmail.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc/eeh: sysfs entries lost
Mike Qiu [Thu, 26 Jun 2014 06:58:47 +0000 (02:58 -0400)]
powerpc/eeh: sysfs entries lost

The sysfs entries are lost because of commit 2213fb1 ("powerpc/eeh:
Skip eeh sysfs when eeh is disabled"). That commit added condition
to create sysfs entries with EEH_ENABLED, which isn't populated
when trying to create sysfs entries on PowerNV platform during system
boot time. The patch fixes the issue by:

   * Reoder EEH initialization functions so that they're same on
     PowerNV/pSeries.
   * Cache PE's primary bus by PowerNV platform instead of EEH core
     to avoid kernel crash caused by the function reorder. Another
     benefit with this is to avoid one eeh_probe_mode_dev() in EEH
     core.

Signed-off-by: Mike Qiu <qiudayu@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agodrivers/vfio: EEH support for VFIO PCI device
Gavin Shan [Tue, 10 Jun 2014 01:41:57 +0000 (11:41 +1000)]
drivers/vfio: EEH support for VFIO PCI device

The patch adds new IOCTL commands for sPAPR VFIO container device
to support EEH functionality for PCI devices, which have been passed
through from host to somebody else via VFIO.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Alexander Graf <agraf@suse.de>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc/eeh: EEH support for VFIO PCI device
Gavin Shan [Tue, 10 Jun 2014 01:41:56 +0000 (11:41 +1000)]
powerpc/eeh: EEH support for VFIO PCI device

The patch exports functions to be used by new VFIO ioctl command,
which will be introduced in subsequent patch, to support EEH
functinality for VFIO PCI devices.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc/eeh: Avoid event on passed PE
Gavin Shan [Tue, 10 Jun 2014 01:41:55 +0000 (11:41 +1000)]
powerpc/eeh: Avoid event on passed PE

We must not handle EEH error on devices which are passed to somebody
else. Instead, we expect that the frozen device owner detects an EEH
error and recovers from it.

This avoids EEH error handling on passed through devices so the device
owner gets a chance to handle them.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agoMerge remote-tracking branch 'scott/next' into next
Benjamin Herrenschmidt [Tue, 5 Aug 2014 04:13:41 +0000 (14:13 +1000)]
Merge remote-tracking branch 'scott/next' into next

Scott writes:

Highlights include e6500 hardware threading support, an e6500 TLB erratum
workaround, corenet error reporting, support for a new board, and some
minor fixes.

10 years agopowerpc/t2080rdb: Add T2080RDB board support
Shengzhou Liu [Tue, 8 Jul 2014 06:52:33 +0000 (14:52 +0800)]
powerpc/t2080rdb: Add T2080RDB board support

T2080PCIe-RDB is a Freescale Reference Design Board that hosts T2080 SoC.
The board feature overview:
Processor:
 - T2080 SoC integrating four 64-bit dual-threads e6500 cores up to 1.8GHz
DDR Memory:
 - Single memory controller capable of supporting DDR3 and DDR3-LP devices
 - 72bit 4GB DDR3-LP SODIMM in slot
Ethernet interfaces:
 - Two 1Gbps RGMII ports on-board
 - Two 10Gbps SFP+ ports on-board
 - Two 10Gbps Base-T ports on-board
Accelerator:
 - DPAA components consist of FMan, BMan, QMan, PME, DCE and SEC
IFC/Local Bus
 - NOR:  128MB 16-bit NOR flash
 - NAND: 1GB 8-bit NAND flash
 - CPLD: for system controlling with programable header on-board
eSPI:
 - 64MB N25Q512 SPI flash
USB:
 - Two USB2.0 ports with internal PHY (both Type-A)
PCIe:
 - One PCIe x4 goldfinger(support SR-IOV)
 - One PCIe x4 slot
 - One PCIe x2 end-point device (C293 crypto co-processor)
SATA:
 - Two SATA 2.0 ports on-board
SDHC:
 - support a MicroSD/TF card on-board
I2C:
 - Four I2C controllers.
UART:
 - Dual 4-pins UART serial ports

Signed-off-by: Shengzhou Liu <Shengzhou.Liu@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
10 years agopowerpc/85xx: Add binding for CPLD
Priyanka Jain [Wed, 9 Jul 2014 03:47:07 +0000 (09:17 +0530)]
powerpc/85xx: Add binding for CPLD

Some Freescale boards like T1040RDB have an on board CPLD connected on
the IFC bus. Add binding for cpld in board.txt file

Signed-off-by: Priyanka Jain <Priyanka.Jain@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
10 years agopowerpc/fsl-pci: Correct use of ! and &
Himangi Saraogi [Sat, 19 Jul 2014 21:49:59 +0000 (03:19 +0530)]
powerpc/fsl-pci: Correct use of ! and &

In commit ae91d60ba88ef0bdb1b5e9b2363bd52fc45d2af7, a bug was fixed that
involved converting !x & y to !(x & y).  The code below shows the same
pattern, and thus should perhaps be fixed in the same way.

This is not tested and clearly changes the semantics, so it is only
something to consider.

The Coccinelle semantic patch that makes this change is as follows:

// <smpl>
@@ expression E1,E2; @@
(
  !E1 & !E2
|
- !E1 & E2
+ !(E1 & E2)
)
// </smpl>

Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
10 years agopowerpc/mpic_msgr: Use kcalloc and correct the argument to sizeof
Himangi Saraogi [Sat, 19 Jul 2014 21:53:35 +0000 (03:23 +0530)]
powerpc/mpic_msgr: Use kcalloc and correct the argument to sizeof

mpic_msgrs has type struct mpic_msgr **, not struct mpic_msgr *, so the
elements of the array should have pointer type, not structure type.
The advantage of kcalloc is, that will prevent integer overflows which
could result from the multiplication of number of elements and size and
it is also a bit nicer to read.

The Coccinelle semantic patch that makes the first change is as follows:

// <smpl>
@disable sizeof_type_expr@
type T;
T **x;
@@

  x =
  <+...sizeof(
- T
+ *x
  )...+>
// </smpl>

Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
10 years agomemory: Freescale CoreNet Coherency Fabric error reporting driver
Scott Wood [Wed, 2 Jul 2014 23:52:11 +0000 (18:52 -0500)]
memory: Freescale CoreNet Coherency Fabric error reporting driver

The CoreNet Coherency Fabric is part of the memory subsystem on
some Freescale QorIQ chips.  It can report coherency violations (e.g.
due to misusing memory that is mapped noncoherent) as well as
transactions that do not hit any local access window, or which hit a
local access window with an invalid target ID.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Reviewed-by: Bharat Bhushan <bharat.bhushan@freescale.com>
10 years agopowerpc/e6500: Work around erratum A-008139
Scott Wood [Wed, 11 Jun 2014 21:09:32 +0000 (16:09 -0500)]
powerpc/e6500: Work around erratum A-008139

Erratum A-008139 can cause duplicate TLB entries if an indirect
entry is overwritten using tlbwe while the other thread is using it to
do a lookup.  Work around this by using tlbilx to invalidate prior
to overwriting.

To avoid the need to save another register to hold MAS1 during the
workaround code, TID clearing has been moved from tlb_miss_kernel_e6500
until after the SMT section.

Signed-off-by: Scott Wood <scottwood@freescale.com>
10 years agopowerpc/e6500: Add support for hardware threads
Andy Fleming [Thu, 8 Dec 2011 07:20:27 +0000 (01:20 -0600)]
powerpc/e6500: Add support for hardware threads

The general idea is that each core will release all of its
threads into the secondary thread startup code, which will
eventually wait in the secondary core holding area, for the
appropriate bit in the PACA to be set. The kick_cpu function
pointer will set that bit in the PACA, and thus "release"
the core/thread to boot. We also need to do a few things that
U-Boot normally does for CPUs (like enable branch prediction).

Signed-off-by: Andy Fleming <afleming@freescale.com>
[scottwood@freescale.com: various changes, including only enabling
 threads if Linux wants to kick them]
Signed-off-by: Scott Wood <scottwood@freescale.com>
10 years agopowerpc/booke: Define MSR bits the same way as reg.h
Scott Wood [Wed, 23 Jul 2014 19:38:02 +0000 (14:38 -0500)]
powerpc/booke: Define MSR bits the same way as reg.h

This ensures that all MSR definitions are consistently unsigned long,
and that MSR_CM does not become 0xffffffff80000000 (this is usually
harmless because MSR is 32-bit on booke and is mainly noticeable when
debugging, but still I'd rather avoid it).

Signed-off-by: Scott Wood <scottwood@freescale.com>
10 years agoAdd Michael Ellerman as powerpc co-maintainer
Benjamin Herrenschmidt [Mon, 28 Jul 2014 03:54:37 +0000 (13:54 +1000)]
Add Michael Ellerman as powerpc co-maintainer

Michael has been backing me up and helping will all aspects of
maintainership for a while now, let's make it official.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc/perf: Add per-event excludes on Power8
Michael Ellerman [Wed, 23 Jul 2014 11:12:38 +0000 (21:12 +1000)]
powerpc/perf: Add per-event excludes on Power8

Power8 has a new register (MMCR2), which contains individual freeze bits
for each counter. This is an improvement on previous chips as it means
we can have multiple events on the PMU at the same time with different
exclude_{user,kernel,hv} settings. Previously we had to ensure all
events on the PMU had the same exclude settings.

The core of the patch is fairly simple. We use the 207S feature flag to
indicate that the PMU backend supports per-event excludes, if it's set
we skip the generic logic that enforces the equality of excludes between
events. We also use that flag to skip setting the freeze bits in MMCR0,
the PMU backend is expected to have handled setting them in MMCR2.

The complication arises with EBB. The FCxP bits in MMCR2 are accessible
R/W to a task using EBB. Which means a task using EBB will be able to
see that we are using MMCR2 for freezing, whereas the old logic which
used MMCR0 is not user visible.

The task can not see or affect exclude_kernel & exclude_hv, so we only
need to consider exclude_user.

The table below summarises the behaviour both before and after this
commit is applied:

 exclude_user           true  false
 ------------------------------------
        | User visible |  N    N
 Before | Can freeze   |  Y    Y
        | Can unfreeze |  N    Y
 ------------------------------------
        | User visible |  Y    Y
  After | Can freeze   |  Y    Y
        | Can unfreeze |  Y/N  Y
 ------------------------------------

So firstly I assert that the simple visibility of the exclude_user
setting in MMCR2 is a non-issue. The event belongs to the task, and
was most likely created by the task. So the exclude_user setting is not
privileged information in any way.

Secondly, the behaviour in the exclude_user = false case is unchanged.
This is important as it is the case that is actually useful, ie. the
event is created with no exclude setting and the task uses MMCR2 to
implement exclusion manually.

For exclude_user = true there is no meaningful change to freezing the
event. Previously the task could use MMCR2 to freeze the event, though
it was already frozen with MMCR0. With the new code the task can use
MMCR2 to freeze the event, though it was already frozen with MMCR2.

The only real change is when exclude_user = true and the task tries to
use MMCR2 to unfreeze the event. Previously this had no effect, because
the event was already frozen in MMCR0. With the new code the task can
unfreeze the event in MMCR2, but at some indeterminate time in the
future the kernel will overwrite its setting and refreeze the event.

Therefore my final assertion is that any task using exclude_user = true
and also fiddling with MMCR2 was deeply confused before this change, and
remains so after it.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc/perf: Pass the struct perf_events down to compute_mmcr()
Michael Ellerman [Wed, 23 Jul 2014 11:12:37 +0000 (21:12 +1000)]
powerpc/perf: Pass the struct perf_events down to compute_mmcr()

To support per-event exclude settings on Power8 we need access to the
struct perf_events in compute_mmcr().

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc/perf: Clear all MMCR settings before calling compute_mmcr()
Michael Ellerman [Wed, 23 Jul 2014 11:12:36 +0000 (21:12 +1000)]
powerpc/perf: Clear all MMCR settings before calling compute_mmcr()

Because we reuse cpuhw->mmcr on each call to compute_mmcr() there's a
risk that we could forget to set one of the values and use whatever
value was in there previously.

Currently all the implementations are careful to set all the values, but
it's safer to clear them all before we call compute_mmcr().

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agoselftests/powerpc: Add test of per-event excludes
Michael Ellerman [Wed, 23 Jul 2014 07:31:40 +0000 (17:31 +1000)]
selftests/powerpc: Add test of per-event excludes

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agoselftests/powerpc: Add a routine for retrieving an AUXV entry
Michael Ellerman [Wed, 23 Jul 2014 07:31:39 +0000 (17:31 +1000)]
selftests/powerpc: Add a routine for retrieving an AUXV entry

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agoselftests/powerpc: Add cycles test with MMCR2 handling
Michael Ellerman [Wed, 23 Jul 2014 07:31:38 +0000 (17:31 +1000)]
selftests/powerpc: Add cycles test with MMCR2 handling

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agoselftests/powerpc: Dump MMCR2 as part of the EBB HW state
Michael Ellerman [Wed, 23 Jul 2014 07:31:37 +0000 (17:31 +1000)]
selftests/powerpc: Dump MMCR2 as part of the EBB HW state

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agoselftests/powerpc: Count more instructions & use decimal
Michael Ellerman [Wed, 23 Jul 2014 07:31:36 +0000 (17:31 +1000)]
selftests/powerpc: Count more instructions & use decimal

Although we expect some small discrepancies for very large counts, we
seem to be able to count up to 64 billion instructions without too much
skew, so do so.

Also switch to using decimals for the instruction counts. This just
makes it easier to visually compare the expected vs actual values, as
well as the raw result from instructions.

Before:

  instructions: result 68719476753 running/enabled 13101961654
  cycles: result 38077343785 running/enabled 13101725752
  Looped for 68719476736 instructions, overhead 17
  Expected 68719476753
  Actual   68719476753
  Delta    0, 0.000000%
  success: count_instructions

After:
  instructions: result 64000000016 running/enabled 12197599964
  cycles: result 35412471674 running/enabled 12197534110
  Looped for 64000000000 instructions, overhead 16
  Expected 64000000016
  Actual   64000000016
  Delta    0, 0.000000%
  success: count_instructions

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agoselftests/powerpc: Count instructions under scheduler pressure
Michael Ellerman [Wed, 23 Jul 2014 07:31:35 +0000 (17:31 +1000)]
selftests/powerpc: Count instructions under scheduler pressure

Have a task eat some cpu while we are counting instructions to create
some scheduler pressure. The idea being to try and unearth any bugs we
have in counting that only appear when context switching is happening.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agoselftests/powerpc: Add test of L3 bank handling
Michael Ellerman [Wed, 23 Jul 2014 07:31:34 +0000 (17:31 +1000)]
selftests/powerpc: Add test of L3 bank handling

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agoselftests/powerpc: Move core_busy_loop() into asm
Michael Ellerman [Wed, 23 Jul 2014 07:31:33 +0000 (17:31 +1000)]
selftests/powerpc: Move core_busy_loop() into asm

There is at least one bug in core_busy_loop(), we use r0, but it's
not in the clobber list. We were getting away with this it seems but
that was luck.

It's also fishy to be touching the stack, even if we do it below the
stack pointer. It seems we get away with it, but looking at the
generated code that may just be luck.

So move it into assembler, do all the stack handling by hand. We create
a stack frame to save the non-volatiles in, so we can muck around with
them.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agoselftests/powerpc: Fix parse_proc_maps()
Michael Ellerman [Wed, 23 Jul 2014 07:31:32 +0000 (17:31 +1000)]
selftests/powerpc: Fix parse_proc_maps()

start and end should be unsigned long.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agoselftests/powerpc: Don't ignore errors from sub Makefiles
Michael Ellerman [Wed, 23 Jul 2014 07:31:31 +0000 (17:31 +1000)]
selftests/powerpc: Don't ignore errors from sub Makefiles

Currently we ignore errors from our sub Makefiles. We inherited that
from the top-level selftests Makefile which aims to build and run as
many tests as possible and damn the torpedoes.

For the powerpc tests we'd instead like any errors to fail the build, so
we can automatically catch build failures.

We can achieve the best of both worlds by using -k, which tells make to
keep building when it hits an error, but still reports the error.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc: Document how we set AIL on guest kernels
Michael Ellerman [Thu, 17 Jul 2014 05:29:45 +0000 (15:29 +1000)]
powerpc: Document how we set AIL on guest kernels

I spent ten minutes scratching my head, trying to work out where we
enabled relocation on interrupts for guest kernels. Expand the doco to
make it clear.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc/pseries: Switch pseries drivers to use machine_xxx_initcall()
Michael Ellerman [Wed, 16 Jul 2014 02:02:43 +0000 (12:02 +1000)]
powerpc/pseries: Switch pseries drivers to use machine_xxx_initcall()

A lot of the code in platforms/pseries is using non-machine initcalls.
That means if a kernel built with pseries support runs on another
platform, for example powernv, the initcalls will still run.

Most of these cases are OK, though sometimes only due to luck. Some were
having more effect:

 * hcall_inst_init
  - Checking FW_FEATURE_LPAR which is set on ps3 & celleb.
 * mobility_sysfs_init
  - created sysfs files unconditionally
  - but no effect due to ENOSYS from rtas_ibm_suspend_me()
 * apo_pm_init
  - created sysfs, allows write
  - nothing checks the value written to though
 * alloc_dispatch_log_kmem_cache
  - creating kmem_cache on non-pseries machines

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc/powernv: Switch powernv drivers to use machine_xxx_initcall()
Michael Ellerman [Tue, 15 Jul 2014 12:22:24 +0000 (22:22 +1000)]
powerpc/powernv: Switch powernv drivers to use machine_xxx_initcall()

A lot of the code in platforms/powernv is using non-machine initcalls.
That means if a kernel built with powernv support runs on another
platform, for example pseries, the initcalls will still run.

That is usually OK, because the initcalls will check for something in
the device tree or elsewhere before doing anything, so on other
platforms they will usually just return.

But it's fishy for powernv code to be running on other platforms, so
switch them all to be machine initcalls. If we want any of them to run
on other platforms in future they should move to sysdev.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc: Add machine_early_initcall()
Michael Ellerman [Tue, 15 Jul 2014 12:22:23 +0000 (22:22 +1000)]
powerpc: Add machine_early_initcall()

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc: Remove misleading DISABLE_INTS
Michael Ellerman [Tue, 15 Jul 2014 11:15:38 +0000 (21:15 +1000)]
powerpc: Remove misleading DISABLE_INTS

DISABLE_INTS has a long and storied history, but for some time now it
has not actually disabled interrupts.

For the open-coded exception handlers, just stop using it, instead call
RECONCILE_IRQ_STATE directly. This has the benefit of removing a level
of indirection, and making it clear that r10 & r11 are used at that
point.

For the addition case we still need a macro, so rename it to clarify
what it actually does.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc: Document register clobbering in EXCEPTION_COMMON()
Michael Ellerman [Tue, 15 Jul 2014 11:15:37 +0000 (21:15 +1000)]
powerpc: Document register clobbering in EXCEPTION_COMMON()

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc: Update comments in irqflags.h
Michael Ellerman [Tue, 15 Jul 2014 11:15:36 +0000 (21:15 +1000)]
powerpc: Update comments in irqflags.h

The comment on TRACE_ENABLE_INTS is incorrect, and appears to have
always been incorrect since the code was merged. It probably came from
an original out-of-tree patch.

Replace it with something that's correct. Also propagate the message to
RECONCILE_IRQ_STATE(), because it's potentially subtle.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc: Move bad_stack() below the fwnmi_data_area
Michael Ellerman [Tue, 15 Jul 2014 10:25:02 +0000 (20:25 +1000)]
powerpc: Move bad_stack() below the fwnmi_data_area

At the moment the allmodconfig build is failing because we run out of
space between altivec_assist() at 0x5700 and the fwnmi_data_area at
0x7000.

Fixing it permanently will take some more work, but a quick fix is to
move bad_stack() below the fwnmi_data_area. That gives us just enough
room with everything enabled.

bad_stack() is called from the common exception handlers, but it's a
non-conditional branch, so we have plenty of scope to move it further
way.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc: Remove CLASSIC_PPC
Michael Ellerman [Thu, 10 Jul 2014 02:29:26 +0000 (12:29 +1000)]
powerpc: Remove CLASSIC_PPC

We have a strange #define in cputable.h called CLASSIC_PPC.

Although it is defined for 32 & 64bit, it's only used for 32bit and
it's basically a duplicate of CONFIG_PPC_BOOK3S_32, so let's use
the latter.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc: Remove CONFIG_POWER4
Michael Ellerman [Thu, 10 Jul 2014 02:29:25 +0000 (12:29 +1000)]
powerpc: Remove CONFIG_POWER4

Although the name CONFIG_POWER4 suggests that it controls support for
power4 cpus, this symbol is actually misnamed.

It is a historical wart from the powermac code, which used to support
building a 32-bit kernel for power4. CONFIG_POWER4 was used in that
context to guard code that was 64-bit only.

In the powermac code we can just use CONFIG_PPC64 instead, and in other
places it is a synonym for CONFIG_PPC_BOOK3S_64.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc: Remove power3 from comments
Michael Ellerman [Thu, 10 Jul 2014 02:29:24 +0000 (12:29 +1000)]
powerpc: Remove power3 from comments

There are still a few occurences where it remains, because it helps to
explain something that persists.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc: Remove oprofile RS64 support
Michael Ellerman [Thu, 10 Jul 2014 02:29:23 +0000 (12:29 +1000)]
powerpc: Remove oprofile RS64 support

We no longer support these cpus, so we don't need oprofile support for
them either.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc: Remove CONFIG_POWER3
Michael Ellerman [Thu, 10 Jul 2014 02:29:22 +0000 (12:29 +1000)]
powerpc: Remove CONFIG_POWER3

Now that we have dropped power3 support we can remove CONFIG_POWER3. The
usage in pgtable_32.c was already dead code as CONFIG_POWER3 was not
selectable on PPC32.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc: Pull out ksp_vsid logic into a helper
Michael Ellerman [Thu, 10 Jul 2014 02:29:21 +0000 (12:29 +1000)]
powerpc: Pull out ksp_vsid logic into a helper

The previous patch left a bit of a wart in copy_process(). Clean it up a
bit by moving the logic out into a helper.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc: Remove MMU_FTR_SLB
Michael Ellerman [Thu, 10 Jul 2014 02:29:20 +0000 (12:29 +1000)]
powerpc: Remove MMU_FTR_SLB

We now only support cpus that use an SLB, so we don't need an MMU
feature to indicate that.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc: Remove STAB code
Michael Ellerman [Thu, 10 Jul 2014 02:29:19 +0000 (12:29 +1000)]
powerpc: Remove STAB code

Old cpus didn't have a Segment Lookaside Buffer (SLB), instead they had
a Segment Table (STAB). Now that we've dropped support for those cpus,
we can remove the STAB support entirely.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc: Drop support for pre-POWER4 cpus
Michael Ellerman [Thu, 10 Jul 2014 02:29:18 +0000 (12:29 +1000)]
powerpc: Drop support for pre-POWER4 cpus

We inadvertently broke power3 support back in 3.4 with commit
f5339277eb8d "powerpc: Remove FW_FEATURE ISERIES from arch code".
No one noticed until at least 3.9.

By then we'd also broken it with the optimised memcpy, copy_to/from_user
and clear_user routines. We don't want to add any more complexity to
those just to support ancient cpus, so it seems like it's a good time to
drop support for power3 and earlier.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc: Use standard macros for sys_sigpending() & sys_old_getrlimit()
Michael Ellerman [Tue, 24 Jun 2014 08:15:57 +0000 (18:15 +1000)]
powerpc: Use standard macros for sys_sigpending() & sys_old_getrlimit()

Currently we have sys_sigpending and sys_old_getrlimit defined to use
COMPAT_SYS() in systbl.h, but then both are #defined to sys_ni_syscall
in systbl.S.

This seems to have been done when ppc and ppc64 were merged, in commit
9994a33 "Introduce entry_{32,64}.S, misc_{32,64}.S, systbl.S".

AFAICS there's no longer (or never was) any need for this, we can just
use SYSX() for both and remove the #defines to sys_ni_syscall.

The expansion before was:

  #define COMPAT_SYS(func) .llong .sys_##func,.compat_sys_##func
  #define sys_old_getrlimit sys_ni_syscall
  COMPAT_SYS(old_getrlimit)
  =>
  .llong .sys_old_getrlimit,.compat_sys_old_getrlimit
  =>
  .llong .sys_ni_syscall,.compat_sys_old_getrlimit

After is:

  #define SYSX(f, f3264, f32) .llong .f,.f3264
  SYSX(sys_ni_syscall, compat_sys_old_getrlimit, sys_old_getrlimit)
  =>
  .llong .sys_ni_syscall,.compat_sys_old_getrlimit

ie. they are equivalent.

Finally both COMPAT_SYS() and SYSX() evaluate to sys_ni_syscall in the
Cell SPU code.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agoMerge branch 'merge' into next
Benjamin Herrenschmidt [Mon, 28 Jul 2014 03:41:12 +0000 (13:41 +1000)]
Merge branch 'merge' into next

Bring in some important fixes from the 3.16 branch

10 years agopowerpc: Fix endianness of flash_block_list in rtas_flash
Thomas Falcon [Fri, 25 Jul 2014 17:47:42 +0000 (12:47 -0500)]
powerpc: Fix endianness of flash_block_list in rtas_flash

The function rtas_flash_firmware passes the address of a data structure,
flash_block_list, when making the update-flash-64-and-reboot rtas call.
While the endianness of the address is handled correctly, the endianness
of the data is not.  This patch ensures that the data in flash_block_list
is big endian when passed to rtas on little endian hosts.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc/powernv: Change BUG_ON to WARN_ON in elog code
Vasant Hegde [Wed, 23 Jul 2014 09:22:39 +0000 (14:52 +0530)]
powerpc/powernv: Change BUG_ON to WARN_ON in elog code

We can continue to read the error log (up to MAX size) even if
we get the elog size more than MAX size. Hence change BUG_ON to
WARN_ON.

Also updated error message.

Reported-by: Gopesh Kumar Chaudhary <gopchaud@in.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Acked-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Acked-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc/perf: Fix MMCR2 handling for EBB
Michael Ellerman [Wed, 23 Jul 2014 07:20:04 +0000 (17:20 +1000)]
powerpc/perf: Fix MMCR2 handling for EBB

In the recent commit b50a6c584bb4 "Clear MMCR2 when enabling PMU", I
screwed up the handling of MMCR2 for tasks using EBB.

We must make sure we set MMCR2 *before* ebb_switch_in(), otherwise we
overwrite the value of MMCR2 that userspace may have written. That
potentially breaks a task that uses EBB and manually uses MMCR2 for
event freezing.

Fixes: b50a6c584bb4 ("powerpc/perf: Clear MMCR2 when enabling PMU")
Cc: stable@vger.kernel.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc: use _GLOBAL_TOC for memmove
Li Zhong [Mon, 21 Jul 2014 09:55:13 +0000 (17:55 +0800)]
powerpc: use _GLOBAL_TOC for memmove

memmove may be called from module code copy_pages(btrfs), and it may
call memcpy, which may call back to C code, so it needs to use
_GLOBAL_TOC to set up r2 correctly.

This fixes following error when I tried to boot an le guest:

Vector: 300 (Data Access) at [c000000073f97210]
    pc: c000000000015004: enable_kernel_altivec+0x24/0x80
    lr: c000000000058fbc: enter_vmx_copy+0x3c/0x60
    sp: c000000073f97490
   msr: 8000000002009033
   dar: d000000001d50170
 dsisr: 40000000
  current = 0xc0000000734c0000
  paca    = 0xc00000000fff0000  softe: 0  irq_happened: 0x01
    pid   = 815, comm = mktemp
enter ? for help
[c000000073f974f0c000000000058fbc enter_vmx_copy+0x3c/0x60
[c000000073f97510c000000000057d34 memcpy_power7+0x274/0x840
[c000000073f97610d000000001c3179c copy_pages+0xfc/0x110 [btrfs]
[c000000073f97660d000000001c3c248 memcpy_extent_buffer+0xe8/0x160 [btrfs]
[c000000073f97700d000000001be4be8 setup_items_for_insert+0x208/0x4a0 [btrfs]
[c000000073f97820d000000001be50b4 btrfs_insert_empty_items+0xf4/0x140 [btrfs]
[c000000073f97890d000000001bfed30 insert_with_overflow+0x70/0x180 [btrfs]
[c000000073f97900d000000001bff174 btrfs_insert_dir_item+0x114/0x2f0 [btrfs]
[c000000073f979a0d000000001c1f92c btrfs_add_link+0x10c/0x370 [btrfs]
[c000000073f97a40d000000001c20e94 btrfs_create+0x204/0x270 [btrfs]
[c000000073f97b00c00000000026d438 vfs_create+0x178/0x210
[c000000073f97b50c000000000270a70 do_last+0x9f0/0xe90
[c000000073f97c20c000000000271010 path_openat+0x100/0x810
[c000000073f97ce0c000000000272ea8 do_filp_open+0x58/0xd0
[c000000073f97dc0c00000000025ade8 do_sys_open+0x1b8/0x300
[c000000073f97e30c00000000000a008 syscall_exit+0x0/0x7c

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc/pseries: dynamically added OF nodes need to call of_node_init
Tyrel Datwyler [Thu, 10 Jul 2014 18:50:57 +0000 (14:50 -0400)]
powerpc/pseries: dynamically added OF nodes need to call of_node_init

Commit 75b57ecf9 refactored device tree nodes to use kobjects such that they
can be exposed via /sysfs. A secondary commit 0829f6d1f furthered this rework
by moving the kobect initialization logic out of of_node_add into its own
of_node_init function. The inital commit removed the existing kref_init calls
in the pseries dlpar code with the assumption kobject initialization would
occur in of_node_add. The second commit had the side effect of triggering a
BUG_ON during DLPAR, migration and suspend/resume operations as a result of
dynamically added nodes being uninitialized.

This patch fixes this by adding of_node_init calls in place of the previously
removed kref_init calls.

Fixes: 0829f6d1f69e ("of: device_node kobject lifecycle fixes")
Cc: stable@vger.kernel.org
Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Acked-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Acked-by: Grant Likely <grant.likely@linaro.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc: subpage_protect: Increase the array size to take care of 64TB
Aneesh Kumar K.V [Tue, 15 Jul 2014 14:52:30 +0000 (20:22 +0530)]
powerpc: subpage_protect: Increase the array size to take care of 64TB

We now support TASK_SIZE of 16TB, hence the array should be 8.

Fixes the below crash:

Unable to handle kernel paging request for data at address 0x000100bd
Faulting instruction address: 0xc00000000004f914
cpu 0x13: Vector: 300 (Data Access) at [c000000fea75fa90]
    pc: c00000000004f914: .sys_subpage_prot+0x2d4/0x5c0
    lr: c00000000004fb5c: .sys_subpage_prot+0x51c/0x5c0
    sp: c000000fea75fd10
   msr: 9000000000009032
   dar: 100bd
 dsisr: 40000000
  current = 0xc000000fea6ae490
  paca    = 0xc00000000fb8ab00   softe: 0        irq_happened: 0x00
    pid   = 8237, comm = a.out
enter ? for help
[c000000fea75fe30c00000000000a164 syscall_exit+0x0/0x98

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc: Fix bugs in emulate_step()
Paul Mackerras [Sat, 19 Jul 2014 07:47:57 +0000 (17:47 +1000)]
powerpc: Fix bugs in emulate_step()

This fixes some bugs in emulate_step().  First, the setting of the carry
bit for the arithmetic right-shift instructions was not correct on 64-bit
machines because we were masking with a mask of type int rather than
unsigned long.  Secondly, the sld (shift left doubleword) instruction was
using the wrong instruction field for the register containing the shift
count.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc: Disable doorbells on Power8 DD1.x
Joel Stanley [Fri, 18 Jul 2014 02:11:37 +0000 (11:41 +0930)]
powerpc: Disable doorbells on Power8 DD1.x

These processors do not currently support doorbell IPIs, so remove them
from the feature list if we are at DD 1.xx for the 0x004d part.

This fixes a regression caused by d4e58e5928f8 (powerpc/powernv: Enable
POWER8 doorbell IPIs). With that patch the kernel would hang at boot
when calling smp_call_function_many, as the doorbell would not be
received by the target CPUs:

  .smp_call_function_many+0x2bc/0x3c0 (unreliable)
  .on_each_cpu_mask+0x30/0x100
  .cpuidle_register_driver+0x158/0x1a0
  .cpuidle_register+0x2c/0x110
  .powernv_processor_idle_init+0x23c/0x2c0
  .do_one_initcall+0xd4/0x260
  .kernel_init_freeable+0x25c/0x33c
  .kernel_init+0x1c/0x120
  .ret_from_kernel_thread+0x58/0x7c

Fixes: d4e58e5928f8 (powerpc/powernv: Enable POWER8 doorbell IPIs)
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowernv: Add OPAL tracepoints
Anton Blanchard [Thu, 3 Jul 2014 07:20:50 +0000 (17:20 +1000)]
powernv: Add OPAL tracepoints

Knowing how long we spend in firmware calls is an important part of
minimising OS jitter.

This patch adds tracepoints to each OPAL call. If tracepoints are
enabled we branch out to a common routine that calls an entry and exit
tracepoint.

This allows us to write tools that monitor the frequency and duration
of OPAL calls, eg:

name                  count  total(ms)  min(ms)  max(ms)  avg(ms)  period(ms)
OPAL_HANDLE_INTERRUPT     5      0.199    0.037    0.042    0.040   12547.545
OPAL_POLL_EVENTS        204      2.590    0.012    0.036    0.013    2264.899
OPAL_PCI_MSI_EOI       2830      3.066    0.001    0.005    0.001      81.166

We use jump labels if configured, which means we only add a single
nop instruction to every OPAL call when the tracepoints are disabled.

Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc/pseries: Optimise hcall tracepoints
Anton Blanchard [Thu, 3 Jul 2014 05:52:56 +0000 (15:52 +1000)]
powerpc/pseries: Optimise hcall tracepoints

Now that we execute the hcall tracepoint entry and exit code out of
line, we can use the same stack across both functions.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc/pseries: Use jump labels for hcall tracepoints
Anton Blanchard [Thu, 3 Jul 2014 05:52:03 +0000 (15:52 +1000)]
powerpc/pseries: Use jump labels for hcall tracepoints

hcall tracepoints add quite a few instructions to our hcall path:

plpar_hcall:
mr      r2,r2
mfcr    r0
stw     r0,8(r1)
b       164 <---- start
ld      r12,0(r2)
std     r12,32(r1)
cmpdi   r12,0
beq     164 <---- end
...

We have an unconditional branch that gets noped out during boot and
a load/compare/branch. We also store the tracepoint value to the
stack for the hcall_exit path to use.

By using jump labels we can simplify this to just a single nop that
gets replaced with a branch when the tracepoint is enabled:

plpar_hcall:
mr      r2,r2
mfcr    r0
stw     r0,8(r1)
nop <----
...

If jump labels are not enabled, we fall back to the old method.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc/powernv: Add a page size parameter to pnv_pci_setup_iommu_table()
Alexey Kardashevskiy [Fri, 6 Jun 2014 08:44:03 +0000 (18:44 +1000)]
powerpc/powernv: Add a page size parameter to pnv_pci_setup_iommu_table()

Since a TCE page size can be other than 4K, make it configurable for
P5IOC2 and IODA PHBs.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc/powernv: Use it_page_shift in TCE build
Alexey Kardashevskiy [Fri, 6 Jun 2014 08:44:02 +0000 (18:44 +1000)]
powerpc/powernv: Use it_page_shift in TCE build

This makes use of iommu_table::it_page_shift instead of TCE_SHIFT and
TCE_RPN_SHIFT hardcoded values.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc/powernv: Use it_page_shift for TCE invalidation
Alexey Kardashevskiy [Fri, 6 Jun 2014 08:44:01 +0000 (18:44 +1000)]
powerpc/powernv: Use it_page_shift for TCE invalidation

This fixes IODA1/2 to use it_page_shift as it may be bigger than 4K.

This changes involved constant values to use "ull" modifier.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agoMerge branch 'merge' into next
Benjamin Herrenschmidt [Fri, 11 Jul 2014 05:38:12 +0000 (15:38 +1000)]
Merge branch 'merge' into next

10 years agopowerpc/perf: Never program book3s PMCs with values >= 0x80000000
Anton Blanchard [Wed, 28 May 2014 22:15:38 +0000 (08:15 +1000)]
powerpc/perf: Never program book3s PMCs with values >= 0x80000000

We are seeing a lot of PMU warnings on POWER8:

    Can't find PMC that caused IRQ

Looking closer, the active PMC is 0 at this point and we took a PMU
exception on the transition from negative to 0. Some versions of POWER8
have an issue where they edge detect and not level detect PMC overflows.

A number of places program the PMC with (0x80000000 - period_left),
where period_left can be negative. We can either fix all of these or
just ensure that period_left is always >= 1.

This patch takes the second option.

Cc: <stable@vger.kernel.org>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc: Disable RELOCATABLE for COMPILE_TEST with PPC64
Guenter Roeck [Mon, 30 Jun 2014 18:45:30 +0000 (11:45 -0700)]
powerpc: Disable RELOCATABLE for COMPILE_TEST with PPC64

powerpc:allmodconfig has been failing for some time with the following
error.

arch/powerpc/kernel/exceptions-64s.S: Assembler messages:
arch/powerpc/kernel/exceptions-64s.S:1312: Error: attempt to move .org backwards
make[1]: *** [arch/powerpc/kernel/head_64.o] Error 1

A number of attempts to fix the problem by moving around code have been
unsuccessful and resulted in failed builds for some configurations and
the discovery of toolchain bugs.

Fix the problem by disabling RELOCATABLE for COMPILE_TEST builds instead.
While this is less than perfect, it avoids substantial code changes
which would otherwise be necessary just to make COMPILE_TEST builds
happy and might have undesired side effects.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc/perf: Clear MMCR2 when enabling PMU
Joel Stanley [Tue, 8 Jul 2014 06:38:22 +0000 (16:08 +0930)]
powerpc/perf: Clear MMCR2 when enabling PMU

On POWER8 when switching to a KVM guest we set bits in MMCR2 to freeze
the PMU counters. Aside from on boot they are then never reset,
resulting in stuck perf counters for any user in the guest or host.

We now set MMCR2 to 0 whenever enabling the PMU, which provides a sane
state for perf to use the PMU counters under either the guest or the
host.

This was manifesting as a bug with ppc64_cpu --frequency:

    $ sudo ppc64_cpu --frequency
    WARNING: couldn't run on cpu 0
    WARNING: couldn't run on cpu 8
      ...
    WARNING: couldn't run on cpu 144
    WARNING: couldn't run on cpu 152
    min:    18446744073.710 GHz (cpu -1)
    max:    0.000 GHz (cpu -1)
    avg:    0.000 GHz

The command uses a perf counter to measure CPU cycles over a fixed
amount of time, in order to approximate the frequency of the machine.
The counters were returning zero once a guest was started, regardless of
weather it was still running or had been shut down.

By dumping the value of MMCR2, it was observed that once a guest is
running MMCR2 is set to 1s - which stops counters from running:

    $ sudo sh -c 'echo p > /proc/sysrq-trigger'
    CPU: 0 PMU registers, ppmu = POWER8 n_counters = 6
    PMC1:  5b635e38 PMC2: 00000000 PMC3: 00000000 PMC4: 00000000
    PMC5:  1bf5a646 PMC6: 5793d378 PMC7: deadbeef PMC8: deadbeef
    MMCR0: 0000000080000000 MMCR1: 000000001e000000 MMCRA: 0000040000000000
    MMCR2: fffffffffffffc00 EBBHR: 0000000000000000
    EBBRR: 0000000000000000 BESCR: 0000000000000000
    SIAR:  00000000000a51cc SDAR:  c00000000fc40000 SIER:  0000000001000000

This is done unconditionally in book3s_hv_interrupts.S upon entering the
guest, and the original value is only save/restored if the host has
indicated it was using the PMU. This is okay, however the user of the
PMU needs to ensure that it is in a defined state when it starts using
it.

Fixes: e05b9b9e5c10 ("powerpc/perf: Power8 PMU support")
Cc: stable@vger.kernel.org
Signed-off-by: Joel Stanley <joel@jms.id.au>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc/perf: Add PPMU_ARCH_207S define
Joel Stanley [Tue, 8 Jul 2014 06:38:21 +0000 (16:08 +0930)]
powerpc/perf: Add PPMU_ARCH_207S define

Instead of separate bits for every POWER8 PMU feature, have a single one
for v2.07 of the architecture.

This saves us adding a MMCR2 define for a future patch.

Cc: stable@vger.kernel.org
Signed-off-by: Joel Stanley <joel@jms.id.au>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>