Paul Mackerras [Wed, 8 Jan 2014 10:25:28 +0000 (21:25 +1100)]
KVM: PPC: Book3S HV: Prepare for host using hypervisor doorbells
POWER8 has support for hypervisor doorbell interrupts. Though the
kernel doesn't use them for IPIs on the powernv platform yet, it
probably will in future, so this makes KVM cope gracefully if a
hypervisor doorbell interrupt arrives while in a guest.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Paul Mackerras [Wed, 8 Jan 2014 10:25:27 +0000 (21:25 +1100)]
KVM: PPC: Book3S HV: Handle new LPCR bits on POWER8
POWER8 has a bit in the LPCR to enable or disable the PURR and SPURR
registers to count when in the guest. Set this bit.
POWER8 has a field in the LPCR called AIL (Alternate Interrupt Location)
which is used to enable relocation-on interrupts. Allow userspace to
set this field.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Paul Mackerras [Wed, 8 Jan 2014 10:25:26 +0000 (21:25 +1100)]
KVM: PPC: Book3S HV: Handle guest using doorbells for IPIs
* SRR1 wake reason field for system reset interrupt on wakeup from nap
is now a 4-bit field on P8, compared to 3 bits on P7.
* Set PECEDP in LPCR when napping because of H_CEDE so guest doorbells
will wake us up.
* Waking up from nap because of a guest doorbell interrupt is not a
reason to exit the guest.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Paul Mackerras [Wed, 8 Jan 2014 10:25:25 +0000 (21:25 +1100)]
KVM: PPC: Book3S HV: Consolidate code that checks reason for wake from nap
Currently in book3s_hv_rmhandlers.S we have three places where we
have woken up from nap mode and we check the reason field in SRR1
to see what event woke us up. This consolidates them into a new
function, kvmppc_check_wake_reason. It looks at the wake reason
field in SRR1, and if it indicates that an external interrupt caused
the wakeup, calls kvmppc_read_intr to check what sort of interrupt
it was.
This also consolidates the two places where we synthesize an external
interrupt (0x500 vector) for the guest. Now, if the guest exit code
finds that there was an external interrupt which has been handled
(i.e. it was an IPI indicating that there is now an interrupt pending
for the guest), it jumps to deliver_guest_interrupt, which is in the
last part of the guest entry code, where we synthesize guest external
and decrementer interrupts. That code has been streamlined a little
and now clears LPCR[MER] when appropriate as well as setting it.
The extra clearing of any pending IPI on a secondary, offline CPU
thread before going back to nap mode has been removed. It is no longer
necessary now that we have code to read and acknowledge IPIs in the
guest exit path.
This fixes a minor bug in the H_CEDE real-mode handling - previously,
if we found that other threads were already exiting the guest when we
were about to go to nap mode, we would branch to the cede wakeup path
and end up looking in SRR1 for a wakeup reason. Now we branch to a
point after we have checked the wakeup reason.
This also fixes a minor bug in kvmppc_read_intr - previously it could
return 0xff rather than 1, in the case where we find that a host IPI
is pending after we have cleared the IPI. Now it returns 1.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Paul Mackerras [Wed, 8 Jan 2014 10:25:24 +0000 (21:25 +1100)]
KVM: PPC: Book3S HV: Implement architecture compatibility modes for POWER8
This allows us to select architecture 2.05 (POWER6) or 2.06 (POWER7)
compatibility modes on a POWER8 processor. (Note that transactional
memory is disabled for usermode if either or both of the PCR_TM_DIS
and PCR_ARCH_206 bits are set.)
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Michael Ellerman [Wed, 8 Jan 2014 10:25:23 +0000 (21:25 +1100)]
KVM: PPC: Book3S HV: Add handler for HV facility unavailable
At present this should never happen, since the host kernel sets
HFSCR to allow access to all facilities. It's better to be prepared
to handle it cleanly if it does ever happen, though.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Paul Mackerras [Wed, 8 Jan 2014 10:25:22 +0000 (21:25 +1100)]
KVM: PPC: Book3S HV: Flush the correct number of TLB sets on POWER8
POWER8 has 512 sets in the TLB, compared to 128 for POWER7, so we need
to do more tlbiel instructions when flushing the TLB on POWER8.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Michael Neuling [Wed, 8 Jan 2014 10:25:21 +0000 (21:25 +1100)]
KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs
This adds fields to the struct kvm_vcpu_arch to store the new
guest-accessible SPRs on POWER8, adds code to the get/set_one_reg
functions to allow userspace to access this state, and adds code to
the guest entry and exit to context-switch these SPRs between host
and guest.
Note that DPDES (Directed Privileged Doorbell Exception State) is
shared between threads on a core; hence we store it in struct
kvmppc_vcore and have the master thread save and restore it.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Paul Mackerras [Wed, 8 Jan 2014 10:25:20 +0000 (21:25 +1100)]
KVM: PPC: Book3S HV: Align physical and virtual CPU thread numbers
On a threaded processor such as POWER7, we group VCPUs into virtual
cores and arrange that the VCPUs in a virtual core run on the same
physical core. Currently we don't enforce any correspondence between
virtual thread numbers within a virtual core and physical thread
numbers. Physical threads are allocated starting at 0 on a first-come
first-served basis to runnable virtual threads (VCPUs).
POWER8 implements a new "msgsndp" instruction which guest kernels can
use to interrupt other threads in the same core or sub-core. Since
the instruction takes the destination physical thread ID as a parameter,
it becomes necessary to align the physical thread IDs with the virtual
thread IDs, that is, to make sure virtual thread N within a virtual
core always runs on physical thread N.
This means that it's possible that thread 0, which is where we call
__kvmppc_vcore_entry, may end up running some other vcpu than the
one whose task called kvmppc_run_core(), or it may end up running
no vcpu at all, if for example thread 0 of the virtual core is
currently executing in userspace. However, we do need thread 0
to be responsible for switching the MMU -- a previous version of
this patch that had other threads switching the MMU was found to
be responsible for occasional memory corruption and machine check
interrupts in the guest on POWER7 machines.
To accommodate this, we no longer pass the vcpu pointer to
__kvmppc_vcore_entry, but instead let the assembly code load it from
the PACA. Since the assembly code will need to know the kvm pointer
and the thread ID for threads which don't have a vcpu, we move the
thread ID into the PACA and we add a kvm pointer to the virtual core
structure.
In the case where thread 0 has no vcpu to run, it still calls into
kvmppc_hv_entry in order to do the MMU switch, and then naps until
either its vcpu is ready to run in the guest, or some other thread
needs to exit the guest. In the latter case, thread 0 jumps to the
code that switches the MMU back to the host. This control flow means
that now we switch the MMU before loading any guest vcpu state.
Similarly, on guest exit we now save all the guest vcpu state before
switching the MMU back to the host. This has required substantial
code movement, making the diff rather large.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Michael Neuling [Wed, 8 Jan 2014 10:25:19 +0000 (21:25 +1100)]
KVM: PPC: Book3S HV: Don't set DABR on POWER8
POWER8 doesn't have the DABR and DABRX registers; instead it has
new DAWR/DAWRX registers, which will be handled in a later patch.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Scott Wood [Fri, 10 Jan 2014 01:18:40 +0000 (19:18 -0600)]
kvm/ppc: IRQ disabling cleanup
Simplify the handling of lazy EE by going directly from fully-enabled
to hard-disabled. This replaces the lazy_irq_pending() check
(including its misplaced kvm_guest_exit() call).
As suggested by Tiejun Chen, move the interrupt disabling into
kvmppc_prepare_to_enter() rather than have each caller do it. Also
move the IRQ enabling on heavyweight exit into
kvmppc_prepare_to_enter().
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Mihai Caraman [Thu, 9 Jan 2014 15:01:05 +0000 (17:01 +0200)]
KVM: PPC: e500: Fix bad address type in deliver_tlb_misss()
Use gva_t instead of unsigned int for eaddr in deliver_tlb_miss().
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
CC: stable@vger.kernel.org
Signed-off-by: Alexander Graf <agraf@suse.de>
Andreas Schwab [Mon, 30 Dec 2013 14:36:56 +0000 (15:36 +0100)]
KVM: PPC: Book3S HV: use xics_wake_cpu only when defined
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
CC: stable@vger.kernel.org
Signed-off-by: Alexander Graf <agraf@suse.de>
Cédric Le Goater [Thu, 9 Jan 2014 10:51:16 +0000 (11:51 +0100)]
KVM: PPC: Book3S: MMIO emulation support for little endian guests
MMIO emulation reads the last instruction executed by the guest
and then emulates. If the guest is running in Little Endian order,
or more generally in a different endian order of the host, the
instruction needs to be byte-swapped before being emulated.
This patch adds a helper routine which tests the endian order of
the host and the guest in order to decide whether a byteswap is
needed or not. It is then used to byteswap the last instruction
of the guest in the endian order of the host before MMIO emulation
is performed.
Finally, kvmppc_handle_load() of kvmppc_handle_store() are modified
to reverse the endianness of the MMIO if required.
Signed-off-by: Cédric Le Goater <clg@fr.ibm.com>
[agraf: add booke handling]
Signed-off-by: Alexander Graf <agraf@suse.de>
Alexander Graf [Thu, 9 Jan 2014 10:10:44 +0000 (11:10 +0100)]
KVM: PPC: Unify kvmppc_get_last_inst and sc
We had code duplication between the inline functions to get our last
instruction on normal interrupts and system call interrupts. Unify
both helper functions towards a single implementation.
Signed-off-by: Alexander Graf <agraf@suse.de>
Zhouyi Zhou [Mon, 2 Dec 2013 10:21:58 +0000 (18:21 +0800)]
KVM: PPC: NULL return of kvmppc_mmu_hpte_cache_next should be handled
NULL return of kvmppc_mmu_hpte_cache_next should be handled
Signed-off-by: Zhouyi Zhou <yizhouzhou@ict.ac.cn>
Signed-off-by: Alexander Graf <agraf@suse.de>
Tiejun Chen [Wed, 23 Oct 2013 01:26:48 +0000 (09:26 +0800)]
KVM: PPC: Book3E HV: call RECONCILE_IRQ_STATE to sync the software state
Rather than calling hard_irq_disable() when we're back in C code
we can just call RECONCILE_IRQ_STATE to soft disable IRQs while
we're already in hard disabled state.
This should be functionally equivalent to the code before, but
cleaner and faster.
Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
[agraf: fix comment, commit message]
Signed-off-by: Alexander Graf <agraf@suse.de>
Bharat Bhushan [Mon, 18 Nov 2013 05:48:54 +0000 (11:18 +0530)]
kvm: powerpc: use caching attributes as per linux pte
KVM uses same WIM tlb attributes as the corresponding qemu pte.
For this we now search the linux pte for the requested page and
get these cache caching/coherency attributes from pte.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Reviewed-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Bharat Bhushan [Fri, 15 Nov 2013 05:31:15 +0000 (11:01 +0530)]
kvm: powerpc: define a linux pte lookup function
We need to search linux "pte" to get "pte" attributes for setting TLB in KVM.
This patch defines a lookup_linux_ptep() function which returns pte pointer.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Reviewed-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Bharat Bhushan [Fri, 15 Nov 2013 05:31:14 +0000 (11:01 +0530)]
kvm: book3s: rename lookup_linux_pte() to lookup_linux_pte_and_update()
lookup_linux_pte() is doing more than lookup, updating the pte,
so for clarity it is renamed to lookup_linux_pte_and_update()
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Reviewed-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Bharat Bhushan [Fri, 15 Nov 2013 05:31:13 +0000 (11:01 +0530)]
kvm: booke: clear host tlb reference flag on guest tlb invalidation
On booke, "struct tlbe_ref" contains host tlb mapping information
(pfn: for guest-pfn to pfn, flags: attribute associated with this mapping)
for a guest tlb entry. So when a guest creates a TLB entry then
"struct tlbe_ref" is set to point to valid "pfn" and set attributes in
"flags" field of the above said structure. When a guest TLB entry is
invalidated then flags field of corresponding "struct tlbe_ref" is
updated to point that this is no more valid, also we selectively clear
some other attribute bits, example: if E500_TLB_BITMAP was set then we clear
E500_TLB_BITMAP, if E500_TLB_TLB0 is set then we clear this.
Ideally we should clear complete "flags" as this entry is invalid and does not
have anything to re-used. The other part of the problem is that when we use
the same entry again then also we do not clear (started doing or-ing etc).
So far it was working because the selectively clearing mentioned above
actually clears "flags" what was set during TLB mapping. But the problem
starts coming when we add more attributes to this then we need to selectively
clear them and which is not needed.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Reviewed-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Paul Mackerras [Tue, 15 Oct 2013 09:43:04 +0000 (20:43 +1100)]
KVM: PPC: Book3S HV: Use load/store_fp_state functions in HV guest entry/exit
This modifies kvmppc_load_fp and kvmppc_save_fp to use the generic
FP/VSX and VMX load/store functions instead of open-coding the
FP/VSX/VMX load/store instructions. Since kvmppc_load/save_fp don't
follow C calling conventions, we make them private symbols within
book3s_hv_rmhandlers.S.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Paul Mackerras [Tue, 15 Oct 2013 09:43:03 +0000 (20:43 +1100)]
KVM: PPC: Load/save FP/VMX/VSX state directly to/from vcpu struct
Now that we have the vcpu floating-point and vector state stored in
the same type of struct as the main kernel uses, we can load that
state directly from the vcpu struct instead of having extra copies
to/from the thread_struct. Similarly, when the guest state needs to
be saved, we can have it saved it directly to the vcpu struct by
setting the current->thread.fp_save_area and current->thread.vr_save_area
pointers. That also means that we don't need to back up and restore
userspace's FP/vector state. This all makes the code simpler and
faster.
Note that it's not necessary to save or modify current->thread.fpexc_mode,
since nothing in KVM uses or is affected by its value. Nor is it
necessary to touch used_vr or used_vsr.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Paul Mackerras [Tue, 15 Oct 2013 09:43:02 +0000 (20:43 +1100)]
KVM: PPC: Store FP/VSX/VMX state in thread_fp/vr_state structures
This uses struct thread_fp_state and struct thread_vr_state to store
the floating-point, VMX/Altivec and VSX state, rather than flat arrays.
This makes transferring the state to/from the thread_struct simpler
and allows us to unify the get/set_one_reg implementations for the
VSX registers.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Paul Mackerras [Tue, 15 Oct 2013 09:43:01 +0000 (20:43 +1100)]
KVM: PPC: Use load_fp/vr_state rather than load_up_fpu/altivec
The load_up_fpu and load_up_altivec functions were never intended to
be called from C, and do things like modifying the MSR value in their
callers' stack frames, which are assumed to be interrupt frames. In
addition, on 32-bit Book S they require the MMU to be off.
This makes KVM use the new load_fp_state() and load_vr_state() functions
instead of load_up_fpu/altivec. This means we can remove the assembler
glue in book3s_rmhandlers.S, and potentially fixes a bug on Book E,
where load_up_fpu was called directly from C.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Bharat Bhushan [Tue, 8 Oct 2013 04:02:20 +0000 (09:32 +0530)]
kvm/powerpc: move kvm_hypercall0() and friends to epapr_hypercall0()
kvm_hypercall0() and friends have nothing KVM specific so moved to
epapr_hypercall0() and friends. Also they are moved from
arch/powerpc/include/asm/kvm_para.h to arch/powerpc/include/asm/epapr_hcalls.h
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Bharat Bhushan [Tue, 8 Oct 2013 04:02:19 +0000 (09:32 +0530)]
kvm/powerpc: rename kvm_hypercall() to epapr_hypercall()
kvm_hypercall() have nothing KVM specific, so renamed to epapr_hypercall().
Also this in moved to arch/powerpc/include/asm/epapr_hcalls.h
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Gleb Natapov [Sun, 1 Sep 2013 12:53:46 +0000 (15:53 +0300)]
KVM: PPC: fix couple of memory leaks in MPIC/XICS devices
XICS failed to free xics structure on error path. MPIC destroy handler
forgot to delete kvm_device structure.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Alexander Graf [Mon, 9 Dec 2013 12:53:42 +0000 (13:53 +0100)]
KVM: PPC: Add devname:kvm aliases for modules
Systems that support automatic loading of kernel modules through
device aliases should try and automatically load kvm when /dev/kvm
gets opened.
Add code to support that magic for all PPC kvm targets, even the
ones that don't support modules yet.
Signed-off-by: Alexander Graf <agraf@suse.de>
Liu Ping Fan [Tue, 19 Nov 2013 06:12:48 +0000 (14:12 +0800)]
powerpc: kvm: optimize "sc 1" as fast return
In some scene, e.g openstack CI, PR guest can trigger "sc 1" frequently,
this patch optimizes the path by directly delivering BOOK3S_INTERRUPT_SYSCALL
to HV guest, so powernv can return to HV guest without heavy exit, i.e,
no need to swap TLB, HTAB,.. etc
Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Heiko Carstens [Mon, 18 Nov 2013 09:35:55 +0000 (10:35 +0100)]
KVM: kvm_clear_guest_page(): fix empty_zero_page usage
Using the address of 'empty_zero_page' as source address in order to
clear a page is wrong. On some architectures empty_zero_page is only the
pointer to the struct page of the empty_zero_page. Therefore the clear
page operation would copy the contents of a couple of struct pages instead
of clearing a page. For kvm only arm/arm64 are affected by this bug.
To fix this use the ZERO_PAGE macro instead which will return the struct
page address of the empty_zero_page on all architectures.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Sasha Levin [Tue, 19 Nov 2013 20:22:47 +0000 (15:22 -0500)]
kvm: mmu: delay mmu audit activation
We should not be using jump labels before they were initialized. Push back
the callback to until after jump label initialization.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Gleb Natapov [Tue, 19 Nov 2013 08:43:05 +0000 (10:43 +0200)]
Merge tag 'kvm-arm-fixes-3.13-1' of git://git.linaro.org/people/cdall/linux-kvm-arm into next
Fix percpu vmalloc allocations
Christoffer Dall [Fri, 15 Nov 2013 21:14:12 +0000 (13:14 -0800)]
arm/arm64: KVM: Fix hyp mappings of vmalloc regions
Using virt_to_phys on percpu mappings is horribly wrong as it may be
backed by vmalloc. Introduce kvm_kaddr_to_phys which translates both
types of valid kernel addresses to the corresponding physical address.
At the same time resolves a typing issue where we were storing the
physical address as a 32 bit unsigned long (on arm), truncating the
physical address for addresses above the 4GB limit. This caused
breakage on Keystone.
Cc: <stable@vger.kernel.org> [3.10+]
Reported-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Linus Torvalds [Fri, 15 Nov 2013 04:51:36 +0000 (13:51 +0900)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull KVM changes from Paolo Bonzini:
"Here are the 3.13 KVM changes. There was a lot of work on the PPC
side: the HV and emulation flavors can now coexist in a single kernel
is probably the most interesting change from a user point of view.
On the x86 side there are nested virtualization improvements and a few
bugfixes.
ARM got transparent huge page support, improved overcommit, and
support for big endian guests.
Finally, there is a new interface to connect KVM with VFIO. This
helps with devices that use NoSnoop PCI transactions, letting the
driver in the guest execute WBINVD instructions. This includes some
nVidia cards on Windows, that fail to start without these patches and
the corresponding userspace changes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (146 commits)
kvm, vmx: Fix lazy FPU on nested guest
arm/arm64: KVM: PSCI: propagate caller endianness to the incoming vcpu
arm/arm64: KVM: MMIO support for BE guest
kvm, cpuid: Fix sparse warning
kvm: Delete prototype for non-existent function kvm_check_iopl
kvm: Delete prototype for non-existent function complete_pio
hung_task: add method to reset detector
pvclock: detect watchdog reset at pvclock read
kvm: optimize out smp_mb after srcu_read_unlock
srcu: API for barrier after srcu read unlock
KVM: remove vm mmap method
KVM: IOMMU: hva align mapping page size
KVM: x86: trace cpuid emulation when called from emulator
KVM: emulator: cleanup decode_register_operand() a bit
KVM: emulator: check rex prefix inside decode_register()
KVM: x86: fix emulation of "movzbl %bpl, %eax"
kvm_host: typo fix
KVM: x86: emulate SAHF instruction
MAINTAINERS: add tree for kvm.git
Documentation/kvm: add a 00-INDEX file
...
Linus Torvalds [Fri, 15 Nov 2013 04:34:37 +0000 (13:34 +0900)]
Merge tag 'stable/for-linus-3.13-rc0-tag' of git://git./linux/kernel/git/xen/tip
Pull Xen updates from Konrad Rzeszutek Wilk:
"This has tons of fixes and two major features which are concentrated
around the Xen SWIOTLB library.
The short <blurb> is that the tracing facility (just one function) has
been added to SWIOTLB to make it easier to track I/O progress.
Additionally under Xen and ARM (32 & 64) the Xen-SWIOTLB driver
"is used to translate physical to machine and machine to physical
addresses of foreign[guest] pages for DMA operations" (Stefano) when
booting under hardware without proper IOMMU.
There are also bug-fixes, cleanups, compile warning fixes, etc.
The commit times for some of the commits is a bit fresh - that is b/c
we wanted to make sure we have the Ack's from the ARM folks - which
with the string of back-to-back conferences took a bit of time. Rest
assured - the code has been stewing in #linux-next for some time.
Features:
- SWIOTLB has tracing added when doing bounce buffer.
- Xen ARM/ARM64 can use Xen-SWIOTLB. This work allows Linux to
safely program real devices for DMA operations when running as a
guest on Xen on ARM, without IOMMU support. [*1]
- xen_raw_printk works with PVHVM guests if needed.
Bug-fixes:
- Make memory ballooning work under HVM with large MMIO region.
- Inform hypervisor of MCFG regions found in ACPI DSDT.
- Remove deprecated IRQF_DISABLED.
- Remove deprecated __cpuinit.
[*1]:
"On arm and arm64 all Xen guests, including dom0, run with second
stage translation enabled. As a consequence when dom0 programs a
device for a DMA operation is going to use (pseudo) physical
addresses instead machine addresses. This work introduces two trees
to track physical to machine and machine to physical mappings of
foreign pages. Local pages are assumed mapped 1:1 (physical address
== machine address). It enables the SWIOTLB-Xen driver on ARM and
ARM64, so that Linux can translate physical addresses to machine
addresses for dma operations when necessary. " (Stefano)"
* tag 'stable/for-linus-3.13-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (32 commits)
xen/arm: pfn_to_mfn and mfn_to_pfn return the argument if nothing is in the p2m
arm,arm64/include/asm/io.h: define struct bio_vec
swiotlb-xen: missing include dma-direction.h
pci-swiotlb-xen: call pci_request_acs only ifdef CONFIG_PCI
arm: make SWIOTLB available
xen: delete new instances of added __cpuinit
xen/balloon: Set balloon's initial state to number of existing RAM pages
xen/mcfg: Call PHYSDEVOP_pci_mmcfg_reserved for MCFG areas.
xen: remove deprecated IRQF_DISABLED
x86/xen: remove deprecated IRQF_DISABLED
swiotlb-xen: fix error code returned by xen_swiotlb_map_sg_attrs
swiotlb-xen: static inline xen_phys_to_bus, xen_bus_to_phys, xen_virt_to_bus and range_straddles_page_boundary
grant-table: call set_phys_to_machine after mapping grant refs
arm,arm64: do not always merge biovec if we are running on Xen
swiotlb: print a warning when the swiotlb is full
swiotlb-xen: use xen_dma_map/unmap_page, xen_dma_sync_single_for_cpu/device
xen: introduce xen_dma_map/unmap_page and xen_dma_sync_single_for_cpu/device
tracing/events: Fix swiotlb tracepoint creation
swiotlb-xen: use xen_alloc/free_coherent_pages
xen: introduce xen_alloc/free_coherent_pages
...
Linus Torvalds [Fri, 15 Nov 2013 04:28:47 +0000 (13:28 +0900)]
Merge tag 'virtio-next-for-linus' of git://git./linux/kernel/git/rusty/linux
Pull virtio updates from Rusty Russell:
"Nothing really exciting: some groundwork for changing virtio endian,
and some robustness fixes for broken virtio devices, plus minor
tweaks"
* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
virtio_scsi: verify if queue is broken after virtqueue_get_buf()
x86, asmlinkage, lguest: Pass in globals into assembler statement
virtio: mmio: fix signature checking for BE guests
virtio_ring: adapt to notify() returning bool
virtio_net: verify if queue is broken after virtqueue_get_buf()
virtio_console: verify if queue is broken after virtqueue_get_buf()
virtio_blk: verify if queue is broken after virtqueue_get_buf()
virtio_ring: add new function virtqueue_is_broken()
virtio_test: verify if virtqueue_kick() succeeded
virtio_net: verify if virtqueue_kick() succeeded
virtio_ring: let virtqueue_{kick()/notify()} return a bool
virtio_ring: change host notification API
virtio_config: remove virtio_config_val
virtio: use size-based config accessors.
virtio_config: introduce size-based accessors.
virtio_ring: plug kmemleak false positive.
virtio: pm: use CONFIG_PM_SLEEP instead of CONFIG_PM
Linus Torvalds [Fri, 15 Nov 2013 04:27:50 +0000 (13:27 +0900)]
Merge tag 'modules-next-for-linus' of git://git./linux/kernel/git/rusty/linux
Pull module updates from Rusty Russell:
"Mainly boring here, too. rmmod --wait finally removed, though"
* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
modpost: fix bogus 'exported twice' warnings.
init: fix in-place parameter modification regression
asmlinkage, module: Make ksymtab and kcrctab symbols and __this_module __visible
kernel: add support for init_array constructors
modpost: Optionally ignore secondary errors seen if a single module build fails
module: remove rmmod --wait option.
Linus Torvalds [Fri, 15 Nov 2013 00:32:31 +0000 (09:32 +0900)]
Merge branch 'akpm' (patch-bomb from Andrew Morton)
Merge patches from Andrew Morton:
- memstick fixes
- the rest of MM
- various misc bits that were awaiting merges from linux-next into
mainline: seq_file, printk, rtc, completions, w1, softirqs, llist,
kfifo, hfsplus
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (72 commits)
cmdline-parser: fix build
hfsplus: Fix undefined __divdi3 in hfsplus_init_header_node()
kfifo API type safety
kfifo: kfifo_copy_{to,from}_user: fix copied bytes calculation
sound/core/memalloc.c: use gen_pool_dma_alloc() to allocate iram buffer
llists-move-llist_reverse_order-from-raid5-to-llistc-fix
llists: move llist_reverse_order from raid5 to llist.c
kernel: fix generic_exec_single indentation
kernel-provide-a-__smp_call_function_single-stub-for-config_smp-fix
kernel: provide a __smp_call_function_single stub for !CONFIG_SMP
kernel: remove CONFIG_USE_GENERIC_SMP_HELPERS
revert "softirq: Add support for triggering softirq work on softirqs"
drivers/w1/masters/w1-gpio.c: use dev_get_platdata()
sched: remove INIT_COMPLETION
tree-wide: use reinit_completion instead of INIT_COMPLETION
sched: replace INIT_COMPLETION with reinit_completion
drivers/rtc/rtc-hid-sensor-time.c: enable HID input processing early
drivers/rtc/rtc-hid-sensor-time.c: use dev_get_platdata()
vsprintf: ignore %n again
seq_file: remove "%n" usage from seq_file users
...
Alexander Beregalov [Thu, 14 Nov 2013 22:32:19 +0000 (14:32 -0800)]
cmdline-parser: fix build
Fix following errors:
include/linux/cmdline-parser.h:17:12: error: 'BDEVNAME_SIZE' undeclared here
block/cmdline-parser.c:17:2: error: implicit declaration of function 'kzalloc'
Signed-off-by: Alexander Beregalov <alexander.beregalov@intel.com>
Cc: CaiZhiyong <caizhiyong@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Geert Uytterhoeven [Thu, 14 Nov 2013 22:32:18 +0000 (14:32 -0800)]
hfsplus: Fix undefined __divdi3 in hfsplus_init_header_node()
ERROR: "__divdi3" [fs/hfsplus/hfsplus.ko] undefined!
Introduced by commit
099e9245e04d ("hfsplus: implement attributes file's
header node initialization code").
i_size_read() returns loff_t, which is long long, i.e. 64-bit. node_size
is size_t, which is either 32-bit or 64-bit. Hence
"i_size_read(attr_file) / node_size" is a 64-by-32 or 64-by-64 division,
causing (some versions of) gcc to emit a call to __divdi3().
Fortunately node_size is actually 16-bit, as the sole caller of
hfsplus_init_header_node() passes a u16. Hence change its type from
size_t to u16, and use do_div() to perform a 64-by-32 division.
Not seen in m68k/allmodconfig in -next, so it really depends on the
verion of gcc.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stefani Seibold [Thu, 14 Nov 2013 22:32:17 +0000 (14:32 -0800)]
kfifo API type safety
This patch enhances the type safety for the kfifo API. It is now safe
to put const data into a non const FIFO and the API will now generate a
compiler warning when reading from the fifo where the destination
address is pointing to a const variable.
As a side effect the kfifo_put() does now expect the value of an element
instead a pointer to the element. This was suggested Russell King. It
make the handling of the kfifo_put easier since there is no need to
create a helper variable for getting the address of a pointer or to pass
integers of different sizes.
IMHO the API break is okay, since there are currently only six users of
kfifo_put().
The code is also cleaner by kicking out the "if (0)" expressions.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Stefani Seibold <stefani@seibold.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lars-Peter Clausen [Thu, 14 Nov 2013 22:32:16 +0000 (14:32 -0800)]
kfifo: kfifo_copy_{to,from}_user: fix copied bytes calculation
'copied' and 'len' are in bytes, while 'ret' is in elements, so we need to
multiply 'ret' with the size of one element to get the correct result.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Cc: Stefani Seibold <stefani@seibold.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nicolin Chen [Thu, 14 Nov 2013 22:32:15 +0000 (14:32 -0800)]
sound/core/memalloc.c: use gen_pool_dma_alloc() to allocate iram buffer
Since gen_pool_dma_alloc() is introduced, we implement it to simplify code.
Signed-off-by: Nicolin Chen <b42378@freescale.com>
Acked-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Thu, 14 Nov 2013 22:32:13 +0000 (14:32 -0800)]
llists-move-llist_reverse_order-from-raid5-to-llistc-fix
fix comment typo, per Jan
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Hellwig [Thu, 14 Nov 2013 22:32:11 +0000 (14:32 -0800)]
llists: move llist_reverse_order from raid5 to llist.c
Make this useful helper available for other users.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Hellwig [Thu, 14 Nov 2013 22:32:10 +0000 (14:32 -0800)]
kernel: fix generic_exec_single indentation
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Thu, 14 Nov 2013 22:32:09 +0000 (14:32 -0800)]
kernel-provide-a-__smp_call_function_single-stub-for-config_smp-fix
x86_64 allnoconfig:
kernel/up.c:25: error: redefinition of '__smp_call_function_single'
include/linux/smp.h:154: note: previous definition of '__smp_call_function_single' was here
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Hellwig [Thu, 14 Nov 2013 22:32:08 +0000 (14:32 -0800)]
kernel: provide a __smp_call_function_single stub for !CONFIG_SMP
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Hellwig [Thu, 14 Nov 2013 22:32:07 +0000 (14:32 -0800)]
kernel: remove CONFIG_USE_GENERIC_SMP_HELPERS
We've switched over every architecture that supports SMP to it, so
remove the new useless config variable.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Hellwig [Thu, 14 Nov 2013 22:32:06 +0000 (14:32 -0800)]
revert "softirq: Add support for triggering softirq work on softirqs"
This commit was incomplete in that code to remove items from the per-cpu
lists was missing and never acquired a user in the 5 years it has been in
the tree. We're going to implement what it seems to try to archive in a
simpler way, and this code is in the way of doing so.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jingoo Han [Thu, 14 Nov 2013 22:32:04 +0000 (14:32 -0800)]
drivers/w1/masters/w1-gpio.c: use dev_get_platdata()
Use the wrapper function for retrieving the platform data instead of
accessing dev->platform_data directly. This is a cosmetic change to make
the code simpler and enhance the readability.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wolfram Sang [Thu, 14 Nov 2013 22:32:03 +0000 (14:32 -0800)]
sched: remove INIT_COMPLETION
All users are converted over to reinit_completion(). Remove the old
macro now.
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wolfram Sang [Thu, 14 Nov 2013 22:32:02 +0000 (14:32 -0800)]
tree-wide: use reinit_completion instead of INIT_COMPLETION
Use this new function to make code more comprehensible, since we are
reinitialzing the completion, not initializing.
[akpm@linux-foundation.org: linux-next resyncs]
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Acked-by: Linus Walleij <linus.walleij@linaro.org> (personally at LCE13)
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wolfram Sang [Thu, 14 Nov 2013 22:32:01 +0000 (14:32 -0800)]
sched: replace INIT_COMPLETION with reinit_completion
For the casual device driver writer, it is hard to remember when to use
init_completion (to init a completion structure) or INIT_COMPLETION (to
*reinit* a completion structure). Furthermore, while all other
completion functions exepct a pointer as a parameter, INIT_COMPLETION
does not. To make it easier to remember which function to use and to
make code more readable, introduce a new inline function with the proper
name and consistent argument type. Update the kernel-doc for
init_completion while we are here.
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Acked-by: Linus Walleij <linus.walleij@linaro.org> (personally at LCE13)
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexander Holler [Thu, 14 Nov 2013 22:32:00 +0000 (14:32 -0800)]
drivers/rtc/rtc-hid-sensor-time.c: enable HID input processing early
Enable the processing of HID input records before the RTC will be
registered, in order to allow the RTC register function to read clock.
Without doing that the clock can only be read after the probe function
has finished.
Signed-off-by: Alexander Holler <holler@ahsoftware.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jingoo Han [Thu, 14 Nov 2013 22:31:59 +0000 (14:31 -0800)]
drivers/rtc/rtc-hid-sensor-time.c: use dev_get_platdata()
Use the wrapper function for retrieving the platform data instead of
accessing dev->platform_data directly. This is a cosmetic change to
make the code simpler and enhance the readability.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kees Cook [Thu, 14 Nov 2013 22:31:58 +0000 (14:31 -0800)]
vsprintf: ignore %n again
This ignores %n in printf again, as was originally documented.
Implementing %n poses a greater security risk than utility, so it should
stay ignored. To help anyone attempting to use %n, a warning will be
emitted if it is encountered.
Based on an earlier patch by Joe Perches.
Because %n was designed to write to pointers on the stack, it has been
frequently used as an attack vector when bugs are found that leak
user-controlled strings into functions that ultimately process format
strings. While this class of bug can still be turned into an
information leak, removing %n eliminates the common method of elevating
such a bug into an arbitrary kernel memory writing primitive,
significantly reducing the danger of this class of bug.
For seq_file users that need to know the length of a written string for
padding, please see seq_setwidth() and seq_pad() instead.
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Joe Perches <joe@perches.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tetsuo Handa [Thu, 14 Nov 2013 22:31:57 +0000 (14:31 -0800)]
seq_file: remove "%n" usage from seq_file users
All seq_printf() users are using "%n" for calculating padding size,
convert them to use seq_setwidth() / seq_pad() pair.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Joe Perches <joe@perches.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tetsuo Handa [Thu, 14 Nov 2013 22:31:56 +0000 (14:31 -0800)]
seq_file: introduce seq_setwidth() and seq_pad()
There are several users who want to know bytes written by seq_*() for
alignment purpose. Currently they are using %n format for knowing it
because seq_*() returns 0 on success.
This patch introduces seq_setwidth() and seq_pad() for allowing them to
align without using %n format.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Joe Perches <joe@perches.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Peter Zijlstra [Thu, 14 Nov 2013 22:31:54 +0000 (14:31 -0800)]
lockref: use BLOATED_SPINLOCKS to avoid explicit config dependencies
Avoid the fragile Kconfig construct guestimating spinlock_t sizes; use a
friendly compile-time test to determine this.
[kirill.shutemov@linux.intel.com: drop CONFIG_CMPXCHG_LOCKREF]
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Thu, 14 Nov 2013 22:31:53 +0000 (14:31 -0800)]
mm: create a separate slab for page->ptl allocation
If DEBUG_SPINLOCK and DEBUG_LOCK_ALLOC are enabled spinlock_t on x86_64
is 72 bytes. For page->ptl they will be allocated from kmalloc-96 slab,
so we loose 24 on each. An average system can easily allocate few tens
thousands of page->ptl and overhead is significant.
Let's create a separate slab for page->ptl allocation to solve this.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Peter Zijlstra [Thu, 14 Nov 2013 22:31:52 +0000 (14:31 -0800)]
mm: properly separate the bloated ptl from the regular case
Use kernel/bounds.c to convert build-time spinlock_t size check into a
preprocessor symbol and apply that to properly separate the page::ptl
situation.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Thu, 14 Nov 2013 22:31:51 +0000 (14:31 -0800)]
mm: dynamically allocate page->ptl if it cannot be embedded to struct page
If split page table lock is in use, we embed the lock into struct page
of table's page. We have to disable split lock, if spinlock_t is too
big be to be embedded, like when DEBUG_SPINLOCK or DEBUG_LOCK_ALLOC
enabled.
This patch add support for dynamic allocation of split page table lock
if we can't embed it to struct page.
page->ptl is unsigned long now and we use it as spinlock_t if
sizeof(spinlock_t) <= sizeof(long), otherwise it's pointer to spinlock_t.
The spinlock_t allocated in pgtable_page_ctor() for PTE table and in
pgtable_pmd_page_ctor() for PMD table. All other helpers converted to
support dynamically allocated page->ptl.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Thu, 14 Nov 2013 22:31:50 +0000 (14:31 -0800)]
xtensa: use buddy allocator for PTE table
At the moment xtensa uses slab allocator for PTE table. It doesn't work
with enabled split page table lock: slab uses page->slab_cache and
page->first_page for its pages. These fields share stroage with
page->ptl.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Chris Zankel <chris@zankel.net>
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Thu, 14 Nov 2013 22:31:49 +0000 (14:31 -0800)]
iommu/arm-smmu: handle pgtable_page_ctor() fail
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Thu, 14 Nov 2013 22:31:48 +0000 (14:31 -0800)]
xtensa: handle pgtable_page_ctor() fail
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Thu, 14 Nov 2013 22:31:47 +0000 (14:31 -0800)]
x86: handle pgtable_page_ctor() fail
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Thu, 14 Nov 2013 22:31:46 +0000 (14:31 -0800)]
unicore32: handle pgtable_page_ctor() fail
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Thu, 14 Nov 2013 22:31:44 +0000 (14:31 -0800)]
um: handle pgtable_page_ctor() fail
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Thu, 14 Nov 2013 22:31:43 +0000 (14:31 -0800)]
tile: handle pgtable_page_ctor() fail
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Thu, 14 Nov 2013 22:31:42 +0000 (14:31 -0800)]
sparc: handle pgtable_page_ctor() fail
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Thu, 14 Nov 2013 22:31:41 +0000 (14:31 -0800)]
sh: handle pgtable_page_ctor() fail
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Thu, 14 Nov 2013 22:31:40 +0000 (14:31 -0800)]
score: handle pgtable_page_ctor() fail
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Chen Liqin <liqin.chen@sunplusct.com>
Acked-by: Lennox Wu <lennox.wu@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Thu, 14 Nov 2013 22:31:39 +0000 (14:31 -0800)]
s390: handle pgtable_page_ctor() fail
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Thu, 14 Nov 2013 22:31:38 +0000 (14:31 -0800)]
powerpc: handle pgtable_page_ctor() fail
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Thu, 14 Nov 2013 22:31:37 +0000 (14:31 -0800)]
parisc: handle pgtable_page_ctor() fail
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Thu, 14 Nov 2013 22:31:36 +0000 (14:31 -0800)]
mips: handle pgtable_page_ctor() fail
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Thu, 14 Nov 2013 22:31:35 +0000 (14:31 -0800)]
metag: handle pgtable_page_ctor() fail
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Thu, 14 Nov 2013 22:31:34 +0000 (14:31 -0800)]
m68k: handle pgtable_page_ctor() fail
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Thu, 14 Nov 2013 22:31:33 +0000 (14:31 -0800)]
m32r: handle pgtable_page_ctor() fail
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Thu, 14 Nov 2013 22:31:32 +0000 (14:31 -0800)]
ia64: handle pgtable_page_ctor() fail
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Thu, 14 Nov 2013 22:31:31 +0000 (14:31 -0800)]
hexagon: handle pgtable_page_ctor() fail
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Thu, 14 Nov 2013 22:31:30 +0000 (14:31 -0800)]
frv: handle pgtable_page_ctor() fail
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Thu, 14 Nov 2013 22:31:29 +0000 (14:31 -0800)]
cris: handle pgtable_page_ctor() fail
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mikael Starvik <starvik@axis.com>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Thu, 14 Nov 2013 22:31:28 +0000 (14:31 -0800)]
avr32: handle pgtable_page_ctor() fail
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Thu, 14 Nov 2013 22:31:27 +0000 (14:31 -0800)]
arm64: handle pgtable_page_ctor() fail
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Thu, 14 Nov 2013 22:31:26 +0000 (14:31 -0800)]
arm: handle pgtable_page_ctor() fail
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Thu, 14 Nov 2013 22:31:25 +0000 (14:31 -0800)]
arc: handle pgtable_page_ctor() fail
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com> [for arch/arc bits]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Thu, 14 Nov 2013 22:31:24 +0000 (14:31 -0800)]
alpha: handle pgtable_page_ctor() fail
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Thu, 14 Nov 2013 22:31:23 +0000 (14:31 -0800)]
openrisc: add missing pgtable_page_ctor/dtor calls
It will fix NR_PAGETABLE accounting. It's also required if the arch is
going ever support split ptl.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jonas Bonn <jonas@southpole.se>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Thu, 14 Nov 2013 22:31:22 +0000 (14:31 -0800)]
mn10300: add missing pgtable_page_ctor/dtor calls
It will fix NR_PAGETABLE accounting. It's also required if the arch is
going ever support split ptl.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Thu, 14 Nov 2013 22:31:21 +0000 (14:31 -0800)]
microblaze: add missing pgtable_page_ctor/dtor calls
It will fix NR_PAGETABLE accounting. It's also required if the arch is
going ever support split ptl.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Simek <monstr@monstr.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Thu, 14 Nov 2013 22:31:20 +0000 (14:31 -0800)]
mm: allow pgtable_page_ctor() to fail
Change pgtable_page_ctor() return type from void to bool. Returns true,
if initialization is successful and false otherwise.
Current implementation never fails, but it will change later.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Thu, 14 Nov 2013 22:31:19 +0000 (14:31 -0800)]
xtensa: fix potential NULL-pointer dereference
Add missing check for memory allocation fail.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Thu, 14 Nov 2013 22:31:18 +0000 (14:31 -0800)]
m32r: fix potential NULL-pointer dereference
Add missing check for memory allocation fail.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Thu, 14 Nov 2013 22:31:17 +0000 (14:31 -0800)]
cris: fix potential NULL-pointer dereference
Add missing check for memory allocation fail.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mikael Starvik <starvik@axis.com>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Thu, 14 Nov 2013 22:31:13 +0000 (14:31 -0800)]
x86: add missed pgtable_pmd_page_ctor/dtor calls for preallocated pmds
In split page table lock case, we embed spinlock_t into struct page.
For obvious reason, we don't want to increase size of struct page if
spinlock_t is too big, like with DEBUG_SPINLOCK or DEBUG_LOCK_ALLOC or
on -rt kernel. So we disable split page table lock, if spinlock_t is
too big.
This patchset allows to allocate the lock dynamically if spinlock_t is
big. In this page->ptl is used to store pointer to spinlock instead of
spinlock itself. It costs additional cache line for indirect access,
but fix page fault scalability for multi-threaded applications.
LOCK_STAT depends on DEBUG_SPINLOCK, so on current kernel enabling
LOCK_STAT to analyse scalability issues breaks scalability. ;)
The patchset mostly fixes this. Results for ./thp_memscale -c 80 -b 512M
on 4-socket machine:
baseline, no CONFIG_LOCK_STAT: 9.
115460703 seconds time elapsed
baseline, CONFIG_LOCK_STAT=y: 53.
890567123 seconds time elapsed
patched, no CONFIG_LOCK_STAT: 8.
852250368 seconds time elapsed
patched, CONFIG_LOCK_STAT=y: 11.
069770759 seconds time elapsed
Patch count is scary, but most of them trivial. Overview:
Patches 1-4 Few bug fixes. No dependencies to other patches.
Probably should applied as soon as possible.
Patch 5 Changes signature of pgtable_page_ctor(). We will use it
for dynamic lock allocation, so it can fail.
Patches 6-8 Add missing constructor/destructor calls on few archs.
It's fixes NR_PAGETABLE accounting and prepare to use
split ptl.
Patches 9-33 Add pgtable_page_ctor() fail handling to all archs.
Patches 34 Finally adds support of dynamically-allocated page->pte.
Also contains documentation for split page table lock.
This patch (of 34):
I've missed that we preallocate few pmds on pgd_alloc() if X86_PAE
enabled. Let's add missed constructor/destructor calls.
I haven't noticed it during testing since prep_new_page() clears
page->mapping and therefore page->ptl. It's effectively equal to
spin_lock_init(&page->ptl).
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Liqin <liqin.chen@sunplusct.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Howells <dhowells@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rob Herring <rob.herring@calxeda.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Thu, 14 Nov 2013 22:31:10 +0000 (14:31 -0800)]
x86, mm: enable split page table lock for PMD level
Enable PMD split page table lock for X86_64 and PAE.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: Alex Thorlton <athorlton@sgi.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Jones <davej@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Robin Holt <robinmholt@gmail.com>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Hugh Dickins <hughd@google.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Thu, 14 Nov 2013 22:31:07 +0000 (14:31 -0800)]
mm: implement split page table lock for PMD level
The basic idea is the same as with PTE level: the lock is embedded into
struct page of table's page.
We can't use mm->pmd_huge_pte to store pgtables for THP, since we don't
take mm->page_table_lock anymore. Let's reuse page->lru of table's page
for that.
pgtable_pmd_page_ctor() returns true, if initialization is successful
and false otherwise. Current implementation never fails, but assumption
that constructor can fail will help to port it to -rt where spinlock_t
is rather huge and cannot be embedded into struct page -- dynamic
allocation is required.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: Alex Thorlton <athorlton@sgi.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Jones <davej@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Robin Holt <robinmholt@gmail.com>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Hugh Dickins <hughd@google.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>