GitHub/exynos8895/android_kernel_samsung_universal8895.git
11 years agoMerge git://github.com/agraf/linux-2.6.git kvm-ppc-next into queue
Gleb Natapov [Sun, 28 Apr 2013 09:50:07 +0000 (12:50 +0300)]
Merge git://github.com/agraf/linux-2.6.git kvm-ppc-next into queue

11 years agoKVM: x86: Rework request for immediate exit
Jan Kiszka [Sun, 28 Apr 2013 08:50:52 +0000 (10:50 +0200)]
KVM: x86: Rework request for immediate exit

The VMX implementation of enable_irq_window raised
KVM_REQ_IMMEDIATE_EXIT after we checked it in vcpu_enter_guest. This
caused infinite loops on vmentry. Fix it by letting enable_irq_window
signal the need for an immediate exit via its return value and drop
KVM_REQ_IMMEDIATE_EXIT.

This issue only affects nested VMX scenarios.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
11 years agokvm, svm: Fix typo in printk message
Borislav Petkov [Thu, 25 Apr 2013 22:22:01 +0000 (00:22 +0200)]
kvm, svm: Fix typo in printk message

It is "exit_int_info". It is actually EXITINTINFO in the official docs
but we don't like screaming docs.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
11 years agoKVM: VMX: remove unprintable characters from comment
Jan Kiszka [Sat, 27 Apr 2013 10:58:00 +0000 (12:58 +0200)]
KVM: VMX: remove unprintable characters from comment

Slipped in while copy&pasting from the SDM.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
11 years agoKVM: PPC: Book3S: Facilities to save/restore XICS presentation ctrler state
Paul Mackerras [Wed, 17 Apr 2013 20:32:26 +0000 (20:32 +0000)]
KVM: PPC: Book3S: Facilities to save/restore XICS presentation ctrler state

This adds the ability for userspace to save and restore the state
of the XICS interrupt presentation controllers (ICPs) via the
KVM_GET/SET_ONE_REG interface.  Since there is one ICP per vcpu, we
simply define a new 64-bit register in the ONE_REG space for the ICP
state.  The state includes the CPU priority setting, the pending IPI
priority, and the priority and source number of any pending external
interrupt.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agoKVM: PPC: Book3S: Add support for ibm,int-on/off RTAS calls
Paul Mackerras [Wed, 17 Apr 2013 20:32:04 +0000 (20:32 +0000)]
KVM: PPC: Book3S: Add support for ibm,int-on/off RTAS calls

This adds support for the ibm,int-on and ibm,int-off RTAS calls to the
in-kernel XICS emulation and corrects the handling of the saved
priority by the ibm,set-xive RTAS call.  With this, ibm,int-off sets
the specified interrupt's priority in its saved_priority field and
sets the priority to 0xff (the least favoured value).  ibm,int-on
restores the saved_priority to the priority field, and ibm,set-xive
sets both the priority and the saved_priority to the specified
priority value.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agoKVM: PPC: Book3S HV: Improve real-mode handling of external interrupts
Paul Mackerras [Wed, 17 Apr 2013 20:31:41 +0000 (20:31 +0000)]
KVM: PPC: Book3S HV: Improve real-mode handling of external interrupts

This streamlines our handling of external interrupts that come in
while we're in the guest.  First, when waking up a hardware thread
that was napping, we split off the "napping due to H_CEDE" case
earlier, and use the code that handles an external interrupt (0x500)
in the guest to handle that too.  Secondly, the code that handles
those external interrupts now checks if any other thread is exiting
to the host before bouncing an external interrupt to the guest, and
also checks that there is actually an external interrupt pending for
the guest before setting the LPCR MER bit (mediated external request).

This also makes sure that we clear the "ceded" flag when we handle a
wakeup from cede in real mode, and fixes a potential infinite loop
in kvmppc_run_vcpu() which can occur if we ever end up with the ceded
flag set but MSR[EE] off.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agoKVM: PPC: Book3S HV: Add support for real mode ICP in XICS emulation
Benjamin Herrenschmidt [Wed, 17 Apr 2013 20:31:15 +0000 (20:31 +0000)]
KVM: PPC: Book3S HV: Add support for real mode ICP in XICS emulation

This adds an implementation of the XICS hypercalls in real mode for HV
KVM, which allows us to avoid exiting the guest MMU context on all
threads for a variety of operations such as fetching a pending
interrupt, EOI of messages, IPIs, etc.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agoKVM: PPC: Book3S HV: Speed up wakeups of CPUs on HV KVM
Benjamin Herrenschmidt [Wed, 17 Apr 2013 20:30:50 +0000 (20:30 +0000)]
KVM: PPC: Book3S HV: Speed up wakeups of CPUs on HV KVM

Currently, we wake up a CPU by sending a host IPI with
smp_send_reschedule() to thread 0 of that core, which will take all
threads out of the guest, and cause them to re-evaluate their
interrupt status on the way back in.

This adds a mechanism to differentiate real host IPIs from IPIs sent
by KVM for guest threads to poke each other, in order to target the
guest threads precisely when possible and avoid that global switch of
the core to host state.

We then use this new facility in the in-kernel XICS code.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agoKVM: PPC: Book3S: Add kernel emulation for the XICS interrupt controller
Benjamin Herrenschmidt [Wed, 17 Apr 2013 20:30:26 +0000 (20:30 +0000)]
KVM: PPC: Book3S: Add kernel emulation for the XICS interrupt controller

This adds in-kernel emulation of the XICS (eXternal Interrupt
Controller Specification) interrupt controller specified by PAPR, for
both HV and PR KVM guests.

The XICS emulation supports up to 1048560 interrupt sources.
Interrupt source numbers below 16 are reserved; 0 is used to mean no
interrupt and 2 is used for IPIs.  Internally these are represented in
blocks of 1024, called ICS (interrupt controller source) entities, but
that is not visible to userspace.

Each vcpu gets one ICP (interrupt controller presentation) entity,
used to store the per-vcpu state such as vcpu priority, pending
interrupt state, IPI request, etc.

This does not include any API or any way to connect vcpus to their
ICP state; that will be added in later patches.

This is based on an initial implementation by Michael Ellerman
<michael@ellerman.id.au> reworked by Benjamin Herrenschmidt and
Paul Mackerras.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: fix typo, add dependency on !KVM_MPIC]
Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agoKVM: PPC: Book3S: Add infrastructure to implement kernel-side RTAS calls
Michael Ellerman [Wed, 17 Apr 2013 20:30:00 +0000 (20:30 +0000)]
KVM: PPC: Book3S: Add infrastructure to implement kernel-side RTAS calls

For pseries machine emulation, in order to move the interrupt
controller code to the kernel, we need to intercept some RTAS
calls in the kernel itself.  This adds an infrastructure to allow
in-kernel handlers to be registered for RTAS services by name.
A new ioctl, KVM_PPC_RTAS_DEFINE_TOKEN, then allows userspace to
associate token values with those service names.  Then, when the
guest requests an RTAS service with one of those token values, it
will be handled by the relevant in-kernel handler rather than being
passed up to userspace as at present.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: fix warning]
Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agokvm/ppc/mpic: Eliminate mmio_mapped
Scott Wood [Thu, 25 Apr 2013 14:11:24 +0000 (14:11 +0000)]
kvm/ppc/mpic: Eliminate mmio_mapped

We no longer need to keep track of this now that MPIC destruction
always happens either during VM destruction (after MMIO has been
destroyed) or during a failed creation (before the fd has been exposed
to userspace, and thus before the MMIO region could have been
registered).

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agokvm: destroy emulated devices on VM exit
Scott Wood [Thu, 25 Apr 2013 14:11:23 +0000 (14:11 +0000)]
kvm: destroy emulated devices on VM exit

The hassle of getting refcounting right was greater than the hassle
of keeping a list of devices to destroy on VM exit.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agoKVM: IA64: Carry non-ia64 changes into ia64
Alexander Graf [Thu, 25 Apr 2013 09:50:03 +0000 (11:50 +0200)]
KVM: IA64: Carry non-ia64 changes into ia64

We changed a few things in non-ia64 code paths. This patch blindly applies
the changes to the ia64 code as well, hoping it proves useful in case anyone
revives the ia64 kvm code.

Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agoKVM: PPC: MPIC: Restrict to e500 platforms
Alexander Graf [Tue, 16 Apr 2013 23:54:26 +0000 (01:54 +0200)]
KVM: PPC: MPIC: Restrict to e500 platforms

The code as is doesn't make any sense on non-e500 platforms. Restrict it
there, so that people don't get wrong ideas on what would actually work.

This patch should get reverted as soon as it's possible to either run e500
guests on non-e500 hosts or the MPIC emulation gains support for non-e500
modes.

Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agoKVM: PPC: MPIC: Add support for KVM_IRQ_LINE
Alexander Graf [Tue, 16 Apr 2013 22:37:57 +0000 (00:37 +0200)]
KVM: PPC: MPIC: Add support for KVM_IRQ_LINE

Now that all pieces are in place for reusing generic irq infrastructure,
we can copy x86's implementation of KVM_IRQ_LINE irq injection and simply
reuse it for PPC, as it will work there just as well.

Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agoKVM: PPC: Support irq routing and irqfd for in-kernel MPIC
Alexander Graf [Tue, 16 Apr 2013 15:42:19 +0000 (17:42 +0200)]
KVM: PPC: Support irq routing and irqfd for in-kernel MPIC

Now that all the irq routing and irqfd pieces are generic, we can expose
real irqchip support to all of KVM's internal helpers.

This allows us to use irqfd with the in-kernel MPIC.

Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agokvm/ppc/mpic: add KVM_CAP_IRQ_MPIC
Scott Wood [Fri, 12 Apr 2013 14:08:47 +0000 (14:08 +0000)]
kvm/ppc/mpic: add KVM_CAP_IRQ_MPIC

Enabling this capability connects the vcpu to the designated in-kernel
MPIC.  Using explicit connections between vcpus and irqchips allows
for flexibility, but the main benefit at the moment is that it
simplifies the code -- KVM doesn't need vm-global state to remember
which MPIC object is associated with this vm, and it doesn't need to
care about ordering between irqchip creation and vcpu creation.

Signed-off-by: Scott Wood <scottwood@freescale.com>
[agraf: add stub functions for kvmppc_mpic_{dis,}connect_vcpu]
Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agokvm/ppc/mpic: in-kernel MPIC emulation
Scott Wood [Fri, 12 Apr 2013 14:08:46 +0000 (14:08 +0000)]
kvm/ppc/mpic: in-kernel MPIC emulation

Hook the MPIC code up to the KVM interfaces, add locking, etc.

Signed-off-by: Scott Wood <scottwood@freescale.com>
[agraf: add stub function for kvmppc_mpic_set_epr, non-booke, 64bit]
Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agokvm/ppc/mpic: adapt to kernel style and environment
Scott Wood [Fri, 12 Apr 2013 14:08:45 +0000 (14:08 +0000)]
kvm/ppc/mpic: adapt to kernel style and environment

Remove braces that Linux style doesn't permit, remove space after
'*' that Lindent added, keep error/debug strings contiguous, etc.

Substitute type names, debug prints, etc.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agokvm/ppc/mpic: remove some obviously unneeded code
Scott Wood [Fri, 12 Apr 2013 14:08:44 +0000 (14:08 +0000)]
kvm/ppc/mpic: remove some obviously unneeded code

Remove some parts of the code that are obviously QEMU or Raven specific
before fixing style issues, to reduce the style issues that need to be
fixed.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agokvm/ppc/mpic: import hw/openpic.c from QEMU
Scott Wood [Fri, 12 Apr 2013 14:08:43 +0000 (14:08 +0000)]
kvm/ppc/mpic: import hw/openpic.c from QEMU

This is QEMU's hw/openpic.c from commit
abd8d4a4d6dfea7ddea72f095f993e1de941614e ("Update version for
1.4.0-rc0"), run through Lindent with no other changes to ease merging
future changes between Linux and QEMU.  Remaining style issues
(including those introduced by Lindent) will be fixed in a later patch.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agokvm: add device control API
Scott Wood [Fri, 12 Apr 2013 14:08:42 +0000 (14:08 +0000)]
kvm: add device control API

Currently, devices that are emulated inside KVM are configured in a
hardcoded manner based on an assumption that any given architecture
only has one way to do it.  If there's any need to access device state,
it is done through inflexible one-purpose-only IOCTLs (e.g.
KVM_GET/SET_LAPIC).  Defining new IOCTLs for every little thing is
cumbersome and depletes a limited numberspace.

This API provides a mechanism to instantiate a device of a certain
type, returning an ID that can be used to set/get attributes of the
device.  Attributes may include configuration parameters (e.g.
register base address), device state, operational commands, etc.  It
is similar to the ONE_REG API, except that it acts on devices rather
than vcpus.

Both device types and individual attributes can be tested without having
to create the device or get/set the attribute, without the need for
separately managing enumerated capabilities.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agoKVM: Move irqfd resample cap handling to generic code
Alexander Graf [Tue, 16 Apr 2013 10:12:49 +0000 (12:12 +0200)]
KVM: Move irqfd resample cap handling to generic code

Now that we have most irqfd code completely platform agnostic, let's move
irqfd's resample capability return to generic code as well.

Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
11 years agoKVM: Move irq routing setup to irqchip.c
Alexander Graf [Mon, 15 Apr 2013 21:23:21 +0000 (23:23 +0200)]
KVM: Move irq routing setup to irqchip.c

Setting up IRQ routes is nothing IOAPIC specific. Extract everything
that really is generic code into irqchip.c and only leave the ioapic
specific bits to irq_comm.c.

Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
11 years agoKVM: Extract generic irqchip logic into irqchip.c
Alexander Graf [Mon, 15 Apr 2013 21:04:10 +0000 (23:04 +0200)]
KVM: Extract generic irqchip logic into irqchip.c

The current irq_comm.c file contains pieces of code that are generic
across different irqchip implementations, as well as code that is
fully IOAPIC specific.

Split the generic bits out into irqchip.c.

Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
11 years agoKVM: Move irq routing to generic code
Alexander Graf [Mon, 15 Apr 2013 19:12:53 +0000 (21:12 +0200)]
KVM: Move irq routing to generic code

The IRQ routing set ioctl lives in the hacky device assignment code inside
of KVM today. This is definitely the wrong place for it. Move it to the much
more natural kvm_main.c.

Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
11 years agoKVM: Remove kvm_get_intr_delivery_bitmask
Alexander Graf [Mon, 15 Apr 2013 08:50:54 +0000 (10:50 +0200)]
KVM: Remove kvm_get_intr_delivery_bitmask

The prototype has been stale for a while, I can't spot any real function
define behind it. Let's just remove it.

Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
11 years agoKVM: Drop __KVM_HAVE_IOAPIC condition on irq routing
Alexander Graf [Mon, 15 Apr 2013 08:49:31 +0000 (10:49 +0200)]
KVM: Drop __KVM_HAVE_IOAPIC condition on irq routing

We have a capability enquire system that allows user space to ask kvm
whether a feature is available.

The point behind this system is that we can have different kernel
configurations with different capabilities and user space can adjust
accordingly.

Because features can always be non existent, we can drop any #ifdefs
on CAP defines that could be used generically, like the irq routing
bits. These can be easily reused for non-IOAPIC systems as well.

Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
11 years agoKVM: Introduce CONFIG_HAVE_KVM_IRQ_ROUTING
Alexander Graf [Wed, 17 Apr 2013 11:29:30 +0000 (13:29 +0200)]
KVM: Introduce CONFIG_HAVE_KVM_IRQ_ROUTING

Quite a bit of code in KVM has been conditionalized on availability of
IOAPIC emulation. However, most of it is generically applicable to
platforms that don't have an IOPIC, but a different type of irq chip.

Make code that only relies on IRQ routing, not an APIC itself, on
CONFIG_HAVE_KVM_IRQ_ROUTING, so that we can reuse it later.

Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
11 years agoKVM: Add KVM_IRQCHIP_NUM_PINS in addition to KVM_IOAPIC_NUM_PINS
Alexander Graf [Mon, 15 Apr 2013 08:42:33 +0000 (10:42 +0200)]
KVM: Add KVM_IRQCHIP_NUM_PINS in addition to KVM_IOAPIC_NUM_PINS

The concept of routing interrupt lines to an irqchip is nothing
that is IOAPIC specific. Every irqchip has a maximum number of pins
that can be linked to irq lines.

So let's add a new define that allows us to reuse generic code for
non-IOAPIC platforms.

Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
11 years agoKVM: PPC: Book3S HV: Report VPA and DTL modifications in dirty map
Paul Mackerras [Thu, 18 Apr 2013 19:51:04 +0000 (19:51 +0000)]
KVM: PPC: Book3S HV: Report VPA and DTL modifications in dirty map

At present, the KVM_GET_DIRTY_LOG ioctl doesn't report modifications
done by the host to the virtual processor areas (VPAs) and dispatch
trace logs (DTLs) registered by the guest.  This is because those
modifications are done either in real mode or in the host kernel
context, and in neither case does the access go through the guest's
HPT, and thus no change (C) bit gets set in the guest's HPT.

However, the changes done by the host do need to be tracked so that
the modified pages get transferred when doing live migration.  In
order to track these modifications, this adds a dirty flag to the
struct representing the VPA/DTL areas, and arranges to set the flag
when the VPA/DTL gets modified by the host.  Then, when we are
collecting the dirty log, we also check the dirty flags for the
VPA and DTL for each vcpu and set the relevant bit in the dirty log
if necessary.  Doing this also means we now need to keep track of
the guest physical address of the VPA/DTL areas.

So as not to lose track of modifications to a VPA/DTL area when it gets
unregistered, or when a new area gets registered in its place, we need
to transfer the dirty state to the rmap chain.  This adds code to
kvmppc_unpin_guest_page() to do that if the area was dirty.  To simplify
that code, we now require that all VPA, DTL and SLB shadow buffer areas
fit within a single host page.  Guests already comply with this
requirement because pHyp requires that these areas not cross a 4k
boundary.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agoKVM: PPC: Book3S HV: Make HPT reading code notice R/C bit changes
Paul Mackerras [Thu, 18 Apr 2013 19:50:24 +0000 (19:50 +0000)]
KVM: PPC: Book3S HV: Make HPT reading code notice R/C bit changes

At present, the code that determines whether a HPT entry has changed,
and thus needs to be sent to userspace when it is copying the HPT,
doesn't consider a hardware update to the reference and change bits
(R and C) in the HPT entries to constitute a change that needs to
be sent to userspace.  This adds code to check for changes in R and C
when we are scanning the HPT to find changed entries, and adds code
to set the changed flag for the HPTE when we update the R and C bits
in the guest view of the HPTE.

Since we now need to set the HPTE changed flag in book3s_64_mmu_hv.c
as well as book3s_hv_rm_mmu.c, we move the note_hpte_modification()
function into kvm_book3s_64.h.

Current Linux guest kernels don't use the hardware updates of R and C
in the HPT, so this change won't affect them.  Linux (or other) kernels
might in future want to use the R and C bits and have them correctly
transferred across when a guest is migrated, so it is better to correct
this deficiency.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agoKVM: PPC: e500: Add e6500 core to Kconfig description
Mihai Caraman [Thu, 11 Apr 2013 00:03:14 +0000 (00:03 +0000)]
KVM: PPC: e500: Add e6500 core to Kconfig description

Add e6500 core to Kconfig description.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agoKVM: PPC: e500mc: Enable e6500 cores
Mihai Caraman [Thu, 11 Apr 2013 00:03:13 +0000 (00:03 +0000)]
KVM: PPC: e500mc: Enable e6500 cores

Extend processor compatibility names to e6500 cores.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agoKVM: PPC: e500: Remove E.PT and E.HV.LRAT categories from VCPUs
Mihai Caraman [Thu, 11 Apr 2013 00:03:12 +0000 (00:03 +0000)]
KVM: PPC: e500: Remove E.PT and E.HV.LRAT categories from VCPUs

Embedded.Page Table (E.PT) category is not supported yet in e6500 kernel.
Configure TLBnCFG to remove E.PT and E.HV.LRAT categories from VCPUs.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agoKVM: PPC: e500: Add support for EPTCFG register
Mihai Caraman [Thu, 11 Apr 2013 00:03:11 +0000 (00:03 +0000)]
KVM: PPC: e500: Add support for EPTCFG register

EPTCFG register defined by E.PT is accessed unconditionally by Linux guests
in the presence of MAV 2.0. Emulate it now.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agoKVM: PPC: e500: Add support for TLBnPS registers
Mihai Caraman [Thu, 11 Apr 2013 00:03:10 +0000 (00:03 +0000)]
KVM: PPC: e500: Add support for TLBnPS registers

Add support for TLBnPS registers available in MMU Architecture Version
(MAV) 2.0.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agoKVM: PPC: e500: Move vcpu's MMU configuration to dedicated functions
Mihai Caraman [Thu, 11 Apr 2013 00:03:09 +0000 (00:03 +0000)]
KVM: PPC: e500: Move vcpu's MMU configuration to dedicated functions

Vcpu's MMU default configuration and geometry update logic was buried in
a chunk of code. Move them to dedicated functions to add more clarity.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agoKVM: PPC: e500: Expose MMU registers via ONE_REG
Mihai Caraman [Thu, 11 Apr 2013 00:03:08 +0000 (00:03 +0000)]
KVM: PPC: e500: Expose MMU registers via ONE_REG

MMU registers were exposed to user-space using sregs interface. Add them
to ONE_REG interface using kvmppc_get_one_reg/kvmppc_set_one_reg delegation
mechanism.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agoKVM: PPC: Book3E: Refactor ONE_REG ioctl implementation
Mihai Caraman [Thu, 11 Apr 2013 00:03:07 +0000 (00:03 +0000)]
KVM: PPC: Book3E: Refactor ONE_REG ioctl implementation

Refactor Book3E ONE_REG ioctl implementation to use kvmppc_get_one_reg/
kvmppc_set_one_reg delegation interface introduced by Book3S. This is
necessary for MMU SPRs which are platform specifics.

Get rid of useless case braces in the process.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agobooke: exit to user space if emulator request
Bharat Bhushan [Mon, 8 Apr 2013 00:32:15 +0000 (00:32 +0000)]
booke: exit to user space if emulator request

This allows the exit to user space if emulator request by returning
EMULATE_EXIT_USER. This will be used in subsequent patches in list

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agoKVM: extend EMULATE_EXIT_USER to support different exit reasons
Bharat Bhushan [Mon, 8 Apr 2013 00:32:14 +0000 (00:32 +0000)]
KVM: extend EMULATE_EXIT_USER to support different exit reasons

Currently the instruction emulator code returns EMULATE_EXIT_USER
and common code initializes the "run->exit_reason = .." and
"vcpu->arch.hcall_needed = .." with one fixed reason.
But there can be different reasons when emulator need to exit
to user space. To support that the "run->exit_reason = .."
and "vcpu->arch.hcall_needed = .." initialization is moved a
level up to emulator.

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agoRename EMULATE_DO_PAPR to EMULATE_EXIT_USER
Bharat Bhushan [Mon, 8 Apr 2013 00:32:13 +0000 (00:32 +0000)]
Rename EMULATE_DO_PAPR to EMULATE_EXIT_USER

Instruction emulation return EMULATE_DO_PAPR when it requires
exit to userspace on book3s. Similar return is required
for booke. EMULATE_DO_PAPR reads out to be confusing so it is
renamed to EMULATE_EXIT_USER.

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agoKVM: PPC: debug stub interface parameter defined
Bharat Bhushan [Mon, 8 Apr 2013 00:32:12 +0000 (00:32 +0000)]
KVM: PPC: debug stub interface parameter defined

This patch defines the interface parameter for KVM_SET_GUEST_DEBUG
ioctl support. Follow up patches will use this for setting up
hardware breakpoints, watchpoints and software breakpoints.

Also kvm_arch_vcpu_ioctl_set_guest_debug() is brought one level below.
This is because I am not sure what is required for book3s. So this ioctl
behaviour will not change for book3s.

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agoKVM: PPC: cache flush for kernel managed pages
Bharat Bhushan [Thu, 25 Apr 2013 06:33:57 +0000 (06:33 +0000)]
KVM: PPC: cache flush for kernel managed pages

Kernel can only access pages which maps as memory.
So flush only the valid kernel pages.

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agoMerge branch 'kvm-arm-cleanup' from git://github.com/columbia/linux-kvm-arm.git
Gleb Natapov [Thu, 25 Apr 2013 15:23:48 +0000 (18:23 +0300)]
Merge branch 'kvm-arm-cleanup' from git://github.com/columbia/linux-kvm-arm.git

11 years agoKVM: X86 emulator: fix source operand decoding for 8bit mov[zs]x instructions
Gleb Natapov [Wed, 24 Apr 2013 10:38:36 +0000 (13:38 +0300)]
KVM: X86 emulator: fix source operand decoding for 8bit mov[zs]x instructions

Source operand for one byte mov[zs]x is decoded incorrectly if it is in
high byte register. Fix that.

Cc: stable@vger.kernel.org
Signed-off-by: Gleb Natapov <gleb@redhat.com>
11 years agoKVM: nVMX: VM_ENTRY/EXIT_LOAD_IA32_EFER overrides EFER.LMA settings
Jan Kiszka [Sun, 14 Apr 2013 10:44:54 +0000 (12:44 +0200)]
KVM: nVMX: VM_ENTRY/EXIT_LOAD_IA32_EFER overrides EFER.LMA settings

If we load the complete EFER MSR on entry or exit, EFER.LMA (and LME)
loading is skipped. Their consistency is already checked now before
starting the transition.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
11 years agoKVM: nVMX: Validate EFER values for VM_ENTRY/EXIT_LOAD_IA32_EFER
Jan Kiszka [Sat, 20 Apr 2013 08:52:36 +0000 (10:52 +0200)]
KVM: nVMX: Validate EFER values for VM_ENTRY/EXIT_LOAD_IA32_EFER

As we may emulate the loading of EFER on VM-entry and VM-exit, implement
the checks that VMX performs on the guest and host values on vmlaunch/
vmresume. Factor out kvm_valid_efer for this purpose which checks for
set reserved bits.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
11 years agoKVM: nVMX: Fix conditions for NMI injection
Jan Kiszka [Sun, 14 Apr 2013 19:04:26 +0000 (21:04 +0200)]
KVM: nVMX: Fix conditions for NMI injection

The logic for checking if interrupts can be injected has to be applied
also on NMIs. The difference is that if NMI interception is on these
events are consumed and blocked by the VM exit.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
11 years agoKVM: VMX: Move vmx_nmi_allowed after vmx_set_nmi_mask
Jan Kiszka [Sun, 14 Apr 2013 10:12:47 +0000 (12:12 +0200)]
KVM: VMX: Move vmx_nmi_allowed after vmx_set_nmi_mask

vmx_set_nmi_mask will soon be used by vmx_nmi_allowed. No functional
changes.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
11 years agoKVM: x86: Fix memory leak in vmx.c
Andrew Honig [Thu, 18 Apr 2013 16:38:14 +0000 (09:38 -0700)]
KVM: x86: Fix memory leak in vmx.c

If userspace creates and destroys multiple VMs within the same process
we leak 20k of memory in the userspace process context per VM.  This
patch frees the memory in kvm_arch_destroy_vm.  If the process exits
without closing the VM file descriptor or the file descriptor has been
shared with another process then we don't free the memory.

It's still possible for a user space process to leak memory if the last
process to close the fd for the VM is not the process that created it.
However, this is an unexpected case that's only caused by a user space
process that's misbehaving.

Signed-off-by: Andrew Honig <ahonig@google.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
11 years agoKVM: x86: fix error return code in kvm_arch_vcpu_init()
Wei Yongjun [Wed, 17 Apr 2013 23:41:00 +0000 (07:41 +0800)]
KVM: x86: fix error return code in kvm_arch_vcpu_init()

Fix to return a negative error code from the error handling
case instead of 0, as returned elsewhere in this function.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
11 years agoKVM: nVMX: Enable and disable shadow vmcs functionality
Abel Gordon [Thu, 18 Apr 2013 11:39:55 +0000 (14:39 +0300)]
KVM: nVMX: Enable and disable shadow vmcs functionality

Once L1 loads VMCS12 we enable shadow-vmcs capability and copy all the VMCS12
shadowed fields to the shadow vmcs.  When we release the VMCS12, we also
disable shadow-vmcs capability.

Signed-off-by: Abel Gordon <abelg@il.ibm.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
11 years agoKVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
Abel Gordon [Thu, 18 Apr 2013 11:39:25 +0000 (14:39 +0300)]
KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs

Synchronize between the VMCS12 software controlled structure and the
processor-specific shadow vmcs

Signed-off-by: Abel Gordon <abelg@il.ibm.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
11 years agoKVM: nVMX: Copy VMCS12 to processor-specific shadow vmcs
Abel Gordon [Thu, 18 Apr 2013 11:38:55 +0000 (14:38 +0300)]
KVM: nVMX: Copy VMCS12 to processor-specific shadow vmcs

Introduce a function used to copy fields from the software controlled VMCS12
to the processor-specific shadow vmcs

Signed-off-by: Abel Gordon <abelg@il.ibm.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
11 years agoKVM: nVMX: Copy processor-specific shadow-vmcs to VMCS12
Abel Gordon [Thu, 18 Apr 2013 11:38:25 +0000 (14:38 +0300)]
KVM: nVMX: Copy processor-specific shadow-vmcs to VMCS12

Introduce a function used to copy fields from the processor-specific shadow
vmcs to the software controlled VMCS12

Signed-off-by: Abel Gordon <abelg@il.ibm.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
11 years agoKVM: nVMX: Release shadow vmcs
Abel Gordon [Thu, 18 Apr 2013 11:37:55 +0000 (14:37 +0300)]
KVM: nVMX: Release shadow vmcs

Unmap vmcs12 and release the corresponding shadow vmcs

Signed-off-by: Abel Gordon <abelg@il.ibm.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
11 years agoKVM: nVMX: Allocate shadow vmcs
Abel Gordon [Thu, 18 Apr 2013 11:37:25 +0000 (14:37 +0300)]
KVM: nVMX: Allocate shadow vmcs

Allocate a shadow vmcs used by the processor to shadow part of the fields
stored in the software defined VMCS12 (let L1 access fields without causing
exits). Note we keep a shadow vmcs only for the current vmcs12.  Once a vmcs12
becomes non-current, its shadow vmcs is released.

Signed-off-by: Abel Gordon <abelg@il.ibm.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
11 years agoKVM: nVMX: Fix VMXON emulation
Abel Gordon [Thu, 18 Apr 2013 11:36:55 +0000 (14:36 +0300)]
KVM: nVMX: Fix VMXON emulation

handle_vmon doesn't check if L1 is already in root mode (VMXON
was previously called). This patch adds this missing check and calls
nested_vmx_failValid if VMX is already ON.
We need this check because L0 will allocate the shadow vmcs when L1
executes VMXON and we want to avoid host leaks (due to shadow vmcs
allocation) if L1 executes VMXON repeatedly.

Signed-off-by: Abel Gordon <abelg@il.ibm.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
11 years agoKVM: nVMX: Refactor handle_vmwrite
Abel Gordon [Thu, 18 Apr 2013 11:36:25 +0000 (14:36 +0300)]
KVM: nVMX: Refactor handle_vmwrite

Refactor existent code so we re-use vmcs12_write_any to copy fields from the
shadow vmcs specified by the link pointer (used by the processor,
implementation-specific) to the VMCS12 software format used by L0 to hold
the fields in L1 memory address space.

Signed-off-by: Abel Gordon <abelg@il.ibm.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
11 years agoKVM: nVMX: Introduce vmread and vmwrite bitmaps
Abel Gordon [Thu, 18 Apr 2013 11:35:55 +0000 (14:35 +0300)]
KVM: nVMX: Introduce vmread and vmwrite bitmaps

Prepare vmread and vmwrite bitmaps according to a pre-specified list of fields.
These lists are intended to specifiy most frequent accessed fields so we can
minimize the number of fields that are copied from/to the software controlled
VMCS12 format to/from to processor-specific shadow vmcs. The lists were built
measuring the VMCS fields access rate after L2 Ubuntu 12.04 booted when it was
running on top of L1 KVM, also Ubuntu 12.04. Note that during boot there were
additional fields which were frequently modified but they were not added to
these lists because after boot these fields were not longer accessed by L1.

Signed-off-by: Abel Gordon <abelg@il.ibm.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
11 years agoKVM: nVMX: Detect shadow-vmcs capability
Abel Gordon [Thu, 18 Apr 2013 11:35:25 +0000 (14:35 +0300)]
KVM: nVMX: Detect shadow-vmcs capability

Add logic required to detect if shadow-vmcs is supported by the
processor. Introduce a new kernel module parameter to specify if L0 should use
shadow vmcs (or not) to run L1.

Signed-off-by: Abel Gordon <abelg@il.ibm.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
11 years agoKVM: nVMX: Shadow-vmcs control fields/bits
Abel Gordon [Thu, 18 Apr 2013 11:34:55 +0000 (14:34 +0300)]
KVM: nVMX: Shadow-vmcs control fields/bits

Add definitions for all the vmcs control fields/bits
required to enable vmcs-shadowing

Signed-off-by: Abel Gordon <abelg@il.ibm.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
11 years agoMerge git://github.com/agraf/linux-2.6.git kvm-ppc-next into queue
Gleb Natapov [Mon, 22 Apr 2013 07:38:15 +0000 (10:38 +0300)]
Merge git://github.com/agraf/linux-2.6.git kvm-ppc-next into queue

11 years agoKVM: ia64: Fix kvm_vm_ioctl_irq_line
Yang Zhang [Wed, 17 Apr 2013 00:46:41 +0000 (08:46 +0800)]
KVM: ia64: Fix kvm_vm_ioctl_irq_line

Fix the compile error with kvm_vm_ioctl_irq_line.

Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
11 years agoKVM: x86: Fix posted interrupt with CONFIG_SMP=n
Zhang, Yang Z [Thu, 18 Apr 2013 02:11:54 +0000 (23:11 -0300)]
KVM: x86: Fix posted interrupt with CONFIG_SMP=n

->send_IPI_mask is not defined on UP.

Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
11 years agokvm/ppc: don't call complete_mmio_load when it's a store
Scott Wood [Mon, 15 Apr 2013 15:07:11 +0000 (15:07 +0000)]
kvm/ppc: don't call complete_mmio_load when it's a store

complete_mmio_load writes back the mmio result into the
destination register.  Doing this on a store results in
register corruption.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agoKVM: PPC: emulate dcbst
Stuart Yoder [Tue, 9 Apr 2013 10:36:23 +0000 (10:36 +0000)]
KVM: PPC: emulate dcbst

Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agoAdded ONE_REG interface for debug instruction
Bharat Bhushan [Wed, 20 Mar 2013 20:24:58 +0000 (20:24 +0000)]
Added ONE_REG interface for debug instruction

This patch adds the one_reg interface to get the special instruction
to be used for setting software breakpoint from userspace.

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agoMerge commit 'origin/next' into kvm-ppc-next
Alexander Graf [Wed, 17 Apr 2013 13:20:38 +0000 (15:20 +0200)]
Merge commit 'origin/next' into kvm-ppc-next

11 years agoKVM: VMX: Fix check guest state validity if a guest is in VM86 mode
Gleb Natapov [Sun, 14 Apr 2013 13:07:37 +0000 (16:07 +0300)]
KVM: VMX: Fix check guest state validity if a guest is in VM86 mode

If guest vcpu is in VM86 mode the vcpu state should be checked as if in
real mode.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
11 years agoKVM: nVMX: check vmcs12 for valid activity state
Paolo Bonzini [Mon, 15 Apr 2013 13:00:27 +0000 (15:00 +0200)]
KVM: nVMX: check vmcs12 for valid activity state

KVM does not use the activity state VMCS field, and does not support
it in nested VMX either (the corresponding bits in the misc VMX feature
MSR are zero).  Fail entry if the activity state is set to anything but
"active".

Since the value will always be the same for L1 and L2, we do not need
to read and write the corresponding VMCS field on L1/L2 transitions,
either.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
11 years agoKVM: ARM: Fix kvm_vm_ioctl_irq_line
Alexander Graf [Tue, 16 Apr 2013 17:21:41 +0000 (19:21 +0200)]
KVM: ARM: Fix kvm_vm_ioctl_irq_line

Commit aa2fbe6d broke the ARM KVM target by introducing a new parameter
to irq handling functions.

Fix the function prototype to get things compiling again and ignore the
parameter just like we did before

Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Christoffer Dall <cdall@cs.columbia.edu>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
11 years agoKVM: VMX: Use posted interrupt to deliver virtual interrupt
Yang Zhang [Thu, 11 Apr 2013 11:25:16 +0000 (19:25 +0800)]
KVM: VMX: Use posted interrupt to deliver virtual interrupt

If posted interrupt is avaliable, then uses it to inject virtual
interrupt to guest.

Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
11 years agoKVM: VMX: Add the deliver posted interrupt algorithm
Yang Zhang [Thu, 11 Apr 2013 11:25:15 +0000 (19:25 +0800)]
KVM: VMX: Add the deliver posted interrupt algorithm

Only deliver the posted interrupt when target vcpu is running
and there is no previous interrupt pending in pir.

Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
11 years agoKVM: Set TMR when programming ioapic entry
Yang Zhang [Thu, 11 Apr 2013 11:25:14 +0000 (19:25 +0800)]
KVM: Set TMR when programming ioapic entry

We already know the trigger mode of a given interrupt when programming
the ioapice entry. So it's not necessary to set it in each interrupt
delivery.

Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
11 years agoKVM: Call common update function when ioapic entry changed.
Yang Zhang [Thu, 11 Apr 2013 11:25:13 +0000 (19:25 +0800)]
KVM: Call common update function when ioapic entry changed.

Both TMR and EOI exit bitmap need to be updated when ioapic changed
or vcpu's id/ldr/dfr changed. So use common function instead eoi exit
bitmap specific function.

Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
11 years agoKVM: VMX: Check the posted interrupt capability
Yang Zhang [Thu, 11 Apr 2013 11:25:12 +0000 (19:25 +0800)]
KVM: VMX: Check the posted interrupt capability

Detect the posted interrupt feature. If it exists, then set it in vmcs_config.

Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
11 years agoKVM: VMX: Register a new IPI for posted interrupt
Yang Zhang [Thu, 11 Apr 2013 11:25:11 +0000 (19:25 +0800)]
KVM: VMX: Register a new IPI for posted interrupt

Posted Interrupt feature requires a special IPI to deliver posted interrupt
to guest. And it should has a high priority so the interrupt will not be
blocked by others.
Normally, the posted interrupt will be consumed by vcpu if target vcpu is
running and transparent to OS. But in some cases, the interrupt will arrive
when target vcpu is scheduled out. And host will see it. So we need to
register a dump handler to handle it.

Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
11 years agoKVM: VMX: Enable acknowledge interupt on vmexit
Yang Zhang [Thu, 11 Apr 2013 11:25:10 +0000 (19:25 +0800)]
KVM: VMX: Enable acknowledge interupt on vmexit

The "acknowledge interrupt on exit" feature controls processor behavior
for external interrupt acknowledgement. When this control is set, the
processor acknowledges the interrupt controller to acquire the
interrupt vector on VM exit.

After enabling this feature, an interrupt which arrived when target cpu is
running in vmx non-root mode will be handled by vmx handler instead of handler
in idt. Currently, vmx handler only fakes an interrupt stack and jump to idt
table to let real handler to handle it. Further, we will recognize the interrupt
and only delivery the interrupt which not belong to current vcpu through idt table.
The interrupt which belonged to current vcpu will be handled inside vmx handler.
This will reduce the interrupt handle cost of KVM.

Also, interrupt enable logic is changed if this feature is turnning on:
Before this patch, hypervior call local_irq_enable() to enable it directly.
Now IF bit is set on interrupt stack frame, and will be enabled on a return from
interrupt handler if exterrupt interrupt exists. If no external interrupt, still
call local_irq_enable() to enable it.

Refer to Intel SDM volum 3, chapter 33.2.

Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
11 years agoKVM: Use eoi to track RTC interrupt delivery status
Yang Zhang [Thu, 11 Apr 2013 11:21:41 +0000 (19:21 +0800)]
KVM: Use eoi to track RTC interrupt delivery status

Current interrupt coalescing logci which only used by RTC has conflict
with Posted Interrupt.
This patch introduces a new mechinism to use eoi to track interrupt:
When delivering an interrupt to vcpu, the pending_eoi set to number of
vcpu that received the interrupt. And decrease it when each vcpu writing
eoi. No subsequent RTC interrupt can deliver to vcpu until all vcpus
write eoi.

Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
11 years agoKVM: Let ioapic know the irq line status
Yang Zhang [Thu, 11 Apr 2013 11:21:40 +0000 (19:21 +0800)]
KVM: Let ioapic know the irq line status

Userspace may deliver RTC interrupt without query the status. So we
want to track RTC EOI for this case.

Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
11 years agoKVM: Force vmexit with virtual interrupt delivery
Yang Zhang [Thu, 11 Apr 2013 11:21:39 +0000 (19:21 +0800)]
KVM: Force vmexit with virtual interrupt delivery

Need the EOI to track interrupt deliver status, so force vmexit
on EOI for rtc interrupt when enabling virtual interrupt delivery.

Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
11 years agoKVM: Add reset/restore rtc_status support
Yang Zhang [Thu, 11 Apr 2013 11:21:38 +0000 (19:21 +0800)]
KVM: Add reset/restore rtc_status support

restore rtc_status from migration or save/restore

Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
11 years agoKVM: Return destination vcpu on interrupt injection
Yang Zhang [Thu, 11 Apr 2013 11:21:37 +0000 (19:21 +0800)]
KVM: Return destination vcpu on interrupt injection

Add a new parameter to know vcpus who received the interrupt.

Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
11 years agoKVM: Introduce struct rtc_status
Yang Zhang [Thu, 11 Apr 2013 11:21:36 +0000 (19:21 +0800)]
KVM: Introduce struct rtc_status

rtc_status is used to track RTC interrupt delivery status. The pending_eoi
will be increased by vcpu who received RTC interrupt and will be decreased
when EOI to this interrupt.
Also, we use dest_map to record the destination vcpu to avoid the case that
vcpu who didn't get the RTC interupt, but issued EOI with same vector of RTC
and descreased pending_eoi by mistake.

Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
11 years agoKVM: Add vcpu info to ioapic_update_eoi()
Yang Zhang [Thu, 11 Apr 2013 11:21:35 +0000 (19:21 +0800)]
KVM: Add vcpu info to ioapic_update_eoi()

Add vcpu info to ioapic_update_eoi, so we can know which vcpu
issued this EOI.

Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
11 years agoKVM: nVMX: Avoid reading VM_EXIT_INTR_ERROR_CODE needlessly on nested exits
Jan Kiszka [Sun, 14 Apr 2013 10:12:50 +0000 (12:12 +0200)]
KVM: nVMX: Avoid reading VM_EXIT_INTR_ERROR_CODE needlessly on nested exits

We only need to update vm_exit_intr_error_code if there is a valid exit
interruption information and it comes with a valid error code.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
11 years agoKVM: nVMX: Fix conditions for interrupt injection
Jan Kiszka [Sun, 14 Apr 2013 10:12:48 +0000 (12:12 +0200)]
KVM: nVMX: Fix conditions for interrupt injection

If we are entering guest mode, we do not want L0 to interrupt this
vmentry with all its side effects on the vmcs. Therefore, injection
shall be disallowed during L1->L2 transitions, as in the previous
version. However, this check is conceptually independent of
nested_exit_on_intr, so decouple it.

If L1 traps external interrupts, we can kick the guest from L2 to L1,
also just like the previous code worked. But we no longer need to
consider L1's idt_vectoring_info_field. It will always be empty at this
point. Instead, if L2 has pending events, those are now found in the
architectural queues and will, thus, prevent vmx_interrupt_allowed from
being called at all.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
11 years agoKVM: nVMX: Rework event injection and recovery
Jan Kiszka [Sun, 14 Apr 2013 10:12:46 +0000 (12:12 +0200)]
KVM: nVMX: Rework event injection and recovery

The basic idea is to always transfer the pending event injection on
vmexit into the architectural state of the VCPU and then drop it from
there if it turns out that we left L2 to enter L1, i.e. if we enter
prepare_vmcs12.

vmcs12_save_pending_events takes care to transfer pending L0 events into
the queue of L1. That is mandatory as L1 may decide to switch the guest
state completely, invalidating or preserving the pending events for
later injection (including on a different node, once we support
migration).

This concept is based on the rule that a pending vmlaunch/vmresume is
not canceled. Otherwise, we would risk to lose injected events or leak
them into the wrong queues. Encode this rule via a WARN_ON_ONCE at the
entry of nested_vmx_vmexit.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
11 years agoKVM: nVMX: Fix injection of PENDING_INTERRUPT and NMI_WINDOW exits to L1
Jan Kiszka [Sun, 14 Apr 2013 10:12:45 +0000 (12:12 +0200)]
KVM: nVMX: Fix injection of PENDING_INTERRUPT and NMI_WINDOW exits to L1

Check if the interrupt or NMI window exit is for L1 by testing if it has
the corresponding controls enabled. This is required when we allow
direct injection from L0 to L2

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
11 years agoKVM: emulator: mark 0xff 0x7d opcode as undefined.
Gleb Natapov [Thu, 11 Apr 2013 09:32:14 +0000 (12:32 +0300)]
KVM: emulator: mark 0xff 0x7d opcode as undefined.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
11 years agoKVM: emulator: Do not fail on emulation of undefined opcode
Gleb Natapov [Thu, 11 Apr 2013 09:30:01 +0000 (12:30 +0300)]
KVM: emulator: Do not fail on emulation of undefined opcode

Emulation of undefined opcode should inject #UD instead of causing
emulation failure. Do that by moving Undefined flag check to emulation
stage and injection #UD there.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
11 years agoKVM: VMX: do not try to reexecute failed instruction while emulating invalid guest...
Gleb Natapov [Thu, 11 Apr 2013 09:10:51 +0000 (12:10 +0300)]
KVM: VMX: do not try to reexecute failed instruction while emulating invalid guest state

During invalid guest state emulation vcpu cannot enter guest mode to try
to reexecute instruction that emulator failed to emulate, so emulation
will happen again and again.  Prevent that by telling the emulator that
instruction reexecution should not be attempted.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
11 years agoKVM: emulator: fix unimplemented instruction detection
Gleb Natapov [Thu, 11 Apr 2013 08:59:55 +0000 (11:59 +0300)]
KVM: emulator: fix unimplemented instruction detection

Unimplemented instruction detection is broken for group instructions
since it relies on "flags" field of opcode to be zero, but all
instructions in a group inherit flags from a group encoding. Fix that by
having a separate flag for unimplemented instructions.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
11 years agoKVM: x86 emulator: Fix segment loading in VM86
Kevin Wolf [Thu, 11 Apr 2013 12:06:03 +0000 (14:06 +0200)]
KVM: x86 emulator: Fix segment loading in VM86

This fixes a regression introduced in commit 03ebebeb1 ("KVM: x86
emulator: Leave segment limit and attributs alone in real mode").

The mentioned commit changed the segment descriptors for both real mode
and VM86 to only update the segment base instead of creating a
completely new descriptor with limit 0xffff so that unreal mode keeps
working across a segment register reload.

This leads to an invalid segment descriptor in the eyes of VMX, which
seems to be okay for real mode because KVM will fix it up before the
next VM entry or emulate the state, but it doesn't do this if the guest
is in VM86, so we end up with:

  KVM: entry failed, hardware error 0x80000021

Fix this by effectively reverting commit 03ebebeb1 for VM86 and leaving
it only in place for real mode, which is where it's really needed.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
11 years agoKVM: Move kvm_rebooting declaration out of x86
Geoff Levand [Fri, 5 Apr 2013 19:20:30 +0000 (19:20 +0000)]
KVM: Move kvm_rebooting declaration out of x86

The variable kvm_rebooting is a common kvm variable, so move its
declaration from arch/x86/include/asm/kvm_host.h to
include/asm/kvm_host.h.

Fixes this sparse warning when building on arm64:

  virt/kvm/kvm_main.c:warning: symbol 'kvm_rebooting' was not declared. Should it be static?

Signed-off-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
11 years agoKVM: Move kvm_spurious_fault to x86.c
Geoff Levand [Fri, 5 Apr 2013 19:20:30 +0000 (19:20 +0000)]
KVM: Move kvm_spurious_fault to x86.c

The routine kvm_spurious_fault() is an x86 specific routine, so
move it from virt/kvm/kvm_main.c to arch/x86/kvm/x86.c.

Fixes this sparse warning when building on arm64:

  virt/kvm/kvm_main.c:warning: symbol 'kvm_spurious_fault' was not declared. Should it be static?

Signed-off-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Gleb Natapov <gleb@redhat.com>