Jan Kiszka [Wed, 20 Jan 2010 17:20:20 +0000 (18:20 +0100)]
KVM: SVM: Trap all debug register accesses
To enable proper debug register emulation under all conditions, trap
access to all DR0..7. This may be optimized later on.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Jan Kiszka [Wed, 20 Jan 2010 17:20:20 +0000 (18:20 +0100)]
KVM: SVM: Clean up and enhance mov dr emulation
Enhance mov dr instruction emulation used by SVM so that it properly
handles dr4/5: alias to dr6/7 if cr4.de is cleared. Otherwise return
EMULATE_FAIL which will let our only possible caller in that scenario,
ud_interception, re-inject UD.
We do not need to inject faults, SVM does this for us (exceptions take
precedence over instruction interceptions). For the same reason, the
value overflow checks can be removed.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Jan Kiszka [Wed, 20 Jan 2010 17:20:20 +0000 (18:20 +0100)]
KVM: VMX: Clean up DR6 emulation
As we trap all debug register accesses, we do not need to switch real
DR6 at all. Clean up update_exception_bitmap at this chance, too.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Jan Kiszka [Wed, 20 Jan 2010 17:20:20 +0000 (18:20 +0100)]
KVM: VMX: Fix emulation of DR4 and DR5
Make sure DR4 and DR5 are aliased to DR6 and DR7, respectively, if
CR4.DE is not set.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Jan Kiszka [Wed, 20 Jan 2010 17:20:20 +0000 (18:20 +0100)]
KVM: VMX: Fix exceptions of mov to dr
Injecting GP without an error code is a bad idea (causes unhandled guest
exits). Moreover, we must not skip the instruction if we injected an
exception.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Takuya Yoshikawa [Wed, 20 Jan 2010 07:47:21 +0000 (16:47 +0900)]
KVM: x86: Use macros for x86_emulate_ops to avoid future mistakes
The return values from x86_emulate_ops are defined
in kvm_emulate.h as macros X86EMUL_*.
But in emulate.c, we are comparing the return values
from these ops with 0 to check if they're X86EMUL_CONTINUE
or not: X86EMUL_CONTINUE is defined as 0 now.
To avoid possible mistakes in the future, this patch
substitutes "X86EMUL_CONTINUE" for "0" that are being
compared with the return values from x86_emulate_ops.
We think that there are more places we should use these
macros, but the meanings of rc values in x86_emulate_insn()
were not so clear at a glance. If we use proper macros in
this function, we would be able to follow the flow of each
emulation more easily and, maybe, more securely.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Marcelo Tosatti [Tue, 19 Jan 2010 14:45:23 +0000 (12:45 -0200)]
KVM: fix cleanup_srcu_struct on vm destruction
cleanup_srcu_struct on VM destruction remains broken:
BUG: unable to handle kernel paging request at
ffffffffffffffff
IP: [<
ffffffff802533d2>] srcu_read_lock+0x16/0x21
RIP: 0010:[<
ffffffff802533d2>] [<
ffffffff802533d2>] srcu_read_lock+0x16/0x21
Call Trace:
[<
ffffffffa05354c4>] kvm_arch_vcpu_uninit+0x1b/0x48 [kvm]
[<
ffffffffa05339c6>] kvm_vcpu_uninit+0x9/0x15 [kvm]
[<
ffffffffa0569f7d>] vmx_free_vcpu+0x7f/0x8f [kvm_intel]
[<
ffffffffa05357b5>] kvm_arch_destroy_vm+0x78/0x111 [kvm]
[<
ffffffffa053315b>] kvm_put_kvm+0xd4/0xfe [kvm]
Move it to kvm_arch_destroy_vm.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Reported-by: Jan Kiszka <jan.kiszka@siemens.com>
Gleb Natapov [Tue, 19 Jan 2010 13:06:38 +0000 (15:06 +0200)]
KVM: fix Hyper-V hypercall warnings and wrong mask value
Fix compilation warnings and wrong mask value.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Sheng Yang [Tue, 19 Jan 2010 09:43:21 +0000 (17:43 +0800)]
KVM: VMX: Remove emulation failure report
As Avi noted:
>There are two problems with the kernel failure report. First, it
>doesn't report enough data - registers, surrounding instructions, etc.
>that are needed to explain what is going on. Second, it can flood
>dmesg, which is a pretty bad thing to do.
So we remove the emulation failure report in handle_invalid_guest_state(),
and would inspected the guest using userspace tool in the future.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Mon, 18 Jan 2010 11:26:34 +0000 (13:26 +0200)]
KVM: export <asm/hyperv.h>
Needed by <asm/kvm_para.h>.
Signed-off-by: Avi Kivity <avi@redhat.com>
Takuya Yoshikawa [Mon, 18 Jan 2010 09:45:10 +0000 (18:45 +0900)]
KVM: rename is_writeble_pte() to is_writable_pte()
There are two spellings of "writable" in
arch/x86/kvm/mmu.c and paging_tmpl.h .
This patch renames is_writeble_pte() to is_writable_pte()
and makes grepping easy.
New name is consistent with the definition of itself:
return pte & PT_WRITABLE_MASK;
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Sun, 17 Jan 2010 13:51:24 +0000 (15:51 +0200)]
KVM: Implement NotifyLongSpinWait HYPER-V hypercall
Windows issues this hypercall after guest was spinning on a spinlock
for too many iterations.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Vadim Rozenfeld <vrozenfe@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Sun, 17 Jan 2010 13:51:23 +0000 (15:51 +0200)]
KVM: Add HYPER-V apic access MSRs
Implement HYPER-V apic MSRs. Spec defines three MSRs that speed-up
access to EOI/TPR/ICR apic registers for PV guests.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Vadim Rozenfeld <vrozenfe@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Sun, 17 Jan 2010 13:51:22 +0000 (15:51 +0200)]
KVM: Implement bare minimum of HYPER-V MSRs
Minimum HYPER-V implementation should have GUEST_OS_ID, HYPERCALL and
VP_INDEX MSRs.
[avi: fix build on i386]
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Vadim Rozenfeld <vrozenfe@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Sun, 17 Jan 2010 13:51:21 +0000 (15:51 +0200)]
KVM: Add HYPER-V header file
Provide HYPER-V related defines that will be used by following patches.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Vadim Rozenfeld <vrozenfe@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 15 Jan 2010 13:49:14 +0000 (14:49 +0100)]
KVM: PPC: Move Shadow MSR calculation to function
We keep a copy of the MSR around that we use when we go into the guest context.
That copy is basically the normal process MSR flags OR some allowed guest
specified MSR flags. We also AND the external providers into this, so we get
traps on FPU usage when we haven't activated it on the host yet.
Currently this calculation is part of the set_msr function that we use whenever
we set the guest MSR value. With the external providers, we also have the case
that we don't modify the guest's MSR, but only want to update the shadow MSR.
So let's move the shadow MSR parts to a separate function that we then use
whenever we only need to update it. That way we don't accidently kvm_vcpu_block
within a preempt notifier context.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 15 Jan 2010 13:49:13 +0000 (14:49 +0100)]
KVM: PPC: Keep SRR1 flags around in shadow_msr
SRR1 stores more information that just the MSR value. It also stores
valuable information about the type of interrupt we received, for
example whether the storage interrupt we just got was because of a
missing htab entry or not.
We use that information to speed up the exit path.
Now if we get preempted before we can interpret the shadow_msr values,
we get into vcpu_put which then calls the MSR handler, which then sets
all the SRR1 information bits in shadow_msr to 0. Great.
So let's preserve the SRR1 specific bits in shadow_msr whenever we set
the MSR. They don't hurt.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 15 Jan 2010 13:49:12 +0000 (14:49 +0100)]
KVM: PPC: Fix initial GPR settings
Commit
7d01b4c3ed2bb33ceaf2d270cb4831a67a76b51b introduced PACA backed vcpu
values. With this patch, when a userspace app was setting GPRs before it was
actually first loaded, the set values get discarded.
This is because vcpu_load loads them from the vcpu backing store that we use
whenever we're not owning the PACA.
That behavior is not really a major problem, because we don't need it for
qemu. Other users (like kvmctl) do have problems with it though, so let's
better do it right.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 15 Jan 2010 13:49:11 +0000 (14:49 +0100)]
KVM: PPC: Add support for FPU/Altivec/VSX
When our guest starts using either the FPU, Altivec or VSX we need to make
sure Linux knows about it and sneak into its process switching code
accordingly.
This patch makes accesses to the above parts of the system work inside the
VM.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 15 Jan 2010 13:49:10 +0000 (14:49 +0100)]
KVM: PPC: Add helper functions to call real mode loaders
Linux contains quite some bits of code to load FPU, Altivec and VSX lazily for
a task. It calls those bits in real mode, coming from an interrupt handler.
For KVM we better reuse those, so let's wrap a bit of trampoline magic around
them and then we can call them from normal module code.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 15 Jan 2010 13:49:09 +0000 (14:49 +0100)]
KVM: PPC: Export __giveup_vsx
We need to explicitly only giveup VSX in KVM, so let's export that
specific function to module space.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Roel Kluin [Thu, 14 Jan 2010 17:05:58 +0000 (18:05 +0100)]
KVM: ia64: remove redundant kvm_get_exit_data() NULL tests
kvm_get_exit_data() cannot return a NULL pointer.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Sun, 10 Jan 2010 10:19:20 +0000 (12:19 +0200)]
KVM: SVM: Lazy fpu with npt
Now that we can allow the guest to play with cr0 when the fpu is loaded,
we can enable lazy fpu when npt is in use.
Acked-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Wed, 6 Jan 2010 08:55:27 +0000 (10:55 +0200)]
KVM: SVM: Selective cr0 intercept
If two conditions apply:
- no bits outside TS and EM differ between the host and guest cr0
- the fpu is active
then we can activate the selective cr0 write intercept and drop the
unconditional cr0 read and write intercept, and allow the guest to run
with the host fpu state. This reduces cr0 exits due to guest fpu management
while the guest fpu is loaded.
Acked-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Sun, 10 Jan 2010 10:14:04 +0000 (12:14 +0200)]
KVM: SVM: Restore unconditional cr0 intercept under npt
Currently we don't intercept cr0 at all when npt is enabled. This improves
performance but requires us to activate the fpu at all times.
Remove this behaviour in preparation for adding selective cr0 intercepts.
Acked-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Thu, 7 Jan 2010 11:16:08 +0000 (13:16 +0200)]
KVM: SVM: Initialize fpu_active in init_vmcb()
init_vmcb() sets up the intercepts as if the fpu is active, so initialize it
there. This avoids an INIT from setting up intercepts inconsistent with
fpu_active.
Acked-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Wed, 6 Jan 2010 11:13:01 +0000 (13:13 +0200)]
KVM: SVM: Fix SVM_CR0_SELECTIVE_MASK
Instead of selecting TS and MP as the comments say, the macro included TS and
PE. Luckily the macro is unused now, but fix in order to save a few hours of
debugging from anyone who attempts to use it.
Acked-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Wed, 6 Jan 2010 17:10:22 +0000 (19:10 +0200)]
KVM: Set cr0.et when the guest writes cr0
Follow the hardware.
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Wed, 30 Dec 2009 16:07:40 +0000 (18:07 +0200)]
KVM: VMX: Give the guest ownership of cr0.ts when the fpu is active
If the guest fpu is loaded, there is nothing interesing about cr0.ts; let
the guest play with it as it will. This makes context switches between fpu
intensive guest processes faster, as we won't trap the clts and cr0 write
instructions.
[marcelo: fix cr0 read shadow update on fpu deactivation; kills F8 install]
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Wed, 30 Dec 2009 10:40:26 +0000 (12:40 +0200)]
KVM: Lazify fpu activation and deactivation
Defer fpu deactivation as much as possible - if the guest fpu is loaded, keep
it loaded until the next heavyweight exit (where we are forced to unload it).
This reduces unnecessary exits.
We also defer fpu activation on clts; while clts signals the intent to use the
fpu, we can't be sure the guest will actually use it.
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Tue, 29 Dec 2009 16:43:06 +0000 (18:43 +0200)]
KVM: VMX: Allow the guest to own some cr0 bits
We will use this later to give the guest ownership of cr0.ts.
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Tue, 29 Dec 2009 16:07:30 +0000 (18:07 +0200)]
KVM: Replace read accesses of vcpu->arch.cr0 by an accessor
Since we'd like to allow the guest to own a few bits of cr0 at times, we need
to know when we access those bits.
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Tue, 29 Dec 2009 15:33:58 +0000 (17:33 +0200)]
KVM: VMX: trace clts and lmsw instructions as cr accesses
clts writes cr0.ts; lmsw writes cr0[0:15] - record that in ftrace.
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Sun, 10 Jan 2010 02:27:47 +0000 (03:27 +0100)]
KVM: PPC: Make large pages work
An SLB entry contains two pieces of information related to size:
1) PTE size
2) SLB size
The L bit defines the PTE be "large" (usually means 16MB),
SLB_VSID_B_1T defines that the SLB should span 1 GB instead of the
default 256MB.
Apparently I messed things up and just put those two in one box,
shaked it heavily and came up with the current code which handles
large pages incorrectly, because it also treats large page SLB entries
as "1TB" segment entries.
This patch splits those two features apart, making Linux guests boot
even when they have > 256MB.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Sun, 10 Jan 2010 02:27:32 +0000 (03:27 +0100)]
KVM: PPC: Pass through program interrupts
When we get a program interrupt in guest kernel mode, we try to emulate the
instruction.
If that doesn't fail, we report to the user and try again - at the exact same
instruction pointer. So if the guest kernel really does trigger an invalid
instruction, we loop forever.
So let's better go and forward program exceptions to the guest when we don't
know the instruction we're supposed to emulate.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 8 Jan 2010 01:58:09 +0000 (02:58 +0100)]
KVM: PPC: Pass program interrupt flags to the guest
When we need to reinject a program interrupt into the guest, we also need to
reinject the corresponding flags into the guest.
Signed-off-by: Alexander Graf <agraf@suse.de>
Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 8 Jan 2010 01:58:08 +0000 (02:58 +0100)]
KVM: PPC: Fix HID5 setting code
The code to unset HID5.dcbz32 is broken.
This patch makes it do the right rotate magic.
Signed-off-by: Alexander Graf <agraf@suse.de>
Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 8 Jan 2010 01:58:07 +0000 (02:58 +0100)]
KVM: PPC: Emulate trap SRR1 flags properly
Book3S needs some flags in SRR1 to get to know details about an interrupt.
One such example is the trap instruction. It tells the guest kernel that
a program interrupt is due to a trap using a bit in SRR1.
This patch implements above behavior, making WARN_ON behave like WARN_ON.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 8 Jan 2010 01:58:06 +0000 (02:58 +0100)]
KVM: PPC: Call SLB patching code in interrupt safe manner
Currently we're racy when doing the transition from IR=1 to IR=0, from
the module memory entry code to the real mode SLB switching code.
To work around that I took a look at the RTAS entry code which is faced
with a similar problem and did the same thing:
A small helper in linear mapped memory that does mtmsr with IR=0 and
then RFIs info the actual handler.
Thanks to that trick we can safely take page faults in the entry code
and only need to be really wary of what to do as of the SLB switching
part.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 8 Jan 2010 01:58:05 +0000 (02:58 +0100)]
KVM: PPC: Get rid of unnecessary RFI
Using an RFI in IR=1 is dangerous. We need to set two SRRs and then do an RFI
without getting interrupted at all, because every interrupt could potentially
overwrite the SRR values.
Fortunately, we don't need to RFI in at least this particular case of the code,
so we can just replace it with an mtmsr and b.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 8 Jan 2010 01:58:04 +0000 (02:58 +0100)]
KVM: PPC: Implement 'skip instruction' mode
To fetch the last instruction we were interrupted on, we enable DR in early
exit code, where we are still in a very transitional phase between guest
and host state.
Most of the time this seemed to work, but another CPU can easily flush our
TLB and HTAB which makes us go in the Linux page fault handler which totally
breaks because we still use the guest's SLB entries.
To work around that, let's introduce a second KVM guest mode that defines
that whenever we get a trap, we don't call the Linux handler or go into
the KVM exit code, but just jump over the faulting instruction.
That way a potentially bad lwz doesn't trigger any faults and we can later
on interpret the invalid instruction we fetched as "fetch didn't work".
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 8 Jan 2010 01:58:03 +0000 (02:58 +0100)]
KVM: PPC: Use PACA backed shadow vcpu
We're being horribly racy right now. All the entry and exit code hijacks
random fields from the PACA that could easily be used by different code in
case we get interrupted, for example by a #MC or even page fault.
After discussing this with Ben, we figured it's best to reserve some more
space in the PACA and just shove off some vcpu state to there.
That way we can drastically improve the readability of the code, make it
less racy and less complex.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 8 Jan 2010 01:58:02 +0000 (02:58 +0100)]
KVM: PPC: Add helpers for CR, XER
We now have helpers for the GPRs, so let's also add some for CR and XER.
Having them in the PACA simplifies code a lot, as we don't need to care
about where to store CC or not to overflow any integers.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 8 Jan 2010 01:58:01 +0000 (02:58 +0100)]
KVM: PPC: Use accessor functions for GPR access
All code in PPC KVM currently accesses gprs in the vcpu struct directly.
While there's nothing wrong with that wrt the current way gprs are stored
and loaded, it doesn't suffice for the PACA acceleration that will follow
in this patchset.
So let's just create little wrapper inline functions that we call whenever
a GPR needs to be read from or written to. The compiled code shouldn't really
change at all for now.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Takuya Yoshikawa [Wed, 6 Jan 2010 08:55:23 +0000 (17:55 +0900)]
KVM: Fix the explanation of write_emulated
The explanation of write_emulated is confused with
that of read_emulated. This patch fix it.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Sheng Yang [Tue, 5 Jan 2010 11:02:29 +0000 (19:02 +0800)]
KVM: VMX: Enable EPT 1GB page support
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Sheng Yang [Tue, 5 Jan 2010 11:02:27 +0000 (19:02 +0800)]
KVM: x86: Rename gb_page_enable() to get_lpage_level() in kvm_x86_ops
Then the callback can provide the maximum supported large page level, which
is more flexible.
Also move the gb page support into x86_64 specific.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Sheng Yang [Tue, 5 Jan 2010 11:02:26 +0000 (19:02 +0800)]
KVM: x86: Moving PT_*_LEVEL to mmu.h
We can use them in x86.c and vmx.c now...
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Alexander Graf [Mon, 4 Jan 2010 21:19:25 +0000 (22:19 +0100)]
KVM: PPC: Enable lightweight exits again
The PowerPC C ABI defines that registers r14-r31 need to be preserved across
function calls. Since our exit handler is written in C, we can make use of that
and don't need to reload r14-r31 on every entry/exit cycle.
This technique is also used in the BookE code and is called "lightweight exits"
there. To follow the tradition, it's called the same in Book3S.
So far this optimization was disabled though, as the code didn't do what it was
expected to do, but failed to work.
This patch fixes and enables lightweight exits again.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Alexander Graf [Mon, 4 Jan 2010 21:19:22 +0000 (22:19 +0100)]
KVM: PPC: Fix typo in rebolting code
When we're loading bolted entries into the SLB again, we're checking if an
entry is in use and only slbmte it when it is.
Unfortunately, the check always goes to the skip label of the first entry,
resulting in an endless loop when it actually gets triggered.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Mon, 28 Dec 2009 12:08:30 +0000 (14:08 +0200)]
KVM: avoid taking ioapic mutex for non-ioapic EOIs
When the guest acknowledges an interrupt, it sends an EOI message to the local
apic, which broadcasts it to the ioapic. To handle the EOI, we need to take
the ioapic mutex.
On large guests, this causes a lot of contention on this mutex. Since large
guests usually don't route interrupts via the ioapic (they use msi instead),
this is completely unnecessary.
Avoid taking the mutex by introducing a handled_vectors bitmap. Before taking
the mutex, check if the ioapic was actually responsible for the acked vector.
If not, we can return early.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Mon, 28 Dec 2009 14:06:35 +0000 (16:06 +0200)]
KVM: Fill out ftrace exit reason strings
Some exit reasons missed their strings; fill out the table.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Sun, 27 Dec 2009 15:00:46 +0000 (17:00 +0200)]
KVM: Bump maximum vcpu count to 64
With slots_lock converted to rcu, the entire kvm hotpath on modern processors
(with npt or ept) now scales beautifully. Increase the maximum vcpu count to
64 to reflect this.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Marcelo Tosatti [Wed, 23 Dec 2009 16:35:26 +0000 (14:35 -0200)]
KVM: convert slots_lock to a mutex
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Marcelo Tosatti [Wed, 23 Dec 2009 16:35:25 +0000 (14:35 -0200)]
KVM: switch vcpu context to use SRCU
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Marcelo Tosatti [Wed, 23 Dec 2009 16:35:24 +0000 (14:35 -0200)]
KVM: convert io_bus to SRCU
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Marcelo Tosatti [Wed, 23 Dec 2009 16:35:23 +0000 (14:35 -0200)]
KVM: x86: switch kvm_set_memory_alias to SRCU update
Using a similar two-step procedure as for memslots.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Marcelo Tosatti [Wed, 23 Dec 2009 16:35:22 +0000 (14:35 -0200)]
KVM: use SRCU for dirty log
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Marcelo Tosatti [Wed, 23 Dec 2009 16:35:21 +0000 (14:35 -0200)]
KVM: introduce kvm->srcu and convert kvm_set_memory_region to SRCU update
Use two steps for memslot deletion: mark the slot invalid (which stops
instantiation of new shadow pages for that slot, but allows destruction),
then instantiate the new empty slot.
Also simplifies kvm_handle_hva locking.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Marcelo Tosatti [Wed, 23 Dec 2009 16:35:20 +0000 (14:35 -0200)]
KVM: use gfn_to_pfn_memslot in kvm_iommu_map_pages
So its possible to iommu map a memslot before making it visible to
kvm.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Marcelo Tosatti [Wed, 23 Dec 2009 16:35:19 +0000 (14:35 -0200)]
KVM: introduce gfn_to_pfn_memslot
Which takes a memslot pointer instead of using kvm->memslots.
To be used by SRCU convertion later.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Marcelo Tosatti [Wed, 23 Dec 2009 16:35:18 +0000 (14:35 -0200)]
KVM: split kvm_arch_set_memory_region into prepare and commit
Required for SRCU convertion later.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Marcelo Tosatti [Wed, 23 Dec 2009 16:35:17 +0000 (14:35 -0200)]
KVM: modify alias layout in x86s struct kvm_arch
Have a pointer to an allocated region inside x86's kvm_arch.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Marcelo Tosatti [Wed, 23 Dec 2009 16:35:16 +0000 (14:35 -0200)]
KVM: modify memslots layout in struct kvm
Have a pointer to an allocated region inside struct kvm.
[alex: fix ppc book 3s]
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Wu Fengguang [Thu, 24 Dec 2009 01:04:16 +0000 (09:04 +0800)]
KVM: trivial document fixes
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Alexander Graf [Sun, 20 Dec 2009 21:24:07 +0000 (22:24 +0100)]
KVM: powerpc: Change maintainer
Progress on KVM for Embedded PowerPC has stalled, but for Book3S there's quite
a lot of work to do and going on.
So in agreement with Hollis and Avi, we should switch maintainers for PowerPC.
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Hollis Blanchard <hollis@penguinppc.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Mon, 21 Dec 2009 19:21:25 +0000 (20:21 +0100)]
KVM: powerpc: Remove AGGRESSIVE_DEC
Because we now emulate the DEC interrupt according to real life behavior,
there's no need to keep the AGGRESSIVE_DEC hack around.
Let's just remove it.
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Acked-by: Hollis Blanchard <hollis@penguinppc.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Mon, 21 Dec 2009 19:21:24 +0000 (20:21 +0100)]
KVM: powerpc: Improve DEC handling
We treated the DEC interrupt like an edge based one. This is not true for
Book3s. The DEC keeps firing until mtdec is issued again and thus clears
the interrupt line.
So let's implement this logic in KVM too. This patch moves the line clearing
from the firing of the interrupt to the mtdec emulation.
This makes PPC64 guests work without AGGRESSIVE_DEC defined.
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Acked-by: Hollis Blanchard <hollis@penguinppc.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Mon, 21 Dec 2009 19:21:23 +0000 (20:21 +0100)]
KVM: powerpc: Move vector to irqprio resolving to separate function
We're using a switch table to find the irqprio that belongs to a specific
interrupt vector. This table is part of the interrupt inject logic.
Since we'll add a new function to stop interrupts, let's move this table
out of the injection logic into a separate function.
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Acked-by: Hollis Blanchard <hollis@penguinppc.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Sun, 20 Dec 2009 13:13:43 +0000 (15:13 +0200)]
KVM: Simplify coalesced mmio initialization
- add destructor function
- move related allocation into constructor
- add stubs for !CONFIG_KVM_MMIO
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Sun, 20 Dec 2009 13:00:10 +0000 (15:00 +0200)]
KVM: Add KVM_MMIO kconfig item
s390 doesn't have mmio, this will simplify ifdefing it out.
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Sun, 20 Dec 2009 12:54:04 +0000 (14:54 +0200)]
KVM: Remove ifdefs from mmu notifier initialization
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Sun, 20 Dec 2009 12:42:19 +0000 (14:42 +0200)]
KVM: Add include guards for coalesced_mmio.h
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Sun, 20 Dec 2009 12:25:19 +0000 (14:25 +0200)]
KVM: Disentangle mmu notifiers and coalesced_mmio registration
They aren't related.
Signed-off-by: Avi Kivity <avi@redhat.com>
Joerg Roedel [Mon, 14 Dec 2009 11:22:20 +0000 (12:22 +0100)]
KVM: SVM: Adjust tsc_offset only if tsc_unstable
The tsc_offset adjustment in svm_vcpu_load is executed
unconditionally even if Linux considers the host tsc as
stable. This causes a Linux guest detecting an unstable tsc
in any case.
This patch removes the tsc_offset adjustment if the host tsc
is stable. The guest will now get the benefit of a stable
tsc too.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Sheng Yang [Fri, 18 Dec 2009 08:48:47 +0000 (16:48 +0800)]
KVM: VMX: Add instruction rdtscp support for guest
Before enabling, execution of "rdtscp" in guest would result in #UD.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Sheng Yang [Fri, 18 Dec 2009 08:48:46 +0000 (16:48 +0800)]
KVM: Add cpuid_update() callback to kvm_x86_ops
Sometime, we need to adjust some state in order to reflect guest CPUID
setting, e.g. if we don't expose rdtscp to guest, we won't want to enable
it on hardware. cpuid_update() is introduced for this purpose.
Also export kvm_find_cpuid_entry() for later use.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Sheng Yang [Fri, 18 Dec 2009 08:48:45 +0000 (16:48 +0800)]
x86: Raise vsyscall priority on hotplug notifier chain
KVM need vsyscall_init() to initialize MSR_TSC_AUX before it read the value.
Per Avi's suggestion, this patch raised vsyscall priority on hotplug notifier
chain, to 30.
CC: Ingo Molnar <mingo@elte.hu>
CC: linux-kernel@vger.kernel.org
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Sheng Yang [Fri, 18 Dec 2009 08:48:44 +0000 (16:48 +0800)]
KVM: Extended shared_msr_global to per CPU
shared_msr_global saved host value of relevant MSRs, but it have an
assumption that all MSRs it tracked shared the value across the different
CPUs. It's not true with some MSRs, e.g. MSR_TSC_AUX.
Extend it to per CPU to provide the support of MSR_TSC_AUX, and more
alike MSRs.
Notice now the shared_msr_global still have one assumption: it can only deal
with the MSRs that won't change in host after KVM module loaded.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Sheng Yang [Fri, 18 Dec 2009 08:48:42 +0000 (16:48 +0800)]
KVM: VMX: Remove redundant variable
It's no longer necessary.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Tue, 8 Dec 2009 10:14:42 +0000 (12:14 +0200)]
KVM: VMX: Fold ept_update_paging_mode_cr4() into its caller
ept_update_paging_mode_cr4() accesses vcpu->arch.cr4 directly, which usually
needs to be accessed via kvm_read_cr4(). In this case, we can't, since cr4
is in the process of being updated. Instead of adding inane comments, fold
the function into its caller (vmx_set_cr4), so it can use the not-yet-committed
cr4 directly.
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Mon, 7 Dec 2009 10:29:14 +0000 (12:29 +0200)]
KVM: VMX: When using ept, allow the guest to own cr4.pge
We make no use of cr4.pge if ept is enabled, but the guest does (to flush
global mappings, as with vmap()), so give the guest ownership of this bit.
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Mon, 7 Dec 2009 10:26:18 +0000 (12:26 +0200)]
KVM: VMX: Make guest cr4 mask more conservative
Instead of specifying the bits which we want to trap on, specify the bits
which we allow the guest to change transparently. This is safer wrt future
changes to cr4.
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Mon, 7 Dec 2009 10:16:48 +0000 (12:16 +0200)]
KVM: Add accessor for reading cr4 (or some bits of cr4)
Some bits of cr4 can be owned by the guest on vmx, so when we read them,
we copy them to the vcpu structure. In preparation for making the set of
guest-owned bits dynamic, use helpers to access these bits so we don't need
to know where the bit resides.
No changes to svm since all bits are host-owned there.
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Sun, 6 Dec 2009 15:21:14 +0000 (17:21 +0200)]
KVM: VMX: Move some cr[04] related constants to vmx.c
They have no place in common code.
Signed-off-by: Avi Kivity <avi@redhat.com>
Sheng Yang [Tue, 15 Dec 2009 05:29:54 +0000 (13:29 +0800)]
KVM: VMX: Trap and invalid MWAIT/MONITOR instruction
We don't support these instructions, but guest can execute them even if the
feature('monitor') haven't been exposed in CPUID. So we would trap and inject
a #UD if guest try this way.
Cc: stable@kernel.org
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Wed, 2 Dec 2009 13:17:00 +0000 (15:17 +0200)]
KVM: MMU: Report spte not found in rmap before BUG()
In the past we've had errors of single-bit in the other two cases; the
printk() may confirm it for the third case (many->many).
Signed-off-by: Avi Kivity <avi@redhat.com>
Marcelo Tosatti [Wed, 11 Nov 2009 19:29:49 +0000 (17:29 -0200)]
KVM: x86: raise TSS exception for NULL CS and SS segments
Windows 2003 uses task switch to triple fault and reboot (the other
exception being reserved pdptrs bits).
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Eddie Dong [Thu, 19 Nov 2009 15:54:07 +0000 (17:54 +0200)]
KVM: x86: make double/triple fault promotion generic to all exceptions
Move Double-Fault generation logic out of page fault
exception generating function to cover more generic case.
Signed-off-by: Eddie Dong <eddie.dong@intel.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Linus Torvalds [Sun, 28 Feb 2010 19:00:55 +0000 (11:00 -0800)]
Merge branch 'x86-uv-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, uv: Remove recursion in uv_heartbeat_enable()
x86, uv: uv_global_gru_mmr_address() macro fix
x86, uv: Add serial number parameter to uv_bios_get_sn_info()
Linus Torvalds [Sun, 28 Feb 2010 18:59:44 +0000 (10:59 -0800)]
Merge branch 'x86-ptrace-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'x86-ptrace-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, ptrace: Remove set_stopped_child_used_math() in [x]fpregs_set
x86, ptrace: Simplify xstateregs_get()
ptrace: Fix ptrace_regset() comments and diagnose errors specifically
parisc: Disable CONFIG_HAVE_ARCH_TRACEHOOK
ptrace: Add support for generic PTRACE_GETREGSET/PTRACE_SETREGSET
x86, ptrace: regset extensions to support xstate
Linus Torvalds [Sun, 28 Feb 2010 18:59:18 +0000 (10:59 -0800)]
Merge branch 'x86-pci-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'x86-pci-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Enable NMI on all cpus on UV
vgaarb: Add user selectability of the number of GPUS in a system
vgaarb: Fix VGA arbiter to accept PCI domains other than 0
x86, uv: Update UV arch to target Legacy VGA I/O correctly.
pci: Update pci_set_vga_state() to call arch functions
Linus Torvalds [Sun, 28 Feb 2010 18:43:53 +0000 (10:43 -0800)]
Merge branch 'x86-setup-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'x86-setup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, setup: Don't skip mode setting for the standard VGA modes
x86-64, setup: Inhibit decompressor output if video info is invalid
x86, setup: When restoring the screen, update boot_params.screen_info
Linus Torvalds [Sun, 28 Feb 2010 18:41:35 +0000 (10:41 -0800)]
Merge branch 'x86-rwsem-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'x86-rwsem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86-64, rwsem: Avoid store forwarding hazard in __downgrade_write
x86-64, rwsem: 64-bit xadd rwsem implementation
x86: Fix breakage of UML from the changes in the rwsem system
x86-64: support native xadd rwsem implementation
x86: clean up rwsem type system
Linus Torvalds [Sun, 28 Feb 2010 18:39:36 +0000 (10:39 -0800)]
Merge branch 'x86-numa-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'x86-numa-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, numa: Remove configurable node size support for numa emulation
x86, numa: Add fixed node size option for numa emulation
x86, numa: Fix numa emulation calculation of big nodes
x86, acpi: Map hotadded cpu to correct node.
Linus Torvalds [Sun, 28 Feb 2010 18:39:16 +0000 (10:39 -0800)]
Merge branch 'x86-mtrr-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'x86-mtrr-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Convert set_atomicity_lock to raw_spinlock
x86, mtrr: Kill over the top warn
x86, mtrr: Constify struct mtrr_ops
Linus Torvalds [Sun, 28 Feb 2010 18:38:45 +0000 (10:38 -0800)]
Merge branch 'x86-mm-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, mm: Unify kernel_physical_mapping_init() API
x86, mm: Allow highmem user page tables to be disabled at boot time
x86: Do not reserve brk for DMI if it's not going to be used
x86: Convert tlbstate_lock to raw_spinlock
x86: Use the generic page_is_ram()
x86: Remove BIOS data range from e820
Move page_is_ram() declaration to mm.h
Generic page_is_ram: use __weak
resources: introduce generic page_is_ram()
Linus Torvalds [Sun, 28 Feb 2010 18:37:40 +0000 (10:37 -0800)]
Merge branch 'x86-io-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'x86-io-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Merge io.h
x86: Simplify flush_write_buffers()
x86: Clean up mem*io functions.
x86-64: Use BUILDIO in io_64.h
x86-64: Reorganize io_64.h
x86-32: Remove _local variants of in/out from io_32.h
x86-32: Move XQUAD definitions to numaq.h
Linus Torvalds [Sun, 28 Feb 2010 18:37:06 +0000 (10:37 -0800)]
Merge branch 'x86-cpu-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, cacheinfo: Enable L3 CID only on AMD
x86, cacheinfo: Remove NUMA dependency, fix for AMD Fam10h rev D1
x86, cpu: Print AMD virtualization features in /proc/cpuinfo
x86, cacheinfo: Calculate L3 indices
x86, cacheinfo: Add cache index disable sysfs attrs only to L3 caches
x86, cacheinfo: Fix disabling of L3 cache indices
intel-agp: Switch to wbinvd_on_all_cpus
x86, lib: Add wbinvd smp helpers
Linus Torvalds [Sun, 28 Feb 2010 18:36:48 +0000 (10:36 -0800)]
Merge branch 'x86-cleanups-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Remove trailing spaces in messages
x86, mtrr: Remove unused mtrr/state.c
x86, trivial: Fix grammo in tsc comment about Geode TSC reliability