Dexuan Cui [Thu, 10 Jun 2010 03:27:12 +0000 (11:27 +0800)]
KVM: VMX: Enable XSAVE/XRSTOR for guest
This patch enables the guest to use the XSAVE/XRSTOR instructions.
We assume that host_xcr0 uses all the bits that the OS supports.
We load xcr0 the same way we handle the fpu: as late as we can.
Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Thu, 10 Jun 2010 14:21:29 +0000 (17:21 +0300)]
KVM: VMX: Fix incorrect rcu deref in rmode_tss_base()
Signed-off-by: Avi Kivity <avi@redhat.com>
Andi Kleen [Thu, 10 Jun 2010 11:10:55 +0000 (13:10 +0200)]
KVM: Fix unused but set warnings
No real bugs in this one.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Andi Kleen [Thu, 10 Jun 2010 11:10:47 +0000 (13:10 +0200)]
KVM: Fix KVM_SET_SIGNAL_MASK with arg == NULL
When the user passes in a NULL mask, pass this on from the ioctl
handler.
Found by gcc 4.6's new warnings.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Xiao Guangrong [Tue, 8 Jun 2010 12:07:01 +0000 (20:07 +0800)]
KVM: MMU: delay local tlb flush
Delay the local tlb flush until entering guest mode. This can reduce vpid flush
frequency and reduce remote tlb flush IPIs (if the KVM_REQ_TLB_FLUSH bit is
already set, no IPI is sent).
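A minimal user-space sketch of the idea, not the kernel code. The flush is recorded as a request bit and serviced once on guest entry, and a remote requester skips the IPI when the bit is already pending. All names here (vcpu_sketch, make_request(), remote_flush(), local_flush(), enter_guest()) are hypothetical, and REQ_TLB_FLUSH only stands in for KVM_REQ_TLB_FLUSH.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define REQ_TLB_FLUSH 0u /* hypothetical bit number, stands in for KVM_REQ_TLB_FLUSH */

struct vcpu_sketch {
    atomic_uint requests;  /* pending request bits */
    unsigned int flushes;  /* TLB flushes actually performed */
    unsigned int ipis;     /* IPIs a remote requester would have sent */
};

/* Record a request; returns true only if the bit was not already pending. */
static bool make_request(struct vcpu_sketch *v, unsigned int req)
{
    unsigned int old = atomic_fetch_or(&v->requests, 1u << req);
    return !(old & (1u << req));
}

/* Remote flush: an IPI is needed only when no flush is already pending. */
static void remote_flush(struct vcpu_sketch *v)
{
    if (make_request(v, REQ_TLB_FLUSH))
        v->ipis++;
}

/* Local flush: do not flush now, just leave a request for guest entry. */
static void local_flush(struct vcpu_sketch *v)
{
    make_request(v, REQ_TLB_FLUSH);
}

/* Guest entry: service the pending flush, at most once per entry. */
static void enter_guest(struct vcpu_sketch *v)
{
    unsigned int old = atomic_fetch_and(&v->requests, ~(1u << REQ_TLB_FLUSH));
    if (old & (1u << REQ_TLB_FLUSH))
        v->flushes++;
}
```

In this sketch several local_flush() calls followed by a remote_flush() cost one flush and no IPI, which is the saving the message describes.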
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Xiao Guangrong [Tue, 8 Jun 2010 12:05:57 +0000 (20:05 +0800)]
KVM: MMU: use wrapper function to flush local tlb
Use kvm_mmu_flush_tlb() function instead of calling
kvm_x86_ops->tlb_flush(vcpu) directly.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Xiao Guangrong [Tue, 8 Jun 2010 12:05:05 +0000 (20:05 +0800)]
KVM: MMU: remove unnecessary remote tlb flush
This remote tlb flush is not necessary since we have already synced while
the sp is zapped.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Xiao Guangrong [Tue, 8 Jun 2010 02:15:51 +0000 (10:15 +0800)]
KVM: VMX: fix rcu usage warning in init_rmode()
fix:
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
include/linux/kvm_host.h:258 invoked rcu_dereference_check() without protection!
other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 1
1 lock held by qemu-system-x86/3796:
#0: (&vcpu->mutex){+.+.+.}, at: [<ffffffffa0217fd8>] vcpu_load+0x1a/0x66 [kvm]
stack backtrace:
Pid: 3796, comm: qemu-system-x86 Not tainted 2.6.34 #25
Call Trace:
[<ffffffff81070ed1>] lockdep_rcu_dereference+0x9d/0xa5
[<ffffffffa0214fdf>] gfn_to_memslot_unaliased+0x65/0xa0 [kvm]
[<ffffffffa0216139>] gfn_to_hva+0x22/0x4c [kvm]
[<ffffffffa0216217>] kvm_write_guest_page+0x2a/0x7f [kvm]
[<ffffffffa0216286>] kvm_clear_guest_page+0x1a/0x1c [kvm]
[<ffffffffa0278239>] init_rmode+0x3b/0x180 [kvm_intel]
[<ffffffffa02786ce>] vmx_set_cr0+0x350/0x4d3 [kvm_intel]
[<ffffffffa02274ff>] kvm_arch_vcpu_ioctl_set_sregs+0x122/0x31a [kvm]
[<ffffffffa021859c>] kvm_vcpu_ioctl+0x578/0xa3d [kvm]
[<ffffffff8106624c>] ? cpu_clock+0x2d/0x40
[<ffffffff810f7d86>] ? fget_light+0x244/0x28e
[<ffffffff810709b9>] ? trace_hardirqs_off_caller+0x1f/0x10e
[<ffffffff8110501b>] vfs_ioctl+0x32/0xa6
[<ffffffff81105597>] do_vfs_ioctl+0x47f/0x4b8
[<ffffffff813ae654>] ? sub_preempt_count+0xa3/0xb7
[<ffffffff810f7da8>] ? fget_light+0x266/0x28e
[<ffffffff810f7c53>] ? fget_light+0x111/0x28e
[<ffffffff81105617>] sys_ioctl+0x47/0x6a
[<ffffffff81002c1b>] system_call_fastpath+0x16/0x1b
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gui Jianfeng [Mon, 7 Jun 2010 02:33:27 +0000 (10:33 +0800)]
KVM: VMX: rename vpid_sync_vcpu_all() to vpid_sync_vcpu_single()
The name "vpid_sync_vcpu_all" isn't appropriate since it affects only
a single vpid, so rename it to vpid_sync_vcpu_single().
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gui Jianfeng [Mon, 7 Jun 2010 02:32:29 +0000 (10:32 +0800)]
KVM: VMX: Add all-context INVVPID type support
Add all-context INVVPID type support.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Xiao Guangrong [Fri, 4 Jun 2010 13:56:59 +0000 (21:56 +0800)]
KVM: MMU: reduce remote tlb flush in kvm_mmu_pte_write()
collect remote tlb flush in kvm_mmu_pte_write() path
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Xiao Guangrong [Fri, 4 Jun 2010 13:56:11 +0000 (21:56 +0800)]
KVM: MMU: traverse sp hlist safely
Now we can safely traverse the sp hlist.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Xiao Guangrong [Fri, 4 Jun 2010 13:55:29 +0000 (21:55 +0800)]
KVM: MMU: gather remote tlb flush which occurs during page zapped
Use kvm_mmu_prepare_zap_page() and kvm_mmu_commit_zap_page() instead of
kvm_mmu_zap_page(), which can reduce remote tlb flush IPIs.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Xiao Guangrong [Fri, 4 Jun 2010 13:54:38 +0000 (21:54 +0800)]
KVM: MMU: don't get free page number in the loop
In a later patch, we will change the way sps are zapped, like below:
kvm_mmu_prepare_zap_page A
kvm_mmu_prepare_zap_page B
kvm_mmu_prepare_zap_page C
....
kvm_mmu_commit_zap_page
[ zapping multiple sps only needs one call to kvm_mmu_commit_zap_page ]
In the __kvm_mmu_free_some_pages() function, the free page number is
read from 'vcpu->kvm->arch.n_free_mmu_pages' inside the loop. This
prevents us from applying kvm_mmu_prepare_zap_page() and kvm_mmu_commit_zap_page(),
since kvm_mmu_prepare_zap_page() does not free the sp.
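The prepare/commit pattern above can be sketched in plain C as follows. This is only an illustration under assumed names (sp_sketch, zap_ctx, prepare_zap(), commit_zap() are hypothetical and not the kernel functions): the prepare step queues a page without freeing it, and the commit step pays for one remote flush for the whole batch before freeing.

```c
#include <stddef.h>

struct sp_sketch {
    struct sp_sketch *next_invalid; /* link on the pending-zap list */
    int freed;                      /* set once the page is really freed */
};

struct zap_ctx {
    struct sp_sketch *invalid_list; /* pages prepared but not yet committed */
    unsigned int remote_flushes;    /* remote TLB flushes issued so far */
};

/* Phase 1: queue the page; nothing is freed and nothing is flushed yet. */
static void prepare_zap(struct zap_ctx *ctx, struct sp_sketch *sp)
{
    sp->next_invalid = ctx->invalid_list;
    ctx->invalid_list = sp;
}

/* Phase 2: one remote flush for the whole batch, then free every page.
 * Returns the number of pages freed. */
static unsigned int commit_zap(struct zap_ctx *ctx)
{
    unsigned int n = 0;

    if (!ctx->invalid_list)
        return 0;               /* empty batch: no flush needed */

    ctx->remote_flushes++;
    while (ctx->invalid_list) {
        struct sp_sketch *sp = ctx->invalid_list;

        ctx->invalid_list = sp->next_invalid;
        sp->next_invalid = NULL;
        sp->freed = 1;
        n++;
    }
    return n;
}
```

Because prepare_zap() frees nothing, a caller that loops until enough pages are free has to count the queued pages itself rather than re-read a free-page counter, which is the problem this patch addresses.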
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Xiao Guangrong [Fri, 4 Jun 2010 13:53:54 +0000 (21:53 +0800)]
KVM: MMU: split the operations of kvm_mmu_zap_page()
Split kvm_mmu_zap_page() into kvm_mmu_prepare_zap_page() and
kvm_mmu_commit_zap_page(), so that we can:
- traverse the hlist safely
- easily gather the remote tlb flushes which occur while pages are zapped
These features will be used in later patches.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Xiao Guangrong [Fri, 4 Jun 2010 13:53:07 +0000 (21:53 +0800)]
KVM: MMU: introduce some macros to cleanup hlist traversing
Introduce for_each_gfn_sp() and for_each_gfn_indirect_valid_sp() to
clean up hlist traversing.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Xiao Guangrong [Fri, 4 Jun 2010 13:52:17 +0000 (21:52 +0800)]
KVM: MMU: skip invalid sp when unprotect page
In kvm_mmu_unprotect_page(), the invalid sp can be skipped
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gui Jianfeng [Fri, 4 Jun 2010 00:51:39 +0000 (08:51 +0800)]
KVM: VMX: Make sure single type invvpid is supported before issuing invvpid instruction
According to the SDM, we need to check whether the single-context INVVPID type is supported
before issuing the invvpid instruction.
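A hedged sketch of such a capability check, written as stand-alone C rather than kernel code. The bit positions are those the Intel SDM gives for the IA32_VMX_EPT_VPID_CAP MSR (bit 32: INVVPID supported, bit 41: single-context, bit 42: all-context). The macro names, the enum and pick_invvpid_type() are hypothetical, and falling back to the all-context type is an illustrative choice.

```c
#include <stdint.h>

/* Capability bits of the 64-bit IA32_VMX_EPT_VPID_CAP MSR (Intel SDM). */
#define VPID_CAP_INVVPID        (1ULL << 32) /* INVVPID instruction supported */
#define VPID_CAP_SINGLE_CONTEXT (1ULL << 41) /* single-context type supported */
#define VPID_CAP_ALL_CONTEXT    (1ULL << 42) /* all-context type supported */

/* INVVPID type operand values as defined by the SDM; NONE is a sketch-only marker. */
enum invvpid_type {
    INVVPID_NONE = -1,
    INVVPID_SINGLE_CONTEXT = 1,
    INVVPID_ALL_CONTEXT = 2,
};

/* Choose an INVVPID type for flushing one vpid, given the capability MSR value. */
static enum invvpid_type pick_invvpid_type(uint64_t vpid_cap)
{
    if (!(vpid_cap & VPID_CAP_INVVPID))
        return INVVPID_NONE;            /* instruction not available at all */
    if (vpid_cap & VPID_CAP_SINGLE_CONTEXT)
        return INVVPID_SINGLE_CONTEXT;  /* cheapest type that covers one vpid */
    if (vpid_cap & VPID_CAP_ALL_CONTEXT)
        return INVVPID_ALL_CONTEXT;     /* broader flush, still correct */
    return INVVPID_NONE;
}
```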
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Reviewed-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Lai Jiangshan [Wed, 2 Jun 2010 09:06:03 +0000 (17:06 +0800)]
KVM: x86: use linux/uaccess.h instead of asm/uaccess.h
Should use linux/uaccess.h instead of asm/uaccess.h
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Lai Jiangshan [Wed, 2 Jun 2010 09:01:23 +0000 (17:01 +0800)]
KVM: cleanup "*new.rmap" type
The type of '*new.rmap' is not 'struct page *', fix it
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Sheng Yang [Wed, 2 Jun 2010 06:05:24 +0000 (14:05 +0800)]
KVM: VMX: Enforce EPT pagetable level checking
We only support 4-level EPT page tables for now.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Glauber Costa [Tue, 1 Jun 2010 12:22:48 +0000 (08:22 -0400)]
KVM: Add Documentation/kvm/msr.txt
This patch adds a file that documents the usage of KVM-specific
MSRs.
Signed-off-by: Glauber Costa <glommer@redhat.com>
Reviewed-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Andreas Schwab [Mon, 31 May 2010 19:59:13 +0000 (21:59 +0200)]
KVM: PPC: elide struct thread_struct instances from stack
Instead of instantiating a whole thread_struct on the stack use only the
required parts of it.
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Tested-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Mohammed Gamal [Mon, 31 May 2010 19:40:54 +0000 (22:40 +0300)]
KVM: VMX: Properly return error to userspace on vmentry failure
The vmexit handler returns KVM_EXIT_UNKNOWN since there is no handler
for vmentry failures. This intercepts vmentry failures and returns
KVM_EXIT_FAIL_ENTRY to userspace instead.
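A small sketch of the classification this describes, assuming the SDM convention that bit 31 of the VMX exit reason flags a VM-entry failure. The types and names below (user_exit_sketch, run_sketch, classify_exit()) are hypothetical stand-ins, not the KVM structures.

```c
#include <stdint.h>

#define EXIT_REASON_FAILED_VMENTRY 0x80000000u /* bit 31 of the VMX exit reason */

/* Hypothetical stand-ins for the exit codes reported to userspace. */
enum user_exit_sketch {
    USER_EXIT_NONE,       /* handled in the kernel, no userspace exit */
    USER_EXIT_UNKNOWN,    /* no handler for this exit reason */
    USER_EXIT_FAIL_ENTRY, /* VM entry itself failed */
};

struct run_sketch {
    enum user_exit_sketch exit_reason;
    uint32_t hw_entry_failure_reason; /* raw hardware reason, for debugging */
};

/* Check for a failed VM entry before looking for a per-reason handler. */
static void classify_exit(uint32_t exit_reason, int have_handler,
                          struct run_sketch *run)
{
    run->hw_entry_failure_reason = 0;
    if (exit_reason & EXIT_REASON_FAILED_VMENTRY) {
        run->exit_reason = USER_EXIT_FAIL_ENTRY;
        run->hw_entry_failure_reason = exit_reason;
    } else if (!have_handler) {
        run->exit_reason = USER_EXIT_UNKNOWN;
    } else {
        run->exit_reason = USER_EXIT_NONE;
    }
}
```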
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gui Jianfeng [Mon, 31 May 2010 09:11:39 +0000 (17:11 +0800)]
KVM: MMU: Don't calculate quadrant if tdp_enabled
There's no need to calculate quadrant if tdp is enabled.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Thu, 27 May 2010 13:44:12 +0000 (16:44 +0300)]
KVM: MMU: Document large pages
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Thu, 27 May 2010 11:46:04 +0000 (14:46 +0300)]
KVM: MMU: Document cr0.wp emulation
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Thu, 27 May 2010 11:22:51 +0000 (14:22 +0300)]
KVM: MMU: Allow spte.w=1 for gpte.w=0 and cr0.wp=0 only in shadow mode
When tdp is enabled, the guest's cr0.wp shouldn't have any effect on spte
permissions.
Signed-off-by: Avi Kivity <avi@redhat.com>
Jan Kiszka [Tue, 25 May 2010 14:01:50 +0000 (16:01 +0200)]
KVM: x86: Propagate fpu_alloc errors
Memory allocation may fail. Propagate such errors.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Zachary Amsden [Thu, 27 May 2010 01:09:43 +0000 (15:09 -1000)]
KVM: SVM: Fix EFER.LME being stripped
Must set VCPU register to be the guest notion of EFER even if that
setting is not valid on hardware. This was masked by the set in
set_efer until 7657fd5ace88e8092f5f3a84117e093d7b893f26 broke that.
Fix is simply to set the VCPU register before stripping bits.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gui Jianfeng [Thu, 27 May 2010 08:09:48 +0000 (16:09 +0800)]
KVM: MMU: don't check PT_WRITABLE_MASK directly
Since we have is_writable_pte(), make use of it.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Lai Jiangshan [Wed, 26 May 2010 08:48:19 +0000 (16:48 +0800)]
KVM: MMU: calculate correct gfn for small host pages backing large guest pages
In Documentation/kvm/mmu.txt:
gfn:
Either the guest page table containing the translations shadowed by this
page, or the base page frame for linear translations. See role.direct.
But in function FNAME(fetch)(), sp->gfn is incorrect when one of the following
situations occurs:
1) the guest uses 32-bit paging and the guest PDE maps a 4-MByte page
(backed by 4k host pages); FNAME(fetch)() fails to handle the quadrant.
And if the guest uses pse-36, "table_gfn = gpte_to_gfn(gw->ptes[level - delta]);"
is incorrect.
2) the guest uses long mode paging and the guest PDPTE maps a 1-GByte page
(backed by 4k or 2M host pages).
So we fix it to match the document and the code, which
requires sp->gfn to be correct when sp->role.direct=1.
We use the target mapping gfn (gw->gfn) to calculate the base page frame
for linear translations; this is simple and easy to understand.
Reported-by: Marcelo Tosatti <mtosatti@redhat.com>
Reported-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Lai Jiangshan [Wed, 26 May 2010 08:48:25 +0000 (16:48 +0800)]
KVM: MMU: Calculate correct base gfn for direct non-DIR level
In Documentation/kvm/mmu.txt:
gfn:
Either the guest page table containing the translations shadowed by this
page, or the base page frame for linear translations. See role.direct.
But in __direct_map(), the base gfn calculation is incorrect,
it does not calculate correctly when level=3 or 4.
Fix by using PT64_LVL_ADDR_MASK() which accounts for all levels correctly.
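As a rough illustration of a level-aware base gfn calculation, assuming 4 KiB pages and 512-entry (9-bit) 64-bit page tables: the constant and function names below are hypothetical, and the sketch works on gfns directly instead of using the kernel macro.

```c
#include <stdint.h>

#define LEVEL_BITS_SKETCH 9 /* 512 entries per 64-bit page table level */

/*
 * Base gfn of the linear range covered by one entry at `level`
 * (level 1 = 4 KiB page, 2 = 2 MiB, 3 = 1 GiB, 4 = 512 GiB).
 * A mask fixed at the level-2 size would be wrong for levels 3 and 4.
 */
static uint64_t base_gfn_for_level(uint64_t gfn, int level)
{
    uint64_t pages = 1ULL << ((level - 1) * LEVEL_BITS_SKETCH);

    return gfn & ~(pages - 1);
}
```

For gfn 0x7654321 this gives 0x7654200 at level 2 but 0x7640000 at level 3, so reusing the level-2 alignment at level 3 would leave the base misaligned.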
Reported-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Lai Jiangshan [Wed, 26 May 2010 08:49:59 +0000 (16:49 +0800)]
KVM: MMU: Don't allocate gfns page for direct mmu pages
When sp->role.direct is set, sp->gfns does not contain any essential
information: leaf sptes reachable from this sp are for a contiguous
guest physical memory range (a linear range).
So sp->gfns[i] (if it was set) equals sp->gfn + i. (PT_PAGE_TABLE_LEVEL)
Since this is not essential information, we can calculate it when needed.
This means we don't need sp->gfns when sp->role.direct=1,
and thus we can save one page for every direct kvm_mmu_page.
Note:
Access to sp->gfns must be wrapped by kvm_mmu_page_get_gfn()
or kvm_mmu_page_set_gfn().
It is only exposed in FNAME(sync_page).
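A simplified sketch of what such accessors could look like, covering only the last-level (PT_PAGE_TABLE_LEVEL) case named in the message. The struct and function names are hypothetical; the real wrappers are kvm_mmu_page_get_gfn() and kvm_mmu_page_set_gfn().

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t gfn_t;

struct mmu_page_sketch {
    gfn_t gfn;    /* base gfn of the page */
    int direct;   /* non-zero: linear mapping, no gfns[] array allocated */
    gfn_t *gfns;  /* per-entry gfns; NULL when direct */
};

/* Direct pages compute the gfn; indirect pages look it up. */
static gfn_t page_get_gfn(const struct mmu_page_sketch *sp, int index)
{
    if (sp->direct)
        return sp->gfn + index; /* entry i of a last-level table maps gfn + i */
    return sp->gfns[index];
}

/* Direct pages have nothing to store: the value is implied by sp->gfn. */
static void page_set_gfn(struct mmu_page_sketch *sp, int index, gfn_t gfn)
{
    if (!sp->direct)
        sp->gfns[index] = gfn;
}
```

Routing every access through the two accessors is what lets the array be left unallocated for direct pages.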
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Mohammed Gamal [Sun, 23 May 2010 22:01:04 +0000 (01:01 +0300)]
KVM: VMX: Add constant for invalid guest state exit reason
For the sake of completeness, this patch adds a symbolic
constant for VMX exit reason 0x21 (invalid guest state).
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Xiao Guangrong [Mon, 24 May 2010 07:41:33 +0000 (15:41 +0800)]
KVM: MMU: allow more page become unsync at getting sp time
Allow more pages to become unsync at the time the sp is obtained: if we need to create a new
shadow page for a gfn but it is not allowed to be unsync (level > 1), we should sync all
of the gfn's unsync pages.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Xiao Guangrong [Mon, 24 May 2010 07:40:07 +0000 (15:40 +0800)]
KVM: MMU: allow more page become unsync at gfn mapping time
In the current code, a shadow page can become unsync only if it is the only
shadow page for a gfn. This rule is too strict; in fact, we can
let all last-level mapping pages (i.e., the pte pages) become unsync,
and sync them at invlpg or tlb flush time.
This patch allows more pages to become unsync at gfn mapping time.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Sun, 23 May 2010 15:37:00 +0000 (18:37 +0300)]
KVM: Update Red Hat copyrights
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Sun, 23 May 2010 11:28:26 +0000 (14:28 +0300)]
KVM: SVM: correctly trace irq injection
On SVM interrupts are injected by svm_set_irq() not svm_inject_irq().
The latter is used only to wait for irq window.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Xiao Guangrong [Sat, 15 May 2010 10:53:35 +0000 (18:53 +0800)]
KVM: MMU: only update unsync page in invlpg path
Only unsync pages need to be updated at invlpg time since other shadow
pages are write-protected.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Xiao Guangrong [Sat, 15 May 2010 10:52:34 +0000 (18:52 +0800)]
KVM: MMU: don't write-protect if have new mapping to unsync page
Two cases may happen in the kvm_mmu_get_page() function:
- in one case, the target sp is already in the cache; if the sp is unsync,
we only need to update it to ensure this mapping is valid, without
marking it sync or write-protecting sp->gfn, since this does not break the unsync
rule (one shadow page for a gfn)
- in the other case, the target sp does not exist and we need to create a new sp
for the gfn, i.e., the gfn may have another shadow page; to keep the unsync rule,
we should sync (mark sync and write-protect) the gfn's unsync shadow page.
After enabling multiple unsync shadows, we sync those shadow pages
only when the new sp is not allowed to become unsync (also per the unsync
rule; the new rule is: allow all pte pages to become unsync)
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Xiao Guangrong [Sat, 15 May 2010 10:51:24 +0000 (18:51 +0800)]
KVM: MMU: split kvm_sync_page() function
Split kvm_sync_page() into kvm_sync_page() and kvm_sync_page_transient()
to clarify the code, addressing Avi's suggestion.
The kvm_sync_page_transient() function only updates the shadow page; it does not mark
it sync and does not write-protect sp->gfn. It will be used by a later patch.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Sheng Yang [Mon, 17 May 2010 09:08:28 +0000 (17:08 +0800)]
KVM: x86: Use FPU API
Convert KVM to use generic FPU API.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Sheng Yang [Mon, 17 May 2010 09:08:27 +0000 (17:08 +0800)]
KVM: x86: Use unlazy_fpu() for host FPU
We can avoid unnecessary fpu loads when the userspace process
doesn't use the FPU frequently.
Derived from Avi's idea.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Sheng Yang [Mon, 17 May 2010 09:22:23 +0000 (17:22 +0800)]
x86: Export FPU API for KVM use
Also add some constants.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Thu, 13 May 2010 09:35:17 +0000 (12:35 +0300)]
KVM: Consolidate arch specific vcpu ioctl locking
Now that all arch specific ioctls have centralized locking, it is easy to
move it to the central dispatcher.
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Thu, 13 May 2010 09:30:43 +0000 (12:30 +0300)]
KVM: PPC: Centralize locking of arch specific vcpu ioctls
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Thu, 13 May 2010 09:21:46 +0000 (12:21 +0300)]
KVM: s390: Centrally lock arch specific vcpu ioctls
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Thu, 13 May 2010 08:53:06 +0000 (11:53 +0300)]
KVM: x86: Lock arch specific vcpu ioctls centrally
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Thu, 13 May 2010 08:25:04 +0000 (11:25 +0300)]
KVM: move vcpu locking to dispatcher for generic vcpu ioctls
All vcpu ioctls need to be locked, so instead of locking each one specifically
we lock at the generic dispatcher.
This patch only updates generic ioctls and leaves arch specific ioctls alone.
Signed-off-by: Avi Kivity <avi@redhat.com>
Xiao Guangrong [Thu, 13 May 2010 02:09:57 +0000 (10:09 +0800)]
KVM: x86: cleanup unused local variable
fix:
arch/x86/kvm/x86.c: In function ‘handle_emulation_failure’:
arch/x86/kvm/x86.c:3844: warning: unused variable ‘ctxt’
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Xiao Guangrong [Thu, 13 May 2010 02:08:08 +0000 (10:08 +0800)]
KVM: MMU: unalias gfn before sp->gfns[] comparison in sync_page
sp->gfns[] contains unaliased gfns, but the gpte might contain a pointer
to an aliased region.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Xiao Guangrong [Thu, 13 May 2010 02:07:00 +0000 (10:07 +0800)]
KVM: MMU: remove rmap before clear spte
Remove the rmap before clearing the spte; otherwise it will trigger BUG_ON() in
some functions such as rmap_write_protect().
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Xiao Guangrong [Thu, 13 May 2010 02:06:02 +0000 (10:06 +0800)]
KVM: MMU: use proper cache object freeing function
Use kmem_cache_free to free objects allocated by kmem_cache_alloc.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Alex Williamson [Wed, 12 May 2010 13:46:31 +0000 (09:46 -0400)]
KVM: remove CAP_SYS_RAWIO requirement from kvm_vm_ioctl_assign_irq
Remove this check in an effort to allow kvm guests to run without
root privileges. This capability check doesn't seem to add any
security since the device needs to have already been added via the
assign device ioctl and the io actually occurs through the pci
sysfs interface.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Sheng Yang [Wed, 12 May 2010 08:40:42 +0000 (16:40 +0800)]
KVM: VMX: Only reset MMU when necessary
Only modifying certain bits of CR0/CR4 requires a paging mode switch.
Modifying the EFER.NXE bit results in reserved bit updates.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Sheng Yang [Wed, 12 May 2010 08:40:41 +0000 (16:40 +0800)]
KVM: x86: Clean up duplicate assignment
mmu.free() already sets root_hpa to INVALID_PAGE, so there is no need to do it again in
destroy_kvm_mmu().
kvm_x86_ops->set_cr4() and set_efer() already assign cr4/efer to
vcpu->arch.cr4/efer, so there is no need to do it again later.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Mohammed Gamal [Tue, 11 May 2010 22:39:22 +0000 (01:39 +0300)]
KVM: x86 emulator: Add missing decoder flags for xor instructions
This adds missing decoder flags for xor instructions (opcodes 0x34 - 0x35)
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Mohammed Gamal [Tue, 11 May 2010 22:39:21 +0000 (01:39 +0300)]
KVM: x86 emulator: Add missing decoder flags for sub instruction
This adds missing decoder flags for sub instructions (opcodes 0x2c - 0x2d)
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Mohammed Gamal [Tue, 11 May 2010 19:22:40 +0000 (22:22 +0300)]
KVM: x86 emulator: Add test acc, imm instruction (opcodes 0xA8 - 0xA9)
This adds test acc, imm instruction to the x86 emulator
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Marcelo Tosatti [Thu, 13 May 2010 00:00:35 +0000 (21:00 -0300)]
KVM: pass correct parameter to kvm_mmu_free_some_pages
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Dongxiao Xu [Tue, 11 May 2010 10:29:48 +0000 (18:29 +0800)]
KVM: VMX: VMXON/VMXOFF usage changes
The SDM suggests VMXON should be called before VMPTRLD, and VMXOFF
should be called after doing VMCLEAR.
Therefore, in the vmm coexistence case, we should first call VMXON
before any VMCS operation, and then call VMXOFF after the
operation is done.
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Dongxiao Xu [Tue, 11 May 2010 10:29:45 +0000 (18:29 +0800)]
KVM: VMX: VMCLEAR/VMPTRLD usage changes
Originally VMCLEAR/VMPTRLD were called on vcpu migration. To
support hosted VMM coexistence, VMCLEAR is executed on vcpu
schedule out, and VMPTRLD is executed on vcpu schedule in.
This could also eliminate the IPI when doing VMCLEAR.
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Dongxiao Xu [Tue, 11 May 2010 10:29:42 +0000 (18:29 +0800)]
KVM: VMX: Some minor changes to code structure
Do some preparations for vmm coexistence support.
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Dongxiao Xu [Tue, 11 May 2010 10:29:38 +0000 (18:29 +0800)]
KVM: VMX: Define new functions to wrapper direct call of asm code
Define vmcs_load() and kvm_cpu_vmxon() to avoid direct call of asm
code. Also move VMXE bit operation out of kvm_cpu_vmxoff().
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gui Jianfeng [Tue, 11 May 2010 06:36:58 +0000 (14:36 +0800)]
KVM: update mmu documentation for role.nxe
There's no member "cr4_nxe" in struct kvm_mmu_page_role; it is named "nxe" now.
Update the mmu document.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Mon, 10 May 2010 09:09:56 +0000 (12:09 +0300)]
KVM: MMU: Fix free memory accounting race in mmu_alloc_roots()
We drop the mmu lock between freeing memory and allocating the roots; this
allows some other vcpu to sneak in and allocate memory.
While the race is benign (resulting only in temporary overallocation, not oom)
it is simple and easy to fix by moving the freeing close to the allocation.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Mon, 10 May 2010 08:16:56 +0000 (11:16 +0300)]
KVM: inject #UD if instruction emulation fails and exit to userspace
Do not kill VM when instruction emulation fails. Inject #UD and report
failure to userspace instead. Userspace may choose to reenter guest if
vcpu is in userspace (cpl == 3) in which case guest OS will kill
offending process and continue running.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Thu, 29 Apr 2010 09:12:57 +0000 (12:12 +0300)]
KVM: Document KVM_SET_BOOT_CPU_ID
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Thu, 29 Apr 2010 09:08:56 +0000 (12:08 +0300)]
KVM: Document KVM_SET_IDENTITY_MAP ioctl
Signed-off-by: Avi Kivity <avi@redhat.com>
Gui Jianfeng [Wed, 5 May 2010 01:03:49 +0000 (09:03 +0800)]
KVM: MMU: make kvm_mmu_zap_page() return the number of pages it actually freed
Currently, kvm_mmu_zap_page() returns the number of freed children sps.
This might confuse the caller, because the caller doesn't know the actual number
freed. Let's make kvm_mmu_zap_page() return the number of pages it actually
freed.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gui Jianfeng [Wed, 5 May 2010 01:58:33 +0000 (09:58 +0800)]
KVM: MMU: Fix debug output error in walk_addr()
Fix a debug output error in walk_addr
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gui Jianfeng [Wed, 5 May 2010 01:09:21 +0000 (09:09 +0800)]
KVM: MMU: mark page table dirty when a pte is actually modified
Sometimes cmpxchg_gpte doesn't modify the gpte; in that case, don't mark
the page table page as dirty.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Joerg Roedel [Wed, 5 May 2010 14:04:44 +0000 (16:04 +0200)]
KVM: SVM: Allow EFER.LMSLE to be set with nested svm
This patch enables setting of efer bit 13 which is allowed
in all SVM capable processors. This is necessary for the
SLES11 version of Xen 4.0 to boot with nested svm.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Joerg Roedel [Wed, 5 May 2010 14:04:42 +0000 (16:04 +0200)]
KVM: SVM: Dump vmcb contents on failed vmrun
This patch adds a function to dump the vmcb into the kernel
log and calls it after a failed vmrun to ease debugging.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Mon, 3 May 2010 13:54:48 +0000 (16:54 +0300)]
KVM: Get rid of KVM_REQ_KICK
KVM_REQ_KICK poisons vcpu->requests by having a bit set during normal
operation. This causes the fast path check for a clear vcpu->requests
to fail all the time, triggering tons of atomic operations.
Fix by replacing KVM_REQ_KICK with a vcpu->guest_mode atomic.
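A user-space sketch of the idea using C11 atomics, under assumed names (vcpu_kick_sketch, has_pending_requests(), enter_guest_mode(), kick() are hypothetical). With the kick state kept outside the request bitmap, the fast path sees a zero bitmap in normal operation, and only the first kicker after guest entry sends an IPI.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct vcpu_kick_sketch {
    atomic_ulong requests;  /* real work requests only; zero in normal operation */
    atomic_int guest_mode;  /* 1 while the vcpu runs guest code */
    unsigned int ipis_sent; /* IPIs a kicker would have sent */
};

/* Fast path: a single load tells us there is nothing to do. */
static bool has_pending_requests(struct vcpu_kick_sketch *v)
{
    return atomic_load(&v->requests) != 0;
}

static void enter_guest_mode(struct vcpu_kick_sketch *v)
{
    atomic_store(&v->guest_mode, 1);
}

/* Kick: the exchange lets exactly one kicker per guest entry send the IPI. */
static void kick(struct vcpu_kick_sketch *v)
{
    if (atomic_exchange(&v->guest_mode, 0))
        v->ipis_sent++;
}
```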
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Wed, 28 Apr 2010 16:15:44 +0000 (19:15 +0300)]
KVM: x86 emulator: do not inject exception directly into vcpu
Return exception as a result of instruction emulation and handle
injection in KVM code.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Wed, 28 Apr 2010 16:15:43 +0000 (19:15 +0300)]
KVM: x86 emulator: move interruptibility state tracking out of emulator
Emulator shouldn't access vcpu directly.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Wed, 28 Apr 2010 16:15:42 +0000 (19:15 +0300)]
KVM: x86 emulator: handle shadowed registers outside emulator
Emulator shouldn't access vcpu directly.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Wed, 28 Apr 2010 16:15:41 +0000 (19:15 +0300)]
KVM: x86 emulator: use shadowed register in emulate_sysexit()
emulate_sysexit() should use shadowed registers copy instead of
looking into vcpu state directly.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Wed, 28 Apr 2010 16:15:40 +0000 (19:15 +0300)]
KVM: x86 emulator: set RFLAGS outside x86 emulator code
Removes the need for set_flags() callback.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Wed, 28 Apr 2010 16:15:39 +0000 (19:15 +0300)]
KVM: x86 emulator: advance RIP outside x86 emulator code
Return new RIP as part of instruction emulation result instead of
updating KVM's RIP from x86 emulator code.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Wed, 28 Apr 2010 16:15:38 +0000 (19:15 +0300)]
KVM: handle emulation failure case first
If emulation failed return immediately.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Wed, 28 Apr 2010 16:15:37 +0000 (19:15 +0300)]
KVM: do not inject #PF in (read|write)_emulated() callbacks
Return an error to the x86 emulator instead of injecting an exception behind its back.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Wed, 28 Apr 2010 16:15:36 +0000 (19:15 +0300)]
KVM: remove export of emulator_write_emulated()
It is not called directly outside of the file it's defined in anymore.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Wed, 28 Apr 2010 16:15:35 +0000 (19:15 +0300)]
KVM: x86 emulator: x86_emulate_insn() return -1 only in case of emulation failure
Currently the emulator returns -1 when emulation failed or IO is needed.
The caller tries to guess whether emulation failed by looking at other
variables. Make it easier for the caller to recognise the error condition by
always returning -1 in case of failure. For this, a new emulator-internal
return value, X86EMUL_IO_NEEDED, is introduced. It is used to
distinguish between an error condition (which returns X86EMUL_UNHANDLEABLE)
and a condition that requires an IO exit to userspace to continue emulation.
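The distinction can be sketched as below. The enum values, the result struct and finish_emulation() are hypothetical; they only mirror the roles of X86EMUL_UNHANDLEABLE and X86EMUL_IO_NEEDED and do not reproduce the emulator's real interface.

```c
#include <stdbool.h>

/* Hypothetical internal codes mirroring the roles described in the message. */
enum emul_rc_sketch {
    EMUL_CONTINUE_SKETCH,     /* instruction emulated */
    EMUL_UNHANDLEABLE_SKETCH, /* genuine emulation failure */
    EMUL_IO_NEEDED_SKETCH,    /* must exit to userspace to finish the IO */
};

struct emul_result_sketch {
    int ret;      /* -1 only on emulation failure, 0 otherwise */
    bool io_exit; /* true when userspace has to complete an IO */
};

/* Map the internal code to what the caller sees, without any guessing. */
static struct emul_result_sketch finish_emulation(enum emul_rc_sketch rc)
{
    struct emul_result_sketch r = { 0, false };

    if (rc == EMUL_UNHANDLEABLE_SKETCH)
        r.ret = -1;
    else if (rc == EMUL_IO_NEEDED_SKETCH)
        r.io_exit = true;
    return r;
}
```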
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Wed, 28 Apr 2010 16:15:34 +0000 (19:15 +0300)]
KVM: fill in run->mmio details in (read|write)_emulated function
Fill in run->mmio details in (read|write)_emulated function just like
pio does. There is no point in filling only vcpu fields there just to
copy them into vcpu->run a little bit later.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Wed, 28 Apr 2010 16:15:33 +0000 (19:15 +0300)]
KVM: x86 emulator: fix X86EMUL_RETRY_INSTR and X86EMUL_CMPXCHG_FAILED values
Currently X86EMUL_PROPAGATE_FAULT, X86EMUL_RETRY_INSTR and
X86EMUL_CMPXCHG_FAILED have the same value, so the caller cannot
tell why a function such as emulator_cmpxchg_emulated()
(which can return both X86EMUL_PROPAGATE_FAULT and
X86EMUL_CMPXCHG_FAILED) failed.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Wed, 28 Apr 2010 16:15:32 +0000 (19:15 +0300)]
KVM: x86 emulator: make (get|set)_dr() callback return error if it fails
Make (get|set)_dr() callback return error if it fails instead of
injecting exception behind emulator's back.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Wed, 28 Apr 2010 16:15:31 +0000 (19:15 +0300)]
KVM: x86 emulator: make set_cr() callback return error if it fails
Make set_cr() callback return error if it fails instead of injecting #GP
behind emulator's back.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Wed, 28 Apr 2010 16:15:30 +0000 (19:15 +0300)]
KVM: x86 emulator: cleanup some direct calls into kvm to use existing callbacks
Use callbacks from x86_emulate_ops to access segments instead of calling
into kvm directly.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Wed, 28 Apr 2010 16:15:29 +0000 (19:15 +0300)]
KVM: x86 emulator: add get_cached_segment_base() callback to x86_emulate_ops
On VMX it is expensive to call get_cached_descriptor() just to get segment
base since multiple vmcs_reads are done instead of only one. Introduce
new call back get_cached_segment_base() for efficiency.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Wed, 28 Apr 2010 16:15:28 +0000 (19:15 +0300)]
KVM: x86 emulator: add (set|get)_msr callbacks to x86_emulate_ops
Add (set|get)_msr callbacks to x86_emulate_ops instead of calling
them directly.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Wed, 28 Apr 2010 16:15:27 +0000 (19:15 +0300)]
KVM: x86 emulator: add (set|get)_dr callbacks to x86_emulate_ops
Add (set|get)_dr callbacks to x86_emulate_ops instead of calling
them directly.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Wed, 28 Apr 2010 16:15:26 +0000 (19:15 +0300)]
KVM: x86 emulator: handle "far address" source operand
The ljmp/lcall instruction operand contains an address and a segment.
It can be 10 bytes long. Currently we decode it as two different
operands. Fix it by introducing a new kind of operand that can hold
the entire far address.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Wed, 28 Apr 2010 16:15:25 +0000 (19:15 +0300)]
KVM: x86 emulator: cleanup nop emulation
Make it more explicit what we are checking for.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Wed, 28 Apr 2010 16:15:24 +0000 (19:15 +0300)]
KVM: x86 emulator: cleanup xchg emulation
Dst operand is already initialized during decoding stage. No need to
reinitialize.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Wed, 28 Apr 2010 16:15:23 +0000 (19:15 +0300)]
KVM: x86 emulator: fix Move r/m16 to segment register decoding
This instruction does not need generic decoding for its dst operand.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Wed, 28 Apr 2010 16:15:22 +0000 (19:15 +0300)]
KVM: x86 emulator: introduce read cache
Introduce a read cache, which is needed for instructions that require more
than one exit to userspace. After returning from userspace the instruction
will be re-executed with the cached read value.
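A minimal sketch of how such a cache can behave, with hypothetical names and sizes throughout (read_cache_sketch, cached_read(), complete_io(), the 64-byte buffer). A read that is not yet cached reports that an exit is needed; when userspace supplies the data it is appended and the instruction restarts from its first read, now served from the cache.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { READ_OK, READ_IO_NEEDED };

struct read_cache_sketch {
    uint8_t data[64]; /* bytes returned by completed reads (size is arbitrary) */
    size_t pos;       /* next byte to hand out in the current execution */
    size_t end;       /* number of valid bytes in data[] */
};

/* Serve a read from the cache, or report that an exit to userspace is needed. */
static int cached_read(struct read_cache_sketch *rc, uint8_t *dest, size_t size)
{
    if (rc->pos + size > rc->end)
        return READ_IO_NEEDED;
    memcpy(dest, rc->data + rc->pos, size);
    rc->pos += size;
    return READ_OK;
}

/* Userspace finished the read that missed: cache it and restart the instruction. */
static bool complete_io(struct read_cache_sketch *rc, const uint8_t *src, size_t size)
{
    if (size > sizeof(rc->data) - rc->end)
        return false; /* would overflow the cache */
    memcpy(rc->data + rc->end, src, size);
    rc->end += size;
    rc->pos = 0;      /* re-execution starts again at the first read */
    return true;
}
```

An instruction with two reads therefore exits twice, and on the final re-execution both reads are satisfied from the cache without touching the device again.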
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Mon, 3 May 2010 13:05:44 +0000 (16:05 +0300)]
KVM: VMX: Avoid writing HOST_CR0 every entry
cr0.ts may change between entries, so we copy cr0 to HOST_CR0 before each
entry. That is slow, so instead, set HOST_CR0 to have TS set unconditionally
(which is a safe value), and issue a clts() just before exiting vcpu context
if the task indeed owns the fpu.
Saves ~50 cycles/exit.
Signed-off-by: Avi Kivity <avi@redhat.com>