I couldn't get Xen to boot an L2 HVM guest when it was nested under KVM. It
was getting a GP(0) on a rather unspecial VMREAD from Xen:
(XEN) ----[ Xen-4.7.0-rc x86_64 debug=n Not tainted ]----
(XEN) CPU:    1
(XEN) RIP:    e008:[<ffff82d0801e629e>] vmx_get_segment_register+0x14e/0x450
(XEN) RFLAGS: 0000000000010202   CONTEXT: hypervisor (d1v0)
(XEN) rax: ffff82d0801e6288   rbx: ffff83003ffbfb7c   rcx: fffffffffffab928
(XEN) rdx: 0000000000000000   rsi: 0000000000000000   rdi: ffff83000bdd0000
(XEN) rbp: ffff83000bdd0000   rsp: ffff83003ffbfab0   r8:  ffff830038813910
(XEN) r9:  ffff83003faf3958   r10: 0000000a3b9f7640   r11: ffff83003f82d418
(XEN) r12: 0000000000000000   r13: ffff83003ffbffff   r14: 0000000000004802
(XEN) r15: 0000000000000008   cr0: 0000000080050033   cr4: 00000000001526e0
(XEN) cr3: 000000003fc79000   cr2: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen code around <ffff82d0801e629e> (vmx_get_segment_register+0x14e/0x450):
(XEN)  00 00 41 be 02 48 00 00 <44> 0f 78 74 24 08 0f 86 38 56 00 00 b8 08 68 00
(XEN) Xen stack trace from rsp=ffff83003ffbfab0:
...
(XEN) Xen call trace:
(XEN)    [<ffff82d0801e629e>] vmx_get_segment_register+0x14e/0x450
(XEN)    [<ffff82d0801f3695>] get_page_from_gfn_p2m+0x165/0x300
(XEN)    [<ffff82d0801bfe32>] hvmemul_get_seg_reg+0x52/0x60
(XEN)    [<ffff82d0801bfe93>] hvm_emulate_prepare+0x53/0x70
(XEN)    [<ffff82d0801ccacb>] handle_mmio+0x2b/0xd0
(XEN)    [<ffff82d0801be591>] emulate.c#_hvm_emulate_one+0x111/0x2c0
(XEN)    [<ffff82d0801cd6a4>] handle_hvm_io_completion+0x274/0x2a0
(XEN)    [<ffff82d0801f334a>] __get_gfn_type_access+0xfa/0x270
(XEN)    [<ffff82d08012f3bb>] timer.c#add_entry+0x4b/0xb0
(XEN)    [<ffff82d08012f80c>] timer.c#remove_entry+0x7c/0x90
(XEN)    [<ffff82d0801c8433>] hvm_do_resume+0x23/0x140
(XEN)    [<ffff82d0801e4fe7>] vmx_do_resume+0xa7/0x140
(XEN)    [<ffff82d080164aeb>] context_switch+0x13b/0xe40
(XEN)    [<ffff82d080128e6e>] schedule.c#schedule+0x22e/0x570
(XEN)    [<ffff82d08012c0cc>] softirq.c#__do_softirq+0x5c/0x90
(XEN)    [<ffff82d0801602c5>] domain.c#idle_loop+0x25/0x50
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 1:
(XEN) GENERAL PROTECTION FAULT
(XEN) [error_code=0000]
(XEN) ****************************************
Tracing my host KVM showed it was the one injecting the GP(0) when
emulating the VMREAD and checking the destination segment permissions in
get_vmx_mem_address():
3)               |  vmx_handle_exit() {
3)               |    handle_vmread() {
3)               |      nested_vmx_check_permission() {
3)               |        vmx_get_segment() {
3)   0.074 us    |          vmx_read_guest_seg_base();
3)   0.065 us    |          vmx_read_guest_seg_selector();
3)   0.066 us    |          vmx_read_guest_seg_ar();
3)   1.636 us    |        }
3)   0.058 us    |        vmx_get_rflags();
3)   0.062 us    |        vmx_read_guest_seg_ar();
3)   3.469 us    |      }
3)               |      vmx_get_cs_db_l_bits() {
3)   0.058 us    |        vmx_read_guest_seg_ar();
3)   0.662 us    |      }
3)               |      get_vmx_mem_address() {
3)   0.068 us    |        vmx_cache_reg();
3)               |        vmx_get_segment() {
3)   0.074 us    |          vmx_read_guest_seg_base();
3)   0.068 us    |          vmx_read_guest_seg_selector();
3)   0.071 us    |          vmx_read_guest_seg_ar();
3)   1.756 us    |        }
3)               |        kvm_queue_exception_e() {
3)   0.066 us    |          kvm_multiple_exception();
3)   0.684 us    |        }
3)   4.085 us    |      }
3)   9.833 us    |    }
3) + 10.366 us   |  }
Cross-checking the KVM/VMX VMREAD emulation code with the Intel Software
Developer's Manual Volume 3C, "VMREAD - Read Field from Virtual-Machine
Control Structure", I found that we enforce that the destination operand is
NOT located in a read-only data segment or any code segment even when the L1
is in long mode, BUT that check should only happen when it is in protected
mode. In long mode, the only check on a memory destination is that its
address is canonical.
Reordering the checks so that our emulation follows the specification
allows me to boot a Xen dom0 in a nested KVM and start HVM L2 guests
without problems.
Fixes: f9eb4af67c9d ("KVM: nVMX: VMX instructions: add checks for #GP/#SS exceptions")
Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Eugene Korenevsky <ekorenevsky@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
/* Checks for #GP/#SS exceptions. */
exn = false;
- if (is_protmode(vcpu)) {
+ if (is_long_mode(vcpu)) {
+ /* Long mode: #GP(0)/#SS(0) if the memory address is in a
+ * non-canonical form. This is the only check on the memory
+ * destination for long mode!
+ */
+ exn = is_noncanonical_address(*ret);
+ } else if (is_protmode(vcpu)) {
/* Protected mode: apply checks for segment validity in the
* following order:
* - segment type check (#GP(0) may be thrown)
* execute-only code segment
*/
exn = ((s.type & 0xa) == 8);
- }
- if (exn) {
- kvm_queue_exception_e(vcpu, GP_VECTOR, 0);
- return 1;
- }
- if (is_long_mode(vcpu)) {
- /* Long mode: #GP(0)/#SS(0) if the memory address is in a
- * non-canonical form. This is an only check for long mode.
- */
- exn = is_noncanonical_address(*ret);
- } else if (is_protmode(vcpu)) {
+ if (exn) {
+ kvm_queue_exception_e(vcpu, GP_VECTOR, 0);
+ return 1;
+ }
/* Protected mode: #GP(0)/#SS(0) if the segment is unusable.
*/
exn = (s.unusable != 0);