KVM: MMU: speedup update_permission_bitmask
authorPaolo Bonzini <pbonzini@redhat.com>
Thu, 24 Aug 2017 15:37:25 +0000 (17:37 +0200)
committerPaolo Bonzini <pbonzini@redhat.com>
Thu, 24 Aug 2017 16:09:18 +0000 (18:09 +0200)
commit09f037aa48f34c84e70c5a6ef4870b2dee3aee76
tree64fa777a0543ebed4e10f0c094e12299f63111b7
parentfd8cb433734eeb870156a67f5d56b6564cd2ea94
KVM: MMU: speedup update_permission_bitmask

update_permission_bitmask currently does a 128-iteration loop to,
essentially, compute a constant array.  Computing the 8 bits in parallel
reduces it to 16 iterations, and is enough to speed it up substantially
because many boolean operations in the inner loop become constants or
simplify noticeably.

Because update_permission_bitmask is actually the top item in the profile
for nested vmexits, this speeds up an L2->L1 vmexit by about ten thousand
clock cycles, or up to 30%:

                                         before     after
   cpuid                                 35173      25954
   vmcall                                35122      27079
   inl_from_pmtimer                      52635      42675
   inl_from_qemu                         53604      44599
   inl_from_kernel                       38498      30798
   outl_to_kernel                        34508      28816
   wr_tsc_adjust_msr                     34185      26818
   rd_tsc_adjust_msr                     37409      27049
   mmio-no-eventfd:pci-mem               50563      45276
   mmio-wildcard-eventfd:pci-mem         34495      30823
   mmio-datamatch-eventfd:pci-mem        35612      31071
   portio-no-eventfd:pci-io              44925      40661
   portio-wildcard-eventfd:pci-io        29708      27269
   portio-datamatch-eventfd:pci-io       31135      27164

(I wrote a small C program to compare the tables for all values of CR0.WP,
CR4.SMAP and CR4.SMEP, and they match).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
arch/x86/kvm/mmu.c