Two small optimizations to not do register reloading in
vcpu_put/get, instead do it in the ioctl path. This reduces
the overhead for schedule-intense workload that does not
exit to QEMU. (e.g. KVM guest with eventfd/irqfd that
does a lot of context switching with vhost or iothreads).