GitHub/MotorolaMobilityLLC/kernel-slsi.git
13 years agoKVM: x86 emulator: convert push %sreg/pop %sreg to direct decode
Avi Kivity [Tue, 13 Sep 2011 07:45:51 +0000 (10:45 +0300)]
KVM: x86 emulator: convert push %sreg/pop %sreg to direct decode

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: x86 emulator: switch lds/les/lss/lfs/lgs to direct decode
Avi Kivity [Tue, 13 Sep 2011 07:45:50 +0000 (10:45 +0300)]
KVM: x86 emulator: switch lds/les/lss/lfs/lgs to direct decode

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: x86 emulator: streamline decode of segment registers
Avi Kivity [Tue, 13 Sep 2011 07:45:49 +0000 (10:45 +0300)]
KVM: x86 emulator: streamline decode of segment registers

The opcodes

  push %seg
  pop %seg
  l%seg, %mem, %reg  (e.g. lds/les/lss/lfs/lgs)

all have an segment register encoded in the instruction.  To allow reuse,
decode the segment number into src2 during the decode stage instead of the
execution stage.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: x86 emulator: simplify OpMem64 decode
Avi Kivity [Tue, 13 Sep 2011 07:45:48 +0000 (10:45 +0300)]
KVM: x86 emulator: simplify OpMem64 decode

Use the same technique as the other OpMem variants, and goto mem_common.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: x86 emulator: switch src decode to decode_operand()
Avi Kivity [Tue, 13 Sep 2011 07:45:47 +0000 (10:45 +0300)]
KVM: x86 emulator: switch src decode to decode_operand()

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: x86 emulator: qualify OpReg inhibit_byte_regs hack
Avi Kivity [Tue, 13 Sep 2011 07:45:46 +0000 (10:45 +0300)]
KVM: x86 emulator: qualify OpReg inhibit_byte_regs hack

OpReg decoding has a hack that inhibits byte registers for movsx and movzx
instructions.  It should be replaced by something better, but meanwhile,
qualify that the hack is only active for the destination operand.

Note these instructions only use OpReg for the destination, but better to
be explicit about it.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: x86 emulator: switch OpImmUByte decode to decode_imm()
Avi Kivity [Tue, 13 Sep 2011 07:45:45 +0000 (10:45 +0300)]
KVM: x86 emulator: switch OpImmUByte decode to decode_imm()

Similar to SrcImmUByte.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: x86 emulator: free up some flag bits near src, dst
Avi Kivity [Tue, 13 Sep 2011 07:45:44 +0000 (10:45 +0300)]
KVM: x86 emulator: free up some flag bits near src, dst

Op fields are going to grow by a bit, we need two free bits.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: x86 emulator: switch src2 to generic decode_operand()
Avi Kivity [Tue, 13 Sep 2011 07:45:43 +0000 (10:45 +0300)]
KVM: x86 emulator: switch src2 to generic decode_operand()

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: x86 emulator: expand decode flags to 64 bits
Avi Kivity [Tue, 13 Sep 2011 07:45:42 +0000 (10:45 +0300)]
KVM: x86 emulator: expand decode flags to 64 bits

Unifiying the operands means not taking advantage of the fact that some
operand types can only go into certain operands (for example, DI can only
be used by the destination), so we need more bits to hold the operand type.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: x86 emulator: split dst decode to a generic decode_operand()
Avi Kivity [Tue, 13 Sep 2011 07:45:41 +0000 (10:45 +0300)]
KVM: x86 emulator: split dst decode to a generic decode_operand()

Instead of decoding each operand using its own code, use a generic
function.  Start with the destination operand.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: x86 emulator: move memop, memopp into emulation context
Avi Kivity [Tue, 13 Sep 2011 07:45:40 +0000 (10:45 +0300)]
KVM: x86 emulator: move memop, memopp into emulation context

Simplifies further generalization of decode.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: x86 emulator: convert group 3 instructions to direct decode
Avi Kivity [Tue, 13 Sep 2011 07:45:39 +0000 (10:45 +0300)]
KVM: x86 emulator: convert group 3 instructions to direct decode

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: Split up MSI-X assigned device IRQ handler
Jan Kiszka [Mon, 12 Sep 2011 16:57:56 +0000 (18:57 +0200)]
KVM: Split up MSI-X assigned device IRQ handler

The threaded IRQ handler for MSI-X has almost nothing in common with the
INTx/MSI handler. Move its code into a dedicated handler.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: x86: Add module parameter for lapic periodic timer limit
Jan Kiszka [Mon, 12 Sep 2011 12:10:22 +0000 (14:10 +0200)]
KVM: x86: Add module parameter for lapic periodic timer limit

Certain guests, specifically RTOSes, request faster periodic timers than
what we allow by default. Add a module parameter to adjust the limit for
non-standard setups. Also add a rate-limited warning in case the guest
requested more.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: Clean up and extend rate-limited output
Jan Kiszka [Mon, 12 Sep 2011 09:26:22 +0000 (11:26 +0200)]
KVM: Clean up and extend rate-limited output

The use of printk_ratelimit is discouraged, replace it with
pr*_ratelimited or __ratelimit. While at it, convert remaining
guest-triggerable printks to rate-limited variants.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: x86: Avoid guest-triggerable printks in APIC model
Jan Kiszka [Mon, 12 Sep 2011 09:25:51 +0000 (11:25 +0200)]
KVM: x86: Avoid guest-triggerable printks in APIC model

Convert remaining printks that the guest can trigger to apic_printk.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: x86: Move kvm_trace_exit into atomic vmexit section
Jan Kiszka [Mon, 12 Sep 2011 08:52:24 +0000 (10:52 +0200)]
KVM: x86: Move kvm_trace_exit into atomic vmexit section

This avoids that events causing the vmexit are recorded before the
actual exit reason.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: x86 emulator: disable writeback for TEST
Avi Kivity [Sun, 11 Sep 2011 08:23:02 +0000 (11:23 +0300)]
KVM: x86 emulator: disable writeback for TEST

The TEST instruction doesn't write its destination operand.  This
could cause problems if an MMIO register was accessed using the TEST
instruction.  Recently Windows XP was observed to use TEST against
the APIC ICR; this can cause spurious IPIs.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: Avoid needless registrations of IRQ ack notifier for assigned devices
Jan Kiszka [Sun, 11 Sep 2011 08:16:20 +0000 (10:16 +0200)]
KVM: Avoid needless registrations of IRQ ack notifier for assigned devices

We only perform work in kvm_assigned_dev_ack_irq if the guest IRQ is of
INTx type. This completely avoids the callback invocation in non-INTx
cases by registering the IRQ ack notifier only for INTx.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: Clean up unneeded void pointer casts
Jan Kiszka [Sun, 11 Sep 2011 08:16:16 +0000 (10:16 +0200)]
KVM: Clean up unneeded void pointer casts

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: x86 emulator: simplify emulate_1op_rax_rdx()
Avi Kivity [Wed, 7 Sep 2011 13:41:40 +0000 (16:41 +0300)]
KVM: x86 emulator: simplify emulate_1op_rax_rdx()

emulate_1op_rax_rdx() is always called with the same parameters.  Simplify
by passing just the emulation context.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: x86 emulator: merge the two emulate_1op_rax_rdx implementations
Avi Kivity [Wed, 7 Sep 2011 13:41:39 +0000 (16:41 +0300)]
KVM: x86 emulator: merge the two emulate_1op_rax_rdx implementations

We have two emulate-with-extended-accumulator implementations: once
which expect traps (_ex) and one which doesn't (plain).  Drop the
plain implementation and always use the one which expects traps;
it will simply return 0 in the _ex argument and we can happily ignore
it.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: x86 emulator: simplify emulate_1op()
Avi Kivity [Wed, 7 Sep 2011 13:41:38 +0000 (16:41 +0300)]
KVM: x86 emulator: simplify emulate_1op()

emulate_1op() is always called with the same parameters.  Simplify
by passing just the emulation context.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: x86 emulator: simplify emulate_2op_cl()
Avi Kivity [Wed, 7 Sep 2011 13:41:37 +0000 (16:41 +0300)]
KVM: x86 emulator: simplify emulate_2op_cl()

emulate_2op_cl() is always called with the same parameters.  Simplify
by passing just the emulation context.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: x86 emulator: simplify emulate_2op_cl()
Avi Kivity [Wed, 7 Sep 2011 13:41:36 +0000 (16:41 +0300)]
KVM: x86 emulator: simplify emulate_2op_cl()

emulate_2op_cl() is always called with the same parameters.  Simplify
by passing just the emulation context.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: x86 emulator: simplify emulate_2op_SrcV()
Avi Kivity [Wed, 7 Sep 2011 13:41:35 +0000 (16:41 +0300)]
KVM: x86 emulator: simplify emulate_2op_SrcV()

emulate_2op_SrcV(), and its siblings, emulate_2op_SrcV_nobyte()
and emulate_2op_SrcB(), all use the same calling conventions
and all get passed exactly the same parameters.  Simplify them
by passing just the emulation context.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: Update documentation to include detailed ENABLE_CAP description
Alexander Graf [Wed, 31 Aug 2011 08:58:55 +0000 (10:58 +0200)]
KVM: Update documentation to include detailed ENABLE_CAP description

We have an ioctl that enables capabilities individually, but no description
on what exactly happens when we enable a capability using this ioctl.

This patch adds documentation for capability enabling in a new section
of the API documentation.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: PPC: Implement H_CEDE hcall for book3s_hv in real-mode code
Paul Mackerras [Sat, 23 Jul 2011 07:42:46 +0000 (17:42 +1000)]
KVM: PPC: Implement H_CEDE hcall for book3s_hv in real-mode code

With a KVM guest operating in SMT4 mode (i.e. 4 hardware threads per
core), whenever a CPU goes idle, we have to pull all the other
hardware threads in the core out of the guest, because the H_CEDE
hcall is handled in the kernel.  This is inefficient.

This adds code to book3s_hv_rmhandlers.S to handle the H_CEDE hcall
in real mode.  When a guest vcpu does an H_CEDE hcall, we now only
exit to the kernel if all the other vcpus in the same core are also
idle.  Otherwise we mark this vcpu as napping, save state that could
be lost in nap mode (mainly GPRs and FPRs), and execute the nap
instruction.  When the thread wakes up, because of a decrementer or
external interrupt, we come back in at kvm_start_guest (from the
system reset interrupt vector), find the `napping' flag set in the
paca, and go to the resume path.

This has some other ramifications.  First, when starting a core, we
now start all the threads, both those that are immediately runnable and
those that are idle.  This is so that we don't have to pull all the
threads out of the guest when an idle thread gets a decrementer interrupt
and wants to start running.  In fact the idle threads will all start
with the H_CEDE hcall returning; being idle they will just do another
H_CEDE immediately and go to nap mode.

This required some changes to kvmppc_run_core() and kvmppc_run_vcpu().
These functions have been restructured to make them simpler and clearer.
We introduce a level of indirection in the wait queue that gets woken
when external and decrementer interrupts get generated for a vcpu, so
that we can have the 4 vcpus in a vcore using the same wait queue.
We need this because the 4 vcpus are being handled by one thread.

Secondly, when we need to exit from the guest to the kernel, we now
have to generate an IPI for any napping threads, because an HDEC
interrupt doesn't wake up a napping thread.

Thirdly, we now need to be able to handle virtual external interrupts
and decrementer interrupts becoming pending while a thread is napping,
and deliver those interrupts to the guest when the thread wakes.
This is done in kvmppc_cede_reentry, just before fast_guest_return.

Finally, since we are not using the generic kvm_vcpu_block for book3s_hv,
and hence not calling kvm_arch_vcpu_runnable, we can remove the #ifdef
from kvm_arch_vcpu_runnable.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
13 years agoKVM: PPC: book3s_pr: Simplify transitions between virtual and real mode
Paul Mackerras [Sat, 23 Jul 2011 07:41:44 +0000 (17:41 +1000)]
KVM: PPC: book3s_pr: Simplify transitions between virtual and real mode

This simplifies the way that the book3s_pr makes the transition to
real mode when entering the guest.  We now call kvmppc_entry_trampoline
(renamed from kvmppc_rmcall) in the base kernel using a normal function
call instead of doing an indirect call through a pointer in the vcpu.
If kvm is a module, the module loader takes care of generating a
trampoline as it does for other calls to functions outside the module.

kvmppc_entry_trampoline then disables interrupts and jumps to
kvmppc_handler_trampoline_enter in real mode using an rfi[d].
That then uses the link register as the address to return to
(potentially in module space) when the guest exits.

This also simplifies the way that we call the Linux interrupt handler
when we exit the guest due to an external, decrementer or performance
monitor interrupt.  Instead of turning on the MMU, then deciding that
we need to call the Linux handler and turning the MMU back off again,
we now go straight to the handler at the point where we would turn the
MMU on.  The handler will then return to the virtual-mode code
(potentially in the module).

Along the way, this moves the setting and clearing of the HID5 DCBZ32
bit into real-mode interrupts-off code, and also makes sure that
we clear the MSR[RI] bit before loading values into SRR0/1.

The net result is that we no longer need any code addresses to be
stored in vcpu->arch.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
13 years agoKVM: PPC: Assemble book3s{,_hv}_rmhandlers.S separately
Paul Mackerras [Sat, 23 Jul 2011 07:41:11 +0000 (17:41 +1000)]
KVM: PPC: Assemble book3s{,_hv}_rmhandlers.S separately

This makes arch/powerpc/kvm/book3s_rmhandlers.S and
arch/powerpc/kvm/book3s_hv_rmhandlers.S be assembled as
separate compilation units rather than having them #included in
arch/powerpc/kernel/exceptions-64s.S.  We no longer have any
conditional branches between the exception prologs in
exceptions-64s.S and the KVM handlers, so there is no need to
keep their contents close together in the vmlinux image.

In their current location, they are using up part of the limited
space between the first-level interrupt handlers and the firmware
NMI data area at offset 0x7000, and with some kernel configurations
this area will overflow (e.g. allyesconfig), leading to an
"attempt to .org backwards" error when compiling exceptions-64s.S.

Moving them out requires that we add some #includes that the
book3s_{,hv_}rmhandlers.S code was previously getting implicitly
via exceptions-64s.S.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
13 years agoKVM: PPC: Add sanity checking to vcpu_run
Alexander Graf [Wed, 10 Aug 2011 11:57:08 +0000 (13:57 +0200)]
KVM: PPC: Add sanity checking to vcpu_run

There are multiple features in PowerPC KVM that can now be enabled
depending on the user's wishes. Some of the combinations don't make
sense or don't work though.

So this patch adds a way to check if the executing environment would
actually be able to run the guest properly. It also adds sanity
checks if PVR is set (should always be true given the current code
flow), if PAPR is only used with book3s_64 where it works and that
HV KVM is only used in PAPR mode.

Signed-off-by: Alexander Graf <agraf@suse.de>
13 years agoKVM: PPC: Enable the PAPR CAP for Book3S
Alexander Graf [Mon, 8 Aug 2011 15:29:42 +0000 (17:29 +0200)]
KVM: PPC: Enable the PAPR CAP for Book3S

Now that Book3S PV mode can also run PAPR guests, we can add a PAPR cap and
enable it for all Book3S targets. Enabling that CAP switches KVM into PAPR
mode.

Signed-off-by: Alexander Graf <agraf@suse.de>
13 years agoKVM: PPC: Support SC1 hypercalls for PAPR in PR mode
Alexander Graf [Mon, 8 Aug 2011 15:26:24 +0000 (17:26 +0200)]
KVM: PPC: Support SC1 hypercalls for PAPR in PR mode

PAPR defines hypercalls as SC1 instructions. Using these, the guest modifies
page tables and does other privileged operations that it wouldn't be allowed
to do in supervisor mode.

This patch adds support for PR KVM to trap these instructions and route them
through the same PAPR hypercall interface that we already use for HV style
KVM.

Signed-off-by: Alexander Graf <agraf@suse.de>
13 years agoKVM: PPC: Stub emulate CFAR and PURR SPRs
Alexander Graf [Mon, 8 Aug 2011 15:22:59 +0000 (17:22 +0200)]
KVM: PPC: Stub emulate CFAR and PURR SPRs

Recent Linux versions use the CFAR and PURR SPRs, but don't really care about
their contents (yet). So for now, we can simply return 0 when the guest wants
to read them.

Signed-off-by: Alexander Graf <agraf@suse.de>
13 years agoKVM: PPC: Add PAPR hypercall code for PR mode
Alexander Graf [Mon, 8 Aug 2011 15:21:15 +0000 (17:21 +0200)]
KVM: PPC: Add PAPR hypercall code for PR mode

When running a PAPR guest, we need to handle a few hypercalls in kernel space,
most prominently the page table invalidation (to sync the shadows).

So this patch adds handling for a few PAPR hypercalls to PR mode KVM. I tried
to share the code with HV mode, but it ended up being a lot easier this way
around, as the two differ too much in those details.

Signed-off-by: Alexander Graf <agraf@suse.de>
---

v1 -> v2:

  - whitespace fix

13 years agoKVM: PPC: Add support for explicit HIOR setting
Alexander Graf [Mon, 8 Aug 2011 15:17:09 +0000 (17:17 +0200)]
KVM: PPC: Add support for explicit HIOR setting

Until now, we always set HIOR based on the PVR, but this is just wrong.
Instead, we should be setting HIOR explicitly, so user space can decide
what the initial HIOR value is - just like on real hardware.

We keep the old PVR based way around for backwards compatibility, but
once user space uses the SREGS based method, we drop the PVR logic.

Signed-off-by: Alexander Graf <agraf@suse.de>
13 years agoKVM: PPC: Read out syscall instruction on trap
Alexander Graf [Mon, 8 Aug 2011 14:11:36 +0000 (16:11 +0200)]
KVM: PPC: Read out syscall instruction on trap

We have a few traps where we cache the instruction that cause the trap
for analysis later on. Since we now need to be able to distinguish
between SC 0 and SC 1 system calls and the only way to find out which
is which is by looking at the instruction, we also read out the instruction
causing the system call.

Signed-off-by: Alexander Graf <agraf@suse.de>
13 years agoKVM: PPC: Interpret SDR1 as HVA in PAPR mode
Alexander Graf [Mon, 8 Aug 2011 13:06:55 +0000 (15:06 +0200)]
KVM: PPC: Interpret SDR1 as HVA in PAPR mode

When running a PAPR guest, the guest is not allowed to set SDR1 - instead
the HTAB information is held in internal hypervisor structures. But all of
our current code relies on SDR1 and walking the HTAB like on real hardware.

So in order to not be too intrusive, we simply set SDR1 to the HTAB we hold
in host memory. That way we can keep the HTAB in user space, but use it from
kernel space to map the guest.

Signed-off-by: Alexander Graf <agraf@suse.de>
13 years agoKVM: PPC: Check privilege level on SPRs
Alexander Graf [Mon, 8 Aug 2011 14:07:16 +0000 (16:07 +0200)]
KVM: PPC: Check privilege level on SPRs

We have 3 privilege levels: problem state, supervisor state and hypervisor
state. Each of them can access different SPRs, so we need to check on every
SPR if it's accessible in the respective mode.

Signed-off-by: Alexander Graf <agraf@suse.de>
13 years agoKVM: PPC: Add papr_enabled flag
Alexander Graf [Mon, 8 Aug 2011 14:08:55 +0000 (16:08 +0200)]
KVM: PPC: Add papr_enabled flag

When running a PAPR guest, some things change. The privilege level drops
from hypervisor to supervisor, SDR1 gets treated differently and we interpret
hypercalls. For bisectability sake, add the flag now, but only enable it when
all the support code is there.

Signed-off-by: Alexander Graf <agraf@suse.de>
13 years agoKVM: PPC: move compute_tlbie_rb to book3s common header
Alexander Graf [Fri, 8 Jul 2011 11:40:10 +0000 (13:40 +0200)]
KVM: PPC: move compute_tlbie_rb to book3s common header

We need the compute_tlbie_rb in _pr and _hv implementations for papr
soon, so let's move it over to a common header file that both
implementations can leverage.

Signed-off-by: Alexander Graf <agraf@suse.de>
13 years agoKVM: Restore missing powerpc API docs
Avi Kivity [Mon, 29 Aug 2011 13:27:08 +0000 (16:27 +0300)]
KVM: Restore missing powerpc API docs

Commit 371fefd6 lost a doc hunk somehow, restore it.

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: APIC: avoid instruction emulation for EOI writes
Kevin Tian [Tue, 30 Aug 2011 10:56:17 +0000 (13:56 +0300)]
KVM: APIC: avoid instruction emulation for EOI writes

Instruction emulation for EOI writes can be skipped, since sane
guest simply uses MOV instead of string operations. This is a nice
improvement when guest doesn't support x2apic or hyper-V EOI
support.

a single VM bandwidth is observed with ~8% bandwidth improvement
(7.4Gbps->8Gbps), by saving ~5% cycles from EOI emulation.

Signed-off-by: Kevin Tian <kevin.tian@intel.com>
<Based on earlier work from>:
Signed-off-by: Eddie Dong <eddie.dong@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: SVM: Fix TSC MSR read in nested SVM
Nadav Har'El [Tue, 2 Aug 2011 12:55:23 +0000 (15:55 +0300)]
KVM: SVM: Fix TSC MSR read in nested SVM

When the TSC MSR is read by an L2 guest (when L1 allowed this MSR to be
read without exit), we need to return L2's notion of the TSC, not L1's.

The current code incorrectly returned L1 TSC, because svm_get_msr() was also
used in x86.c where this was assumed, but now that these places call the new
svm_read_l1_tsc(), the MSR read can be fixed.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Tested-by: Joerg Roedel <joerg.roedel@amd.com>
Acked-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: nVMX: Fix nested VMX TSC emulation
Nadav Har'El [Tue, 2 Aug 2011 12:54:52 +0000 (15:54 +0300)]
KVM: nVMX: Fix nested VMX TSC emulation

This patch fixes two corner cases in nested (L2) handling of TSC-related
issues:

1. Somewhat suprisingly, according to the Intel spec, if L1 allows WRMSR to
the TSC MSR without an exit, then this should set L1's TSC value itself - not
offset by vmcs12.TSC_OFFSET (like was wrongly done in the previous code).

2. Allow L1 to disable the TSC_OFFSETING control, and then correctly ignore
the vmcs12.TSC_OFFSET.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: L1 TSC handling
Nadav Har'El [Tue, 2 Aug 2011 12:54:20 +0000 (15:54 +0300)]
KVM: L1 TSC handling

KVM assumed in several places that reading the TSC MSR returns the value for
L1. This is incorrect, because when L2 is running, the correct TSC read exit
emulation is to return L2's value.

We therefore add a new x86_ops function, read_l1_tsc, to use in places that
specifically need to read the L1 TSC, NOT the TSC of the current level of
guest.

Note that one change, of one line in kvm_arch_vcpu_load, is made redundant
by a different patch sent by Zachary Amsden (and not yet applied):
kvm_arch_vcpu_load() should not read the guest TSC, and if it didn't, of
course we didn't have to change the call of kvm_get_msr() to read_l1_tsc().

[avi: moved callback to kvm_x86_ops tsc block]

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Acked-by: Zachary Amsdem <zamsden@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: nVMX: Document 'nested' parameter
Sasha Levin [Tue, 9 Aug 2011 11:28:35 +0000 (14:28 +0300)]
KVM: nVMX: Document 'nested' parameter

Add documentation of the new 'nested' parameter to
'Documentation/kernel-parameters.txt'.

Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: MMU: Fix SMEP failure during fetch
Yang, Wei Y [Tue, 9 Aug 2011 10:14:01 +0000 (18:14 +0800)]
KVM: MMU: Fix SMEP failure during fetch

This patch fix kvm-unit-tests hanging and incorrect PT_ACCESSED_MASK
bit set in the case of SMEP fault.  The code updated 'eperm' after
the variable was checked.

Signed-off-by: Yang, Wei <wei.y.yang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: MMU: Do not unconditionally read PDPTE from guest memory
Avi Kivity [Thu, 28 Jul 2011 08:36:17 +0000 (11:36 +0300)]
KVM: MMU: Do not unconditionally read PDPTE from guest memory

Architecturally, PDPTEs are cached in the PDPTRs when CR3 is reloaded.
On SVM, it is not possible to implement this, but on VMX this is possible
and was indeed implemented until nested SVM changed this to unconditionally
read PDPTEs dynamically.  This has noticable impact when running PAE guests.

Fix by changing the MMU to read PDPTRs from the cache, falling back to
reading from memory for the nested MMU.

Signed-off-by: Avi Kivity <avi@redhat.com>
Tested-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: VMX: trivial: use BUG_ON
Julia Lawall [Tue, 2 Aug 2011 10:34:57 +0000 (12:34 +0200)]
KVM: VMX: trivial: use BUG_ON

Use BUG_ON(x) rather than if(x) BUG();

The semantic patch that fixes this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@ identifier x; @@
-if (x) BUG();
+BUG_ON(x);

@@ identifier x; @@
-if (!x) BUG();
+BUG_ON(!x);
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: x86: report valid microcode update ID
Marcelo Tosatti [Fri, 29 Jul 2011 22:44:21 +0000 (19:44 -0300)]
KVM: x86: report valid microcode update ID

Windows Server 2008 SP2 checked build with smp > 1 BSOD's during
boot due to lack of microcode update:

*** Assertion failed: The system BIOS on this machine does not properly
support the processor.  The system BIOS did not load any microcode update.
A BIOS containing the latest microcode update is needed for system reliability.
(CurrentUpdateRevision != 0)
***   Source File: d:\longhorn\base\hals\update\intelupd\update.c, line 440

Report a non-zero microcode update signature to make it happy.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: Make x86_decode_insn() return proper macros
Takuya Yoshikawa [Sat, 30 Jul 2011 09:03:34 +0000 (18:03 +0900)]
KVM: x86 emulator: Make x86_decode_insn() return proper macros

Return EMULATION_OK/FAILED consistently.  Also treat instruction fetch
errors, not restricted to X86EMUL_UNHANDLEABLE, as EMULATION_FAILED;
although this cannot happen in practice, the current logic will continue
the emulation even if the decoder fails to fetch the instruction.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: Let compiler know insn_fetch() rarely fails
Takuya Yoshikawa [Sat, 30 Jul 2011 09:02:29 +0000 (18:02 +0900)]
KVM: x86 emulator: Let compiler know insn_fetch() rarely fails

Fetching the instruction which was to be executed by the guest cannot
fail normally.  So compiler should always predict that it will succeed.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: Drop _size argument from insn_fetch()
Takuya Yoshikawa [Sat, 30 Jul 2011 09:01:26 +0000 (18:01 +0900)]
KVM: x86 emulator: Drop _size argument from insn_fetch()

_type is enough to know the size.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: Use ctxt->_eip directly in do_insn_fetch_byte()
Takuya Yoshikawa [Sat, 30 Jul 2011 09:00:17 +0000 (18:00 +0900)]
KVM: x86 emulator: Use ctxt->_eip directly in do_insn_fetch_byte()

Instead of passing ctxt->_eip from insn_fetch() call sites, get it from
ctxt in do_insn_fetch_byte().  This is done by replacing the argument
_eip of insn_fetch() with _ctxt, which should be better than letting the
macro use ctxt silently in its body.

Though this changes the place where ctxt->_eip is incremented from
insn_fetch() to do_insn_fetch_byte(), this does not have any real
effect.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: Intelligent device lookup on I/O bus
Sasha Levin [Wed, 27 Jul 2011 13:00:48 +0000 (16:00 +0300)]
KVM: Intelligent device lookup on I/O bus

Currently the method of dealing with an IO operation on a bus (PIO/MMIO)
is to call the read or write callback for each device registered
on the bus until we find a device which handles it.

Since the number of devices on a bus can be significant due to ioeventfds
and coalesced MMIO zones, this leads to a lot of overhead on each IO
operation.

Instead of registering devices, we now register ranges which points to
a device. Lookup is done using an efficient bsearch instead of a linear
search.

Performance test was conducted by comparing exit count per second with
200 ioeventfds created on one byte and the guest is trying to access a
different byte continuously (triggering usermode exits).
Before the patch the guest has achieved 259k exits per second, after the
patch the guest does 274k exits per second.

Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: Use __print_symbolic() for vmexit tracepoints
Stefan Hajnoczi [Fri, 22 Jul 2011 11:46:53 +0000 (12:46 +0100)]
KVM: Use __print_symbolic() for vmexit tracepoints

The vmexit tracepoints format the exit_reason to make it human-readable.
Since the exit_reason depends on the instruction set (vmx or svm),
formatting is handled with ftrace_print_symbols_seq() by referring to
the appropriate exit reason table.

However, the ftrace_print_symbols_seq() function is not meant to be used
directly in tracepoints since it does not export the formatting table
which userspace tools like trace-cmd and perf use to format traces.

In practice perf dies when formatting vmexit-related events and
trace-cmd falls back to printing the numeric value (with extra
formatting code in the kvm plugin to paper over this limitation).  Other
userspace consumers of vmexit-related tracepoints would be in similar
trouble.

To avoid significant changes to the kvm_exit tracepoint, this patch
moves the vmx and svm exit reason tables into arch/x86/kvm/trace.h and
selects the right table with __print_symbolic() depending on the
instruction set.  Note that __print_symbolic() is designed for exporting
the formatting table to userspace and allows trace-cmd and perf to work.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: Record instruction set in all vmexit tracepoints
Stefan Hajnoczi [Fri, 22 Jul 2011 11:46:52 +0000 (12:46 +0100)]
KVM: Record instruction set in all vmexit tracepoints

The kvm_exit tracepoint recently added the isa argument to aid decoding
exit_reason.  The semantics of exit_reason depend on the instruction set
(vmx or svm) and the isa argument allows traces to be analyzed on other
machines.

Add the isa argument to kvm_nested_vmexit and kvm_nested_vmexit_inject
so these tracepoints can also be self-describing.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: Really fix HV_X64_MSR_APIC_ASSIST_PAGE
Mike Waychison [Sat, 23 Jul 2011 07:31:45 +0000 (00:31 -0700)]
KVM: Really fix HV_X64_MSR_APIC_ASSIST_PAGE

Commit 0945d4b228 tried to fix the get_msr path for the
HV_X64_MSR_APIC_ASSIST_PAGE msr, but was poorly tested.  We should be
returning 0 if the read succeeded, and passing the value back to the
caller via the pdata out argument, not returning the value directly.

Signed-off-by: Mike Waychison <mikew@google.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86: get_msr support for HV_X64_MSR_APIC_ASSIST_PAGE
Mike Waychison [Thu, 21 Jul 2011 22:38:10 +0000 (15:38 -0700)]
KVM: x86: get_msr support for HV_X64_MSR_APIC_ASSIST_PAGE

"get" support for the HV_X64_MSR_APIC_ASSIST_PAGE msr was missing, even
though it is explicitly enumerated as something the vmm should save in
msrs_to_save and reported to userland via the KVM_GET_MSR_INDEX_LIST
ioctl.

Add "get" support for HV_X64_MSR_APIC_ASSIST_PAGE.  We simply return the
guest visible value of this register, which seems to be correct as a set
on the register is validated for us already.

Signed-off-by: Mike Waychison <mikew@google.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: Make coalesced mmio use a device per zone
Sasha Levin [Wed, 20 Jul 2011 17:59:00 +0000 (20:59 +0300)]
KVM: Make coalesced mmio use a device per zone

This patch changes coalesced mmio to create one mmio device per
zone instead of handling all zones in one device.

Doing so enables us to take advantage of existing locking and prevents
a race condition between coalesced mmio registration/unregistration
and lookups.

Suggested-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: x86: Raise the hard VCPU count limit
Sasha Levin [Mon, 18 Jul 2011 14:17:15 +0000 (17:17 +0300)]
KVM: x86: Raise the hard VCPU count limit

The patch raises the hard limit of VCPU count to 254.

This will allow developers to easily work on scalability
and will allow users to test high VCPU setups easily without
patching the kernel.

To prevent possible issues with current setups, KVM_CAP_NR_VCPUS
now returns the recommended VCPU limit (which is still 64) - this
should be a safe value for everybody, while a new KVM_CAP_MAX_VCPUS
returns the hard limit which is now 254.

Cc: Avi Kivity <avi@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Suggested-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: MMIO: Lock coalesced device when checking for available entry
Sasha Levin [Mon, 18 Jul 2011 14:17:14 +0000 (17:17 +0300)]
KVM: MMIO: Lock coalesced device when checking for available entry

Move the check whether there are available entries to within the spinlock.
This allows working with larger amount of VCPUs and reduces premature
exits when using a large number of VCPUs.

Cc: Avi Kivity <avi@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: x86: cleanup the code of read/write emulation
Xiao Guangrong [Wed, 13 Jul 2011 06:32:31 +0000 (14:32 +0800)]
KVM: x86: cleanup the code of read/write emulation

Using the read/write operation to remove the same code

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86: abstract the operation for read/write emulation
Xiao Guangrong [Wed, 13 Jul 2011 06:31:50 +0000 (14:31 +0800)]
KVM: x86: abstract the operation for read/write emulation

The operations of read emulation and write emulation are very similar, so we
can abstract the operation of them, in larter patch, it is used to cleanup the
same code

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86: fix broken read emulation spans a page boundary
Xiao Guangrong [Wed, 13 Jul 2011 06:31:08 +0000 (14:31 +0800)]
KVM: x86: fix broken read emulation spans a page boundary

If the range spans a page boundary, the mmio access can be broke, fix it as
write emulation.

And we already get the guest physical address, so use it to read guest data
directly to avoid walking guest page table again

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: fix Src2CL decode
Avi Kivity [Tue, 13 Sep 2011 07:45:38 +0000 (10:45 +0300)]
KVM: x86 emulator: fix Src2CL decode

Src2CL decode (used for double width shifts) erronously decodes only bit 3
of %rcx, instead of bits 7:0.

Fix by decoding %cl in its entirety.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: MMU: fix incorrect return of spte
Zhao Jin [Mon, 19 Sep 2011 04:19:51 +0000 (12:19 +0800)]
KVM: MMU: fix incorrect return of spte

__update_clear_spte_slow should return original spte while the
current code returns low half of original spte combined with high
half of new spte.

Signed-off-by: Zhao Jin <cronozhj@gmail.com>
Reviewed-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoMerge branch 'spi/merge' of git://git.secretlab.ca/git/linux-2.6
Linus Torvalds [Fri, 23 Sep 2011 23:53:16 +0000 (16:53 -0700)]
Merge branch 'spi/merge' of git://git.secretlab.ca/git/linux-2.6

* 'spi/merge' of git://git.secretlab.ca/git/linux-2.6:
  spi: Fix WARN when removing spi-fsl-spi module
  spi/imx: Fix spi-imx when the hardware SPI chipselects are used

13 years agospi: Fix WARN when removing spi-fsl-spi module
Jeff Harris [Fri, 23 Sep 2011 15:49:36 +0000 (11:49 -0400)]
spi: Fix WARN when removing spi-fsl-spi module

If CPM mode is not used, the fsl_dummy_rx variable is never allocated.  When
the cleanup attempts to free it, the reference count is zero and a WARN is
generated.  The same CPM mode check used in the initialize is applied to the
free as well.

Tested on 2.6.33 with the previous spi_mpc8xxx driver.  The renamed
spi-fsl-spi driver looks to have the same problem.

Signed-off-by: Jeff Harris <jeff_harris@kentrox.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
13 years agoscsi: fix qla2xxx printk format warning
Randy Dunlap [Fri, 23 Sep 2011 22:40:50 +0000 (15:40 -0700)]
scsi: fix qla2xxx printk format warning

sector_t can be different types, so cast it to its largest possible
type.

  drivers/scsi/qla2xxx/qla_isr.c:1509:5: warning: format '%lx' expects type 'long unsigned int', but argument 5 has type 'sector_t'

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoscsi: SCSI_ISCI needs to select SCSI_SAS_HOST_SMP, fixes build error
Randy Dunlap [Fri, 23 Sep 2011 22:43:54 +0000 (15:43 -0700)]
scsi: SCSI_ISCI needs to select SCSI_SAS_HOST_SMP, fixes build error

SCSI_ISCI needs to select SCSI_SAS_HOST_SMP to ensure that all
needed symbols are available to it.

Fixes this build error:

  ERROR: "try_test_sas_gpio_gp_bit" [drivers/scsi/isci/isci.ko] undefined!

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoMerge branch 'perf-tools-for-linus' of git://github.com/acmel/linux
Linus Torvalds [Fri, 23 Sep 2011 22:17:02 +0000 (15:17 -0700)]
Merge branch 'perf-tools-for-linus' of git://github.com/acmel/linux

* 'perf-tools-for-linus' of git://github.com/acmel/linux:
  perf python: Add missing perf_event__parse_sample 'swapped' parm

13 years agoMerge branch 'perf-tools-for-linus' of git://github.com/acmel/linux
Linus Torvalds [Fri, 23 Sep 2011 20:59:37 +0000 (13:59 -0700)]
Merge branch 'perf-tools-for-linus' of git://github.com/acmel/linux

* 'perf-tools-for-linus' of git://github.com/acmel/linux:
  perf tools: Add support for disabling -Werror via WERROR=0
  perf top: Fix userspace sample addr map offset
  perf symbols: Fix issue with binaries using 16-bytes buildids (v2)
  perf tool: Fix endianness handling of u32 data in samples
  perf sort: Fix symbol sort output by separating unresolved samples by type
  perf symbols: Synthesize anonymous mmap events
  perf record: Create events initially disabled and enable after init
  perf symbols: Add some heuristics for choosing the best duplicate symbol
  perf symbols: Preserve symbol scope when parsing /proc/kallsyms
  perf symbols: /proc/kallsyms does not sort module symbols
  perf symbols: Fix ppc64 SEGV in dso__load_sym with debuginfo files
  perf probe: Fix regression of variable finder

13 years agoMerge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Linus Torvalds [Fri, 23 Sep 2011 19:05:53 +0000 (12:05 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  drm/radeon/kms: fix DDIA enable on some rs690 systems
  Revert "drm/radeon/kms: fix typo in r100_blit_copy"

13 years agoMerge branch 'for-linus' of git://github.com/tiwai/sound
Linus Torvalds [Fri, 23 Sep 2011 19:04:32 +0000 (12:04 -0700)]
Merge branch 'for-linus' of git://github.com/tiwai/sound

* 'for-linus' of git://github.com/tiwai/sound:
  ALSA: usb-audio - clear chip->probing on error exit
  ALSA: fm801: Gracefully handle failure of tuner auto-detect
  ALSA: fm801: Fix double free in case of error in tuner detection
  ASoC: Ensure we generate a driver name
  ASoC: Remove bitrotted wm8962_resume()
  ASoC: bf5xx-ad73311: Fix prototype for bf5xx_probe

13 years agoperf python: Add missing perf_event__parse_sample 'swapped' parm
Arnaldo Carvalho de Melo [Fri, 23 Sep 2011 18:38:53 +0000 (15:38 -0300)]
perf python: Add missing perf_event__parse_sample 'swapped' parm

Problem introduced in 936be50, that missed one perf_event__parse_sample
user, the python binding.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-ja4phms9618ggi657plyuch2@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
13 years agoperf tools: Add support for disabling -Werror via WERROR=0
Darren Hart [Thu, 8 Sep 2011 20:42:39 +0000 (13:42 -0700)]
perf tools: Add support for disabling -Werror via WERROR=0

GCC often introduces new warnings with lots of false positives -
breaking -Werror builds. WERROR=0 allows one to build perf without much
fuss - while still encouraging people to send patches to avoid the fuss
of having to type WERROR=0.

Bisecting back to commits that produce a (mostly harmless) warning on
some compilers is more difficult. With WERROR=0 one could bisect without
worrying about harmless warnings.

Cc: Ingo Molnar <mingo@elte.hu>
Link: http://lkml.kernel.org/r/eac06c7cc4920e5d4830417d466161fb26c7359c.1315514559.git.dvhart@linux.intel.com
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
13 years agoperf top: Fix userspace sample addr map offset
Arnaldo Carvalho de Melo [Wed, 14 Sep 2011 18:54:30 +0000 (15:54 -0300)]
perf top: Fix userspace sample addr map offset

The 'perf top' tool came from the kernel where we had each DSO (vmlinux,
modules) loaded just once at a time.

But userspace may have DSOs loaded in multiple addresses (shared
libraries), requiring that we use the just resolved map instead of the
first one found.

Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-ag53wz0yllpgers0n2w7hchp@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
13 years agoperf symbols: Fix issue with binaries using 16-bytes buildids (v2)
Stephane Eranian [Fri, 22 Oct 2010 15:25:01 +0000 (17:25 +0200)]
perf symbols: Fix issue with binaries using 16-bytes buildids (v2)

Buildid can vary in size. According to the man page of ld, buildid can
be 160 bits (sha1) or 128 bits (md5, uuid). Perf assumes buildid size of
20 bytes (160 bits) regardless. When dealing with md5 buildids, it would
thus read more than needed and that would cause mismatches and samples
without symbols.

This patch fixes this by taking into account the actual buildid size as
encoded int he section header. The leftover bytes are also cleared.

This second version fixes a minor issue with the memset() base position.

Cc: David S. Miller <davem@davemloft.net>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@gmail.com>
Link: http://lkml.kernel.org/r/4cc1af3c.8ee7d80a.5a28.ffff868e@mx.google.com
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
13 years agoperf tool: Fix endianness handling of u32 data in samples
David Ahern [Tue, 6 Sep 2011 15:12:26 +0000 (09:12 -0600)]
perf tool: Fix endianness handling of u32 data in samples

Currently, analyzing PPC data files on x86 the cpu field is always 0 and
the tid and pid are backwards. For example, analyzing a PPC file on PPC
the pid/tid fields show:

        rsyslogd  1210/1212

and analyzing the same PPC file using an x86 perf binary shows:

        rsyslogd  1212/1210

The problem is that the swap_op method for samples is
perf_event__all64_swap which assumes all elements in the sample_data
struct are u64s. cpu, tid and pid are u32s and need to be handled
individually. Given that the swap is done before the sample is parsed,
the simplest solution is to undo the 64-bit swap of those elements when
the sample is parsed and do the proper swap.

The RAW data field is generic and perf cannot have programmatic knowledge
of how to treat that data. Instead a warning is given to the user.

Thanks to Anton Blanchard for providing a data file for a mult-CPU
PPC system so I could verify the fix for the CPU fields.

v3 -> v4:
- fixed use of WARN_ONCE

v2 -> v3:
- used WARN_ONCE for message regarding raw data
- removed struct wrapper around union
- fixed whitespace issues

v1 -> v2:
- added a union for undoing the byte-swap on u64 and redoing swap on
  u32's to address compiler errors (see git commit 65014ab3)

Cc: Anton Blanchard <anton@samba.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1315321946-16993-1-git-send-email-dsahern@gmail.com
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
13 years agoperf sort: Fix symbol sort output by separating unresolved samples by type
Anton Blanchard [Wed, 31 Aug 2011 01:51:45 +0000 (11:51 +1000)]
perf sort: Fix symbol sort output by separating unresolved samples by type

I took a profile that suggested 60% of total CPU time was in the
hypervisor:

...
    60.20%  [H] 0x33d43c
     4.43%  [k] ._spin_lock_irqsave
     1.07%  [k] ._spin_lock

Using perf stat to get the user/kernel/hypervisor breakdown contradicted
this.

The problem is we merge all unresolved samples into the one unknown
bucket. If add a comparison by sample type to sort__sym_cmp we get the
real picture:

...
    57.11%  [.] 0x80fbf63c
     4.43%  [k] ._spin_lock_irqsave
     1.07%  [k] ._spin_lock
     0.65%  [H] 0x33d43c

So it was almost all userspace, not hypervisor as the initial profile
suggested.

I found another issue while adding this. Symbol sorting sometimes shows
multiple entries for the unknown bucket:

...
    16.65%  [.] 0x6cd3a8
     7.25%  [.] 0x422460
     5.37%  [.] yylex
     4.79%  [.] malloc
     4.78%  [.] _int_malloc
     4.03%  [.] _int_free
     3.95%  [.] hash_source_code_string
     2.82%  [.] 0x532908
     2.64%  [.] 0x36b538
     0.94%  [H] 0x8000000000e132a4
     0.82%  [H] 0x800000000000e8b0

This happens because we aren't consistent with our sorting. On
one hand we check to see if both symbols match and for two unresolved
samples sym is NULL so we match:

        if (left->ms.sym == right->ms.sym)
                return 0;

On the other hand we use sample IP for unresolved samples when
comparing against a symbol:

       ip_l = left->ms.sym ? left->ms.sym->start : left->ip;
       ip_r = right->ms.sym ? right->ms.sym->start : right->ip;

This means unresolved samples end up spread across the rbtree and we
can't merge them all.

If we use cmp_null all unresolved samples will end up in the one bucket
and the output makes more sense:

...
    39.12%  [.] 0x36b538
     5.37%  [.] yylex
     4.79%  [.] malloc
     4.78%  [.] _int_malloc
     4.03%  [.] _int_free
     3.95%  [.] hash_source_code_string
     2.26%  [H] 0x800000000000e8b0

Acked-by: Eric B Munson <emunson@mgebm.net>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ian Munsie <imunsie@au1.ibm.com>
Link: http://lkml.kernel.org/r/20110831115145.4f598ab2@kryten
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
13 years agoperf symbols: Synthesize anonymous mmap events
Anton Blanchard [Mon, 29 Aug 2011 23:15:06 +0000 (09:15 +1000)]
perf symbols: Synthesize anonymous mmap events

perf_event__synthesize_mmap_events does not create anonymous mmap events
even though the kernel does. As a result an already running application
with dynamically created code will not get profiled - all samples end up
in the unknown bucket.

This patch skips any entries with '[' in the name to avoid adding events
for special regions (eg the vsyscall page). All other executable mmaps
are assumed to be anonymous and an event is synthesized.

Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Link: http://lkml.kernel.org/r/20110830091506.60b51fe8@kryten
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
13 years agoperf record: Create events initially disabled and enable after init
David Ahern [Thu, 25 Aug 2011 16:17:55 +0000 (10:17 -0600)]
perf record: Create events initially disabled and enable after init

perf-record currently creates events enabled. When doing a system wide
collection (-a arg) this causes data collection for perf's
initialization activities -- eg., perf_event__synthesize_threads().

For some events (e.g., context switch S/W event or tracepoints like
syscalls) perf's initialization causes a lot of events to be captured
frequently generating "Check IO/CPU overload!" warnings on larger
systems (e.g., 2 socket, quad core, hyperthreading).

perf's initialization phase can be skipped by creating events
disabled and then enabling them once the initialization is done.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1314289075-14706-1-git-send-email-dsahern@gmail.com
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
13 years agoperf symbols: Add some heuristics for choosing the best duplicate symbol
Anton Blanchard [Wed, 24 Aug 2011 06:40:17 +0000 (16:40 +1000)]
perf symbols: Add some heuristics for choosing the best duplicate symbol

Try and pick the best symbol based on a few heuristics:

-  Prefer a non weak symbol over a weak one
-  Prefer a global symbol over a non global one
-  Prefer a symbol with less underscores (idea taken from kallsyms.c)
-  If all else fails, choose the symbol with the longest name

Cc: Eric B Munson <emunson@mgebm.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110824065243.161953371@samba.org
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
13 years agoperf symbols: Preserve symbol scope when parsing /proc/kallsyms
Anton Blanchard [Wed, 24 Aug 2011 06:40:16 +0000 (16:40 +1000)]
perf symbols: Preserve symbol scope when parsing /proc/kallsyms

kallsyms__parse capitalises the symbol type, so every symbol is marked
global. Remove this and fix symbol_type__is_a to handle both local and
global symbols.

Cc: Eric B Munson <emunson@mgebm.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110824065243.077125989@samba.org
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
13 years agoperf symbols: /proc/kallsyms does not sort module symbols
Anton Blanchard [Wed, 24 Aug 2011 06:40:15 +0000 (16:40 +1000)]
perf symbols: /proc/kallsyms does not sort module symbols

kallsyms__parse assumes that /proc/kallsyms is sorted and sets the end
of the previous symbol to the start of the current one.

Unfortunately module symbols are not sorted, eg:

ffffffffa0081f30 t e1000_clean_rx_irq   [e1000e]
ffffffffa00817a0 t e1000_alloc_rx_buffers       [e1000e]

Some symbols end up with a negative length and others have a length
larger than they should. This results in confusing perf output.

We already have a function to fixup the end of zero length symbols so
use that instead.

Cc: Eric B Munson <emunson@mgebm.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110824065242.969681349@samba.org
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
13 years agoperf symbols: Fix ppc64 SEGV in dso__load_sym with debuginfo files
Anton Blanchard [Wed, 24 Aug 2011 06:40:14 +0000 (16:40 +1000)]
perf symbols: Fix ppc64 SEGV in dso__load_sym with debuginfo files

64bit PowerPC debuginfo files have an empty function descriptor section.
I hit a SEGV when perf tried to use this section for symbol resolution.

To fix this we need to check the section is valid and we can do this by
checking for type SHT_PROGBITS.

Cc: <stable@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Eric B Munson <emunson@mgebm.net>
Link: http://lkml.kernel.org/r/20110824065242.895239970@samba.org
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
13 years agoperf probe: Fix regression of variable finder
Masami Hiramatsu [Sat, 20 Aug 2011 05:39:23 +0000 (14:39 +0900)]
perf probe: Fix regression of variable finder

Fix to call convert_variable() if previous call does not fail.

To call convert_variable, it ensures "ret" is 0. However, since
"ret" has the return value of synthesize_perf_probe_arg() which
always returns positive value if it succeeded, perf probe doesn't
call convert_variable(). This will cause a SEGV when we add an
event with arguments.

This has to be fixed as it ensures "ret" is greater than 0
(or not negative).

This regression has been introduced by my previous patch, f182e3e1.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: yrl.pp-manager.tt@hitachi.com
Link: http://lkml.kernel.org/r/20110820053922.3286.65805.stgit@fedora15
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
13 years agoMerge branch 'fix/asoc' into for-linus
Takashi Iwai [Fri, 23 Sep 2011 13:26:37 +0000 (15:26 +0200)]
Merge branch 'fix/asoc' into for-linus

13 years agoALSA: usb-audio - clear chip->probing on error exit
Thomas Pfaff [Thu, 22 Sep 2011 16:26:06 +0000 (18:26 +0200)]
ALSA: usb-audio - clear chip->probing on error exit

The Terratec Aureon 5.1 USB sound card support is broken since kernel
2.6.39.
2.6.39 introduced power management support for USB sound cards that added
a probing flag in struct snd_usb_audio.

During the probe of the card it gives following error message :

usb 7-2: new full speed USB device number 2 using uhci_hcd
cannot find UAC_HEADER
snd-usb-audio: probe of 7-2:1.3 failed with error -5
input: USB Audio as
/devices/pci0000:00/0000:00:1d.1/usb7/7-2/7-2:1.3/input/input6
generic-usb 0003:0CCD:0028.0001: input: USB HID v1.00 Device [USB Audio]
on usb-0000:00:1d.1-2/input3

I can not comment about that "cannot find UAC_HEADER" error, but until
2.6.38 the card worked anyway.
With 2.6.39 chip->probing remains 1 on error exit, and any later ioctl
stops in snd_usb_autoresume with -ENODEV.

Signed-off-by: Thomas Pfaff <tpfaff@gmx.net>
Cc: <stable@kernel.org> [2.6.39+]
Signed-off-by: Takashi Iwai <tiwai@suse.de>
13 years agodrm/radeon/kms: fix DDIA enable on some rs690 systems
Alex Deucher [Thu, 22 Sep 2011 14:47:23 +0000 (10:47 -0400)]
drm/radeon/kms: fix DDIA enable on some rs690 systems

DVOOutputControl checks the value of of bios scratch reg 3
on some tables and assumes the encoder is already enabled
if the DFP2_ACTIVE bit is set.  Clear that bit so the table
sets the DDIA enable bit properly.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
13 years agoRevert "drm/radeon/kms: fix typo in r100_blit_copy"
Dave Airlie [Fri, 23 Sep 2011 13:00:54 +0000 (14:00 +0100)]
Revert "drm/radeon/kms: fix typo in r100_blit_copy"

This reverts commit 18b4fada275dd2b6dd9db904ddf70fe39e272222.

This code was correct, apologies to anyone who noticed things broke.

revert contents are different due to another commit in between.

Signed-off-by: Dave Airlie <airlied@redhat.com>
13 years agoMerge branch 'for-linus' of git://git.selinuxproject.org/~jmorris/linux-security
Linus Torvalds [Fri, 23 Sep 2011 02:57:27 +0000 (19:57 -0700)]
Merge branch 'for-linus' of git://git.selinuxproject.org/~jmorris/linux-security

* 'for-linus' of git://git.selinuxproject.org/~jmorris/linux-security:
  TPM: Zero buffer after copying to userspace
  TPM: Call tpm_transmit with correct size
  TPM: tpm_nsc: Fix a double free of pdev in cleanup_nsc
  TPM: TCG_ATMEL should depend on HAS_IOPORT

13 years agoTPM: Zero buffer after copying to userspace
Peter Huewe [Thu, 15 Sep 2011 17:47:42 +0000 (14:47 -0300)]
TPM: Zero buffer after copying to userspace

Since the buffer might contain security related data it might be a good idea to
zero the buffer after we have copied it to userspace.

This got assigned CVE-2011-1162.

Signed-off-by: Rajiv Andrade <srajiv@linux.vnet.ibm.com>
Cc: Stable Kernel <stable@kernel.org>
Signed-off-by: James Morris <jmorris@namei.org>
13 years agoTPM: Call tpm_transmit with correct size
Peter Huewe [Thu, 15 Sep 2011 17:37:43 +0000 (14:37 -0300)]
TPM: Call tpm_transmit with correct size

This patch changes the call of tpm_transmit by supplying the size of the
userspace buffer instead of TPM_BUFSIZE.

This got assigned CVE-2011-1161.

[The first hunk didn't make sense given one could expect
 way less data than TPM_BUFSIZE, so added tpm_transmit boundary
 check over bufsiz instead
 The last parameter of tpm_transmit() reflects the amount
 of data expected from the device, and not the buffer size
 being supplied to it. It isn't ideal to parse it directly,
 so we just set it to the maximum the input buffer can handle
 and let the userspace API to do such job.]

Signed-off-by: Rajiv Andrade <srajiv@linux.vnet.ibm.com>
Cc: Stable Kernel <stable@kernel.org>
Signed-off-by: James Morris <jmorris@namei.org>
13 years agoTPM: tpm_nsc: Fix a double free of pdev in cleanup_nsc
Axel Lin [Wed, 3 Aug 2011 23:58:07 +0000 (07:58 +0800)]
TPM: tpm_nsc: Fix a double free of pdev in cleanup_nsc

platform_device_unregister() will release all resources
and remove it from the subsystem, then drop reference count by
calling platform_device_put().

We should not call kfree(pdev) after platform_device_unregister(pdev).

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Signed-off-by: Rajiv Andrade <srajiv@linux.vnet.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
13 years agoTPM: TCG_ATMEL should depend on HAS_IOPORT
Geert Uytterhoeven [Mon, 8 Aug 2011 11:08:19 +0000 (13:08 +0200)]
TPM: TCG_ATMEL should depend on HAS_IOPORT

On m68k, I get:

drivers/char/tpm/tpm_atmel.h: In function ‘atmel_get_base_addr’:
drivers/char/tpm/tpm_atmel.h:129: error: implicit declaration of function ‘ioport_map’
drivers/char/tpm/tpm_atmel.h:129: warning: return makes pointer from integer without a cast

The code in tpm_atmel.h supports PPC64 (using the device tree and ioremap())
and "anything else" (using ioport_map()). However, ioportmap() is only
available on platforms that set HAS_IOPORT.

Although PC64 seems to have HAS_IOPORT, a "depends on HAS_IOPORT" should work,
but I think it's better to expose the special PPC64 handling explicit using
"depends on PPC64 || HAS_IOPORT".

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Rajiv Andrade <srajiv@linux.vnet.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
13 years agothp: fix khugepaged defrag tunable documentation
David Rientjes [Thu, 22 Sep 2011 21:11:38 +0000 (14:11 -0700)]
thp: fix khugepaged defrag tunable documentation

Commit e27e6151b154 ("mm/thp: use conventional format for boolean
attributes") changed

  /sys/kernel/mm/transparent_hugepage/khugepaged/defrag

to be tuned by using 1 (enabled) or 0 (disabled) instead of "yes" and
"no", respectively.

Update the documentation.

Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>