Linus Torvalds [Wed, 30 Jan 2008 22:31:37 +0000 (09:31 +1100)]
Merge git://git./linux/kernel/git/gregkh/driver-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6:
PPC: Fix powerpc vio_find_name to not use devices_subsys
Driver core: add bus_find_device_by_name function
Module: check to see if we have a built in module with the same name
x86: fix runtime error in arch/x86/kernel/cpu/mcheck/mce_amd_64.c
Driver core: Fix up build when CONFIG_BLOCK=N
Linus Torvalds [Wed, 30 Jan 2008 22:30:10 +0000 (09:30 +1100)]
Merge branch 'for-linus' of git://git./linux/kernel/git/avi/kvm
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (249 commits)
KVM: Move apic timer migration away from critical section
KVM: Put kvm_para.h include outside __KERNEL__
KVM: Fix unbounded preemption latency
KVM: Initialize the mmu caches only after verifying cpu support
KVM: MMU: Fix dirty page setting for pages removed from rmap
KVM: Portability: Move kvm_fpu to asm-x86/kvm.h
KVM: x86 emulator: Only allow VMCALL/VMMCALL trapped by #UD
KVM: MMU: Merge shadow level check in FNAME(fetch)
KVM: MMU: Move kvm_free_some_pages() into critical section
KVM: MMU: Switch to mmu spinlock
KVM: MMU: Avoid calling gfn_to_page() in mmu_set_spte()
KVM: Add kvm_read_guest_atomic()
KVM: MMU: Concurrent guest walkers
KVM: Disable vapic support on Intel machines with FlexPriority
KVM: Accelerated apic support
KVM: local APIC TPR access reporting facility
KVM: Print data for unimplemented wrmsr
KVM: MMU: Add cache miss statistic
KVM: MMU: Coalesce remote tlb flushes
KVM: Expose ioapic to ia64 save/restore APIs
...
Linus Torvalds [Wed, 30 Jan 2008 22:29:31 +0000 (09:29 +1100)]
Merge branch 'for-linus' of git://git./linux/kernel/git/teigland/dlm
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm: (21 commits)
dlm: static initialization improvements
dlm: clean ups
dlm: Sanity check namelen before copying it
dlm: keep cached master rsbs during recovery
dlm: change error message to debug
dlm: fix possible use-after-free
dlm: limit dir lookup loop
dlm: reject normal unlock when lock is waiting for lookup
dlm: validate messages before processing
dlm: reject messages from non-members
dlm: another call to confirm_master in receive_request_reply
dlm: recover locks waiting for overlap replies
dlm: clear ast_type when removing from astqueue
dlm: use fixed errno values in messages
dlm: swap bytes for rcom lock reply
dlm: align midcomms message buffer
dlm: close othercons
dlm: use dlm prefix on alloc and free functions
dlm: don't print common non-errors
dlm: proper prototypes
...
Linus Torvalds [Wed, 30 Jan 2008 22:28:49 +0000 (09:28 +1100)]
Merge git://git./linux/kernel/git/jejb/scsi-misc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (21 commits)
[SCSI] Revert "[SCSI] aacraid: fib context lock for management ioctls"
[SCSI] bsg: copy the cmd_type field to the subordinate request for bidi
[SCSI] handle scsi_init_queue failure properly
[SCSI] destroy scsi_bidi_sdb_cache in scsi_exit_queue
[SCSI] scsi_debug: add XDWRITEREAD_10 support
[SCSI] scsi_debug: add bidi data transfer support
[SCSI] scsi_debug: add get_data_transfer_info helper function
[SCSI] remove use_sg_chaining
[SCSI] bidirectional: fix up for the new blk_end_request code
[SCSI] bidirectional command support
[SCSI] implement scsi_data_buffer
[SCSI] tgt: use scsi_init_io instead of scsi_alloc_sgtable
[SCSI] aic7xxx: fix warnings with CONFIG_PM disabled
[SCSI] aic79xx: fix warnings with CONFIG_PM disabled
[SCSI] aic7xxx: fix ahc_done check SCB_ACTIVE for tagged transactions
[SCSI] sgiwd93: use cached memory access to make driver work on IP28
[SCSI] zfcp: fix sense_buffer access bug
[SCSI] ncr53c8xx: fix sense_buffer access bug
[SCSI] aic79xx: fix sense_buffer access bug
[SCSI] hptiop: fix sense_buffer access bug
...
Randy Dunlap [Wed, 30 Jan 2008 19:51:00 +0000 (11:51 -0800)]
docbook: fix block api fatal error
Fix docbook fatal error:
docproc: linux-2.6.24-git8/block/ll_rw_blk.c: No such file or directory
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Wed, 30 Jan 2008 19:51:08 +0000 (11:51 -0800)]
docbook: fix drivers/base/class warning
Fix kernel-doc empty line warning:
Warning(linux-2.6.24-git8//drivers/base/class.c:866): bad line:
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
James Bottomley [Tue, 29 Jan 2008 21:17:15 +0000 (16:17 -0500)]
[SCSI] Revert "[SCSI] aacraid: fib context lock for management ioctls"
This reverts commit
a119ee8ee3045bf559d4cf02d72b112f3de2a15b.
Adaptec found this was causing system lockups.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
James Bottomley [Sat, 26 Jan 2008 02:05:55 +0000 (20:05 -0600)]
[SCSI] bsg: copy the cmd_type field to the subordinate request for bidi
This fixes a problem in SCSI where we use the (previously
uninitialised) cmd_type via blk_pc_request() to set up the transfer in
scsi_init_sgtable().
Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
FUJITA Tomonori [Fri, 25 Jan 2008 14:25:14 +0000 (23:25 +0900)]
[SCSI] handle scsi_init_queue failure properly
scsi_init_queue is expected to clean up allocated things when it
fails.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
FUJITA Tomonori [Fri, 25 Jan 2008 14:25:13 +0000 (23:25 +0900)]
[SCSI] destroy scsi_bidi_sdb_cache in scsi_exit_queue
Needs to call kmem_cache_destroy for scsi_bidi_sdb_cache in
scsi_exit_queue.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
FUJITA Tomonori [Tue, 22 Jan 2008 16:32:01 +0000 (01:32 +0900)]
[SCSI] scsi_debug: add XDWRITEREAD_10 support
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Douglas Gilbert <dougg@torque.net>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
FUJITA Tomonori [Tue, 22 Jan 2008 16:32:00 +0000 (01:32 +0900)]
[SCSI] scsi_debug: add bidi data transfer support
This enables fill_from_dev_buffer and fetch_to_dev_buffer to handle
bidi commands.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Douglas Gilbert <dougg@torque.net>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
FUJITA Tomonori [Tue, 22 Jan 2008 16:31:59 +0000 (01:31 +0900)]
[SCSI] scsi_debug: add get_data_transfer_info helper function
This adds get_data_transfer_info helper function that get lha and
sectors for READ_* and WRITE_* commands (and XDWRITEREAD_10 later).
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Douglas Gilbert <dougg@torque.net>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
James Bottomley [Tue, 15 Jan 2008 17:11:46 +0000 (11:11 -0600)]
[SCSI] remove use_sg_chaining
With the sg table code, every SCSI driver is now either chain capable
or broken (or has sg_tablesize set so chaining is never activated), so
there's no need to have a check in the host template.
Also tidy up the code by moving the scatterlist size defines into the
SCSI includes and permit the last entry of the scatterlist pools not
to be a power of two.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Kiyoshi Ueda [Fri, 18 Jan 2008 17:02:15 +0000 (12:02 -0500)]
[SCSI] bidirectional: fix up for the new blk_end_request code
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Boaz Harrosh [Thu, 13 Dec 2007 11:50:53 +0000 (13:50 +0200)]
[SCSI] bidirectional command support
At the block level bidi request uses req->next_rq pointer for a second
bidi_read request.
At Scsi-midlayer a second scsi_data_buffer structure is used for the
bidi_read part. This bidi scsi_data_buffer is put on
request->next_rq->special. Struct scsi_cmnd is not changed.
- Define scsi_bidi_cmnd() to return true if it is a bidi request and a
second sgtable was allocated.
- Define scsi_in()/scsi_out() to return the in or out scsi_data_buffer
from this command This API is to isolate users from the mechanics of
bidi.
- Define scsi_end_bidi_request() to do what scsi_end_request() does but
for a bidi request. This is necessary because bidi commands are a bit
tricky here. (See comments in body)
- scsi_release_buffers() will also release the bidi_read scsi_data_buffer
- scsi_io_completion() on bidi commands will now call
scsi_end_bidi_request() and return.
- The previous work done in scsi_init_io() is now done in a new
scsi_init_sgtable() (which is 99% identical to old scsi_init_io())
The new scsi_init_io() will call the above twice if needed also for
the bidi_read command. Only at this point is a command bidi.
- In scsi_error.c at scsi_eh_prep/restore_cmnd() make sure bidi-lld is not
confused by a get-sense command that looks like bidi. This is done
by puting NULL at request->next_rq, and restoring.
[jejb: update to sg_table and resolve conflicts
also update to blk-end-request and resolve conflicts]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Boaz Harrosh [Thu, 13 Dec 2007 11:47:40 +0000 (13:47 +0200)]
[SCSI] implement scsi_data_buffer
In preparation for bidi we abstract all IO members of scsi_cmnd,
that will need to duplicate, into a substructure.
- Group all IO members of scsi_cmnd into a scsi_data_buffer
structure.
- Adjust accessors to new members.
- scsi_{alloc,free}_sgtable receive a scsi_data_buffer instead of
scsi_cmnd. And work on it.
- Adjust scsi_init_io() and scsi_release_buffers() for above
change.
- Fix other parts of scsi_lib/scsi.c to members migration. Use
accessors where appropriate.
- fix Documentation about scsi_cmnd in scsi_host.h
- scsi_error.c
* Changed needed members of struct scsi_eh_save.
* Careful considerations in scsi_eh_prep/restore_cmnd.
- sd.c and sr.c
* sd and sr would adjust IO size to align on device's block
size so code needs to change once we move to scsi_data_buff
implementation.
* Convert code to use scsi_for_each_sg
* Use data accessors where appropriate.
- tgt: convert libsrp to use scsi_data_buffer
- isd200: This driver still bangs on scsi_cmnd IO members,
so need changing
[jejb: rebased on top of sg_table patches fixed up conflicts
and used the synergy to eliminate use_sg and sg_count]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Boaz Harrosh [Fri, 14 Dec 2007 00:14:27 +0000 (16:14 -0800)]
[SCSI] tgt: use scsi_init_io instead of scsi_alloc_sgtable
If we export scsi_init_io()/scsi_release_buffers() instead of
scsi_{alloc,free}_sgtable() from scsi_lib than tgt code is much more
insulated from scsi_lib changes. As a bonus it will also gain bidi
capability when it comes.
[jejb: rebase on to sg_table and fix up rejections]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
FUJITA Tomonori [Sat, 26 Jan 2008 15:08:19 +0000 (00:08 +0900)]
[SCSI] aic7xxx: fix warnings with CONFIG_PM disabled
CC [M] drivers/scsi/aic7xxx/aic7xxx_osm_pci.o
drivers/scsi/aic7xxx/aic7xxx_osm_pci.c:148: warning: 'ahc_linux_pci_dev_suspend' defined but not used
drivers/scsi/aic7xxx/aic7xxx_osm_pci.c:166: warning: 'ahc_linux_pci_dev_resume' defined but not used
This moves aic7xxx_pci_driver struct, removes some forward declarations,
and adds some ifdef CONFIG_PM.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
FUJITA Tomonori [Sat, 26 Jan 2008 15:08:18 +0000 (00:08 +0900)]
[SCSI] aic79xx: fix warnings with CONFIG_PM disabled
CC [M] drivers/scsi/aic7xxx/aic79xx_osm_pci.o
drivers/scsi/aic7xxx/aic79xx_osm_pci.c:101: warning: 'ahd_linux_pci_dev_suspend' defined but not used
drivers/scsi/aic7xxx/aic79xx_osm_pci.c:121: warning: 'ahd_linux_pci_dev_resume' defined but not used
This moves aic79xx_pci_driver struct, removes some forward
declarations, and adds some ifdef CONFIG_PM.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
David Milburn [Fri, 25 Jan 2008 18:16:18 +0000 (12:16 -0600)]
[SCSI] aic7xxx: fix ahc_done check SCB_ACTIVE for tagged transactions
The driver only needs to check the SCB_ACTIVE flag if the SCB is not
in the untagged queue.
If the driver is in error recovery, you may end panic'ing on a TUR
that is in the untagged queue.
Attempting to queue an ABORT message
CDB: 0x0 0x0 0x0 0x0 0x0 0x0
SCB 3 done'd twice
This patch is included in Adaptec's 6.3.11 driver on their website.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Thomas Bogendoerfer [Sat, 26 Jan 2008 23:25:53 +0000 (00:25 +0100)]
[SCSI] sgiwd93: use cached memory access to make driver work on IP28
SGI IP28 machines would need special treatment (enable adding addtional
wait states) when accessing memory uncached. To avoid this pain I
changed the driver to use only cached access to memory.
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
FUJITA Tomonori [Sun, 27 Jan 2008 03:41:50 +0000 (12:41 +0900)]
[SCSI] zfcp: fix sense_buffer access bug
The commit
de25deb18016f66dcdede165d07654559bb332bc changed
scsi_cmnd.sense_buffer from a static array to a dynamically allocated
buffer. We can't access to sense_buffer in '&cmd->sense_buffer' way.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
FUJITA Tomonori [Sun, 27 Jan 2008 03:41:51 +0000 (12:41 +0900)]
[SCSI] ncr53c8xx: fix sense_buffer access bug
The commit
de25deb18016f66dcdede165d07654559bb332bc changed
scsi_cmnd.sense_buffer from a static array to a dynamically allocated
buffer. We can't access to sense_buffer in '&cmd->sense_buffer' way.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
FUJITA Tomonori [Sun, 27 Jan 2008 03:41:09 +0000 (12:41 +0900)]
[SCSI] aic79xx: fix sense_buffer access bug
The commit
de25deb18016f66dcdede165d07654559bb332bc changed
scsi_cmnd.sense_buffer from a static array to a dynamically allocated
buffer. We can't access to sense_buffer in '&cmd->sense_buffer' way.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
FUJITA Tomonori [Sun, 27 Jan 2008 01:22:26 +0000 (10:22 +0900)]
[SCSI] hptiop: fix sense_buffer access bug
&cmnd->sense_buffer now zeroes the wrong thing.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Nathan Lynch [Sat, 26 Jan 2008 22:07:30 +0000 (16:07 -0600)]
[SCSI] sym53c8xx: fix bad memset argument in sym_set_cam_result_error
On a big powerpc box I got the following oops with 2.6.24-git2:
sym0: <1010-66> rev 0x1 at pci 0000:d0:01.0 irq 215
sym0: No NVRAM, ID 7, Fast-80, LVD, parity checking
sym0: SCSI BUS has been reset.
scsi0 : sym-2.2.3
target0:0:8: FAST-40 WIDE SCSI 80.0 MB/s ST (25 ns, offset 31)
scsi 0:0:8:0: Direct-Access IBM ST318305LC C509 PQ: 0
ANSI: 3
target0:0:8: tagged command queuing enabled, command queue depth 16.
target0:0:8: Beginning Domain Validation
target0:0:8: asynchronous
target0:0:8: wide asynchronous
target0:0:8: FAST-80 WIDE SCSI 160.0 MB/s DT (12.5 ns, offset 31)
target0:0:8: FAST-80 WIDE SCSI 160.0 MB/s DT (12.5 ns, offset 31)
Unable to handle kernel paging request for data at address 0x00000000
Faulting instruction address: 0xc000000000038460
cpu 0x25: Vector: 300 (Data Access) at [
c00000000f567840]
pc:
c000000000038460: .memcpy+0x60/0x280
lr:
d000000000050280: .sym_set_cam_result_error+0xfc/0x1e0 [sym53c8xx]
sp:
c00000000f567ac0
msr:
8000000000009032
dar: 0
dsisr:
42000000
current = 0xc000006d1e0af0a0
paca = 0xc0000000004afc00
pid = 0, comm = swapper
enter ? for help
[link register ]
d000000000050280
.sym_set_cam_result_error+0xfc/0x1e0 [sym53c8xx]
[
c00000000f567ac0]
c00000000f567b80 (unreliable)
[
c00000000f567b80]
d0000000000552b8 .sym_complete_error+0x12c/0x1bc [sym53c8xx]
[
c00000000f567c20]
d0000000000561a4 .sym_int_sir+0xaa4/0x1718 [sym53c8xx]
[
c00000000f567d00]
d000000000057e8c .sym_interrupt+0x4e4/0x6ec [sym53c8xx]
[
c00000000f567dc0]
d00000000004fdf4 .sym53c8xx_intr+0x6c/0xdc [sym53c8xx]
[
c00000000f567e50]
c0000000000a83e0 .handle_IRQ_event+0x7c/0xec
[
c00000000f567ef0]
c0000000000aa344 .handle_fasteoi_irq+0x130/0x1f0
[
c00000000f567f90]
c00000000002a538 .call_handle_irq+0x1c/0x2c
[
c000004d5e0b3a90]
c00000000000c320 .do_IRQ+0x108/0x1d0
[
c000004d5e0b3b20]
c000000000004790 hardware_interrupt_entry+0x18/0x1c
The memset() in sym_set_cam_result_error() would appear to be trashing
the scsi_cmnd struct instead of clearing sense_buffer.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Denis Cheng [Tue, 29 Jan 2008 05:50:16 +0000 (13:50 +0800)]
dlm: static initialization improvements
also change name_prefix from char pointer to char array.
Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: David Teigland <teigland@redhat.com>
David Teigland [Tue, 29 Jan 2008 20:52:10 +0000 (14:52 -0600)]
dlm: clean ups
A couple small clean-ups. Remove unnecessary wrapper-functions in
rcom.c, and remove unnecessary casting and an unnecessary ASSERT in
util.c.
Signed-off-by: David Teigland <teigland@redhat.com>
Patrick Caulfeld [Thu, 17 Jan 2008 10:25:28 +0000 (10:25 +0000)]
dlm: Sanity check namelen before copying it
The 32/64 compatibility code in the DLM does not check the validity of
the lock name length passed into it, so it can easily overwrite memory
if the value is rubbish (as early versions of libdlm can cause with
unlock calls, it doesn't zero the field).
This patch restricts the length of the name to the amount of data
actually passed into the call.
Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
David Teigland [Wed, 16 Jan 2008 19:02:31 +0000 (13:02 -0600)]
dlm: keep cached master rsbs during recovery
To prevent the master of an rsb from changing rapidly, an unused rsb is kept
on the "toss list" for a period of time to be reused. The toss list was
being cleared completely for each recovery, which is unnecessary. Much of
the benefit of the toss list can be maintained if nodes keep rsb's in their
toss list that they are the master of. These rsb's need to be included
when the resource directory is rebuilt during recovery.
Signed-off-by: David Teigland <teigland@redhat.com>
David Teigland [Wed, 16 Jan 2008 17:03:41 +0000 (11:03 -0600)]
dlm: change error message to debug
The invalid lockspace messages are normal and can appear relatively
often. They should be suppressed without debugging enabled.
Signed-off-by: David Teigland <teigland@redhat.com>
David Teigland [Mon, 14 Jan 2008 21:48:58 +0000 (15:48 -0600)]
dlm: fix possible use-after-free
The dlm_put_lkb() can free the lkb and its associated ua structure,
so we can't depend on using the ua struct after the put.
Signed-off-by: David Teigland <teigland@redhat.com>
David Teigland [Wed, 9 Jan 2008 16:37:39 +0000 (10:37 -0600)]
dlm: limit dir lookup loop
In a rare case we may need to repeat a local resource directory lookup
due to a race with removing the rsb and removing the resdir record.
We'll never need to do more than a single additional lookup, though,
so the infinite loop around the lookup can be removed. In addition
to being unnecessary, the infinite loop is dangerous since some other
unknown condition may appear causing the loop to never break.
Signed-off-by: David Teigland <teigland@redhat.com>
David Teigland [Wed, 9 Jan 2008 16:30:45 +0000 (10:30 -0600)]
dlm: reject normal unlock when lock is waiting for lookup
Non-forced unlocks should be rejected if the lock is waiting on the
rsb_lookup list for another lock to establish the master node.
Signed-off-by: David Teigland <teigland@redhat.com>
David Teigland [Wed, 9 Jan 2008 15:59:41 +0000 (09:59 -0600)]
dlm: validate messages before processing
There was some hit and miss validation of messages that has now been
cleaned up and unified. Before processing a message, the new
validate_message() function checks that the lkb is the appropriate type,
process-copy or master-copy, and that the message is from the correct
nodeid for the the given lkb. Other checks and assertions on the
lkb type and nodeid have been removed. The assertions were particularly
bad since they would panic the machine instead of just ignoring the bad
message.
Although other recent patches have made processing old message unlikely,
it still may be possible for an old message to be processed and caught
by these checks.
Signed-off-by: David Teigland <teigland@redhat.com>
David Teigland [Tue, 8 Jan 2008 22:24:00 +0000 (16:24 -0600)]
dlm: reject messages from non-members
Messages from nodes that are no longer members of the lockspace should be
ignored. When nodes are removed from the lockspace, recovery can
sometimes complete quickly enough that messages arrive from a removed node
after recovery has completed. When processed, these messages would often
cause an error message, and could in some cases change some state, causing
problems.
Signed-off-by: David Teigland <teigland@redhat.com>
David Teigland [Tue, 8 Jan 2008 21:37:47 +0000 (15:37 -0600)]
dlm: another call to confirm_master in receive_request_reply
When a failed request (EBADR or ENOTBLK) is unlocked/canceled instead of
retried, there may be other lkb's waiting on the rsb_lookup list for it
to complete. A call to confirm_master() is needed to move on to the next
waiting lkb since the current one won't be retried.
Signed-off-by: David Teigland <teigland@redhat.com>
David Teigland [Mon, 7 Jan 2008 22:15:05 +0000 (16:15 -0600)]
dlm: recover locks waiting for overlap replies
When recovery looks at locks waiting for replies, it fails to consider
locks that have already received a reply for their first remote operation,
but not received a reply for secondary, overlapping unlock/cancel. The
appropriate stub reply needs to be called for these waiters.
Appears when we start doing recovery in the presence of a many overlapping
unlock/cancel ops.
Signed-off-by: David Teigland <teigland@redhat.com>
David Teigland [Mon, 7 Jan 2008 21:55:18 +0000 (15:55 -0600)]
dlm: clear ast_type when removing from astqueue
The lkb_ast_type field indicates whether the lkb is on the astqueue list.
When clearing locks for a process, lkb's were being removed from the astqueue
list without clearing the field. If release_lockspace then happened
immediately afterward, it could try to remove the lkb from the list a second
time.
Appears when process calls libdlm dlm_release_lockspace() which first
closes the ls dev triggering clear_proc_locks, and then removes the ls
(a write to control dev) causing release_lockspace().
Signed-off-by: David Teigland <teigland@redhat.com>
David Teigland [Tue, 15 Jan 2008 21:43:24 +0000 (15:43 -0600)]
dlm: use fixed errno values in messages
Some errno values differ across platforms. So if we return things like
-EINPROGRESS from one node it can get misinterpreted or rejected on
another one.
This patch fixes up the errno values passed on the wire so that they
match the x86 ones (so as not to break the protocol), and re-instates
the platform-specific ones at the other end.
Many thanks to Fabio for testing this patch.
Initial patch from Patrick.
Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Fabio M. Di Nitto [Tue, 15 Jan 2008 21:13:36 +0000 (15:13 -0600)]
dlm: swap bytes for rcom lock reply
DLM_RCOM_LOCK_REPLY messages need byte swapping.
Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Fabio M. Di Nitto [Wed, 30 Jan 2008 16:56:42 +0000 (10:56 -0600)]
dlm: align midcomms message buffer
gcc does not guarantee that an auto buffer is 64bit aligned.
This change allows sparc64 to work.
Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Avi Kivity [Wed, 16 Jan 2008 10:49:30 +0000 (12:49 +0200)]
KVM: Move apic timer migration away from critical section
Migrating the apic timer in the critical section is not very nice, and is
absolutely horrible with the real-time port. Move migration to the regular
vcpu execution path, triggered by a new bitflag.
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Glauber de Oliveira Costa [Tue, 15 Jan 2008 15:10:15 +0000 (13:10 -0200)]
KVM: Put kvm_para.h include outside __KERNEL__
kvm_para.h potentially contains definitions that are to be used by userspace,
so it should not be included inside the __KERNEL__ block. To protect its own
data structures, kvm_para.h already includes its own __KERNEL__ block.
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Acked-by: Amit Shah <amit.shah@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Tue, 15 Jan 2008 16:27:32 +0000 (18:27 +0200)]
KVM: Fix unbounded preemption latency
When preparing to enter the guest, if an interrupt comes in while
preemption is disabled but interrupts are still enabled, we miss a
preemption point. Fix by explicitly checking whether we need to
reschedule.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Sun, 13 Jan 2008 11:23:56 +0000 (13:23 +0200)]
KVM: Initialize the mmu caches only after verifying cpu support
Otherwise we re-initialize the mmu caches, which will fail since the
caches are already registered, which will cause us to deinitialize said caches.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Izik Eidus [Sat, 12 Jan 2008 21:49:09 +0000 (23:49 +0200)]
KVM: MMU: Fix dirty page setting for pages removed from rmap
Right now rmap_remove won't set the page as dirty if the shadow pte
pointed to this page had write access and then it became readonly.
This patches fixes that, by setting the page as dirty for spte changes from
write to readonly access.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Christian Ehrhardt [Tue, 8 Jan 2008 07:04:50 +0000 (08:04 +0100)]
KVM: Portability: Move kvm_fpu to asm-x86/kvm.h
This patch moves kvm_fpu asm-x86/kvm.h to allow every architecture to
define an own representation used for KVM_GET_FPU/KVM_SET_FPU.
Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Acked-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Sheng Yang [Wed, 2 Jan 2008 06:49:22 +0000 (14:49 +0800)]
KVM: x86 emulator: Only allow VMCALL/VMMCALL trapped by #UD
When executing a test program called "crashme", we found the KVM guest cannot
survive more than ten seconds, then encounterd kernel panic. The basic concept
of "crashme" is generating random assembly code and trying to execute it.
After some fixes on emulator insn validity judgment, we found it's hard to
get the current emulator handle the invalid instructions correctly, for the
#UD trap for hypercall patching caused troubles. The problem is, if the opcode
itself was OK, but combination of opcode and modrm_reg was invalid, and one
operand of the opcode was memory (SrcMem or DstMem), the emulator will fetch
the memory operand first rather than checking the validity, and may encounter
an error there. For example, ".byte 0xfe, 0x34, 0xcd" has this problem.
In the patch, we simply check that if the invalid opcode wasn't vmcall/vmmcall,
then return from emulate_instruction() and inject a #UD to guest. With the
patch, the guest had been running for more than 12 hours.
Signed-off-by: Feng (Eric) Liu <eric.e.liu@intel.com>
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Dong, Eddie [Wed, 2 Jan 2008 06:29:08 +0000 (14:29 +0800)]
KVM: MMU: Merge shadow level check in FNAME(fetch)
Remove the redundant level check when fetching
shadow pte for present & non-present spte.
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Mon, 31 Dec 2007 13:27:49 +0000 (15:27 +0200)]
KVM: MMU: Move kvm_free_some_pages() into critical section
If some other cpu steals mmu pages between our check and an attempt to
allocate, we can run out of mmu pages. Fix by moving the check into the
same critical section as the allocation.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Marcelo Tosatti [Fri, 21 Dec 2007 00:18:26 +0000 (19:18 -0500)]
KVM: MMU: Switch to mmu spinlock
Convert the synchronization of the shadow handling to a separate mmu_lock
spinlock.
Also guard fetch() by mmap_sem in read-mode to protect against alias
and memslot changes.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Sun, 30 Dec 2007 10:29:05 +0000 (12:29 +0200)]
KVM: MMU: Avoid calling gfn_to_page() in mmu_set_spte()
Since gfn_to_page() is a sleeping function, and we want to make the core mmu
spinlocked, we need to pass the page from the walker context (which can sleep)
to the shadow context (which cannot).
[marcelo: avoid recursive locking of mmap_sem]
Signed-off-by: Avi Kivity <avi@qumranet.com>
Marcelo Tosatti [Fri, 21 Dec 2007 00:18:23 +0000 (19:18 -0500)]
KVM: Add kvm_read_guest_atomic()
In preparation for a mmu spinlock, add kvm_read_guest_atomic()
and use it in fetch() and prefetch_page().
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Marcelo Tosatti [Fri, 21 Dec 2007 00:18:22 +0000 (19:18 -0500)]
KVM: MMU: Concurrent guest walkers
Do not hold kvm->lock mutex across the entire pagefault code,
only acquire it in places where it is necessary, such as mmu
hash list, active list, rmap and parent pte handling.
Allow concurrent guest walkers by switching walk_addr() to use
mmap_sem in read-mode.
And get rid of the lockless __gfn_to_page.
[avi: move kvm_mmu_pte_write() locking inside the function]
[avi: add locking for real mode]
[avi: fix cmpxchg locking]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Wed, 26 Dec 2007 11:57:04 +0000 (13:57 +0200)]
KVM: Disable vapic support on Intel machines with FlexPriority
FlexPriority accelerates the tpr without any patching.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Thu, 25 Oct 2007 14:52:32 +0000 (16:52 +0200)]
KVM: Accelerated apic support
This adds a mechanism for exposing the virtual apic tpr to the guest, and a
protocol for letting the guest update the tpr without causing a vmexit if
conditions allow (e.g. there is no interrupt pending with a higher priority
than the new tpr).
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Mon, 22 Oct 2007 14:50:39 +0000 (16:50 +0200)]
KVM: local APIC TPR access reporting facility
Add a facility to report on accesses to the local apic tpr even if the
local apic is emulated in the kernel. This is basically a hack that
allows userspace to patch Windows which tends to bang on the tpr a lot.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Wed, 19 Dec 2007 10:02:40 +0000 (12:02 +0200)]
KVM: Print data for unimplemented wrmsr
This can help diagnosing what the guest is trying to do. In many cases
we can get away with partial emulation of msrs.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Tue, 18 Dec 2007 17:47:18 +0000 (19:47 +0200)]
KVM: MMU: Add cache miss statistic
Signed-off-by: Avi Kivity <avi@qumranet.com>
Eddie Dong [Mon, 17 Dec 2007 22:08:27 +0000 (06:08 +0800)]
KVM: MMU: Coalesce remote tlb flushes
Host side TLB flush can be merged together if multiple
spte need to be write-protected.
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Zhang Xiantao [Mon, 17 Dec 2007 12:27:27 +0000 (20:27 +0800)]
KVM: Expose ioapic to ia64 save/restore APIs
IA64 also needs to see ioapic structure in irqchip.
Signed-off-by: xiantao.zhang@intel.com <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Zhang Xiantao [Mon, 17 Dec 2007 06:21:40 +0000 (14:21 +0800)]
KVM: Move kvm_vcpu_kick() to x86.c
Moving kvm_vcpu_kick() to x86.c. Since it should be
common for all archs, put its declarations in <linux/kvm_host.h>
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Zhang Xiantao [Mon, 17 Dec 2007 06:16:14 +0000 (14:16 +0800)]
KVM: Move ioapic code to common directory.
Move ioapic code to common, since IA64 also needs it.
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Zhang Xiantao [Mon, 17 Dec 2007 05:59:56 +0000 (13:59 +0800)]
KVM: Move irqchip declarations into new ioapic.h and lapic.h
This allows reuse of ioapic in ia64.
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Sun, 16 Dec 2007 09:13:16 +0000 (11:13 +0200)]
KVM: Move drivers/kvm/* to virt/kvm/
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Sun, 16 Dec 2007 09:02:48 +0000 (11:02 +0200)]
KVM: Move arch dependent files to new directory arch/x86/kvm/
This paves the way for multiple architecture support. Note that while
ioapic.c could potentially be shared with ia64, it is also moved.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Ryan Harper [Thu, 13 Dec 2007 16:21:10 +0000 (10:21 -0600)]
KVM: VMX: Add printk_ratelimit in vmx_intr_assist
Add printk_ratelimit check in front of printk. This prevents spamming
of the message during 32-bit ubuntu 6.06server install. Previously, it
would hang during the partition formatting stage.
Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Zhang Xiantao [Fri, 14 Dec 2007 02:23:23 +0000 (10:23 +0800)]
KVM: Portability: Move kvm_vm_stat to x86.h
This patch moves kvm_vm_stat to x86.h, and every arch
can define its own kvm_vm_stat in $arch.h
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Zhang Xiantao [Fri, 14 Dec 2007 02:20:16 +0000 (10:20 +0800)]
KVM: Portability: Move round_robin_prev_vcpu and tss_addr to kvm_arch
This patches moves two fields round_robin_prev_vcpu and tss to kvm_arch.
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Zhang Xiantao [Fri, 14 Dec 2007 02:17:34 +0000 (10:17 +0800)]
KVM: Portability: move vpic and vioapic to kvm_arch
This patches moves two fields vpid and vioapic to kvm_arch
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Zhang Xiantao [Fri, 14 Dec 2007 02:01:48 +0000 (10:01 +0800)]
KVM: Portability: Move mmu-related fields to kvm_arch
This patches moves mmu-related fields to kvm_arch.
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Zhang Xiantao [Fri, 14 Dec 2007 01:54:20 +0000 (09:54 +0800)]
KVM: Portability: Move memslot aliases to new struct kvm_arch
This patches create kvm_arch to hold arch-specific kvm fileds
and moves fields naliases and aliases to kvm_arch.
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Zhang Xiantao [Fri, 14 Dec 2007 01:49:26 +0000 (09:49 +0800)]
KVM: Portability: Move kvm_vcpu_stat to x86.h
This patches moves kvm_vcpu_stat to x86.h, so every
arch can define its own kvm_vcpu_stat structure.
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Zhang Xiantao [Fri, 14 Dec 2007 01:45:31 +0000 (09:45 +0800)]
KVM: Portability: Expand the KVM_VCPU_COMM in kvm_vcpu structure.
This patches removes KVM_COMM macro, original it is hold
kvm_vcpu common fields.
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Zhang Xiantao [Fri, 14 Dec 2007 01:41:22 +0000 (09:41 +0800)]
KVM: Portability: Move kvm_vcpu definition back to kvm.h
This patches moves kvm_vcpu definition to kvm.h, and finally
kvm.h includes x86.h.
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Zhang Xiantao [Fri, 14 Dec 2007 01:35:10 +0000 (09:35 +0800)]
KVM: Portability: Split mmu-related static inline functions to mmu.h
Since these functions need to know the details of kvm or kvm_vcpu structure,
it can't be put in x86.h. Create mmu.h to hold them.
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Zhang Xiantao [Thu, 13 Dec 2007 15:50:52 +0000 (23:50 +0800)]
KVM: Portability: Introduce kvm_vcpu_arch
Move all the architecture-specific fields in kvm_vcpu into a new struct
kvm_vcpu_arch.
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Zhang Xiantao [Tue, 11 Dec 2007 12:36:00 +0000 (20:36 +0800)]
KVM: Portability: Move kvm{pic,ioapic} accesors to x86 specific code
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Marcelo Tosatti [Wed, 12 Dec 2007 15:46:12 +0000 (10:46 -0500)]
KVM: MMU: emulated cmpxchg8b should be atomic on i386
Emulate cmpxchg8b atomically on i386. This is required to avoid a guest
pte walker from seeing a splitted write.
[avi: make it compile]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Joerg Roedel [Tue, 11 Dec 2007 14:36:57 +0000 (15:36 +0100)]
KVM: SVM: support writing 0 to K8 performance counter control registers
This lets SVM ignore writes of the value 0 to the performance counter control
registers. Thus enabling them will still fail in the guest, but a write of 0
which keeps them disabled is accepted. This is required to boot Windows
Vista 64bit.
[avi: avoid fall-thru in switch statement]
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Markus Rechberger <markus.rechberger@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Joerg Roedel [Wed, 12 Dec 2007 11:37:24 +0000 (12:37 +0100)]
KVM: LAPIC: minor debugging compile fix
This patch fixes a compile error of the LAPIC code with APIC debugging enabled.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Markus Rechberger <markus.rechberger@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Marcelo Tosatti [Wed, 12 Dec 2007 00:12:27 +0000 (19:12 -0500)]
KVM: MMU: Fix SMP shadow instantiation race
There is a race where VCPU0 is shadowing a pagetable entry while VCPU1
is updating it, which results in a stale shadow copy.
Fix that by comparing the contents of the cached guest pte with the
current guest pte after write-protecting the guest pagetable.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Joerg Roedel [Thu, 6 Dec 2007 20:02:25 +0000 (21:02 +0100)]
KVM: SVM: Exit to userspace if write to cr8 and not using in-kernel apic
With this patch KVM on SVM will exit to userspace if the guest writes to CR8
and the in-kernel APIC is disabled.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Markus Rechberger <markus.rechberger@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Sun, 9 Dec 2007 16:43:00 +0000 (18:43 +0200)]
KVM: MMU: Use mmu_set_spte() for real-mode shadows
In addition to removing some duplicated code, this also handles the unlikely
case of real-mode code updating a guest page table. This can happen when
one vcpu (in real mode) touches a second vcpu's (in protected mode) page
tables, or if a vcpu switches to real mode, touches page tables, and switches
back.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Sun, 9 Dec 2007 16:39:41 +0000 (18:39 +0200)]
KVM: MMU: Adjust mmu_set_spte() debug code for gpte removal
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Sun, 9 Dec 2007 15:40:31 +0000 (17:40 +0200)]
KVM: MMU: Move set_pte() into guest paging mode independent code
As set_pte() no longer references either a gpte or the guest walker, we can
move it out of paging mode dependent code (which compiles twice and is
generally nasty).
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Sun, 9 Dec 2007 15:33:46 +0000 (17:33 +0200)]
KVM: MMU: Remove walker argument to set_pte()
Unused.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Sun, 9 Dec 2007 15:32:30 +0000 (17:32 +0200)]
KVM: MMU: Pass pte dirty flag to set_pte() instead of calculating it on-site
This allows us to remove its dependency on pt_element_t.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Sun, 9 Dec 2007 15:27:52 +0000 (17:27 +0200)]
KVM: MMU: No need to pick up nx bit from guest pte
We already set it according to cumulative access permissions.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Sun, 9 Dec 2007 15:00:02 +0000 (17:00 +0200)]
KVM: MMU: Fix inherited permissions for emulated guest pte updates
When we emulate a guest pte write, we fail to apply the correct inherited
permissions from the parent ptes. Now that we store inherited permissions
in the shadow page, we can use that to update the pte permissions correctly.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Sun, 9 Dec 2007 14:52:56 +0000 (16:52 +0200)]
KVM: MMU: Move pte access calculation into a helper function
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Sun, 9 Dec 2007 14:37:36 +0000 (16:37 +0200)]
KVM: MMU: Set nx bit correctly on shadow ptes
While the page table walker correctly generates a guest page fault
if a guest tries to execute a non-executable page, the shadow code does
not mark it non-executable. This means that if a guest accesses an nx
page first with a read access, then subsequent code fetch accesses will
succeed.
Fix by setting the nx bit on shadow ptes.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Sun, 9 Dec 2007 14:15:46 +0000 (16:15 +0200)]
KVM: MMU: Simplify calculation of pte access
The nx bit is awkwardly placed in the 63rd bit position; furthermore it
has a reversed meaning compared to the other bits, which means we can't use
a bitwise and to calculate compounded access masks.
So, we simplify things by creating a new 3-bit exec/write/user access word,
and doing all calculations in that.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Marcelo Tosatti [Fri, 7 Dec 2007 12:56:58 +0000 (07:56 -0500)]
KVM: MMU: Use cmpxchg for pte updates on walk_addr()
In preparation for multi-threaded guest pte walking, use cmpxchg()
when updating guest pte's. This guarantees that the assignment of the
dirty bit can't be lost if two CPU's are faulting the same address
simultaneously.
[avi: fix kunmap_atomic() parameters]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Thu, 6 Dec 2007 17:50:00 +0000 (19:50 +0200)]
KVM: SVM: Trap access to the cr8 register
Later we may be able to use the virtual tpr feature, but for now,
just trap it.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Thu, 6 Dec 2007 16:14:14 +0000 (18:14 +0200)]
KVM: x86 emulator: Fix stack instructions on 64-bit mode
Stack instructions are always 64-bit on 64-bit mode; many of the
emulated stack instructions did not take that into account. Fix by
adding a 'Stack' bitflag and setting the operand size appropriately
during the decode stage (except for 'push r/m', which is in a group
with a few other instructions, so it gets its own treatment).
This fixes random crashes on Vista x64.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Joerg Roedel [Thu, 6 Dec 2007 14:46:52 +0000 (15:46 +0100)]
KVM: SVM: Emulate read/write access to cr8
This patch adds code to emulate the access to the cr8 register to the x86
instruction emulator in kvm. This is needed on svm, where there is no
hardware decode for control register access.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Markus Rechberger <markus.rechberger@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Thu, 6 Dec 2007 14:32:45 +0000 (16:32 +0200)]
KVM: VMX: Avoid exit when setting cr8 if the local apic is in the kernel
With apic in userspace, we must exit to userspace after a cr8 write in order
to update the tpr. But if the apic is in the kernel, the exit is unnecessary.
Noticed by Joerg Roedel.
Signed-off-by: Avi Kivity <avi@qumranet.com>