Andi Kleen [Wed, 2 May 2007 17:27:17 +0000 (19:27 +0200)]
[PATCH] x86-64: Move mtrr prototypes from proto.h to mtrr.h
Signed-off-by: Andi Kleen <ak@suse.de>
Jeremy Fitzhardinge [Wed, 2 May 2007 17:27:17 +0000 (19:27 +0200)]
[PATCH] i386: Clean up ELF note generation
Three cleanups:
1: ELF notes are never mapped, so there's no need to have any access
flags in their phdr.
2: When generating them from asm, tell the assembler to use a SHT_NOTE
section type. There doesn't seem to be a way to do this from C.
3: Use ANSI rather than traditional cpp behaviour to stringify the
macro argument.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Andi Kleen [Wed, 2 May 2007 17:27:17 +0000 (19:27 +0200)]
[PATCH] i386: PARAVIRT: Export paravirt_ops for non GPL modules too
Otherwise non GPL modules cannot even do basic operations
like disabling interrupts anymore, which would be excessive.
Longer term should split the single structure up into
internal and external symbols and not export the internal
ones at all.
Signed-off-by: Andi Kleen <ak@suse.de>
Jeremy Fitzhardinge [Wed, 2 May 2007 17:27:16 +0000 (19:27 +0200)]
[PATCH] x86: PARAVIRT: Jeremy Fitzhardinge <jeremy@goop.org>
The other symbols used to delineate the alt-instructions sections have the
form __foo/__foo_end. Rename parainstructions to match.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Zachary Amsden [Wed, 2 May 2007 17:27:16 +0000 (19:27 +0200)]
[PATCH] i386: Convert VMI timer to use clock events
Convert VMI timer to use clock events, making it properly able to use the NO_HZ
infrastructure. On UP systems, with no local APIC, we just continue to route
these events through the PIT. On systems with a local APIC, or SMP, we provide
a single source interrupt chip which creates the local timer IRQ. It actually
gets delivered by the APIC hardware, but we don't want to use the same local
APIC clocksource processing, so we create our own handler here.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
CC: Dan Hecht <dhecht@vmware.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: Thomas Gleixner <tglx@linutronix.de>
Zachary Amsden [Wed, 2 May 2007 17:27:16 +0000 (19:27 +0200)]
[PATCH] i386: Implement vmi_kmap_atomic_pte
Implement vmi_kmap_atomic_pte in terms of the backend set_linear_mapping
operation. The conversion is rather straighforward; call kmap_atomic
and then inform the hypervisor of the page mapping.
The _flush_tlb damage is due to macros being pulled in from highmem.h.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Zachary Amsden [Wed, 2 May 2007 17:27:16 +0000 (19:27 +0200)]
[PATCH] i386: Now that the VDSO can be relocated, we can support it in VMI configurations.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Zachary Amsden [Wed, 2 May 2007 17:27:16 +0000 (19:27 +0200)]
[PATCH] i386: Clean up arch/i386/kernel/cpu/mcheck/p4.c
No, just no. You do not use goto to skip a code block. You do not
return an obvious variable from a singly-inlined function and give
the function a return value. You don't put unexplained comments
about kmalloc in code which doesn't do dynamic allocation. And
you don't leave stray warnings around for no good reason.
Also, when possible, it is better to use block scoped variables
because gcc can sometime generate better code.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Jeremy Fitzhardinge [Wed, 2 May 2007 17:27:16 +0000 (19:27 +0200)]
[PATCH] i386: PARAVIRT: Allow boot-time disable of paravirt_ops patching
Add "noreplace-paravirt" to disable paravirt_ops patching.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Zachary Amsden [Wed, 2 May 2007 17:27:16 +0000 (19:27 +0200)]
[PATCH] i386: In compat mode, the return value here was uninitialized.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Jeremy Fitzhardinge [Wed, 2 May 2007 17:27:16 +0000 (19:27 +0200)]
[PATCH] x86: update for i386 and x86-64 check_bugs
Remove spurious comments, headers and keywords from x86-64 bugs.[ch].
Use identify_boot_cpu()
AK: merged with other patch
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Jeremy Fitzhardinge [Wed, 2 May 2007 17:27:16 +0000 (19:27 +0200)]
[PATCH] i386: map enough initial memory to create lowmem mappings
head.S creates the very initial pagetable for the kernel. This just
maps enough space for the kernel itself, and an allocation bitmap.
The amount of mapped memory is rounded up to 4Mbytes, and so this
typically ends up mapping 8Mbytes of memory.
When booting, pagetable_init() needs to create mappings for all
lowmem, and the pagetables for these mappings are allocated from the
free pages around the kernel in low memory. If the number of
pagetable pages + kernel size exceeds head.S's initial mapping, it
will end up faulting on an unmapped page. This will only happen with
specific combinations of kernel size and memory size.
This patch makes sure that head.S also maps enough space to fit the
kernel pagetables as well as the kernel itself. It ends up using an
additional two pages of unreclaimable memory.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Jeremy Fitzhardinge [Wed, 2 May 2007 17:27:16 +0000 (19:27 +0200)]
[PATCH] i386: Fix UP gdt bugs
Fixes two problems with the GDT when compiling for uniprocessor:
- There's no percpu segment, so trying to load its selector into %fs fails.
Use a null selector instead.
- The real gdt needs to be loaded at some point. Do it in cpu_init().
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Jeremy Fitzhardinge [Wed, 2 May 2007 17:27:16 +0000 (19:27 +0200)]
[PATCH] i386: Define per_cpu_offset
Define per_cpu_offset in asm-i386/percpu.h when SMP defined, like
asm-generic/percpu.h does for UP.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
Jeremy Fitzhardinge [Wed, 2 May 2007 17:27:16 +0000 (19:27 +0200)]
[PATCH] i386: cleanups to help using per-cpu variables from asm
This patch does a few small cleanups:
- use PER_CPU_NAME to generate the names of per-cpu variables
- use lea to add the per_cpu offset in PER_CPU(), because it doesn't
affect condition flags
- add PER_CPU_VAR which allows direct access to pre-cpu variables
with the %fs: prefix on SMP.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
Jeremy Fitzhardinge [Wed, 2 May 2007 17:27:16 +0000 (19:27 +0200)]
[PATCH] i386: Convert PDA into the percpu section
Currently x86 (similar to x84-64) has a special per-cpu structure
called "i386_pda" which can be easily and efficiently referenced via
the %fs register. An ELF section is more flexible than a structure,
allowing any piece of code to use this area. Indeed, such a section
already exists: the per-cpu area.
So this patch:
(1) Removes the PDA and uses per-cpu variables for each current member.
(2) Replaces the __KERNEL_PDA segment with __KERNEL_PERCPU.
(3) Creates a per-cpu mirror of __per_cpu_offset called this_cpu_off, which
can be used to calculate addresses for this CPU's variables.
(4) Simplifies startup, because %fs doesn't need to be loaded with a
special segment at early boot; it can be deferred until the first
percpu area is allocated (or never for UP).
The result is less code and one less x86-specific concept.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Jeremy Fitzhardinge [Wed, 2 May 2007 17:27:15 +0000 (19:27 +0200)]
[PATCH] i386: Page-align the GDT
Xen wants a dedicated page for the GDT. I believe VMI likes it too.
lguest, KVM and native don't care.
Simple transformation to page-aligned "struct gdt_page".
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Jeremy Fitzhardinge [Wed, 2 May 2007 17:27:15 +0000 (19:27 +0200)]
[PATCH] x86-64: deflate inflate_dynamic too
inflate_dynamic() has piggy stack usage too, so heap allocate it too.
I'm not sure it actually gets used, but it shows up large in "make
checkstack".
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Jeremy Fitzhardinge [Wed, 2 May 2007 17:27:15 +0000 (19:27 +0200)]
[PATCH] x86: deflate stack usage in lib/inflate.c
inflate_fixed and huft_build together use around 2.7k of stack. When
using 4k stacks, I saw stack overflows from interrupts arriving while
unpacking the root initrd:
do_IRQ: stack overflow: 384
[<
c0106b64>] show_trace_log_lvl+0x1a/0x30
[<
c01075e6>] show_trace+0x12/0x14
[<
c010763f>] dump_stack+0x16/0x18
[<
c0107ca4>] do_IRQ+0x6d/0xd9
[<
c010202b>] xen_evtchn_do_upcall+0x6e/0xa2
[<
c0106781>] xen_hypervisor_callback+0x25/0x2c
[<
c010116c>] xen_restore_fl+0x27/0x29
[<
c0330f63>] _spin_unlock_irqrestore+0x4a/0x50
[<
c0117aab>] change_page_attr+0x577/0x584
[<
c0117b45>] kernel_map_pages+0x8d/0xb4
[<
c016a314>] cache_alloc_refill+0x53f/0x632
[<
c016a6c2>] __kmalloc+0xc1/0x10d
[<
c0463d34>] malloc+0x10/0x12
[<
c04641c1>] huft_build+0x2a7/0x5fa
[<
c04645a5>] inflate_fixed+0x91/0x136
[<
c04657e2>] unpack_to_rootfs+0x5f2/0x8c1
[<
c0465acf>] populate_rootfs+0x1e/0xe4
(This was under Xen, but there's no reason it couldn't happen on bare
hardware.)
This patch mallocs the local variables, thereby reducing the stack
usage to sane levels.
Also, up the heap size for the kernel decompressor to deal with the
extra allocation.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Tim Yamin <plasmaroo@gentoo.org>
Cc: Andi Kleen <ak@suse.de>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Jeremy Fitzhardinge [Wed, 2 May 2007 17:27:15 +0000 (19:27 +0200)]
[PATCH] i386: PARAVIRT: drop unused ptep_get_and_clear
In shadow mode hypervisors, ptep_get_and_clear achieves the desired
purpose of keeping the shadows in sync by issuing a native_get_and_clear,
followed by a call to pte_update, which indicates the PTE has been
modified.
Direct mode hypervisors (Xen) have no need for this anyway, and will trap
the update using writable pagetables.
This means no hypervisor makes use of ptep_get_and_clear; there is no
reason to have it in the paravirt-ops structure. Change confusing
terminology about raw vs. native functions into consistent use of
native_pte_xxx for operations which do not invoke paravirt-ops.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Jeremy Fitzhardinge [Wed, 2 May 2007 17:27:15 +0000 (19:27 +0200)]
[PATCH] i386: PARAVIRT: Clean up paravirt patchable wrappers
Replace all the open-coded macros for generating calls with a pair of
more general macros (__PVOP_CALL/VCALL), and redefine all the
PVOP_V?CALL[0-4] in terms of them.
[ Andrew, Andi: this should slot in immediately after "Document asm-i386/paravirt.h"
(paravirt_ops-document-asm-i386-paravirth.patch) ]
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Jeremy Fitzhardinge [Wed, 2 May 2007 17:27:15 +0000 (19:27 +0200)]
[PATCH] i386: PARAVIRT: Use enums for paravirt lazy flush modi
Remove #defines, add enum for PARAVIRT_LAZY_FLUSH.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Jeremy Fitzhardinge [Wed, 2 May 2007 17:27:15 +0000 (19:27 +0200)]
[PATCH] i386: PARAVIRT: flush lazy mmu updates on kunmap_atomic
kunmap_atomic should flush any pending lazy mmu updates, mainly to be
consistent with kmap_atomic, and to preserve its normal behaviour.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Jeremy Fitzhardinge [Wed, 2 May 2007 17:27:15 +0000 (19:27 +0200)]
[PATCH] i386: PARAVIRT: add kmap_atomic_pte for mapping highpte pages
Xen and VMI both have special requirements when mapping a highmem pte
page into the kernel address space. These can be dealt with by adding
a new kmap_atomic_pte() function for mapping highptes, and hooking it
into the paravirt_ops infrastructure.
Xen specifically wants to map the pte page RO, so this patch exposes a
helper function, kmap_atomic_prot, which maps the page with the
specified page protections.
This also adds a kmap_flush_unused() function to clear out the cached
kmap mappings. Xen needs this to clear out any potential stray RW
mappings of pages which will become part of a pagetable.
[ Zach - vmi.c will need some attention after this patch. It wasn't
immediately obvious to me what needs to be done. ]
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Zachary Amsden <zach@vmware.com>
Jeremy Fitzhardinge [Wed, 2 May 2007 17:27:15 +0000 (19:27 +0200)]
[PATCH] i386: PARAVIRT: revert map_pt_hook.
Back out the map_pt_hook to clear the way for kmap_atomic_pte.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Zachary Amsden <zach@vmware.com>
Jeremy Fitzhardinge [Wed, 2 May 2007 17:27:15 +0000 (19:27 +0200)]
[PATCH] i386: PARAVIRT: add flush_tlb_others paravirt_op
This patch adds a pv_op for flush_tlb_others. Linux running on native
hardware uses cross-CPU IPIs to flush the TLB on any CPU which may
have a particular mm's pagetable entries cached in its TLB. This is
inefficient in a paravirtualized environment, since the hypervisor
knows which real CPUs actually contain cached mappings, which may be a
small subset of a guest's VCPUs.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Jeremy Fitzhardinge [Wed, 2 May 2007 17:27:14 +0000 (19:27 +0200)]
[PATCH] i386: PARAVIRT: add common patching machinery
Implement the actual patching machinery. paravirt_patch_default()
contains the logic to automatically patch a callsite based on a few
simple rules:
- if the paravirt_op function is paravirt_nop, then patch nops
- if the paravirt_op function is a jmp target, then jmp to it
- if the paravirt_op function is callable and doesn't clobber too much
for the callsite, call it directly
paravirt_patch_default is suitable as a default implementation of
paravirt_ops.patch, will remove most of the expensive indirect calls
in favour of either a direct call or a pile of nops.
Backends may implement their own patcher, however. There are several
helper functions to help with this:
paravirt_patch_nop nop out a callsite
paravirt_patch_ignore leave the callsite as-is
paravirt_patch_call patch a call if the caller and callee
have compatible clobbers
paravirt_patch_jmp patch in a jmp
paravirt_patch_insns patch some literal instructions over
the callsite, if they fit
This patch also implements more direct patches for the native case, so
that when running on native hardware many common operations are
implemented inline.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Anthony Liguori <anthony@codemonkey.ws>
Acked-by: Ingo Molnar <mingo@elte.hu>
Jeremy Fitzhardinge [Wed, 2 May 2007 17:27:14 +0000 (19:27 +0200)]
[PATCH] i386: PARAVIRT: Document asm-i386/paravirt.h
Clean things up, and broadly document:
- the paravirt_ops functions themselves
- the patching mechanism
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Jeremy Fitzhardinge [Wed, 2 May 2007 17:27:14 +0000 (19:27 +0200)]
[PATCH] i386: PARAVIRT: Consistently wrap paravirt ops callsites to make them patchable
Wrap a set of interesting paravirt_ops calls in a wrapper which makes
the callsites available for patching. Unfortunately this is pretty
ugly because there's no way to get gcc to generate a function call,
but also wrap just the callsite itself with the necessary labels.
This patch supports functions with 0-4 arguments, and either void or
returning a value. 64-bit arguments must be split into a pair of
32-bit arguments (lower word first). Small structures are returned in
registers.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Anthony Liguori <anthony@codemonkey.ws>
Jeremy Fitzhardinge [Wed, 2 May 2007 17:27:14 +0000 (19:27 +0200)]
[PATCH] i386: PARAVIRT: Fix patch site clobbers to include return register
Fix a few clobbers to include the return register. The clobbers set
is the set of all registers modified (or may be modified) by the code
snippet, regardless of whether it was deliberate or accidental.
Also, make sure that callsites which are used in contexts which don't
allow clobbers actually save and restore all clobberable registers.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Zachary Amsden <zach@vmware.com>
Jeremy Fitzhardinge [Wed, 2 May 2007 17:27:14 +0000 (19:27 +0200)]
[PATCH] i386: PARAVIRT: Use patch site IDs computed from offset in paravirt_ops structure
Use patch type identifiers derived from the offset of the operation in
the paravirt_ops structure. This avoids having to maintain a separate
enum for patch site types.
Also, since the identifier is derived from the offset into
paravirt_ops, the offset can be derived from the identifier. This is
used to remove replicated information in the various callsite macros,
which has been a source of bugs in the past.
This patch also drops the fused save_fl+cli operation, which doesn't
really add much and makes things more complex - specifically because
it breaks the 1:1 relationship between identifiers and offsets. If
this operation turns out to be particularly beneficial, then the right
answer is to define a new entrypoint for it.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Zachary Amsden <zach@vmware.com>
Jeremy Fitzhardinge [Wed, 2 May 2007 17:27:14 +0000 (19:27 +0200)]
[PATCH] i386: PARAVIRT: rename struct paravirt_patch to paravirt_patch_site for clarity
Rename struct paravirt_patch to paravirt_patch_site, so that it
clearly refers to a callsite, and not the patch which may be applied
to that callsite.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Zachary Amsden <zach@vmware.com>
Jeremy Fitzhardinge [Wed, 2 May 2007 17:27:14 +0000 (19:27 +0200)]
[PATCH] x86: PARAVIRT: add hooks to intercept mm creation and destruction
Add hooks to allow a paravirt implementation to track the lifetime of
an mm. Paravirtualization requires three hooks, but only two are
needed in common code. They are:
arch_dup_mmap, which is called when a new mmap is created at fork
arch_exit_mmap, which is called when the last process reference to an
mm is dropped, which typically happens on exit and exec.
The third hook is activate_mm, which is called from the arch-specific
activate_mm() macro/function, and so doesn't need stub versions for
other architectures. It's called when an mm is first used.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: linux-arch@vger.kernel.org
Cc: James Bottomley <James.Bottomley@SteelEye.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Jeremy Fitzhardinge [Wed, 2 May 2007 17:27:13 +0000 (19:27 +0200)]
[PATCH] i386: PARAVIRT: Allow paravirt backend to choose kernel PMD sharing
Normally when running in PAE mode, the 4th PMD maps the kernel address space,
which can be shared among all processes (since they all need the same kernel
mappings).
Xen, however, does not allow guests to have the kernel pmd shared between page
tables, so parameterize pgtable.c to allow both modes of operation.
There are several side-effects of this. One is that vmalloc will update the
kernel address space mappings, and those updates need to be propagated into
all processes if the kernel mappings are not intrinsically shared. In the
non-PAE case, this is done by maintaining a pgd_list of all processes; this
list is used when all process pagetables must be updated. pgd_list is
threaded via otherwise unused entries in the page structure for the pgd, which
means that the pgd must be page-sized for this to work.
Normally the PAE pgd is only 4x64 byte entries large, but Xen requires the PAE
pgd to page aligned anyway, so this patch forces the pgd to be page
aligned+sized when the kernel pmd is unshared, to accomodate both these
requirements.
Also, since there may be several distinct kernel pmds (if the user/kernel
split is below 3G), there's no point in allocating them from a slab cache;
they're just allocated with get_free_page and initialized appropriately. (Of
course the could be cached if there is just a single kernel pmd - which is the
default with a 3G user/kernel split - but it doesn't seem worthwhile to add
yet another case into this code).
[ Many thanks to wli for review comments. ]
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Christoph Lameter <clameter@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jeremy Fitzhardinge [Wed, 2 May 2007 17:27:13 +0000 (19:27 +0200)]
[PATCH] i386: PARAVIRT: Allocate a fixmap slot
Allocate a fixmap slot for use by a paravirt_ops implementation. This
is intended for early-boot bootstrap mappings. Once the zones and
allocator have been set up, it would be better to use get_vm_area() to
allocate some virtual space.
Xen uses this to map the hypervisor's shared info page, which doesn't
have a pseudo-physical page number, and therefore can't be mapped
ordinarily. It is needed early because it contains the vcpu state,
including the interrupt mask.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Jeremy Fitzhardinge [Wed, 2 May 2007 17:27:13 +0000 (19:27 +0200)]
[PATCH] i386: PARAVIRT: Hooks to set up initial pagetable
This patch introduces paravirt_ops hooks to control how the kernel's
initial pagetable is set up.
In the case of a native boot, the very early bootstrap code creates a
simple non-PAE pagetable to map the kernel and physical memory. When
the VM subsystem is initialized, it creates a proper pagetable which
respects the PAE mode, large pages, etc.
When booting under a hypervisor, there are many possibilities for what
paging environment the hypervisor establishes for the guest kernel, so
the constructon of the kernel's pagetable depends on the hypervisor.
In the case of Xen, the hypervisor boots the kernel with a fully
constructed pagetable, which is already using PAE if necessary. Also,
Xen requires particular care when constructing pagetables to make sure
all pagetables are always mapped read-only.
In order to make this easier, kernel's initial pagetable construction
has been changed to only allocate and initialize a pagetable page if
there's no page already present in the pagetable. This allows the Xen
paravirt backend to make a copy of the hypervisor-provided pagetable,
allowing the kernel to establish any more mappings it needs while
keeping the existing ones.
A slightly subtle point which is worth highlighting here is that Xen
requires all kernel mappings to share the same pte_t pages between all
pagetables, so that updating a kernel page's mapping in one pagetable
is reflected in all other pagetables. This makes it possible to
allocate a page and attach it to a pagetable without having to
explicitly enumerate that page's mapping in all pagetables.
And:
+From: "Eric W. Biederman" <ebiederm@xmission.com>
If we don't set the leaf page table entries it is quite possible that
will inherit and incorrect page table entry from the initial boot
page table setup in head.S. So we need to redo the effort here,
so we pick up PSE, PGE and the like.
Hypervisors like Xen require that their page tables be read-only,
which is slightly incompatible with our low identity mappings, however
I discussed this with Jeremy he has modified the Xen early set_pte
function to avoid problems in this area.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: William Irwin <bill.irwin@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Jeremy Fitzhardinge [Wed, 2 May 2007 17:27:13 +0000 (19:27 +0200)]
[PATCH] i386: PARAVIRT: Add pagetable accessors to pack and unpack pagetable entries
Add a set of accessors to pack, unpack and modify page table entries
(at all levels). This allows a paravirt implementation to control the
contents of pgd/pmd/pte entries. For example, Xen uses this to
convert the (pseudo-)physical address into a machine address when
populating a pagetable entry, and converting back to pphys address
when an entry is read.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Jeremy Fitzhardinge [Wed, 2 May 2007 17:27:13 +0000 (19:27 +0200)]
[PATCH] i386: PARAVIRT: use paravirt_nop to consistently mark no-op operations
Add a _paravirt_nop function for use as a stub for no-op operations,
and paravirt_nop #defined void * version to make using it easier
(since all its uses are as a void *).
This is useful to allow the patcher to automatically identify noop
operations so it can simply nop out the callsite.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
[mingo] but only as a cleanup of the current open-coded (void *) casts.
My problem with this is that it loses the types. Not that there is much
to check for, but still, this adds some assumptions about how function
calls look like
Jeremy Fitzhardinge [Wed, 2 May 2007 17:27:13 +0000 (19:27 +0200)]
[PATCH] i386: PARAVIRT: Remove CONFIG_DEBUG_PARAVIRT
Remove CONFIG_DEBUG_PARAVIRT. When inlining code, this option
attempts to trash registers in the patch-site's "clobber" field, on
the grounds that this should find bugs with incorrect clobbers.
Unfortunately, the clobber field really means "registers modified by
this patch site", which includes return values.
Because of this, this option has outlived its usefulness, so remove
it.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Jeremy Fitzhardinge [Wed, 2 May 2007 17:27:13 +0000 (19:27 +0200)]
[PATCH] x86-64: update MAINTAINERS
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Wed, 2 May 2007 17:27:13 +0000 (19:27 +0200)]
[PATCH] i386: i386 separate hardware-defined TSS from Linux additions
On Thu, 2007-03-29 at 13:16 +0200, Andi Kleen wrote:
> Please clean it up properly with two structs.
Not sure about this, now I've done it. Running it here.
If you like it, I can do x86-64 as well.
==
lguest defines its own TSS struct because the "struct tss_struct"
contains linux-specific additions. Andi asked me to split the struct
in processor.h.
Unfortunately it makes usage a little awkward.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
James Puthukattukaran [Wed, 2 May 2007 17:27:13 +0000 (19:27 +0200)]
[PATCH] x86-64: x86-64 system crashes when no memory populating Node 0
I have a 4 socket AMD Operton system. The 2.6.18 kernel I have crashes
when there is no memory in node0.
AK: changed call to _nopanic
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Glauber de Oliveira Costa [Wed, 2 May 2007 17:27:13 +0000 (19:27 +0200)]
[PATCH] x86-64: Fix x86_64 compilation with DEBUG_SIG on
Setting the DEBUG_SIG flag breaks compilation due to a wrong
struct access. Aditionally, it raises two warnings. This is one
patch to fix them all.
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Jeremy Fitzhardinge [Wed, 2 May 2007 17:27:13 +0000 (19:27 +0200)]
[PATCH] i386: Allow boot-time disable of SMP altinstructions
Add "noreplace-smp" to disable SMP instruction replacement.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Jeremy Fitzhardinge [Wed, 2 May 2007 17:27:13 +0000 (19:27 +0200)]
[PATCH] i386: Remove smp_alt_instructions
The .smp_altinstructions section and its corresponding symbols are
completely unused, so remove them.
Also, remove stray #ifdef __KENREL__ in asm-i386/alternative.h
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
H. Peter Anvin [Wed, 2 May 2007 17:27:12 +0000 (19:27 +0200)]
[PATCH] x86: Clean up x86 control register and MSR macros (corrected)
This patch is based on Rusty's recent cleanup of the EFLAGS-related
macros; it extends the same kind of cleanup to control registers and
MSRs.
It also unifies these between i386 and x86-64; at least with regards
to MSRs, the two had definitely gotten out of sync.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Jeremy Fitzhardinge [Wed, 2 May 2007 17:27:12 +0000 (19:27 +0200)]
[PATCH] x86: Allow percpu variables to be page-aligned
Let's allow page-alignment in general for per-cpu data (wanted by Xen, and
Ingo suggested KVM as well).
Because larger alignments can use more room, we increase the max per-cpu
memory to 64k rather than 32k: it's getting a little tight.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andi Kleen [Wed, 2 May 2007 17:27:12 +0000 (19:27 +0200)]
[PATCH] i386: Enable bank 0 on non K7 Athlon
As a bug workaround bank 0 on K7s is normally disabled, but no need
to do that on other AMD CPUs.
Cc: davej@redhat.com
Signed-off-by: Andi Kleen <ak@suse.de>
Jeremy Fitzhardinge [Wed, 2 May 2007 17:27:12 +0000 (19:27 +0200)]
[PATCH] i386: Update smp_call_function* comments
Update documentation for i386 smp_call_function* functions.
As reported by Randy Dunlap <rdunlap@xenotime.net>
[ I've posted this before but it seems to have been lost along the way. ]
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Jan Engelhardt [Wed, 2 May 2007 17:27:12 +0000 (19:27 +0200)]
[PATCH] i386: Use menuconfig objects - APM
(I hope Andi is the right one to Cc, otherwise please add, thanks!)
Use menuconfigs instead of menus, so the whole menu can be disabled at
once instead of going through all options.
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Andi Kleen [Wed, 2 May 2007 17:27:12 +0000 (19:27 +0200)]
[PATCH] x86: Don't use MWAIT on AMD Family 10
It doesn't put the CPU into deeper sleep states, so it's better to use the standard
idle loop to save power. But allow to reenable it anyways for benchmarking.
I also removed the obsolete idle=halt on i386
Cc: andreas.herrmann@amd.com
Signed-off-by: Andi Kleen <ak@suse.de>
Jeremy Fitzhardinge [Wed, 2 May 2007 17:27:12 +0000 (19:27 +0200)]
[PATCH] x86-64: Clean up asm-x86_64/bugs.h
Most of asm-x86_64/bugs.h is code which should be in a C file, so put it there.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Jeremy Fitzhardinge [Wed, 2 May 2007 17:27:12 +0000 (19:27 +0200)]
[PATCH] i386: Make COMPAT_VDSO runtime selectable.
Now that relocation of the VDSO for COMPAT_VDSO users is done at
runtime rather than compile time, it is possible to enable/disable
compat mode at runtime.
This patch allows you to enable COMPAT_VDSO mode with "vdso=2" on the
kernel command line, or via sysctl. (Switching on a running system
shouldn't be done lightly; any process which was relying on the compat
VDSO will be upset if it goes away.)
The COMPAT_VDSO config option still exists, but if enabled it just
makes vdso_enabled default to VDSO_COMPAT.
+From: Hugh Dickins <hugh@veritas.com>
Fix oops from i386-make-compat_vdso-runtime-selectable.patch.
Even mingetty at system startup finds it easy to trigger an oops
while reading /proc/PID/maps: though it has a good hold on the mm
itself, that cannot stop exit_mm() from resetting tsk->mm to NULL.
(It is usually show_map()'s call to get_gate_vma() which oopses,
and I expect we could change that to check priv->tail_vma instead;
but no matter, even m_start()'s call just after get_task_mm() is racy.)
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Zachary Amsden <zach@vmware.com>
Cc: "Jan Beulich" <JBeulich@novell.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland McGrath <roland@redhat.com>
Jeremy Fitzhardinge [Wed, 2 May 2007 17:27:12 +0000 (19:27 +0200)]
[PATCH] i386: Relocate VDSO ELF headers to match mapped location with COMPAT_VDSO
Some versions of libc can't deal with a VDSO which doesn't have its
ELF headers matching its mapped address. COMPAT_VDSO maps the VDSO at
a specific system-wide fixed address. Previously this was all done at
build time, on the grounds that the fixed VDSO address is always at
the top of the address space. However, a hypervisor may reserve some
of that address space, pushing the fixmap address down.
This patch does the adjustment dynamically at runtime, depending on
the runtime location of the VDSO fixmap.
[ Patch has been through several hands: Jan Beulich wrote the orignal
version; Zach reworked it, and Jeremy converted it to relocate phdrs
as well as sections. ]
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Zachary Amsden <zach@vmware.com>
Cc: "Jan Beulich" <JBeulich@novell.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland McGrath <roland@redhat.com>
Jeremy Fitzhardinge [Wed, 2 May 2007 17:27:12 +0000 (19:27 +0200)]
[PATCH] i386: clean up identify_cpu
identify_cpu() is used to identify both the boot CPU and secondary
CPUs, but it performs some actions which only apply to the boot CPU.
Those functions are therefore really __init functions, but because
they're called by identify_cpu(), they must be marked __cpuinit.
This patch splits identify_cpu() into identify_boot_cpu() and
identify_secondary_cpu(), and calls the appropriate init functions
from each. Also, identify_boot_cpu() and all the functions it
dominates are marked __init.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Jeremy Fitzhardinge [Wed, 2 May 2007 17:27:12 +0000 (19:27 +0200)]
[PATCH] i386: Clean up asm-i386/bugs.h
Most of asm-i386/bugs.h is code which should be in a C file, so put it there.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Andi Kleen [Wed, 2 May 2007 17:27:12 +0000 (19:27 +0200)]
[PATCH] x86-64: Fix vmalloc_32 to really allocate <4GB on 64bit platforms
Ugly ifdef, but should handle all 64bit platforms that have suitable
zones. On some like Altix it's probably impossible without IOMMU
use to get memory <4GB this way, but they have to live with that.
Signed-off-by: Andi Kleen <ak@suse.de>
Avi Kivity [Wed, 2 May 2007 17:27:12 +0000 (19:27 +0200)]
[PATCH] x86-64: fix arithmetic in comment
The xmm space on x86_64 is 256 bytes.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Andi Kleen [Wed, 2 May 2007 17:27:11 +0000 (19:27 +0200)]
[PATCH] x86-64: Use X86_EFLAGS_IF in x86-64/irqflags.h.
As per i386 patch: move X86_EFLAGS_IF et al out to a new header:
processor-flags.h, so we can include it from irqflags.h and use it in
raw_irqs_disabled_flags().
As a side-effect, we could now use these flags in .S files.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
Jan Beulich [Wed, 2 May 2007 17:27:11 +0000 (19:27 +0200)]
[PATCH] x86: fix amd64-agp aperture validation
Under CONFIG_DISCONTIGMEM, assuming that a !pfn_valid() implies all
subsequent pfn-s are also invalid is wrong. Thus replace this by
explicitly checking against the E820 map.
AK: make e820 on x86-64 not initdata
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Mark Langsdorf <mark.langsdorf@amd.com>
Jeremy Fitzhardinge [Wed, 2 May 2007 17:27:11 +0000 (19:27 +0200)]
[PATCH] x86-64: Account for module percpu space separately from kernel percpu
Rather than using a single constant PERCPU_ENOUGH_ROOM, compute it as
the sum of kernel_percpu + PERCPU_MODULE_RESERVE. This is now common
to all architectures; if an architecture wants to set
PERCPU_ENOUGH_ROOM to something special, then it may do so (ia64 is
the only one which does).
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Andi Kleen <ak@suse.de>
Andi Kleen [Wed, 2 May 2007 17:27:11 +0000 (19:27 +0200)]
[PATCH] i386: Remove unneeded externs in nmi.c
All were already in some header
Signed-off-by: Andi Kleen <ak@suse.de>
Andi Kleen [Wed, 2 May 2007 17:27:11 +0000 (19:27 +0200)]
[PATCH] x86-64: Change my email address
Signed-off-by: Andi Kleen <ak@suse.de>
Jeremy Fitzhardinge [Wed, 2 May 2007 17:27:11 +0000 (19:27 +0200)]
[PATCH] i386: Add machine_ops interface to abstract halting and rebooting
machine_ops is an interface for the machine_* functions defined in
<linux/reboot.h>. This is intended to allow hypervisors to intercept
the reboot process, but it could be used to implement other x86
subarchtecture reboots.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Jeremy Fitzhardinge [Wed, 2 May 2007 17:27:11 +0000 (19:27 +0200)]
[PATCH] i386: Add smp_ops interface
Add a smp_ops interface. This abstracts the API defined by
<linux/smp.h> for use within arch/i386. The primary intent is that it
be used by a paravirtualizing hypervisor to implement SMP, but it
could also be used by non-APIC-using sub-architectures.
This is related to CONFIG_PARAVIRT, but is implemented unconditionally
since it is simpler that way and not a highly performance-sensitive
interface.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Rusty Russell [Wed, 2 May 2007 17:27:11 +0000 (19:27 +0200)]
[PATCH] i386: cleanup GDT Access
Now we have an explicit per-cpu GDT variable, we don't need to keep the
descriptors around to use them to find the GDT: expose cpu_gdt directly.
We could go further and make load_gdt() pack the descriptor for us, or even
assume it means "load the current cpu's GDT" which is what it always does.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bernhard Walle [Wed, 2 May 2007 17:27:11 +0000 (19:27 +0200)]
[PATCH] x86-64: Fix "Section mismatch" compile warning
Fix "Section mismatch" warnings in arch/x86_64/kernel/time.c
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Jan Beulich [Wed, 2 May 2007 17:27:11 +0000 (19:27 +0200)]
[PATCH] x86-64: adjust EDID retrieval
commit
5e518d7672dea4cd7c60871e40d0490c52f01d13 did the same change to
i386's variant.
With this change, i386's and x86-64's versions are identical, raising
the question whether the x86-64 one should go (just like there's only
one instance of edd.S).
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Konrad Rzeszutek [Wed, 2 May 2007 17:27:11 +0000 (19:27 +0200)]
[PATCH] x86-64: Inhibit machine from asserting an NMI when doing Alt-SysRq-M operation.
This patch touches the NMI watchdog every MAX_ORDER_NR_PAGES
to inhibit the machine from triggering an NMI while the CPUs
are locked. This situation is happening on boxes with more
than 64CPUs and 128GB of RAM when Alt-SysRq-m is performed.
It has been succesfully tested for regression on uni, 2, 4, 8
32, and 64 CPU boxes with various memory configuration.
Signed-off-by: Andi Kleen <ak@suse.de>
Eric Dumazet [Wed, 2 May 2007 17:27:11 +0000 (19:27 +0200)]
[PATCH] x86-64: vsyscall_gtod_data diet and vgettimeofday() fix
Current vsyscall_gtod_data is large (3 or 4 cache lines dirtied at timer
interrupt). We can shrink it to exactly 64 bytes (1 cache line on AMD64)
Instead of copying a whole struct clocksource, we copy only needed fields.
I deleted an unused field : offset_base
This patch fixes one oddity in vgettimeofday(): It can returns a timeval with
tv_usec =
1000000. Maybe not a bug, but why not doing the right thing ?
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Eric Dumazet [Wed, 2 May 2007 17:27:11 +0000 (19:27 +0200)]
[PATCH] x86-64: fix vtime() vsyscall
There is a tiny probability that the return value from vtime(time_t *t) is
Signed-off-by: Andi Kleen <ak@suse.de>
different than the value stored in *t
Using a temporary variable solves the problem and gives a faster code.
17: 48 85 ff test %rdi,%rdi
1a: 48 8b 05 00 00 00 00 mov 0(%rip),%rax #
__vsyscall_gtod_data.wall_time_tv.tv_sec
21: 74 03 je 26
23: 48 89 07 mov %rax,(%rdi)
26: c9 leaveq
27: c3 retq
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Adrian Bunk [Wed, 2 May 2007 17:27:11 +0000 (19:27 +0200)]
[PATCH] x86: remove UNEXPECTED_IO_APIC()
Many years ago, UNEXPECTED_IO_APIC() contained printk()'s (but nothing more).
Now that it's completely empty for years, we can as well remove it.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Adrian Bunk [Wed, 2 May 2007 17:27:10 +0000 (19:27 +0200)]
[PATCH] x86: sys_ioperm() prototype cleanup
- there's no reason for duplicating the prototype from
include/linux/syscalls.h in include/asm-x86_64/unistd.h
- every file should #include the headers containing the prototypes for
it's global functions
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Christoph Lameter [Wed, 2 May 2007 17:27:10 +0000 (19:27 +0200)]
[PATCH] x86-64: use lru instead of page->index and page->private for pgd lists management.
x86_64 currently simulates a list using the index and private fields of the
page struct. Seems that the code was inherited from i386. But x86_64 does
not use the slab to allocate pgds and pmds etc. So the lru field is not
used by the slab and therefore available.
This patch uses standard list operations on page->lru to realize pgd
tracking.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Parag Warudkar [Wed, 2 May 2007 17:27:10 +0000 (19:27 +0200)]
[PATCH] i386: remove the APM_RTC_IS_GMT config option.
Signed-off-by: Parag Warudkar <parag.warudkar@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Andi Kleen [Wed, 2 May 2007 17:27:10 +0000 (19:27 +0200)]
[PATCH] x86-64: Remove unused stext symbol
suggested by Jan Beulich
Signed-off-by: Andi Kleen <ak@suse.de>
Andi Kleen [Wed, 2 May 2007 17:27:10 +0000 (19:27 +0200)]
[PATCH] i386: Use X86_EFLAGS_IF in irqflags.h.
Move X86_EFLAGS_IF et al out to a new header: processor-flags.h, so we
can include it from irqflags.h and use it in raw_irqs_disabled_flags().
As a side-effect, we could now use these flags in .S files.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
Parag Warudkar [Wed, 2 May 2007 17:27:10 +0000 (19:27 +0200)]
[PATCH] i386: get rid of unused variables
Signed-off-by: Parag Warudkar <parag.warudkar@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Jan Beulich [Wed, 2 May 2007 17:27:10 +0000 (19:27 +0200)]
[PATCH] x86: tighten kernel image page access rights
On x86-64, kernel memory freed after init can be entirely unmapped instead
of just getting 'poisoned' by overwriting with a debug pattern.
On i386 and x86-64 (under CONFIG_DEBUG_RODATA), kernel text and bug table
can also be write-protected.
Compared to the first version, this one prevents re-creating deleted
mappings in the kernel image range on x86-64, if those got removed
previously. This, together with the original changes, prevents temporarily
having inconsistent mappings when cacheability attributes are being
changed on such pages (e.g. from AGP code). While on i386 such duplicate
mappings don't exist, the same change is done there, too, both for
consistency and because checking pte_present() before using various other
pte_XXX functions is a requirement anyway. At once, i386 code gets
adjusted to use pte_huge() instead of open coding this.
AK: split out cpa() changes
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Jan Beulich [Wed, 2 May 2007 17:27:10 +0000 (19:27 +0200)]
[PATCH] x86: Improve handling of kernel mappings in change_page_attr
Fix various broken corner cases in i386 and x86-64 change_page_attr.
AK: split off from tighten kernel image access rights
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Rusty Russell [Wed, 2 May 2007 17:27:10 +0000 (19:27 +0200)]
[PATCH] i386: rationalize paravirt wrappers
paravirt.c used to implement native versions of all low-level
functions. Far cleaner is to have the native versions exposed in the
headers and as inline native_XXX, and if !CONFIG_PARAVIRT, then simply
#define XXX native_XXX.
There are several nice side effects:
1) write_dt_entry() now takes the correct "struct Xgt_desc_struct *"
not "void *".
2) load_TLS is reintroduced to the for loop, not manually unrolled
with a #error in case the bounds ever change.
3) Macros become inlines, with type checking.
4) Access to the native versions is trivial for KVM, lguest, Xen and
others who might want it.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@muc.de>
Cc: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Sebastien Dugue [Wed, 2 May 2007 17:27:10 +0000 (19:27 +0200)]
[PATCH] i386: Rename boot_gdt_table to boot_gdt
Rename boot_gdt_table to boot_gdt to avoid the duplicate T(able).
Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Rusty Russell [Wed, 2 May 2007 17:27:10 +0000 (19:27 +0200)]
[PATCH] i386: clean up cpu_init()
We now have cpu_init() and secondary_cpu_init() doing nothing but calling
_cpu_init() with the same arguments. Rename _cpu_init() to cpu_init() and use
it as a replcement for secondary_cpu_init().
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Rusty Russell [Wed, 2 May 2007 17:27:10 +0000 (19:27 +0200)]
[PATCH] i386: Use per-cpu GDT immediately upon boot
Now we are no longer dynamically allocating the GDT, we don't need the
"cpu_gdt_table" at all: we can switch straight from "boot_gdt_table" to the
per-cpu GDT. This means initializing the cpu_gdt array in C.
The boot CPU uses the per-cpu var directly, then in smp_prepare_cpus() it
switches to the per-cpu copy just allocated. For secondary CPUs, the
early_gdt_descr is set to point directly to their per-cpu copy.
For UP the code is very simple: it keeps using the "per-cpu" GDT as per SMP,
but we never have to move.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Rusty Russell [Wed, 2 May 2007 17:27:10 +0000 (19:27 +0200)]
[PATCH] i386: Use per-cpu variables for GDT, PDA
Allocating PDA and GDT at boot is a pain. Using simple per-cpu variables adds
happiness (although we need the GDT page-aligned for Xen, which we do in a
followup patch).
[akpm@linux-foundation.org: build fix]
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bernhard Walle [Wed, 2 May 2007 17:27:10 +0000 (19:27 +0200)]
[PATCH] x86: add command line length to boot protocol
Because the command line is increased to 2048 characters after 2.6.21, it's
not possible for boot loaders and userspace tools to determine the length
of the command line the kernel can understand. The benefit of knowing the
length is that users can be warned if the command line size is too long
which prevents surprise if things don't work after bootup.
This patch updates the boot protocol to contain a field called
"cmdline_size" that contain the length of the command line (excluding the
terminating zero).
The patch also adds missing fields (of protocol version 2.05) to the x86_64
setup code.
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Alon Bar-Lev <alon.barlev@gmail.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ian Campbell [Wed, 2 May 2007 17:27:09 +0000 (19:27 +0200)]
[PATCH] i386: Allow i386 crash kernels to handle x86_64 dumps
The specific case I am encountering is kdump under Xen with a 64 bit
hypervisor and 32 bit kernel/userspace. The dump created is 64 bit due to
the hypervisor but the dump kernel is 32 bit for maximum compatibility.
It's possibly less likely to be useful in a purely native scenario but I
see no reason to disallow it.
[akpm@linux-foundation.org: build fix]
Signed-off-by: Ian Campbell <ian.campbell@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Vivek Goyal <vgoyal@in.ibm.com>
Cc: Horms <horms@verge.net.au>
Cc: Magnus Damm <magnus.damm@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Rusty Russell [Wed, 2 May 2007 17:27:09 +0000 (19:27 +0200)]
[PATCH] x86-64: Introduce load_TLS to the "for" loop.
GCC (4.1 at least) unrolls it anyway, but I can't believe this code
was ever justifiable. (I've also submitted a patch which cleans up
i386, which is even uglier).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Rusty Russell [Wed, 2 May 2007 17:27:09 +0000 (19:27 +0200)]
[PATCH] i386: Initialize esp0 properly all the time
Whenever we schedule, __switch_to calls load_esp0 which does:
tss->esp0 = thread->esp0;
This is never initialized for the initial thread (ie "swapper"), so when we're
scheduling that, we end up setting esp0 to 0. This is fine: the swapper never
leaves ring 0, so this field is never used.
lguest, however, gets upset that we're trying to used an unmapped page as our
kernel stack. Rather than work around it there, let's initialize it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andrew Morton [Wed, 2 May 2007 17:27:09 +0000 (19:27 +0200)]
[PATCH] i386: VDSO_PRELINK warning fix
The lguest patches somehow managed to trigger this:
In file included from arch/i386/lguest/lguest.c:38:
include/asm/asm-offsets.h:67:1: warning: "VDSO_PRELINK" redefined
In file included from include/linux/elf.h:7,
from include/linux/module.h:15,
from include/linux/device.h:21,
from include/linux/interrupt.h:15,
from arch/i386/lguest/lguest.c:27:
include/asm/elf.h:140:1: warning: this is the location of the previous definition
I assume that using the same identifier twice was a bad idea..
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
David Rientjes [Wed, 2 May 2007 17:27:09 +0000 (19:27 +0200)]
[PATCH] x86-64: fake numa for cpusets document
Create a document to explain how to use numa=fake in conjunction with cpusets
for coarse memory resource management.
An attempt to get more awareness and testing for this feature.
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Paul Jackson <pj@sgi.com>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Joerg Roedel [Wed, 2 May 2007 17:27:09 +0000 (19:27 +0200)]
[PATCH] x86: remove constant_tsc reporting from /proc/cpuinfo' power flags
remove the reporting of the constant_tsc flag from the "power management"
field in /proc/cpuinfo. The NULL value there was replaced by "" because
the former would result in a printout of [8] if the flag is set.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
David Rientjes [Wed, 2 May 2007 17:27:09 +0000 (19:27 +0200)]
[PATCH] x86-64: fixed size remaining fake nodes
Extends the numa=fake x86_64 command-line option to split the remaining system
memory into nodes of fixed size. Any leftover memory is allocated to a final
node unless the command-line ends with a comma.
For example:
numa=fake=2*512,*128 gives two 512M nodes and the remaining system
memory is split into nodes of 128M each.
This is beneficial for systems where the exact size of RAM is unknown or not
necessarily relevant, but the size of the remaining nodes to be allocated is
known based on their capacity for resource management.
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Paul Jackson <pj@sgi.com>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Rientjes [Wed, 2 May 2007 17:27:09 +0000 (19:27 +0200)]
[PATCH] x86-64: split remaining fake nodes equally
Extends the numa=fake x86_64 command-line option to split the remaining
system memory into equal-sized nodes.
For example:
numa=fake=2*512,4* gives two 512M nodes and the remaining system
memory is split into four approximately equal
chunks.
This is beneficial for systems where the exact size of RAM is unknown or not
necessarily relevant, but the granularity with which nodes shall be allocated
is known.
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Paul Jackson <pj@sgi.com>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Rientjes [Wed, 2 May 2007 17:27:09 +0000 (19:27 +0200)]
[PATCH] x86-64: configurable fake numa node sizes
Extends the numa=fake x86_64 command-line option to allow for configurable
node sizes. These nodes can be used in conjunction with cpusets for coarse
memory resource management.
The old command-line option is still supported:
numa=fake=32 gives 32 fake NUMA nodes, ignoring the NUMA setup of the
actual machine.
But now you may configure your system for the node sizes of your choice:
numa=fake=2*512,1024,2*256
gives two 512M nodes, one 1024M node, two 256M nodes, and
the rest of system memory to a sixth node.
The existing hash function is maintained to support the various node sizes
that are possible with this implementation.
Each node of the same size receives roughly the same amount of available
pages, regardless of any reserved memory with its address range. The total
available pages on the system is calculated and divided by the number of equal
nodes to allocate. These nodes are then dynamically allocated and their
borders extended until such time as their number of available pages reaches
the required size.
Configurable node sizes are recommended when used in conjunction with cpusets
for memory control because it eliminates the overhead associated with scanning
the zonelists of many smaller full nodes on page_alloc().
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Paul Jackson <pj@sgi.com>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ahmed S. Darwish [Wed, 2 May 2007 17:27:09 +0000 (19:27 +0200)]
[PATCH] i386: fix GDT's number of quadwords in comment
Fix comments to represent the true number of quadwords in GDT.
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Adrian Bunk [Wed, 2 May 2007 17:27:09 +0000 (19:27 +0200)]
[PATCH] i386: vmi_pmd_clear() static
This patch makes the needlessly global vmi_pmd_clear() static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Adrian Bunk [Wed, 2 May 2007 17:27:09 +0000 (19:27 +0200)]
[PATCH] x86-64: make simnow_init() static
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Yinghai Lu [Wed, 2 May 2007 17:27:08 +0000 (19:27 +0200)]
[PATCH] x86-64: remove extra smp_processor_id calling
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Ralf Baechle [Wed, 2 May 2007 17:27:08 +0000 (19:27 +0200)]
[PATCH] x86-64: fix ia32_binfmt.c build error
Reorder code to avoid multiple inclusion of elf.h.
#undef several symbols to avoid build errors over redefinitions.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>