Jeff Dike [Sun, 6 May 2007 21:51:31 +0000 (14:51 -0700)]
uml: remove unused x86_64 code
It turns out that essentially none of the x86_64 bugs.c is needed. So, we can
delete most of it.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Dike [Sun, 6 May 2007 21:51:30 +0000 (14:51 -0700)]
uml: speed up page table walking
The previous page table walking code was horribly inefficient. This patch
replaces it with code taken from elsewhere in the kernel.
Forking from bash is now ~5% faster and page faults are handled ~10% faster.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Dike [Sun, 6 May 2007 21:51:29 +0000 (14:51 -0700)]
uml: dump registers on ptrace or wait failure
Provide a register dump if handle_trap fails. Abstract out ptrace_dump_regs
since it now has two callers.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Dike [Sun, 6 May 2007 21:51:29 +0000 (14:51 -0700)]
uml: drivers get release methods
Define release methods for the ubd and net drivers. They contain as much of
the remove methods as make sense. All error checking must have already been
done as well as anything else that might be holding a reference on the device
kobject.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Dike [Sun, 6 May 2007 21:51:28 +0000 (14:51 -0700)]
uml: delete HOST_FRAME_SIZE
HOST_FRAME_SIZE isn't used any more. It has been replaced with MAX_REG_NR.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Dike [Sun, 6 May 2007 21:51:27 +0000 (14:51 -0700)]
uml: irq locking commentary
Locking commentary.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Dike [Sun, 6 May 2007 21:51:26 +0000 (14:51 -0700)]
uml: comment early boot locking
Commentary about missing locking.
Also got rid of uml_start because it was pointless.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Dike [Sun, 6 May 2007 21:51:25 +0000 (14:51 -0700)]
uml: kernel segfaults should dump proper registers
If there's a segfault inside the kernel, we want a dump of the registers at
the point of the segfault, not the registers at the point of calling panic or
the last userspace registers.
sig_handler_common_skas now uses a static register set in the case of a
SIGSEGV to avoid messing up the process registers if the segfault turns out to
be non-fatal.
The architecture sigcontext-to-pt_regs copying code was repurposed to copy
data out of the SEGV stack frame.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Dike [Sun, 6 May 2007 21:51:24 +0000 (14:51 -0700)]
uml: tidy fault code
Tidying in preparation for the segfault register dumping patch which follows.
void * pointers are changed to union uml_pt_regs *. This makes the types
match reality, except in arch_fixup, which is changed to operate on a union
uml_pt_regs. This fixes a bug in the call from segv_handler, which passes a
union uml_pt_regs, to segv, which expects to pass a struct sigcontext to
arch_fixup.
Whitespace and other style fixes.
There's also a errno printk fix.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Dike [Sun, 6 May 2007 21:51:23 +0000 (14:51 -0700)]
uml: kernel_thread shouldn't panic
kernel_thread() should just return an error value on do_fork failure, not
panic.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Dike [Sun, 6 May 2007 21:51:22 +0000 (14:51 -0700)]
uml: remove page_size()
userspace code used to have to call the kernelspace function page_size() in
order to determine the value of the kernel's PAGE_SIZE. Since this is now
available directly from kern_constants.h as UM_KERN_PAGE_SIZE, page_size() can
be deleted and calls changed to use the constant.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Dike [Sun, 6 May 2007 21:51:21 +0000 (14:51 -0700)]
uml: tidy process.c
Clean up arch/um/kernel/process.c:
- lots of return(x); -> return x; conversions
- a number of the small functions are either unused, in which case they are
gone, along any declarations in a header, or could be made static.
- current_pid is ifdefed on CONFIG_MODE_TT and its declaration is ifdefed on
both CONFIG_MODE_TT and UML_CONFIG_MODE_TT because we don't know whether
it's being used in a userspace or kernel file.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Dike [Sun, 6 May 2007 21:51:20 +0000 (14:51 -0700)]
uml: no locking needed in tls.c
Comment the lack of locking on a couple of globals.
Also fix the formatting of __setup_host_supports_tls.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Dike [Sun, 6 May 2007 21:51:19 +0000 (14:51 -0700)]
uml: speed up exec
flush_thread doesn't need to do a full page table walk in order to clear the
address space. It knows what the end result needs to be, so it can call unmap
directly.
This results in a 10-20% speedup in an exec from bash.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Davide Brini [Sun, 6 May 2007 21:51:16 +0000 (14:51 -0700)]
uml: fix umid in xterm titles
Calls lines_init() *after* xterm_title is modified to include umid.
Signed-off-by: Davide Brini <davide.brini@unibo.it>
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Acked-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paolo 'Blaisorblade' Giarrusso [Sun, 6 May 2007 21:51:15 +0000 (14:51 -0700)]
uml: Replace one-element array with zero-element array
To look at users I did:
$ find arch/um/ include/asm-um -name '*.[ch]'|xargs grep -r 'net_kern\.h'
+-l|xargs grep '\<user\>'
Most users just cast user to the appropriate pointer, the remaining ones are
fixed here. In net_kern.c, I'm almost sure that save trick is not needed
anymore, but I've not verified it.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paolo 'Blaisorblade' Giarrusso [Sun, 6 May 2007 21:51:14 +0000 (14:51 -0700)]
uml: Eliminate temporary buffer in eth_configure
Avoid using the temporary buffer introduced by previous patch to hold the
device name.
Btw, avoid leaking device on an error path. Other error paths may need
cleanup.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paolo 'Blaisorblade' Giarrusso [Sun, 6 May 2007 21:51:13 +0000 (14:51 -0700)]
uml: improve checking and diagnostics of ethernet MACs
Improve checking and diagnostics for broadcast and multicast Ethernet MAC
addresses, and distinguish between those cases in output; also make sure the
device is assigned a MAC address valid only locally to avoid collisions.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Robert P. J. Day [Sun, 6 May 2007 21:51:12 +0000 (14:51 -0700)]
remove unused header file: arch/um/kernel/tt/include/mode_kern-tt.h
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Acked-by: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Dike [Sun, 6 May 2007 21:51:11 +0000 (14:51 -0700)]
uml: add missing __init declarations
The build started finding calls from non-init to init functions. These are
just cases of init functions not being properly marked, so this patch fixes
that.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Dike [Sun, 6 May 2007 21:51:10 +0000 (14:51 -0700)]
uml: remove user_util.h
user_util.h isn't needed any more, so delete it and remove all includes of it.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Dike [Sun, 6 May 2007 21:51:09 +0000 (14:51 -0700)]
uml: move remaining useful contents of user_util.h
Rescue the useful contents of the soon-to-be-gone user-util.h.
pty.c now gets ptsname from stdlib.h like it should have always done.
CATCH_EINTR is now in os.h, although perhaps all usage should be under
os-Linux at some point.
get_pty is also in os.h.
This patch restores the old definition of ARRAY_SIZE in user.h. This file is
included only in userspace files, so there will be no conflict with the
kernel's new ARRAY_SIZE. The copy of the kernel's ARRAY_SIZE and associated
infrastructure is now gone.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Dike [Sun, 6 May 2007 21:51:08 +0000 (14:51 -0700)]
uml: create as-layout.h
This patch moves all the the symbols defined in um_arch.c, which are mostly
boundaries between different parts of the UML kernel address space, to a new
header, as-layout.h. There are also a few things here which aren't really
related to address space layout, but which don't really have a better place to
go.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Dike [Sun, 6 May 2007 21:51:07 +0000 (14:51 -0700)]
uml: create arch.h
This patch moves the declarations of the architecture hooks from user_util.h
to a new header, arch.c, and adds the necessary includes to files which need
those declarations.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Dike [Sun, 6 May 2007 21:51:06 +0000 (14:51 -0700)]
uml: move SIGIO testing to sigio.c
This patch narrows the sigio interface. The boot-time SIGIO testing used to
be in start_up.c, which meant that pty_output_sigio and pty_close_sigio needed
to be global. By moving that code here, those can become static and the
declarations moved from user_util.h.
os_check_bugs is also here because it only does the SIGIO checking. If it
does more, it'll probably move back to start_up.c.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rusty Russell [Sun, 6 May 2007 21:51:05 +0000 (14:51 -0700)]
ARRAY_SIZE: check for type
We can use a gcc extension to ensure that ARRAY_SIZE() is handed an array,
not a pointer. This is especially important when code is changed from a
fixed array to a pointer. I assume the Intel compiler doesn't support
__builtin_types_compatible_p.
[jdike@addtoit.com: uml: update UML definition of ARRAY_SIZE]
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Dike [Sun, 6 May 2007 21:51:04 +0000 (14:51 -0700)]
uml: network interface hotplug error handling
This fixes a number of problems associated with network interface hotplug.
The userspace initialization function can fail in some cases, but the
failure was never passed back to eth_configure, which proceeded with the
configuration. This results in a zombie device that is present, but can't
work. This is fixed by allowing the initialization routines to return an
error, which is checked, and the configuration aborted on failure.
eth_configure failed to check for many failures. Even when it did check,
it didn't undo whatever initializations has already happened, so a present,
but partially initialized and non-working device could result. It now
checks everything that can fail, and bails out, undoing whatever had been
done.
The return value of eth_configure was always ignored, so it is now just
void.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Sun, 6 May 2007 21:51:03 +0000 (14:51 -0700)]
uml-driver-formatting-fixes-fix
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Dike [Sun, 6 May 2007 21:51:02 +0000 (14:51 -0700)]
uml: driver formatting fixes
Fix a bunch of formatting violations in the drivers:
return(n) -> return n
whitespace fixes
emacs formatting comment removal
breaking if(foo) return(n) into two lines
There are also a couple of errno use bugs:
using errno in a printk when the failure put errno into a local variable
saving errno after a printk, which can change it
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Dike [Sun, 6 May 2007 21:51:01 +0000 (14:51 -0700)]
uml: handle block device hotplug errors
If a disk fails to open, i.e. its host file doesn't exist, it won't be
removable because the hot-unplug code checks the existence of its gendisk.
This won't exist because it is only allocated for successfully opened disks.
Thus, a typo on the command line can result in a unusable and unfixable disk.
This is fixed by freeing the gendisk if it's there, but not letting that
affect the removal.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Dike [Sun, 6 May 2007 21:51:00 +0000 (14:51 -0700)]
uml: print coredump limits
Print out core dump limits at boot time. This is to allow core dumps
to be collected if something goes very wrong and to tell if a core
dump isn't going to happen because of a resource limit.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Dike [Sun, 6 May 2007 21:51:00 +0000 (14:51 -0700)]
uml: mark tt-mode code for future removal
Mark some tt-mode-only code as such.
Also cleaned up some formatting.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Dike [Sun, 6 May 2007 21:50:59 +0000 (14:50 -0700)]
uml: host_info tidying
Move the host_info string from util.c to um_arch.c, where it is
actually initialized and used. Also document its lack of locking.
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Dike [Sun, 6 May 2007 21:50:58 +0000 (14:50 -0700)]
uml: formatting fixes
Formatting fixes -
style violations
whitespace breakage
emacs formatting comment removal
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Dike [Sun, 6 May 2007 21:50:57 +0000 (14:50 -0700)]
uml: delete unused code
Get rid of a bunch of unused stuff -
cpu_feature had no users
linux_prog is little-used, so its declaration is moved to the
user for easy deletion when the whole file goes away
a long-unused debugging aid in helper.c is gone
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Robert P. J. Day [Sun, 6 May 2007 21:50:56 +0000 (14:50 -0700)]
CRIS: remove code related to pre-2.2 kernel
Remove conditionals and code related to checking for a pre-2.2 kernel.
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cyrill Gorcunov [Sun, 6 May 2007 21:50:56 +0000 (14:50 -0700)]
CRIS: check for memory allocation
Add checking for allocated memory. Indents and spaces are added to be
familiar with the kernel coding style.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Acked-by: Mikael Starvik <starvik@axis.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Robert P. J. Day [Sun, 6 May 2007 21:50:55 +0000 (14:50 -0700)]
remove unused header file: drivers/serial/crisv10.h
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Cc: Mikael Starvik <starvik@axis.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Milind Arun Choudhary [Sun, 6 May 2007 21:50:54 +0000 (14:50 -0700)]
SPIN_LOCK_UNLOCKED cleanup in arch/m68k
SPIN_LOCK_UNLOCKED cleanup,use __SPIN_LOCK_UNLOCKED instead
Signed-off-by: Milind Arun Choudhary <milindchoudhary@gmail.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Robert P. J. Day [Sun, 6 May 2007 21:50:53 +0000 (14:50 -0700)]
remove unused header file: arch/m68k/atari/atasound.h
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rafael J. Wysocki [Sun, 6 May 2007 21:50:52 +0000 (14:50 -0700)]
swsusp: free more memory
Move the definition of PAGES_FOR_IO to kernel/power/power.h and introduce
SPARE_PAGES representing the number of pages that should be freed by the
swsusp's memory shrinker in addition to PAGES_FOR_IO so that device drivers
can allocate some memory (up to 1 MB total) in their .suspend() routines
without causing the suspend to fail.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: Nigel Cunningham <nigel@nigel.suspend2.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rafael J. Wysocki [Sun, 6 May 2007 21:50:51 +0000 (14:50 -0700)]
swsusp: fix snapshot_release
Remove the leftover enable_nonboot_cpus() from snapshot_release().
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Nigel Cunningham <nigel@nigel.suspend2.net>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Brownell [Sun, 6 May 2007 21:50:50 +0000 (14:50 -0700)]
kconfig: mention 'hibernation' not just swsusp
Clarify that "software suspend" is what's called "hibernation" in most user
interfaces, shrinking a terminology gap. (Examples include Gnome and
MS-Windows.)
Also provide a more succinct description of what it does, so you won't have
to read the whole novel in Kconfig; and highlights just why the lack of
BIOS requirements for swsusp are a big deal.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Berg [Sun, 6 May 2007 21:50:50 +0000 (14:50 -0700)]
power management: change /sys/power/disk display
Change /sys/power/disk to display all valid modes as well as the currently
selected one in a fashion known from the LED subsystem.
This changes userspace API, but it is apparently not used much (we asked
some userspace developers)
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Berg [Sun, 6 May 2007 21:50:49 +0000 (14:50 -0700)]
remove software_suspend()
Remove software_suspend() and all its users since
pm_suspend(PM_SUSPEND_DISK) should be equivalent and there's no point in
having two interfaces for the same thing.
The patch also changes the valid_state function to return 0 (false) for
PM_SUSPEND_DISK when SOFTWARE_SUSPEND is not configured instead of
accepting it and having the whole thing fail later.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rafael J. Wysocki [Sun, 6 May 2007 21:50:48 +0000 (14:50 -0700)]
freezer: fix racy usage of try_to_freeze in kswapd
Currently we can miss freeze_process()->signal_wake_up() in kswapd() if it
happens between try_to_freeze() and prepare_to_wait(). To prevent this
from happening we should check freezing(current) before calling schedule().
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rafael J. Wysocki [Sun, 6 May 2007 21:50:47 +0000 (14:50 -0700)]
swsusp: use rbtree for tracking allocated swap
Make swsusp use extents instead of a bitmap to trace swap pages allocated
for saving the image (the tracking is only needed in case there's an error,
so that the allocated swap pages can be released).
This should allow us to reduce the memory usage, practically always, and
improve performance.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Nigel Cunningham <nigel@nigel.suspend2.net>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rafael J. Wysocki [Sun, 6 May 2007 21:50:46 +0000 (14:50 -0700)]
freezer: remove PF_NOFREEZE from handle_initrd
Make handle_initrd() call try_to_freeze() in a suitable place instead of setting
PF_NOFREEZE for the current task.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Nigel Cunningham <nigel@nigel.suspend2.net>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rafael J. Wysocki [Sun, 6 May 2007 21:50:45 +0000 (14:50 -0700)]
swsusp: use GFP_KERNEL for creating basic data structures
Make swsusp call create_basic_memory_bitmaps() before processes are frozen, so
that GFP_KERNEL allocations can be made in it. Additionally, ensure that the
swsusp's userland interface won't be used while either pm_suspend_disk() or
software_resume() is being executed.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rafael J. Wysocki [Sun, 6 May 2007 21:50:44 +0000 (14:50 -0700)]
swsusp: fix error paths in snapshot_open
We forget to increase device_available if there's an error in snapshot_open(),
so the snapshot device cannot be open at all after snapshot_open() has
returned an error.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rafael J. Wysocki [Sun, 6 May 2007 21:50:43 +0000 (14:50 -0700)]
mm: remove unused page flags
Remove the two page flags that were previously used by swsusp and are no
longer needed.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rafael J. Wysocki [Sun, 6 May 2007 21:50:43 +0000 (14:50 -0700)]
swsusp: do not use page flags
Make swsusp use memory bitmaps instead of page flags for marking 'nosave' and
free pages. This allows us to 'recycle' two page flags that can be used for
other purposes. Also, the memory needed to store the bitmaps is allocated
when necessary (ie. before the suspend) and freed after the resume which is
more reasonable.
The patch is designed to minimize the amount of changes and there are some
nice simplifications and optimizations possible on top of it. I am going to
implement them separately in the future.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rafael J. Wysocki [Sun, 6 May 2007 21:50:42 +0000 (14:50 -0700)]
swsusp: use inline functions for changing page flags
Replace direct invocations of SetPageNosave(), SetPageNosaveFree() etc. with
calls to inline functions that can be changed in subsequent patches without
modifying the code calling them.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Sun, 6 May 2007 21:50:40 +0000 (14:50 -0700)]
fix refrigerator() vs thaw_process() race
refrigerator() can miss a wakeup, "wait event" loop needs a proper memory
ordering.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Robert P. J. Day [Sun, 6 May 2007 21:50:39 +0000 (14:50 -0700)]
ARM26: remove useless config option GENERIC_BUST_SPINLOCK.
Remove the apparently useless config option GENERIC_BUST_SPINLOCK,
since nothing in the source tree refers to it.
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Acked-by: Ian Molton <spyro@f2s.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Sun, 6 May 2007 21:50:39 +0000 (14:50 -0700)]
srmcons: fix kmalloc(GFP_KERNEL) inside spinlock
Fixes http://bugzilla.kernel.org/show_bug.cgi?id=8341
Cc: <matthias.kaehlcke@gmail.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ivan Kokshaysky [Sun, 6 May 2007 21:50:38 +0000 (14:50 -0700)]
ALPHA: "prctl" macros
Files:
include/asm-alpha/thread_info.h
Provide "prctl" macros for ALPHA.
Signed-off-by: Jay Estabrook <jay.estabrook@hp.com>
Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ivan Kokshaysky [Sun, 6 May 2007 21:50:37 +0000 (14:50 -0700)]
ALPHA: fix BOOTP image creation
Files:
arch/alpha/boot/bootpz.c
Create a dummy "__kmalloc()" to satisfy the loader; never called.
arch/alpha/boot/tools/objstrip.c
Remove an include that is now (2.6.x) unnecessary.
Signed-off-by: Jay Estabrook <jay.estabrook@hp.com>
Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Milind Arun Choudhary [Sun, 6 May 2007 21:50:36 +0000 (14:50 -0700)]
ROUND_UP macro cleanup in arch/alpha/kernel/osf_sys.c
ROUND_UP macro cleanup use ALIGN
Signed-off-by: Milind Arun Choudhary <milindchoudhary@gmail.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yoshinori Sato [Sun, 6 May 2007 21:50:36 +0000 (14:50 -0700)]
h8300: add zImage support
h8300 zImage target support.
Signed-off-by: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yoshinori Sato [Sun, 6 May 2007 21:50:35 +0000 (14:50 -0700)]
h8300 generic irq
h8300 using generic irq handler patch.
Signed-off-by: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
john stultz [Sun, 6 May 2007 21:50:34 +0000 (14:50 -0700)]
Convert h8/300 to generic timekeeping
Currently h8/300 does not implement sub-jiffy timekeeping, so there is no
benefit to having arch specific timekeeping code.
This patch simply removes those functions and enables the generic
timekeeping code.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Acked-by: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wu, Bryan [Sun, 6 May 2007 21:50:34 +0000 (14:50 -0700)]
Blackfin: blackfin on-chip SPI controller driver
This patch implements the driver necessary use the Analog Devices Blackfin
processor's SPI Port.
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wu, Bryan [Sun, 6 May 2007 21:50:32 +0000 (14:50 -0700)]
Blackfin: on-chip RTC controller driver
This patch implements the driver necessary use the Analog Devices Blackfin
processor's on-chip RTC controller.
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wu, Bryan [Sun, 6 May 2007 21:50:32 +0000 (14:50 -0700)]
Blackfin: add blackfin support in smc91x ethernet controller driver
As SMC91X ethernet controller are used in blackfin STAMP 533 development
board, this patch add blackfin support to the smc91x linux driver.
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Acked-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bryan Wu [Sun, 6 May 2007 21:50:30 +0000 (14:50 -0700)]
blackfin: serial driver
This patch implements the driver necessary use the Analog Devices Blackfin
processor's Serial Port.
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Russell King <rmk+lkml@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bryan Wu [Sun, 6 May 2007 21:50:22 +0000 (14:50 -0700)]
blackfin architecture
This adds support for the Analog Devices Blackfin processor architecture, and
currently supports the BF533, BF532, BF531, BF537, BF536, BF534, and BF561
(Dual Core) devices, with a variety of development platforms including those
avaliable from Analog Devices (BF533-EZKit, BF533-STAMP, BF537-STAMP,
BF561-EZKIT), and Bluetechnix! Tinyboards.
The Blackfin architecture was jointly developed by Intel and Analog Devices
Inc. (ADI) as the Micro Signal Architecture (MSA) core and introduced it in
December of 2000. Since then ADI has put this core into its Blackfin
processor family of devices. The Blackfin core has the advantages of a clean,
orthogonal,RISC-like microprocessor instruction set. It combines a dual-MAC
(Multiply/Accumulate), state-of-the-art signal processing engine and
single-instruction, multiple-data (SIMD) multimedia capabilities into a single
instruction-set architecture.
The Blackfin architecture, including the instruction set, is described by the
ADSP-BF53x/BF56x Blackfin Processor Programming Reference
http://blackfin.uclinux.org/gf/download/frsrelease/29/2549/Blackfin_PRM.pdf
The Blackfin processor is already supported by major releases of gcc, and
there are binary and source rpms/tarballs for many architectures at:
http://blackfin.uclinux.org/gf/project/toolchain/frs There is complete
documentation, including "getting started" guides available at:
http://docs.blackfin.uclinux.org/ which provides links to the sources and
patches you will need in order to set up a cross-compiling environment for
bfin-linux-uclibc
This patch, as well as the other patches (toolchain, distribution,
uClibc) are actively supported by Analog Devices Inc, at:
http://blackfin.uclinux.org/
We have tested this on LTP, and our test plan (including pass/fails) can
be found at:
http://docs.blackfin.uclinux.org/doku.php?id=testing_the_linux_kernel
[m.kozlowski@tuxland.pl: balance parenthesis in blackfin header files]
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl>
Signed-off-by: Aubrey Li <aubrey.li@analog.com>
Signed-off-by: Jie Zhang <jie.zhang@analog.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roland McGrath [Sun, 6 May 2007 21:50:20 +0000 (14:50 -0700)]
Return EPERM not ECHILD on security_task_wait failure
wait* syscalls return -ECHILD even when an individual PID of a live child
was requested explicitly, when security_task_wait denies the operation.
This means that something like a broken SELinux policy can produce an
unexpected failure that looks just like a bug with wait or ptrace or
something.
This patch makes do_wait return -EACCES (or other appropriate error returned
from security_task_wait() instead of -ECHILD if some children were ruled out
solely because security_task_wait failed.
[jmorris@namei.org: switch error code to EACCES]
Signed-off-by: Roland McGrath <roland@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Lameter [Sun, 6 May 2007 21:50:20 +0000 (14:50 -0700)]
page migration: Only migrate pages if allocation in the highest zone is possible
Address spaces contain an allocation flag that specifies restriction on the
zone for pages placed in the mapping. I.e. some device may require pages
to be allocated from a DMA zone. Block devices may not be able to use
pages from HIGHMEM.
Memory policies and the common use of page migration works only on the
highest zone. If the address space does not allow allocation from the
highest zone then the pages in the address space are not migratable simply
because we can only allocate memory for a specified node if we allow
allocation for the highest zone on each node.
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Sun, 6 May 2007 21:50:19 +0000 (14:50 -0700)]
slob: fix page order calculation on not 4KB page
SLOB doesn't calculate correct page order when page size is not 4KB. This
patch fixes it with using get_order() instead of find_order() which is SLOB
version of get_order().
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Matt Mackall <mpm@selenic.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Sun, 6 May 2007 21:50:18 +0000 (14:50 -0700)]
hugetlbfs: add NULL check in hugetlb_zero_setup()
If hugetlbfs module_init() fails, hugetlbfs_vfsmount is not initialized and
shmget() with SHM_HUGETLB flag will cause NULL pointer dereference.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: William Irwin <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Lameter [Sun, 6 May 2007 21:50:17 +0000 (14:50 -0700)]
Slab allocators: remove useless __GFP_NO_GROW flag
There is no user remaining and I have never seen any use of that flag.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Lameter [Sun, 6 May 2007 21:50:17 +0000 (14:50 -0700)]
slab allocators: Remove SLAB_CTOR_ATOMIC
SLAB_CTOR atomic is never used which is no surprise since I cannot imagine
that one would want to do something serious in a constructor or destructor.
In particular given that the slab allocators run with interrupts disabled.
Actions in constructors and destructors are by their nature very limited
and usually do not go beyond initializing variables and list operations.
(The i386 pgd ctor and dtors do take a spinlock in constructor and
destructor..... I think that is the furthest we go at this point.)
There is no flag passed to the destructor so removing SLAB_CTOR_ATOMIC also
establishes a certain symmetry.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Lameter [Sun, 6 May 2007 21:50:16 +0000 (14:50 -0700)]
slab allocators: Remove SLAB_DEBUG_INITIAL flag
I have never seen a use of SLAB_DEBUG_INITIAL. It is only supported by
SLAB.
I think its purpose was to have a callback after an object has been freed
to verify that the state is the constructor state again? The callback is
performed before each freeing of an object.
I would think that it is much easier to check the object state manually
before the free. That also places the check near the code object
manipulation of the object.
Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
compiled with SLAB debugging on. If there would be code in a constructor
handling SLAB_DEBUG_INITIAL then it would have to be conditional on
SLAB_DEBUG otherwise it would just be dead code. But there is no such code
in the kernel. I think SLUB_DEBUG_INITIAL is too problematic to make real
use of, difficult to understand and there are easier ways to accomplish the
same effect (i.e. add debug code before kfree).
There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be
clear in fs inode caches. Remove the pointless checks (they would even be
pointless without removeal of SLAB_DEBUG_INITIAL) from the fs constructors.
This is the last slab flag that SLUB did not support. Remove the check for
unimplemented flags from SLUB.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Benjamin Herrenschmidt [Sun, 6 May 2007 21:50:14 +0000 (14:50 -0700)]
get_unmapped_area doesn't need hugetlbfs hacks anymore
Remove the hugetlbfs specific hacks in toplevel get_unmapped_area() now that
all archs and hugetlbfs itself do the right thing for both cases.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: William Irwin <bill.irwin@oracle.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: Matthew Wilcox <willy@debian.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Adam Litke <agl@us.ibm.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Benjamin Herrenschmidt [Sun, 6 May 2007 21:50:13 +0000 (14:50 -0700)]
get_unmapped_area handles MAP_FIXED in generic code
generic arch_get_unmapped_area() now handles MAP_FIXED. Now that all
implementations have been fixed, change the toplevel get_unmapped_area() to
call into arch or drivers for the MAP_FIXED case.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: Matthew Wilcox <willy@debian.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: William Irwin <bill.irwin@oracle.com>
Cc: Adam Litke <agl@us.ibm.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Benjamin Herrenschmidt [Sun, 6 May 2007 21:50:12 +0000 (14:50 -0700)]
get_unmapped_area handles MAP_FIXED in hugetlbfs
Generic hugetlb_get_unmapped_area() now handles MAP_FIXED by just calling
prepare_hugepage_range()
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: William Irwin <bill.irwin@oracle.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: Matthew Wilcox <willy@debian.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Adam Litke <agl@us.ibm.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Benjamin Herrenschmidt [Sun, 6 May 2007 21:50:11 +0000 (14:50 -0700)]
get_unmapped_area handles MAP_FIXED on x86_64
Handle MAP_FIXED in x86_64 arch_get_unmapped_area(), simple case, just return
the address as passed in
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Benjamin Herrenschmidt [Sun, 6 May 2007 21:50:10 +0000 (14:50 -0700)]
get_unmapped_area handles MAP_FIXED on sparc64
Handle MAP_FIXED in hugetlb_get_unmapped_area on sparc64 by just using
prepare_hugepage_range()
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: William Irwin <bill.irwin@oracle.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Benjamin Herrenschmidt [Sun, 6 May 2007 21:50:09 +0000 (14:50 -0700)]
get_unmapped_area handles MAP_FIXED on parisc
Handle MAP_FIXED in parisc arch_get_unmapped_area(), just return the address.
We might want to also check for possible cache aliasing issues now that we get
called in that case (like ARM or MIPS), leave a comment for the maintainers to
pick up.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: Matthew Wilcox <willy@debian.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Benjamin Herrenschmidt [Sun, 6 May 2007 21:50:09 +0000 (14:50 -0700)]
get_unmapped_area handles MAP_FIXED on ia64
Handle MAP_FIXED in ia64 arch_get_unmapped_area and
hugetlb_get_unmapped_area(), just call prepare_hugepage_range in the later and
is_hugepage_only_range() in the former.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: William Irwin <bill.irwin@oracle.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Benjamin Herrenschmidt [Sun, 6 May 2007 21:50:08 +0000 (14:50 -0700)]
get_unmapped_area handles MAP_FIXED on i386
Handle MAP_FIXED in i386 hugetlb_get_unmapped_area(), just call
prepare_hugepage_range.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: William Irwin <bill.irwin@oracle.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Adam Litke <agl@us.ibm.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Benjamin Herrenschmidt [Sun, 6 May 2007 21:50:07 +0000 (14:50 -0700)]
get_unmapped_area handles MAP_FIXED on frv
Handle MAP_FIXED in arch_get_unmapped_area on frv. Trivial case, just return
the address.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Benjamin Herrenschmidt [Sun, 6 May 2007 21:50:07 +0000 (14:50 -0700)]
get_unmapped_area handles MAP_FIXED on arm
ARM already had a case for MAP_FIXED in arch_get_unmapped_area() though it was
not called before. Fix the comment to reflect that it will now be called.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Benjamin Herrenschmidt [Sun, 6 May 2007 21:50:06 +0000 (14:50 -0700)]
get_unmapped_area handles MAP_FIXED on alpha
Handle MAP_FIXED in alpha's arch_get_unmapped_area(), simple case, just return
the address as passed in
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Benjamin Herrenschmidt [Sun, 6 May 2007 21:50:02 +0000 (14:50 -0700)]
get_unmapped_area handles MAP_FIXED on powerpc
The current get_unmapped_area code calls the f_ops->get_unmapped_area or the
arch one (via the mm) only when MAP_FIXED is not passed. That makes it
impossible for archs to impose proper constraints on regions of the virtual
address space. To work around that, get_unmapped_area() then calls some
hugetlbfs specific hacks.
This cause several problems, among others:
- It makes it impossible for a driver or filesystem to do the same thing
that hugetlbfs does (for example, to allow a driver to use larger page sizes
to map external hardware) if that requires applying a constraint on the
addresses (constraining that mapping in certain regions and other mappings
out of those regions).
- Some archs like arm, mips, sparc, sparc64, sh and sh64 already want
MAP_FIXED to be passed down in order to deal with aliasing issues. The code
is there to handle it... but is never called.
This series of patches moves the logic to handle MAP_FIXED down to the various
arch/driver get_unmapped_area() implementations, and then changes the generic
code to always call them. The hugetlbfs hacks then disappear from the generic
code.
Since I need to do some special 64K pages mappings for SPEs on cell, I need to
work around the first problem at least. I have further patches thus
implementing a "slices" layer that handles multiple page sizes through slices
of the address space for use by hugetlbfs, the SPE code, and possibly others,
but it requires that serie of patches first/
There is still a potential (but not practical) issue due to the fact that
filesystems/drivers implemeting g_u_a will effectively bypass all arch checks.
This is not an issue in practice as the only filesystems/drivers using that
hook are doing so for arch specific purposes in the first place.
There is also a problem with mremap that will completely bypass all arch
checks. I'll try to address that separately, I'm not 100% certain yet how,
possibly by making it not work when the vma has a file whose f_ops has a
get_unmapped_area callback, and by making it use is_hugepage_only_range()
before expanding into a new area.
Also, I want to turn is_hugepage_only_range() into a more generic
is_normal_page_range() as that's really what it will end up meaning when used
in stack grow, brk grow and mremap.
None of the above "issues" however are introduced by this patch, they are
already there, so I think the patch can go ini for 2.6.22.
This patch:
Handle MAP_FIXED in powerpc's arch_get_unmapped_area() in all 3
implementations of it.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: William Irwin <bill.irwin@oracle.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: Matthew Wilcox <willy@debian.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Adam Litke <agl@us.ibm.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Rientjes [Sun, 6 May 2007 21:50:00 +0000 (14:50 -0700)]
oom: fix constraint deadlock
Fixes a deadlock in the OOM killer for allocations that are not
__GFP_HARDWALL.
Before the OOM killer checks for the allocation constraint, it takes
callback_mutex.
constrained_alloc() iterates through each zone in the allocation zonelist
and calls cpuset_zone_allowed_softwall() to determine whether an allocation
for gfp_mask is possible. If a zone's node is not in the OOM-triggering
task's mems_allowed, it is not exiting, and we did not fail on a
__GFP_HARDWALL allocation, cpuset_zone_allowed_softwall() attempts to take
callback_mutex to check the nearest exclusive ancestor of current's cpuset.
This results in deadlock.
We now take callback_mutex after iterating through the zonelist since we
don't need it yet.
Cc: Andi Kleen <ak@suse.de>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Cc: Martin J. Bligh <mbligh@mbligh.org>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yasunori Goto [Sun, 6 May 2007 21:49:59 +0000 (14:49 -0700)]
mm: fix handling of panic_on_oom when cpusets are in use
The current panic_on_oom may not work if there is a process using
cpusets/mempolicy, because other nodes' memory may remain. But some people
want failover by panic ASAP even if they are used. This patch makes new
setting for its request.
This is tested on my ia64 box which has 3 nodes.
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Ethan Solomita <solo@google.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Sun, 6 May 2007 21:49:58 +0000 (14:49 -0700)]
fault injection: fix failslab with CONFIG_NUMA
Currently failslab injects failures into ____cache_alloc(). But with enabling
CONFIG_NUMA it's not enough to let actual slab allocator functions (kmalloc,
kmem_cache_alloc, ...) return NULL.
This patch moves fault injection hook inside of __cache_alloc() and
__cache_alloc_node(). These are lower call path than ____cache_alloc() and
enable to inject faulures to slab allocators with CONFIG_NUMA.
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Lameter [Sun, 6 May 2007 21:49:58 +0000 (14:49 -0700)]
slab allocators: remove multiple alignment specifications
It is not necessary to tell the slab allocators to align to a cacheline
if an explicit alignment was already specified. It is rather confusing
to specify multiple alignments.
Make sure that the call sites only use one form of alignment.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Lameter [Sun, 6 May 2007 21:49:57 +0000 (14:49 -0700)]
KMEM_CACHE(): simplify slab cache creation
This patch provides a new macro
KMEM_CACHE(<struct>, <flags>)
to simplify slab creation. KMEM_CACHE creates a slab with the name of the
struct, with the size of the struct and with the alignment of the struct.
Additional slab flags may be specified if necessary.
Example
struct test_slab {
int a,b,c;
struct list_head;
} __cacheline_aligned_in_smp;
test_slab_cache = KMEM_CACHE(test_slab, SLAB_PANIC)
will create a new slab named "test_slab" of the size sizeof(struct
test_slab) and aligned to the alignment of test slab. If it fails then we
panic.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Lameter [Sun, 6 May 2007 21:49:56 +0000 (14:49 -0700)]
slab allocators: Remove obsolete SLAB_MUST_HWCACHE_ALIGN
This patch was recently posted to lkml and acked by Pekka.
The flag SLAB_MUST_HWCACHE_ALIGN is
1. Never checked by SLAB at all.
2. A duplicate of SLAB_HWCACHE_ALIGN for SLUB
3. Fulfills the role of SLAB_HWCACHE_ALIGN for SLOB.
The only remaining use is in sparc64 and ppc64 and their use there
reflects some earlier role that the slab flag once may have had. If
its specified then SLAB_HWCACHE_ALIGN is also specified.
The flag is confusing, inconsistent and has no purpose.
Remove it.
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Peter Zijlstra [Sun, 6 May 2007 21:49:55 +0000 (14:49 -0700)]
mm: optimize acorn partition truncate
invalidate_bdev() is superfluous when truncate_inode_pages() is also
called. do call invalidate_bh_lrus() though, to avoid stale pointers.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Peter Zijlstra [Sun, 6 May 2007 21:49:55 +0000 (14:49 -0700)]
mm: optimize kill_bdev()
Remove duplicate work in kill_bdev().
It currently invalidates and then truncates the bdev's mapping.
invalidate_mapping_pages() will opportunistically remove pages from the
mapping. And truncate_inode_pages() will forcefully remove all pages.
The only thing truncate doesn't do is flush the bh lrus. So do that
explicitly. This avoids (very unlikely) but possible invalid lookup
results if the same bdev is quickly re-issued.
It also will prevent extreme kernel latencies which are observed when
blockdevs which have a large amount of pagecache are unmounted, by avoiding
invalidate_mapping_pages() on that path. invalidate_mapping_pages() has no
cond_resched (it can be called under spinlock), whereas truncate_inode_pages()
has one.
[akpm@linux-foundation.org: restore nrpages==0 optimisation]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Peter Zijlstra [Sun, 6 May 2007 21:49:54 +0000 (14:49 -0700)]
mm: remove destroy_dirty_buffers from invalidate_bdev()
Remove the destroy_dirty_buffers argument from invalidate_bdev(), it hasn't
been used in 6 years (so akpm says).
find * -name \*.[ch] | xargs grep -l invalidate_bdev |
while read file; do
quilt add $file;
sed -ie 's/invalidate_bdev(\([^,]*\),[^)]*)/invalidate_bdev(\1)/g' $file;
done
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nick Piggin [Sun, 6 May 2007 21:49:53 +0000 (14:49 -0700)]
mm: madvise avoid exclusive mmap_sem
Avoid down_write of the mmap_sem in madvise when we can help it.
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
matze [Sun, 6 May 2007 21:49:52 +0000 (14:49 -0700)]
include KERN_* constant in printk() calls in mm/slab.c
Signed-off-by: Matthias Kaehlcke <matthias.kaehlcke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Sun, 6 May 2007 21:49:52 +0000 (14:49 -0700)]
slob: handle SLAB_PANIC flag
kmem_cache_create() for slob doesn't handle SLAB_PANIC.
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Miller [Sun, 6 May 2007 21:49:51 +0000 (14:49 -0700)]
Quicklist support for sparc64
I ported this to sparc64 as per the patch below, tested on UP SunBlade1500 and
24 cpu Niagara T1000.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Lameter [Sun, 6 May 2007 21:49:50 +0000 (14:49 -0700)]
Quicklists for page table pages
On x86_64 this cuts allocation overhead for page table pages down to a
fraction (kernel compile / editing load. TSC based measurement of times spend
in each function):
no quicklist
pte_alloc
1569048 4.3s(401ns/2.7us/179.7us)
pmd_alloc 780988 2.1s(337ns/2.7us/86.1us)
pud_alloc 780072 2.2s(424ns/2.8us/300.6us)
pgd_alloc 260022 1s(920ns/4us/263.1us)
quicklist:
pte_alloc 452436 573.4ms(8ns/1.3us/121.1us)
pmd_alloc 196204 174.5ms(7ns/889ns/46.1us)
pud_alloc 195688 172.4ms(7ns/881ns/151.3us)
pgd_alloc 65228 9.8ms(8ns/150ns/6.1us)
pgd allocations are the most complex and there we see the most dramatic
improvement (may be we can cut down the amount of pgds cached somewhat?). But
even the pte allocations still see a doubling of performance.
1. Proven code from the IA64 arch.
The method used here has been fine tuned for years and
is NUMA aware. It is based on the knowledge that accesses
to page table pages are sparse in nature. Taking a page
off the freelists instead of allocating a zeroed pages
allows a reduction of number of cachelines touched
in addition to getting rid of the slab overhead. So
performance improves. This is particularly useful if pgds
contain standard mappings. We can save on the teardown
and setup of such a page if we have some on the quicklists.
This includes avoiding lists operations that are otherwise
necessary on alloc and free to track pgds.
2. Light weight alternative to use slab to manage page size pages
Slab overhead is significant and even page allocator use
is pretty heavy weight. The use of a per cpu quicklist
means that we touch only two cachelines for an allocation.
There is no need to access the page_struct (unless arch code
needs to fiddle around with it). So the fast past just
means bringing in one cacheline at the beginning of the
page. That same cacheline may then be used to store the
page table entry. Or a second cacheline may be used
if the page table entry is not in the first cacheline of
the page. The current code will zero the page which means
touching 32 cachelines (assuming 128 byte). We get down
from 32 to 2 cachelines in the fast path.
3. x86_64 gets lightweight page table page management.
This will allow x86_64 arch code to faster repopulate pgds
and other page table entries. The list operations for pgds
are reduced in the same way as for i386 to the point where
a pgd is allocated from the page allocator and when it is
freed back to the page allocator. A pgd can pass through
the quicklists without having to be reinitialized.
64 Consolidation of code from multiple arches
So far arches have their own implementation of quicklist
management. This patch moves that feature into the core allowing
an easier maintenance and consistent management of quicklists.
Page table pages have the characteristics that they are typically zero or in a
known state when they are freed. This is usually the exactly same state as
needed after allocation. So it makes sense to build a list of freed page
table pages and then consume the pages already in use first. Those pages have
already been initialized correctly (thus no need to zero them) and are likely
already cached in such a way that the MMU can use them most effectively. Page
table pages are used in a sparse way so zeroing them on allocation is not too
useful.
Such an implementation already exits for ia64. Howver, that implementation
did not support constructors and destructors as needed by i386 / x86_64. It
also only supported a single quicklist. The implementation here has
constructor and destructor support as well as the ability for an arch to
specify how many quicklists are needed.
Quicklists are defined by an arch defining CONFIG_QUICKLIST. If more than one
quicklist is necessary then we can define NR_QUICK for additional lists. F.e.
i386 needs two and thus has
config NR_QUICK
int
default 2
If an arch has requested quicklist support then pages can be allocated
from the quicklist (or from the page allocator if the quicklist is
empty) via:
quicklist_alloc(<quicklist-nr>, <gfpflags>, <constructor>)
Page table pages can be freed using:
quicklist_free(<quicklist-nr>, <destructor>, <page>)
Pages must have a definite state after allocation and before
they are freed. If no constructor is specified then pages
will be zeroed on allocation and must be zeroed before they are
freed.
If a constructor is used then the constructor will establish
a definite page state. F.e. the i386 and x86_64 pgd constructors
establish certain mappings.
Constructors and destructors can also be used to track the pages.
i386 and x86_64 use a list of pgds in order to be able to dynamically
update standard mappings.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>