David Brownell [Tue, 17 Jul 2007 11:04:04 +0000 (04:04 -0700)]
spidev compiler warning gone
Get rid of annoying GCC warning on 32-bit platforms.
drivers/spi/spidev.c: In function 'spidev_message':
drivers/spi/spidev.c:184: warning: cast to pointer from integer of different size
drivers/spi/spidev.c:216: warning: cast to pointer from integer of different size
The trick is to add an extra cast using "ptrdiff_t" to convert the u64 to
the correct size integer, and only then casting it into a "void *" pointer.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Nikitenko [Tue, 17 Jul 2007 11:04:03 +0000 (04:04 -0700)]
CRC7 support
Add CRC7 routines, used for example in MMC over SPI communication.
Kerneldoc updates
[akpm@linux-foundation.org: fix funny mix of const and non-const]
Signed-off-by: Jan Nikitenko <jan.nikitenko@gmail.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Brownell [Tue, 17 Jul 2007 11:04:03 +0000 (04:04 -0700)]
SPI: add 3wire mode flag
Add a new spi->mode bit: SPI_3WIRE, for chips where the SI and SO signals
are shared (and which are thus only half duplex). Update the LM70 driver
to require support for that hardware mode from the controller.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Brownell [Tue, 17 Jul 2007 11:04:02 +0000 (04:04 -0700)]
SPI controller drivers: check for unsupported modes
Minor SPI controller driver updates: make the setup() methods reject
spi->mode bits they don't support, by masking aginst the inverse of bits
they *do* support. This insures against misbehavior later when new mode
bits get added.
Most controllers can't support SPI_LSB_FIRST; more handle SPI_CS_HIGH.
Support for all four SPI clock/transfer modes is routine.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dmitry Torokhov [Tue, 17 Jul 2007 11:04:01 +0000 (04:04 -0700)]
IBMASM: must depend on CONFIG_INPUT
IBMASM: must depend on CONFIG_INPUT
The driver registers couple of input devices and therefore must depend
on CONFIG_INPUT.
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Cc: Vernon Mauery <vernux@us.ibm.com>
Cc: Max Asbock <masbock@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dmitry Torokhov [Tue, 17 Jul 2007 11:04:01 +0000 (04:04 -0700)]
IBMASM: miscellaneous fixes
IBMASM: miscellaneous fixes
Fix some minor issues, such as:
- properly set up ID of keyboard device (was mixed up with mouse)
- constify translation tables
- change some variables to #defines
- set up input device's parent to form proper sysfs hierarchy
- minor formatting changes
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Cc: Vernon Mauery <vernux@us.ibm.com>
Cc: Max Asbock <masbock@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dmitry Torokhov [Tue, 17 Jul 2007 11:04:00 +0000 (04:04 -0700)]
IBMASM: dont use extern in function declarations
IBMASM: don't use extern in function declarations
We normally don't use extern in function declarations located in header files.
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Cc: Vernon Mauery <vernux@us.ibm.com>
Cc: Max Asbock <masbock@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dmitry Torokhov [Tue, 17 Jul 2007 11:03:58 +0000 (04:03 -0700)]
IBMASM: whitespace cleanup
IBMASM: whitespace cleanup
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Cc: Vernon Mauery <vernux@us.ibm.com>
Cc: Max Asbock <masbock@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Tue, 17 Jul 2007 11:03:58 +0000 (04:03 -0700)]
x86_64: speedup touch_nmi_watchdog
Avoid dirtying remote cpu's memory if it already has the correct value.
Cc: Andi Kleen <ak@suse.de>
Cc: Konrad Rzeszutek <konrad@darnok.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Tue, 17 Jul 2007 11:03:57 +0000 (04:03 -0700)]
i386: speedup touch_nmi_watchdog
Avoid dirtying remote cpu's memory if it already has the correct value.
Cc: Andi Kleen <ak@suse.de>
Cc: Konrad Rzeszutek <konrad@darnok.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Konrad Rzeszutek [Tue, 17 Jul 2007 11:03:56 +0000 (04:03 -0700)]
Inhibit NMI watchdog when Alt-SysRq-T operation is underway
On large memory configuration with not so fast CPUs the NMI watchdog is
triggered when memory addresses are being gathered and printed. The code
paths for Alt-SysRq-t are sprinkled with touch_nmi_watchdog in various
places but not in this routine (or in the loop that utilizes this
function). The patch has been tested for regression on large CPU+memory
configuration (128 logical CPUs + 224 GB) and 1,2,4,16-CPU sockets with
various memory sizes (1,2,4,6,20).
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Tue, 17 Jul 2007 11:03:56 +0000 (04:03 -0700)]
Fix sparse false positives re BUG_ON(ptr)
sparse now warns if one compares pointers with integers. However, there are
false positives, like:
fs/filesystems.c:72:2: warning: Using plain integer as NULL pointer
Every time BUG_ON(ptr) is used, ptr is checked against integer zero. Avoid
that and save ~70 false positives from allyesconfig run.
mentioned by Al.
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Josh Triplett <josh@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Tue, 17 Jul 2007 11:03:55 +0000 (04:03 -0700)]
destroy_workqueue() can livelock
Pointed out by Michal Schmidt <mschmidt@redhat.com>.
The bug was introduced in 2.6.22 by me.
cleanup_workqueue_thread() does flush_cpu_workqueue(cwq) in a loop until
->worklist becomes empty. This is live-lockable, a re-niced caller can get
CPU after wake_up() and insert a new barrier before the lower-priority
cwq->thread has a chance to clear ->current_work.
Change cleanup_workqueue_thread() to do flush_cpu_workqueue(cwq) only once.
We can rely on the fact that run_workqueue() won't return until it flushes
all works. So it is safe to call kthread_stop() after that, the "should
stop" request won't be noticed until run_workqueue() returns.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Michal Schmidt <mschmidt@redhat.com>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ananth N Mavinakayanahalli [Tue, 17 Jul 2007 11:03:54 +0000 (04:03 -0700)]
Kprobes on select architectures no longer EXPERIMENTAL
Based on usage and testing over the past couple of years, kprobes on
i386, ia64, powerpc and x86_64 is no longer EXPERIMENTAL.
This is a follow-up to Robert P.J. Day's patch making "Instrumentation
support" non-EXPERIMENTAL:
http://marc.info/?l=linux-kernel&m=
118396955423812&w=2
Arch maintainers for sparc64, avr32 and s390 need to take a similar call.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Engelhardt [Tue, 17 Jul 2007 11:03:53 +0000 (04:03 -0700)]
make timespec_equal() take const arguments
Make arguments of timespec_equal() const struct timespec.
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tejun Heo [Tue, 17 Jul 2007 11:03:51 +0000 (04:03 -0700)]
kallsyms: make KSYM_NAME_LEN include space for trailing '\0'
KSYM_NAME_LEN is peculiar in that it does not include the space for the
trailing '\0', forcing all users to use KSYM_NAME_LEN + 1 when allocating
buffer. This is nonsense and error-prone. Moreover, when the caller
forgets that it's very likely to subtly bite back by corrupting the stack
because the last position of the buffer is always cleared to zero.
This patch increments KSYM_NAME_LEN by one and updates code accordingly.
* off-by-one bug in asm-powerpc/kprobes.h::kprobe_lookup_name() macro
is fixed.
* Where MODULE_NAME_LEN and KSYM_NAME_LEN were used together,
MODULE_NAME_LEN was treated as if it didn't include space for the
trailing '\0'. Fix it.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Paulo Marques <pmarques@grupopie.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Maciej W. Rozycki [Tue, 17 Jul 2007 11:03:50 +0000 (04:03 -0700)]
sb1250-duart.c: SB1250 DUART serial support
This is a driver for the SB1250 DUART, a dual serial port implementation
included in the Broadcom family of SOCs descending from the SiByte SB1250
MIPS64 chip multiprocessor. It is a new implementation replacing the
old-fashioned driver currently present in the linux-mips.org tree. It
supports all the usual features one would expect from a(n asynchronous)
serial driver, including modem line control (as far as hardware supports it
-- there is edge detection logic missing from the DCD and RI lines and the
driver does not implement polling of these lines at the moment), the serial
console, BREAK transmission and reception, including the magic SysRq. The
receive FIFO threshold is not maintained though.
The driver was tested with a SWARM board which uses a BCM1250 SOC (which is
dual MIPS64 CMP) and has both ports of the single DUART implemented wired
externally. Both were tested. Testing included using the ports as
terminal lines at 1200bps (which is the ports minimum), 115200bps and a
couple of random speeds inbetween. The modem lines were verified to
operate correctly. No testing was performed with a use as a network
interface, like with SLIP or PPP.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roland McGrath [Tue, 17 Jul 2007 11:03:49 +0000 (04:03 -0700)]
Remove CHILD_MAX
The CHILD_MAX macro in limits.h should not be there. It claims to be the
limit on processes a user can own, but its value is wrong for that.
There is no constant value, but a variable resource limit (RLIMIT_NPROC).
Nothing in the kernel uses CHILD_MAX.
The proper thing to do according to POSIX is not to define CHILD_MAX at all.
The sysconf (_SC_CHILD_MAX) implementation works by calling getrlimit.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roland McGrath [Tue, 17 Jul 2007 11:03:49 +0000 (04:03 -0700)]
Remove OPEN_MAX
The OPEN_MAX macro in limits.h should not be there. It claims to be the
limit on file descriptors in a process, but its value is wrong for that.
There is no constant value, but a variable resource limit (RLIMIT_NOFILE).
Nothing in the kernel uses OPEN_MAX except things that are wrong to do so.
I've submitted other patches to remove those uses.
The proper thing to do according to POSIX is not to define OPEN_MAX at all.
The sysconf (_SC_OPEN_MAX) implementation works by calling getrlimit.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roland McGrath [Tue, 17 Jul 2007 11:03:48 +0000 (04:03 -0700)]
avoid OPEN_MAX in SCM_MAX_FD
The OPEN_MAX constant is an arbitrary number with no useful relation to
anything. Nothing should be using it. SCM_MAX_FD is just an arbitrary
constant and it should be clear that its value is chosen in net/scm.h
and not actually derived from anything else meaningful in the system.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Tue, 17 Jul 2007 11:03:47 +0000 (04:03 -0700)]
unregister_blkdev(): return void
Put WARN_ON and fixed all callers of unregister_blkdev(). Now we can make
unregister_blkdev return void.
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Tue, 17 Jul 2007 11:03:46 +0000 (04:03 -0700)]
unregister_blkdev(): delete redundant message
No need to warn unregister_blkdev() failure by caller. (The previous patch
makes unregister_blkdev() print error message in error case)
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Tue, 17 Jul 2007 11:03:46 +0000 (04:03 -0700)]
unregister_blkdev() delete redundant messages in callers
No need to warn unregister_blkdev() failure by the callers. (The previous
patch makes unregister_blkdev() print error message in error case)
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Tue, 17 Jul 2007 11:03:45 +0000 (04:03 -0700)]
unregister_blkdev(): do WARN_ON on failure
When unregister_blkdev() has failed, something wrong happened. This patch
adds WARN_ON to notify of such badness.
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Adrian Bunk [Tue, 17 Jul 2007 11:03:45 +0000 (04:03 -0700)]
proper prototype for proc_nr_files()
Add a proper prototype for proc_nr_files() in include/linux/fs.h
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Tue, 17 Jul 2007 11:03:44 +0000 (04:03 -0700)]
PTRACE_POKEDATA consolidation
Identical implementations of PTRACE_POKEDATA go into generic_ptrace_pokedata()
function.
AFAICS, fix bug on xtensa where successful PTRACE_POKEDATA will nevertheless
return EPERM.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Tue, 17 Jul 2007 11:03:43 +0000 (04:03 -0700)]
PTRACE_PEEKDATA consolidation
Identical implementations of PTRACE_PEEKDATA go into generic_ptrace_peekdata()
function.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pavel Emelianov [Tue, 17 Jul 2007 11:03:42 +0000 (04:03 -0700)]
Report that kernel is tainted if there was an OOPS
If the kernel OOPSed or BUGed then it probably should be considered as
tainted. Thus, all subsequent OOPSes and SysRq dumps will report the
tainted kernel. This saves a lot of time explaining oddities in the
calltraces.
Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Added parisc patch from Matthew Wilson -Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Grant Likely [Tue, 17 Jul 2007 11:03:39 +0000 (04:03 -0700)]
Add support for Xilinx SystemACE CompactFlash interface
Tested on Xilinx Virtex ppc405, Katmai 440SPe, and Microblaze
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Stefan Roese <sr@denx.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: John William <jwilliams@itee.uq.edu.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vitaly Bordug [Tue, 17 Jul 2007 11:03:37 +0000 (04:03 -0700)]
powerpc: 8xx: fix whitespace and indentation
Rolling forward PCMCIA driver, it was discovered that the indentation in
existing one, as well as in BSP side are very odd. This patch is just result
of Lindent run ontop of culprit files.
Signed-off-by: Vitaly Bordug <vitb@kernel.crashing.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Olof Johansson <olof@lixom.net>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Kumar Gala <galak@gate.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Lameter [Tue, 17 Jul 2007 11:03:37 +0000 (04:03 -0700)]
CONFIG_BOUNCE to avoid useless inclusion of bounce buffer logic
The bounce buffer logic is included on systems that do not need it. If a
system does not have zones like ZONE_DMA and ZONE_HIGHMEM that can lead to
the use of bounce buffers then there is no need to reserve memory pools etc
etc. This is true f.e. for SGI Altix.
Also nicifies the Makefile and gets rid of the tricky "and" there.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rafael J. Wysocki [Tue, 17 Jul 2007 11:03:35 +0000 (04:03 -0700)]
Freezer: make kernel threads nonfreezable by default
Currently, the freezer treats all tasks as freezable, except for the kernel
threads that explicitly set the PF_NOFREEZE flag for themselves. This
approach is problematic, since it requires every kernel thread to either
set PF_NOFREEZE explicitly, or call try_to_freeze(), even if it doesn't
care for the freezing of tasks at all.
It seems better to only require the kernel threads that want to or need to
be frozen to use some freezer-related code and to remove any
freezer-related code from the other (nonfreezable) kernel threads, which is
done in this patch.
The patch causes all kernel threads to be nonfreezable by default (ie. to
have PF_NOFREEZE set by default) and introduces the set_freezable()
function that should be called by the freezable kernel threads in order to
unset PF_NOFREEZE. It also makes all of the currently freezable kernel
threads call set_freezable(), so it shouldn't cause any (intentional)
change of behaviour to appear. Additionally, it updates documentation to
describe the freezing of tasks more accurately.
[akpm@linux-foundation.org: build fixes]
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Nigel Cunningham <nigel@nigel.suspend2.net>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nick Piggin [Tue, 17 Jul 2007 11:03:34 +0000 (04:03 -0700)]
fs: introduce some page/buffer invariants
It is a bug to set a page dirty if it is not uptodate unless it has
buffers. If the page has buffers, then the page may be dirty (some buffers
dirty) but not uptodate (some buffers not uptodate). The exception to this
rule is if the set_page_dirty caller is racing with truncate or invalidate.
A buffer can not be set dirty if it is not uptodate.
If either of these situations occurs, it indicates there could be some data
loss problem. Some of these warnings could be a harmless one where the
page or buffer is set uptodate immediately after it is dirtied, however we
should fix those up, and enforce this ordering.
Bring the order of operations for truncate into line with those of
invalidate. This will prevent a page from being able to go !uptodate while
we're holding the tree_lock, which is probably a good thing anyway.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Robert P. J. Day [Tue, 17 Jul 2007 11:03:33 +0000 (04:03 -0700)]
MM: Make needlessly global hugetlb_no_page() static.
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Lameter [Tue, 17 Jul 2007 11:03:33 +0000 (04:03 -0700)]
Add VM_BUG_ON in case someone uses page_mapping on a slab page
Detect slab objects being passed to the page oriented functions of the VM.
It is not sufficient to simply return NULL because the functions calling
page_mapping may depend on other items of the page_struct also to be setup
properly. Moreover slab object may not be properly aligned. The page
oriented functions of the VM expect to operate on page aligned, page sized
objects. Operations on object straddling page boundaries may only affect the
objects partially which may lead to surprising results.
It is better to detect eventually remaining uses and eliminate them.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Lameter [Tue, 17 Jul 2007 11:03:32 +0000 (04:03 -0700)]
Make SLUB the default allocator
There are some reports that 2.6.22 has SLUB as the default. Not
true!
This will make SLUB the default for 2.6.23.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Lameter [Tue, 17 Jul 2007 11:03:32 +0000 (04:03 -0700)]
SLUB: Fix CONFIG_SLUB_DEBUG use for CONFIG_NUMA
We currently cannot disable CONFIG_SLUB_DEBUG for CONFIG_NUMA. Now that
embedded systems start to use NUMA we may need this.
Put an #ifdef around places where NUMA only code uses fields only valid
for CONFIG_SLUB_DEBUG.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Lameter [Tue, 17 Jul 2007 11:03:31 +0000 (04:03 -0700)]
SLUB: Move sysfs operations outside of slub_lock
Sysfs can do a gazillion things when called. Make sure that we do not call
any sysfs functions while holding the slub_lock.
Just protect the essentials:
1. The list of all slab caches
2. The kmalloc_dma array
3. The ref counters of the slabs.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Lameter [Tue, 17 Jul 2007 11:03:30 +0000 (04:03 -0700)]
SLUB: Do not allocate object bit array on stack
The objects per slab increase with the current patches in mm since we allow up
to order 3 allocs by default. More patches in mm actually allow to use 2M or
higher sized slabs. For slab validation we need per object bitmaps in order
to check a slab. We end up with up to 64k objects per slab resulting in a
potential requirement of 8K stack space. That does not look good.
Allocate the bit arrays via kmalloc.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Lameter [Tue, 17 Jul 2007 11:03:29 +0000 (04:03 -0700)]
Slab allocators: Replace explicit zeroing with __GFP_ZERO
kmalloc_node() and kmem_cache_alloc_node() were not available in a zeroing
variant in the past. But with __GFP_ZERO it is possible now to do zeroing
while allocating.
Use __GFP_ZERO to remove the explicit clearing of memory via memset whereever
we can.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Lameter [Tue, 17 Jul 2007 11:03:29 +0000 (04:03 -0700)]
Slab allocators: Cleanup zeroing allocations
It becomes now easy to support the zeroing allocs with generic inline
functions in slab.h. Provide inline definitions to allow the continued use of
kzalloc, kmem_cache_zalloc etc but remove other definitions of zeroing
functions from the slab allocators and util.c.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Lameter [Tue, 17 Jul 2007 11:03:28 +0000 (04:03 -0700)]
SLUB: Do not use length parameter in slab_alloc()
We can get to the length of the object through the kmem_cache_structure. The
additional parameter does no good and causes the compiler to generate bad
code.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Lameter [Tue, 17 Jul 2007 11:03:28 +0000 (04:03 -0700)]
SLUB: Style fix up the loop to disable small slabs
Do proper spacing and we only need to do this in steps of 8.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Adrian Bunk [Tue, 17 Jul 2007 11:03:27 +0000 (04:03 -0700)]
mm/slub.c: make code static
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Lameter [Tue, 17 Jul 2007 11:03:27 +0000 (04:03 -0700)]
SLUB: Simplify dma index -> size calculation
There is no need to caculate the dma slab size ourselves. We can simply
lookup the size of the corresponding non dma slab.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Lameter [Tue, 17 Jul 2007 11:03:26 +0000 (04:03 -0700)]
SLUB: faster more efficient slab determination for __kmalloc
kmalloc_index is a long series of comparisons. The attempt to replace
kmalloc_index with something more efficient like ilog2 failed due to compiler
issues with constant folding on gcc 3.3 / powerpc.
kmalloc_index()'es long list of comparisons works fine for constant folding
since all the comparisons are optimized away. However, SLUB also uses
kmalloc_index to determine the slab to use for the __kmalloc_xxx functions.
This leads to a large set of comparisons in get_slab().
The patch here allows to get rid of that list of comparisons in get_slab():
1. If the requested size is larger than 192 then we can simply use
fls to determine the slab index since all larger slabs are
of the power of two type.
2. If the requested size is smaller then we cannot use fls since there
are non power of two caches to be considered. However, the sizes are
in a managable range. So we divide the size by 8. Then we have only
24 possibilities left and then we simply look up the kmalloc index
in a table.
Code size of slub.o decreases by more than 200 bytes through this patch.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Lameter [Tue, 17 Jul 2007 11:03:25 +0000 (04:03 -0700)]
SLUB: do proper locking during dma slab creation
We modify the kmalloc_cache_dma[] array without proper locking. Do the proper
locking and undo the dma cache creation if another processor has already
created it.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Lameter [Tue, 17 Jul 2007 11:03:24 +0000 (04:03 -0700)]
SLUB: extract dma_kmalloc_cache from get_cache.
The rarely used dma functionality in get_slab() makes the function too
complex. The compiler begins to spill variables from the working set onto the
stack. The created function is only used in extremely rare cases so make sure
that the compiler does not decide on its own to merge it back into get_slab().
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Lameter [Tue, 17 Jul 2007 11:03:24 +0000 (04:03 -0700)]
SLUB: add some more inlines and #ifdef CONFIG_SLUB_DEBUG
Add #ifdefs around data structures only needed if debugging is compiled into
SLUB.
Add inlines to small functions to reduce code size.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Lameter [Tue, 17 Jul 2007 11:03:23 +0000 (04:03 -0700)]
Slab allocators: support __GFP_ZERO in all allocators
A kernel convention for many allocators is that if __GFP_ZERO is passed to an
allocator then the allocated memory should be zeroed.
This is currently not supported by the slab allocators. The inconsistency
makes it difficult to implement in derived allocators such as in the uncached
allocator and the pool allocators.
In addition the support zeroed allocations in the slab allocators does not
have a consistent API. There are no zeroing allocator functions for NUMA node
placement (kmalloc_node, kmem_cache_alloc_node). The zeroing allocations are
only provided for default allocs (kzalloc, kmem_cache_zalloc_node).
__GFP_ZERO will make zeroing universally available and does not require any
addititional functions.
So add the necessary logic to all slab allocators to support __GFP_ZERO.
The code is added to the hot path. The gfp flags are on the stack and so the
cacheline is readily available for checking if we want a zeroed object.
Zeroing while allocating is now a frequent operation and we seem to be
gradually approaching a 1-1 parity between zeroing and not zeroing allocs.
The current tree has 3476 uses of kmalloc vs 2731 uses of kzalloc.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Lameter [Tue, 17 Jul 2007 11:03:22 +0000 (04:03 -0700)]
Slab allocators: consistent ZERO_SIZE_PTR support and NULL result semantics
Define ZERO_OR_NULL_PTR macro to be able to remove the checks from the
allocators. Move ZERO_SIZE_PTR related stuff into slab.h.
Make ZERO_SIZE_PTR work for all slab allocators and get rid of the
WARN_ON_ONCE(size == 0) that is still remaining in SLAB.
Make slub return NULL like the other allocators if a too large memory segment
is requested via __kmalloc.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Lameter [Tue, 17 Jul 2007 11:03:21 +0000 (04:03 -0700)]
Slab allocators: consolidate code for krealloc in mm/util.c
The size of a kmalloc object is readily available via ksize(). ksize is
provided by all allocators and thus we can implement krealloc in a generic
way.
Implement krealloc in mm/util.c and drop slab specific implementations of
krealloc.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Lameter [Tue, 17 Jul 2007 11:03:21 +0000 (04:03 -0700)]
SLUB Debug: fix initial object debug state of NUMA bootstrap objects
The function we are calling to initialize object debug state during early NUMA
bootstrap sets up an inactive object giving it the wrong redzone signature.
The bootstrap nodes are active objects and should have active redzone
signatures.
Currently slab validation complains and reverts the object to active state.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Lameter [Tue, 17 Jul 2007 11:03:20 +0000 (04:03 -0700)]
SLUB: ensure that the number of objects per slab stays low for high orders
Currently SLUB has no provision to deal with too high page orders that may
be specified on the kernel boot line. If an order higher than 6 (on a 4k
platform) is generated then we will BUG() because slabs get more than 65535
objects.
Add some logic that decreases order for slabs that have too many objects.
This allow booting with slab sizes up to MAX_ORDER.
For example
slub_min_order=10
will boot with a default slab size of 4M and reduce slab sizes for small
object sizes to lower orders if the number of objects becomes too big.
Large slab sizes like that allow a concentration of objects of the same
slab cache under as few as possible TLB entries and thus potentially
reduces TLB pressure.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Lameter [Tue, 17 Jul 2007 11:03:20 +0000 (04:03 -0700)]
SLUB slab validation: Move tracking information alloc outside of lock
We currently have to do an GFP_ATOMIC allocation because the list_lock is
already taken when we first allocate memory for tracking allocation
information. It would be better if we could avoid atomic allocations.
Allocate a size of the tracking table that is usually sufficient (one page)
before we take the list lock. We will then only do the atomic allocation
if we need to resize the table to become larger than a page (mostly only
needed under large NUMA because of the tracking of cpus and nodes otherwise
the table stays small).
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Lameter [Tue, 17 Jul 2007 11:03:19 +0000 (04:03 -0700)]
SLUB: use list_for_each_entry for loops over all slabs
Use list_for_each_entry() instead of list_for_each().
Get rid of for_all_slabs(). It had only one user. So fold it into the
callback. This also gets rid of cpu_slab_flush.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Lameter [Tue, 17 Jul 2007 11:03:18 +0000 (04:03 -0700)]
SLUB: change error reporting format to follow lockdep loosely
Changes the error reporting format to loosely follow lockdep.
If data corruption is detected then we generate the following lines:
============================================
BUG <slab-cache>: <problem>
--------------------------------------------
INFO: <more information> [possibly multiple times]
<object dump>
FIX <slab-cache>: <remedial action>
This also adds some more intelligence to the data corruption detection. Its
now capable of figuring out the start and end.
Add a comment on how to configure SLUB so that a production system may
continue to operate even though occasional slab corruption occur through
a misbehaving kernel component. See "Emergency operations" in
Documentation/vm/slub.txt.
[akpm@linux-foundation.org: build fix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rusty Russell [Tue, 17 Jul 2007 11:03:17 +0000 (04:03 -0700)]
mm: clean up and kernelify shrinker registration
I can never remember what the function to register to receive VM pressure
is called. I have to trace down from __alloc_pages() to find it.
It's called "set_shrinker()", and it needs Your Help.
1) Don't hide struct shrinker. It contains no magic.
2) Don't allocate "struct shrinker". It's not helpful.
3) Call them "register_shrinker" and "unregister_shrinker".
4) Call the function "shrink" not "shrinker".
5) Reduce the 17 lines of waffly comments to 13, but document it properly.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: David Chinner <dgc@sgi.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Whitcroft [Tue, 17 Jul 2007 11:03:16 +0000 (04:03 -0700)]
Lumpy Reclaim V4
When we are out of memory of a suitable size we enter reclaim. The current
reclaim algorithm targets pages in LRU order, which is great for fairness at
order-0 but highly unsuitable if you desire pages at higher orders. To get
pages of higher order we must shoot down a very high proportion of memory;
>95% in a lot of cases.
This patch set adds a lumpy reclaim algorithm to the allocator. It targets
groups of pages at the specified order anchored at the end of the active and
inactive lists. This encourages groups of pages at the requested orders to
move from active to inactive, and active to free lists. This behaviour is
only triggered out of direct reclaim when higher order pages have been
requested.
This patch set is particularly effective when utilised with an
anti-fragmentation scheme which groups pages of similar reclaimability
together.
This patch set is based on Peter Zijlstra's lumpy reclaim V2 patch which forms
the foundation. Credit to Mel Gorman for sanitity checking.
Mel said:
The patches have an application with hugepage pool resizing.
When lumpy-reclaim is used used with ZONE_MOVABLE, the hugepages pool can
be resized with greater reliability. Testing on a desktop machine with 2GB
of RAM showed that growing the hugepage pool with ZONE_MOVABLE on it's own
was very slow as the success rate was quite low. Without lumpy-reclaim,
each attempt to grow the pool by 100 pages would yield 1 or 2 hugepages.
With lumpy-reclaim, getting 40 to 70 hugepages on each attempt was typical.
[akpm@osdl.org: ia64 pfn_to_nid fixes and loop cleanup]
[bunk@stusta.de: static declarations for internal functions]
[a.p.zijlstra@chello.nl: initial lumpy V2 implementation]
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: Bob Picco <bob.picco@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Tue, 17 Jul 2007 11:03:15 +0000 (04:03 -0700)]
Add a movablecore= parameter for sizing ZONE_MOVABLE
This patch adds a new parameter for sizing ZONE_MOVABLE called
movablecore=. While kernelcore= is used to specify the minimum amount of
memory that must be available for all allocation types, movablecore= is
used to specify the minimum amount of memory that is used for migratable
allocations. The amount of memory used for migratable allocations
determines how large the huge page pool could be dynamically resized to at
runtime for example.
How movablecore is actually handled is that the total number of pages in
the system is calculated and a value is set for kernelcore that is
kernelcore == totalpages - movablecore
Both kernelcore= and movablecore= can be safely specified at the same time.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Tue, 17 Jul 2007 11:03:14 +0000 (04:03 -0700)]
handle kernelcore=: generic
This patch adds the kernelcore= parameter for x86.
Once all patches are applied, a new command-line parameter exist and a new
sysctl. This patch adds the necessary documentation.
From: Yasunori Goto <y-goto@jp.fujitsu.com>
When "kernelcore" boot option is specified, kernel can't boot up on ia64
because of an infinite loop. In addition, the parsing code can be handled
in an architecture-independent manner.
This patch uses common code to handle the kernelcore= parameter. It is
only available to architectures that support arch-independent zone-sizing
(i.e. define CONFIG_ARCH_POPULATES_NODE_MAP). Other architectures will
ignore the boot parameter.
[bunk@stusta.de: make cmdline_parse_kernelcore() static]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Acked-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Tue, 17 Jul 2007 11:03:13 +0000 (04:03 -0700)]
Allow huge page allocations to use GFP_HIGH_MOVABLE
Huge pages are not movable so are not allocated from ZONE_MOVABLE. However,
as ZONE_MOVABLE will always have pages that can be migrated or reclaimed, it
can be used to satisfy hugepage allocations even when the system has been
running a long time. This allows an administrator to resize the hugepage pool
at runtime depending on the size of ZONE_MOVABLE.
This patch adds a new sysctl called hugepages_treat_as_movable. When a
non-zero value is written to it, future allocations for the huge page pool
will use ZONE_MOVABLE. Despite huge pages being non-movable, we do not
introduce additional external fragmentation of note as huge pages are always
the largest contiguous block we care about.
[akpm@linux-foundation.org: various fixes]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Tue, 17 Jul 2007 11:03:12 +0000 (04:03 -0700)]
Create the ZONE_MOVABLE zone
The following 8 patches against 2.6.20-mm2 create a zone called ZONE_MOVABLE
that is only usable by allocations that specify both __GFP_HIGHMEM and
__GFP_MOVABLE. This has the effect of keeping all non-movable pages within a
single memory partition while allowing movable allocations to be satisfied
from either partition. The patches may be applied with the list-based
anti-fragmentation patches that groups pages together based on mobility.
The size of the zone is determined by a kernelcore= parameter specified at
boot-time. This specifies how much memory is usable by non-movable
allocations and the remainder is used for ZONE_MOVABLE. Any range of pages
within ZONE_MOVABLE can be released by migrating the pages or by reclaiming.
When selecting a zone to take pages from for ZONE_MOVABLE, there are two
things to consider. First, only memory from the highest populated zone is
used for ZONE_MOVABLE. On the x86, this is probably going to be ZONE_HIGHMEM
but it would be ZONE_DMA on ppc64 or possibly ZONE_DMA32 on x86_64. Second,
the amount of memory usable by the kernel will be spread evenly throughout
NUMA nodes where possible. If the nodes are not of equal size, the amount of
memory usable by the kernel on some nodes may be greater than others.
By default, the zone is not as useful for hugetlb allocations because they are
pinned and non-migratable (currently at least). A sysctl is provided that
allows huge pages to be allocated from that zone. This means that the huge
page pool can be resized to the size of ZONE_MOVABLE during the lifetime of
the system assuming that pages are not mlocked. Despite huge pages being
non-movable, we do not introduce additional external fragmentation of note as
huge pages are always the largest contiguous block we care about.
Credit goes to Andy Whitcroft for catching a large variety of problems during
review of the patches.
This patch creates an additional zone, ZONE_MOVABLE. This zone is only usable
by allocations which specify both __GFP_HIGHMEM and __GFP_MOVABLE. Hot-added
memory continues to be placed in their existing destination as there is no
mechanism to redirect them to a specific zone.
[y-goto@jp.fujitsu.com: Fix section mismatch of memory hotplug related code]
[akpm@linux-foundation.org: various fixes]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Tue, 17 Jul 2007 11:03:05 +0000 (04:03 -0700)]
Add __GFP_MOVABLE for callers to flag allocations from high memory that may be migrated
It is often known at allocation time whether a page may be migrated or not.
This patch adds a flag called __GFP_MOVABLE and a new mask called
GFP_HIGH_MOVABLE. Allocations using the __GFP_MOVABLE can be either migrated
using the page migration mechanism or reclaimed by syncing with backing
storage and discarding.
An API function very similar to alloc_zeroed_user_highpage() is added for
__GFP_MOVABLE allocations called alloc_zeroed_user_highpage_movable(). The
flags used by alloc_zeroed_user_highpage() are not changed because it would
change the semantics of an existing API. After this patch is applied there
are no in-kernel users of alloc_zeroed_user_highpage() so it probably should
be marked deprecated if this patch is merged.
Note that this patch includes a minor cleanup to the use of __GFP_ZERO in
shmem.c to keep all flag modifications to inode->mapping in the
shmem_dir_alloc() helper function. This clean-up suggestion is courtesy of
Hugh Dickens.
Additional credit goes to Christoph Lameter and Linus Torvalds for shaping the
concept. Credit to Hugh Dickens for catching issues with shmem swap vector
and ramfs allocations.
[akpm@linux-foundation.org: build fix]
[hugh@veritas.com: __GFP_ZERO cleanup]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
NeilBrown [Tue, 17 Jul 2007 11:03:04 +0000 (04:03 -0700)]
Fix read/truncate race
do_generic_mapping_read currently samples the i_size at the start and doesn't
do so again unless it needs to call ->readpage to load a page. After
->readpage it has to re-sample i_size as a truncate may have caused that page
to be filled with zeros, and the read() call should not see these.
However there are other activities that might cause ->readpage to be called on
a page between the time that do_generic_mapping_read samples i_size and when
it finds that it has an uptodate page. These include at least read-ahead and
possibly another thread performing a read.
So do_generic_mapping_read must sample i_size *after* it has an uptodate page.
Thus the current sampling at the start and after a read can be replaced with
a sampling before the copy-out.
The same change applied to __generic_file_splice_read.
Note that this fixes any race with truncate_complete_page, but does not fix a
possible race with truncate_partial_page. If a partial truncate happens after
do_generic_mapping_read samples i_size and before the copy_out, the nuls that
truncate_partial_page place in the page could be copied out incorrectly.
I think the best fix for that is to *not* zero out parts of the page in
truncate_partial_page, but rather to zero out the tail of a page when
increasing i_size.
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Jens Axboe <jens.axboe@oracle.com>
Acked-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Martin Schwidefsky [Tue, 17 Jul 2007 11:03:03 +0000 (04:03 -0700)]
mm: remove ptep_test_and_clear_dirty and ptep_clear_flush_dirty
Nobody is using ptep_test_and_clear_dirty and ptep_clear_flush_dirty. Remove
the functions from all architectures.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Martin Schwidefsky [Tue, 17 Jul 2007 11:03:03 +0000 (04:03 -0700)]
mm: remove ptep_establish()
The last user of ptep_establish in mm/ is long gone. Remove the architecture
primitive as well.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yoann Padioleau [Tue, 17 Jul 2007 11:03:01 +0000 (04:03 -0700)]
parse error, drivers/i2c/busses/i2c-pmcmsp.c
Signed-off-by: Yoann Padioleau <padator@wanadoo.fr>
Cc: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 17 Jul 2007 15:44:27 +0000 (08:44 -0700)]
Merge branch 'drm-patches' of ssh:///linux/kernel/git/airlied/drm-2.6
* 'drm-patches' of ssh://master.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm: add idr_init to drm_stub.c
drm: fix problem with SiS typedef with sisfb enabled.
Dave Airlie [Tue, 17 Jul 2007 04:20:07 +0000 (14:20 +1000)]
drm: add idr_init to drm_stub.c
Brown paper bag for me this patch chunk didn't make it in the first application
Signed-off-by: Dave Airlie <airlied@linux.ie>
Dave Airlie [Tue, 17 Jul 2007 02:55:58 +0000 (12:55 +1000)]
drm: fix problem with SiS typedef with sisfb enabled.
Reported by: Avuton Olrich <avuton@gmail.com>
Signed-off-by: Dave Airlie <airlied@linux.ie>
Linus Torvalds [Tue, 17 Jul 2007 01:24:37 +0000 (18:24 -0700)]
Merge branch 'drm-patches' of ssh:///linux/kernel/git/airlied/drm-2.6
* 'drm-patches' of ssh://master.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm: convert drawable code to using idr
drm: convert drm context code to use Linux idr
Dave Airlie [Tue, 17 Jul 2007 00:55:47 +0000 (10:55 +1000)]
drm: convert drawable code to using idr
This converts the code for allocating drawables to the Linux idr,
Fixes from: Michel Dänzer <michel@tungstengraphics.com>, Kristian Høgsberg <krh@redhat.com>
Signed-off-by: Dave Airlie <airlied@linux.ie>
Dave Airlie [Tue, 17 Jul 2007 00:46:52 +0000 (10:46 +1000)]
drm: convert drm context code to use Linux idr
This converts the drm context allocator to an idr, using the new idr
interface features from Kristian.
Fixes from Kristian Hoegsberg <krh@redhat.com>
Signed-off-by: Dave Airlie <airlied@linux.ie>
Linus Torvalds [Tue, 17 Jul 2007 00:58:08 +0000 (17:58 -0700)]
Merge branch 'merge' of git://git./linux/kernel/git/paulus/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (209 commits)
[POWERPC] Create add_rtc() function to enable the RTC CMOS driver
[POWERPC] Add H_ILLAN_ATTRIBUTES hcall number
[POWERPC] xilinxfb: Parameterize xilinxfb platform device registration
[POWERPC] Oprofile support for Power 5++
[POWERPC] Enable arbitary speed tty ioctls and split input/output speed
[POWERPC] Make drivers/char/hvc_console.c:khvcd() static
[POWERPC] Remove dead code for preventing pread() and pwrite() calls
[POWERPC] Remove unnecessary #undef printk from prom.c
[POWERPC] Fix typo in Ebony default DTS
[POWERPC] Check for NULL ppc_md.init_IRQ() before calling
[POWERPC] Remove extra return statement
[POWERPC] pasemi: Don't auto-select CONFIG_EMBEDDED
[POWERPC] pasemi: Rename platform
[POWERPC] arch/powerpc/kernel/sysfs.c: Move NUMA exports
[POWERPC] Add __read_mostly support for powerpc
[POWERPC] Modify sched_clock() to make CONFIG_PRINTK_TIME more sane
[POWERPC] Create a dummy zImage if no valid platform has been selected
[POWERPC] PS3: Bootwrapper support.
[POWERPC] powermac i2c: Use mutex
[POWERPC] Schedule removal of arch/ppc
...
Fixed up conflicts manually in:
Documentation/feature-removal-schedule.txt
arch/powerpc/kernel/pci_32.c
arch/powerpc/kernel/pci_64.c
include/asm-powerpc/pci.h
and asked the powerpc people to double-check the result..
Linus Torvalds [Tue, 17 Jul 2007 00:48:54 +0000 (17:48 -0700)]
Merge branch 'upstream-linus' of /linux/kernel/git/jgarzik/netdev-2.6
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6: (37 commits)
forcedeth bug fix: realtek phy
forcedeth bug fix: vitesse phy
forcedeth bug fix: cicada phy
atl1: reorder atl1_main functions
atl1: fix excessively indented code
atl1: cleanup atl1_main
atl1: header file cleanup
atl1: remove irq_sem
cdc-subset to support new vendor/product ID
8139cp: implement the missing dev->tx_timeout
myri10ge: Remove nonsensical limit in the tx done routine
gianfar: kill unused header
EP93XX_ETH must select MII
macb: Add multicast capability
macb: Use generic PHY layer
s390: add barriers to qeth driver
s390: scatter-gather for inbound traffic in qeth driver
eHEA: Introducing support vor DLPAR memory add
Fix a potential NULL pointer dereference in free_shared_mem() in drivers/net/s2io.c
[PATCH] softmac: Fix ESSID problem
...
Linus Torvalds [Tue, 17 Jul 2007 00:33:17 +0000 (17:33 -0700)]
Merge branch 'master' of /linux/kernel/git/davem/sparc-2.6
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
[SERIAL] SUNHV: Fix jerky console on LDOM guests.
[SPARC64]: Fix race between MD update and dr-cpu add.
[SPARC64]: SMP build fix.
David Miller [Tue, 17 Jul 2007 00:17:44 +0000 (17:17 -0700)]
[HRTIMER] Fix cpu pointer arg to clockevents_notify()
All of the clockevent notifiers expect a pointer to
an "unsigned int" cpu argument, but hrtimer_cpu_notify()
passes in a pointer to a long.
[ Discussed with and ok by Thomas Gleixner ]
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David S. Miller [Tue, 17 Jul 2007 00:05:11 +0000 (17:05 -0700)]
[SERIAL] SUNHV: Fix jerky console on LDOM guests.
Mixing putchar() and write() hvcalls does not work %100
correctly. But we should be using write() all the time
if we can, even from ->start_tx(), anyways.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 16 Jul 2007 23:50:36 +0000 (16:50 -0700)]
[SPARC64]: Fix race between MD update and dr-cpu add.
We need to make sure the MD update occurs before we try to
process dr-cpu configure requests. MD update and dr-cpu
were being processed by seperate threads so that did not
happen occaisionally.
Fix this by executing all domain services data packets from
a single thread, in order.
This will help simplify some other things as well.
Signed-off-by: David S. Miller <davem@davemloft.net>
Fabio Massimo Di Nitto [Mon, 16 Jul 2007 21:15:39 +0000 (14:15 -0700)]
[SPARC64]: SMP build fix.
The UP build fix had some unintended consequences.
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 16 Jul 2007 23:52:44 +0000 (16:52 -0700)]
Make BLK_DEV_BSG depend strictly on SCSI=y
The SCSI code can be compiled modular, but BLK_DEV_BSG currently cannot,
and depends on the SCSI layer. So make sure that it depends on the SCSI
layer being compiled in, not just available as a module.
Noticed by Jeff Garzik and S.Çağlar Onur.
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: S.Çağlar Onur <caglar@pardus.org.tr>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 16 Jul 2007 23:50:01 +0000 (16:50 -0700)]
Make check_signature depend on CONFIG_HAS_IOMEM
This should avoid build problems on architectures without a "readb()",
that got bitten by check_signature() being uninlined.
Noted by Heiko Carstens.
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ayaz Abdulla [Sun, 15 Jul 2007 10:51:03 +0000 (06:51 -0400)]
forcedeth bug fix: realtek phy
This patch contains errata fixes for the realtek phy.
Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Ayaz Abdulla [Sun, 15 Jul 2007 10:50:53 +0000 (06:50 -0400)]
forcedeth bug fix: vitesse phy
This patch contains errata fixes for the vitesse phy.
Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Ayaz Abdulla [Sun, 15 Jul 2007 10:50:28 +0000 (06:50 -0400)]
forcedeth bug fix: cicada phy
This patch contains errata fixes for the cicada phy. It only renamed the
defines to be phy specific.
Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Jay Cliburn [Sun, 15 Jul 2007 16:03:29 +0000 (11:03 -0500)]
atl1: reorder atl1_main functions
Reorder functions in atl1_main into more logical groupings to make the
code easier to follow. This patch is large, but it's harmless; it neither
adds nor removes any functionality whatsoever.
Signed-off-by: Jay Cliburn <jacliburn@bellsouth.net>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Jay Cliburn [Sun, 15 Jul 2007 16:03:28 +0000 (11:03 -0500)]
atl1: fix excessively indented code
Move excessively indented code to separate functions. Also move ring
pointer initialization to its own function.
Signed-off-by: Jay Cliburn <jacliburn@bellsouth.net>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Jay Cliburn [Sun, 15 Jul 2007 16:03:27 +0000 (11:03 -0500)]
atl1: cleanup atl1_main
Fix indentation, remove dead code, improve some comments, change dev_dbg to
dev_printk.
Signed-off-by: Jay Cliburn <jacliburn@bellsouth.net>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Jay Cliburn [Sun, 15 Jul 2007 16:03:26 +0000 (11:03 -0500)]
atl1: header file cleanup
Remove unused structure members, improve comments, break long comment lines,
rename a constant to be consistent with others in the file.
Signed-off-by: Jay Cliburn <jacliburn@bellsouth.net>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Jay Cliburn [Sun, 15 Jul 2007 16:03:25 +0000 (11:03 -0500)]
atl1: remove irq_sem
Remove unnecessary irq_sem code.
Signed-off-by: Chris Snook <csnook@redhat.com>
Signed-off-by: Jay Cliburn <jacliburn@bellsouth.net>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
jing xiang [Sat, 14 Jul 2007 06:13:24 +0000 (14:13 +0800)]
cdc-subset to support new vendor/product ID
This patch is for cdc subset to support Mavell vendor/product ID.
Signed-off-by: Jing Xiang <everxiang@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Francois Romieu [Fri, 13 Jul 2007 21:05:35 +0000 (23:05 +0200)]
8139cp: implement the missing dev->tx_timeout
Signed-off-by: Mika Lansirinne <mika.lansirinne@stonesoft.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Brice Goglin [Fri, 13 Jul 2007 18:15:13 +0000 (20:15 +0200)]
myri10ge: Remove nonsensical limit in the tx done routine
Remove nonsensical limit in the tx done routine. Specifically,
the loop will always terminate after processing <= 1 rings worth
of frames, as the mcp index is not refetched, so the removed
conditional could never be true.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Kumar Gala [Fri, 13 Jul 2007 05:38:47 +0000 (00:38 -0500)]
gianfar: kill unused header
A long time ago we used OCP with the gianfar driver. Eventually when
we kill arch/ppc including this will cause issues so lets just kill it now.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
John Donoghue [Fri, 13 Jul 2007 00:12:08 +0000 (02:12 +0200)]
EP93XX_ETH must select MII
CONFIG_EP93XX_ETH=y, CONFIG_MII=n results in an obvious link error.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Patrice Vilchez [Thu, 12 Jul 2007 17:07:25 +0000 (19:07 +0200)]
macb: Add multicast capability
Add multicast capability to Atmel ethernet macb driver.
Signed-off-by: Patrice Vilchez <patrice.vilchez@rfo.atmel.com>
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
frederic RODO [Thu, 12 Jul 2007 17:07:24 +0000 (19:07 +0200)]
macb: Use generic PHY layer
Convert the macb driver to use the generic PHY layer in
drivers/net/phy.
Signed-off-by: Frederic RODO <f.rodo@til-technologies.fr>
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Frank Blaschka [Thu, 12 Jul 2007 10:51:35 +0000 (12:51 +0200)]
s390: add barriers to qeth driver
Add barrier to loop where atomic variable is evaluated.
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: Frank Pavlic <fpavlic@de.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Frank Blaschka [Thu, 12 Jul 2007 10:51:34 +0000 (12:51 +0200)]
s390: scatter-gather for inbound traffic in qeth driver
For large incoming packets > PAGE_SIZE/2 qeth creates a fragmented skb
by adding pointers to qdio pages to the fragment list of the skb.
This avoids allocating big chunks of consecutive memory. Also copying
data from the qdio buffer to the skb is economized.
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: Frank Pavlic <fpavlic@de.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>