Kurt Garloff [Tue, 21 Feb 2006 02:27:51 +0000 (18:27 -0800)]
[PATCH] OOM kill: children accounting
In the badness() calculation, there's currently this piece of code:
/*
* Processes which fork a lot of child processes are likely
* a good choice. We add the vmsize of the children if they
* have an own mm. This prevents forking servers to flood the
* machine with an endless amount of children
*/
list_for_each(tsk, &p->children) {
struct task_struct *chld;
chld = list_entry(tsk, struct task_struct, sibling);
if (chld->mm = p->mm && chld->mm)
points += chld->mm->total_vm;
}
The intention is clear: If some server (apache) keeps spawning new children
and we run OOM, we want to kill the father rather than picking a child.
This -- to some degree -- also helps a bit with getting fork bombs under
control, though I'd consider this a desirable side-effect rather than a
feature.
There's one problem with this: No matter how many or few children there are,
if just one of them misbehaves, and all others (including the father) do
everything right, we still always kill the whole family. This hits in real
life; whether it's javascript in konqueror resulting in kdeinit (and thus the
whole KDE session) being hit or just a classical server that spawns children.
Sidenote: The killer does kill all direct children as well, not only the
selected father, see oom_kill_process().
The idea in attached patch is that we do want to account the memory
consumption of the (direct) children to the father -- however not fully.
This maintains the property that fathers with too many children will still
very likely be picked, whereas a single misbehaving child has the chance to
be picked by the OOM killer.
In the patch I account only half (rounded up) of the children's vm_size to
the parent. This means that if one child eats more mem than the rest of
the family, it will be picked, otherwise it's still the father and thus the
whole family that gets selected.
This is heuristics -- we could debate whether accounting for a fourth would
be better than for half of it. Or -- if people would consider it worth the
trouble -- make it a sysctl. For now I sticked to accounting for half,
which should IMHO be a significant improvement.
The patch does one more thing: As users tend to be irritated by the choice
of killed processes (mainly because the children are killed first, despite
some of them having a very low OOM score), I added some more output: The
selected (father) process will be reported first and it's oom_score printed
to syslog.
Description:
Only account for half of children's vm size in oom score calculation
This should still give the parent enough point in case of fork bombs. If
any child however has more than 50% of the vm size of all children
together, it'll get a higher score and be elected.
This patch also makes the kernel display the oom_score.
Signed-off-by: Kurt Garloff <garloff@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Linus Torvalds [Fri, 17 Feb 2006 22:23:45 +0000 (14:23 -0800)]
Linux v2.6.16-rc4
Bjorn Helgaas [Fri, 17 Feb 2006 21:59:50 +0000 (13:59 -0800)]
[PATCH] ACPI: fix vendor resource length computation
acpi_rs_get_list_length() needs to account for all the vendor-defined data
bytes. Failing to include these causes buffers to be sized too small,
which causes slab corruption when we later convert AML to resources and run
off the end of the buffer.
This causes slab corruption on machines that use ACPI vendor-defined
resources. All HP ia64 machines do, and I'm told that some NEC machines
may as well.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Chris Wright [Fri, 17 Feb 2006 21:59:36 +0000 (13:59 -0800)]
[PATCH] sys_mbind sanity checking
Make sure maxnodes is safe size before calculating nlongs in
get_nodes().
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andrew Morton [Fri, 17 Feb 2006 21:52:58 +0000 (13:52 -0800)]
[PATCH] select: time comparison fixes
I got all of these backwards. We want to return
min(input timeout, new timeout)
to userspace to prevent increasing the time-remaining value.
Thanks to Ernst Herzberg <earny@net4u.de> for reporting and diagnosing.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
David Gibson [Fri, 17 Feb 2006 21:52:56 +0000 (13:52 -0800)]
[PATCH] powerpc: Fix accidentally-working typo in __pud_free_tlb
One of the parameters to the __pud_free_tlb() macro for powerpc is
incorrect (see patch) . We get away with it by accident, because the one
place the macro is called, the second parameter is a variable named "pud".
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Tim Hockin [Fri, 17 Feb 2006 21:52:54 +0000 (13:52 -0800)]
[PATCH] Remove KERN_INFO from middle of printk line
Don't print KERN_INFO in the middle of a printk line.
printk(KERN_INFO "OEM ID: %s ",str);
is just above this. This is already fixed up in i386 copy.
Signed-off-by: Martin J. Bligh <mbligh@google.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Johannes Berg [Fri, 17 Feb 2006 21:52:54 +0000 (13:52 -0800)]
[PATCH] allow windfarm_pm112 module to load
The windfarm_pm112 module relies on smu_sat_get_sdb_partition which is in
windfarm_smu_sat.c but is not exported to modules, so despite Kconfig
having the option to build the pm112 as modules, this can never be loaded.
This patch fixes that by exporting smu_sat_get_sdb_partition with
EXPORT_SYMBOL_GPL
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Miklos Szeredi [Fri, 17 Feb 2006 21:52:52 +0000 (13:52 -0800)]
[PATCH] fuse: fix bug in aborted fuse_release_end()
There's a rather theoretical case of the BUG triggering in
fuse_reset_request():
- iget() fails because of OOM after a successful CREATE_OPEN request
- during IO on the resulting RELEASE request the connection is aborted
Fix and add warning to fuse_reset_request().
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Rafael J. Wysocki [Fri, 17 Feb 2006 21:52:51 +0000 (13:52 -0800)]
[PATCH] swsusp: fix breakage with swap on LVM
Restore the compatibility with the older code and make it possible to
suspend if the kernel command line doesn't contain the "resume=" argument
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Heiko Carstens [Fri, 17 Feb 2006 21:52:50 +0000 (13:52 -0800)]
[PATCH] s390: sys32_fstatat -> sys32_fstatat64
Just rename the compat system call to keep the name consistent with all the
other *64 compat system calls.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Cornelia Huck [Fri, 17 Feb 2006 21:52:49 +0000 (13:52 -0800)]
[PATCH] s390: fix assignment instead of check in ccw_device_set_online()
Fix assignment instead of check in ccw_device_set_online(). Also remove
unneeded assignment in ccw_device_do_sense().
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Heiko Carstens [Fri, 17 Feb 2006 21:52:48 +0000 (13:52 -0800)]
[PATCH] s390: smp initialization speed
The last changes that introduced the additional_cpus command line parameter
also introduced a regression regarding smp initialization speed. In
smp_setup_cpu_possible_map() cpu_present_map is set to the same value as
cpu_possible_map. Especially that means that bits in the present map will be
set for cpus that are not present. This will cause a slow down in the initial
cpu_up() loop in smp_init() since trying to take cpus online that aren't
present takes a while.
Fix this by setting only bits for present cpus in cpu_present_map and set
cpu_present_map to cpu_possible_map in smp_cpus_done().
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Heiko Carstens [Fri, 17 Feb 2006 21:52:47 +0000 (13:52 -0800)]
[PATCH] s390: possible_cpus parameter
Introduce possible_cpus command line option. Hard sets the number of bits set
in cpu_possible_map. Unlike the additional_cpus parameter this one guarantees
that num_possible_cpus() will stay constant even if the system gets rebooted
and a different number of cpus are present at startup.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Heiko Carstens [Fri, 17 Feb 2006 21:52:46 +0000 (13:52 -0800)]
[PATCH] s390: additional_cpus parameter
Introduce additional_cpus command line option. By default no additional cpu
can be attached to the system anymore. Only the cpus present at IPL time can
be switched on/off. If it is desired that additional cpus can be attached to
the system the maximum number of additional cpus needs to be specified with
this option.
This change is necessary in order to limit the waste of per_cpu data
structures.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Heiko Carstens [Fri, 17 Feb 2006 21:52:46 +0000 (13:52 -0800)]
[PATCH] s390: fix preempt_count of idle thread with cpu hotplug
Set preempt_count of idle_thread to zero before switching off cpu. Otherwise
the preempt_count will be wrong if the cpu is switched on again since the
thread will be reused.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Cornelia Huck [Fri, 17 Feb 2006 21:52:45 +0000 (13:52 -0800)]
[PATCH] s390: ccw device disbanding
If __ccw_device_disband_start() fails to initiate disbanding, it should finish
with ccw_device_disband_done() (which leaves the device in offline state)
instead of ccw_device_verify_done() (which leaves the device in online state).
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Ingo Molnar [Fri, 17 Feb 2006 21:52:44 +0000 (13:52 -0800)]
[PATCH] Introduce CONFIG_DEFAULT_MIGRATION_COST
Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
The boot sequence on s390 sometimes takes ages and we spend a very long
time (up to one or two minutes) in calibrate_migration_costs. The time
spent there differs from boot to boot. Also the calculated costs differ
a lot. I've seen differences by up to a factor of 15 (yes, factor not
percent). Also I doubt that making these measurements make much sense on
a completely virtualized architecture where you cannot tell how much cpu
time you will get anyway.
So introduce the CONFIG_DEFAULT_MIGRATION_COST method for an architecture
to set the scheduler migration costs. This turns off automatic detection
of migration costs. Makes sense on virtual platforms, where migration
costs are hard to measure accurately.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Adrian Bunk [Fri, 17 Feb 2006 21:52:42 +0000 (13:52 -0800)]
[PATCH] arch/sh/Kconfig: fix the ISA_DMA_API dependencies
Jean-Luc Leger <reiga@dspnet.fr.eu.org> found this obvious typo.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Marcel Selhorst [Fri, 17 Feb 2006 21:52:41 +0000 (13:52 -0800)]
[PATCH] Infineon TPM: IO-port leakage fix, WTX-bugfix
Fix IO-port leakage from request_region in case of error during TPM
initialization, adds more pnp-verification and fixes a WTX-bug.
Signed-off-by: Marcel Selhorst <selhorst@crypto.rub.de>
Acked-by: Kylene Jo Hall <kjhall@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Peter Staubach [Fri, 17 Feb 2006 21:52:36 +0000 (13:52 -0800)]
[PATCH] fix deadlock in ext2
Fix a deadlock possible in the ext2 file system implementation. This
deadlock occurs when a file is removed from an ext2 file system which was
mounted with the "sync" mount option.
The problem is that ext2_xattr_delete_inode() was invoking the routine,
sync_dirty_buffer(), using a buffer head which was previously locked via
lock_buffer(). The first thing that sync_dirty_buffer() does is to lock
the buffer head that it was passed. It does this via lock_buffer(). Oops.
The solution is to unlock the buffer head in ext2_xattr_delete_inode()
before invoking sync_dirty_buffer(). This makes the code in
ext2_xattr_delete_inode() obey the same locking rules as all other callers
of sync_dirty_buffer() in the ext2 file system implementation.
Signed-off-by: Peter Staubach <staubach@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Linus Torvalds [Fri, 17 Feb 2006 21:47:16 +0000 (13:47 -0800)]
Merge branch 'upstream-fixes' of /linux/kernel/git/jgarzik/libata-dev
Jens Axboe [Wed, 15 Feb 2006 14:59:25 +0000 (15:59 +0100)]
[PATCH] Add missing FUA write to sata_mv dma command list
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Albert Lee [Mon, 13 Feb 2006 10:55:25 +0000 (18:55 +0800)]
[PATCH] libata: minor fix for 2.6.16-rc3
- Fix the array index value in ata_rwcmd_protocol() for the added FUA commands.
- Filter out ATAPI packet command error messages in ata_pio_error()
Signed-off-by: Albert Lee <albertcc@tw.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Linus Torvalds [Fri, 17 Feb 2006 21:27:40 +0000 (13:27 -0800)]
Merge branch 'upstream-fixes' of /linux/kernel/git/jgarzik/netdev-2.6
Dan Williams [Wed, 14 Dec 2005 20:10:49 +0000 (13:10 -0700)]
[PATCH] Necessary evil to get sata_vsc to initialize with Intel iq3124h hba
* libata does not care about error interrupts, so handle them locally
* the interrupts that are ignored only appear to happen at init time
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Linus Torvalds [Fri, 17 Feb 2006 19:38:21 +0000 (20:38 +0100)]
[PATCH] Handle holes in node mask in node fallback list setup
Change the find_next_best_node algorithm to correctly skip
over holes in the node online mask. Previously it would not handle
missing nodes correctly and cause crashes at boot.
[Written by Linus, tested by AK]
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jay Vosburgh [Wed, 8 Feb 2006 05:17:22 +0000 (21:17 -0800)]
[PATCH] bonding: fix a locking bug in bond_release
bond_release returns EINVAL without releasing the bond lock if the
slave device is not being bonded by the bond. The following patch
ensures that the lock is released in this case.
Signed-off-by: Stephen J. Bevan <stephen@dino.dnsalias.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Frank Pavlic [Tue, 7 Feb 2006 16:04:38 +0000 (17:04 +0100)]
[PATCH] s390: some qeth driver fixes
[patch 2/2] s390: some qeth driver fixes
From: Frank Pavlic <fpavlic@de.ibm.com>
- fixed kernel panic when using EDDP support in Layer 2 mode
- NULL pointer exception in qeth_set_offline fixed.
- setting EDDP in Layer 2 mode did not set NETIF_F_(SG/TSO)
flags when device became online.
- use sscanf for parsing and converting IPv4 addresses
from string to __u8 values.
- qeth_string_to_ipaddr6 fixed. in case of double colon
the converted IPv6 address out from the string was not correct
in previous implementation.
Signed-off-by: Frank Pavlic <fpavlic@de.ibm.com>
diffstat:
qeth.h | 112 +++++++++++++++++++++++++-----------------------------------
qeth_eddp.c | 11 ++++-
qeth_main.c | 17 +++------
3 files changed, 63 insertions(+), 77 deletions(-)
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Frank Pavlic [Tue, 7 Feb 2006 16:04:36 +0000 (17:04 +0100)]
[PATCH] s390: lcs performance enhancements
[patch 1/2] s390: lcs performance enhancements
From: Klaus Wacker <kdwacker@de.ibm.com>
- When flood pinging (with large packet size) an LCS device,
about 90 % of all packets are dropped by driver.
- increased number of lcs IO buffers to 32.
- use netif_stop_queue/netif_wake_queue in lcs_start_xmit routine
- don't lock the whole xmit routine but just the piece of code where
tx_buffer is touched.
Signed-off-by: Frank Pavlic <fpavlic@de.ibm.com>
diffstat:
lcs.c | 31 +++++++++++++++++--------------
lcs.h | 2 +-
2 files changed, 18 insertions(+), 15 deletions(-)
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Andrew Morton [Fri, 10 Feb 2006 10:00:43 +0000 (02:00 -0800)]
[PATCH] smctr warning fix
drivers/net/tokenring/smctr.c: In function `smctr_load_firmware':
drivers/net/tokenring/smctr.c:2981: warning: assignment discards qualifiers from pointer target type
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Stephen Hemminger [Fri, 10 Feb 2006 23:58:59 +0000 (15:58 -0800)]
[PATCH] sky2: speed setting fix
Users report problems w/ auto-negotiation disabled and the link set
to 100/Half or 10/Half. Problems range from poor performance to no
link at all.
The current sky2 code does not set things properly on link up if
autonegotiation is disabled. Plus it does not contemplate a 10Mbit
setting at all. This patch corrects that.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Stephen Hemminger [Mon, 13 Feb 2006 23:46:48 +0000 (15:46 -0800)]
[PATCH] skge: speed setting
This is a clone of John Linville's fixed for speed setting on sky2 driver.
The skge driver has the same code (and bug). It would not allow manually forcing
100 and 10 mbit.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Stephen Hemminger [Mon, 13 Feb 2006 23:48:08 +0000 (15:48 -0800)]
[PATCH] skge: no longer experimental
Take the experimental dependency of skge driver, it is as stable as the
others.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Stephen Hemminger [Mon, 13 Feb 2006 23:49:55 +0000 (15:49 -0800)]
[PATCH] sk98lin: no d-link support (kconfig)
The sk98lin driver was changed a while ago to remove support for the
D-Link 530T card because that hardware has no working VPD data. The help
text for Kconfig was not updated.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Jean Tourrilhes [Fri, 17 Feb 2006 01:44:54 +0000 (17:44 -0800)]
[PATCH] Wavelan_cs bitfield fixes
Some bitfields were incorrectly initialised in wavelan_cs,
causing some compiler warning. Also killed a error message that should
not be there...
Signed-off-by: Jean Tourrilhes <jt@hpl.hp.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Jeff Garzik [Fri, 17 Feb 2006 21:11:47 +0000 (16:11 -0500)]
Merge branch 'for-jeff' of git://electric-eye.fr.zoreil.com/home/romieu/linux-2.6
Jeff Garzik [Fri, 17 Feb 2006 21:10:20 +0000 (16:10 -0500)]
Merge branch 'upstream-fixes' of git://git./linux/kernel/git/linville/wireless-2.6
Chuck Ebbert [Fri, 17 Feb 2006 08:16:55 +0000 (03:16 -0500)]
[PATCH] i386: fix singlestepping though a syscall
Do not mask TIF_SINGLESTEP bit in _TIF_WORK_MASK. Masking this stopped
do_notify_resume() from being called when it should have been.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Joshua Kinard [Fri, 17 Feb 2006 03:52:25 +0000 (03:52 +0000)]
[PATCH] Fix SGI O2 compile error in drivers/video/gbefb.c
A sysfs function call uses the wrong parameter, and thus breaks a build on
SGI O2.
CC drivers/video/gbefb.o
drivers/video/gbefb.c: In function ‘gbefb_remove’:
drivers/video/gbefb.c:1246: error: ‘dev’ undeclared (first use in this function)
drivers/video/gbefb.c:1246: error: (Each undeclared identifier is reported only once
drivers/video/gbefb.c:1246: error: for each function it appears in.)
make[2]: *** [drivers/video/gbefb.o] Error 1
Signed-off-by: Joshua Kinard <kumba@gentoo.org>
Signed-off-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Martin Michlmayr <tbm@cyrius.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Paul Mackerras [Thu, 16 Feb 2006 23:30:23 +0000 (10:30 +1100)]
[PATCH] Provide an interface for getting the current tick length
This provides an interface for arch code to find out how many
nanoseconds are going to be added on to xtime by the next call to
do_timer. The value returned is a fixed-point number in 52.12 format
in nanoseconds. The reason for this format is that it gives the
full precision that the timekeeping code is using internally.
The motivation for this is to fix a problem that has arisen on 32-bit
powerpc in that the value returned by do_gettimeofday drifts apart
from xtime if NTP is being used. PowerPC is now using a lockless
do_gettimeofday based on reading the timebase register and performing
some simple arithmetic. (This method of getting the time is also
exported to userspace via the VDSO.) However, the factor and offset
it uses were calculated based on the nominal tick length and weren't
being adjusted when NTP varied the tick length.
Note that 64-bit powerpc has had the lockless do_gettimeofday for a
long time now. It also had an extremely hairy routine that got called
from the 32-bit compat routine for adjtimex, which adjusted the
factor and offset according to what it thought the timekeeping code
was going to do. Not only was this only called if a 32-bit task did
adjtimex (i.e. not if a 64-bit task did adjtimex), it was also
duplicating computations from kernel/timer.c and it wasn't clear that
it was (still) correct.
The simple solution is to ask the timekeeping code how long the
current jiffy will be on each timer interrupt, after calling
do_timer. If this jiffy will be a different length from the last one,
we then need to compute new values for the factor and offset used in
the lockless do_gettimeofday. In this way we can keep xtime and
do_gettimeofday in sync, even when NTP is varying the tick length.
Note that when adjtimex varies the tick length, it almost always
introduces the variation from the next tick on. The only case I could
see where adjtimex would vary the length of the current tick is when
an old-style adjtime adjustment is being cancelled. (It's not clear
to me why the adjustment has to be cancelled immediately rather than
from the next tick on.) Thus I don't see any real need for a hook in
adjtimex; the rare case of an old-style adjustment being cancelled can
be fixed up at the next tick.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: john stultz <johnstul@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andi Kleen [Fri, 17 Feb 2006 00:39:16 +0000 (01:39 +0100)]
[PATCH] Handle all and empty zones when setting up custom zonelists for mbind
The memory allocator doesn't like empty zones (which have an
uninitialized freelist), so a x86-64 system with a node fully
in GFP_DMA32 only would crash on mbind.
Fix that up by putting all possible zones as fallback into the zonelist
and skipping the empty ones.
In fact the code always enough allocated space for all zones,
but only used it for the highest. This change just uses all the
memory that was allocated before.
This should work fine for now, but whoever implements node hot removal
needs to fix this somewhere else too (or make sure zone datastructures
by itself never go away, only their memory)
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Linus Torvalds [Fri, 17 Feb 2006 16:16:35 +0000 (08:16 -0800)]
Merge branch 'release' of git://git./linux/kernel/git/aegl/linux-2.6
Linus Torvalds [Fri, 17 Feb 2006 16:13:38 +0000 (08:13 -0800)]
Merge master.kernel.org:/home/rmk/linux-2.6-serial
Linus Torvalds [Fri, 17 Feb 2006 16:13:11 +0000 (08:13 -0800)]
Merge master.kernel.org:/home/rmk/linux-2.6-arm
Linus Torvalds [Fri, 17 Feb 2006 16:12:08 +0000 (08:12 -0800)]
Merge master.kernel.org:/home/rmk/linux-2.6-mmc
Andi Kleen [Thu, 16 Feb 2006 22:42:16 +0000 (23:42 +0100)]
[PATCH] x86_64: Always pass full number of nodes to NUMA hash computation
Previously the numa hash code would be confused by holes in the node space
and stop early. This is the first part of the fix for the non boot issue
with empty nodes on Opterons.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andi Kleen [Thu, 16 Feb 2006 22:42:13 +0000 (23:42 +0100)]
[PATCH] x86_64: Relax SRAT covers all memory check a bit
Code was refusing good SRATs because about 12K got lost somewhere.
Allow less than 1MB of difference before rejecting it.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andi Kleen [Thu, 16 Feb 2006 22:42:10 +0000 (23:42 +0100)]
[PATCH] x86_64: Resolve the RIP of an early exception using kallsyms
But do it after everything else to risk less from recursive
crashes.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andi Kleen [Thu, 16 Feb 2006 22:42:07 +0000 (23:42 +0100)]
[PATCH] x86_64: Disable tsc when apicpmtimer is active
Otherwise it has no effect anyways.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andi Kleen [Thu, 16 Feb 2006 22:42:04 +0000 (23:42 +0100)]
[PATCH] x86_64: Don't enable ATI apicmaintimer workaround when the machine has C2 or C3
Many laptops have problems with ticking the local APIC timer in C2/C3.
The code added earlier to use it by default on ATI didn't really work
for them. Don't enable it when the system supports C2/C3.
This doesn't fix the problem fully, but at least it's not worse than before.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andi Kleen [Thu, 16 Feb 2006 22:42:01 +0000 (23:42 +0100)]
[PATCH] x86_64: Don't call do_exit with interrupts disabled after IRET exception
This caused a sigreturn with bad argument on a preemptible kernel
to complain with
Debug: sleeping function called from invalid context at /home/lsrc/quilt/linux/include/linux/rwsem.h:43
in_atomic():0, irqs_disabled():1
Call Trace: {__might_sleep+190} {profile_task_exit+21}
{__do_exit+34} {do_wait+0}
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andi Kleen [Thu, 16 Feb 2006 22:41:58 +0000 (23:41 +0100)]
[PATCH] x86_64: Add boot option to disable randomized mappings and cleanup
AMD SimNow!'s JIT doesn't like them at all in the guest. For distribution
installation it's easiest if it's a boot time option.
Also I moved the variable to a more appropiate place and make
it independent from sysctl
And marked __read_mostly which it is.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jan Beulich [Thu, 16 Feb 2006 22:41:55 +0000 (23:41 +0100)]
[PATCH] x86_64: make touch_nmi_watchdog() not touch impossible cpus' private data
Along with that, also suppress the memory touching altogether when the
watchdog is not running, to eliminate needless crosstalk. Plus ad a call
to it to make things consistent (one could also consider removing the call
in enable_timer_nmi_watchdog()).
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andi Kleen [Thu, 16 Feb 2006 22:41:52 +0000 (23:41 +0100)]
[PATCH] x86_64: Update defconfig
... and enable 1394 by default.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Dan Williams [Sun, 5 Feb 2006 22:55:16 +0000 (17:55 -0500)]
[PATCH] wireless/atmel: fix Open System authentication process bugs
This patch fixes a number of bugs in the authentication process:
1) When falling back to Shared Key authentication mode from Open System,
a missing 'return' would cause the auth request to be sent, but would
drop the card into Management Error state. When falling back, the
driver should also indicate that it is switching to Shared Key mode by
setting exclude_unencrypted.
2) Initial authentication modes were apparently wrong in some cases,
causing the driver to attempt Shared Key authentication mode when in
fact the access point didn't support that mode or even had WEP disabled.
The driver should set the correct initial authentication mode based on
wep_is_on and exclude_unencrypted.
3) Authentication response packets from the access point in Open System
mode were getting ignored because the driver was expecting the sequence
number of a Shared Key mode response. The patch separates the OS and SK
mode handling to provide the correct behavior.
Signed-off-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Dan Williams [Sun, 5 Feb 2006 22:52:21 +0000 (17:52 -0500)]
[PATCH] wireless/atmel: fix setting TX key only in ENCODEEXT
The previous patch that added ENCODEEXT and AUTH support to the atmel
driver contained a slight error which would cause just setting the TX
key index to also set the encryption key again. This patch allows any
combination of setting the TX key index and setting an encryption key.
Signed-off-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Nicolas Pitre [Thu, 16 Feb 2006 22:36:15 +0000 (22:36 +0000)]
[ARM] 3339/1: ARM EABI: make unmuxed syscalls visible
Patch from Nicolas Pitre
With EABI the multiplex sys_ipc and sys_socketcall syscalls are
unavailable and their support code even removed from the compiled
kernel, and the new unmuxed syscalls must be used instead.
Make those syscall numbers visible.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Nicolas Pitre [Thu, 16 Feb 2006 22:36:13 +0000 (22:36 +0000)]
[ARM] 3338/1: old ABI compat: sys_socketcall
Patch from Nicolas Pitre
Commit
99595d0237926b5aba1fe4c844a011a1ba1ee1f8 forgot to intercept
sys_socketcall as well.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Martin Michlmayr [Thu, 16 Feb 2006 22:36:12 +0000 (22:36 +0000)]
[ARM] 3337/1: Fix NSLU2 flash support according to window size configuration patch
Patch from Martin Michlmayr
ARM patch 3226/1 (IXP4xx runtime expansion bus window size configuration)
forgot to update mach-ixp4xx/nslu2-setup.c which leads to the following
compilation error. Update NSLU2 flash support following patch 3226/1.
CC arch/arm/mach-ixp4xx/nslu2-setup.o
arch/arm/mach-ixp4xx/nslu2-setup.c:30: error: \91NSLU2_FLASH_BASE\92 undeclared here (not in a function)
arch/arm/mach-ixp4xx/nslu2-setup.c:31: error: \91NSLU2_FLASH_SIZE\92 undeclared here (not in a function)
make[1]: *** [arch/arm/mach-ixp4xx/nslu2-setup.o] Error 1
make: *** [arch/arm/mach-ixp4xx] Error 2
Signed-off-by: Martin Michlmayr <tbm@cyrius.com>
---
nslu2-setup.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Ashok Raj [Thu, 16 Feb 2006 22:01:48 +0000 (14:01 -0800)]
[IA64] Count disabled cpus as potential hot-pluggable CPUs
Minor updates to earlier patch.
- Added to documentation to add ia64 as well.
- Minor clarification on how to use disabled cpus
- used plain max instead of max_t per Andew Morton.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Linus Torvalds [Thu, 16 Feb 2006 22:04:08 +0000 (14:04 -0800)]
Merge branch 'upstream-linus' of git://oss.oracle.com/home/sourcebo/git/ocfs2
Francois Romieu [Thu, 16 Feb 2006 21:17:00 +0000 (22:17 +0100)]
sis190: early setting of the pci driver private data
Below this point, the error path will proceed through
sis190_release_board(). It will happily oops if
pci_set_drvdata() has not been issued.
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Linus Torvalds [Thu, 16 Feb 2006 20:47:44 +0000 (12:47 -0800)]
Merge /linux/kernel/git/jejb/scsi-rc-fixes-2.6
Kurt Hackel [Tue, 14 Feb 2006 19:45:21 +0000 (11:45 -0800)]
[PATCH] ocfs2: detach from heartbeat events before freeing mle
Signed-off-by: Kurt Hackel <Kurt.Hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Mark Fasheh [Thu, 9 Feb 2006 21:23:39 +0000 (13:23 -0800)]
[PATCH] ocfs2: only checkpoint journal when asked to
Disable automatic checkpointing of the journal - this is a relic from older
ocfs2 days. Worth quite a bit of performance on longer running single node
tests.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Kurt Hackel [Thu, 19 Jan 2006 01:07:47 +0000 (17:07 -0800)]
[PATCH] ocfs2: manually grant remote recovery lock
* fix a hang in recovery that occurred in dlmlock_remote. the $RECOVERY
lock was never moved to the granted queue even after getting DLM_NORMAL
back from the master node.
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Kurt Hackel [Thu, 19 Jan 2006 01:05:38 +0000 (17:05 -0800)]
[PATCH] ocfs2: add dlm_wait_for_node_death
* add dlm_wait_for_node_death function to be used after receiving a network
error. this will wait for the given timeout to allow the heartbeat
callbacks to update the domain map. without this, some paths may spin
and consume enough cpu that the heartbeat gets starved and never updates.
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Kurt Hackel [Thu, 19 Jan 2006 01:02:56 +0000 (17:02 -0800)]
[PATCH] ocfs2: fix release of ast never reserved
* fix a bug in dlm_convert_lock_handler where dlm_lockres_release_ast was
being called even if no ast was ever reserved
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Kurt Hackel [Thu, 19 Jan 2006 01:01:25 +0000 (17:01 -0800)]
[PATCH] ocfs2: recheck recovery state after getting lock
* after successfully taking the $RECOVERY lock in EX mode, recheck to make
sure that recovery has not already begun or completed on another node
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Jack Steiner [Thu, 16 Feb 2006 01:46:50 +0000 (19:46 -0600)]
[IA64] Missing check for TIF_WORK if trace/audit enabled
It appears that if auditing is enabled, the kernel fails to
check for pending signals before returning to user mode.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Acked-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Russell King [Thu, 16 Feb 2006 16:48:31 +0000 (16:48 +0000)]
[MMC] mmci: allow small data transfers
If a data transfer is small (less than a FIFO size) we would
hang waiting for the data to be read due to the PIO interrupt
not occuring. We allowed for this in our PIO interrupt handler,
but not when setting up a data transfer.
Apply the "fix" when setting up a data transfer as well.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
NeilBrown [Thu, 16 Feb 2006 03:43:01 +0000 (14:43 +1100)]
[PATCH] Fix over-zealous tag clearing in radix_tree_delete
If a tag is set for a node being deleted from a radix_tree, then that
tag gets cleared from the parent of the node, even if it is set for some
siblings of the node begin deleted.
This patch changes the logic to include a test for any_tag_set similar
to the logic a little futher down. Care is taken to ensure that
'nr_cleared_tags' remains equals to the number of entries in the 'tags'
array which are set to '0' (which means that this tag is not set in the
tree below pathp->node, and should be cleared at pathp->node and
possibly above.
[ Nick says: "Linus FYI, I was able to modify the radix tree test
harness to catch the bug and can no longer trigger it after the fix.
Resulting code passes all other harness tests as well of course." ]
Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Russell King [Thu, 16 Feb 2006 11:08:09 +0000 (11:08 +0000)]
[ARM] Fix SMP initialisation oops
A change to the SMP initialisation caused the following oops:
CPU1: Booted secondary processor
CPU1: D VIPT write-back cache
CPU1: I cache: 32768 bytes, associativity 4, 32 byte lines, 256 sets
CPU1: D cache: 32768 bytes, associativity 4, 32 byte lines, 256 sets
<7>Calibrating delay loop... 83.14 BogoMIPS (lpj=415744)
<1>Unable to handle kernel NULL pointer dereference at virtual address
0000001c
...
PC is at enqueue_task+0x1c/0x64
LR is at activate_task+0xcc/0xe4
SMP initialisation now requires cpu_possible_map to be initialised in
setup_arch(). Move this from smp_prepare_cpus() to smp_init_cpus()
and call it from our setup_arch() if CONFIG_SMP is enabled.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Linus Torvalds [Thu, 16 Feb 2006 04:00:48 +0000 (20:00 -0800)]
Merge /pub/scm/linux/kernel/git/dtor/input
Linus Torvalds [Thu, 16 Feb 2006 03:56:33 +0000 (19:56 -0800)]
Merge /pub/scm/linux/kernel/git/davem/net-2.6
Bjorn Helgaas [Wed, 15 Feb 2006 23:17:43 +0000 (15:17 -0800)]
[PATCH] mmconfig: add kernel parameter documentation
Mention the "pci=nommconf" option in kernel-parameters.txt.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andrew Morton [Wed, 15 Feb 2006 23:17:42 +0000 (15:17 -0800)]
[PATCH] swsusp: nuke noisy message
I get about 88 squillion of these when suspending an old ad450nx server.
Cc: Pavel Roskin <proski@gnu.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Daniel Yeisley [Wed, 15 Feb 2006 23:17:41 +0000 (15:17 -0800)]
[PATCH] x86_64: early initialization of cpu_to_node
The early initialization of cpu_to_node code as it is now only updates the
cpu_to_node array, and does not update cpu_pda()->nodemember. This will
cause numa_node_id() to return 0 on systems where CPU 0 is not on Node 0.
This leads to a kernel panic in slab.c.
I've tested the patch below on a 16 processor x86_64 ES7000-600 server, and
no longer see the panic I saw with the original 2.6.16-rc3.
Signed-off-by: Dan Yeisley <dan.yeisley@unisys.com>
Acked-by: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Roman Zippel [Wed, 15 Feb 2006 23:17:40 +0000 (15:17 -0800)]
[PATCH] hrtimer: fix multiple macro argument expansion
For two macros the arguments were expanded twice, change them to inline
functions to avoid it.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Michael S. Tsirkin [Wed, 15 Feb 2006 23:17:39 +0000 (15:17 -0800)]
[PATCH] add asm-generic/mman.h
Make new MADV_REMOVE, MADV_DONTFORK, MADV_DOFORK consistent across all
arches. The idea is to make it possible to use them portably even before
distros include them in libc headers.
Move common flags to asm-generic/mman.h
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Paul Jackson [Wed, 15 Feb 2006 23:17:38 +0000 (15:17 -0800)]
[PATCH] cpuset: oops in exit on null cpuset fix
Fix a latent bug in cpuset_exit() handling. If a task tried to allocate
memory after calling cpuset_exit(), it oops'd in
cpuset_update_task_memory_state() on a NULL cpuset pointer.
So set the exiting tasks cpuset to the root cpuset instead of to NULL.
A distro kernel hit this with an added kernel package that had just such a
hook (allocating memory) in the exit code path.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andrew Morton [Wed, 15 Feb 2006 23:17:37 +0000 (15:17 -0800)]
[PATCH] ide: touch softlockup detector during pio
We're getting some softlockup false positives during heavy PIO operations. So
poke the lockup detector.
Cc: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Benjamin LaHaise [Wed, 15 Feb 2006 23:17:35 +0000 (15:17 -0800)]
[PATCH] kbuild: revert "fix make -jN with multiple targets with O=..."
Commit
296e0855b0f9a4ec9be17106ac541745a55b2ce1:
"kbuild: fix make -jN with multiple targets with O=..."
causes a ~95% increase in build time for the kernel. Before: 4m21s
after: 8m1.403s. Can we revert this until another approach is found?
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Christian Trefzer [Wed, 15 Feb 2006 23:17:34 +0000 (15:17 -0800)]
[PATCH] neofb: avoid resetting display config on unblank (v2)
There were two mistakes in the register-read-on-(un)blank approach.
- First, without proper register (un)locking the value read back will always
be zero, and this is what I missed entirely until just now. Due to this,
the logic could not be verified at all and I tried some bogus checks which
are completely stupid.
- Second, the LCD status bit will always be set to zero when the backlight
has been turned off. Reading the value back during unblank will disable the
LCD unconditionally, regardless of the state it is supposed to be in, since
we set it to zero beforehand.
So this is what we do now:
- create a new variable in struct neofb_par, and use that to determine
whether to read back registers (initialized to true)
- before actually blanking the screen, read back the register to sense any
possible change made through Fn key combo
- use proper neoUnlock() / neoLock() to actually read something
- every call to neofb_blank() determines if we read back next time: blanking
disables readback, unblanking (FB_BLANK_UNBLANK) enables it
This should give us a nice and clean state machine. Has been thoroughly
tested on a Dell Latitude CPiA / NM220 Chip docked to a C/Dock2 with attached
CRT in all possible combinations of LCD/CRT on/off. I changed the config via
Fn key, let the console blank, unblanked by keypress - works flawlessly.
Signed-off-by: Christian Trefzer <ctrefzer@gmx.de>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Yasuyuki Kozakai [Wed, 15 Feb 2006 23:25:18 +0000 (15:25 -0800)]
[NETFILTER]: nf_conntrack: Fix TCP/UDP HW checksum handling for IPv6 packet
If skb->ip_summed is CHECKSUM_HW here, skb->csum includes checksum
of actual IPv6 header and extension headers. Then such excess
checksum must be subtruct when nf_conntrack calculates TCP/UDP checksum
with pseudo IPv6 header. Spotted by Ben Skeggs.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yasuyuki Kozakai [Wed, 15 Feb 2006 23:24:15 +0000 (15:24 -0800)]
[NETFILTER]: nf_conntrack: attach conntrack to locally generated ICMPv6 error
Locally generated ICMPv6 errors should be associated with the conntrack
of the original packet. Since the conntrack entry may not be in the hash
tables (for the first packet), it must be manually attached.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yasuyuki Kozakai [Wed, 15 Feb 2006 23:23:28 +0000 (15:23 -0800)]
[NETFILTER]: nf_conntrack: attach conntrack to TCP RST generated by ip6t_REJECT
TCP RSTs generated by the REJECT target should be associated with the
conntrack of the original TCP packet. Since the conntrack entry is
usually not is the hash tables, it must be manually attached.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yasuyuki Kozakai [Wed, 15 Feb 2006 23:22:21 +0000 (15:22 -0800)]
[NETFILTER]: nf_conntrack: move registration of __nf_ct_attach
Move registration of __nf_ct_attach to nf_conntrack_core to make it usable
for IPv6 connection tracking as well.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yasuyuki Kozakai [Wed, 15 Feb 2006 23:21:31 +0000 (15:21 -0800)]
[NETFILTER]: x_tables: fix dependencies of conntrack related modules
NF_CONNTRACK_MARK is bool and depends on NF_CONNTRACK which is
tristate. If a variable depends on NF_CONNTRACK_MARK and doesn't take
care about NF_CONNTRACK, it can be y even if NF_CONNTRACK isn't y.
NF_CT_ACCT have same issue, too.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Wed, 15 Feb 2006 23:18:19 +0000 (15:18 -0800)]
[NETFILTER]: Don't invoke okfn in CONFIG_NETFILTER=n variant of nf_hook()
nf_hook() is supposed to call the netfilter hook and return control of the
packet back to the caller in case it may pass, the okfn is only used for
queueing.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Luck [Wed, 15 Feb 2006 23:17:57 +0000 (15:17 -0800)]
Pull fix-cpu-possible-map into release branch
Horms [Wed, 15 Feb 2006 08:23:09 +0000 (17:23 +0900)]
[IA64] support panic_on_oops sysctl
Trivial port of this feature from i386
As it stands, panic_on_oops but does nothing on ia64
Signed-Off-By: Horms <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Patrick McHardy [Wed, 15 Feb 2006 23:10:22 +0000 (15:10 -0800)]
[XFRM]: Fix SNAT-related crash in xfrm4_output_finish
When a packet matching an IPsec policy is SNATed so it doesn't match any
policy anymore it looses its xfrm bundle, which makes xfrm4_output_finish
crash because of a NULL pointer dereference.
This patch directs these packets to the original output path instead. Since
the packets have already passed the POST_ROUTING hook, but need to start at
the beginning of the original output path which includes another
POST_ROUTING invocation, a flag is added to the IPCB to indicate that the
packet was rerouted and doesn't need to pass the POST_ROUTING hook again.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
hawkes@sgi.com [Tue, 14 Feb 2006 18:40:17 +0000 (10:40 -0800)]
[IA64] ia64: simplify and fix udelay()
The original ia64 udelay() was simple, but flawed for platforms without
synchronized ITCs: a preemption and migration to another CPU during the
while-loop likely resulted in too-early termination or very, very
lengthy looping.
The first fix (now in 2.6.15) broke the delay loop into smaller,
non-preemptible chunks, reenabling preemption between the chunks. This
fix is flawed in that the total udelay is computed to be the sum of just
the non-premptible while-loop pieces, i.e., not counting the time spent
in the interim preemptible periods. If an interrupt or a migration
occurs during one of these interim periods, then that time is invisible
and only serves to lengthen the effective udelay().
This new fix backs out the current flawed fix and returns to a simple
udelay(), fully preemptible and interruptible. It implements two simple
alternative udelay() routines: one a default generic version that uses
ia64_get_itc(), and the other an sn-specific version that uses that
platform's RTC.
Signed-off-by: John Hawkes <hawkes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Dean Nelson [Wed, 15 Feb 2006 14:02:21 +0000 (08:02 -0600)]
[IA64-SGI] enforce proper ordering of callouts by XPC
Fix XPC so that it does not deliver any messages until the connected
callout has returned, as well as, prevent the disconnected callout to
occur before the disconnecting callout has returned.
Signed-off-by: Dean Nelson <dcn@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Dean Roe [Tue, 14 Feb 2006 21:01:23 +0000 (15:01 -0600)]
[IA64-SGI] fix the size of __sn_cnodeid_to_nasid
The __sn_cnodeid_to_nasid array was incorrectly sized at MAX_NUMNODES.
On a large system, this array could overflow. The following patch
corrects this by defining it to MAX_COMPACT_NODES.
Signed-off-by: Dean Roe <roe@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Mark Maule [Tue, 14 Feb 2006 16:23:37 +0000 (10:23 -0600)]
[IA64-SGI] export sn_pcidev_info_get
Export sn_pcidev_info_get.
Signed-off-by Mark Maule <maule@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Jes Sorensen [Mon, 13 Feb 2006 10:35:01 +0000 (05:35 -0500)]
[IA64-SGI] remove compile time warning
This one falls into the "present for Andrew Morton" category to address
his wishlist for a compiler warning free build ;-)
Signed-off-by: Tony Luck <tony.luck@intel.com>
Jes Sorensen [Mon, 13 Feb 2006 10:32:09 +0000 (05:32 -0500)]
[IA64] remove obsolete corporate address
Remove obsolete SGI address
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>