Rusty Russell [Fri, 30 May 2008 20:09:46 +0000 (15:09 -0500)]
lguest: notify on empty
This is the lguest implementation of the VIRTIO_F_NOTIFY_ON_EMPTY feature.
It is currently only published for network devices, but it is turned on for
everyone.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Fri, 30 May 2008 20:09:45 +0000 (15:09 -0500)]
virtio: force callback on empty.
virtio allows drivers to suppress callbacks (ie. interrupts) for
efficiency (no locking, it's just an optimization).
There's a similar mechanism for the host to suppress notifications
coming from the guest: in that case, we ignore the suppression if the
ring is completely full.
It turns out that life is simpler if the host similarly ignores
callback suppression when the ring is completely empty: the network
driver wants to free up old packets in a timely manner, and otherwise
has to use a timer to poll.
We have to remove the code which ignores interrupts when the driver
has disabled them (again, it had no locking and hence was unreliable
anyway).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Christian Borntraeger [Thu, 29 May 2008 09:10:01 +0000 (11:10 +0200)]
virtio_blk: fix endianess annotations
Since commit
72e61eb40b55dd57031ec5971e810649f82b0259 (virtio: change config
to guest endian) config space is no longer fixed endian.
Lets change the virtio_blk_config variables.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Christian Borntraeger [Thu, 29 May 2008 09:08:01 +0000 (11:08 +0200)]
virtio_config: fix len calculation of config elements
Rusty,
This patch is a prereq for the virtio_blk blocksize patch, please apply it
first.
Adding an u32 value to the virtio_blk_config unconvered a small bug the config
space defintions:
v is a pointer, to we have to use sizeof(*v) instead of sizeof(v).
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Christian Borntraeger [Mon, 26 May 2008 09:29:27 +0000 (11:29 +0200)]
virtio_net: another race with virtio_net and enable_cb
Hello Rusty,
seems that we still have a problem with virtio_net and the enable_cb callback.
During a long running network stress tests with virtio and got the following
oops:
------------[ cut here ]------------
kernel BUG at drivers/virtio/virtio_ring.c:230!
illegal operation: 0001 [#1] SMP
Modules linked in:
CPU: 0 Not tainted
2.6.26-rc2-kvm-00436-gc94c08b-dirty #34
Process netserver (pid: 2582, task:
000000000fbc4c68, ksp:
000000000f42b990)
Krnl PSW :
0704c00180000000 00000000002d0ec8 (vring_enable_cb+0x1c/0x60)
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
Krnl GPRS:
0000000000000000 0000000000000000 000000000ef3d000 0000000010009800
0000000000000000 0000000000419ce0 0000000000000080 000000000000007b
000000000adb5538 000000000ef40900 000000000ef40000 000000000ef40920
0000000000000000 0000000000000005 000000000029c1b0 000000000fea7d18
Krnl Code:
00000000002d0ebc:
a7110001 tmll %r1,1
00000000002d0ec0:
a7740004 brc 7,2d0ec8
00000000002d0ec4:
a7f40001 brc 15,2d0ec6
>
00000000002d0ec8:
a517fffe nill %r1,65534
00000000002d0ecc:
40103000 sth %r1,0(%r3)
00000000002d0ed0: 07f0 bcr 15,%r0
00000000002d0ed2:
e31020380004 lg %r1,56(%r2)
00000000002d0ed8:
a7480000 lhi %r4,0
Call Trace:
([<
000000000029c0fc>] virtnet_poll+0x290/0x3b8)
[<
0000000000333fb8>] net_rx_action+0x9c/0x1b8
[<
00000000001394bc>] __do_softirq+0x74/0x108
[<
000000000010d16a>] do_softirq+0x92/0xac
[<
0000000000139826>] irq_exit+0x72/0xc8
[<
000000000010a7b6>] do_extint+0xe2/0x104
[<
0000000000110508>] ext_no_vtime+0x16/0x1a
Last Breaking-Event-Address:
[<
00000000002d0ec4>] vring_enable_cb+0x18/0x60
I looked into the virtio_net code for some time and I think the following
scenario happened. Please look at virtnet_poll:
[...]
/* Out of packets? */
if (received < budget) {
netif_rx_complete(vi->dev, napi);
if (unlikely(!vi->rvq->vq_ops->enable_cb(vi->rvq))
&& napi_schedule_prep(napi)) {
vi->rvq->vq_ops->disable_cb(vi->rvq);
__netif_rx_schedule(vi->dev, napi);
goto again;
}
}
If an interrupt arrives after netif_rx_complete, a second poll routine can run
on a different cpu. The second check for napi_schedule_prep would prevent any
harm in the network stack, but we have called enable_cb possibly after the
disable_cb in skb_recv_done.
static void skb_recv_done(struct virtqueue *rvq)
{
struct virtnet_info *vi = rvq->vdev->priv;
/* Schedule NAPI, Suppress further interrupts if successful. */
if (netif_rx_schedule_prep(vi->dev, &vi->napi)) {
rvq->vq_ops->disable_cb(rvq);
__netif_rx_schedule(vi->dev, &vi->napi);
}
}
That means that the second poll routine runs with interrupts enabled, which is
ok, since we can handle additional interrupts. The problem is now that the
second poll routine might also call enable_cb, triggering the BUG.
The only solution I can come up with, is to remove the BUG statement in
enable_cb - similar to disable_cb. Opinions or better ideas where the oops
could come from?
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Fri, 30 May 2008 20:09:44 +0000 (15:09 -0500)]
virtio: An entropy device, as suggested by hpa.
Note that by itself, having a "hardware" random generator does very
little: you should probably run "rngd" in your guest to feed this into
the kernel entropy pool.
Included:
virtio_rng: dont use vmalloced addresses for virtio
If virtio_rng is build as a module, random_data is an address
in vmalloc space. As virtio expects guest real addresses, this
can cause any kind of funny behaviour, so lets allocate
random_data dynamically with kmalloc.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Christian Borntraeger [Fri, 16 May 2008 09:17:03 +0000 (11:17 +0200)]
virtio_blk: allow read-only disks
Hello Rusty,
sometimes it is useful to share a disk (e.g. usr). To avoid file system
corruption, the disk should be mounted read-only in that case. This patch
adds a new feature flag, that allows the host to specify, if the disk should
be considered read-only.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Fri, 30 May 2008 20:09:42 +0000 (15:09 -0500)]
lguest: fix ugly <NULL> in /proc/interrupts
Before:
root@ubuntu:~# cat /proc/interrupts
CPU0
1: 1672 lguest-<NULL> virtio0
2: 1 lguest-<NULL> virtio1
...
After:
root@ubuntu:~# cat /proc/interrupts
CPU0
1: 2889 lguest-level virtio0
2: 9 lguest-level virtio1
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Fri, 30 May 2008 20:09:42 +0000 (15:09 -0500)]
virtio: set device index in common code.
Anthony Liguori points out that three different transports use the virtio code,
but each one keeps its own counter to set the virtio_device's index field. In
theory (though not in current practice) this means that names could be
duplicated, and that risk grows as more transports are created.
So we move the selection of the unique virtio_device.index into the common code
in virtio.c, which has the side-benefit of removing duplicate code.
The only complexity is that lguest and S/390 use the index to uniquely identify
the device in case of catastrophic failure before register_virtio_device() is
called: now we use the offset within the descriptor page as a unique identifier
for the printks.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chris Lalancette <clalance@redhat.com>
Cc: Anthony Liguori <anthony@codemonkey.ws>
Rusty Russell [Fri, 30 May 2008 20:09:42 +0000 (15:09 -0500)]
virtio: virtio_pci should not set bus_id.
The common virtio code sets the bus_id, overriding anything virtio_pci
sets anyway.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chris Lalancette <clalance@redhat.com>
Cc: Anthony Liguori <anthony@codemonkey.ws>
Rusty Russell [Fri, 30 May 2008 20:09:41 +0000 (15:09 -0500)]
virtio: bus_id for devices should contain 'virtio'
Chris Lalancette <clalance@redhat.com> points out that virtio.c sets all device
names to '0', '1', etc, which looks silly in /proc/interrupts. We change this
from '%d' to 'virtio%d'.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chris Lalancette <clalance@redhat.com>
Cc: Anthony Liguori <anthony@codemonkey.ws>
Chris Lalancette [Fri, 30 May 2008 20:09:41 +0000 (15:09 -0500)]
Fix crash in virtio_blk during modprobe ; rmmod ; modprobe
Fix a modprobe virtio_blk ; rmmod virtio_blk ; modprobe virtio_blk crash; this
was basically because we weren't doing "del_gendisk()" in the remove path.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (moved del_gendisk up)
Rusty Russell [Fri, 30 May 2008 20:09:40 +0000 (15:09 -0500)]
lguest: use ioremap_cache, not ioremap
Thanks to Jon Corbet & LWN. Only took me a day to join the dots.
Host->Guest netcat before (with unnecessily large receive buffers):
1073741824 bytes (1.1 GB) copied, 24.7528 seconds, 43.4 MB/s
After:
1073741824 bytes (1.1 GB) copied, 17.6369 seconds, 60.9 MB/s
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
David Howells [Wed, 28 May 2008 15:49:01 +0000 (16:49 +0100)]
Fix FRV minimum slab/kmalloc alignment
> +#define ARCH_KMALLOC_MINALIGN (sizeof(long) * 2)
> +#define ARCH_SLAB_MINALIGN (sizeof(long) * 2)
This doesn't work if SLAB is selected and slab debugging is enabled as
these are passed to the preprocessor, and the preprocessor doesn't
understand sizeof.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Wed, 28 May 2008 15:00:51 +0000 (08:00 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
cfq-iosched: fix RCU problem in cfq_cic_lookup()
block: make blktrace use per-cpu buffers for message notes
Added in elevator switch message to blktrace stream
Added in MESSAGE notes for blktraces
block: reorder cfq_queue to save space on 64bit builds
block: Move the second call to get_request to the end of the loop
splice: handle try_to_release_page() failure
splice: fix sendfile() issue with relay
David Howells [Wed, 28 May 2008 14:36:34 +0000 (15:36 +0100)]
FRV: Specify the minimum slab/kmalloc alignment
Specify the minimum slab/kmalloc alignment to be 8 bytes. This fixes a
crash when SLOB is selected as the memory allocator. The FRV arch needs
this so that it can use the load- and store-double instructions without
faulting. By default SLOB sets the minimum to be 4 bytes.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vegard Nossum [Wed, 28 May 2008 12:55:24 +0000 (13:55 +0100)]
MN10300: Fix typo in header guard
Fix a typo in the header guard of asm/ipc.h.
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jens Axboe [Wed, 28 May 2008 12:46:59 +0000 (14:46 +0200)]
cfq-iosched: fix RCU problem in cfq_cic_lookup()
cfq_cic_lookup() needs to properly protect ioc->ioc_data before
dereferencing it and also exclude updaters of ioc->ioc_data as well.
Also add a number of comments documenting why the existing RCU usage
is OK.
Thanks a lot to "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> for
review and comments!
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Wed, 28 May 2008 12:45:33 +0000 (14:45 +0200)]
block: make blktrace use per-cpu buffers for message notes
Currently it uses a single static char array, but that risks
being corrupted when multiple users issue message notes at the
same time. Make the buffers dynamically allocated when the trace
is setup and make them per-cpu instead.
The default max message size of 1k is also very large, the
interface is mainly for small text notes. So shrink it to 128 bytes.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Alan D. Brunelle [Tue, 27 May 2008 12:55:00 +0000 (14:55 +0200)]
Added in elevator switch message to blktrace stream
Signed-off-by: Alan D. Brunelle <alan.brunelle@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Alan D. Brunelle [Tue, 27 May 2008 12:54:41 +0000 (14:54 +0200)]
Added in MESSAGE notes for blktraces
Allows messages to be inserted into blktrace streams.
Signed-off-by: Alan D. Brunelle <alan.brunelle@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Richard Kennedy [Fri, 23 May 2008 04:52:00 +0000 (06:52 +0200)]
block: reorder cfq_queue to save space on 64bit builds
saves 8 bytes of padding & increases objects/slab from 30 to 32 on my
AMD64 config
Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Zhang, Yanmin [Thu, 22 May 2008 13:13:29 +0000 (15:13 +0200)]
block: Move the second call to get_request to the end of the loop
In function get_request_wait, the second call to get_request could be
moved to the end of the while loop, because if the first call to
get_request fails, the second call will fail without sleep.
Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Tue, 20 May 2008 19:27:41 +0000 (21:27 +0200)]
splice: handle try_to_release_page() failure
splice currently assumes that try_to_release_page() always suceeds,
but it can return failure. If it does, we cannot steal the page.
Acked-by: Mingming Cao <cmm@us.ibm.com
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tom Zanussi [Fri, 9 May 2008 11:28:36 +0000 (13:28 +0200)]
splice: fix sendfile() issue with relay
Splice isn't always incrementing the ppos correctly, which broke
relay splice.
Signed-off-by: Tom Zanussi <zanussi@comcast.net>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Linus Torvalds [Wed, 28 May 2008 01:47:59 +0000 (18:47 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
pciehp: add message about pciehp_slot_with_bus option
pci hotplug core: add check of duplicate slot name
pciehp: move msleep after power off
pciehp: poll cmd completion if hotplug interrupt is disabled
pciehp: fix slow probing
pciehp: fix NULL dereference in interrupt handler
shpchp: add message about shpchp_slot_with_bus option
PCI: don't enable ASPM on devices with mixed PCIe/PCI functions
Kenji Kaneshige [Tue, 27 May 2008 10:07:33 +0000 (19:07 +0900)]
pciehp: add message about pciehp_slot_with_bus option
Some (broken?) platform assign the same slot name to multiple hotplug
slots. On such system, slot initialization would fail because of name
collision. The pciehp driver already have a "slot_with_bus" module
option which adds the bus number into the slot name. This patch adds
the message about this module option that will be displayed when slot
name collision is detected.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Kristen Carlson Accardi <kristen.c.accardi@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Kenji Kaneshige [Tue, 27 May 2008 10:07:01 +0000 (19:07 +0900)]
pci hotplug core: add check of duplicate slot name
Fix the following errors reported by Jan C. Nordholz in
http://bugzilla.kernel.org/show_bug.cgi?id=10751.
kobject_add_internal failed for 2 with -EEXIST, don't try to register things with the same name in the same directory.
Pid: 1, comm: swapper Tainted: G W 2.6.26-rc3 #1
[<
c0266980>] kobject_add_internal+0x140/0x190
[<
c0266afd>] kobject_init_and_add+0x2d/0x40
[<
c027bc91>] pci_hp_register+0x81/0x2f0
[<
c027fd07>] pciehp_probe+0x1a7/0x470
[<
c01b3b84>] sysfs_add_one+0x44/0xa0
[<
c01b3c1f>] sysfs_addrm_start+0x3f/0xb0
[<
c01b497a>] sysfs_create_link+0x8a/0xf0
[<
c0279570>] pcie_port_probe_service+0x50/0x80
[<
c02e0545>] driver_sysfs_add+0x55/0x70
[<
c02e0662>] driver_probe_device+0x82/0x180
[<
c02e07cc>] __driver_attach+0x6c/0x70
[<
c02dfe0a>] bus_for_each_dev+0x3a/0x60
[<
c05db2d0>] pcied_init+0x0/0x80
[<
c02e04e6>] driver_attach+0x16/0x20
[<
c02e0760>] __driver_attach+0x0/0x70
[<
c02e0341>] bus_add_driver+0x1a1/0x220
[<
c05db2d0>] pcied_init+0x0/0x80
[<
c02e09cd>] driver_register+0x4d/0x120
[<
c05db050>] ibm_acpiphp_init+0x0/0x190
[<
c0125aab>] printk+0x1b/0x20
[<
c05db2d0>] pcied_init+0x0/0x80
[<
c05db2de>] pcied_init+0xe/0x80
[<
c05c751a>] kernel_init+0x10a/0x300
[<
c0120138>] schedule_tail+0x18/0x50
[<
c0103b9a>] ret_from_fork+0x6/0x1c
[<
c05c7410>] kernel_init+0x0/0x300
[<
c05c7410>] kernel_init+0x0/0x300
[<
c010485b>] kernel_thread_helper+0x7/0x1c
=======================
pci_hotplug: Unable to register kobject '2'<3>pciehp: pci_hp_register failed with error -22
Slot with the same name can be registered multiple times if shpchp or
pciehp driver is loaded after acpiphp is loaded because ACPI based
hotplug driver and Native OS hotplug driver trying to handle the same
physical slot. In this case, current pci_hotplug core will call
kobject_init_and_add() muliple time with the same name. This is the
cause of this problem. To fix this problem, this patch adds the check
into pci_hp_register() to see if the slot with the same name.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Kristen Carlson Accardi <kristen.c.accardi@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Kenji Kaneshige [Tue, 27 May 2008 10:06:22 +0000 (19:06 +0900)]
pciehp: move msleep after power off
According to the PCI Express specification, we must wait for at least
1 second after turning power off before taking any action that relies
on power having been removed from the slot/adapter. For this, current
pciehp wait for 1 second after issuing the power off command in
hpc_power_off_slot() function. But waiting for 1 second in
hpc_power_off_slot() can make pciehp probing slow-down because pciehp
probe code calls hpc_power_off_slot() if the slot is not occupied just
in case. We don't need to wait for 1 second at the pciehp probe time
because there is no action on that empty slot. So move 1 second wait
from hpc_power_off_slot() to the caller of hpc_power_off_slot().
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Kristen Carlson Accardi <kristen.c.accardi@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Kenji Kaneshige [Tue, 27 May 2008 10:05:26 +0000 (19:05 +0900)]
pciehp: poll cmd completion if hotplug interrupt is disabled
Fix improper long wait for command completion in pciehp probing.
As described in PCI Express specification, software notification is
not generated if the command that occurs as a result of a write to the
Slot Control register that disables software notification of command
completed events. Since pciehp driver doesn't take it into account,
such command is issued in pciehp probing, and it causes improper long
wait for command completion.
This patch changes the pciehp driver to take such command into
account.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Kristen Carlson Accardi <kristen.c.accardi@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Kenji Kaneshige [Tue, 27 May 2008 10:04:30 +0000 (19:04 +0900)]
pciehp: fix slow probing
Fix the "pciehp probing slow" problem reported from Jan C. Nordholz in
http://bugzilla.kernel.org/show_bug.cgi?id=10751.
The command completed bit in Slot Status register applies only to
commands issued to control the attention indicator, power indicator,
power controller, or electromechanical interlock. However, writes to
other parts of the Slot Control register would end up writing to the
control fields. Hence, any write to Slot Control register is
considered as a command. However, if the controller doesn't support
any of attention indicator, power indicator, power controller and
electromechanical interlock, command completed bit would not set in
writing to Slot Control register. In this case, we should not wait for
command completed bit set, otherwise all commands would be considered
not completed in timeout seconds (1 sec.).
The cause of the problem is pciehp driver didn't take this situation
into account. This patch changes pciehp to take it into account. This
patch also add the check for "No Command Completed Support" bit in
Slot Capability register. If it is set, we should not wait for command
completed bit set as well.
This problem seems to be revealed by the commit
c27fb883dffe11aa4cb35ecea1fa1832ba45d4da that fixed the bug that
pciehp did not wait for command completed properly (pciehp just
ignored the command completion event).
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Kristen Carlson Accardi <kristen.c.accardi@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Kenji Kaneshige [Tue, 27 May 2008 10:03:16 +0000 (19:03 +0900)]
pciehp: fix NULL dereference in interrupt handler
Fix the following NULL dereference problem reported from Pierre Ossman
and Ingo Molnar.
pciehp: HPC vendor_id 8086 device_id 27d0 ss_vid 0 ss_did 0
pciehp: pciehp_find_slot: slot (device=0x0) not found
BUG: unable to handle kernel NULL pointer dereference at
0000000000000070
IP: [<
ffffffff80494a8b>] pciehp_handle_presence_change+0x7e/0x113
PGD 0
Oops: 0000 [1]
CPU 0
Modules linked in:
Pid: 1, comm: swapper Tainted: G W
2.6.26-rc3-sched-devel.git-00001-g2b99b26-dirty #170
RIP: 0010:[<
ffffffff80494a8b>] [<
ffffffff80494a8b>] pciehp_handle_presence_change+0x7e/0x113
RSP: 0000:
ffff81003f83fbb0 EFLAGS:
00010046
RAX:
0000000000000039 RBX:
0000000000000000 RCX:
0000000000000000
RDX:
0000000000000000 RSI:
0000000000000001 RDI:
0000000000000046
RBP:
ffff81003f83fbd0 R08:
0000000000000001 R09:
ffffffff80245103
R10:
0000000000000020 R11:
0000000000000000 R12:
ffff81003ea53a30
R13:
0000000000000000 R14:
0000000000000011 R15:
ffffffff80495926
FS:
0000000000000000(0000) GS:
ffffffff80be7400(0000) knlGS:
0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0:
000000008005003b
CR2:
0000000000000070 CR3:
0000000000201000 CR4:
00000000000006a0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000ffff0ff0 DR7:
0000000000000400
Process swapper (pid: 1, threadinfo
ffff81003f83e000, task
ffff81003f840000)
Stack:
0000000000000008 ffff81003f83fbf6 ffff81003ea53a30 0000000000000008
ffff81003f83fc10 ffffffff80495ab4 0000000000000011 0000000000000002
0000000000000202 0000000000000202 00000000fffffff4 ffff81003ea53a30
Call Trace:
[<
ffffffff80495ab4>] pcie_isr+0x18e/0x1bc
[<
ffffffff80260831>] request_irq+0x106/0x12f
[<
ffffffff80495fb6>] pcie_init+0x15e/0x6cc
[<
ffffffff804933a3>] pciehp_probe+0x64/0x541
[<
ffffffff8048f4e7>] pcie_port_probe_service+0x4c/0x76
[<
ffffffff8054af70>] driver_probe_device+0xd4/0x1f0
[<
ffffffff8054b108>] __driver_attach+0x7c/0x7e
[<
ffffffff8054b08c>] ? __driver_attach+0x0/0x7e
[<
ffffffff8054a4b6>] bus_for_each_dev+0x53/0x7d
[<
ffffffff8054ad3c>] driver_attach+0x1c/0x1e
[<
ffffffff8054a9c2>] bus_add_driver+0xdd/0x25b
[<
ffffffff80c09d3d>] ? pcied_init+0x0/0x8b
[<
ffffffff8054b288>] driver_register+0x5f/0x13e
[<
ffffffff80c09d3d>] ? pcied_init+0x0/0x8b
[<
ffffffff8048f441>] pcie_port_service_register+0x47/0x49
[<
ffffffff80c09d52>] pcied_init+0x15/0x8b
[<
ffffffff80bf3938>] kernel_init+0x75/0x243
[<
ffffffff808639d2>] ? _spin_unlock_irq+0x2b/0x3a
[<
ffffffff80228d1f>] ? finish_task_switch+0x57/0x9a
[<
ffffffff8020c258>] child_rip+0xa/0x12
[<
ffffffff8020bcec>] ? restore_args+0x0/0x30
[<
ffffffff80bf38c3>] ? kernel_init+0x0/0x243
[<
ffffffff8020c24e>] ? child_rip+0x0/0x12
Code: 83 80 00 00 00 48 39 f0 75 e1 0f b6 c9 48 c7 c2 00 0e 8d 80 48 c7 c6 8a 60 a6 80 48 c7 c7 10 db a8 80 31 c0 e8 3f 8d d9 ff 31 db <48> 8b 43 70 48 8d 75 ef 48 89 df ff 50 30 80 7d ef 00 74 37 48
RIP [<
ffffffff80494a8b>] pciehp_handle_presence_change+0x7e/0x113
RSP <
ffff81003f83fbb0>
CR2:
0000000000000070
Kernel panic - not syncing: Fatal exception
The situation under which it occurs is hw and timing related: it appears
to happen on a system that has PCI hotplug hardware but with no active
hotplug cards, and another interrupt in the same (shared) IRQ line
arrives too early, before the hotplug-slot entry has been set up - as
triggered by CONFIG_DEBUG_SHIRQ=y:
This patch contains the following two fixes.
(1) Clear all events bits in Slot Status register to prevent the pciehp
driver from detecting the spurious events that would have been occur
before pciehp loading.
(2) Add check whether slot initialization had been already done.
This is short term fix. We need more structural fixes to install
interrupt handler after slot initialization is done.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Kristen Carlson Accardi <kristen.c.accardi@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Kenji Kaneshige [Tue, 27 May 2008 10:08:23 +0000 (19:08 +0900)]
shpchp: add message about shpchp_slot_with_bus option
Some (broken?) platform assign the same slot name to multiple hotplug
slots. On such system, slot initialization would fail because of name
collision. The shpchp driver already have a "slot_with_bus" module
option which adds the bus number into the slot name. This patch adds
the message about this module option that will be displayed when slot
name collision is detected.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Kristen Carlson Accardi <kristen.c.accardi@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Linus Torvalds [Tue, 27 May 2008 15:27:20 +0000 (08:27 -0700)]
Merge git://git./linux/kernel/git/hskinnemoen/avr32-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/hskinnemoen/avr32-2.6:
avr32: Fix cpufreq oops when ondemand governor is default
avr32: Update defconfigs
avr32: export strnlen_user
avr32: export copy_page
David Woodhouse [Tue, 27 May 2008 05:31:43 +0000 (06:31 +0100)]
ck804rom: fix driver_data in probe table.
There's a reason why using C99 initialisers even in the supposedly
trivial structs is a good idea.
Signed-off-by: Carl-Daniel Hailfinger <c-d.hailfinger.devel.2006@gmx.net>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Haavard Skinnemoen [Tue, 27 May 2008 07:37:42 +0000 (09:37 +0200)]
avr32: Fix cpufreq oops when ondemand governor is default
Move the AP7 cpufreq init to late_initcall() so that we don't try to
bring up cpufreq until the governor is ready. x86 also uses
late_initcall() for this.
Signed-off-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com>
Linus Torvalds [Mon, 26 May 2008 18:07:53 +0000 (11:07 -0700)]
Linux 2.6.26-rc4
Oleg Nesterov [Mon, 26 May 2008 16:55:42 +0000 (20:55 +0400)]
posix timers: discard SI_TIMER signals on exec
Based on Roland's patch. This approach was suggested by Austin Clements
from the very beginning, and then by Linus.
As Austin pointed out, the execing task can be killed by SI_TIMER signal
because exec flushes the signal handlers, but doesn't discard the pending
signals generated by posix timers. Perhaps not a bug, but people find this
surprising. See http://bugzilla.kernel.org/show_bug.cgi?id=10460
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Austin Clements <amdragon+kernelbugzilla@mit.edu>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Mon, 26 May 2008 16:55:42 +0000 (20:55 +0400)]
posix timers: sigqueue_free: don't free sigqueue if it is queued
Currently sigqueue_free() removes sigqueue from list, but doesn't cancel the
pending signal. This is not consistent, the task should either receive the
"full" signal along with siginfo_t, or it shouldn't receive the signal at all.
Change sigqueue_free() to clear SIGQUEUE_PREALLOC but leave sigqueue on list
if it is queued.
This is a user-visible change. If the signal is blocked, it stays queued
after sys_timer_delete() until unblocked with the "stale" si_code/si_value,
and of course it is still counted wrt RLIMIT_SIGPENDING which also limits
the number of posix timers.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Austin Clements <amdragon+kernelbugzilla@mit.edu>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 26 May 2008 17:24:06 +0000 (10:24 -0700)]
Merge branch 'i2c-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6
* 'i2c-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6:
i2c: Align i2c_device_id
tuner: Do not alter i2c_client.name
Linus Torvalds [Mon, 26 May 2008 17:21:26 +0000 (10:21 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/penberg/slab-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
slub: ksize() abuse checks
slob: Fix to return wrong pointer
Linus Torvalds [Mon, 26 May 2008 17:20:40 +0000 (10:20 -0700)]
Merge git://git./linux/kernel/git/lethal/sh-2.6.26
* git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6.26:
sh: Drop broken URAM support on SH7723.
sh: update Migo-R defconfig
sh: use sm501 8250 mfd support on r2d boards
sh: add probe support for new sh7723 cut
sh: fix VPU interrupt vector for sh7723
sh: fix USBF resource for sh7722
Linus Torvalds [Mon, 26 May 2008 17:14:37 +0000 (10:14 -0700)]
Merge git://git./linux/kernel/git/davem/sparc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
sparc64: global_reg_snapshot is not for userspace
Linus Torvalds [Mon, 26 May 2008 17:14:02 +0000 (10:14 -0700)]
Merge git://git./linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (52 commits)
vlan: Use bitmask of feature flags instead of seperate feature bits
fmvj18x_cs: add NextCom NC5310 rev B support
xirc2ps_cs: re-initialize the multicast address in do_reset
3C509: rx_bytes should not be increased when alloc_skb failed
NETFRONT: Use __skb_queue_purge()
VIRTIO: Use __skb_queue_purge()
phylib: do EXPORT_SYMBOL on get_phy_id
netlink: Fix nla_parse_nested_compat() to call nla_parse() directly
WAN: protect HDLC proto list while insmod/rmmod
drivers/net/fs_enet: remove null pointer dereference
S2io: Version update for napi and MSI-X patches
S2io: Added napi support when MSIX is enabled.
S2io: Move all the transmit completions to a single msi-x (alarm) vector
drivers/net/ehea - remove unnecessary memset after kzalloc
au1000_eth: remove useless check
Blackfin EMAC Driver: Removed duplicated include <linux/ethtool.h>
cpmac bugfixes and enhancements
e1000e: use resource_size_t, not unsigned long, for phys addrs
net/usb: add support for Apple USB Ethernet Adapter
uli526x: add support for netpoll
...
Jiri Slaby [Mon, 26 May 2008 14:08:40 +0000 (16:08 +0200)]
i2c: Align i2c_device_id
Align i2c_device_id.driver_data to 8 bytes to not fail on crossbuilds.
(Added in
d2653e92732bd3911feff6bee5e23dbf959381db.)
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Michael Krufky [Mon, 26 May 2008 14:08:40 +0000 (16:08 +0200)]
tuner: Do not alter i2c_client.name
The tuner driver used to change i2c_client.name for its own needs, but
it really shouldn't, as this field is used by i2c-core to do the
device/driver matching. So, create and use a separate field for the
tuner driver needs.
Signed-off-by: Michael Krufky <mkrufky@linuxtv.org>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Haavard Skinnemoen [Mon, 26 May 2008 11:25:05 +0000 (13:25 +0200)]
avr32: Update defconfigs
Just provide reasonable defaults for the new stuff. Tickless and
hrtimers are turned on for all boards except ATSTK1004.
Signed-off-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com>
Adrian Bunk [Wed, 21 May 2008 22:01:38 +0000 (01:01 +0300)]
avr32: export strnlen_user
This patch fixes the following build error:
<-- snip -->
...
MODPOST 1327 modules
ERROR: "strnlen_user" [drivers/input/misc/uinput.ko] undefined!
...
make[2]: *** [__modpost] Error 1
<-- snip -->
Reported-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com>
Adrian Bunk [Mon, 5 May 2008 18:29:57 +0000 (21:29 +0300)]
avr32: export copy_page
This patch fixes the following build error:
<-- snip -->
...
MODPOST 61 modules
ERROR: "copy_page" [fs/fuse/fuse.ko] undefined!
...
make[2]: *** [__modpost] Error 1
<-- snip -->
Also add an empty line since *_page aren't "String functions".
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com>
Adrian Bunk [Mon, 26 May 2008 05:50:16 +0000 (22:50 -0700)]
sparc64: global_reg_snapshot is not for userspace
global_reg_snapshot shouldn't be visible in our userspace headers.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Mundt [Mon, 26 May 2008 02:45:45 +0000 (11:45 +0900)]
sh: Drop broken URAM support on SH7723.
This was copied over from the previous MobileR bits, which doesn't
apply to R2. The URAM block on R2 is recycled for the L2 instead.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Linus Torvalds [Sun, 25 May 2008 22:00:27 +0000 (15:00 -0700)]
Merge git://git./linux/kernel/git/sam/kbuild-fixes
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-fixes:
Kconfig: introduce ARCH_DEFCONFIG to DEFCONFIG_LIST
.gitignore: match ncscope.out
scripts/ver_linux use 'gcc -dumpversion'
Linus Torvalds [Sun, 25 May 2008 21:59:59 +0000 (14:59 -0700)]
Merge git://git./linux/kernel/git/wim/linux-2.6-watchdog
* git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog:
[WATCHDOG] Add ICH9DO into the iTCO_wdt.c driver
[WATCHDOG] Fix booke_wdt.c on MPC85xx SMP system's
[WATCHDOG] Add a watchdog driver based on the CS5535/CS5536 MFGPT timers
[WATCHDOG] hpwdt: Fix NMI handling.
[WATCHDOG] Blackfin Watchdog Driver: split platform device/driver
[WATCHDOG] Add w83697h_wdt early_disable option
[WATCHDOG] Make w83697h_wdt timeout option string similar to others
[WATCHDOG] Make w83697h_wdt void-like functions void
Linus Torvalds [Sun, 25 May 2008 21:59:27 +0000 (14:59 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/tiwai/sound-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
[ALSA] hda - Fix capture mute Widget for stac9250/9251
[ALSA] snd-pcsp - fix pcsp_treble_info() to honour an item number
[ALSA] hda - Added support for Foxconn P35AX-S mainboard
[ALSA] hda - Fix COEF and EAPD in ALC889 auto-configuration mode
[ALSA] hda - Fix noise on VT1708 codec
[ALSA] hda - Add model for ASUS P5K-E/WIFI-AP
Sam Ravnborg [Sun, 25 May 2008 21:03:18 +0000 (23:03 +0200)]
Kconfig: introduce ARCH_DEFCONFIG to DEFCONFIG_LIST
init/Kconfig contains a list of configs that are searched
for if 'make *config' are used with no .config present.
Extend this list to look at the config identified by
ARCH_DEFCONFIG.
With this change we now try the defconfig targets last.
This fixes a regression reported
by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Jike Song [Thu, 22 May 2008 01:23:10 +0000 (09:23 +0800)]
.gitignore: match ncscope.out
Sometimes I got this:
$ git-status
{snip}
# On branch master
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# ncscope.out
nothing added to commit but untracked files present (use "git add"
to track)
Fix it.
Signed-off-by: Jike Song <albcamus@gmail.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Gabriel C [Wed, 21 May 2008 18:36:19 +0000 (20:36 +0200)]
scripts/ver_linux use 'gcc -dumpversion'
These magic greps and hacks in ver_linux to get the gcc version always break after some gcc releases.
Since now gcc >4.3 allows compiling with '--with-pkgversion' ( which can be everything 'My Cool Gcc' or something )
ver_linux will report random junk for these.
Simply use 'gcc -dumpversion' to get the gcc version which should always work.
Signed-off-by: Gabriel C <nix.or.die@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Mauro Carvalho Chehab [Sun, 25 May 2008 16:20:06 +0000 (18:20 +0200)]
[ALSA] hda - Fix capture mute Widget for stac9250/9251
Fix capture mute widget for STAC9250/9251 codecs. The widget 0x09
has no mute but 0x14 does actually.
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Stas Sergeev [Sat, 24 May 2008 16:05:47 +0000 (18:05 +0200)]
[ALSA] snd-pcsp - fix pcsp_treble_info() to honour an item number
This solves the problem with mixers wrongly displaying the PWM freq.
Signed-off-by: Stas Sergeev <stsp@aknet.ru>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Gabriel C [Wed, 30 Apr 2008 14:51:10 +0000 (16:51 +0200)]
[WATCHDOG] Add ICH9DO into the iTCO_wdt.c driver
Add the Intel ICH9DO controller ID's for the iTCO_wdt kernel driver and bump
the driver version.
Tested on an P5E-VM DO ASUS motherboard.
Signed-off-by: Gabriel Craciunescu <nix.or.die@googlemail.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Chen Gong [Tue, 29 Apr 2008 08:42:05 +0000 (16:42 +0800)]
[WATCHDOG] Fix booke_wdt.c on MPC85xx SMP system's
On Book-E SMP systems each core has its own private watchdog. If only one
watchdog is enabled, when the core that doesn't enable the watchdog is hung,
system can't reset because no watchdog is running on it. That's bad. It
means we must enable watchdogs on both cores.
We can use smp_call_function() to send appropriate messages to all the other
cores to enable and update the watchdog.
Signed-off-by: Chen Gong <g.chen@freescale.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jordan Crouse [Mon, 21 Jan 2008 17:07:00 +0000 (10:07 -0700)]
[WATCHDOG] Add a watchdog driver based on the CS5535/CS5536 MFGPT timers
Add a watchdog timer based on the MFGPT timers in the CS5535/CS5536
companion chips to the AMD Geode GX and LX processors. Only caveat
is that the BIOS must provide at least a one free timer, and most
do not.
Signed-off-by: Jordan Crouse <jordan.crouse@amd.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Mingarelli, Thomas [Tue, 25 Mar 2008 17:17:30 +0000 (17:17 +0000)]
[WATCHDOG] hpwdt: Fix NMI handling.
I need to just return in case it's not my NMI so someone else can take a look
at it (and reset die_nmi_called to 0 in case I actually do get one that's mine
to handle).
Signed-off-by: Thomas Mingarelli <thomas.mingarelli@hp.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Mike Frysinger [Thu, 27 Mar 2008 18:53:32 +0000 (11:53 -0700)]
[WATCHDOG] Blackfin Watchdog Driver: split platform device/driver
- split platform device/driver registering from actual watchdog device/driver
registering so that we can cleanly load/unload
- fixup __initdata with __initconst and __devinitdata with __devinitconst
Signed-off-by: Mike Frysinger <vapier.adi@gmail.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Samuel Tardieu [Wed, 12 Mar 2008 13:28:03 +0000 (14:28 +0100)]
[WATCHDOG] Add w83697h_wdt early_disable option
Pádraig Brady requested the possibility of not disabling the watchdog
at module load time or kernel boot time if it had been previously enabled
in the bios. It may help rebooting the machine if it freezes before the
userland daemon kicks in.
Signed-off-by: Samuel Tardieu <sam@rfc1149.net>
Cc: Pádraig Brady <P@draigBrady.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Samuel Tardieu [Wed, 12 Mar 2008 13:28:02 +0000 (14:28 +0100)]
[WATCHDOG] Make w83697h_wdt timeout option string similar to others
Signed-off-by: Samuel Tardieu <sam@rfc1149.net>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Samuel Tardieu [Wed, 12 Mar 2008 13:28:01 +0000 (14:28 +0100)]
[WATCHDOG] Make w83697h_wdt void-like functions void
Some non-exported functions always returned 0. Mark them void instead.
Signed-off-by: Samuel Tardieu <sam@rfc1149.net>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Linus Torvalds [Sat, 24 May 2008 17:20:00 +0000 (10:20 -0700)]
Merge branch 'x86-fixes-for-linus' of git://git./linux/kernel/git/x86/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip:
x86: prevent PGE flush from interruption/preemption
x86: use explicit copy in vdso_gettimeofday()
namespacecheck: automated fixes
x86/xen: fix arbitrary_virt_to_machine()
x86: don't read maxlvt before checking if APIC is mapped
x86: disable TSC for sched_clock() when calibration failed
x86: distangle user disabled TSC from unstable
x86: fix setup of cyc2ns in tsc_64.c
Linus Torvalds [Sat, 24 May 2008 17:13:16 +0000 (10:13 -0700)]
Merge branch 'for-linus' of /home/rmk/linux-2.6-arm
* 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm:
[ARM] integrator: fix build warnings and errors
[ARM] fix OMAP include loops
Revert "[ARM] pxa: spitz wants PXA27x UDC definitions"
[ARM] 5053/1: define before use of processor_id
[ARM] 5052/1: export clock functions for the at91x40
[ARM] 5051/1: define pgtable_t for the !CONFIG_MMU case too
[ARM] omap: fix omap clk support build errors
[ARM] 5039/1: S3C244X: Rename SDI device if running on S3C244X.
[ARM] 5043/1: pxafb: remove unused mode variable in pxafb_init_fbinfo
[ARM] 5041/1: VR1000: Fix DM9000 IRQ flags initialisation
[ARM] 5040/1: BAST: Fix DM9000 IRQ flags initialisation
[ARM] 5038/1: ARM: OMAP: Remove tsc2102 references from board-palmte.c
[ARM] 5025/2: fix collie cpu initialisation
David Brownell [Fri, 23 May 2008 20:05:03 +0000 (13:05 -0700)]
spi: remove some spidev oops-on-rmmod paths
Somehow the spidev code forgot to include a critical mechanism: when the
underlying device is removed (e.g. spi_master rmmod), open file
descriptors must be prevented from issuing new I/O requests to that
device. On penalty of the oopsing reported by Sebastian Siewior
<bigeasy@tglx.de> ...
This is a partial fix, adding handshaking between the lower level (SPI
messaging) and the file operations using the spi_dev. (It also fixes an
issue where reads and writes didn't return the number of bytes sent or
received.)
There's still a refcounting issue to be addressed (separately).
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Reported-by: Sebastian Siewior <bigeasy@tglx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cedric Le Goater [Fri, 23 May 2008 20:05:02 +0000 (13:05 -0700)]
cgroups: remove node_ prefix_from ns subsystem
This is a slight change in the namespace cgroup subsystem api.
The change is that previously when cgroup_clone() was called (currently
only from the unshare path in ns_proxy cgroup, you'd get a new group named
"node_$pid" whereas now you'll get a group named after just your pid.)
The only users who would notice it are those who are using the ns_proxy
cgroup subsystem to auto-create cgroups when namespaces are unshared -
something of an experimental feature, which I think really needs more
complete container/namespace support in order to be useful. I suspect the
only users are Cedric and Serge, or maybe a few others on
containers@lists.linux-foundation.org. And in fact it would only be
noticed by the users who make the assumption about how the name is
generated, rather than getting it from the /proc/<pid>/cgroups file for
the process in question.
Whether the change is actually needed or not I'm fairly agnostic on, but I
guess it is more elegant to just use the pid as the new group name rather
than adding a fairly arbitrary "node_" prefix on the front.
[menage@google.com: provided changelog]
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Cc: "Paul Menage" <menage@google.com>
Cc: "Serge E. Hallyn" <serue@us.ibm.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fernando Luis Vazquez Cao [Fri, 23 May 2008 20:05:01 +0000 (13:05 -0700)]
for_each_online_pgdat(): kerneldoc fix
for_each_pgdat() was renamed to for_each_online_pgdat() and kerneldoc
comments should be updated accordingly.
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Adrian Bunk [Fri, 23 May 2008 20:05:00 +0000 (13:05 -0700)]
frv: export empty_zero_page
Fix the following build error:
ERROR: "empty_zero_page" [fs/ext4/ext4dev.ko] undefined!
Reported-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Shi Weihua [Fri, 23 May 2008 20:04:59 +0000 (13:04 -0700)]
sys_prctl(): fix return of uninitialized value
If none of the switch cases match, the PR_SET_PDEATHSIG and
PR_SET_DUMPABLE cases of the switch statement will never write to local
variable `error'.
Signed-off-by: Shi Weihua <shiwh@cn.fujitsu.com>
Cc: Andrew G. Morgan <morgan@kernel.org>
Acked-by: "Serge E. Hallyn" <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kumar Gala [Fri, 23 May 2008 20:04:58 +0000 (13:04 -0700)]
edac: mpc85xx: fix building as a module
including of <asm/mpc85xx.h> causes build problems since it doesn't exist.
Also removed warning:
drivers/edac/mpc85xx_edac.c:45: warning: 'mpc85xx_ctl_name' defined but not used
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Acked-by: Doug Thompson <dougthompson@xmission.com>
Acked-by: Dave Jiang <djiang@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Brownell [Fri, 23 May 2008 20:04:58 +0000 (13:04 -0700)]
gpio: build fixes
This fixes various gpio-related build errors (mostly potential)
reported in part by Russell King and Uwe Kleine-König.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: Uwe Kleine-König <Uwe.Kleine-Koenig@digi.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Arnaud Patard <arnaud.patard@rtp-net.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ben Dooks [Fri, 23 May 2008 20:04:57 +0000 (13:04 -0700)]
S3C2410: fix driver MODULE_ALIAS()
Add a correct MODULE_ALIAS() entry for this driver to enable udev module
loading.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Cc: Arnaud Patard <arnaud.patard@rtp-net.org>
Acked-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ben Dooks [Fri, 23 May 2008 20:04:56 +0000 (13:04 -0700)]
S3C2410: clean out changelog header and tidy
Remove the old changelog entries which are now out of date and should be
extractable from git anyway. Also tidy up the copyright for the driver.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Cc: Arnaud Patard <arnaud.patard@rtp-net.org>
Acked-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ben Dooks [Fri, 23 May 2008 20:04:56 +0000 (13:04 -0700)]
S3C2410: add error print if we cannot add attribute
Fix the following warning by checking the result of device_create_file and
printing an error but not removing the device (loss of debug registers is
not fatal).
drivers/video/s3c2410fb.c:905: warning: ignoring return value of 'device_create_file', declared with attribute warn_unused_result
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Cc: Arnaud Patard <arnaud.patard@rtp-net.org>
Acked-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ben Dooks [Fri, 23 May 2008 20:04:55 +0000 (13:04 -0700)]
S3C2410: ensure that FB_BLANK_POWERDOWN shuts down the controller
When a blank level of FB_BLANK_POWERDOWN is used, we should shut down the
controller so that it no longer tries to produce any panel signals or
data, and shuts down the DMA which is not needed.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Cc: Arnaud Patard <arnaud.patard@rtp-net.org>
Acked-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ben Dooks [Fri, 23 May 2008 20:04:53 +0000 (13:04 -0700)]
SM501: reverse FPEN/VBIASEN flags behaviour
To keep backwards compatibility, reverse the meanings of these flags so
that when they are not set, the driver uses the original behvaiour.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Cc: Arnaud Patard <arnaud.patard@rtp-net.org>
Acked-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Heiko Carstens [Fri, 23 May 2008 20:04:52 +0000 (13:04 -0700)]
memory hotplug: fix early allocation handling
Trying to add memory via add_memory() from within an initcall function
results in
bootmem alloc of 163840 bytes failed!
Kernel panic - not syncing: Out of memory
This is caused by zone_wait_table_init() which uses system_state to decide
if it should use the bootmem allocator or not.
When initcalls are handled the system_state is still SYSTEM_BOOTING but
the bootmem allocator doesn't work anymore. So the allocation will fail.
To fix this use slab_is_available() instead as indicator like we do it
everywhere else.
[akpm@linux-foundation.org: coding-style fix]
Reviewed-by: Andy Whitcroft <apw@shadowen.org>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Whitcroft [Fri, 23 May 2008 20:04:50 +0000 (13:04 -0700)]
zonelists: handle a node zonelist with no applicable entries
When booting 2.6.26-rc3 on a multi-node x86_32 numa system we are seeing
panics when trying node local allocations:
BUG: unable to handle kernel NULL pointer dereference at
0000034c
IP: [<
c1042507>] get_page_from_freelist+0x4a/0x18e
*pdpt =
00000000013a7001 *pde =
0000000000000000
Oops: 0000 [#1] SMP
Modules linked in:
Pid: 0, comm: swapper Not tainted (
2.6.26-rc3-00003-g5abc28d #82)
EIP: 0060:[<
c1042507>] EFLAGS:
00010282 CPU: 0
EIP is at get_page_from_freelist+0x4a/0x18e
EAX:
c1371ed8 EBX:
00000000 ECX:
00000000 EDX:
00000000
ESI:
f7801180 EDI:
00000000 EBP:
00000000 ESP:
c1371ec0
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process swapper (pid: 0, ti=
c1370000 task=
c12f5b40 task.ti=
c1370000)
Stack:
00000000 00000000 00000000 00000000 000612d0 000412d0 00000000 000412d0
f7801180 f7c0101c f7c01018 c10426e4 f7c01018 00000001 00000044 00000000
00000001 c12f5b40 00000001 00000010 00000000 000412d0 00000286 000412d0
Call Trace:
[<
c10426e4>] __alloc_pages_internal+0x99/0x378
[<
c10429ca>] __alloc_pages+0x7/0x9
[<
c105e0e8>] kmem_getpages+0x66/0xef
[<
c105ec55>] cache_grow+0x8f/0x123
[<
c105f117>] ____cache_alloc_node+0xb9/0xe4
[<
c105f427>] kmem_cache_alloc_node+0x92/0xd2
[<
c122118c>] setup_cpu_cache+0xaf/0x177
[<
c105e6ca>] kmem_cache_create+0x2c8/0x353
[<
c13853af>] kmem_cache_init+0x1ce/0x3ad
[<
c13755c5>] start_kernel+0x178/0x1ee
This occurs when we are scanning the zonelists looking for a ZONE_NORMAL
page. In this system there is only ZONE_DMA and ZONE_NORMAL memory on
node 0, all other nodes are mapped above 4GB physical. Here is a dump
of the zonelists from this system:
zonelists pgdat=
c1400000
0:
c14006c0:2
f7c006c0:2
f7e006c0:2
c1400360:1
c1400000:0
1:
c14006c0:2
c1400360:1
c1400000:0
zonelists pgdat=
f7c00000
0:
f7c006c0:2
f7e006c0:2
c14006c0:2
c1400360:1
c1400000:0
1:
f7c006c0:2
zonelists pgdat=
f7e00000
0:
f7e006c0:2
c14006c0:2
f7c006c0:2
c1400360:1
c1400000:0
1:
f7e006c0:2
When performing a node local allocation we call get_page_from_freelist()
looking for a page. It in turn calls first_zones_zonelist() which returns
a preferred_zone. Where there are no applicable zones this will be NULL.
However we use this unconditionally, leading to this panic.
Where there are no applicable zones there is no possibility of a successful
allocation, so simply fail the allocation.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arjan van de Ven [Fri, 23 May 2008 20:04:49 +0000 (13:04 -0700)]
serial: fix enable_irq_wake/disable_irq_wake imbalance in serial_core.c
enable_irq_wake() and disable_irq_wake() need to be balanced. However,
serial_core.c calls these for different conditions during the suspend and
resume functions...
This is causing a regular WARN_ON() as found at
http://www.kerneloops.org/search.php?search=set_irq_wake
This patch makes the conditions for triggering the _wake enable/disable
sequence identical.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Denis V. Lunev [Fri, 23 May 2008 20:04:47 +0000 (13:04 -0700)]
proc: proc_get_inode() should get module only once
Any file under /proc/net opened more than once leaked the refcounter
on the module it belongs to.
The problem is that module_get is called for each file opening while
module_put is called only when /proc inode is destroyed. So, lets put
module counter if we are dealing with already initialised inode.
Addresses http://bugzilla.kernel.org/show_bug.cgi?id=10737
Signed-off-by: Denis V. Lunev <den@openvz.org>
Cc: David Miller <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Robert Olsson <robert.olsson@its.uu.se>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Reported-by: Roland Kletzing <devzero@web.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Marcin Krol [Fri, 23 May 2008 20:04:46 +0000 (13:04 -0700)]
brd: don't show ramdisks in /proc/partitions
In 2.6.25, ramdisk devices show up in /proc/partitions, which is a
behaviour change from the old rd.c. Add GENHD_FL_SUPPRESS_PARTITION_INFO,
which was present in rd.c.
All kernels prior to 2.6.25 weren't displaying ramdisks in
/proc/partitions. Since there are many userspace tools using information
from /proc/partitions some of them may now behave incorrectly (I didn't
tested any though). For example before 2.6.25 /proc/partitions was empty
if no block devices like hard disks and such were detected by kernel. Now
all 16 ramdisks are always visible there. Some software may rely on such
information (I mean, on empty /proc/partitions).
There was quite similar situation back in 2004, and ramdisks were excluded
back from displaying. Thats why I called this a regression (maybe a bit
unfortunate). See this patch for info:
http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.3-rc2/2.6.3-rc2-mm1/broken-out/nbd-proc-partitions-fix.patch
I also think that someone somewhere (long time ago) excluded ramdisks from
/proc/partitions for good reasons. It is possible that now such new
"feature" is harmless, but I think there are more chances that someone
will say "hey, /proc/partitions has changed, now my software doesn't work"
then "hey where did my new 2.6.25 feature go". nbd devices are also
excluded, maybe for very same (unknown to me) reasons.
Signed-off-by: Marcin Krol <hawk@pld-linux.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Fri, 23 May 2008 20:04:45 +0000 (13:04 -0700)]
ip2: fix crashes on load/unload
This doesn't need to be two modules, and making it one cleans up the
problem
Signed-off-by: Alan Cox <alan@redhat.com>
Cc: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Trent Piepho [Fri, 23 May 2008 20:04:44 +0000 (13:04 -0700)]
gpiolib: fix off by one errors
The last gpio belonging to a chip is chip->base + chip->ngpios - 1. Some
places in the code, but not all, forgot the critical minus one.
Signed-off-by: Trent Piepho <xyzzy@speakeasy.org>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roel Kluin [Fri, 23 May 2008 20:04:43 +0000 (13:04 -0700)]
gpio: mcp23s08 debug fix
The return value of mcp23s08_read_regs() can only be evaluated when signed
Signed-off-by: Roel Kluin <12o3l@tiscali.nl>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Brownell [Fri, 23 May 2008 20:04:42 +0000 (13:04 -0700)]
gpio: pca953x driver handles pca9554 too
Teach drivers/gpio/pca953x.c about PCA9554, another compatible chip.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Fri, 23 May 2008 20:04:41 +0000 (13:04 -0700)]
signals: fix sigqueue_free() vs __exit_signal() race
__exit_signal() does flush_sigqueue(tsk->pending) outside of ->siglock.
This can race with another thread doing sigqueue_free(), we can free the
same SIGQUEUE_PREALLOC sigqueue twice or corrupt the pending->list.
Note that even sys_exit_group() can trigger this race, not only
sys_timer_delete().
Move the callsite of flush_sigqueue(tsk->pending) under ->siglock.
This patch doesn't touch flush_sigqueue(->shared_pending) below, it is
called when there are no other threads which can play with signals, and
sigqueue_free() can't be used outside of our thread group.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
NeilBrown [Fri, 23 May 2008 20:04:39 +0000 (13:04 -0700)]
md: restart recovery cleanly after device failure.
When we get any IO error during a recovery (rebuilding a spare), we abort
the recovery and restart it.
For RAID6 (and multi-drive RAID1) it may not be best to restart at the
beginning: when multiple failures can be tolerated, the recovery may be
able to continue and re-doing all that has already been done doesn't make
sense.
We already have the infrastructure to record where a recovery is up to
and restart from there, but it is not being used properly.
This is because:
- We sometimes abort with MD_RECOVERY_ERR rather than just MD_RECOVERY_INTR,
which causes the recovery not be be checkpointed.
- We remove spares and then re-added them which loses important state
information.
The distinction between MD_RECOVERY_ERR and MD_RECOVERY_INTR really isn't
needed. If there is an error, the relevant drive will be marked as
Faulty, and that is enough to ensure correct handling of the error. So we
first remove MD_RECOVERY_ERR, changing some of the uses of it to
MD_RECOVERY_INTR.
Then we cause the attempt to remove a non-faulty device from an array to
fail (unless recovery is impossible as the array is too degraded). Then
when remove_and_add_spares attempts to remove the devices on which
recovery can continue, it will fail, they will remain in place, and
recovery will continue on them as desired.
Issue: If we are halfway through rebuilding a spare and another drive
fails, and a new spare is immediately available, do we want to:
1/ complete the current rebuild, then go back and rebuild the new spare or
2/ restart the rebuild from the start and rebuild both devices in
parallel.
Both options can be argued for. The code currently takes option 2 as
a/ this requires least code change
b/ this results in a minimally-degraded array in minimal time.
Cc: "Eivind Sarto" <ivan@kasenna.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bernd Schubert [Fri, 23 May 2008 20:04:38 +0000 (13:04 -0700)]
md: allow parallel resync of md-devices.
In some configurations, a raid6 resync can be limited by CPU speed
(Calculating P and Q and moving data) rather than by device speed. In
these cases there is nothing to be gained byt serialising resync of arrays
that share a device, and doing the resync in parallel can provide benefit.
So add a sysfs tunable to flag an array as being allowed to resync in
parallel with other arrays that use (a different part of) the same device.
Signed-off-by: Bernd Schubert <bs@q-leap.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dan Williams [Fri, 23 May 2008 20:04:37 +0000 (13:04 -0700)]
md: notify userspace on 'stop' events
This additional notification to 'array_state' is needed to allow the
monitor application to learn about stop events via sysfs. The
sysfs_notify("sync_action") call that comes at the end of do_md_stop()
(via md_new_event) is insufficient since the 'sync_action' attribute has
been removed by this point.
(Seems like a sysfs-notify-on-removal patch is a better fix. Currently
removal updates the event count but does not wake up waiters)
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
NeilBrown [Fri, 23 May 2008 20:04:36 +0000 (13:04 -0700)]
md: notify userspace on 'write-pending' changes to array_state
When an array enters write pending, 'array_state' changes, so we must be
sure to sysfs_notify.
Also, when waiting for user-space to acknowledge 'write-pending' by
marking the metadata as dirty, we don't want to wait for MD_CHANGE_DEVS to
be cleared as that might not happen. So explicity test for the bits that
we are really interested in.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
NeilBrown [Fri, 23 May 2008 20:04:35 +0000 (13:04 -0700)]
md: raid1: Fix restoration of bio between failed read and write.
When performing a "recovery" or "check" pass on a RAID1 array, we read
from each device and possible, if there is a difference or a read error,
write back to some devices.
We use the same 'bio' for both read and write, resetting various fields
between the two operations.
We forgot to reset bv_offset and bv_len however. These are often left
unchanged, but in the case where there is an IO error one or two sectors
into a page, they are changed.
This results in correctable errors not being corrected properly. It does
not result in any data corruption.
Cc: "Fairbanks, David" <David.Fairbanks@stratus.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bernd Schubert [Fri, 23 May 2008 20:04:34 +0000 (13:04 -0700)]
md: md: raid5 rate limit error printk
Last night we had scsi problems and a hardware raid unit was offlined
during heavy i/o. While this happened we got for about 3 minutes a huge
number messages like these
Apr 12 03:36:07 pfs1n14 kernel: [197510.696595] raid5:md7: read error not correctable (sector
2993096568 on sdj2).
I guess the high error rate is responsible for not scheduling other events
- during this time the system was not pingable and in the end also other
devices run into scsi command timeouts causing problems on these unrelated
devices as well.
Signed-off-by: Bernd Schubert <bernd-schubert@gmx.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Hellwig [Fri, 23 May 2008 20:04:34 +0000 (13:04 -0700)]
md: kill file_path wrapper
Kill the trivial and rather pointless file_path wrapper around d_path.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Adrian Bunk [Fri, 23 May 2008 20:04:33 +0000 (13:04 -0700)]
md: proper extern for mdp_major
This patch adds a proper extern for mdp_major in include/linux/raid/md.h
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
NeilBrown [Fri, 23 May 2008 20:04:32 +0000 (13:04 -0700)]
md: fix possible oops when removing a bitmap from an active array
It is possible to add a write-intent bitmap to an active array, or remove
the bitmap that is there.
When we do with the 'quiesce' the array, which causes make_request to
block in "wait_barrier()".
However we are sampling the value of "mddev->bitmap" before the
wait_barrier call, and using it afterwards. This can result in using a
bitmap structure that has been freed.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>