Lars Ellenberg [Mon, 6 Jun 2011 09:31:42 +0000 (11:31 +0200)]
drbd: don't cond_resched_lock with IRQs disabled
The last commit, drbd: add missing spinlock to bitmap receive,
introduced a cond_resched_lock(), where the lock in question is taken
with irqs disabled.
As we must not schedule with IRQs disabled,
and cond_resched_lock_irq() does not exist, yet,
we re-aquire the spin_lock_irq() for each bitmap page processed in turn.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Lars Ellenberg [Fri, 3 Jun 2011 19:18:13 +0000 (21:18 +0200)]
drbd: add missing spinlock to bitmap receive
During bitmap exchange, when using the RLE bitmap compression scheme,
we have a code path that can set the whole bitmap at once.
To avoid holding spin_lock_irq() for too long, we used to lock out other
bitmap modifications during bitmap exchange by other means, and then,
knowing we have exclusive access to the bitmap, modify it without
the spinlock, and with IRQs enabled.
Since we now allow local IO to continue, potentially setting additional
bits during the bitmap receive phase, this is no longer true, and we get
uncoordinated updates of bitmap members, causing bm_set to no longer
accurately reflect the total number of set bits.
To actually see this, you'd need to have a large bitmap, use RLE bitmap
compression, and have busy IO during sync handshake and bitmap exchange.
Fix this by taking the spin_lock_irq() in this code path as well, but
calling cond_resched_lock() after each page worth of bits processed.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Wed, 25 May 2011 09:14:35 +0000 (11:14 +0200)]
drbd: Use the correct max_bio_size when creating resync requests
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Namhyung Kim [Tue, 24 May 2011 14:48:55 +0000 (16:48 +0200)]
loop: handle on-demand devices correctly
When finding or allocating a loop device, loop_probe() did not take
partition numbers into account so that it can result to a different
device. Consider following example:
$ sudo modprobe loop max_part=15
$ ls -l /dev/loop*
brw-rw---- 1 root disk 7, 0 2011-05-24 22:16 /dev/loop0
brw-rw---- 1 root disk 7, 16 2011-05-24 22:16 /dev/loop1
brw-rw---- 1 root disk 7, 32 2011-05-24 22:16 /dev/loop2
brw-rw---- 1 root disk 7, 48 2011-05-24 22:16 /dev/loop3
brw-rw---- 1 root disk 7, 64 2011-05-24 22:16 /dev/loop4
brw-rw---- 1 root disk 7, 80 2011-05-24 22:16 /dev/loop5
brw-rw---- 1 root disk 7, 96 2011-05-24 22:16 /dev/loop6
brw-rw---- 1 root disk 7, 112 2011-05-24 22:16 /dev/loop7
$ sudo mknod /dev/loop8 b 7 128
$ sudo losetup /dev/loop8 ~/temp/disk-with-3-parts.img
$ sudo losetup -a
/dev/loop128: [0805]:278201 (/home/namhyung/temp/disk-with-3-parts.img)
$ ls -l /dev/loop*
brw-rw---- 1 root disk 7, 0 2011-05-24 22:16 /dev/loop0
brw-rw---- 1 root disk 7, 16 2011-05-24 22:16 /dev/loop1
brw-rw---- 1 root disk 7, 2048 2011-05-24 22:18 /dev/loop128
brw-rw---- 1 root disk 7, 2049 2011-05-24 22:18 /dev/loop128p1
brw-rw---- 1 root disk 7, 2050 2011-05-24 22:18 /dev/loop128p2
brw-rw---- 1 root disk 7, 2051 2011-05-24 22:18 /dev/loop128p3
brw-rw---- 1 root disk 7, 32 2011-05-24 22:16 /dev/loop2
brw-rw---- 1 root disk 7, 48 2011-05-24 22:16 /dev/loop3
brw-rw---- 1 root disk 7, 64 2011-05-24 22:16 /dev/loop4
brw-rw---- 1 root disk 7, 80 2011-05-24 22:16 /dev/loop5
brw-rw---- 1 root disk 7, 96 2011-05-24 22:16 /dev/loop6
brw-rw---- 1 root disk 7, 112 2011-05-24 22:16 /dev/loop7
brw-r--r-- 1 root root 7, 128 2011-05-24 22:17 /dev/loop8
After this patch, /dev/loop8 - instead of /dev/loop128 - was
accessed correctly.
In addition, 'range' passed to blk_register_region() should
include all range of dev_t that LOOP_MAJOR can address. It does
not need to be limited by partition numbers unless 'max_loop'
param was specified.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Cc: Laurent Vivier <Laurent.Vivier@bull.net>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Namhyung Kim [Tue, 24 May 2011 14:48:54 +0000 (16:48 +0200)]
loop: limit 'max_part' module param to DISK_MAX_PARTS
The 'max_part' parameter controls the number of maximum partition
a loop block device can have. However if a user specifies very
large value it would exceed the limitation of device minor number
and can cause a kernel panic (or, at least, produce invalid
device nodes in some cases).
On my desktop system, following command kills the kernel. On qemu,
it triggers similar oops but the kernel was alive:
$ sudo modprobe loop max_part0000
------------[ cut here ]------------
kernel BUG at /media/Linux_Data/project/linux/fs/sysfs/group.c:65!
invalid opcode: 0000 [#1] SMP
last sysfs file:
CPU 0
Modules linked in: loop(+)
Pid: 43, comm: insmod Tainted: G W 2.6.39-qemu+ #155 Bochs Bochs
RIP: 0010:[<
ffffffff8113ce61>] [<
ffffffff8113ce61>] internal_create_group=
+0x2a/0x170
RSP: 0018:
ffff880007b3fde8 EFLAGS:
00000246
RAX:
00000000ffffffef RBX:
ffff880007b3d878 RCX:
00000000000007b4
RDX:
ffffffff8152da50 RSI:
0000000000000000 RDI:
ffff880007b3d878
RBP:
ffff880007b3fe38 R08:
ffff880007b3fde8 R09:
0000000000000000
R10:
ffff88000783b4a8 R11:
ffff880007b3d878 R12:
ffffffff8152da50
R13:
ffff880007b3d868 R14:
0000000000000000 R15:
ffff880007b3d800
FS:
0000000002137880(0063) GS:
ffff880007c00000(0000) knlGS:
00000000000000=
00
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
0000000000422680 CR3:
0000000007b50000 CR4:
00000000000006b0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
0000000000000000 DR7:
0000000000000000
Process insmod (pid: 43, threadinfo
ffff880007b3e000, task
ffff880007afb9c=
0)
Stack:
ffff880007b3fe58 ffffffff811e66dd ffff880007b3fe58 ffffffff811e570b
0000000000000010 ffff880007b3d800 ffff880007a7b390 ffff880007b3d868
0000000000400920 ffff880007b3d800 ffff880007b3fe48 ffffffff8113cfc8
Call Trace:
[<
ffffffff811e66dd>] ? device_add+0x4bc/0x5af
[<
ffffffff811e570b>] ? dev_set_name+0x3c/0x3e
[<
ffffffff8113cfc8>] sysfs_create_group+0xe/0x12
[<
ffffffff810b420e>] blk_trace_init_sysfs+0x14/0x16
[<
ffffffff8116a090>] blk_register_queue+0x47/0xf7
[<
ffffffff8116f527>] add_disk+0xdf/0x290
[<
ffffffffa00060eb>] loop_init+0xeb/0x1b8 [loop]
[<
ffffffffa0006000>] ? 0xffffffffa0005fff
[<
ffffffff8100020a>] do_one_initcall+0x7a/0x12e
[<
ffffffff81096804>] sys_init_module+0x9c/0x1e0
[<
ffffffff813329bb>] system_call_fastpath+0x16/0x1b
Code: c3 55 48 89 e5 41 57 41 56 41 89 f6 41 55 41 54 49 89 d4 53 48 89 fb=
48 83 ec 28 48 85 ff 74 0b 85 f6 75 0b 48 83 7f 30 00 75 14 <0f> 0b eb fe =
48 83 7f 30 00 b9 ea ff ff ff 0f 84 18 01 00 00 49
RIP [<
ffffffff8113ce61>] internal_create_group+0x2a/0x170
RSP <
ffff880007b3fde8>
---[ end trace
a123eb592043acad ]---
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Cc: Laurent Vivier <Laurent.Vivier@bull.net>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Andrew Morton [Mon, 23 May 2011 22:29:32 +0000 (15:29 -0700)]
drbd: fix warning
In file included from drivers/block/drbd/drbd_main.c:54: drivers/block/drbd/drbd_int.h:1190: warning: parameter has incomplete type
Forward declarations of enums do not work.
Fix it unpleasantly by moving the prototype.
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Lars Ellenberg <drbd-dev@lists.linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Philipp Reisner [Tue, 24 May 2011 08:27:38 +0000 (10:27 +0200)]
drbd: fix warning
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Bart Van Assche [Sat, 21 May 2011 16:32:29 +0000 (18:32 +0200)]
drbd: Fix spelling
Found these with the help of ispell -l.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Lars Ellenberg [Mon, 2 May 2011 09:51:31 +0000 (11:51 +0200)]
drbd: fix schedule in atomic
An administrative detach used to request a state change directly to D_DISKLESS,
first suspending IO to avoid the last put_ldev() occuring from an endio handler,
potentially in irq context.
This is not enough on the receiving side (typically secondary), we may miss
some peer_req on the way to local disk, which then may do the last put_ldev()
from their drbd_peer_request_endio().
This patch makes the detach always go through the intermediate D_FAILED state.
We may consider to rename it D_DETACHING.
Alternative approach would be to create yet an other work item to be scheduled
on the worker, do the destructor work from there, and get the timing right.
manually picked commit
564040f from the drbd 8.4 branch.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Fri, 20 May 2011 14:39:13 +0000 (16:39 +0200)]
drbd: Take a more conservative approach when deciding max_bio_size
The old (optimistic) implementation could shrink the bio size
on an primary device.
Shrinking the bio size on a primary device is bad. Since there
we might get BIOs with the old (bigger) size shortly after
we published the new size.
The new implementation is more conservative, and eventually
increases the max_bio_size on a primary device (which is valid).
It does so, when it knows the local limit AND the remote limit.
We cache the last seen max_bio_size of the peer in the meta
data, and rely on that, to make the operation of single
nodes more efficient.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Tue, 17 May 2011 12:19:41 +0000 (14:19 +0200)]
drbd: Fixed state transitions after async outdate-peer-handler returned
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Tue, 17 May 2011 12:48:55 +0000 (14:48 +0200)]
drbd: Disallow the peer_disk_state to be D_OUTDATED while connected
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Fri, 13 May 2011 10:03:55 +0000 (12:03 +0200)]
drbd: Fix for the connection problems on high latency links
It seems that the real cause of all the issues where that
we did not noticed in drbd_try_connect() when the other
guy closes one socket if the round trip time gets higher
than 100ms. There were that 100ms hard coded!
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Lars Ellenberg [Mon, 16 May 2011 13:31:45 +0000 (15:31 +0200)]
drbd: fix potential activity log refcount imbalance in error path
It is no longer sufficient to trigger on local WRITE,
we need to check on (rq_state & RQ_IN_ACT_LOG)
before calling drbd_al_complete_io also in the error path.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Mon, 14 Mar 2011 10:54:47 +0000 (11:54 +0100)]
drbd: Only downgrade the disk state in case of disk failures
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Lars Ellenberg [Wed, 9 Mar 2011 21:44:55 +0000 (22:44 +0100)]
drbd: fix disconnect/reconnect loop, if ping-timeout == ping-int
If there is no replication traffic within the idle timeout
(ping-int seconds), DRBD will send a P_PING,
and adjust the timeout to ping-timeout.
If there is no P_PING_ACK received within this ping-timeout,
DRBD finally drops the connection, and tries to re-establish it.
To decide which timeout was active, we compared the current timeout
with the ping-timeout, and dropped the connection, if that was the case.
By default, ping-int is 10 seconds, ping-timeout is 500 ms.
Unfortunately, if you configure ping-timeout to be the same as ping-int,
expiry of the idle-timeout had been mistaken for a missing ping ack,
and caused an immediate reconnection attempt.
Fix:
Allow both timeouts to be equal, use a local variable
to store which timeout is active.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Lars Ellenberg [Tue, 8 Mar 2011 16:11:40 +0000 (17:11 +0100)]
drbd: fix potential distributed deadlock
We limit ourselves to a configurable maximum number of pages used as
temporary bio pages.
If the configured "max_buffers" is not big enough to match the bandwidth
of the respective deployment, a distributed deadlock could be triggered
by e.g. fast online verify and heavy application IO.
TCP connections would block on congestion, because both receivers
would wait on pages to become available.
Fortunately the respective senders in this case would be able to give
back some pages already. So do that.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Lars Ellenberg [Thu, 27 Jan 2011 14:24:58 +0000 (15:24 +0100)]
lru_cache.h: fix comments referring to ts_ instead of lc_
For some time we contemplated calling the "struct lru_cache"
a "struct tracked_set", and some comments kept the ts_ prefix.
Fix those to match the member field names.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Wed, 2 Mar 2011 23:21:30 +0000 (00:21 +0100)]
drbd: Fix for application IO with the on-io-error=pass-on policy
In case a write failes on the local disk, go into D_INCONSISTENT
disk state. That causes future reads of that block to be shipped
to the peer.
Read retry remote was already in place.
Actually the documentation needs to get fixed now. Since the
application is still shielded from the error. (as long as we have
only a single disk failing) The difference to detach is that
we keep the disk. And therefore might keep all the other, still
working sectors up to date.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Jens Axboe [Thu, 19 May 2011 07:46:00 +0000 (09:46 +0200)]
Merge branches 'for-jens/xen-backend-fixes' and 'for-jens/xen-blkback-v3.3' of git://git./linux/kernel/git/konrad/xen into for-2.6.40/drivers
Konrad Rzeszutek Wilk [Wed, 20 Apr 2011 15:54:10 +0000 (11:54 -0400)]
xen/p2m: Add EXPORT_SYMBOL_GPL to the M2P override functions.
If the backends, which use these two functions, are compiled as
a module we need these two functions to be exported.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Mon, 28 Feb 2011 22:58:48 +0000 (17:58 -0500)]
xen/p2m/m2p/gnttab: Support GNTMAP_host_map in the M2P override.
We only supported the M2P (and P2M) override only for the
GNTMAP_contains_pte type mappings. Meaning that we grants
operations would "contain the machine address of the PTE to update"
If the flag is unset, then the grant operation is
"contains a host virtual address". The latter case means that
the Hypervisor takes care of updating our page table
(specifically the PTE entry) with the guest's MFN. As such we should
not try to do anything with the PTE. Previous to this patch
we would try to clear the PTE which resulted in Xen hypervisor
being upset with us:
(XEN) mm.c:1066:d0 Attempt to implicitly unmap a granted PTE
c0100000ccc59067
(XEN) domain_crash called from mm.c:1067
(XEN) Domain 0 (vcpu#0) crashed on cpu#3:
(XEN) ----[ Xen-4.0-110228 x86_64 debug=y Not tainted ]----
and crashing us.
This patch allows us to inhibit the PTE clearing in the PV guest
if the GNTMAP_contains_pte is not set.
On the m2p_remove_override path we provide the same parameter.
Sadly in the grant-table driver we do not have a mechanism to
tell m2p_remove_override whether to clear the PTE or not. Since
the grant-table driver is used by user-space, we can safely assume
that it operates only on PTE's. Hence the implementation for
it to work on !GNTMAP_contains_pte returns -EOPNOTSUPP. In the future
we can implement the support for this. It will require some extra
accounting structure to keep track of the page[i], and the flag.
[v1: Added documentation details, made it return -EOPNOTSUPP instead
of trying to do a half-way implementation]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Jan Beulich [Tue, 17 May 2011 10:07:05 +0000 (11:07 +0100)]
xen/blkback: don't fail empty barrier requests
The sector number on empty barrier requests may (will?) be -1, which,
given that it's being treated as unsigned 64-bit quantity, will almost
always exceed the actual (virtual) disk's size.
Inspired by Konrad's "When writting barriers set the sector number to
zero...".
While at it also add overflow checking to the math in vbd_translate().
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Laszlo Ersek [Fri, 13 May 2011 13:45:40 +0000 (09:45 -0400)]
xen/blkback: fix xenbus_transaction_start() hang caused by double xenbus_transaction_end()
vbd_resize() up_read()'s xs_state.suspend_mutex twice in a row via double
xenbus_transaction_end() calls. The next down_read() in
xenbus_transaction_start() (at eg. the next resize attempt) hangs.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=618317
Acked-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Thu, 12 May 2011 22:02:28 +0000 (18:02 -0400)]
xen/blkback: Align the tabs on the structure.
The recent changes caused this field of the structure to be offset a bit.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Thu, 12 May 2011 21:23:30 +0000 (17:23 -0400)]
xen/blkback: if log_stats is enabled print out the data.
And not depend on the driver being built with -DDEBUG flag.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Thu, 12 May 2011 20:58:21 +0000 (16:58 -0400)]
xen/blkback: Add the prefix XEN in the common.h.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Thu, 12 May 2011 20:53:56 +0000 (16:53 -0400)]
xen/blkback: Prefix 'vbd' with 'xen' in structs and functions.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Thu, 12 May 2011 20:47:48 +0000 (16:47 -0400)]
xen/blkback: Change structure name blkif_st to xen_blkif.
No need for that '_st' and xen_blkif is more apt.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Thu, 12 May 2011 20:37:04 +0000 (16:37 -0400)]
xen/blkback: Remove the unused typedefs.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Thu, 12 May 2011 20:31:51 +0000 (16:31 -0400)]
xen/blkback: Move include/xen/blkif.h into drivers/block/xen-blkback/common.h
Not point of the blkif.h file. It is not used by the frontend.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Thu, 12 May 2011 20:23:06 +0000 (16:23 -0400)]
xen/blkback: Fixing some more of the cleanpatch.pl warnings.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Thu, 12 May 2011 20:19:23 +0000 (16:19 -0400)]
xen/blkback: Checkpatch.pl recommend against multiple assigments.
CHECK: multiple assignments should be avoided
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Thu, 12 May 2011 20:14:15 +0000 (16:14 -0400)]
xen/blkback: Fix checkpatch.pl warnings about more than 80 lines.
Break up the macro usage.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Thu, 12 May 2011 20:10:55 +0000 (16:10 -0400)]
xen/blkback: Flesh out the description in the Kconfig.
with more details.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Wed, 11 May 2011 20:26:59 +0000 (16:26 -0400)]
xen/blkback: Fix spelling mistakes.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Wed, 11 May 2011 20:23:39 +0000 (16:23 -0400)]
xen/blkback: Move blkif_get_x86_[32|64]_req to common.h in block/xen-blkback dir.
From the blkif.h header, which was exposed to the frontend.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Wed, 11 May 2011 20:21:08 +0000 (16:21 -0400)]
xen/blkback: Removing the debug_lvl option.
It is not really used for anything.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Thu, 12 May 2011 20:43:12 +0000 (16:43 -0400)]
xen/blkback: Use the DRV_PFX in the pr_.. macros.
To make it easier to read.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Wed, 11 May 2011 20:15:24 +0000 (16:15 -0400)]
xen/blkback: Make the DPRINTK uniform.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Thu, 12 May 2011 20:42:31 +0000 (16:42 -0400)]
xen/blkback: Change printk/DPRINTK to pr_.. type variant.
And also make them uniform and prefix the message with 'xen-blkback'.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Tue, 3 May 2011 16:01:11 +0000 (12:01 -0400)]
xen-blkfront: Introduce BLKIF_OP_FLUSH_DISKCACHE support.
If the backend supports the 'feature-flush-cache' mode, use that
instead of the 'feature-barrier' support.
Currently there are three backends that support the 'feature-flush-cache'
mode: NetBSD, Solaris and Linux kernel. The 'flush' option is much
light-weight version than the 'barrier' support so lets try to use as
there are no filesystems in the kernel that use full barriers anymore.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Thu, 5 May 2011 16:41:03 +0000 (12:41 -0400)]
xen-blkfront: Provide for 'feature-flush-cache' the BLKIF_OP_WRITE_FLUSH_CACHE operation.
The operation BLKIF_OP_WRITE_FLUSH_CACHE has existed in the Xen
tree header file for years but it was never present in the Linux tree
because the frontend (nor the backend) supported this interface.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Marek Marczykowski [Tue, 3 May 2011 16:04:52 +0000 (12:04 -0400)]
xen-blkfront: fix data size for xenbus_gather in blkfront_connect
barrier variable is int, not long. This overflow caused another variable
override: "err" (in PV code) and "binfo" (in xenlinux code -
drivers/xen/blkfront/blkfront.c). The later caused incorrect device
flags (RO/removable etc).
Signed-off-by: Marek Marczykowski <marmarek@mimuw.edu.pl>
Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
[v1: Changed title]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Wed, 11 May 2011 19:57:09 +0000 (15:57 -0400)]
xen/blkback: Fixed up comments and converted spaces to tabs.
Suggested-by: Ian Campbell <Ian.Campbell@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Jens Axboe [Fri, 6 May 2011 14:27:00 +0000 (08:27 -0600)]
cciss: fix compile issue
drivers/block/cciss.c: In function ‘cciss_send_reset’:
drivers/block/cciss.c:2515:2: error: implicit declaration of function ‘fill_cmd’
drivers/block/cciss.c: At top level:
drivers/block/cciss.c:2531:12: error: conflicting types for ‘fill_cmd’
drivers/block/cciss.c:2534:1: note: an argument type that has a default promotion can’t match an empty parameter name list declaration
drivers/block/cciss.c:2515:18: note: previous implicit declaration of ‘fill_cmd’ was here
make[1]: *** [drivers/block/cciss.o] Error 1
make: *** [drivers/block/cciss.o] Error 2
Move fill_cmd() to above where it is first used.
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Stephen M. Cameron [Tue, 3 May 2011 19:54:12 +0000 (14:54 -0500)]
cciss: add cciss_tape_cmds module paramter
This is to allow number of commands reserved for use by SCSI tape drives
and medium changers to be adjusted at driver load time via the kernel
parameter cciss_tape_cmds, with a default value of 6, and a range
of 2 - 16 inclusive. Previously, the driver limited the number of
commands which could be queued to the SCSI half of the the driver
to only 2. This is to fix the problem that if you had more than
two tape drives, you couldn't, for example, erase or rewind them all
at the same time.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Stephen M. Cameron [Tue, 3 May 2011 19:54:07 +0000 (14:54 -0500)]
cciss: do not use bit 2 doorbell reset
It causes NMIs which are undesirable at best, unsurvivable at worst.
Prefer the soft reset instead.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Stephen M. Cameron [Tue, 3 May 2011 19:54:02 +0000 (14:54 -0500)]
cciss: do not attempt PCI power management reset method if we know it won't work.
Just go straight to the soft-reset method instead.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Stephen M. Cameron [Tue, 3 May 2011 19:53:57 +0000 (14:53 -0500)]
cciss: remove superfluous sleeps around reset code
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Stephen M. Cameron [Tue, 3 May 2011 19:53:52 +0000 (14:53 -0500)]
cciss: do soft reset if hard reset is broken
on driver load, if reset_devices is set, and the hard reset
attempts fail, try to bring up the controller to the point that
a command can be sent, and send it a soft reset command, then
after the reset undo whatever driver initialization was done to get
it to the point to take a command, and re-do it after the reset.
This is to get kdump to work on all the "non-resettable" controllers
(except 64xx controllers which can't be reset due to the potentially
shared cache module.)
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Stephen M. Cameron [Tue, 3 May 2011 19:53:46 +0000 (14:53 -0500)]
cciss: use new doorbell-bit-5 reset method
The bit-2-doorbell reset method seemed to cause (survivable) NMIs
on some systems and (unsurvivable) IOCK NMIs on some G7 servers.
Firmware guys implemented a new doorbell method to alleviate these
problems triggered by bit 5 of the doorbell register. We want to
use it if it's available.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Stephen M. Cameron [Tue, 3 May 2011 19:53:41 +0000 (14:53 -0500)]
cciss: increase timeouts for post-reset no-ops
Just to reduce the messages about timeouts that appear.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Stephen M. Cameron [Tue, 3 May 2011 19:53:36 +0000 (14:53 -0500)]
cciss: clarify messages around reset behavior
When waiting for the board to become "not ready"
don't print a message saying "waiting for board to
become ready" (possibly followed by a message saying
"failed waiting for board to become not ready". Instead,
it should be "waiting for board to reset" and "failed
waiting for board to reset."
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
"
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Stephen M. Cameron [Tue, 3 May 2011 19:53:31 +0000 (14:53 -0500)]
cciss: increase time to wait for board reset to start
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Stephen M. Cameron [Tue, 3 May 2011 19:53:26 +0000 (14:53 -0500)]
cciss: get rid of message related magic numbers
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Stephen M. Cameron [Tue, 3 May 2011 19:53:21 +0000 (14:53 -0500)]
cciss: fix reply pool and block fetch table memory leaks
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Stephen M. Cameron [Tue, 3 May 2011 19:53:16 +0000 (14:53 -0500)]
cciss: factor out irq request code
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Stephen M. Cameron [Tue, 3 May 2011 19:53:10 +0000 (14:53 -0500)]
cciss: factor out scatterlist allocation functions
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Stephen M. Cameron [Tue, 3 May 2011 19:53:05 +0000 (14:53 -0500)]
cciss: factor out command pool allocation functions
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Stephen M. Cameron [Tue, 3 May 2011 19:53:00 +0000 (14:53 -0500)]
cciss: do a better job of detecting controller reset failure
Detect failure of controller reset by noticing if the 32 bytes of
"driver version" we store on the hardware in the config table
fail to get zeroed out. Previously we noticed if the controller
did not transition to "simple mode", but this did not detect reset
failure if the controller was already in simple mode prior to
the reset attempt (e.g. due to module parameter hpsa_simple_mode=1).
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Stephen M. Cameron [Tue, 3 May 2011 19:52:54 +0000 (14:52 -0500)]
cciss: add readl after writel in interrupt mask setting code
This is to ensure the board interrupts are really off when
these functions return.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Kees Cook [Fri, 6 May 2011 00:02:12 +0000 (18:02 -0600)]
iosched: remove redundant sprintf
After the anticipatory scheduler was dropped, there was no need to
special-case the request_module string. As such, drop the redundant
sprintf and stack variable.
Signed-off-by: Kees Cook <kees.cook@canonical.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Tao Ma [Thu, 5 May 2011 21:10:05 +0000 (15:10 -0600)]
block: Remove 'plug/unplug' comment in blk_execute_rq_nowait
unplug is replaced with blk_run_queue now in blk_execute_rq_nowait,
so change the comment accordingly.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Konrad Rzeszutek Wilk [Thu, 5 May 2011 17:42:10 +0000 (13:42 -0400)]
xen/blkback: Fix up some of the comments.
They had the wrong data or were in the wrong spot.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Thu, 5 May 2011 17:37:23 +0000 (13:37 -0400)]
xen/blkback: Squash the checking for operation into dispatch_rw_block_io
We do a check for the operations right before calling dispatch_rw_block_io.
And then we do the same check in dispatch_rw_block_io. This patch
squashes those checks into the 'dispatch_rw_block_io' function.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Wed, 4 May 2011 21:07:27 +0000 (17:07 -0400)]
xen/blkback: Add support for BLKIF_OP_FLUSH_DISKCACHE and drop BLKIF_OP_WRITE_BARRIER.
We drop the support for 'feature-barrier' and add in the support
for the 'feature-flush-cache' if the real backend storage supports
flushing.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Thu, 5 May 2011 16:41:03 +0000 (12:41 -0400)]
xen-blkfront: Provide for 'feature-flush-cache' the BLKIF_OP_WRITE_FLUSH_CACHE operation.
The operation BLKIF_OP_WRITE_FLUSH_CACHE has existed in the Xen
tree header file for years but it was never present in the Linux tree
because the frontend (nor the backend) supported this interface.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Wed, 27 Apr 2011 16:40:11 +0000 (12:40 -0400)]
Revert "xen/blkback: Move the plugging/unplugging to a higher level."
This reverts commit
97961ef46b9b5a6a7c918a38b898a7b3e49869f4 b/c
we lose about 15% performance if we do the unplugging and the
end of the reading the ring buffer.
Konrad Rzeszutek Wilk [Tue, 26 Apr 2011 20:24:18 +0000 (16:24 -0400)]
xen/blkback: Stick REQ_SYNC on WRITEs to deal with CFQ I/O scheduler.
If one runs a simple fio request with random read/write with a
20%/80% ratio, the numbers are incredibly bad when using the CFQ scheduler.
IOmeter | | | |
64K, randrw | NOOP | CFQ | deadline |
randrwmix=80 | | | |
--------------+-------+------+----------+
blkback |103/27 |32/10 | 102/27 |
--------------+-------+------+----------+
QEMU qdisk |103/27 |102/27| 102/27 |
The problem as explained by Vivek Goyal was:
".. that difference is that sync vs async requests. In the case of
a kernel thread submitting IO, [..] all the WRITES might be being
considered as async and will go in a different queue. If you mix those
with some READS, they are always sync and will go in differnet queue.
In presence of sync queue, CFQ will idle and choke up WRITES in
an attempt to improve latencies of READs.
In case of AIO [note: this is what QEMU qdisk is doing] , [..]
it is direct IO and both READS and WRITES will be considered SYNC
and will go in a single queue and no choking of WRITES will take place."
The solution is quite simple, tack on REQ_SYNC (which is
what the WRITE_ODIRECT macro points to) and the numbers go
back up.
Suggested-by: Vivek Goyal <vgoyal@redhat.com
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Tue, 26 Apr 2011 16:57:59 +0000 (12:57 -0400)]
xen/blkback: Move the plugging/unplugging to a higher level.
We used to the plug/unplug on the submit_bio. But that means
if within a stream of WRITE, WRITE, WRITE,...,WRITE we have
one READ, it could stall the pipeline (as the 'submio_bio'
could trigger the unplug_fnc to be called and stall/sync
when doing the READ). Instead we want to move the unplugging
when the whole (or as a much as possible) ring buffer has been
processed. This also eliminates us doing plug/unplug for
each request.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tejun Heo [Thu, 21 Apr 2011 18:54:46 +0000 (20:54 +0200)]
block: don't block events on excl write for non-optical devices
Disk event code automatically blocks events on excl write. This is
primarily to avoid issuing polling commands while burning is in
progress. This behavior doesn't fit other types of devices with
removeable media where polling commands don't have adverse side
effects and door locking usually doesn't exist.
This patch introduces new genhd flag which controls the auto-blocking
behavior and uses it to enable auto-blocking only on optical devices.
Note for stable: 2.6.38 and later only
Cc: stable@kernel.org
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Tejun Heo [Thu, 21 Apr 2011 18:54:45 +0000 (20:54 +0200)]
block: rescan partitions on invalidated devices on -ENOMEDIA too
__blkdev_get() doesn't rescan partitions if disk->fops->open() fails,
which leads to ghost partition devices lingering after medimum removal
is known to both the kernel and userland. The behavior also creates a
subtle inconsistency where O_NONBLOCK open, which doesn't fail even if
there's no medium, clears the ghots partitions, which is exploited to
work around the problem from userland.
Fix it by updating __blkdev_get() to issue partition rescan after
-ENOMEDIA too.
This was reported in the following bz.
https://bugzilla.kernel.org/show_bug.cgi?id=13029
Note for stable: 2.6.38 and later only
Cc: stable@kernel.org
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: David Zeuthen <zeuthen@gmail.com>
Reported-by: Martin Pitt <martin.pitt@ubuntu.com>
Reported-by: Kay Sievers <kay.sievers@vrfy.org>
Tested-by: Kay Sievers <kay.sievers@vrfy.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Tejun Heo [Thu, 21 Apr 2011 18:54:44 +0000 (20:54 +0200)]
cdrom: always check_disk_change() on open
cdrom_open() called check_disk_change() after the rest of open path
succeeded which leads to the following bizarre behavior.
* After media change, if the device opened without O_NONBLOCK,
open_for_data() naturally fails with -ENOMEDIA and
check_disk_change() is never called. The media is known to be gone
and the open failure makes it obvious to the userland but device
invalidation never happens.
* But if the device is opened with O_NONBLOCK, all the checks are
bypassed and cdrom_open() doesn't notice that the media is not there
and check_disk_change() is called and invalidation happens.
There's nothing to be gained by avoiding calling check_disk_change()
on open failure. Common cases end up calling check_disk_change()
anyway. All we get is inconsistent behavior.
Fix it by moving check_disk_change() invocation to the top of
cdrom_open() so that it always gets called regardless of how the rest
of open proceeds.
Note for stable: 2.6.38 and later only
Cc: stable@kernel.org
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Amit Shah <amit.shah@redhat.com>
Tested-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Konrad Rzeszutek Wilk [Wed, 20 Apr 2011 15:50:43 +0000 (11:50 -0400)]
xen/blkback: Prefix exposed functions with xen_
And also shorten the name if it has blkback to blkbk.
This results in the symbol table (if compiled in the kernel)
to be much shorter, prettier, and also easier to search for.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Wed, 20 Apr 2011 15:21:43 +0000 (11:21 -0400)]
xen-blkback: Inline some of the functions that were moved from vbd/interface.c
Shuffling code around.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Wed, 20 Apr 2011 15:01:47 +0000 (11:01 -0400)]
xen-blkback: Remove from the copyright notice the address.
There is no need for it, as the address is updated constatly
in the root of the Linux kernel.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Wed, 20 Apr 2011 14:57:29 +0000 (10:57 -0400)]
xen/blkback: Squash vbd.c,interface.c in blkback.c and xenbus.c respectivly.
Daniel Stodden suggested to eliminate vbd.c and interface.c, inlining the
critical bits where they belong, respectively.
Leaving only blkback.c for the data- and xenbus.c for the control path.
Suggested-by: Daniel Stodden <daniel.stodden@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Linus Torvalds [Tue, 19 Apr 2011 04:26:00 +0000 (21:26 -0700)]
Linux 2.6.39-rc4
Linus Torvalds [Mon, 18 Apr 2011 22:44:29 +0000 (15:44 -0700)]
Merge branch 'for-39-rc4' of git://codeaurora.org/quic/kernel/davidb/linux-msm
* 'for-39-rc4' of git://codeaurora.org/quic/kernel/davidb/linux-msm:
msm: timer: fix missing return value
msm: Remove extraneous ffa device check
Linus Torvalds [Mon, 18 Apr 2011 20:29:03 +0000 (13:29 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: xen-kbdfront - fix mouse getting stuck after save/restore
Input: estimate number of events per packet
Input: evdev - indicate buffer overrun with SYN_DROPPED
Input: document event types and codes and their intended use
Input: add KEY_IMAGES specifically for AL Image Browser
Input: twl4030_keypad - fix potential NULL dereference in twl4030_kp_probe()
Input: h3600_ts - fix error handling at connect
Input: twl4030_keypad - avoid potential NULL-pointer dereference
Linus Torvalds [Mon, 18 Apr 2011 20:21:18 +0000 (13:21 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
block: add blk_run_queue_async
block: blk_delay_queue() should use kblockd workqueue
md: fix up raid1/raid10 unplugging.
md: incorporate new plugging into raid5.
md: provide generic support for handling unplug callbacks.
md - remove old plugging code.
md/dm - remove remains of plug_fn callback.
md: use new plugging interface for RAID IO.
block: drop queue lock before calling __blk_run_queue() for kblockd punt
Revert "block: add callback function for unplug notification"
block: Enhance new plugging support to support general callbacks
Linus Torvalds [Mon, 18 Apr 2011 19:24:24 +0000 (12:24 -0700)]
Merge branch 'merge' of git://git./linux/kernel/git/benh/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc/powermac: Build fix with SMP and CPU hotplug
powerpc/perf_event: Skip updating kernel counters if register value shrinks
powerpc: Don't write protect kernel text with CONFIG_DYNAMIC_FTRACE enabled
powerpc: Fix oops if scan_dispatch_log is called too early
powerpc/pseries: Use a kmem cache for DTL buffers
powerpc/kexec: Fix regression causing compile failure on UP
powerpc/85xx: disable Suspend support if SMP enabled
powerpc/e500mc: Remove CPU_FTR_MAYBE_CAN_NAP/CPU_FTR_MAYBE_CAN_DOZE
powerpc/book3e: Fix CPU feature handling on 64-bit e5500
powerpc: Check device status before adding serial device
powerpc/85xx: Don't add disabled PCIe devices
Linus Torvalds [Mon, 18 Apr 2011 19:24:05 +0000 (12:24 -0700)]
Merge git://git./linux/kernel/git/mason/btrfs-unstable
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (24 commits)
Btrfs: fix free space cache leak
Btrfs: avoid taking the chunk_mutex in do_chunk_alloc
Btrfs end_bio_extent_readpage should look for locked bits
Btrfs: don't force chunk allocation in find_free_extent
Btrfs: Check validity before setting an acl
Btrfs: Fix incorrect inode nlink in btrfs_link()
Btrfs: Check if btrfs_next_leaf() returns error in btrfs_real_readdir()
Btrfs: Check if btrfs_next_leaf() returns error in btrfs_listxattr()
Btrfs: make uncache_state unconditional
btrfs: using cached extent_state in set/unlock combinations
Btrfs: avoid taking the trans_mutex in btrfs_end_transaction
Btrfs: fix subvolume mount by name problem when default mount subvolume is set
fix user annotation in ioctl.c
Btrfs: check for duplicate iov_base's when doing dio reads
btrfs: properly handle overlapping areas in memmove_extent_buffer
Btrfs: fix memory leaks in btrfs_new_inode()
Btrfs: check for duplicate iov_base's when doing dio reads
Btrfs: reuse the extent_map we found when calling btrfs_get_extent
Btrfs: do not use async submit for small DIO io's
Btrfs: don't split dio bios if we don't have to
...
Konrad Rzeszutek Wilk [Mon, 18 Apr 2011 18:24:23 +0000 (14:24 -0400)]
xen/blkback: Move it from drivers/xen to drivers/block
.. and modify the Makefile and Kconfig files appropriately.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Mon, 18 Apr 2011 18:17:49 +0000 (14:17 -0400)]
block, xen/blkback: remove blk_[get|put]_queue calls.
They were used to check if the queue does not have QUEUE_FLAG_DEAD
set. That is not necessary anymore as the 'submit_io' call
ends up doing that for us.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Linus Torvalds [Mon, 18 Apr 2011 17:36:54 +0000 (10:36 -0700)]
proc: do proper range check on readdir offset
Rather than pass in some random truncated offset to the pid-related
functions, check that the offset is in range up-front.
This is just cleanup, the previous commit fixed the real problem.
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 18 Apr 2011 17:35:30 +0000 (10:35 -0700)]
next_pidmap: fix overflow condition
next_pidmap() just quietly accepted whatever 'last' pid that was passed
in, which is not all that safe when one of the users is /proc.
Admittedly the proc code should do some sanity checking on the range
(and that will be the next commit), but that doesn't mean that the
helper functions should just do that pidmap pointer arithmetic without
checking the range of its arguments.
So clamp 'last' to PID_MAX_LIMIT. The fact that we then do "last+1"
doesn't really matter, the for-loop does check against the end of the
pidmap array properly (it's only the actual pointer arithmetic overflow
case we need to worry about, and going one bit beyond isn't going to
overflow).
[ Use PID_MAX_LIMIT rather than pid_max as per Eric Biederman ]
Reported-by: Tavis Ormandy <taviso@cmpxchg8b.com>
Analyzed-by: Robert Święcki <robert@swiecki.net>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Igor Mammedov [Mon, 18 Apr 2011 17:17:17 +0000 (10:17 -0700)]
Input: xen-kbdfront - fix mouse getting stuck after save/restore
Mouse gets "stuck" after restore of PV guest but buttons are in working
condition.
If driver has been configured for ABS coordinates at start it will get
XENKBD_TYPE_POS events and then suddenly after restore it'll start getting
XENKBD_TYPE_MOTION events, that will be dropped later and they won't get
into user-space.
Regression was introduced by hunk 5 and 6 of
5ea5254aa0ad269cfbd2875c973ef25ab5b5e9db
("Input: xen-kbdfront - advertise either absolute or relative
coordinates").
Driver on restore should ask xen for request-abs-pointer again if it is
available. So restore parts that did it before
5ea5254.
Acked-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
[v1: Expanded the commit description]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Jeff Brown [Mon, 18 Apr 2011 17:08:02 +0000 (10:08 -0700)]
Input: estimate number of events per packet
Calculate a default based on the number of ABS axes, REL axes,
and MT slots for the device during input device registration.
Signed-off-by: Jeff Brown <jeffbrown@android.com>
Reviewed-by: Henrik Rydberg <rydberg@euromail.se>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Konrad Rzeszutek Wilk [Mon, 18 Apr 2011 16:04:17 +0000 (12:04 -0400)]
xen/blkback: Get the 'requeust_queue' properly.
After the commit
0faa8cca883bbc6a0919e3c89128672659b75820
(" xen/blkback: remove per-queue plugging") we forgot
to retrieve the 'struct request_queue' from the block device.
This puts the functionality back in and fixes a NULL pointer
bug.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Mon, 18 Apr 2011 15:34:55 +0000 (11:34 -0400)]
xen/blkback: Move the check for misaligned I/O once more.
The commit
976222e05ea5a9959ccf880d7a24efbf79b3c6cf
xen/blkback: Move the check for misaligned I/O higher.
moved it a bit to high. The preq->vbdev was not set, so the
check for misaligned I/O would cause a NULL pointer derefence.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Chris Mason [Mon, 18 Apr 2011 12:55:34 +0000 (08:55 -0400)]
Btrfs: fix free space cache leak
The free space caching code was recently reworked to
cache all the pages it needed instead of using find_get_page everywhere.
One loop was missed though, so it ended up leaking pages. This fixes
it to use our page array instead of find_get_page.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Christoph Hellwig [Mon, 18 Apr 2011 09:41:33 +0000 (11:41 +0200)]
block: add blk_run_queue_async
Instead of overloading __blk_run_queue to force an offload to kblockd
add a new blk_run_queue_async helper to do it explicitly. I've kept
the blk_queue_stopped check for now, but I suspect it's not needed
as the check we do when the workqueue items runs should be enough.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Jens Axboe [Mon, 18 Apr 2011 09:36:39 +0000 (11:36 +0200)]
block: blk_delay_queue() should use kblockd workqueue
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
NeilBrown [Mon, 18 Apr 2011 08:25:43 +0000 (18:25 +1000)]
md: fix up raid1/raid10 unplugging.
We just need to make sure that an unplug event wakes up the md
thread, which is exactly what mddev_check_plugged does.
Also remove some plug-related code that is no longer needed.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 18 Apr 2011 08:25:43 +0000 (18:25 +1000)]
md: incorporate new plugging into raid5.
In raid5 plugging is used for 2 things:
1/ collecting writes that require a bitmap update
2/ collecting writes in the hope that we can create full
stripes - or at least more-full.
We now release these different sets of stripes when plug_cnt
is zero.
Also in make_request, we call mddev_check_plug to hopefully increase
plug_cnt, and wake up the thread at the end if plugging wasn't
achieved for some reason.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 18 Apr 2011 08:25:42 +0000 (18:25 +1000)]
md: provide generic support for handling unplug callbacks.
When an md device adds a request to a queue, it can call
mddev_check_plugged.
If this succeeds then we know that the md thread will be woken up
shortly, and ->plug_cnt will be non-zero until then, so some
processing can be delayed.
If it fails, then no unplug callback is expected and the make_request
function needs to do whatever is required to make the request happen.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 18 Apr 2011 08:25:42 +0000 (18:25 +1000)]
md - remove old plugging code.
md has some plugging infrastructure for RAID5 to use because the
normal plugging infrastructure required a 'request_queue', and when
called from dm, RAID5 doesn't have one of those available.
This relied on the ->unplug_fn callback which doesn't exist any more.
So remove all of that code, both in md and raid5. Subsequent patches
with restore the plugging functionality.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 18 Apr 2011 08:25:41 +0000 (18:25 +1000)]
md/dm - remove remains of plug_fn callback.
Now that unplugging is done differently, the unplug_fn callback is
never called, so it can be completely discarded.
Signed-off-by: NeilBrown <neilb@suse.de>