Borislav Petkov [Sun, 26 Apr 2009 08:39:07 +0000 (10:39 +0200)]
ide: unify interrupt reason checking
Add ide_check_ireason() function that handles all ATAPI devices.
Reorganize all unlikely cases in ireason checking further down in the
code path.
In addition, add PFX for printks originating from ide-atapi. Finally,
remove ide_cd_check_ireason.
Signed-off-by: Borislav Petkov <petkovbb@gmail.com>
Borislav Petkov [Thu, 23 Apr 2009 06:30:29 +0000 (08:30 +0200)]
ide-cd: use whole request_sense buffer in EH
Now that we use a static request_sense buffer, use it instead of the
first 18 bytes only. Also, remove sense-arg to cdrom_analyze_sense_data
and cdrom_log_sense since we can access it through drive->sense_data
now.
Signed-off-by: Borislav Petkov <petkovbb@gmail.com>
Borislav Petkov [Mon, 4 May 2009 07:53:03 +0000 (09:53 +0200)]
ide-atapi: remove pc->buf
Now after all users of pc->buf have been converted, remove the 64B buffer
embedded in each packet command.
There should be no functional change resulting from this patch.
Signed-off-by: Borislav Petkov <petkovbb@gmail.com>
Borislav Petkov [Mon, 4 May 2009 07:38:15 +0000 (09:38 +0200)]
ide-tape: fix READ POSITION cmd handling
ide-tape used to issue READ POSITION in several places and the
evaluation of the returned READ POSITION data was done in the
->pc_callback. Convert it to use local buffer and move that
evaluation chunk in the idetape_read_position(). Additionally, fold
idetape_create_read_position_cmd() into it, too, thus concentrating READ
POSITION handling in one method only and making all places call that.
Finally, mv {idetape,ide_tape}_read_position.
There should be no functional change resulting from this patch.
Signed-off-by: Borislav Petkov <petkovbb@gmail.com>
Borislav Petkov [Mon, 4 May 2009 07:48:57 +0000 (09:48 +0200)]
ide-tape/ide_tape_get_bsize_from_bdesc: use local buffer
There should be no functional change resulting from this patch.
Signed-off-by: Borislav Petkov <petkovbb@gmail.com>
Borislav Petkov [Tue, 12 May 2009 06:59:49 +0000 (08:59 +0200)]
ide-floppy/ide_floppy_get_format_progress: use local sense buffer
There should be no functional change resulting from this patch.
Signed-off-by: Borislav Petkov <petkovbb@gmail.com>
Borislav Petkov [Sat, 2 May 2009 08:58:17 +0000 (10:58 +0200)]
ide-atapi: use local sense buffer
Access the sense buffer through the bio in ->pc_callback method thus
alleviating the need for the pc->buf pointer.
There should be no functional change resulting from this patch.
Signed-off-by: Borislav Petkov <petkovbb@gmail.com>
Borislav Petkov [Sat, 2 May 2009 08:53:10 +0000 (10:53 +0200)]
ide-floppy/ide_floppy_format_unit: use local buffer
Pass the buffer into ide_floppy_create_format_unit_cmd instead of using
pc->buf.
There should be no functional change resulting from this patch.
Signed-off-by: Borislav Petkov <petkovbb@gmail.com>
Borislav Petkov [Sat, 2 May 2009 08:45:17 +0000 (10:45 +0200)]
ide-floppy/ide_floppy_get_sfrp_bit: use local buffer
There should be no functional change resulting from this patch.
Signed-off-by: Borislav Petkov <petkovbb@gmail.com>
Borislav Petkov [Sat, 2 May 2009 08:43:11 +0000 (10:43 +0200)]
ide-floppy/ide_floppy_get_flexible_disk_page: use local buffer
There should be no functional change resulting from this patch.
Signed-off-by: Borislav Petkov <petkovbb@gmail.com>
Borislav Petkov [Sat, 2 May 2009 08:26:12 +0000 (10:26 +0200)]
ide-atapi: add a buffer-arg to ide_queue_pc_tail
This is in preparation of removing ide_atapi_pc. Expose the buffer as an
argument to ide_queue_pc_tail with later replacing it with local buffer
or even kmalloc'ed one if needed due to stack usage constraints.
Also, add the possibility of passing a NULL-ptr buffer for cmds which
don't transfer data besides the cdb. While at it, switch to local buffer
in idetape_get_mode_sense_results().
There should be no functional change resulting from this patch.
Signed-off-by: Borislav Petkov <petkovbb@gmail.com>
Borislav Petkov [Fri, 1 May 2009 21:27:11 +0000 (23:27 +0200)]
ide-atapi: add a len-parameter to ide_queue_pc_tail
This is in preparation for removing ide_atapi_pc.
There should be no functional change resulting from this patch.
Signed-off-by: Borislav Petkov <petkovbb@gmail.com>
Borislav Petkov [Fri, 1 May 2009 19:21:02 +0000 (21:21 +0200)]
ide-atapi: switch to rq->resid_len
Now that we have rq->resid_len, use it to account partial completion
amount during the lifetime of an rq, decrementing it on each successful
transfer. As a result, get rid of now unused pc->xferred.
While at it, remove noisy debug call in ide_prep_sense.
Signed-off-by: Borislav Petkov <petkovbb@gmail.com>
Borislav Petkov [Fri, 1 May 2009 18:35:21 +0000 (20:35 +0200)]
ide-atapi: switch to blk_rq_bytes() on do_request() path
After the recent struct request cleanups, blk_rq_bytes() is guaranteed
to be valid and is the current total length of the rq's bio. Use that
instead of pc->req_xfer in the do_request() path after the command has
been queued
The remaining usage of pc->req_xfer now is only until we map the rq to a
bio.
While at it:
- remove local caching of rq completion length in ide_tape_issue_pc()
Signed-off-by: Borislav Petkov <petkovbb@gmail.com>
Borislav Petkov [Fri, 1 May 2009 18:29:53 +0000 (20:29 +0200)]
ide-tape: fix potential fs requests bug
ide-tape had a potential bug for fs requests when preparing the command
packet: it was writing the transfer length as a number of fixed blocks.
However, the block layer implies 512 byte blocks and ide-tape can have
other block sizes so account for that too.
ide-floppy does this calculation properly with the block size factor
(floppy->bs_factor).
Signed-off-by: Borislav Petkov <petkovbb@gmail.com>
Andrew Morton [Thu, 14 May 2009 07:49:44 +0000 (09:49 +0200)]
splice: fix error return code
fs/splice.c: In function 'default_file_splice_read':
fs/splice.c:566: warning: 'error' may be used uninitialized in this function
which is sort-of true. The code will in fact return -ENOMEM instead of the
kernel_readv() return value.
Cc: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Wed, 13 May 2009 06:35:35 +0000 (08:35 +0200)]
splice: fix repeated kmap()'s in default_file_splice_read()
We cannot reliably map more than one page at the time, or we risk
deadlocking. Just allocate the pages from low mem instead.
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Tue, 12 May 2009 09:11:48 +0000 (11:11 +0200)]
splice: fix misleading comment
Splice is tied to pipes by design, it'll not change. And now that
the splice stuff is in splice.h (and note pipe.h), the rest of the comment
is out-of-date as well.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Tue, 12 May 2009 06:49:32 +0000 (08:49 +0200)]
scsi: fix resid_len mis-conversion in scsi_end_request()
Commit
c3a4d78c580de4edc9ef0f7c59812fb02ceb037f introduced
rq->data_len and converted residual count users to it. While
converting, it mistakenly converted scsi_end_request() to finish
requests with residual count when it wants to do is fully complete the
request. Fix it by using blk_end_request_all() instead.
This bug was spotted by Boaz Harrosh.
Signed-off-by: Tejun Heo <tj@kernel.org>
Spotted-by: Boaz Harrosh <bharrosh@panasas.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Miklos Szeredi [Thu, 7 May 2009 13:37:37 +0000 (15:37 +0200)]
splice: implement default splice_write method
If f_op->splice_write() is not implemented, fall back to a plain write.
Use vfs_writev() to write from the pipe buffers.
This will allow splice on all filesystems and file types. This
includes "direct_io" files in fuse which bypass the page cache.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Miklos Szeredi [Thu, 7 May 2009 13:37:36 +0000 (15:37 +0200)]
splice: implement default splice_read method
If f_op->splice_read() is not implemented, fall back to a plain read.
Use vfs_readv() to read into previously allocated pages.
This will allow splice and functions using splice, such as the loop
device, to work on all filesystems. This includes "direct_io" files
in fuse which bypass the page cache.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Miklos Szeredi [Thu, 7 May 2009 13:37:35 +0000 (15:37 +0200)]
splice: implement pipe to pipe splicing
Allow splice(2) to work when both the input and the output is a pipe.
Based on the impementation of the tee(2) syscall, but instead of
duplicating the buffer references move the buffers from the input pipe
to the output pipe.
Moving the whole buffer only succeeds if the full length of the buffer
is spliced. Otherwise duplicate the buffer, just like tee(2), set the
length of the output buffer and advance the offset on the input
buffer.
Since splice is operating on two pipes, special care needs to be taken
with locking to prevent AN ABBA deadlock. Again this is done
similarly to the tee(2) syscall, first preparing the input and output
pipes so there's data to consume and space for that data, and then
doing the move operation while holding both locks.
If other processes are doing I/O on the same pipes parallel to the
splice, then by the time both inodes are locked there might be no
buffers left to move, or no space to move them to. In this case retry
the whole operation, including the preparation phase. This could lead
to starvation, but I'm not sure if that's serious enough to worry
about.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
FUJITA Tomonori [Mon, 11 May 2009 08:56:09 +0000 (17:56 +0900)]
block: move completion related functions back to blk-core.c
Let's put the completion related functions back to block/blk-core.c
where they have lived. We can also unexport blk_end_bidi_request() and
__blk_end_bidi_request(), which nobody uses.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
FUJITA Tomonori [Mon, 11 May 2009 08:56:08 +0000 (17:56 +0900)]
scsi: simplify the bidi completion
Let's use blk_end_request_all() instead of blk_end_bidi_request().
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
FUJITA Tomonori [Mon, 11 May 2009 08:56:07 +0000 (17:56 +0900)]
block: let blk_end_request_all handle bidi requests
blk_end_request_all() and __blk_end_request_all() should finish all
bytes including bidi, by definition. That's what all bidi users need ,
bidi requests must be complete as a whole (partial completion is
impossible).
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Fri, 8 May 2009 02:54:16 +0000 (11:54 +0900)]
block: implement and enforce request peek/start/fetch
Till now block layer allowed two separate modes of request execution.
A request is always acquired from the request queue via
elv_next_request(). After that, drivers are free to either dequeue it
or process it without dequeueing. Dequeue allows elv_next_request()
to return the next request so that multiple requests can be in flight.
Executing requests without dequeueing has its merits mostly in
allowing drivers for simpler devices which can't do sg to deal with
segments only without considering request boundary. However, the
benefit this brings is dubious and declining while the cost of the API
ambiguity is increasing. Segment based drivers are usually for very
old or limited devices and as converting to dequeueing model isn't
difficult, it doesn't justify the API overhead it puts on block layer
and its more modern users.
Previous patches converted all block low level drivers to dequeueing
model. This patch completes the API transition by...
* renaming elv_next_request() to blk_peek_request()
* renaming blkdev_dequeue_request() to blk_start_request()
* adding blk_fetch_request() which is combination of peek and start
* disallowing completion of queued (not started) requests
* applying new API to all LLDs
Renamings are for consistency and to break out of tree code so that
it's apparent that out of tree drivers need updating.
[ Impact: block request issue API cleanup, no functional change ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Mike Miller <mike.miller@hp.com>
Cc: unsik Kim <donari75@gmail.com>
Cc: Paul Clements <paul.clements@steeleye.com>
Cc: Tim Waugh <tim@cyberelk.net>
Cc: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Laurent Vivier <Laurent@lvivier.info>
Cc: Jeff Garzik <jgarzik@pobox.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: Adrian McMenamin <adrian@mcmen.demon.co.uk>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Borislav Petkov <petkovbb@googlemail.com>
Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Cc: Alex Dubov <oakad@yahoo.com>
Cc: Pierre Ossman <drzeus@drzeus.cx>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Markus Lidel <Markus.Lidel@shadowconnect.com>
Cc: Stefan Weinhuber <wein@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Pete Zaitcev <zaitcev@redhat.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Fri, 8 May 2009 02:54:14 +0000 (11:54 +0900)]
gdrom: dequeue in-flight request
gdrom already dequeues and fully completes requests on normal path and
the error paths can be easily converted to do so too. Clean it up and
dequeue requests on error paths too.
While at it remove superflous blk_fs_request() && !blk_rq_sectors()
condition check.
[ Impact: dequeue in-flight request, cleanup ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Adrian McMenamin <adrian@mcmen.demon.co.uk>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Fri, 8 May 2009 02:54:15 +0000 (11:54 +0900)]
block: convert to dequeueing model (easy ones)
plat-omap/mailbox, floppy, viocd, mspro_block, i2o_block and
mmc/card/queue are already pretty close to dequeueing model and can be
converted with simple changes. Convert them.
While at it,
* xen-blkfront: !fs check moved downwards to share dequeue call with
normal path.
* mspro_block: __blk_end_request(..., blk_rq_cur_byte()) converted to
__blk_end_request_cur()
* mmc/card/queue: loop of __blk_end_request() converted to
__blk_end_request_all()
[ Impact: dequeue in-flight request ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Alex Dubov <oakad@yahoo.com>
Cc: Markus Lidel <Markus.Lidel@shadowconnect.com>
Cc: Pierre Ossman <drzeus@drzeus.cx>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Fri, 8 May 2009 02:54:13 +0000 (11:54 +0900)]
z2ram: dequeue in-flight request
z2ram processes requests one-by-one synchronously and can be easily
converted to dequeueing model. Convert it.
[ Impact: dequeue in-flight request ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Fri, 8 May 2009 02:54:12 +0000 (11:54 +0900)]
jsflash: dequeue in-flight request
jsflash processes requests one-by-one synchronously from a kthread and
can be easily converted to dequeueing model. Convert it.
[ Impact: dequeue in-flight request ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Pete Zaitcev <zaitcev@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Fri, 8 May 2009 02:54:11 +0000 (11:54 +0900)]
mtd_blkdevs: dequeue in-flight request
mtd_blkdevs processes requests one-by-one synchronously from a kthread
and can be easily converted to dequeueing model. Convert it.
[ Impact: dequeue in-flight request ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Fri, 8 May 2009 02:54:10 +0000 (11:54 +0900)]
xd: dequeue in-flight request
xd processes requests one-by-one synchronously and can be easily
converted to dequeueing model. Convert it.
While at it, use rq_cur_bytes instead of rq_bytes when checking for
sector overflow. This is for for consistency and better behavior for
merged requests.
[ Impact: dequeue in-flight request ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Fri, 8 May 2009 02:54:09 +0000 (11:54 +0900)]
swim: dequeue in-flight request
swim processes requests one-by-one synchronously and can easily be
converted to dequeuing model. Convert it.
[ Impact: dequeue in-flight request ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Laurent Vivier <Laurent@lvivier.info>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Fri, 8 May 2009 02:54:08 +0000 (11:54 +0900)]
amiflop: dequeue in-flight request
Request processing in amiflop is done sequentially in
redo_fd_request() proper and redo_fd_request() can easily be converted
to track in-flight request. Remove CURRENT, track in-flight request
directly and dequeue it when processing starts.
[ Impact: dequeue in-flight request ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Fri, 8 May 2009 02:54:07 +0000 (11:54 +0900)]
ps3disk: dequeue in-flight request
Other than in issue error paths, ps3disk always completely finishes
fetched requests. With full completion on error paths, it can be
easily converted to dequeueing model.
* After L1 r/w call failure, ps3disk_submit_request_sg() now fails the
whole request. Issue failure isn't likely to benefit from partial
retry anyway and ps3disk uses full failure in completion error path
too, so I don't think this amounts to any meaningful functionality
loss.
* flush completion is converted to _all for consistency. It doesn't
make any functional difference.
[ Impact: dequeue in-flight request ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Fri, 8 May 2009 02:54:06 +0000 (11:54 +0900)]
paride: dequeue in-flight request
pd/pf/pcd have track in-flight request by pd/pf/pcd_req. They can be
converted to dequeueing model by updating fetching and completion
paths. Convert them.
Note that removal of elv_next_request() call from pf_next_buf()
doesn't make any functional difference. The path is traveled only
during partial completion of a request and elv_next_request() call
must return the same request anyway.
[ Impact: dequeue in-flight request ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Tim Waugh <tim@cyberelk.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Fri, 8 May 2009 02:54:05 +0000 (11:54 +0900)]
xsysace: dequeue in-flight request
xsysace already tracks in-flight request using ace->req. Converting
to dequeueing model is mostly a matter of adding dequeueing call after
request fetching. The only tricky part is handling CF removal which
should complete both in flight and on queue requests. Convert to
dequeueing model.
While at it, remove explicit blk_rq_cur_bytes() and use
__blk_end_request_cur() instead.
[ Impact: dequeue in-flight request ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Fri, 8 May 2009 02:54:04 +0000 (11:54 +0900)]
swim3: dequeue in-flight request
swim3 has at most single request in flight and already tracks it using
fd_req. Convert it to dequeuing model by updating request fetching
and wrapping completion function.
[ Impact: dequeue in-flight request ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Fri, 8 May 2009 02:54:03 +0000 (11:54 +0900)]
ataflop: dequeue and track in-flight request
ataflop has single request in flight. Till now, whenever it needs to
access the in-flight request it called elv_next_request(). This patch
makes ataflop track the in-flight request directly and dequeue it when
processing starts. The added complexity is minimal and this will help
future block layer changes.
[ Impact: dequeue in-flight request, one elv_next_request() per request ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Fri, 8 May 2009 02:54:02 +0000 (11:54 +0900)]
hd: dequeue and track in-flight request
hd has at most single request in flight. Till now, whenever it needs
to access the in-flight request it called elv_next_request(). This
patch makes hd track the in-flight request directly and dequeue it
when processing starts. The added complexity is minimal and this will
help future block layer changes.
[ Impact: dequeue in-flight request, one elv_next_request() per request ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Fri, 8 May 2009 02:54:01 +0000 (11:54 +0900)]
mg_disk: dequeue and track in-flight request
mg_disk has at most single request in flight per device. Till now,
whenever it needs to access the in-flight request it called
elv_next_request(). This patch makes mg_disk track the in-flight
request directly using mg_host->req and dequeue it when processing
starts.
q->queuedata is set to mg_host so that mg_host can be determined
without fetching request from the queue.
[ Impact: dequeue in-flight request, one elv_next_request() per request ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: unsik Kim <donari75@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Fri, 8 May 2009 02:54:00 +0000 (11:54 +0900)]
mg_disk: fix queue hang / infinite retry on !fs requests
Both request functions in mg_disk simply return when they encounter a
!fs request, which means the request will never be cleared from the
queue causing queue hang and indefinite retry of the request. Fix it.
While at it, flatten condition checks and add unlikely to !fs tests.
[ Impact: fix possible queue hang / infinite retry of !fs requests ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: unsik Kim <donari75@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Fri, 8 May 2009 02:53:59 +0000 (11:53 +0900)]
ide: dequeue in-flight request
ide generally has single request in flight and tracks it using
hwif->rq and all state handlers follow the following convention.
* ide_started is returned if the request is in flight.
* ide_stopped is returned if the queue needs to be restarted. The
request might or might not have been processed fully or partially.
* hwif->rq is set to NULL, when an issued request completes.
So, dequeueing model can be implemented by dequeueing after fetch,
requeueing if hwif->rq isn't NULL on ide_stopped return and doing
about the same thing on completion / port unlock paths. These changes
can be made in ide-io proper.
In addition to the above main changes, the following updates are
necessary.
* ide-cd shouldn't dequeue a request when issuing REQUEST SENSE for it
as the request is already dequeued.
* ide-atapi uses request queue as stack when issuing REQUEST SENSE to
put the REQUEST SENSE in front of the failed request. This now
needs to be done using requeueing.
[ Impact: dequeue in-flight request ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Borislav Petkov <petkovbb@googlemail.com>
Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Thu, 7 May 2009 13:24:45 +0000 (22:24 +0900)]
block: blk_rq_[cur_]_{sectors|bytes}() usage cleanup
With the previous changes, the followings are now guaranteed for all
requests in any valid state.
* blk_rq_sectors() == blk_rq_bytes() >> 9
* blk_rq_cur_sectors() == blk_rq_cur_bytes() >> 9
Clean up accessor usages. Notable changes are
* nbd,i2o_block: end_all used instead of explicit byte count
* scsi_lib: unnecessary conditional on request type removed
[ Impact: cleanup ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Paul Clements <paul.clements@steeleye.com>
Cc: Pete Zaitcev <zaitcev@redhat.com>
Cc: Alex Dubov <oakad@yahoo.com>
Cc: Markus Lidel <Markus.Lidel@shadowconnect.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Thu, 7 May 2009 13:24:44 +0000 (22:24 +0900)]
block: hide request sector and data_len
Block low level drivers for some reason have been pretty good at
abusing block layer API. Especially struct request's fields tend to
get violated in all possible ways. Make it clear that low level
drivers MUST NOT access or manipulate rq->sector and rq->data_len
directly by prefixing them with double underscores.
This change is also necessary to break build of out-of-tree codes
which assume the previous block API where internal fields can be
manipulated and rq->data_len carries residual count on completion.
[ Impact: hide internal fields, block API change ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Thu, 7 May 2009 13:24:43 +0000 (22:24 +0900)]
ide: cleanup rq->data_len usages
With recent unification of fields, it's now guaranteed that
rq->data_len always equals blk_rq_bytes(). Convert all direct users
to accessors.
[ Impact: convert direct rq->data_len usages to blk_rq_bytes() ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Borislav Petkov <petkovbb@googlemail.com>
Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Thu, 7 May 2009 13:24:42 +0000 (22:24 +0900)]
block: cleanup rq->data_len usages
With recent unification of fields, it's now guaranteed that
rq->data_len always equals blk_rq_bytes(). Convert all non-IDE direct
users to accessors. IDE will be converted in a separate patch.
Boaz: spotted incorrect data_len/resid_len conversion in osd.
[ Impact: convert direct rq->data_len usages to blk_rq_bytes() ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Cc: Pete Zaitcev <zaitcev@redhat.com>
Cc: Eric Moore <Eric.Moore@lsi.com>
Cc: Markus Lidel <Markus.Lidel@shadowconnect.com>
Cc: Darrick J. Wong <djwong@us.ibm.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Eric Moore <Eric.Moore@lsi.com>
Cc: Boaz Harrosh <bharrosh@panasas.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Thu, 7 May 2009 13:24:41 +0000 (22:24 +0900)]
block: drop request->hard_* and *nr_sectors
struct request has had a few different ways to represent some
properties of a request. ->hard_* represent block layer's view of the
request progress (completion cursor) and the ones without the prefix
are supposed to represent the issue cursor and allowed to be updated
as necessary by the low level drivers. The thing is that as block
layer supports partial completion, the two cursors really aren't
necessary and only cause confusion. In addition, manual management of
request detail from low level drivers is cumbersome and error-prone at
the very least.
Another interesting duplicate fields are rq->[hard_]nr_sectors and
rq->{hard_cur|current}_nr_sectors against rq->data_len and
rq->bio->bi_size. This is more convoluted than the hard_ case.
rq->[hard_]nr_sectors are initialized for requests with bio but
blk_rq_bytes() uses it only for !pc requests. rq->data_len is
initialized for all request but blk_rq_bytes() uses it only for pc
requests. This causes good amount of confusion throughout block layer
and its drivers and determining the request length has been a bit of
black magic which may or may not work depending on circumstances and
what the specific LLD is actually doing.
rq->{hard_cur|current}_nr_sectors represent the number of sectors in
the contiguous data area at the front. This is mainly used by drivers
which transfers data by walking request segment-by-segment. This
value always equals rq->bio->bi_size >> 9. However, data length for
pc requests may not be multiple of 512 bytes and using this field
becomes a bit confusing.
In general, having multiple fields to represent the same property
leads only to confusion and subtle bugs. With recent block low level
driver cleanups, no driver is accessing or manipulating these
duplicate fields directly. Drop all the duplicates. Now rq->sector
means the current sector, rq->data_len the current total length and
rq->bio->bi_size the current segment length. Everything else is
defined in terms of these three and available only through accessors.
* blk_recalc_rq_sectors() is collapsed into blk_update_request() and
now handles pc and fs requests equally other than rq->sector update.
This means that now pc requests can use partial completion too (no
in-kernel user yet tho).
* bio_cur_sectors() is replaced with bio_cur_bytes() as block layer
now uses byte count as the primary data length.
* blk_rq_pos() is now guranteed to be always correct. In-block users
converted.
* blk_rq_bytes() is now guaranteed to be always valid as is
blk_rq_sectors(). In-block users converted.
* blk_rq_sectors() is now guaranteed to equal blk_rq_bytes() >> 9.
More convenient one is used.
* blk_rq_bytes() and blk_rq_cur_bytes() are now inlined and take const
pointer to request.
[ Impact: API cleanup, single way to represent one property of a request ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Thu, 7 May 2009 13:24:40 +0000 (22:24 +0900)]
ide: convert to rq pos and nr_sectors accessors
ide doesn't manipulate request fields anymore and thus all hard and
their soft equivalents are always equal. Convert all references to
accessors.
[ Impact: use pos and nr_sectors accessors ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Borislav Petkov <petkovbb@googlemail.com>
Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Thu, 7 May 2009 13:24:39 +0000 (22:24 +0900)]
block: convert to pos and nr_sectors accessors
With recent cleanups, there is no place where low level driver
directly manipulates request fields. This means that the 'hard'
request fields always equal the !hard fields. Convert all
rq->sectors, nr_sectors and current_nr_sectors references to
accessors.
While at it, drop superflous blk_rq_pos() < 0 test in swim.c.
[ Impact: use pos and nr_sectors accessors ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Tested-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Tested-by: Adrian McMenamin <adrian@mcmen.demon.co.uk>
Acked-by: Adrian McMenamin <adrian@mcmen.demon.co.uk>
Acked-by: Mike Miller <mike.miller@hp.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Borislav Petkov <petkovbb@googlemail.com>
Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Cc: Eric Moore <Eric.Moore@lsi.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Pete Zaitcev <zaitcev@redhat.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Paul Clements <paul.clements@steeleye.com>
Cc: Tim Waugh <tim@cyberelk.net>
Cc: Jeff Garzik <jgarzik@pobox.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Alex Dubov <oakad@yahoo.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Dario Ballabio <ballabio_dario@emc.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: unsik Kim <donari75@gmail.com>
Cc: Laurent Vivier <Laurent@lvivier.info>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Thu, 7 May 2009 13:24:38 +0000 (22:24 +0900)]
block: implement blk_rq_pos/[cur_]sectors() and convert obvious ones
Implement accessors - blk_rq_pos(), blk_rq_sectors() and
blk_rq_cur_sectors() which return rq->hard_sector, rq->hard_nr_sectors
and rq->hard_cur_sectors respectively and convert direct references of
the said fields to the accessors.
This is in preparation of request data length handling cleanup.
Geert : suggested adding const to struct request * parameter to accessors
Sergei : spotted error in patch description
[ Impact: cleanup ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Tested-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Ackec-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Borislav Petkov <petkovbb@googlemail.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Thu, 7 May 2009 13:24:37 +0000 (22:24 +0900)]
block: add rq->resid_len
rq->data_len served two purposes - the length of data buffer on issue
and the residual count on completion. This duality creates some
headaches.
First of all, block layer and low level drivers can't really determine
what rq->data_len contains while a request is executing. It could be
the total request length or it coulde be anything else one of the
lower layers is using to keep track of residual count. This
complicates things because blk_rq_bytes() and thus
[__]blk_end_request_all() relies on rq->data_len for PC commands.
Drivers which want to report residual count should first cache the
total request length, update rq->data_len and then complete the
request with the cached data length.
Secondly, it makes requests default to reporting full residual count,
ie. reporting that no data transfer occurred. The residual count is
an exception not the norm; however, the driver should clear
rq->data_len to zero to signify the normal cases while leaving it
alone means no data transfer occurred at all. This reverse default
behavior complicates code unnecessarily and renders block PC on some
drivers (ide-tape/floppy) unuseable.
This patch adds rq->resid_len which is used only for residual count.
While at it, remove now unnecessasry blk_rq_bytes() caching in
ide_pc_intr() as rq->data_len is not changed anymore.
Boaz : spotted missing conversion in osd
Sergei : spotted too early conversion to blk_rq_bytes() in ide-tape
[ Impact: cleanup residual count handling, report 0 resid by default ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Borislav Petkov <petkovbb@googlemail.com>
Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Cc: Mike Miller <mike.miller@hp.com>
Cc: Eric Moore <Eric.Moore@lsi.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Doug Gilbert <dgilbert@interlog.com>
Cc: Mike Miller <mike.miller@hp.com>
Cc: Eric Moore <Eric.Moore@lsi.com>
Cc: Darrick J. Wong <djwong@us.ibm.com>
Cc: Pete Zaitcev <zaitcev@redhat.com>
Cc: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Thu, 7 May 2009 13:24:36 +0000 (22:24 +0900)]
ide-tape: don't initialize rq->sector for rw requests
rq->sector is set to the tape->first_frame but it's never actually
used and not even in the correct unit (512 byte sectors). Don't set
it.
[ Impact: cleanup ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Borislav Petkov <petkovbb@gmail.com>
Acked-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Thu, 7 May 2009 13:24:35 +0000 (22:24 +0900)]
nbd: don't clear rq->sector and nr_sectors unnecessarily
There's no reason to clear rq->sector and nr_sectors after calling
blk_rq_init(). They're guaranteed to be clear. Drop unnecessary
clearing.
[ Impact: cleanup ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Paul Clements <paul.clements@steeleye.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Bartlomiej Zolnierkiewicz [Tue, 28 Apr 2009 04:06:17 +0000 (13:06 +0900)]
mg_disk: use defines from <linux/ata.h>
While at it:
- remove MG_REG_HEAD_MUST_BE_ON define
- remove MG_REG_CTRL_INTR_ENABLE define
- remove MG_REG_HEAD_LBA_MODE define
- remove unused defines
Cc: unsik Kim <donari75@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Bartlomiej Zolnierkiewicz [Tue, 28 Apr 2009 04:06:16 +0000 (13:06 +0900)]
mg_disk: fix dependency on libata
Add local copies of ata_id_string() and ata_id_c_string() to mg_disk
so there is no need for the driver to depend on ATA and SCSI.
[ Impact: break dependency on libata by copying ata id string functions ]
Cc: unsik Kim <donari75@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Tue, 28 Apr 2009 04:06:15 +0000 (13:06 +0900)]
mg_disk: clean up request completion paths
mg_disk implements its own partial completion. Convert to standard
block layer partial completion.
[ Impact: cleanup ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: unsik Kim <donari75@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Tue, 28 Apr 2009 04:06:14 +0000 (13:06 +0900)]
mg_disk: fold mg_disk.h into mg_disk.c
include/linux/mg_disk.h is used only by drivers/block/mg_disk.c. No
reason to put it in a separate header. Fold it into mg_disk.c.
[ Impact: cleanup ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: unsik Kim <donari75@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Tue, 28 Apr 2009 04:06:13 +0000 (13:06 +0900)]
swim: clean up request completion paths
swim curiously tries to update request parameters before calling
__blk_end_request() when __blk_end_request() will do it anyway and
unnecessarily checks whether current_nr_sectors is zero right after
fetching.
Drop unnecessary stuff and use standard block layer mechanisms.
[ Impact: cleanup ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Laurent Vivier <Laurent@lvivier.info>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Tue, 28 Apr 2009 04:06:12 +0000 (13:06 +0900)]
swim3: clean up request completion paths
swim3 curiously tries to update request parameters before calling
__blk_end_request() when __blk_end_request() will do it anyway, and it
updates request for partial completion manually instead of using
blk_update_request(). Also, it does some spurious checks on rq such
as testing whether rq->sector is negative or current_nr_sectors is
zero right after fetching.
Drop unnecessary stuff and use standard block layer mechanisms.
[ Impact: cleanup ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Tue, 28 Apr 2009 04:06:11 +0000 (13:06 +0900)]
hd: clean up request completion paths
hd read/write_intr() functions manually manipulate request to
incrementally complete it, which block layer already supports. Simply
use block layer completion routines instead of manual partial
completion.
While at it, clear unnecessary elv_next_request() check at the tail of
read_intr(). This also makes read and write_intr() more consistent.
[ Impact: cleanup ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Tue, 28 Apr 2009 04:06:10 +0000 (13:06 +0900)]
ubd: drop unnecessary rq->sector manipulation
ubd curiously updates rq->sector while issuing the request in multiple
pieces. Don't do it and simply use local copy of sector.
[ Impact: cleanup ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Tue, 28 Apr 2009 04:06:09 +0000 (13:06 +0900)]
ubd: cleanup completion path
ubd had its own block request partial completion mechanism, which is
unnecessary as block layer already does it. Kill ubd_end_request()
and ubd_finish() and replace them with direct call to
blk_end_request().
[ Impact: cleanup ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Tue, 28 Apr 2009 04:06:08 +0000 (13:06 +0900)]
sunvdc: kill vdc_end_request()
vdc_end_request() is a thin silly wrapper on top of
__blk_end_request(). Kill it.
[ Impact: cleanup ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Tue, 28 Apr 2009 04:06:07 +0000 (13:06 +0900)]
ps3disk: simplify request completion
ps3disk_interrupt() always completes requests fully but it uses
rq->hard_cur_sectors for FLUSH requests for some reason. Drop them
and simply use __blk_end_request_all().
[ Impact: cleanup ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Tue, 28 Apr 2009 04:06:06 +0000 (13:06 +0900)]
amiflop,ataflop,xd,mg_disk: clean up unnecessary stuff from block drivers
rq_data_dir() can only be READ or WRITE and rq->sector and nr_sectors
are always automatically updated after partial request completion.
Don't worry about rq_data_dir() not being either READ or WRITE or
manually update sector and nr_sectors.
[ Impact: cleanup ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jörg Dorchain <joerg@dorchain.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: unsik Kim <donari75@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Tue, 28 Apr 2009 04:06:05 +0000 (13:06 +0900)]
block: don't init rq fields unnecessarily
blk_get_request() always returns properly zeroed requests. Don't set
fields to zero/NULL unnecessarily.
[ Impact: cleanup ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Tue, 28 Apr 2009 04:06:04 +0000 (13:06 +0900)]
block: make blk_end_request_cur() return bool
In the process of mindlessly copying [__]blk_end_request_all(),
[__]blk_end_request_cur() ended up returning void even though they're
partial completion functions. Fix it.
[ Impact: fix braindead API ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Nikanth Karthikesan [Mon, 27 Apr 2009 12:53:54 +0000 (14:53 +0200)]
block: catch trying to use more bits than request->cmd_flags has
Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Fri, 24 Apr 2009 06:12:19 +0000 (08:12 +0200)]
block: include discard requests in IO accounting
We currently don't do merging on discard requests, but we potentially
could. If we do, then we need to include discard requests in the IO
accounting, or merging would end up decrementing in_flight IO counters
for an IO which never incremented them.
So enable accounting for discard requests.
Problem found by Nikanth Karthikesan <knikanth@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Fri, 24 Apr 2009 06:10:11 +0000 (08:10 +0200)]
block: make blk_do_io_stat() do the full "is this rq accountable" checks
We currently check for file system requests outside of blk_do_io_stat(rq),
but we may as well just include it.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Thu, 23 Apr 2009 02:05:20 +0000 (11:05 +0900)]
block: kill rq->data
Now that all block request data transfer is done via bio, rq->data
isn't used. Kill it.
While at it, make the roles of rq->special and buffer clear.
[ Impact: drop now unncessary field from struct request ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Boaz Harrosh <bharrosh@panasas.com>
Tejun Heo [Thu, 23 Apr 2009 02:05:20 +0000 (11:05 +0900)]
arm-omap: don't abuse rq->data
omap mailbox uses rq->data as the second opaque pointer to carry
mbox_msg_t and rq->special message argument which is needed only for
tx. Add and use omap_msg_tx_data struct for tx and use rq->special
for mbox_msg_t for rx such that only rq->special is used as opaque
pointer.
[ Impact: cleanup rq->data usage, extra kmalloc in msg_send ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Tejun Heo [Thu, 23 Apr 2009 02:05:19 +0000 (11:05 +0900)]
block: replace end_request() with [__]blk_end_request_cur()
end_request() has been kept around for backward compatibility;
however, it's about time for it to go away.
* There aren't too many users left.
* Its use of @updtodate is pretty confusing.
* In some cases, newer code ends up using mixture of end_request() and
[__]blk_end_request[_all](), which is way too confusing.
So, add [__]blk_end_request_cur() and replace end_request() with it.
Most conversions are straightforward. Noteworthy ones are...
* paride/pcd: next_request() updated to take 0/-errno instead of 1/0.
* paride/pf: pf_end_request() and next_request() updated to take
0/-errno instead of 1/0.
* xd: xd_readwrite() updated to return 0/-errno instead of 1/0.
* mtd/mtd_blkdevs: blktrans_discard_request() updated to return
0/-errno instead of 1/0. Unnecessary local variable res
initialization removed from mtd_blktrans_thread().
[ Impact: cleanup ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Joerg Dorchain <joerg@dorchain.net>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Laurent Vivier <Laurent@lvivier.info>
Cc: Tim Waugh <tim@cyberelk.net>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Markus Lidel <Markus.Lidel@shadowconnect.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Pete Zaitcev <zaitcev@redhat.com>
Cc: unsik Kim <donari75@gmail.com>
Tejun Heo [Thu, 23 Apr 2009 02:05:19 +0000 (11:05 +0900)]
block: implement and use [__]blk_end_request_all()
There are many [__]blk_end_request() call sites which call it with
full request length and expect full completion. Many of them ensure
that the request actually completes by doing BUG_ON() the return
value, which is awkward and error-prone.
This patch adds [__]blk_end_request_all() which takes @rq and @error
and fully completes the request. BUG_ON() is added to to ensure that
this actually happens.
Most conversions are simple but there are a few noteworthy ones.
* cdrom/viocd: viocd_end_request() replaced with direct calls to
__blk_end_request_all().
* s390/block/dasd: dasd_end_request() replaced with direct calls to
__blk_end_request_all().
* s390/char/tape_block: tapeblock_end_request() replaced with direct
calls to blk_end_request_all().
[ Impact: cleanup ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Mike Miller <mike.miller@hp.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Jeff Garzik <jgarzik@pobox.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Alex Dubov <oakad@yahoo.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Tejun Heo [Thu, 23 Apr 2009 02:05:18 +0000 (11:05 +0900)]
block: move rq->start_time initialization to blk_rq_init()
rq->start_time was initialized in init_request_from_bio() so special
requests didn't have start_time set. This has been okay as start_time
has been used only for fs requests; however, there is no indication of
this actually is the case or not. Set rq->start_time in blk_rq_init()
and guarantee that all initialized rq's have its start_time set. This
improves consistency at virtually no cost and future changes will make
use of the timestamp for !bio requests.
[ Impact: rq->start_time is valid for all requests ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Tejun Heo [Thu, 23 Apr 2009 02:05:18 +0000 (11:05 +0900)]
block: clean up request completion API
Request completion has gone through several changes and became a bit
messy over the time. Clean it up.
1. end_that_request_data() is a thin wrapper around
end_that_request_data_first() which checks whether bio is NULL
before doing anything and handles bidi completion.
blk_update_request() is a thin wrapper around
end_that_request_data() which clears nr_sectors on the last
iteration but doesn't use the bidi completion.
Clean it up by moving the initial bio NULL check and nr_sectors
clearing on the last iteration into end_that_request_data() and
renaming it to blk_update_request(), which makes blk_end_io() the
only user of end_that_request_data(). Collapse
end_that_request_data() into blk_end_io().
2. There are four visible completion variants - blk_end_request(),
__blk_end_request(), blk_end_bidi_request() and end_request().
blk_end_request() and blk_end_bidi_request() uses blk_end_request()
as the backend but __blk_end_request() and end_request() use
separate implementation in __blk_end_request() due to different
locking rules.
blk_end_bidi_request() is identical to blk_end_io(). Collapse
blk_end_io() into blk_end_bidi_request(), separate out request
update into internal helper blk_update_bidi_request() and add
__blk_end_bidi_request(). Redefine [__]blk_end_request() as thin
inline wrappers around [__]blk_end_bidi_request().
3. As the whole request issue/completion usages are about to be
modified and audited, it's a good chance to convert completion
functions return bool which better indicates the intended meaning
of return values.
4. The function name end_that_request_last() is from the days when it
was a public interface and slighly confusing. Give it a proper
internal name - blk_finish_request().
5. Add description explaning that blk_end_bidi_request() can be safely
used for uni requests as suggested by Boaz Harrosh.
The only visible behavior change is from #1. nr_sectors counts are
cleared after the final iteration no matter which function is used to
complete the request. I couldn't find any place where the code
assumes those nr_sectors counters contain the values for the last
segment and this change is good as it makes the API much more
consistent as the end result is now same whether a request is
completed using [__]blk_end_request() alone or in combination with
blk_update_request().
API further cleaned up per Christoph's suggestion.
[ Impact: cleanup, rq->*nr_sectors always updated after req completion ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Boaz Harrosh <bharrosh@panasas.com>
Cc: Christoph Hellwig <hch@infradead.org>
Tejun Heo [Thu, 23 Apr 2009 02:05:18 +0000 (11:05 +0900)]
block: kill blk_end_request_callback()
With recent IDE updates, blk_end_request_callback() doesn't have any
user now. Kill it.
[ Impact: removal of unused convoluted interface ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Tejun Heo [Thu, 23 Apr 2009 02:05:18 +0000 (11:05 +0900)]
block: reorganize request fetching functions
Impact: code reorganization
elv_next_request() and elv_dequeue_request() are public block layer
interface than actual elevator implementation. They mostly deal with
how requests interact with block layer and low level drivers at the
beginning of rqeuest processing whereas __elv_next_request() is the
actual eleveator request fetching interface.
Move the two functions to blk-core.c. This prepares for further
interface cleanup.
Signed-off-by: Tejun Heo <tj@kernel.org>
Tejun Heo [Thu, 23 Apr 2009 02:05:18 +0000 (11:05 +0900)]
block: reorder request completion functions
Reorder request completion functions such that
* All request completion functions are located together.
* Functions which are used by only one caller is put right above the
caller.
* end_request() is put after other completion functions but before
blk_update_request().
This change is for completion function cleanup which will follow.
[ Impact: cleanup, code reorganization ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Tejun Heo [Thu, 23 Apr 2009 02:05:18 +0000 (11:05 +0900)]
block: clean up misc stuff after block layer timeout conversion
* In blk_rq_timed_out_timer(), else { if } to else if
* In blk_add_timer(), simplify if/else block
[ Impact: cleanup ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Tejun Heo [Thu, 23 Apr 2009 02:05:18 +0000 (11:05 +0900)]
block: cleanup REQ_SOFTBARRIER usages
blk_insert_request() doesn't need to worry about REQ_SOFTBARRIER.
Don't set it. Combined with recent ide updates, REQ_SOFTBARRIER is
now only used in elevator proper and for discard requests.
[ Impact: cleanup ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Tejun Heo [Thu, 23 Apr 2009 02:05:17 +0000 (11:05 +0900)]
block: don't set REQ_NOMERGE unnecessarily
RQ_NOMERGE_FLAGS already clears defines which REQ flags aren't
mergeable. There is no reason to specify it superflously. It only
adds to confusion. Don't set REQ_NOMERGE for barriers and requests
with specific queueing directive. REQ_NOMERGE is now exclusively used
by the merging code.
[ Impact: cleanup ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Tejun Heo [Thu, 23 Apr 2009 02:05:17 +0000 (11:05 +0900)]
block: kill blk_start_queueing()
blk_start_queueing() is identical to __blk_run_queue() except that it
doesn't check for recursion. None of the current users depends on
blk_start_queueing() running request_fn directly. Replace usages of
blk_start_queueing() with [__]blk_run_queue() and kill it.
[ Impact: removal of mostly duplicate interface function ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Tejun Heo [Thu, 23 Apr 2009 02:05:17 +0000 (11:05 +0900)]
block: merge blk_invoke_request_fn() into __blk_run_queue()
__blk_run_queue wraps blk_invoke_request_fn() such that it
additionally removes plug and bails out early if the queue is empty.
Both extra operations have their own pending mechanisms and don't
cause any harm correctness-wise when they are done superflously.
The only user of blk_invoke_request_fn() being blk_start_queue(),
there isn't much reason to keep both functions around. Merge
blk_invoke_request_fn() into __blk_run_queue() and make
blk_start_queue() use __blk_run_queue() instead.
[ Impact: merge two subtly different internal functions ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Jeff Moyer [Wed, 22 Apr 2009 12:08:13 +0000 (14:08 +0200)]
block: implement blkdev_readpages
Doing a proper block dev ->readpages() speeds up the crazy dump(8)
approach of using interleaved process IO.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Bartlomiej Zolnierkiewicz [Tue, 21 Apr 2009 07:27:03 +0000 (09:27 +0200)]
block: enable by default support for large devices and files on 32-bit archs
Enable by default support for large devices and files (CONFIG_LBD):
- With 1TB disks being a commodity hardware it is quite easy to hit 2TB
limitation while building RAIDs etc. and many distros have been using
CONFIG_LBD=y by default already (at least Fedora 10 and openSUSE 11.1).
- This should also prevent a subtle ext4 filesystem compatibility issue:
mke2fs.ext4 defaults to creating filesystems with huge_files feature
enabled and such filesystems cannot be later mounted read-write on
machines with CONFIG_LBD=n (it should be quite easy to hit this issue
when trying to use filesystem created using distro kernel on system
running the self-build kernel, think about USB disk enclosures & co.).
While at it:
- Clarify config option help text w.r.t. mounting ext4 filesystems
(they can be mounted with CONFIG_LBD=n but in the read-only mode).
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Tue, 21 Apr 2009 03:16:56 +0000 (12:16 +0900)]
ide-dma: don't reset request fields on dma_timeout_retry()
Impact: drop unnecessary code
Now that everything uses bio and block operations, there is no need to
reset request fields manually when retrying a request. Every field is
guaranteed to be always valid. Drop unnecessary request field
resetting from ide_dma_timeout_retry().
Signed-off-by: Tejun Heo <tj@kernel.org>
Tejun Heo [Sat, 18 Apr 2009 23:46:03 +0000 (08:46 +0900)]
ide: drop rq->data handling from ide_map_sg()
Impact: remove code path which is no longer necessary
All IDE data transfers now use rq->bio. Simplify ide_map_sg()
accordingly.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Tejun Heo [Sat, 18 Apr 2009 23:46:03 +0000 (08:46 +0900)]
ide-atapi: kill unused fields and callbacks
Impact: remove fields and code paths which are no longer necessary
Now that ide-tape uses standard mechanisms to transfer data, special
case handling for bh handling can be dropped from ide-atapi. Drop the
followings.
* pc->cur_pos, b_count, bh and b_data
* drive->pc_update_buffers() and pc_io_buffers().
Signed-off-by: Tejun Heo <tj@kernel.org>
Tejun Heo [Sat, 18 Apr 2009 23:46:03 +0000 (08:46 +0900)]
ide-tape: simplify read/write functions
Impact: cleanup
idetape_chrdev_read/write() functions are unnecessarily complex when
everything can be handled in a single loop. Collapse
idetape_add_chrdev_read/write_request() into the rw functions and
simplify the implementation.
Signed-off-by: Tejun Heo <tj@kernel.org>
Tejun Heo [Sat, 18 Apr 2009 23:46:03 +0000 (08:46 +0900)]
ide-tape: use byte size instead of sectors on rw issue functions
Impact: cleanup
Byte size is what most issue functions deal with, make
idetape_queue_rw_tail() and its wrappers take byte size instead of
sector counts. idetape_chrdev_read() and write() functions are
converted to use tape->buffer_size instead of ctl from tape->cap.
This cleans up code a little bit and will ease the next r/w
reimplementation.
Signed-off-by: Tejun Heo <tj@kernel.org>
Tejun Heo [Sat, 18 Apr 2009 23:46:02 +0000 (08:46 +0900)]
ide-tape: unify r/w init paths
Impact: cleanup
Read and write init paths are almost identical. Unify them into
idetape_init_rw().
Signed-off-by: Tejun Heo <tj@kernel.org>
Tejun Heo [Sat, 18 Apr 2009 23:46:02 +0000 (08:46 +0900)]
ide-tape: kill idetape_bh
Impact: kill now unnecessary idetape_bh
With everything using standard mechanisms, there is no need for
idetape_bh anymore. Kill it and use tape->buf, cur and valid to
describe data buffer instead.
Changes worth mentioning are...
* idetape_queue_rq_tail() now always queue tape->buf and and adjusts
buffer state properly before completion.
* idetape_pad_zeros() clears the buffer only once.
Signed-off-by: Tejun Heo <tj@kernel.org>
Tejun Heo [Sat, 18 Apr 2009 23:46:02 +0000 (08:46 +0900)]
ide-tape: use standard data transfer mechanism
Impact: use standard way to transfer data
ide-tape uses rq in an interesting way. For r/w requests, rq->special
is used to carry a private buffer management structure idetape_bh and
rq->nr_sectors and current_nr_sectors are initialized to the number of
idetape blocks which isn't necessary 512 bytes. Also,
rq->current_nr_sectors is used to report back the residual count in
units of idetape blocks.
This peculiarity taxes both block layer and ide. ide-atapi has
different paths and hooks to accomodate it and what a rq means becomes
quite confusing and making changes at the block layer becomes quite
difficult and error-prone.
This patch makes ide-tape use bio instead. With the previous patch,
ide-tape currently is using single contiguos buffer so replacing it
isn't difficult. Data buffer is mapped into bio using
blk_rq_map_kern() in idetape_queue_rw_tail(). idetape_io_buffers()
and idetape_update_buffers() are dropped and pc->bh is set to null to
tell ide-atapi to use standard data transfer mechanism and idetape_bh
byte counts are updated by the issuer on completion using the residual
count.
This change also nicely removes the FIXME in ide_pc_intr() where
ide-tape rqs need to be completed using ide_rq_bytes() instead of
blk_rq_bytes() (although this didn't really matter as the request
didn't have bio).
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Sat, 18 Apr 2009 23:46:02 +0000 (08:46 +0900)]
ide-tape: use single continuous buffer
Impact: simpler buffer allocation and handling, kills OOM, fix DMA transfers
ide-tape has its own multiple buffer mechanism using struct
idetape_bh. It allocates buffer with decreasing order-of-two
allocations so that it results in minimum number of segments.
However, the implementation is quite complex and works in a way that
no other block or ide driver works necessitating a lot of special case
handling.
The benefit this complex allocation scheme brings is questionable as
PIO or DMA the number of segments (16 maximum) doesn't make any
noticeable difference and it also doesn't negate the need for multiple
order allocation which can fail under memory pressure or high
fragmentation although it does lower the highest order necessary by
one when the buffer size isn't power of two.
As the first step to remove the custom buffer management, this patch
makes ide-tape allocate single continous buffer. The maximum order is
four. I doubt the change would cause any trouble but if it ever
matters, it should be converted to regular sg mechanism like everyone
else and even in that case dropping custom buffer handling and moving
to standard mechanism first make sense as an intermediate step.
This patch makes the first bh to contain the whole buffer and drops
multi bh handling code. Following patches will make further changes.
This patch has the side effect of killing OOM triggered by allocation
path and fixing DMA transfers. Previously, bug in alloc path
triggered OOM on command issue and commands were passed to DMA engine
without DMA-mapping all the segments.
Signed-off-by: Tejun Heo <tj@kernel.org>
Tejun Heo [Sat, 18 Apr 2009 23:46:02 +0000 (08:46 +0900)]
ide-atapi,tape,floppy: allow ->pc_callback() to change rq->data_len
Impact: allow residual count implementation in ->pc_callback()
rq->data_len has two duties - carrying the number of input bytes on
issue and carrying residual count back to the issuer on completion.
ide-atapi completion callback ->pc_callback() is the right place to do
this but currently ide-atapi depends on rq->data_len carrying the
original request size after calling ->pc_callback() to complete the pc
request.
This patch makes ide_pc_intr(), ide_tape_issue_pc() and
ide_floppy_issue_pc() cache length to complete before calling
->pc_callback() so that it can modify rq->data_len as necessary.
Note: As using rq->data_len for two purposes can make cases like this
incorrect in subtle ways, future changes will introduce separate
field for residual count.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Sat, 18 Apr 2009 23:46:02 +0000 (08:46 +0900)]
ide-tape,floppy: fix failed command completion after request sense
Impact: fix infinite retry loop
After a command failed, ide-tape and floppy inserts REQUEST_SENSE in
front of the failed command and according to the result, sets
pc->retries, flags and errors. After REQUEST_SENSE is complete, the
failed command is again at the front of the queue and if the verdict
was to terminate the request, the issue functions tries to complete it
directly by calling drive->pc_callback() and returning ide_stopped.
However, drive->pc_callback() doesn't complete a request. It only
prepares for completion of the request. As a result, this creates an
infinite loop where the failed request is retried perpetually.
Fix it by actually ending the request by calling ide_complete_rq().
Signed-off-by: Tejun Heo <tj@kernel.org>
Tejun Heo [Sat, 18 Apr 2009 22:00:43 +0000 (07:00 +0900)]
ide-pm: don't abuse rq->data
Impact: cleanup rq->data usage
ide-pm uses rq->data to carry pointer to struct request_pm_state
through request queue and rq->special is used to carray pointer to
local struct ide_cmd, which isn't necessary. Use rq->special for
request_pm_state instead and use local ide_cmd in
ide_start_power_step().
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Tejun Heo [Sat, 18 Apr 2009 22:00:42 +0000 (07:00 +0900)]
ide-cd,atapi: use bio for internal commands
Impact: unify request data buffer handling
rq->data is used mostly to pass kernel buffer through request queue
without using bio. There are only a couple of places which still do
this in kernel and converting to bio isn't difficult.
This patch converts ide-cd and atapi to use bio instead of rq->data
for request sense and internal pc commands. With previous change to
unify sense request handling, this is relatively easily achieved by
adding blk_rq_map_kern() during sense_rq prep and PC issue.
If blk_rq_map_kern() fails for sense, the error is deferred till sense
issue and aborts the failed command which triggered the sense. Note
that this is a slim possibility as sense prep is done on each command
issue, so for the above condition to actually trigger, all preps since
the last sense issue till the issue of the request which would require
a sense should fail.
* do_request functions might sleep now. This should be okay as ide
request_fn - do_ide_request() - is invoked only from make_request
and plug work. Make sure this is the case by adding might_sleep()
to do_ide_request().
* Functions which access the read sense data before the sense request
is complete now should access bio_data(sense_rq->bio) as the sense
buffer might have been copied during blk_rq_map_kern().
* ide-tape updated to map sg.
* cdrom_do_block_pc() now doesn't have to deal with REQ_TYPE_ATA_PC
special case. Simplified.
* tp_ops->output/input_data path dropped from ide_pc_intr().
Signed-off-by: Tejun Heo <tj@kernel.org>