Kent Overstreet [Wed, 31 Jul 2013 08:12:02 +0000 (01:12 -0700)]
bcache: Use ida for bcache block dev minor
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Thu, 22 Aug 2013 00:49:09 +0000 (17:49 -0700)]
bcache: Fix sysfs splat on shutdown with flash only devs
Whoops.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Thu, 31 Oct 2013 22:43:22 +0000 (15:43 -0700)]
bcache: Better full stripe scanning
The old scanning-by-stripe code burned too much CPU; this should be
better.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Fri, 26 Jul 2013 19:32:38 +0000 (12:32 -0700)]
bcache: Have btree_split() insert into parent directly
The flow control in btree_insert_node() was... fragile... before,
this'll use more stack (but since our btrees are never more than depth
1, that shouldn't matter) and it should be significantly clearer and
less fragile.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Wed, 31 Jul 2013 07:03:54 +0000 (00:03 -0700)]
bcache: Move spinlock into struct time_stats
Minor cleanup.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Wed, 31 Jul 2013 05:34:40 +0000 (22:34 -0700)]
bcache: Kill sequential_merge option
It never really made sense to expose this, so just kill it.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Wed, 11 Sep 2013 00:18:59 +0000 (17:18 -0700)]
bcache: Kill bch_next_recurse_key()
This dates from before the btree iterator, and now it's finally gone
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Wed, 11 Sep 2013 02:07:35 +0000 (19:07 -0700)]
bcache: Avoid deadlocking in garbage collection
Not a complete fix - we could still deadlock if btree_insert_node() has
to split...
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Wed, 11 Sep 2013 02:07:00 +0000 (19:07 -0700)]
bcache: Incremental gc
Big garbage collection rewrite; now, garbage collection uses the same
mechanisms as used elsewhere for inserting/updating btree node pointers,
instead of rewriting interior btree nodes in place.
This makes the code significantly cleaner and less fragile, and means we
can now make garbage collection incremental - it doesn't have to hold a
write lock on the root of the btree for the entire duration of garbage
collection.
This means that there's less of a latency hit for doing garbage
collection, which means we can gc more frequently (and do a better job
of reclaiming from the cache), and we can coalesce across more btree
nodes (improving our space efficiency).
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Thu, 25 Jul 2013 06:18:05 +0000 (23:18 -0700)]
bcache: Add make_btree_freeing_key()
Refactoring, prep work for incremental garbage collection.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Wed, 24 Jul 2013 03:48:29 +0000 (20:48 -0700)]
bcache: Add btree_node_write_sync()
More refactoring - mostly making the interfaces more explicit about what
we actually want to do.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Tue, 2 Jul 2013 02:29:05 +0000 (19:29 -0700)]
bcache: PRECEDING_KEY()
btree_insert_key() was open coding this, this is just refactoring.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Thu, 25 Jul 2013 06:06:40 +0000 (23:06 -0700)]
bcache: bch_(btree|extent)_ptr_invalid()
Trying to treat btree pointers and leaf node pointers the same way was a
mistake - going to start being more explicit about the type of
key/pointer we're dealing with. This is the first part of that
refactoring; this patch shouldn't change any actual behaviour.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Wed, 24 Jul 2013 23:46:42 +0000 (16:46 -0700)]
bcache: Don't bother with bucket refcount for btree node allocations
The bucket refcount (dropped with bkey_put()) is only needed to prevent
the newly allocated bucket from being garbage collected until we've
added a pointer to it somewhere. But for btree node allocations, the
fact that we have btree nodes locked is enough to guard against races
with garbage collection.
Eventually the per bucket refcount is going to be replaced with
something specific to bch_alloc_sectors().
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Thu, 24 Oct 2013 23:36:03 +0000 (16:36 -0700)]
bcache: Debug code improvements
Couple changes:
* Consolidate bch_check_keys() and bch_check_key_order(), and move the
checks that only check_key_order() could do to bch_btree_iter_next().
* Get rid of CONFIG_BCACHE_EDEBUG - now, all that code is compiled in
when CONFIG_BCACHE_DEBUG is enabled, and there's now a sysfs file to
flip on the EDEBUG checks at runtime.
* Dropped an old not terribly useful check in rw_unlock(), and
refactored/improved some of the other debug code.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Thu, 25 Jul 2013 01:14:44 +0000 (18:14 -0700)]
bcache: Fix bch_ptr_bad()
Previously, bch_ptr_bad() could return false when there was a pointer to
a nonexistent device... it only filtered out keys with PTR_CHECK_DEV
pointers.
This behaviour was intended for multiple cache device support; for that,
just because the device for one of the pointers has gone away doesn't
mean we want to filter out the rest of the pointers.
But we don't yet explicitly filter/check individual pointers, so without
that this behaviour was wrong - a corrupt bkey with a bad device pointer
could cause us to deref a bad pointer. Doh.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
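A minimal userspace sketch of the stricter check described in the bch_ptr_bad() fix above, assuming the fix amounts to rejecting a key as soon as any of its pointers names a device that is not present. The types and names (struct demo_key, key_has_bad_ptr(), the dev_present table) are hypothetical stand-ins, not the bcache data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_PTRS 3

/* Hypothetical stand-in for a key carrying several device pointers. */
struct demo_key {
    size_t nptrs;
    unsigned dev[DEMO_MAX_PTRS];
};

/*
 * Returns true if any pointer in the key refers to a device index that is
 * out of range or marked absent. Rejecting the whole key is the conservative
 * choice when individual pointers are not filtered separately.
 */
static bool key_has_bad_ptr(const struct demo_key *k,
                            const bool *dev_present, size_t nr_devs)
{
    assert(k->nptrs <= DEMO_MAX_PTRS);

    for (size_t i = 0; i < k->nptrs; i++) {
        if (k->dev[i] >= nr_devs || !dev_present[k->dev[i]])
            return true;
    }
    return false;
}
```

With per-pointer filtering in place (as the message anticipates for multiple cache devices), the loop could skip bad pointers instead of rejecting the key; that variant is not shown here.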
Kent Overstreet [Thu, 31 Oct 2013 22:46:42 +0000 (15:46 -0700)]
bcache: Pull on disk data structures out into a separate header
Now, the on disk data structures are in a header that can be exported to
userspace - and having them all centralized is nice too.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Thu, 25 Jul 2013 01:11:11 +0000 (18:11 -0700)]
bcache: Move sector allocator to alloc.c
Just reorganizing things a bit.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Wed, 11 Sep 2013 02:02:45 +0000 (19:02 -0700)]
bcache: Break up struct search
With all the recent refactoring around struct btree_op, struct search has
gotten rather large.
But we can now easily break it up in a different way - we break out
struct btree_insert_op which is for inserting data into the cache, and
that's now what the copying gc code uses - struct search is now specific
to request.c
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Thu, 25 Jul 2013 01:07:22 +0000 (18:07 -0700)]
bcache: Convert bch_btree_insert() to bch_btree_map_leaf_nodes()
Last of the btree_map() conversions. Main visible effect is
bch_btree_insert() is no longer taking a struct btree_op as an argument
anymore - there's no fancy state machine stuff going on, it's just a
normal function.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Thu, 25 Jul 2013 01:06:22 +0000 (18:06 -0700)]
bcache: Don't use op->insert_collision
When we convert bch_btree_insert() to bch_btree_map_leaf_nodes(), we
won't be passing struct btree_op to bch_btree_insert() anymore - so we
need a different way of returning whether there was a collision (really,
a replace collision).
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Wed, 11 Sep 2013 01:52:54 +0000 (18:52 -0700)]
bcache: Kill op->replace
This is prep work for converting bch_btree_insert to
bch_btree_map_leaf_nodes() - we have to convert all its arguments to
actual arguments. Bunch of churn, but should be straightforward.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Sat, 2 Nov 2013 01:03:08 +0000 (18:03 -0700)]
bcache: Drop some closure stuff
With the recent bcache refactoring, some of the closure code isn't
needed anymore.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Thu, 25 Jul 2013 01:04:18 +0000 (18:04 -0700)]
bcache: Kill op->cl
This isn't used for waiting asynchronously anymore - so this is a fairly
trivial refactoring.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Thu, 25 Jul 2013 00:44:17 +0000 (17:44 -0700)]
bcache: Prune struct btree_op
Eventual goal is for struct btree_op to contain only what is necessary
for traversing the btree.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Thu, 25 Jul 2013 00:41:13 +0000 (17:41 -0700)]
bcache: Clean up cache_lookup_fn
There was some looping in submit_partial_cache_miss() and
submit_partial_cache_hit() that isn't needed anymore - originally, we
wouldn't necessarily process the full hit or miss all at once because
when splitting the bio, we took into account the restrictions of the
device we were sending it to.
But, device bio size restrictions are now handled elsewhere, with a
wrapper around generic_make_request() - so that looping has been
unnecessary for a while now and we can now do quite a bit of cleanup.
And if we trim the key we're reading from to match the subset we're
actually reading, we don't have to explicitly calculate bi_sector
anymore. Neat.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Thu, 25 Jul 2013 00:41:08 +0000 (17:41 -0700)]
bcache: Convert bch_btree_read_async() to bch_btree_map_keys()
This is a fairly straightforward conversion, mostly reshuffling -
op->lookup_done goes away, replaced by MAP_DONE/MAP_CONTINUE. And the
code for handling cache hits and misses wasn't really btree code, so it
gets moved to request.c.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Thu, 25 Jul 2013 00:37:59 +0000 (17:37 -0700)]
bcache: Move some stuff to btree.c
With the new btree_map() functions, we don't need to export the stuff
needed for traversing the btree anymore.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Wed, 11 Sep 2013 01:48:51 +0000 (18:48 -0700)]
bcache: Add btree_map() functions
Lots of stuff has been open coding its own btree traversal - which is
generally pretty simple code, but there are a few subtleties.
This adds two new functions, bch_btree_map_nodes() and
bch_btree_map_keys(), which do the traversal for you. Everything that's
open coding btree traversal now (with the exception of garbage
collection) is slowly going to be converted to these two functions;
being able to write other code at a higher level of abstraction is a
big improvement w.r.t. overall code quality.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
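The callback-driven traversal described in the btree_map() commit above can be sketched in userspace roughly as follows. This is an illustration only: struct node, map_keys() and find_first_ge() are hypothetical, the tree is a plain in-memory B-tree of ints rather than bcache's on-disk btree, and the numeric values of MAP_DONE/MAP_CONTINUE are assumptions (only the names appear in this log).

```c
#include <assert.h>
#include <stddef.h>

/* Callback return codes; names from the log, values assumed. */
enum { MAP_DONE = 0, MAP_CONTINUE = 1 };

#define MAX_KEYS     4
#define MAX_CHILDREN (MAX_KEYS + 1)

/* Hypothetical simplified node; child[] is all NULL in a leaf. */
struct node {
    int nkeys;
    int keys[MAX_KEYS];
    struct node *child[MAX_CHILDREN];
};

typedef int (*map_keys_fn)(void *ctx, int key);

/*
 * Visit every key in ascending order, calling fn for each one.
 * The walk stops as soon as fn returns MAP_DONE, and that result is
 * propagated back up through the recursion.
 */
static int map_keys(const struct node *n, map_keys_fn fn, void *ctx)
{
    if (n == NULL)
        return MAP_CONTINUE;

    assert(n->nkeys >= 0 && n->nkeys <= MAX_KEYS);

    for (int i = 0; i < n->nkeys; i++) {
        if (map_keys(n->child[i], fn, ctx) == MAP_DONE)
            return MAP_DONE;
        if (fn(ctx, n->keys[i]) == MAP_DONE)
            return MAP_DONE;
    }
    return map_keys(n->child[n->nkeys], fn, ctx);
}

/* Example caller state: find the first key >= target. */
struct search_ctx {
    int target;
    int found;
    int result;
    int visited;
};

static int find_first_ge(void *ctx, int key)
{
    struct search_ctx *s = ctx;

    s->visited++;
    if (key >= s->target) {
        s->found = 1;
        s->result = key;
        return MAP_DONE;
    }
    return MAP_CONTINUE;
}
```

The caller only writes find_first_ge(); the subtleties of descending and stopping early live in one place, which is the benefit the commit message points at.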
Kent Overstreet [Thu, 25 Jul 2013 00:50:06 +0000 (17:50 -0700)]
bcache: Convert writeback to a kthread
This simplifies the writeback flow control quite a bit - previously, it
was conceptually two coroutines, refill_dirty() and read_dirty(). This
makes the code quite a bit more straightforward.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Fri, 25 Oct 2013 00:19:26 +0000 (17:19 -0700)]
bcache: Convert gc to a kthread
We needed a dedicated rescuer workqueue for gc anyways... and gc was
conceptually a dedicated thread, just one that wasn't running all the
time. Switch it to a dedicated thread to make the code a bit more
straightforward.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Thu, 25 Jul 2013 00:29:09 +0000 (17:29 -0700)]
bcache: Convert bucket_wait to wait_queue_head_t
At one point we did do fancy asynchronous waiting stuff with
bucket_wait, but that's all gone (and bucket_wait is used a lot less
than it used to be). So use the standard primitives.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Thu, 25 Jul 2013 00:27:07 +0000 (17:27 -0700)]
bcache: Convert try_wait to wait_queue_head_t
We never waited on c->try_wait asynchronously, so just use the standard
primitives.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Thu, 25 Jul 2013 00:26:51 +0000 (17:26 -0700)]
bcache: Move keylist out of btree_op
Slowly working on pruning struct btree_op - the aim is for it to only
contain things that are actually necessary for traversing the btree.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Fri, 25 Oct 2013 00:07:04 +0000 (17:07 -0700)]
bcache: Refactor journalling flow control
Making things less asynchronous that don't need to be - bch_journal()
only has to block when the journal or journal entry is full, which is
emphatically not a fast path. So make it a normal function that just
returns when it finishes, to make the code and control flow easier to
follow.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Wed, 11 Sep 2013 00:06:17 +0000 (17:06 -0700)]
bcache: Refactor read request code a bit
More refactoring, and renaming.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Thu, 25 Jul 2013 00:24:52 +0000 (17:24 -0700)]
bcache: Refactor request_write()
Try to improve some of the naming a bit to be more consistent, and also
improve the flow of control in request_write() a bit.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Thu, 25 Jul 2013 00:24:25 +0000 (17:24 -0700)]
bcache: Clean up keylist code
More random refactoring.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Wed, 11 Sep 2013 01:46:36 +0000 (18:46 -0700)]
bcache: Add explicit keylist arg to btree_insert()
Some refactoring - better to explicitly pass stuff around instead of
having it all in the "big bag of state", struct btree_op. Going to prune
struct btree_op quite a bit over time.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Wed, 11 Sep 2013 01:39:16 +0000 (18:39 -0700)]
bcache: Convert btree_insert_check_key() to btree_insert_node()
This was the main point of all this refactoring - now,
btree_insert_check_key() won't fail just because the leaf node happened
to be full.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Thu, 25 Jul 2013 00:22:44 +0000 (17:22 -0700)]
bcache: Insert multiple keys at a time
We'll often end up with a list of adjacent keys to insert -
because bch_data_insert() may have to fragment the data it writes.
Originally, to simplify things and avoid having to deal with corner
cases bch_btree_insert() would pass keys from this list one at a time to
btree_insert_recurse() - mainly because the list of keys might span leaf
nodes, so it was easier this way.
With the btree_insert_node() refactoring, it's now a lot easier to just
pass down the whole list and have btree_insert_recurse() iterate over
leaf nodes until it's done.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Wed, 11 Sep 2013 01:41:15 +0000 (18:41 -0700)]
bcache: Add btree_insert_node()
The flow of control in the old btree insertion code was rather -
backwards; we'd recurse down the btree (in btree_insert_recurse()), and
then if we needed to split, the keys to be inserted into the parent node
would be effectively returned up to btree_insert_recurse(), which would
notice there was more work to do and finish the insertion.
The main problem with this was that the full logic for btree insertion
could only be used by calling btree_insert_recurse; if you'd gotten to a
btree leaf some other way and had a key to insert, if it turned out that
node needed to be split you were SOL.
This inverts the flow of control so btree_insert_node() does _full_
btree insertion, including splitting - and takes a (leaf) btree node to
insert into as a parameter.
This means we can now _correctly_ handle cache misses - for cache
misses, we need to insert a fake "check" key into the btree when we
discover we have a cache miss - while we still have the btree locked.
Previously, if the btree node was full inserting a cache miss would just
fail.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Thu, 25 Jul 2013 00:20:19 +0000 (17:20 -0700)]
bcache: Explicitly track btree node's parent
This is prep work for the reworked btree insertion code.
The way we set b->parent is ugly and hacky... the problem is, when
btree_split() or garbage collection splits or rewrites a btree node, the
parent changes for all its (potentially already cached) children.
I may change this later and add some code to look through the btree node
cache and find all our cached child nodes and change the parent pointer
then...
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Thu, 25 Jul 2013 00:20:00 +0000 (17:20 -0700)]
bcache: Remove unnecessary check in should_split()
Checking i->seq was redundant, because since ages ago we always
initialize the new bset when advancing b->written
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Sat, 17 Aug 2013 09:13:15 +0000 (02:13 -0700)]
bcache: Stripe size isn't necessarily a power of two
Originally I got this right... except that the divides didn't use
do_div(), which broke 32 bit kernels. When I went to fix that, I forgot
that the raid stripe size usually isn't a power of two... doh
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
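To illustrate the stripe size bug above: a shift only matches a division when the divisor is a power of two. The sketch below is plain userspace C with hypothetical helper names; in the kernel the 64-bit division would go through a helper such as do_div() so that it also builds on 32 bit, which is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Stripe index of a sector offset; correct for any non-zero stripe size. */
static uint64_t stripe_of(uint64_t offset, uint32_t stripe_size)
{
    assert(stripe_size != 0);
    return offset / stripe_size;
}

/*
 * Shift-based variant: shifts by floor(log2(stripe_size)), so it is only
 * correct when stripe_size is an exact power of two.
 */
static uint64_t stripe_of_pow2(uint64_t offset, uint32_t stripe_size)
{
    unsigned shift = 0;

    assert(stripe_size != 0);
    while ((UINT64_C(1) << (shift + 1)) <= stripe_size)
        shift++;
    return offset >> shift;
}
```

For example, with a 384-sector stripe (an assumed value, such as three 128-sector chunks), sector 1000 lies in stripe 2, but the shift variant rounds the stripe size down to 256 and reports stripe 3.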
Kent Overstreet [Fri, 12 Jul 2013 02:42:51 +0000 (19:42 -0700)]
bcache: Add on error panic/unregister setting
Works kind of like the ext4 setting, to panic or remount read only on
errors.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Thu, 25 Jul 2013 00:16:09 +0000 (17:16 -0700)]
bcache: Use blkdev_issue_discard()
The old asynchronous discard code was really a relic from when all the
allocation code was asynchronous - now that allocation runs out of a
dedicated thread there's no point in keeping around all that complicated
machinery.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Fri, 25 Oct 2013 00:12:52 +0000 (17:12 -0700)]
bcache: Fix a lockdep splat
bch_keybuf_del() takes a spinlock that can't be taken in interrupt context -
whoops. Fortunately, this code isn't enabled by default (you have to toggle a
sysfs thing).
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Tue, 8 Oct 2013 22:50:46 +0000 (15:50 -0700)]
bcache: Fix a journalling performance bug
Kent Overstreet [Mon, 11 Nov 2013 05:55:27 +0000 (21:55 -0800)]
bcache: Fix dirty_data accounting
Dirty data accounting wasn't quite right - firstly, we were adding the key we're
inserting after it could have merged with another dirty key already in the
btree, and secondly we could sometimes pass the wrong offset to
bcache_dev_sectors_dirty_add() for dirty data we were overwriting - which is
important when tracking dirty data by stripe.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
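Why the offset matters for the dirty_data fix above can be seen in a small sketch of per-stripe accounting: a range of sectors is split at stripe boundaries, so starting from the wrong offset credits or debits the wrong stripes. The function below is a userspace illustration with assumed names and types; it is not the body of bcache_dev_sectors_dirty_add().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Add nr_sectors (negative to subtract) of dirty data starting at sector
 * `offset`, spreading the amount over the per-stripe counters it touches.
 */
static void sectors_dirty_add(int64_t *stripe_dirty, size_t nr_stripes,
                              uint32_t stripe_size, uint64_t offset,
                              int64_t nr_sectors)
{
    assert(stripe_size != 0);

    uint64_t stripe = offset / stripe_size;
    uint64_t stripe_offset = offset % stripe_size;

    while (nr_sectors != 0) {
        /* Sectors remaining in the current stripe from stripe_offset on. */
        int64_t room = (int64_t)(stripe_size - stripe_offset);
        int64_t left = nr_sectors < 0 ? -nr_sectors : nr_sectors;
        int64_t s = left < room ? left : room;

        if (nr_sectors < 0)
            s = -s;

        assert(stripe < nr_stripes);
        stripe_dirty[stripe] += s;

        nr_sectors -= s;
        stripe_offset = 0;
        stripe++;
    }
}
```

Adding 200 sectors at offset 300 with a 384-sector stripe puts 84 sectors in stripe 0 and 116 in stripe 1; subtracting the same range at the same offset returns both counters to zero, while a wrong offset would not.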
Ben Harris [Fri, 18 Oct 2013 20:23:35 +0000 (21:23 +0100)]
floppy: Correct documentation of driver options when used as a module.
The options have to be passed space-separated and prefixed by "floppy=",
rather than separately and unprefixed.
This fixes <http://bugs.debian.org/726655>.
Signed-off-by: Ben Harris <bjh21@cam.ac.uk>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Dan Carpenter [Wed, 6 Nov 2013 08:24:02 +0000 (09:24 +0100)]
pktcdvd: debugfs functions return NULL on error
My static checker complains correctly that this is a potential NULL
dereference because debugfs functions return NULL on error. They return
an ERR_PTR if they are configured out.
We don't need to check for ERR_PTR because if debugfs is stubbed out the
dummy functions won't complain about that. We don't need to check the
values before calling debugfs_remove() because that accepts ERR_PTRs and
NULL pointers.
We don't need to set pkt->dfs_f_info to NULL in pkt_debugfs_dev_new()
because it was initialized with kzalloc() so I have removed that.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Roger Pau Monne [Tue, 29 Oct 2013 17:31:14 +0000 (18:31 +0100)]
xen-blkfront: restore the non-persistent data path
When persistent grants were added they were always used, even if the
backend doesn't have this feature (there's no harm in always using the
same set of pages). This restores the old data path when the backend
doesn't have persistent grants, removing the burden of doing a memcpy
when it is not actually needed.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reported-by: Felipe Franciosi <felipe.franciosi@citrix.com>
Cc: Felipe Franciosi <felipe.franciosi@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[v2: Fix up whitespace issues]
Bartlomiej Zolnierkiewicz [Tue, 5 Nov 2013 11:37:09 +0000 (12:37 +0100)]
skd: fix formatting in skd_s1120.h
Cc: Akhil Bhansali <abhansali@stec-inc.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bartlomiej Zolnierkiewicz [Tue, 5 Nov 2013 11:37:08 +0000 (12:37 +0100)]
skd: reorder construct/destruct code
Reorder placement of skd_construct(), skd_cons_sg_list(), skd_destruct()
and skd_free_sg_list() functions. Then remove no longer needed function
prototypes.
Cc: Akhil Bhansali <abhansali@stec-inc.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bartlomiej Zolnierkiewicz [Tue, 5 Nov 2013 11:37:07 +0000 (12:37 +0100)]
skd: cleanup skd_do_inq_page_da()
skdev->pdev and skdev->pdev->bus are always non-NULL in
skd_do_inq_page_da() so simplify the code accordingly.
Also cache skdev->pdev value in pdev variable while at it.
Cc: Akhil Bhansali <abhansali@stec-inc.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bartlomiej Zolnierkiewicz [Tue, 5 Nov 2013 11:37:06 +0000 (12:37 +0100)]
skd: remove SKD_OMIT_FROM_SRC_DIST ifdefs
SKD_OMIT_FROM_SRC_DIST is never defined.
Cc: Akhil Bhansali <abhansali@stec-inc.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bartlomiej Zolnierkiewicz [Tue, 5 Nov 2013 11:37:05 +0000 (12:37 +0100)]
skd: remove redundant skdev->pdev assignment from skd_pci_probe()
skdev->pdev is set to pdev twice in skd_pci_probe(), first time
through skd_construct() call and the second time directly in
the function. Remove the second assignment as it is not needed.
Cc: Akhil Bhansali <abhansali@stec-inc.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bartlomiej Zolnierkiewicz [Tue, 5 Nov 2013 11:37:04 +0000 (12:37 +0100)]
skd: use <asm/unaligned.h>
Use <asm/unaligned.h> instead of <asm-generic/unaligned.h>.
Cc: Akhil Bhansali <abhansali@stec-inc.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bartlomiej Zolnierkiewicz [Tue, 5 Nov 2013 11:37:03 +0000 (12:37 +0100)]
skd: remove SCSI subsystem specific includes
This is not a SCSI host driver so remove SCSI subsystem specific
includes.
Cc: Akhil Bhansali <abhansali@stec-inc.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bartlomiej Zolnierkiewicz [Tue, 5 Nov 2013 11:37:02 +0000 (12:37 +0100)]
skd: register block device only if some devices are present
Register block device in skd_pci_probe() instead of in skd_init() so it
is registered only if some devices are present (currently it is always
registered when the driver is loaded). Please note that this change
depends on the fact that register_blkdev(0, ...) never returns 0.
Cc: Akhil Bhansali <abhansali@stec-inc.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
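The dependency noted in the commit above (register_blkdev(0, ...) never returning 0) is what lets a zero major double as a "not registered yet" flag. A minimal userspace sketch of that idea follows; demo_register_blkdev() is a hypothetical stub standing in for the kernel call, and the major number 254 is an arbitrary assumed value.

```c
#include <assert.h>

static int demo_major;           /* 0 means "not registered yet" */
static int demo_register_calls;  /* bookkeeping for the example only */

/*
 * Stub for dynamic major allocation: returns a positive major on success
 * or a negative error code on failure, never 0.
 */
static int demo_register_blkdev(void)
{
    demo_register_calls++;
    return 254;
}

/* Register lazily on the first probed device, then reuse the major. */
static int demo_probe(void)
{
    if (demo_major == 0) {
        int rc = demo_register_blkdev();

        if (rc < 0)
            return rc;
        assert(rc != 0);  /* the flag scheme relies on this */
        demo_major = rc;
    }
    return 0;
}
```

If the registration call could legitimately return 0, every probe would register again; the assertion documents that assumption.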
Bartlomiej Zolnierkiewicz [Tue, 5 Nov 2013 11:37:01 +0000 (12:37 +0100)]
skd: fix error messages in skd_init()
* change priority level from KERN_INFO to KERN_ERR
* add "skd: " prefix
* do minor CodingStyle fixes
Cc: Akhil Bhansali <abhansali@stec-inc.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bartlomiej Zolnierkiewicz [Tue, 5 Nov 2013 11:37:00 +0000 (12:37 +0100)]
skd: fix error paths in skd_init()
Cc: Akhil Bhansali <abhansali@stec-inc.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bartlomiej Zolnierkiewicz [Tue, 5 Nov 2013 11:36:59 +0000 (12:36 +0100)]
skd: fix unregister_blkdev() placement
register_blkdev() is called before pci_register_driver() in skd_init()
so unregister_blkdev() should be called after pci_unregister_driver()
in skd_exit(). Fix it.
Cc: Akhil Bhansali <abhansali@stec-inc.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
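The ordering rule behind the unregister_blkdev() fix above is the usual one: tear down in the reverse order of setup. A small userspace sketch, using hypothetical names and a stack of "live" resources to check the discipline rather than real kernel calls:

```c
#include <assert.h>
#include <stddef.h>

enum demo_step { DEMO_BLKDEV, DEMO_PCI_DRIVER };

static enum demo_step live[2];  /* resources currently set up, in order */
static size_t nlive;

static void bring_up(enum demo_step s)
{
    assert(nlive < 2);
    live[nlive++] = s;
}

/* Only the most recently set up resource may be torn down. */
static void tear_down(enum demo_step s)
{
    assert(nlive > 0 && live[nlive - 1] == s);
    nlive--;
}

static void demo_init(void)
{
    bring_up(DEMO_BLKDEV);      /* first: block device major */
    bring_up(DEMO_PCI_DRIVER);  /* second: PCI driver */
}

static void demo_exit(void)
{
    tear_down(DEMO_PCI_DRIVER); /* undo the last step first */
    tear_down(DEMO_BLKDEV);
}
```

Swapping the two calls in demo_exit() trips the assertion in tear_down(), which mirrors the misplacement the commit corrects.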
Mike Snitzer [Fri, 1 Nov 2013 19:05:10 +0000 (15:05 -0400)]
skd: more removal of bio-based code
Remove skd_flush_cmd structure and skd_flush_slab.
Remove skd_end_request wrapper around skd_end_request_blk.
Remove skd_requeue_request, use blk_requeue_request directly.
Cleanup some comments (remove "bio" info) and whitespace.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Fri, 1 Nov 2013 16:38:45 +0000 (10:38 -0600)]
skd: cleanup the skd_*() function block wrapping
Just call the block functions directly, don't wrap them
in skd helpers. With only one queueing model enabled, there's
no point in doing that.
Also kill the ->start_time and ->bio from the skd_request_context,
we don't use those anymore.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Fri, 1 Nov 2013 16:14:56 +0000 (10:14 -0600)]
skd: rip out bio path
The skd driver has a selectable rq or bio based queueing model.
For 3.14, we want to turn this into a single blk-mq interface
instead. With the immutable biovecs being merged in 3.13, the
bio model would need patches to even work. So rip it out, with
a conversion pending for blk-mq in the next release.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Wei Yongjun [Wed, 30 Oct 2013 05:23:53 +0000 (13:23 +0800)]
skd: fix error return code in skd_pci_probe()
Fix to return -ENOMEM in the skd construct error handling
case instead of 0, as done elsewhere in this function.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
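The class of bug fixed above is a shared error label that returns rc while rc still holds 0. A minimal userspace sketch, with hypothetical names (demo_construct(), demo_probe()) and calloc() standing in for the driver's constructor:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct demo_dev {
    int id;
};

/* Hypothetical constructor; `fail` forces the allocation-failure path. */
static struct demo_dev *demo_construct(int fail)
{
    if (fail)
        return NULL;
    return calloc(1, sizeof(struct demo_dev));
}

static int demo_probe(int fail_construct, struct demo_dev **out)
{
    int rc = 0;
    struct demo_dev *dev = demo_construct(fail_construct);

    if (dev == NULL) {
        rc = -ENOMEM;  /* without this, the error path would return 0 */
        goto err_out;
    }

    *out = dev;
    return 0;

err_out:
    *out = NULL;
    return rc;
}
```

Returning 0 from the failure path would tell the caller the device was set up when it was not.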
Heiko Carstens [Thu, 31 Oct 2013 12:24:28 +0000 (13:24 +0100)]
s390/dasd: hold request queue sysfs lock when calling elevator_init()
"elevator: Fix a race in elevator switching and md device initialization"
changed the semantics of elevator_init() in a way that now requires holding
the corresponding request queue's sysfs_lock when calling elevator_init()
to fix a race.
The patch did not convert the s390 dasd device driver which is the only
device driver which also calls elevator_init(). So add the missing locking.
Cc: Tomoki Sekiyama <tomoki.sekiyama@hds.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Stephen M. Cameron [Tue, 29 Oct 2013 19:46:06 +0000 (13:46 -0600)]
cciss: return 0 from driver probe function on success, not 1
A return value of 1 is interpreted as an error
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
rchinthekindi [Thu, 24 Oct 2013 11:51:23 +0000 (12:51 +0100)]
skd: Replaced custom debug PRINTKs with pr_debug
Replaced DPRINTK() and VPRINTK() with pr_debug().
Signed-off-by: Ramprasad C <ramprasad.chinthekindi@hgst.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Akhil Bhansali [Wed, 23 Oct 2013 12:00:08 +0000 (13:00 +0100)]
skd: Fix checkpatch ERRORS and removed unused functions
This patch fixes checkpatch.pl errors for assignment in if condition.
It also removes unused readq / readl function calls.
As Andrew had disabled the compilation of drivers for 32 bit,
I have modified format specifiers in a few VPRINTKs to avoid warnings
during 64 bit compilation.
Signed-off-by: Akhil Bhansali <abhansali@stec-inc.com>
Reviewed-by: Ramprasad Chinthekindi <rchinthekindi@stec-inc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Philip J Kelleher [Fri, 18 Oct 2013 22:12:35 +0000 (17:12 -0500)]
rsxx: Fix possible kernel panic with invalid config.
This patch fixes a possible Kernel Panic on driver load if
the configuration on the card is messed up or not yet set.
The driver could possibly give a 32 bit unsigned all Fs to
the kernel as the device's block size.
Now we only write the block size to the kernel if the
configuration from the card is valid.
Also, driver version is being updated.
Signed-off-by: Philip J Kelleher <pjk1939@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Philip J Kelleher [Fri, 18 Oct 2013 22:11:46 +0000 (17:11 -0500)]
rsxx: Disallow discards from being unmapped.
This patch fixes a bug in which discards were always
calling pci_unmap_page. Discards should never call the
pci_unmap_page function call because they are never mapped.
This caused a race condition on PowerPC systems when issuing
discards, writes, and reads all at the same time. The
pci_map_page function would eventually map logical address
0 for a read or write. Discards are always assigned a DMA
address of 0 because they are never mapped. So if
pci_map_page mapped address 0 for a DMA and a discard was
"unmapped" then the address would be freed and would cause
an EEH event to occur when Hardware accesses the address.
This was injected/uncovered in commit:
b347f9cf0bc8d42ee95ba1d3837fd93045ab336b
The pci_dma_mapping_error function declares -1 a DMA_ERROR,
not 0 as initially thought. So before, we would never unmap
discards because they were considered NULL.
This patch should fall on top of commit id:
fc1967bb08a6184ed44ef990e1dd4389901b809c
Also, the driver version is being updated.
Signed-off-by: Philip J Kelleher <pjk1939@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
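The rsxx discard fix above comes down to deciding whether to unmap from the command type, not from the DMA address, because address 0 can be a legitimately mapped address. A userspace sketch with hypothetical types and a counting stub in place of pci_unmap_page():

```c
#include <assert.h>
#include <stdint.h>

enum demo_op { DEMO_READ, DEMO_WRITE, DEMO_DISCARD };

struct demo_dma {
    enum demo_op op;
    uint64_t dma_addr;  /* discards are never mapped and carry 0 here */
};

static unsigned unmap_calls;  /* bookkeeping for the example only */

/* Stub standing in for the real unmap call. */
static void demo_unmap(uint64_t addr)
{
    (void)addr;
    unmap_calls++;
}

/*
 * Completion path: reads and writes were mapped, so they are unmapped even
 * if their address happens to be 0; discards were never mapped, so they are
 * skipped regardless of the address field.
 */
static void demo_complete(const struct demo_dma *dma)
{
    if (dma->op != DEMO_DISCARD)
        demo_unmap(dma->dma_addr);
}
```

Testing the address against 0 instead would both leak a read mapped at 0 and, once the check is inverted, free someone else's mapping on behalf of a discard.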
Lars Ellenberg [Wed, 23 Oct 2013 08:59:19 +0000 (10:59 +0200)]
drbd: avoid to shrink max_bio_size due to peer re-configuration
For a long time, the receiving side has spread "too large" incoming
requests over multiple bios. No need to shrink our max_bio_size
(max_hw_sectors) if the peer is reconfigured to use a different storage.
The problem manifests itself if we are not the top of the device stack
(DRBD is used as an LVM PV).
A hardware reconfiguration on the peer may cause the supported
max_bio_size to shrink, and the connection handshake would now
unnecessarily shrink the max_bio_size on the active node.
There is no way to notify upper layers that they have to "re-stack"
their limits. So they won't notice at all, and may keep submitting bios
that are suddenly considered "too large for device".
We already check for compatibility and ignore changes on the peer,
the code only was masked out unless we have a fully established connection.
We just need to allow it a bit earlier during the handshake.
Also consider max_hw_sectors in our merge bvec function, just in case.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Lars Ellenberg [Wed, 23 Oct 2013 08:59:18 +0000 (10:59 +0200)]
drbd: fix decoding of bitmap vli rle for device sizes > 64 TB
Symptoms: disconnect after bitmap exchange due to
bitmap overflow (e:49731075554) while decoding bm RLE packet
In the decoding step of the variable length integer run length encoding
there was potentially an uncaught bitshift by wordsize (variable >> 64).
The result of which is "undefined" :(
(only "sometimes" the result is the desired 0)
Fix: don't do any bit shift magic for shift == 64, just assign.
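The hazard can be sketched in plain C. The helper name shift_right_safe is hypothetical and is not the DRBD code; it only illustrates that a shift count equal to the width of the type has to be handled separately, because C leaves a 64-bit value shifted by 64 undefined.

```c
#include <stdint.h>

/* Hypothetical helper, not taken from DRBD: shift a 64-bit value right
 * by 'bits' (expected range 0..64). A shift count equal to the type
 * width is undefined behaviour in C, so that case is assigned directly
 * instead of being shifted. */
static uint64_t shift_right_safe(uint64_t value, unsigned int bits)
{
	if (bits >= 64)
		return 0;	/* all bits consumed: just assign, do not shift */
	return value >> bits;
}
```

With this shape the caller never relies on what the hardware happens to do for an out-of-range shift count.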
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Philipp Reisner [Wed, 23 Oct 2013 08:59:17 +0000 (10:59 +0200)]
drbd: Fix adding of new minors with freshly created meta data
Online adding of new minors with freshly created meta data
to a resource with an established connection failed, with a
wrong state transition on one side of the new minor.
Freshly created meta-data has a la_size (last agreed size) of 0.
When we online add such devices, the code wrongly got into
the code path for resyncing new storage that was added while
the disk was detached.
Fixed that by making the GREW from ZERO a special case.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Philipp Reisner [Wed, 23 Oct 2013 08:59:16 +0000 (10:59 +0200)]
drbd: Fix an connection drop issue after enabling allow-two-primaries
Since drbd-8.4.0 it is possible to change the allow-two-primaries
network option while the connection is established.
The sequence code used to partially order packets from the
data socket with packets from the meta-data socket still assumed
that the allow-two-primaries option is constant while the
connection is established.
I.e.
On a node that has the RESOLVE_CONFLICTS bits set, after enabling
allow-two-primaries, when receiving the next data packet it timed out
while waiting for the necessary packets on the data socket to arrive
(wait_for_and_update_peer_seq() function).
Fixed that by always tracking the sequence number, but only waiting
for it if allow-two-primaries is set.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Lars Ellenberg [Wed, 23 Oct 2013 08:59:15 +0000 (10:59 +0200)]
drbd: fix NULL pointer deref in module init error path
If we want to iterate over the (as of yet still empty) list in the
cleanup path, we need to initialize the list before the first goto fail.
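A minimal userspace sketch of the ordering issue follows. The list type is re-implemented here in the spirit of the kernel's struct list_head, and init_sketch, cleanup_all and resources are hypothetical names, not the DRBD symbols.

```c
#include <stddef.h>

/* Minimal circular doubly linked list (re-implemented for this sketch,
 * not the kernel header). */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

/* Cleanup walks the list. With a zero-filled, never-initialized head,
 * head->next is NULL and the loop would dereference a NULL pointer. */
static int cleanup_all(struct list_node *head)
{
	int released = 0;
	struct list_node *pos;

	for (pos = head->next; pos != head; pos = pos->next)
		released++;
	return released;
}

static struct list_node resources;	/* zero-initialized static */

/* Hypothetical init function; 'fail_early' simulates the first error path. */
static int init_sketch(int fail_early)
{
	int err = 0;

	list_init(&resources);	/* must happen before the first goto fail */

	if (fail_early) {
		err = -1;
		goto fail;
	}
	return 0;

fail:
	cleanup_all(&resources);	/* safe: the list is empty, not uninitialized */
	return err;
}
```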
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 17 Oct 2013 22:38:30 +0000 (16:38 -0600)]
block: disable cpqarray in Kconfig
Mike writes:
"cpqarray hasn't been used in over 12 years. It's doubtful that anyone
still uses the board. It's time the driver was removed from the mainline
kernel. The only updates these days are minor and mostly done by people
outside of HP."
If nobody yells, we'll remove it from the kernel tree completely
for 3.15.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Akhil Bhansali [Tue, 15 Oct 2013 20:19:07 +0000 (14:19 -0600)]
Add support for sTec's pci-e flash card Kronos
Signed-off-by: Akhil Bhansali <abhansali@stec-inc.com>
Signed-off-by: Ramprasad Chinthekindi <rchinthekindi@stec-inc.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Folded patch, contributions to clean up this driver from:
Jens Axboe
Dan Carpenter
Andrew Morton
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Philip J Kelleher [Sat, 28 Sep 2013 02:42:50 +0000 (20:42 -0600)]
rsxx: Kernel Panic caused by mapping Discards
This fixes a kernel panic injected by commit id
8d26750143341831bc312f61c5ed141eeb75b8d0 where discards
are getting mapped through the pci_map_page function call.
The driver will now start verifying that a dma is not a
discard before issuing the pci_map_page function call.
Also, we are updating the driver version.
Signed-off-by: Philip J Kelleher <pjk1939@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
David Milburn [Thu, 23 May 2013 21:23:45 +0000 (16:23 -0500)]
mtip32xx: dynamically allocate buffer in debugfs functions
Dynamically allocate buf to prevent warnings:
drivers/block/mtip32xx/mtip32xx.c: In function ‘mtip_hw_read_device_status’:
drivers/block/mtip32xx/mtip32xx.c:2823: warning: the frame size of 1056 bytes is larger than 1024 bytes
drivers/block/mtip32xx/mtip32xx.c: In function ‘mtip_hw_read_registers’:
drivers/block/mtip32xx/mtip32xx.c:2894: warning: the frame size of 1056 bytes is larger than 1024 bytes
drivers/block/mtip32xx/mtip32xx.c: In function ‘mtip_hw_read_flags’:
drivers/block/mtip32xx/mtip32xx.c:2917: warning: the frame size of 1056 bytes is larger than 1024 bytes
Signed-off-by: David Milburn <dmilburn@redhat.com>
Acked-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Asai Thambi S P [Wed, 11 Sep 2013 19:14:42 +0000 (13:14 -0600)]
mtip32xx: Add SRSI support
This patch adds support for SRSI (Surprise Removal Surprise Insertion).
Approach:
---------
Surprise Removal:
-----------------
On surprise removal of the device, the gendisk, request queue, device index, sysfs
entries, etc. are retained as long as the device is in use - mounted filesystem,
device opened by an application, etc. The service thread breaks out of the main
while loop, waits for pci remove to exit, and then waits for the device to become
free. When there are no holders of the device, the service thread cleans up the block
and device related stuff and returns.
Surprise Insertion:
-------------------
No change, this scenario follows the normal pci probe() function flow.
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Philip J Kelleher [Wed, 4 Sep 2013 18:59:35 +0000 (13:59 -0500)]
rsxx: Moving pci_map_page to prevent overflow.
The pci_map_page function has been moved into our
issued workqueue to prevent us from running out of
mappable addresses on non-HWWD PCIe x8 slots. The
maximum amount that can possibly be mapped at one
time now is: 255 dmas X 4 dma channels X 4096 Bytes.
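As a worked version of that bound, the product can be computed directly. The constant and function names below are hypothetical and only mirror the three factors quoted in the message.

```c
/* Factors quoted in the commit message; the names are illustrative only. */
enum {
	DMAS_PER_CHANNEL = 255,
	DMA_CHANNELS     = 4,
	DMA_SIZE_BYTES   = 4096,
};

/* Upper bound on bytes mapped at one time: 255 * 4 * 4096 = 4177920,
 * which is just under 4 MiB. */
static unsigned long max_mapped_bytes(void)
{
	return (unsigned long)DMAS_PER_CHANNEL * DMA_CHANNELS * DMA_SIZE_BYTES;
}
```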
Signed-off-by: Philip J Kelleher <pjk1939@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Philip J Kelleher [Wed, 4 Sep 2013 18:59:02 +0000 (13:59 -0500)]
rsxx: Handling failed pci_map_page on PowerPC and double free.
The rsxx driver was not checking the correct value during a
pci_map_page failure. Fixing this also uncovered a
double free if the bio was returned before it was
broken up into individual 4k dmas; that is also
fixed here.
Signed-off-by: Philip J Kelleher <pjk1939@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Mikulas Patocka [Tue, 15 Oct 2013 20:14:38 +0000 (14:14 -0600)]
loop: fix crash when using unassigned loop device
When the loop module is loaded, it creates 8 loop devices /dev/loop[0-7].
The devices have no request routine and thus, when they are used without
being assigned, a crash happens.
For example, these commands cause a crash (assuming there are no used loop
devices):
Kernel Fault: Code=26 regs=000000007f420980 (Addr=0000000000000010)
CPU: 1 PID: 50 Comm: kworker/1:1 Not tainted 3.11.0 #1
Workqueue: ksnaphd do_metadata [dm_snapshot]
task: 000000007fcf4078 ti: 000000007f420000 task.ti: 000000007f420000
[ 116.319988]
     YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
PSW: 00001000000001001111111100001111 Not tainted
r00-03 000000ff0804ff0f 00000000408bf5d0 00000000402d8204 000000007b7ff6c0
r04-07 00000000408a95d0 000000007f420950 000000007b7ff6c0 000000007d06c930
r08-11 000000007f4205c0 0000000000000001 000000007f4205c0 000000007f4204b8
r12-15 0000000000000010 0000000000000000 0000000000000000 0000000000000000
r16-19 000000001108dd48 000000004061cd7c 000000007d859800 000000000800000f
r20-23 0000000000000000 0000000000000008 0000000000000000 0000000000000000
r24-27 00000000ffffffff 000000007b7ff6c0 000000007d859800 00000000408a95d0
r28-31 0000000000000000 000000007f420950 000000007f420980 000000007f4208e8
sr00-03 0000000000000000 0000000000000000 0000000000000000 0000000000303000
sr04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 117.549988]
IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000402d82fc 00000000402d8300
IIR: 53820020 ISR: 0000000000000000 IOR: 0000000000000010
CPU: 1 CR30: 000000007f420000 CR31: ffffffffffffffff
ORIG_R28: 0000000000000001
IAOQ[0]: generic_make_request+0x11c/0x1a0
IAOQ[1]: generic_make_request+0x120/0x1a0
RP(r2): generic_make_request+0x24/0x1a0
Backtrace:
[<00000000402d83f0>] submit_bio+0x70/0x140
[<0000000011087c4c>] dispatch_io+0x234/0x478 [dm_mod]
[<0000000011087f44>] sync_io+0xb4/0x190 [dm_mod]
[<00000000110883bc>] dm_io+0x2c4/0x310 [dm_mod]
[<00000000110bfcd0>] do_metadata+0x28/0xb0 [dm_snapshot]
[<00000000401591d8>] process_one_work+0x160/0x460
[<0000000040159bc0>] worker_thread+0x300/0x478
[<0000000040161a70>] kthread+0x118/0x128
[<0000000040104020>] end_fault_vector+0x20/0x28
[<0000000040177220>] task_tick_fair+0x420/0x4d0
[<00000000401aa048>] invoke_rcu_core+0x50/0x60
[<00000000401ad5b8>] rcu_check_callbacks+0x210/0x8d8
[<000000004014aaa0>] update_process_times+0xa8/0xc0
[<00000000401ab86c>] rcu_process_callbacks+0x4b4/0x598
[<0000000040142408>] __do_softirq+0x250/0x2c0
[<00000000401789d0>] find_busiest_group+0x3c0/0xc70
[ 119.379988]
Kernel panic - not syncing: Kernel Fault
Rebooting in 1 seconds..
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Vegard Nossum [Thu, 5 Sep 2013 11:00:14 +0000 (13:00 +0200)]
xen/blkback: fix reference counting
If the permission check fails, we drop a reference to the blkif without
having taken it in the first place. The bug was introduced in commit
604c499cbbcc3d5fe5fb8d53306aa0fae1990109 (xen/blkback: Check device
permissions before allowing OP_DISCARD).
Cc: stable@vger.kernel.org
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Roger Pau Monne [Mon, 12 Aug 2013 10:53:43 +0000 (12:53 +0200)]
xen-blkfront: improve aproximation of required grants per request
Improve the calculation of required grants to process a request by
using nr_phys_segments instead of always assuming a request is going
to use all possible segments.
nr_phys_segments contains the number of scatter-gather DMA addr+len
pairs, which is basically what we put at every granted page.
for_each_sg iterates over the DMA addr+len pairs and uses a grant
page for each of them.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Roger Pau Monne [Mon, 12 Aug 2013 10:53:44 +0000 (12:53 +0200)]
xen-blkfront: revoke foreign access for grants not mapped by the backend
There's no need to keep the foreign access in a grant if it is not
persistently mapped by the backend. This allows us to free grants that
are not mapped by the backend, thus preventing blkfront from hoarding
all grants.
The main effect of this is that blkfront will only persistently map
the same grants as the backend, and it will always try to use grants
that are already mapped by the backend. Also the number of persistent
grants in blkfront is the same as in blkback (and is controlled by the
value in blkback).
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Matt Wilson <msw@amazon.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Michael Opdenacker [Sat, 12 Oct 2013 04:33:47 +0000 (06:33 +0200)]
mg_disk: remove deprecated IRQF_DISABLED
This patch proposes to remove the use of the IRQF_DISABLED flag.
It has been a no-op since 2.6.35 and it will be removed one day.
Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Duan Jiong [Wed, 6 Nov 2013 07:56:39 +0000 (15:56 +0800)]
block: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO
This patch fixes coccinelle error regarding usage of IS_ERR and
PTR_ERR instead of PTR_ERR_OR_ZERO.
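The transformation can be sketched in userspace C. The lowercase helpers below are simplified re-implementations written for this example, not the kernel's err.h; they assume error codes are encoded in the top 4095 pointer values.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified stand-ins for ERR_PTR(), IS_ERR() and PTR_ERR(). */
static void *err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static int is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

static long ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/* The open-coded pattern that the coccinelle check flags. */
static int status_open_coded(const void *p)
{
	if (is_err(p))
		return (int)ptr_err(p);
	return 0;
}

/* The single helper that replaces it at each call site. */
static int ptr_err_or_zero(const void *p)
{
	return is_err(p) ? (int)ptr_err(p) : 0;
}
```

Both functions return the same value for every pointer; the helper simply keeps the idiom in one place.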
Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Duan Jiong [Wed, 6 Nov 2013 07:55:44 +0000 (15:55 +0800)]
block: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO
This patch fixes coccinelle error regarding usage of IS_ERR and
PTR_ERR instead of PTR_ERR_OR_ZERO.
Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Geert Uytterhoeven [Mon, 4 Nov 2013 13:00:06 +0000 (14:00 +0100)]
block: Do not call sector_div() with a 64-bit divisor
do_div() (called by sector_div() if CONFIG_LBDAF=y) is meant for divisions
of 64-bit numbers by 32-bit numbers. Passing 64-bit divisor types caused
issues in the past on 32-bit platforms, cf. commit
ea077b1b96e073eac5c3c5590529e964767fc5f7 ("m68k: Truncate base in
do_div()").
As queue_limits.max_discard_sectors and .discard_granularity are unsigned
int, max_discard_sectors and granularity should be unsigned int.
As bdev_discard_alignment() returns int, alignment should be int.
Now 2 calls to sector_div() can be replaced by 32-bit arithmetic:
- The 64-bit modulo operation can become a 32-bit modulo operation,
- The 64-bit division and multiplication can be replaced by a 32-bit
modulo operation and a subtraction.
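The second replacement relies on the identity (x / g) * g == x - (x % g) for non-zero g. A small sketch, with hypothetical function names and assuming the granularity is never zero:

```c
#include <stdint.h>

/* Old shape: 64-bit divide followed by a multiply, which is what
 * sector_div() plus a multiplication amounts to. */
static uint64_t round_down_div_mul(uint64_t sectors, uint64_t granularity)
{
	return (sectors / granularity) * granularity;
}

/* New shape: a 32-bit modulo and a subtraction give the same result
 * without any 64-bit division. */
static unsigned int round_down_mod_sub(unsigned int sectors,
				       unsigned int granularity)
{
	return sectors - sectors % granularity;
}
```

For example, 1000 sectors with a granularity of 64 round down to 960 with either form.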
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Chen Gang [Sun, 3 Nov 2013 14:23:39 +0000 (22:23 +0800)]
kernel: trace: blktrace: remove redundent memcpy() in compat_blk_trace_setup()
do_blk_trace_setup() will fully initialize 'buts.name', so the related
memcpy() can be removed. Also use BLKTRACE_BDEV_SIZE and ARRAY_SIZE
instead of the hard-coded number '32'.
Signed-off-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Kent Overstreet [Wed, 7 Aug 2013 18:14:32 +0000 (11:14 -0700)]
block: Consolidate duplicated bio_trim() implementations
Someone cut and pasted md's md_trim_bio() into xen-blkfront.c. Come on,
we should know better than this.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Neil Brown <neilb@suse.de>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Kent Overstreet [Wed, 7 Aug 2013 21:20:17 +0000 (14:20 -0700)]
block: Use rw_copy_check_uvector()
No need for silly open coding - and struct sg_iovec has exactly the same
layout as struct iovec...
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Alireza Haghdoost [Wed, 23 Oct 2013 16:08:16 +0000 (17:08 +0100)]
block: Enable sysfs nomerge control for I/O requests in the plug list
This patch enables sysfs to control I/O request merge
functionality in the plug list. While this control has been
implemented for the request queue, it was omitted for the plug list.
Therefore, the block layer merges requests (or attempts to merge them)
even if the merge capability was disabled using the sysfs nomerge parameter
value 2.
This limitation directly affects the functionality of the io_submit()
system call. The system call enables the user to submit a batch of IO
requests from user space using the struct iocb **ios input argument.
However, the unconditional merging in the plug list
may merge these requests together down the road. Therefore,
there is no way to distinguish between an application sending a batch of
sequential IOs and an application sending one big IO. Ultimately, all
requests generated by the former app are merged within the plug list
and look similar to those of the second app.
While merging is a desirable feature that improves the
performance of the IO subsystem for some applications, it is not useful
for other applications like ours.
Signed-off-by: Alireza Haghdoost <alireza@cs.umn.edu>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Coding style modified.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Mike Snitzer [Fri, 18 Oct 2013 15:44:49 +0000 (09:44 -0600)]
block: properly stack underlying max_segment_size to DM device
Without this patch all DM devices will default to BLK_MAX_SEGMENT_SIZE
(65536) even if the underlying device(s) have a larger value -- this is
due to blk_stack_limits() using min_not_zero() when stacking the
max_segment_size limit.
Example max_segment_size values:
underlying device:
1073741824
DM device before patch:
65536
DM device after patch:
1073741824
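Why min_not_zero() pins the stacked value can be seen in a small sketch. The function below is a userspace stand-in for the kernel macro, and starting the stacked limit from UINT_MAX is an assumption used here to illustrate a neutral initial value.

```c
#include <limits.h>

/* Userspace stand-in for min_not_zero(): zero means "unset", otherwise
 * the smaller of the two values wins. */
static unsigned int min_not_zero_uint(unsigned int a, unsigned int b)
{
	if (a == 0)
		return b;
	if (b == 0)
		return a;
	return a < b ? a : b;
}

/* Stack one underlying limit onto an initial top-level limit. */
static unsigned int stack_segment_size(unsigned int initial,
				       unsigned int underlying)
{
	return min_not_zero_uint(initial, underlying);
}
```

Starting from 65536, the stacked result can never exceed 65536 no matter how large the underlying value is; starting from UINT_MAX, the underlying 1073741824 is passed through unchanged.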
Reported-by: Lukasz Flis <l.flis@cyfronet.pl>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # v3.3+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Tomoki Sekiyama [Tue, 15 Oct 2013 22:42:19 +0000 (16:42 -0600)]
elevator: acquire q->sysfs_lock in elevator_change()
Add locking of q->sysfs_lock into elevator_change() (an exported function)
to ensure it is held to protect q->elevator from elevator_init(), even if
elevator_change() is called from non-sysfs paths.
The sysfs path (elv_iosched_store) uses __elevator_change(), the non-locking
version, as the lock is already taken by elv_iosched_store().
Signed-off-by: Tomoki Sekiyama <tomoki.sekiyama@hds.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>