Heinz Mauelshagen [Wed, 15 Jun 2016 16:39:17 +0000 (18:39 +0200)]
dm raid: avoid superfluous memory barriers on static metadata
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Mike Snitzer [Wed, 6 Jul 2016 13:06:37 +0000 (09:06 -0400)]
dm rq: check kthread_run return for .request_fn request-based DM
Check return value of kthread_run() in dm_old_init_request_queue().
Reported-by: Minfei Huang <mnghuan@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Sami Tolvanen [Tue, 21 Jun 2016 18:02:42 +0000 (11:02 -0700)]
dm verity fec: fix block calculation
do_div was replaced with div64_u64 at some point, causing a bug with
block calculation due to incompatible semantics of the two functions.
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Fixes:
a739ff3f543a ("dm verity: add support for forward error correction")
Cc: stable@vger.kernel.org # v4.5+
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Bart Van Assche [Tue, 28 Jun 2016 14:36:46 +0000 (16:36 +0200)]
dm ioctl: Simplify parameter buffer management code
Merge the two DM_PARAMS_[KV]MALLOC flags into a single flag.
Doing so avoids the crashes seen with previous attempts to consolidate
buffer management to use kvfree() without first flagging that memory had
actually been allocated.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Bart Van Assche [Tue, 28 Jun 2016 14:32:32 +0000 (16:32 +0200)]
dm crypt: Fix sparse complaints
Avoid that sparse complains about assigning a __le64 value to a u64
variable. Remove the (u64) casts since these are superfluous. This
patch does not change the behavior of the source code.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Arnd Bergmann [Thu, 16 Jun 2016 09:03:25 +0000 (11:03 +0200)]
dm raid: don't use 'const' in function return
A newly introduced function has 'const int' as the return type,
but as "make W=1" reports, that has no meaning:
drivers/md/dm-raid.c:510:18: error: type qualifiers ignored on function return type [-Werror=ignored-qualifiers]
This changes the return type to plain 'int'.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes:
33e53f06850f ("dm raid: introduce extended superblock and new raid types to support takeover/reshaping")
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Heinz Mauelshagen [Tue, 14 Jun 2016 19:23:13 +0000 (15:23 -0400)]
dm raid: fix failed takeover/reshapes by keeping raid set frozen
Superblock updates where bogus causing some takovers/reshapes to fail.
Introduce new runtime flag (RT_FLAG_KEEP_RS_FROZEN) to keep a raid set
frozen when a layout change was requested. Userpace will immediately
reload the table w/o the flags requesting such change once they made it
to the superblocks and any change of recovery/reshape offsets has to be
avoided until after read.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Heinz Mauelshagen [Mon, 13 Jun 2016 23:46:03 +0000 (01:46 +0200)]
dm raid: support to change bitmap region size
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Heinz Mauelshagen [Mon, 13 Jun 2016 23:46:01 +0000 (01:46 +0200)]
dm raid: update Documentation about reshaping/takeover/additonal RAID types
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Heinz Mauelshagen [Mon, 13 Jun 2016 15:55:14 +0000 (17:55 +0200)]
dm raid: add reshaping support to the target
Add bool functions rs_is_recovering and rs_is_reshaping()
to test for ongoing recovery/reshaping respectively in order
to reject respective requests on ongoing ones.
Remove ctr array size check, because ti->len and array
sectors will differ during disk addition/removal reshape.
Use __is_raid10_near() rather than type string compare.
Introduce rs_check_reshape() and rs_start_reshape(),
use the former in the ctr to reject bogus rehsape requests
and the latter in preresume to actually start a reshape.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Heinz Mauelshagen [Mon, 13 Jun 2016 15:55:13 +0000 (17:55 +0200)]
dm raid: add prerequisite functions and definitions for reshaping
Add rs_is_reshapable(), rs_data_stripes(), rs_reshape_requested(),
rs_set_dev_and_array_sectors() and rs_adjust_data_offsets()
Remove superfluous check for reshape message
Correct runtime bit definitions to be incremental
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Heinz Mauelshagen [Thu, 9 Jun 2016 14:42:16 +0000 (16:42 +0200)]
dm raid: inverse check for flags from invalid to valid flags
It is more intuitive to manage each raid level's features in terms of
what is supported rather than what isn't supported.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Mike Snitzer [Thu, 2 Jun 2016 19:27:22 +0000 (15:27 -0400)]
dm raid: various code cleanups
Renamed functions and variables with leading single underscore to have a
double underscore. Renamed some functions to have better names. Folded
functions that were split out without reason.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Mike Snitzer [Thu, 2 Jun 2016 19:08:09 +0000 (15:08 -0400)]
dm raid: rename functions that alloc and free struct raid_set
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Mike Snitzer [Thu, 2 Jun 2016 16:27:46 +0000 (12:27 -0400)]
dm raid: remove all the bitops wrappers
Removes obfuscation that is of little value.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Mike Snitzer [Thu, 2 Jun 2016 16:06:54 +0000 (12:06 -0400)]
dm raid: rename _in_range to __within_range
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Mike Snitzer [Thu, 2 Jun 2016 16:02:19 +0000 (12:02 -0400)]
dm raid: add missing "dm-raid0" module alias
Also update module description to "raid0/1/10/4/5/6 target"
Reported by Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Mike Snitzer [Thu, 2 Jun 2016 15:58:51 +0000 (11:58 -0400)]
dm raid: rename _argname_by_flag to dm_raid_arg_name_by_flag
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Mike Snitzer [Thu, 2 Jun 2016 15:48:09 +0000 (11:48 -0400)]
dm raid: bump to v1.9.0 and make the extended SB feature flag reflect it
No idea what Heinz was doing with the versioning but upstream commit
4c9971ca6a ("dm raid: make sure no feature flags are set in metadata")
bumped to 1.8.0 already.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Mike Snitzer [Tue, 31 May 2016 18:26:52 +0000 (14:26 -0400)]
dm raid: remove ti_error_* wrappers
There ti_error_* wrappers added very little. No other DM target has
ever gone to such lengths to wrap setting ti->error.
Also fixes some NULL derefences via rs->ti->error.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Mike Snitzer [Mon, 30 May 2016 17:03:37 +0000 (13:03 -0400)]
dm raid: tabify appropriate whitespace
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Heinz Mauelshagen [Thu, 19 May 2016 16:49:34 +0000 (18:49 +0200)]
dm raid: enhance status interface and fixup takeover/raid0
The target's status interface has to provide the new 'data_offset' value
to allow userspace to retrieve the kernels offset to the data on each
raid device of a raid set. This is the base for out-of-place reshaping
required to not write over any data during reshaping (e.g. change
raid6_zr -> raid6_nc):
- add rs_set_cur() to be able to start up existing array in case of no
takeover; use in ctr on takeover check
- enhance raid_status()
- add supporting functions to get resync/reshape progress and raid
device status chars
- fixup rebuild table line output race, which does miss to emit
'rebuild N' on fully synced/rebuild devices, because it is relying on
the transient 'In_sync' raid device flag
- add new status line output for 'data_offset', which'll later be used
for out-of-place reshaping
- fixup takeover not working for all levels
- fixup raid0 message interface oops caused by missing checks
for the md threads, which don't exist in case of raid0
- remove ALL_FREEZE_FLAGS not needed for takeover
- adjust comments
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Heinz Mauelshagen [Thu, 19 May 2016 16:49:33 +0000 (18:49 +0200)]
dm raid: add raid level takeover support
Add raid level takeover support allowing arbitrary takeovers between
raid levels supported by md personalities (i.e. raid0, raid1/10 and
raid4/5/6):
- add rs_config_{backup|restore} function to allow for temporary
storing ctr requested layout changes and restore them for takeover
conersion decision after the superblocks got loaded and analyzed
- add members to store layout to 'struct raid_set' (not mandatory
for takeover but needed for reshape in later patch)
- add rebuild_disks bitfield to 'struct raid_set' and set bits in ctr
to use in setting up takeover (base to address a 'rebuild' related
raid_status() table line bug and needed as well for reshape in future
patch)
- add runtime flags and respective manipulation functions to be able to
control e.g. wrting of superlocks to the preresume function on
takeover and (later) reshape
- add functions to detect takeover, check it's valid (mandatory here to
avoid failing on md_run()), setup for it and use in the ctr; those
will be likely moved out once reshaping gets added to simplify the
ctr
- start raid set readonly in ctr and switch to readwrite, optionally
updating superblocks, in preresume in order to allow suspend to
quiesce any active table before (which involves superblock updates);
this ensures the proper sequence of writing the current and any new
takeover(/reshape) metadata
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Heinz Mauelshagen [Thu, 19 May 2016 16:49:32 +0000 (18:49 +0200)]
dm raid: enhance super_sync() to support new superblock members
Add transferring the new takeover/reshape related superblock
members introduced to the super_sync() function:
- add/move supporting functions
- add failed devices bitfield transfer functions to retrieve the
bitfield from superblock format or update it in the superblock
- add code to transfer all new members
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Heinz Mauelshagen [Thu, 19 May 2016 16:49:31 +0000 (18:49 +0200)]
dm raid: add new reshaping/raid10 format table line options to parameter parser
Support the follwoing arguments in the ctr parameter parser:
- add 'delta_disks', 'data_offset' taking int and sector respectively
- 'raid10_use_near_sets' bool argument to optionally select
near sets with supporting raid10 mappings
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Heinz Mauelshagen [Thu, 19 May 2016 16:49:30 +0000 (18:49 +0200)]
dm raid: introduce extended superblock and new raid types to support takeover/reshaping
Add new members to the dm-raid superblock and new raid types to support
takeover/reshape.
Add all necessary members needed to support takeover and reshape in one
go -- aiming to limit the amount of changes to the superblock layout.
This is a larger patch due to the new superblock members, their related
flags, validation of both and involved API additions/changes:
- add additional members to keep track of:
- state about forward/backward reshaping
- reshape position
- new level, layout, stripe size and delta disks
- data offset to current and new data for out-of-place reshapes
- failed devices bitfield extensions to keep track of max raid devices
- adjust super_validate() to cope with new superblock members
- adjust super_init_validation() to cope with new superblock members
- add definitions for ctr flags supporting delta disks etc.
- add new raid types (raid6_n_6 etc.)
- add new raid10 supporting function API (_is_raid10_*())
- adjust to changed raid10 supporting function API
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Heinz Mauelshagen [Thu, 19 May 2016 16:49:29 +0000 (18:49 +0200)]
dm raid: use rt_is_raid*() in all appropriate checks
Make use if raid type rt_is_*() bool functions for simplification and
consistency reasons.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Heinz Mauelshagen [Thu, 19 May 2016 16:49:28 +0000 (18:49 +0200)]
dm raid: more use of flag testing wrappers
- add _test_flags() function
- use it to simplify rs_check_for_invalid_flags()
- use _test_flag() throughout
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Heinz Mauelshagen [Thu, 19 May 2016 16:49:27 +0000 (18:49 +0200)]
dm raid: check constructor arguments for invalid raid level/argument combinations
Reject invalid flag combinations to avoid potential data corruption or
failing raid set construction:
- add definitions for constructor flag combinations and invalid flags
per level
- add bool test functions for the various raid types
(also will be used by future reshaping enhancements)
- introduce rs_check_for_invalid_flags() and _invalid_flags()
to perform the validity checks
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Heinz Mauelshagen [Thu, 19 May 2016 16:49:26 +0000 (18:49 +0200)]
dm raid: cleanup / provide infrastructure
Provide necessary infrastructure to handle ctr flags and their names
and cleanup setting ti->error:
- comment constructor flags
- introduce constructor flag manipulation
- introduce ti_error_*() functions to simplify
setting the error message (use in other targets?)
- introduce array to hold ctr flag <-> flag name mapping
- introduce argument name by flag functions for that array
- use those functions throughout the ctr call path
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Heinz Mauelshagen [Thu, 19 May 2016 16:49:25 +0000 (18:49 +0200)]
dm raid: use dm_arg_set API in constructor
- use dm_arg_set API in ctr and its callees parse_raid_params() and dev_parms()
- introduce _in_range() function to check a value is in a [ min, max ] range;
this is to support more callers in parsing parameters etc. in the future
- correct comment on MAX_RAID_DEVICES
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Heinz Mauelshagen [Thu, 19 May 2016 16:49:24 +0000 (18:49 +0200)]
dm raid: rename variable 'ret' to 'r' to conform to other dm code
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Mike Snitzer [Wed, 25 May 2016 01:16:51 +0000 (21:16 -0400)]
dm mpath: add optional "queue_mode" feature
Allow a user to specify an optional feature 'queue_mode <mode>' where
<mode> may be "bio", "rq" or "mq" -- which corresponds to bio-based,
request_fn rq-based, and blk-mq rq-based respectively.
If the queue_mode feature isn't specified the default for the
"multipath" target is still "rq" but if dm_mod.use_blk_mq is set to Y
it'll default to mode "mq".
This new queue_mode feature introduces the ability for each multipath
device to have its own queue_mode (whereas before this feature all
multipath devices effectively had to have the same queue_mode).
This commit also goes a long way to eliminate the awkward (ab)use of
DM_TYPE_*, the associated filter_md_type() and other relatively fragile
and difficult to maintain code.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Mike Snitzer [Tue, 24 May 2016 19:48:08 +0000 (15:48 -0400)]
dm mpath: remove bio-based bloat from struct dm_mpath_io
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Mike Snitzer [Thu, 19 May 2016 20:15:14 +0000 (16:15 -0400)]
dm mpath: reinstate bio-based support
Add "multipath-bio" target that offers a bio-based multipath target as
an alternative to the request-based "multipath" target -- but in a
following commit "multipath-bio" will immediately be replaced by a new
"queue_mode" feature for the "multipath" target which will allow
bio-based mode to be selected.
When DM multipath was originally converted from bio-based to
request-based the motivation for the change was better dynamic load
balancing (by leveraging block core's request-based IO schedulers, for
merging and sorting, _before_ DM multipath would make the decision on
where to steer the IO -- based on path load and/or availability).
More background is available in this "Request-based Device-mapper
multipath and Dynamic load balancing" paper:
https://www.kernel.org/doc/ols/2007/ols2007v2-pages-235-244.pdf
But we've now come full circle where significantly faster storage
devices no longer need IOs to be made larger to drive optimal IO
performance. And even if they do there have been changes to the block
and filesystem layers that help ensure upper layers are constructing
larger IOs. In addition, SCSI's differentiated IO errors will propagate
through to bio-based IO completion hooks -- so that eliminates another
historic justiciation for request-based DM multipath. Lastly, the block
layer's immutable biovec changes have made bio cloning cheaper than it
has ever been; whereas request cloning is still relatively expensive
(both on a CPU usage and memory footprint level).
As such, bio-based DM multipath offers the promise of a more efficient
IO path for high IOPs devices that are, or will be, emerging.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Mike Snitzer [Thu, 12 May 2016 20:28:10 +0000 (16:28 -0400)]
dm: move request-based code out to dm-rq.[hc]
Add some seperation between bio-based and request-based DM core code.
'struct mapped_device' and other DM core only structures and functions
have been moved to dm-core.h and all relevant DM core .c files have been
updated to include dm-core.h rather than dm.h
DM targets should _never_ include dm-core.h!
[block core merge conflict resolution from Stephen Rothwell]
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Ming Lei [Fri, 10 Jun 2016 03:27:12 +0000 (11:27 +0800)]
block: bio: kill BIO_MAX_SIZE
No one need this macro now, so remove it. Basically
only how many bvecs in one bio matters instead
of how many bytes in this bio.
The motivation is for supporting multipage bvecs, in
which we only know what the max count of bvecs is supported
in the bio.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Jens Axboe [Thu, 9 Jun 2016 21:47:29 +0000 (15:47 -0600)]
cfq-iosched: temporarily boost queue priority for idle classes
If we're queuing REQ_PRIO IO and the task is running at an idle IO
class, then temporarily boost the priority. This prevents livelocks
due to priority inversion, when a low priority task is holding file
system resources while attempting to do IO.
An example of that is shown below. An ioniced idle task is holding
the directory mutex, while a normal priority task is trying to do
a directory lookup.
[478381.198925] ------------[ cut here ]------------
[478381.200315] INFO: task ionice:
1168369 blocked for more than 120 seconds.
[478381.201324] Not tainted 4.0.9-38_fbk5_hotfix1_2936_g85409c6 #1
[478381.202278] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[478381.203462] ionice D
ffff8803692736a8 0
1168369 1 0x00000080
[478381.203466]
ffff8803692736a8 ffff880399c21300 ffff880276adcc00 ffff880369273698
[478381.204589]
ffff880369273fd8 0000000000000000 7fffffffffffffff 0000000000000002
[478381.205752]
ffffffff8177d5e0 ffff8803692736c8 ffffffff8177cea7 0000000000000000
[478381.206874] Call Trace:
[478381.207253] [<
ffffffff8177d5e0>] ? bit_wait_io_timeout+0x80/0x80
[478381.208175] [<
ffffffff8177cea7>] schedule+0x37/0x90
[478381.208932] [<
ffffffff8177f5fc>] schedule_timeout+0x1dc/0x250
[478381.209805] [<
ffffffff81421c17>] ? __blk_run_queue+0x37/0x50
[478381.210706] [<
ffffffff810ca1c5>] ? ktime_get+0x45/0xb0
[478381.211489] [<
ffffffff8177c407>] io_schedule_timeout+0xa7/0x110
[478381.212402] [<
ffffffff810a8c2b>] ? prepare_to_wait+0x5b/0x90
[478381.213280] [<
ffffffff8177d616>] bit_wait_io+0x36/0x50
[478381.214063] [<
ffffffff8177d325>] __wait_on_bit+0x65/0x90
[478381.214961] [<
ffffffff8177d5e0>] ? bit_wait_io_timeout+0x80/0x80
[478381.215872] [<
ffffffff8177d47c>] out_of_line_wait_on_bit+0x7c/0x90
[478381.216806] [<
ffffffff810a89f0>] ? wake_atomic_t_function+0x40/0x40
[478381.217773] [<
ffffffff811f03aa>] __wait_on_buffer+0x2a/0x30
[478381.218641] [<
ffffffff8123c557>] ext4_bread+0x57/0x70
[478381.219425] [<
ffffffff8124498c>] __ext4_read_dirblock+0x3c/0x380
[478381.220467] [<
ffffffff8124665d>] ext4_dx_find_entry+0x7d/0x170
[478381.221357] [<
ffffffff8114c49e>] ? find_get_entry+0x1e/0xa0
[478381.222208] [<
ffffffff81246bd4>] ext4_find_entry+0x484/0x510
[478381.223090] [<
ffffffff812471a2>] ext4_lookup+0x52/0x160
[478381.223882] [<
ffffffff811c401d>] lookup_real+0x1d/0x60
[478381.224675] [<
ffffffff811c4698>] __lookup_hash+0x38/0x50
[478381.225697] [<
ffffffff817745bd>] lookup_slow+0x45/0xab
[478381.226941] [<
ffffffff811c690e>] link_path_walk+0x7ae/0x820
[478381.227880] [<
ffffffff811c6a42>] path_init+0xc2/0x430
[478381.228677] [<
ffffffff813e6e26>] ? security_file_alloc+0x16/0x20
[478381.229776] [<
ffffffff811c8c57>] path_openat+0x77/0x620
[478381.230767] [<
ffffffff81185c6e>] ? page_add_file_rmap+0x2e/0x70
[478381.232019] [<
ffffffff811cb253>] do_filp_open+0x43/0xa0
[478381.233016] [<
ffffffff8108c4a9>] ? creds_are_invalid+0x29/0x70
[478381.234072] [<
ffffffff811c0cb0>] do_open_execat+0x70/0x170
[478381.235039] [<
ffffffff811c1bf8>] do_execveat_common.isra.36+0x1b8/0x6e0
[478381.236051] [<
ffffffff811c214c>] do_execve+0x2c/0x30
[478381.236809] [<
ffffffff811ca392>] ? getname+0x12/0x20
[478381.237564] [<
ffffffff811c23be>] SyS_execve+0x2e/0x40
[478381.238338] [<
ffffffff81780a1d>] stub_execve+0x6d/0xa0
[478381.239126] ------------[ cut here ]------------
[478381.239915] ------------[ cut here ]------------
[478381.240606] INFO: task python2.7:
1168375 blocked for more than 120 seconds.
[478381.242673] Not tainted 4.0.9-38_fbk5_hotfix1_2936_g85409c6 #1
[478381.243653] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[478381.244902] python2.7 D
ffff88005cf8fb98 0
1168375 1168248 0x00000080
[478381.244904]
ffff88005cf8fb98 ffff88016c1f0980 ffffffff81c134c0 ffff88016c1f11a0
[478381.246023]
ffff88005cf8ffd8 ffff880466cd0cbc ffff88016c1f0980 00000000ffffffff
[478381.247138]
ffff880466cd0cc0 ffff88005cf8fbb8 ffffffff8177cea7 ffff88005cf8fcc8
[478381.248252] Call Trace:
[478381.248630] [<
ffffffff8177cea7>] schedule+0x37/0x90
[478381.249382] [<
ffffffff8177d08e>] schedule_preempt_disabled+0xe/0x10
[478381.250465] [<
ffffffff8177e892>] __mutex_lock_slowpath+0x92/0x100
[478381.251409] [<
ffffffff8177e91b>] mutex_lock+0x1b/0x2f
[478381.252199] [<
ffffffff817745ae>] lookup_slow+0x36/0xab
[478381.253023] [<
ffffffff811c690e>] link_path_walk+0x7ae/0x820
[478381.253877] [<
ffffffff811aeb41>] ? try_charge+0xc1/0x700
[478381.254690] [<
ffffffff811c6a42>] path_init+0xc2/0x430
[478381.255525] [<
ffffffff813e6e26>] ? security_file_alloc+0x16/0x20
[478381.256450] [<
ffffffff811c8c57>] path_openat+0x77/0x620
[478381.257256] [<
ffffffff8115b2fb>] ? lru_cache_add_active_or_unevictable+0x2b/0xa0
[478381.258390] [<
ffffffff8117b623>] ? handle_mm_fault+0x13f3/0x1720
[478381.259309] [<
ffffffff811cb253>] do_filp_open+0x43/0xa0
[478381.260139] [<
ffffffff811d7ae2>] ? __alloc_fd+0x42/0x120
[478381.260962] [<
ffffffff811b95ac>] do_sys_open+0x13c/0x230
[478381.261779] [<
ffffffff81011393>] ? syscall_trace_enter_phase1+0x113/0x170
[478381.262851] [<
ffffffff811b96c2>] SyS_open+0x22/0x30
[478381.263598] [<
ffffffff81780532>] system_call_fastpath+0x12/0x17
[478381.264551] ------------[ cut here ]------------
[478381.265377] ------------[ cut here ]------------
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Ming Lei [Mon, 30 May 2016 13:34:35 +0000 (21:34 +0800)]
block: drbd: avoid to use BIO_MAX_SIZE
Use BIO_MAX_PAGES instead and we will remove BIO_MAX_SIZE.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Tested-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Ming Lei [Thu, 9 Jun 2016 16:03:28 +0000 (10:03 -0600)]
block: bio: remove BIO_MAX_SECTORS
No one need this macro, so remove it. The motivation is for supporting
multipage bvecs, in which we only know what the max count of bvecs is
supported in the bio, instead of max size or max sectors.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Tested-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Ming Lei [Mon, 30 May 2016 13:34:33 +0000 (21:34 +0800)]
fs: xfs: replace BIO_MAX_SECTORS with BIO_MAX_PAGES
BIO_MAX_PAGES is used as maximum count of bvecs, so
replace BIO_MAX_SECTORS with BIO_MAX_PAGES since
BIO_MAX_SECTORS is to be removed.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Tested-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Ming Lei [Mon, 30 May 2016 13:34:32 +0000 (21:34 +0800)]
iov_iter: use bvec iterator to implement iterate_bvec()
bvec has one native/mature iterator for long time, so not
necessary to use the reinvented wheel for iterating bvecs
in lib/iov_iter.c.
Two ITER_BVEC test cases are run:
- xfstest(-g auto) on loop dio/aio, no regression found
- swap file works well under extreme stress(stress-ng --all 64 -t
800 -v), and lots of OOMs are triggerd, and the whole
system still survives
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Tested-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Ming Lei [Mon, 30 May 2016 13:34:31 +0000 (21:34 +0800)]
block: mark 1st parameter of bvec_iter_advance as const
bvec_iter_advance() only writes the parameter of iterator,
so the base address of bvec can be marked as const safely.
Without the change, we can see compiling warning in the
following patch for implementing iterate_bvec(): lib/iov_iter.c
with bvec iterator.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Tested-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Ming Lei [Mon, 30 May 2016 13:34:30 +0000 (21:34 +0800)]
block: move two bvec structure into bvec.h
This patch moves 'struct bio_vec' and 'struct bvec_iter'
into 'include/linux/bvec.h', then always include this header
into 'include/linux/blk_types.h'.
With this change, both 'struct bvec_iter' and bvec iterator
helpers don't depend on CONFIG_BLOCK any more, then we can
use bvec iterator to implement iterate_bvec(): lib/iov_iter.c.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Tested-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Ming Lei [Thu, 9 Jun 2016 16:00:58 +0000 (10:00 -0600)]
block: move bvec iterator into include/linux/bvec.h
bvec iterator helpers should be used to implement by
iterate_bvec():lib/iov_iter.c too, and move them into
one header, so that we can keep bvec iterator header
out of CONFIG_BLOCK. Then we can remove the reinventing
of wheel in iterate_bvec().
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Tested-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Omar Sandoval [Thu, 9 Jun 2016 01:22:20 +0000 (18:22 -0700)]
blk-mq: actually hook up defer list when running requests
If ->queue_rq() returns BLK_MQ_RQ_QUEUE_OK, we use continue and skip
over the rest of the loop body. However, dptr is assigned later in the
loop body, and the BLK_MQ_RQ_QUEUE_OK case is exactly the case that we'd
want it for.
NVMe isn't actually using BLK_MQ_F_DEFER_ISSUE yet, nor is any other
in-tree driver, but if the code's going to be there, it might as well
work.
Fixes:
74c450521dd8 ("blk-mq: add a 'list' parameter to ->queue_rq()")
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Christoph Hellwig [Thu, 9 Jun 2016 14:00:35 +0000 (16:00 +0200)]
block: better packing for struct request
Keep the 32-bit CPU and cmd_type flags together to avoid holes on 64-bit
architectures.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Mike Christie [Wed, 8 Jun 2016 20:49:40 +0000 (15:49 -0500)]
ext4: use bio op helprs in ext4 crypto code
This was missed from my last patchset.
This patch has ext4 crypto code use the bio op helper
to set the operation. The operation (discard, write, writesame,
etc) is now defined seperately from the other REQ bits. They
still share the bi_rw field to save space, so we use these
helpers so modules do not have to worry about setting/overwriting
info.
Jens, I am not sure how you handle patches on top of patches
in the next branches. If you merge patches that fix issues
in previous patches in next, then this patch could be part
of
commit
95fe6c1a209ef89d9f94dd04a0ad72be1487d5d5
Author: Mike Christie <mchristi@redhat.com>
Date: Sun Jun 5 14:31:48 2016 -0500
block, fs, mm, drivers: use bio set/get op accessors
Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Jan Kara [Wed, 8 Jun 2016 13:11:39 +0000 (15:11 +0200)]
cfq-iosched: Convert to use highres timers
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jan Kara <jack@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Jeff Moyer [Wed, 8 Jun 2016 13:11:38 +0000 (15:11 +0200)]
cfq-iosched: Expose microsecond interfaces
Expose interfaces to tune time slices of CFQ IO scheduler in
microseconds.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
Jeff Moyer [Wed, 8 Jun 2016 14:55:34 +0000 (08:55 -0600)]
cfq-iosched: Convert from jiffies to nanoseconds
Convert all time-keeping in CFQ IO scheduler from jiffies to nanoseconds
so that we can later make the intervals more fine-grained than jiffies.
One jiffie is several miliseconds and even for today's rotating disks
that is a noticeable amount of time and thus we leave disk unnecessarily
idle.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
Mike Christie [Sun, 5 Jun 2016 19:32:25 +0000 (14:32 -0500)]
block, drivers, fs: rename REQ_FLUSH to REQ_PREFLUSH
To avoid confusion between REQ_OP_FLUSH, which is handled by
request_fn drivers, and upper layers requesting the block layer
perform a flush sequence along with possibly a WRITE, this patch
renames REQ_FLUSH to REQ_PREFLUSH.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Mike Christie [Sun, 5 Jun 2016 19:32:24 +0000 (14:32 -0500)]
block: do not use REQ_FLUSH for tracking flush support
The last patch added a REQ_OP_FLUSH for request_fn drivers
and the next patch renames REQ_FLUSH to REQ_PREFLUSH which
will be used by file systems and make_request_fn drivers so
they can send a write/flush combo.
This patch drops xen's use of REQ_FLUSH to track if it supports
REQ_OP_FLUSH requests, so REQ_FLUSH can be deleted.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Juergen Gross <kernel@pfupf.net>
Signed-off-by: Jens Axboe <axboe@fb.com>
Mike Christie [Sun, 5 Jun 2016 19:32:23 +0000 (14:32 -0500)]
block, drivers: add REQ_OP_FLUSH operation
This adds a REQ_OP_FLUSH operation that is sent to request_fn
based drivers by the block layer's flush code, instead of
sending requests with the request->cmd_flags REQ_FLUSH bit set.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Mike Christie [Sun, 5 Jun 2016 19:32:22 +0000 (14:32 -0500)]
block, fs, drivers: remove REQ_OP compat defs and related code
This patch drops the compat definition of req_op where it matches
the rq_flag_bits definitions, and drops the related old and compat
code that allowed users to set either the op or flags for the operation.
We also then store the operation in the bi_rw/cmd_flags field similar
to how we used to store the bio ioprio where it sat in the upper bits
of the field.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Mike Christie [Sun, 5 Jun 2016 19:32:21 +0000 (14:32 -0500)]
block, drivers, fs: shrink bi_rw from long to int
We don't need bi_rw to be so large on 64 bit archs, so
reduce it to unsigned int.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Mike Christie [Sun, 5 Jun 2016 19:32:20 +0000 (14:32 -0500)]
block: move bio io prio to a new field
In the next patch, we move drop the compat code and make
the op a separate value that is hidden in bi_rw. To give
the op and rq bits flags room to grow this moves prio to
its own field.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Mike Christie [Sun, 5 Jun 2016 19:32:19 +0000 (14:32 -0500)]
ide cd: do not set REQ_WRITE on requests.
The block layer will set the correct READ/WRITE operation flags/fields
when creating a request, so there is not need for drivers to set the
REQ_WRITE flag.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Mike Christie [Sun, 5 Jun 2016 19:32:18 +0000 (14:32 -0500)]
blktrace: use op accessors
Have blktrace use the req/bio op accessor to get the REQ_OP.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Mike Christie [Sun, 5 Jun 2016 19:32:17 +0000 (14:32 -0500)]
drivers: use req op accessor
The req operation REQ_OP is separated from the rq_flag_bits
definition. This converts the block layer drivers to
use req_op to get the op from the request struct.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Mike Christie [Sun, 5 Jun 2016 19:32:16 +0000 (14:32 -0500)]
block: convert is_sync helpers to use REQ_OPs.
This patch converts the is_sync helpers to use separate variables
for the operation and flags.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Mike Christie [Sun, 5 Jun 2016 19:32:15 +0000 (14:32 -0500)]
block: convert merge/insert code to check for REQ_OPs.
This patch converts the block layer merging code to use separate variables
for the operation and flags, and to check req_op for the REQ_OP.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Mike Christie [Sun, 5 Jun 2016 19:32:14 +0000 (14:32 -0500)]
blkg_rwstat: separate op from flags
The bio and request operation and flags are going to be separate
definitions, so we cannot pass them in as a bitmap. This patch
converts the blkg_rwstat code and its caller, cfq, to pass in the
values separately.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Mike Christie [Sun, 5 Jun 2016 19:32:13 +0000 (14:32 -0500)]
block: prepare elevator to use REQ_OPs.
This patch converts the elevator code to use separate variables
for the operation and flags, and to check req_op for the REQ_OP.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Mike Christie [Sun, 5 Jun 2016 19:32:12 +0000 (14:32 -0500)]
block: prepare mq request creation to use REQ_OPs
This patch modifies the blk mq request creation code to use
separate variables for the operation and flags, because in the
the next patches the struct request users will be converted like
was done for bios.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Mike Christie [Sun, 5 Jun 2016 19:32:11 +0000 (14:32 -0500)]
block: prepare request creation/destruction code to use REQ_OPs
This patch prepares *_get_request/*_put_request and freed_request,
to use separate variables for the operation and flags. In the
next patches the struct request users will be converted like
was done for bios where the op and flags are set separately.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Mike Christie [Sun, 5 Jun 2016 19:32:10 +0000 (14:32 -0500)]
block: copy bio op to request op
The bio users should now always be setting up the bio op. This patch
has the block layer copy that to the request.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Mike Christie [Sun, 5 Jun 2016 19:32:09 +0000 (14:32 -0500)]
xen: use bio op accessors
Separate the op from the rq_flag_bits and have xen
set/get the bio using bio_set_op_attrs/bio_op.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Mike Christie [Sun, 5 Jun 2016 19:32:08 +0000 (14:32 -0500)]
target: use bio op accessors
Separate the op from the rq_flag_bits and have the target layer
set/get the bio using bio_set_op_attrs/bio_op.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Mike Christie [Sun, 5 Jun 2016 19:32:07 +0000 (14:32 -0500)]
md: use bio op accessors
Separate the op from the rq_flag_bits and have md
set/get the bio using bio_set_op_attrs/bio_op.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Mike Christie [Sun, 5 Jun 2016 19:32:06 +0000 (14:32 -0500)]
drbd: use bio op accessors
Separate the op from the rq_flag_bits and have drbd
set/get the bio using bio_set_op_attrs/bio_op.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Mike Christie [Sun, 5 Jun 2016 19:32:05 +0000 (14:32 -0500)]
bcache: use bio op accessors
Separate the op from the rq_flag_bits and have bcache
set/get the bio using bio_set_op_attrs/bio_op.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Mike Christie [Sun, 5 Jun 2016 19:32:04 +0000 (14:32 -0500)]
dm: use bio op accessors
Separate the op from the rq_flag_bits and have dm
set/get the bio using bio_set_op_attrs/bio_op.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Mike Christie [Sun, 5 Jun 2016 19:32:03 +0000 (14:32 -0500)]
dm: pass dm stats data dir instead of bi_rw
It looks like dm stats cares about the data direction
(READ vs WRITE) and does not need the bio/request flags.
Commands like REQ_FLUSH, REQ_DISCARD and REQ_WRITE_SAME
are currently always set with REQ_WRITE, so the extra check for
REQ_DISCARD in dm_stats_account_io is not needed.
This patch has it use the bio and request data_dir helpers
instead of accessing the bi_rw/cmd_flags directly. This makes
the next patches that remove the operation from the cmd_flags
and bi_rw easier, because we will no longer have the REQ_WRITE
bit set for operations like discards.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Mike Christie [Sun, 5 Jun 2016 19:32:02 +0000 (14:32 -0500)]
pm: use bio op accessors
Separate the op from the rq_flag_bits and have the pm code
set/get the bio using bio_set_op_attrs/bio_op.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Mike Christie [Sun, 5 Jun 2016 19:32:01 +0000 (14:32 -0500)]
ocfs2: use bio op accessors
Separate the op from the rq_flag_bits and have ocfs2
set/get the bio using bio_set_op_attrs/bio_op.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Mike Christie [Sun, 5 Jun 2016 19:32:00 +0000 (14:32 -0500)]
nilfs: use bio op accessors
Separate the op from the rq_flag_bits and have nilfs
set/get the bio using bio_set_op_attrs/bio_op.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <axboe@fb.com>
Mike Christie [Sun, 5 Jun 2016 19:31:59 +0000 (14:31 -0500)]
mpage: use bio op accessors
Separate the op from the rq_flag_bits and have the mpage code
set/get the bio using bio_set_op_attrs/bio_op.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Mike Christie [Sun, 5 Jun 2016 19:31:58 +0000 (14:31 -0500)]
hfsplus: use bio op accessors
Separate the op from the rq_flag_bits and have gfs2
set/get the bio using bio_set_op_attrs/bio_op.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Mike Christie [Sun, 5 Jun 2016 19:31:57 +0000 (14:31 -0500)]
xfs: use bio op accessors
Separate the op from the rq_flag_bits and have xfs
set/get the bio using bio_set_op_attrs/bio_op.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Mike Christie [Sun, 5 Jun 2016 19:31:56 +0000 (14:31 -0500)]
gfs2: use bio op accessors
Separate the op from the rq_flag_bits and have gfs2
set/get the bio using bio_set_op_attrs/bio_op.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Mike Christie [Sun, 5 Jun 2016 19:31:55 +0000 (14:31 -0500)]
f2fs: use bio op accessors
Separate the op from the rq_flag_bits and have f2fs
set/get the bio using bio_set_op_attrs/bio_op.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Mike Christie [Sun, 5 Jun 2016 19:31:54 +0000 (14:31 -0500)]
btrfs: use bio fields for op and flags
The bio REQ_OP and bi_rw rq_flag_bits are now always setup, so there is
no need to pass around the rq_flag_bits bits too. btrfs users should
should access the bio insead.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Mike Christie [Sun, 5 Jun 2016 19:31:53 +0000 (14:31 -0500)]
btrfs: update __btrfs_map_block for REQ_OP transition
We no longer pass in a bitmap of rq_flag_bits bits to __btrfs_map_block.
It will always be a REQ_OP, or the btrfs specific REQ_GET_READ_MIRRORS,
so this drops the bit tests.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Mike Christie [Sun, 5 Jun 2016 19:31:52 +0000 (14:31 -0500)]
btrfs: use bio op accessors
This should be the easier cases to convert btrfs to
bio_set_op_attrs/bio_op.
They are mostly just cut and replace type of changes.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Mike Christie [Sun, 5 Jun 2016 19:31:51 +0000 (14:31 -0500)]
btrfs: have submit_one_bio users use bio op accessors
This patch has btrfs's submit_one_bio users set the bio op using
bio_set_op_attrs and get the op using bio_op.
The next patches will continue to convert btrfs,
so submit_bio_hook and merge_bio_hook
related code will be modified to take only the bio. I did
not do it in this patch to try and keep it smaller.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Mike Christie [Sun, 5 Jun 2016 19:31:50 +0000 (14:31 -0500)]
direct-io: use bio set/get op accessors
This patch has the dio code use a REQ_OP for the op and rq_flag_bits
for bi_rw flags. To set/get the op it uses the bio_set_op_attrs/bio_op
accssors.
It also begins to convert btrfs's dio_submit_t because of the dio
submit_io callout use. The next patches will completely convert
this code and the reset of the btrfs code paths.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Mike Christie [Sun, 5 Jun 2016 19:31:49 +0000 (14:31 -0500)]
block discard: use bio set op accessor
This converts the block issue discard helper and users to use
the bio_set_op_attrs accessor and only pass in the operation flags
like REQ_SEQURE.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Mike Christie [Sun, 5 Jun 2016 19:31:48 +0000 (14:31 -0500)]
block, fs, mm, drivers: use bio set/get op accessors
This patch converts the simple bi_rw use cases in the block,
drivers, mm and fs code to set/get the bio operation using
bio_set_op_attrs/bio_op
These should be simple one or two liner cases, so I just did them
in one patch. The next patches handle the more complicated
cases in a module per patch.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Mike Christie [Sun, 5 Jun 2016 19:31:47 +0000 (14:31 -0500)]
bcache: use op_is_write instead of checking for REQ_WRITE
We currently set REQ_WRITE/WRITE for all non READ IOs
like discard, flush, writesame, etc. In the next patches where we
no longer set up the op as a bitmap, we will not be able to
detect a operation direction like writesame by testing if REQ_WRITE is
set.
This has bcache use the op_is_write helper which will do the right
thing.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Mike Christie [Sun, 5 Jun 2016 19:31:46 +0000 (14:31 -0500)]
dm: use op_is_write instead of checking for REQ_WRITE
We currently set REQ_WRITE/WRITE for all non READ IOs
like discard, flush, writesame, etc. In the next patches where we
no longer set up the op as a bitmap, we will not be able to
detect a operation direction like writesame by testing if REQ_WRITE is
set.
This has dm use the op_is_write helper which will do the right
thing.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Mike Christie [Sun, 5 Jun 2016 19:31:45 +0000 (14:31 -0500)]
block, drivers, cgroup: use op_is_write helper instead of checking for REQ_WRITE
We currently set REQ_WRITE/WRITE for all non READ IOs
like discard, flush, writesame, etc. In the next patches where we
no longer set up the op as a bitmap, we will not be able to
detect a operation direction like writesame by testing if REQ_WRITE is
set.
This patch converts the drivers and cgroup to use the
op_is_write helper. This should just cover the simple
cases. I did dm, md and bcache in their own patches
because they were more involved.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Mike Christie [Sun, 5 Jun 2016 19:31:44 +0000 (14:31 -0500)]
fs: have ll_rw_block users pass in op and flags separately
This has ll_rw_block users pass in the operation and flags separately,
so ll_rw_block can setup the bio op and bi_rw flags on the bio that
is submitted.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Mike Christie [Sun, 5 Jun 2016 19:31:43 +0000 (14:31 -0500)]
fs: have submit_bh users pass in op and flags separately
This has submit_bh users pass in the operation and flags separately,
so submit_bh_wbc can setup the bio op and bi_rw flags on the bio that
is submitted.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Mike Christie [Sun, 5 Jun 2016 19:31:42 +0000 (14:31 -0500)]
block: add REQ_OP definitions and helpers
The following patches separate the operation (WRITE, READ, DISCARD,
etc) from the rq_flag_bits flags. This patch adds definitions for
request/bio operations (REQ_OPs) and adds request/bio accessors to
get/set the op.
In this patch the REQ_OPs match the REQ rq_flag_bits ones
for compat reasons while all the code is converted to use the
op accessors in the set. In the last patches the op will become a
number and the accessors and helpers in this patch will be dropped
or updated.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Mike Christie [Sun, 5 Jun 2016 19:31:41 +0000 (14:31 -0500)]
block/fs/drivers: remove rw argument from submit_bio
This has callers of submit_bio/submit_bio_wait set the bio->bi_rw
instead of passing it in. This makes that use the same as
generic_make_request and how we set the other bio fields.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Fixed up fs/ext4/crypto.c
Signed-off-by: Jens Axboe <axboe@fb.com>
Linus Torvalds [Sun, 5 Jun 2016 21:31:26 +0000 (14:31 -0700)]
Linux 4.7-rc2
Linus Torvalds [Sun, 5 Jun 2016 18:15:33 +0000 (11:15 -0700)]
Merge branch 'parisc-4.7-2' of git://git./linux/kernel/git/deller/parisc-linux
Pull parisc fixes from Helge Deller:
- Fix printk time stamps on SMP systems which got wrong due to a patch
which was added during the merge window
- Fix two bugs in the stack backtrace code: Races in module unloading
and possible invalid accesses to memory due to wrong instruction
decoding (Mikulas Patocka)
- Fix userspace crash when syscalls access invalid unaligned userspace
addresses. Those syscalls will now return EFAULT as expected.
(tagged for stable kernel series)
* 'parisc-4.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Move die_if_kernel() prototype into traps.h header
parisc: Fix pagefault crash in unaligned __get_user() call
parisc: Fix printk time during boot
parisc: Fix backtrace on PA-RISC
Linus Torvalds [Sun, 5 Jun 2016 18:02:00 +0000 (11:02 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/linux-security
Pull key handling update from James Morris:
"This alters a new keyctl function added in the current merge window to
allow for a future extension planned for the next merge window"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
KEYS: Add placeholder for KDF usage with DH
Eric W. Biederman [Thu, 2 Jun 2016 15:29:47 +0000 (10:29 -0500)]
devpts: Make each mount of devpts an independent filesystem.
The /dev/ptmx device node is changed to lookup the directory entry "pts"
in the same directory as the /dev/ptmx device node was opened in. If
there is a "pts" entry and that entry is a devpts filesystem /dev/ptmx
uses that filesystem. Otherwise the open of /dev/ptmx fails.
The DEVPTS_MULTIPLE_INSTANCES configuration option is removed, so that
userspace can now safely depend on each mount of devpts creating a new
instance of the filesystem.
Each mount of devpts is now a separate and equal filesystem.
Reserved ttys are now available to all instances of devpts where the
mounter is in the initial mount namespace.
A new vfs helper path_pts is introduced that finds a directory entry
named "pts" in the directory of the passed in path, and changes the
passed in path to point to it. The helper path_pts uses a function
path_parent_directory that was factored out of follow_dotdot.
In the implementation of devpts:
- devpts_mnt is killed as it is no longer meaningful if all mounts of
devpts are equal.
- pts_sb_from_inode is replaced by just inode->i_sb as all cached
inodes in the tty layer are now from the devpts filesystem.
- devpts_add_ref is rolled into the new function devpts_ptmx. And the
unnecessary inode hold is removed.
- devpts_del_ref is renamed devpts_release and reduced to just a
deacrivate_super.
- The newinstance mount option continues to be accepted but is now
ignored.
In devpts_fs.h definitions for when !CONFIG_UNIX98_PTYS are removed as
they are never used.
Documentation/filesystems/devices.txt is updated to describe the current
situation.
This has been verified to work properly on openwrt-15.05, centos5,
centos6, centos7, debian-6.0.2, debian-7.9, debian-8.2, ubuntu-14.04.3,
ubuntu-15.10, fedora23, magia-5, mint-17.3, opensuse-42.1,
slackware-14.1, gentoo-
20151225 (13.0?), archlinux-2015-12-01. With the
caveat that on centos6 and on slackware-14.1 that there wind up being
two instances of the devpts filesystem mounted on /dev/pts, the lower
copy does not end up getting used.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Greg KH <greg@kroah.com>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Serge Hallyn <serge.hallyn@ubuntu.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Aurelien Jarno <aurelien@aurel32.net>
Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
Cc: Jann Horn <jann@thejh.net>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: Florian Weimer <fw@deneb.enyo.de>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>