Al Viro [Thu, 9 Apr 2015 16:55:47 +0000 (12:55 -0400)]
switch generic_write_checks() to iocb and iter
... returning -E... upon error and amount of data left in iter after
(possible) truncation upon success. Note that the normal case gives
a non-zero (positive) return value, so any tests for != 0 _must_ be
updated.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Conflicts:
fs/ext4/file.c
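The return convention described above can be modelled in userland. This is only a sketch: model_write_checks() and model_write() are hypothetical stand-ins, not the kernel functions, and the single "limit" argument is an assumption that stands in for whatever size limits the real checks apply.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical model of the new convention: negative errno on error,
 * otherwise the number of bytes left after truncation to 'limit'. */
static ssize_t model_write_checks(off_t pos, size_t count, off_t limit)
{
	assert(limit >= 0);
	if (pos < 0)
		return -EINVAL;
	if (count == 0)
		return 0;		/* zero-length write: nothing to do */
	if (pos >= limit)
		return -EFBIG;
	if ((off_t)count > limit - pos)
		count = (size_t)(limit - pos);	/* truncate to the limit */
	return (ssize_t)count;
}

/* Caller written for the new convention: only <= 0 means "stop". */
static ssize_t model_write(off_t pos, size_t count, off_t limit)
{
	ssize_t left = model_write_checks(pos, count, limit);

	if (left <= 0)		/* an old-style "if (ret)" would bail out here on success */
		return left;
	/* a real ->write_iter() would now write 'left' bytes */
	return left;
}
```

A caller that still tests the result with != 0 would treat every successful, non-empty write as a failure, which is why the tests have to change together with the helper.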
Al Viro [Thu, 9 Apr 2015 15:14:45 +0000 (11:14 -0400)]
ocfs2: move generic_write_checks() before the alignment checks
Alignment checks for dio depend upon the range truncation done by
generic_write_checks(). They can be done as soon as we got ocfs2_rw_lock()
and that actually makes ocfs2_prepare_inode_for_write() simpler.
The only thing to watch out for is restoring the original count
in "unlock and redo without dio" case. Position doesn't need to be
restored, since we change it only in O_APPEND case and in that case it
will be reassigned anyway.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 9 Apr 2015 11:25:03 +0000 (07:25 -0400)]
ocfs2_file_write_iter: stop messing with ppos
it's &iocb->ki_pos; no need to obfuscate.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 12 Apr 2015 02:29:51 +0000 (22:29 -0400)]
Merge branch 'for-linus' into for-next
Al Viro [Tue, 7 Apr 2015 19:26:36 +0000 (15:26 -0400)]
udf_file_write_iter: reorder and simplify
it's easier to do generic_write_checks() first
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 7 Apr 2015 19:06:19 +0000 (15:06 -0400)]
fuse: ->direct_IO() doesn't need generic_write_checks()
already done by caller. We used to call __fuse_direct_write(), which
called generic_write_checks(); now the former got expanded, bringing
the latter to the surface. It used to be called all along and calling
it from there had been wrong all along...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 7 Apr 2015 18:48:22 +0000 (14:48 -0400)]
ext4_file_write_iter: move generic_write_checks() up
simpler that way...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 7 Apr 2015 18:25:18 +0000 (14:25 -0400)]
xfs_file_aio_write_checks: switch to iocb/iov_iter
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 4 Apr 2015 08:05:48 +0000 (04:05 -0400)]
generic_write_checks(): drop isblk argument
all remaining callers are passing 0; some just obscure that fact.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 7 Apr 2015 15:35:14 +0000 (11:35 -0400)]
blkdev_write_iter: expand generic_write_checks() call in there
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 7 Apr 2015 15:28:12 +0000 (11:28 -0400)]
lift generic_write_checks() into callers of __generic_file_write_iter()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 7 Apr 2015 14:22:53 +0000 (10:22 -0400)]
__generic_file_write_iter: keep ->ki_pos and return value consistent
A side effect worth noting: in O_APPEND case we set ->ki_pos early,
so if it turns out to be an error or a zero-length write, we'll
end up with ->ki_pos modified. Safe, since all callers never
look at the ->ki_pos after the call of __generic_file_write_iter()
returning non-positive, all the way to caller of ->write_iter() and
those discard ->ki_pos when getting that.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 7 Apr 2015 02:44:11 +0000 (22:44 -0400)]
cifs: fold cifs_iovec_write() into the only caller
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 5 Apr 2015 18:06:24 +0000 (14:06 -0400)]
ntfs: move iov_iter_truncate() closer to generic_write_checks()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 7 Apr 2015 00:50:38 +0000 (20:50 -0400)]
new_sync_write(): discard ->ki_pos unless the return value is positive
That allows ->write_iter() instances a much more convenient life wrt
iocb->ki_pos (and fixes several filesystems with borderline POSIX
violations when zero-length write succeeds and changes the current
position).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
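A userland sketch of the rule described above (copy the position back only when the return value is positive). All names here are hypothetical models, not the kernel's; model_append_write() imitates an instance that moves ->ki_pos early, with EOF assumed to be at offset 1000.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

struct model_kiocb {
	off_t ki_pos;
};

typedef ssize_t (*model_write_iter_t)(struct model_kiocb *, size_t);

/* Model of the sync wrapper: the instance may scribble on ki_pos freely;
 * the caller's position is updated only for a positive return value. */
static ssize_t model_sync_write(model_write_iter_t write_iter,
				size_t len, off_t *ppos)
{
	struct model_kiocb kiocb = { .ki_pos = *ppos };
	ssize_t ret;

	assert(write_iter != NULL);
	ret = write_iter(&kiocb, len);
	if (ret > 0)
		*ppos = kiocb.ki_pos;
	return ret;
}

/* Instance that jumps to (assumed) EOF before it knows the length. */
static ssize_t model_append_write(struct model_kiocb *iocb, size_t len)
{
	iocb->ki_pos = 1000;		/* O_APPEND-like: set position early */
	iocb->ki_pos += (off_t)len;
	return (ssize_t)len;
}
```

With this shape a zero-length write leaves the caller's position alone even though the instance has already modified its private copy.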
Omar Sandoval [Mon, 16 Mar 2015 11:33:53 +0000 (04:33 -0700)]
direct_IO: remove rw from a_ops->direct_IO()
Now that no one is using rw, remove it completely.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Omar Sandoval [Mon, 16 Mar 2015 11:33:52 +0000 (04:33 -0700)]
direct_IO: use iov_iter_rw() instead of rw everywhere
The rw parameter to direct_IO is redundant with iov_iter->type, and
treated slightly differently just about everywhere it's used: some users
do rw & WRITE, and others do rw == WRITE where they should be doing a
bitwise check. Simplify this with the new iov_iter_rw() helper, which
always returns either READ or WRITE.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
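The difference between the two styles of check mentioned above is easy to show with a toy model. The constants below are assumptions chosen for illustration (direction in the low bit, one extra flag bit for the iterator flavour); model_iter_rw() is a hypothetical stand-in for the helper, not its actual definition.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed encoding for this sketch only. */
enum {
	MODEL_READ	= 0,
	MODEL_WRITE	= 1,
	MODEL_RW_MASK	= 1,
	MODEL_ITER_KVEC	= 2,	/* some non-direction flag stored in ->type */
};

struct model_iter {
	int type;
};

/* Always yields exactly MODEL_READ or MODEL_WRITE. */
static int model_iter_rw(const struct model_iter *i)
{
	assert(i != NULL);
	return i->type & MODEL_RW_MASK;
}
```

Once other bits share the word, comparing the raw value with == WRITE silently fails, while masking first gives a value that is safe to compare either way.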
Omar Sandoval [Mon, 16 Mar 2015 11:33:51 +0000 (04:33 -0700)]
Remove rw from dax_{do_,}io()
And use iov_iter_rw() instead.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Omar Sandoval [Mon, 16 Mar 2015 11:33:50 +0000 (04:33 -0700)]
Remove rw from {,__,do_}blockdev_direct_IO()
Most filesystems call through to these at some point, so we'll start
here.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Omar Sandoval [Tue, 17 Mar 2015 21:04:02 +0000 (14:04 -0700)]
new helper: iov_iter_rw()
Get either READ or WRITE out of iter->type.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 4 Apr 2015 05:14:53 +0000 (01:14 -0400)]
->aio_read and ->aio_write removed
no remaining users
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 4 Apr 2015 04:19:32 +0000 (00:19 -0400)]
pcm: another weird API abuse
readv() and writev() should _not_ ignore all but the first ->iov_len,
among other things. Really weird abuse of those syscalls - it
expects a vector element per channel, with identical lengths (it
actually assumes them to be identical - no checking is done).
readv() and writev() are a really bad match for that. Unfortunately,
userland API is userland API and we can't do anything about them.
Converted to ->read_iter/->write_iter. Please, _please_ don't do
anything of that kind when designing new interfaces.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 4 Apr 2015 04:11:32 +0000 (00:11 -0400)]
infinibad: weird APIs switched to ->write_iter()
Things Not To Do When Writing A Driver, part 1001st:
have writev() and write() on the same file doing completely
different things. As in, "interpret very different sets of
commands".
We _can_ handle that, but it's a bloody bad idea.
Don't do that in new drivers. Ever.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 4 Apr 2015 02:10:20 +0000 (22:10 -0400)]
kill do_sync_read/do_sync_write
all remaining instances of aio_{read,write} (all 4 of them) have explicit
->read and ->write resp.; do_sync_read/do_sync_write is never called by
__vfs_read/__vfs_write anymore and no other users had been left.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 4 Apr 2015 02:06:08 +0000 (22:06 -0400)]
fuse: use iov_iter_get_pages() for non-splice path
store reference to iter instead of that to iovec
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 4 Apr 2015 01:53:39 +0000 (21:53 -0400)]
fuse: switch to ->read_iter/->write_iter
we just change the calling conventions here; more work to follow.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 3 Apr 2015 19:57:04 +0000 (15:57 -0400)]
switch drivers/char/mem.c to ->read_iter/->write_iter
Note that _these_ guys have ->read() and ->write() left in place - they are
equivalent to what we'd get if we replaced those with NULL, but we are
talking about hot paths here.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 3 Apr 2015 19:41:18 +0000 (15:41 -0400)]
make new_sync_{read,write}() static
All places outside of core VFS that checked ->read and ->write for being NULL or
called the methods directly are gone now, so NULL {read,write} with non-NULL
{read,write}_iter will do the right thing in all cases.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 3 Apr 2015 19:23:17 +0000 (15:23 -0400)]
coredump: accept any write method
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 3 Apr 2015 19:21:59 +0000 (15:21 -0400)]
switch /dev/loop to vfs_iter_write()
all writable files that might be used as backing store for /dev/loop
already support ->write_iter()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 3 Apr 2015 19:14:42 +0000 (15:14 -0400)]
serial2002: switch to __vfs_read/__vfs_write
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 3 Apr 2015 19:09:38 +0000 (15:09 -0400)]
ashmem: use __vfs_read()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 3 Apr 2015 19:09:18 +0000 (15:09 -0400)]
export __vfs_read()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 3 Apr 2015 19:07:48 +0000 (15:07 -0400)]
autofs: switch to __vfs_write()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 3 Apr 2015 19:06:43 +0000 (15:06 -0400)]
new helper: __vfs_write()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 12 Apr 2015 02:28:58 +0000 (22:28 -0400)]
Merge branch '9p-iov_iter' into for-next
Al Viro [Fri, 3 Apr 2015 15:31:35 +0000 (11:31 -0400)]
switch hugetlbfs to ->read_iter()
... and fix the case when the area we are asked to read crosses
a hugepage boundary
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 3 Apr 2015 14:58:11 +0000 (10:58 -0400)]
coda: switch to ->read_iter/->write_iter
... and request the same from the local cache - all filesystems with
anything usable for that support those already.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 3 Apr 2015 03:30:18 +0000 (23:30 -0400)]
ncpfs: switch to ->read_iter/->write_iter
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 3 Apr 2015 03:11:36 +0000 (23:11 -0400)]
net/9p: remove (now-)unused helpers
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 3 Apr 2015 01:47:49 +0000 (21:47 -0400)]
p9_client_attach(): set fid->uid correctly
it's almost always equal to current_fsuid(), but there's an exception -
if the first writeback fid is opened by non-root *and* that happens before
root has done any lookups in /, we end up doing attach for root. The
current code leaves the resulting FID owned by root from the server POV
and by non-root from the client one. Unfortunately, it means that e.g.
massive dcache eviction will leave that user buggered - they'll end
up redoing walks from / *and* picking that FID every time. As soon as
they try to create something, the things will get nasty.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 2 Apr 2015 16:02:03 +0000 (12:02 -0400)]
9p: we are leaking glock.client_id in v9fs_file_getlock()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 2 Apr 2015 03:59:57 +0000 (23:59 -0400)]
9p: switch to ->read_iter/->write_iter
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 2 Apr 2015 03:49:24 +0000 (23:49 -0400)]
9p: get rid of v9fs_direct_file_read()
do it in ->direct_IO()...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 2 Apr 2015 03:42:28 +0000 (23:42 -0400)]
9p: switch p9_client_read() to passing struct iov_iter *
... and make it loop
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 2 Apr 2015 02:32:23 +0000 (22:32 -0400)]
9p: get rid of v9fs_direct_file_write()
just handle it in ->direct_IO()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 2 Apr 2015 02:04:46 +0000 (22:04 -0400)]
9p: fold v9fs_file_write_internal() into the caller
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 2 Apr 2015 01:54:42 +0000 (21:54 -0400)]
9p: switch ->writepage() to direct use of p9_client_write()
Don't mess with kmap() - just use ITER_BVEC.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 2 Apr 2015 00:17:51 +0000 (20:17 -0400)]
9p: switch p9_client_write() to passing it struct iov_iter *
... and make it loop until it's done
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Wed, 1 Apr 2015 23:57:53 +0000 (19:57 -0400)]
net/9p: switch the guts of p9_client_{read,write}() to iov_iter
... and have get_user_pages_fast() mapping fewer pages than requested
to generate a short read/write.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 31 Mar 2015 16:35:13 +0000 (12:35 -0400)]
nommu: use __vfs_read()
... instead of open-coding the call of ->read()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 31 Mar 2015 16:30:48 +0000 (12:30 -0400)]
acct: check FMODE_CAN_WRITE
it's not calling ->write() directly anymore.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 31 Mar 2015 15:54:59 +0000 (11:54 -0400)]
aio_run_iocb(): kill dead check
We check if ->ki_pos is positive. However, by that point we have
already done rw_verify_area(), which would have rejected such
unless the file had been one of /dev/mem, /dev/kmem and /proc/kcore.
All of which do not have vectored rw methods, so we would've bailed
out even earlier.
This check had been introduced before rw_verify_area() had been added there
- in fact, it was a subset of checks done on sync paths by rw_verify_area()
(back then the /dev/mem exception didn't exist at all). The rest of checks
(mandatory locking, etc.) hadn't been added until later. Unfortunately,
by the time the call of rw_verify_area() got added, the /dev/mem exception
had already appeared, so it wasn't obvious that the older explicit check
downstream had become dead code. It *is* dead code, though, since the few
files for which the exception applies do not have ->aio_{read,write}() or
->{read,write}_iter() and for them we won't reach that check anyway.
What's more, even if we ever introduce vectored methods for /dev/mem
and friends, they'll have to cope with negative positions anyway, since
readv(2) and writev(2) are using the same checks as read(2) and write(2) -
i.e. rw_verify_area().
Let's bury it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 31 Mar 2015 15:43:52 +0000 (11:43 -0400)]
ioctx_alloc(): remove pointless check
Way, way back kiocb used to be picked from arrays, so ioctx_alloc()
checked for multiplication overflow when calculating the size of
such array. By the time fs/aio.c went into the tree (in 2002) they
were already allocated one-by-one by kmem_cache_alloc(), so that
check had already become pointless. Let's bury it...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 31 Mar 2015 03:39:16 +0000 (23:39 -0400)]
lustre: kill unused members of struct vvp_thread_info
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 31 Mar 2015 02:15:58 +0000 (22:15 -0400)]
expand __fuse_direct_write() in both callers
it's actually shorter that way *and* later we'll want iocb in scope
of generic_write_checks() caller.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 31 Mar 2015 02:08:36 +0000 (22:08 -0400)]
fuse: switch fuse_direct_io_file_operations to ->{read,write}_iter()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 21 Mar 2015 13:01:45 +0000 (09:01 -0400)]
cuse: switch to iov_iter
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 12 Apr 2015 02:27:19 +0000 (22:27 -0400)]
Merge branch 'for-davem' into for-next
Al Viro [Sun, 22 Mar 2015 00:25:30 +0000 (20:25 -0400)]
sg_start_req(): use import_iovec()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 22 Mar 2015 00:08:18 +0000 (20:08 -0400)]
sg_start_req(): make sure that there's not too many elements in iovec
unfortunately, allowing an arbitrary 16bit value means a possibility of
overflow in the calculation of total number of pages in bio_map_user_iov() -
we rely on there being no more than PAGE_SIZE members of sum in the
first loop there. If that sum wraps around, we end up allocating
too small array of pointers to pages and it's easy to overflow it in
the second loop.
X-Coverup: TINC (and there's no lumber cartel either)
Cc: stable@vger.kernel.org # way, way back
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
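The wraparound described above can be reproduced with plain arithmetic. This is a model under stated assumptions, not the code in bio_map_user_iov(): 4 KiB pages, 32-bit element lengths, and a 32-bit unsigned accumulator (unsigned so that the wrap is well defined in C). The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12u	/* assumed 4 KiB pages */
#define MODEL_PAGE_SIZE  (1u << MODEL_PAGE_SHIFT)

/* Number of pages spanned by [addr, addr + len). */
static uint32_t model_span_pages(uint64_t addr, uint64_t len)
{
	uint64_t first = addr >> MODEL_PAGE_SHIFT;
	uint64_t last = (addr + len + MODEL_PAGE_SIZE - 1) >> MODEL_PAGE_SHIFT;

	assert(last >= first);
	return (uint32_t)(last - first);
}

/* Sum the page counts of nr_elems identical elements in 32 bits. */
static uint32_t model_total_pages(uint32_t nr_elems, uint32_t len_each)
{
	uint32_t sum = 0;

	for (uint32_t i = 0; i < nr_elems; i++)
		sum += model_span_pages(0, len_each);	/* wraps mod 2^32 */
	return sum;
}
```

Under these assumptions each maximal element contributes 2^20 pages, so 1024 elements sum to 2^30 but 4096 of them wrap the total to zero; an array sized from that total would be far too small for the second loop.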
Al Viro [Sun, 22 Mar 2015 00:06:04 +0000 (20:06 -0400)]
blk_rq_map_user(): use import_single_range()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 22 Mar 2015 00:02:55 +0000 (20:02 -0400)]
sg_io(): use import_iovec()
... and don't skip access_ok() validation.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 21 Mar 2015 18:47:11 +0000 (14:47 -0400)]
process_vm_access: switch to {compat_,}import_iovec()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 17 Mar 2015 13:59:38 +0000 (09:59 -0400)]
switch keyctl_instantiate_key_common() to iov_iter
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 21 Mar 2015 23:40:11 +0000 (19:40 -0400)]
switch {compat_,}do_readv_writev() to {compat_,}import_iovec()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 21 Mar 2015 23:34:53 +0000 (19:34 -0400)]
aio_setup_vectored_rw(): switch to {compat_,}import_iovec()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 21 Mar 2015 23:17:55 +0000 (19:17 -0400)]
vmsplice_to_user(): switch to import_iovec()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 21 Mar 2015 23:11:55 +0000 (19:11 -0400)]
kill aio_setup_single_vector()
identical to import_single_range()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 12 Apr 2015 02:26:51 +0000 (22:26 -0400)]
Merge branch 'iov_iter' into for-next
Al Viro [Sat, 21 Mar 2015 00:40:18 +0000 (20:40 -0400)]
aio: simplify arguments of aio_setup_..._rw()
We don't need req in either of those. We don't need nr_segs in caller.
We don't really need len in caller either - iov_iter_count(&iter) will do.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 21 Mar 2015 00:17:32 +0000 (20:17 -0400)]
aio: lift iov_iter_init() into aio_setup_..._rw()
the only non-trivial detail is that we do it before rw_verify_area(),
so we'd better cap the length ourselves in aio_setup_single_rw()
case (for vectored case rw_copy_check_uvector() will do that for us).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 21 Mar 2015 00:10:21 +0000 (20:10 -0400)]
lift iov_iter into {compat_,}do_readv_writev()
get it closer to matching {compat_,}rw_copy_check_uvector().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 12 Apr 2015 02:24:41 +0000 (22:24 -0400)]
Merge branch 'iocb' into for-next
Andrew Elble [Mon, 23 Feb 2015 13:51:24 +0000 (08:51 -0500)]
NFS: fix BUG() crash in notify_change() with patch to chown_common()
We have observed a BUG() crash in fs/attr.c:notify_change(). The crash
occurs during an rsync into a filesystem that is exported via NFS.
1.) fs/attr.c:notify_change() modifies the caller's version of attr.
2.) 6de0ec00ba8d ("VFS: make notify_change pass ATTR_KILL_S*ID to
setattr operations") introduced a BUG() restriction such that "no
function will ever call notify_change() with both ATTR_MODE and
ATTR_KILL_S*ID set". Under some circumstances though, it will have
assisted in setting the caller's version of attr to this very
combination.
3.) 27ac0ffeac80 ("locks: break delegations on any attribute
modification") introduced code to handle breaking
delegations. This can result in notify_change() being re-called. attr
_must_ be explicitly reset to avoid triggering the BUG() established
in #2.
4.) The path that triggers this is via fs/open.c:chmod_common().
The combination of attr flags set here and in the first call to
notify_change() along with a later failed break_deleg_wait()
results in notify_change() being called again via retry_deleg
without resetting attr.
Solution is to move retry_deleg in chmod_common() a bit further up to
ensure attr is completely reset.
There are other places where this seemingly could occur, such as
fs/utimes.c:utimes_common(), but the attr flags are not initially
set in such a way to trigger this.
Fixes: 27ac0ffeac80 ("locks: break delegations on any attribute modification")
Reported-by: Eric Meddaugh <etmsys@rit.edu>
Tested-by: Eric Meddaugh <etmsys@rit.edu>
Signed-off-by: Andrew Elble <aweits@rit.edu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
J. Bruce Fields [Tue, 10 Feb 2015 15:55:53 +0000 (10:55 -0500)]
dcache: return -ESTALE not -EBUSY on distributed fs race
On a distributed filesystem it's possible for lookup to discover that a
directory it just found is already cached elsewhere in the directory
hierarchy. The dcache won't let us keep the directory in both places,
so we have to move the dentry to the new location from the place we
previously had it cached.
If the parent has changed, then this requires all the same locks as we'd
need to do a cross-directory rename. But we're already in lookup
holding one parent's i_mutex, so it's too late to acquire those locks in
the right order.
The (unreliable) solution in __d_unalias is to trylock() the required
locks and return -EBUSY if it fails.
I see no particular reason for returning -EBUSY, and -ESTALE is already
the result of some other lookup races on NFS. I think -ESTALE is the
more helpful error return. It also allows us to take advantage of the
logic Jeff Layton added in c6a9428401c0 "vfs: fix renameat to retry on
ESTALE errors" and ancestors, which hopefully resolves some of these
errors before they're returned to userspace.
I can reproduce these cases using NFS with:
    ssh root@$client '
        mount -olookupcache=pos '$server':'$export' /mnt/
        mkdir /mnt/TO
        mkdir /mnt/DIR
        touch /mnt/DIR/test.txt
        while true; do
            strace -e open cat /mnt/DIR/test.txt 2>&1 | grep EBUSY
        done
    '
    ssh root@$server '
        while true; do
            mv $export/DIR $export/TO/DIR
            mv $export/TO/DIR $export/DIR
        done
    '
It also helps to add some other concurrent use of the directory on the
client (e.g., "ls /mnt/TO"). And you can replace the server-side mv's
by client-side mv's that are repeatedly killed. (If the client is
interrupted while waiting for the RENAME response then it's left with a
dentry that has to go under one parent or the other, but it doesn't yet
know which.)
Acked-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Anton Altaparmakov [Wed, 11 Mar 2015 14:43:32 +0000 (10:43 -0400)]
NTFS: Version 2.1.32 - Update file write from aio_write to write_iter.
Signed-off-by: Anton Altaparmakov <anton@tuxera.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Anton Altaparmakov [Wed, 11 Mar 2015 14:43:31 +0000 (10:43 -0400)]
VFS: Add iov_iter_fault_in_multipages_readable()
similar to iov_iter_fault_in_readable() but differs in that it is
not limited to faulting in the first iovec and instead faults in
"bytes" bytes iterating over the iovecs as necessary.
Also, instead of only faulting in the first and last page of the
range, all pages are faulted in.
This function is needed by NTFS when it does multi page file
writes.
Signed-off-by: Anton Altaparmakov <anton@tuxera.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Mon, 9 Mar 2015 03:36:51 +0000 (23:36 -0400)]
drop bogus check in file_open_root()
For one thing, LOOKUP_DIRECTORY will be dealt with in do_last().
For another, name can be an empty string, but not NULL - no callers
pass that and it would oops immediately if they did.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 8 Mar 2015 23:28:30 +0000 (19:28 -0400)]
switch security_inode_getattr() to struct path *
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 8 Mar 2015 23:24:30 +0000 (19:24 -0400)]
constify tomoyo_realpath_from_path()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Mon, 23 Feb 2015 10:46:21 +0000 (05:46 -0500)]
whack-a-mole: there's no point doing set_fs(USER_DS) in sigframe setup
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Mon, 23 Feb 2015 08:21:31 +0000 (03:21 -0500)]
whack-a-mole: no need to set_fs(USER_DS) in {start,flush}_thread()
flush_old_exec() has already done that. Back in 2011 a bunch of
instances like that had been kicked out, but that hadn't taken
care of then-out-of-tree architectures, obviously, and they served
as reinfection vector...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Mon, 23 Feb 2015 07:49:48 +0000 (02:49 -0500)]
remove incorrect comment in lookup_one_len()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Mon, 23 Feb 2015 07:44:36 +0000 (02:44 -0500)]
namei.c: fold do_path_lookup() into both callers
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Mon, 23 Feb 2015 01:07:13 +0000 (20:07 -0500)]
kill struct filename.separate
just make const char iname[] the last member and compare name->name with
name->iname instead of checking name->separate
We need to make sure that out-of-line name doesn't end up allocated adjacent
to struct filename referring to it; fortunately, it's easy to achieve - just
allocate that struct filename with one byte in ->iname[], so that ->iname[0]
will be inside the same object and thus have an address different from that
of out-of-line name [spotted by Boqun Feng <boqun.feng@gmail.com>]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
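The layout trick described above can be tried out in userland. What follows is a sketch with hypothetical names (struct model_filename, MODEL_EMBEDDED_MAX and the helpers are all invented for illustration); the kernel's allocation paths differ, only the pointer comparison idea is taken from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_EMBEDDED_MAX 32	/* assumed size of the embedded buffer */

struct model_filename {
	const char *name;	/* points at iname[] or at a separate buffer */
	const char iname[];	/* must be the last member */
};

static bool model_name_is_separate(const struct model_filename *fn)
{
	assert(fn != NULL);
	return fn->name != fn->iname;	/* replaces a dedicated flag */
}

static struct model_filename *model_getname(const char *s)
{
	size_t len = strlen(s) + 1;
	struct model_filename *fn;

	if (len <= MODEL_EMBEDDED_MAX) {
		fn = malloc(offsetof(struct model_filename, iname) +
			    MODEL_EMBEDDED_MAX);
		if (!fn)
			return NULL;
		memcpy((char *)fn->iname, s, len);
		fn->name = fn->iname;
	} else {
		char *out = malloc(len);

		if (!out)
			return NULL;
		memcpy(out, s, len);
		/* one byte of iname[], so that &iname[0] lies inside this
		 * object and can never equal the out-of-line buffer */
		fn = malloc(offsetof(struct model_filename, iname) + 1);
		if (!fn) {
			free(out);
			return NULL;
		}
		fn->name = out;
	}
	return fn;
}

static void model_putname(struct model_filename *fn)
{
	if (!fn)
		return;
	if (model_name_is_separate(fn))
		free((void *)fn->name);
	free(fn);
}
```

Without that extra byte, an allocator could in principle place the out-of-line name exactly at the end of the struct, making the two addresses compare equal.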
Al Viro [Tue, 16 Dec 2014 02:39:31 +0000 (21:39 -0500)]
new helper: msg_data_left()
convert open-coded instances
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 11 Apr 2015 19:51:09 +0000 (15:51 -0400)]
Merge remote-tracking branch 'dh/afs' into for-davem
Al Viro [Thu, 11 Dec 2014 05:02:50 +0000 (00:02 -0500)]
get rid of the size argument of sock_sendmsg()
it's equal to iov_iter_count(&msg->msg_iter) in all cases
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Wed, 8 Apr 2015 21:00:32 +0000 (17:00 -0400)]
ocfs2: _really_ sync the right range
"ocfs2 syncs the wrong range" had been broken; prior to it the
code was doing the wrong thing in case of O_APPEND, all right,
but _after_ it we were syncing the wrong range in 100% cases.
*ppos, aka iocb->ki_pos is incremented prior to that point,
so we are always doing sync on the area _after_ the one we'd
written to.
Spotted by Joseph Qi <joseph.qi@huawei.com> back in January;
unfortunately, I'd missed his mail back then ;-/
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
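The off-by-a-whole-write described above comes down to simple arithmetic once the position has already been advanced. A minimal sketch, with hypothetical names and an inclusive end offset assumed for the range:

```c
#include <assert.h>
#include <sys/types.h>

struct model_range {
	off_t start;	/* first byte written */
	off_t end;	/* last byte written (inclusive) */
};

/* 'pos_after' is the position *after* the write has incremented it. */
static struct model_range model_written_range(off_t pos_after, ssize_t written)
{
	struct model_range r;

	assert(written > 0 && pos_after >= written);
	r.start = pos_after - written;
	r.end = pos_after - 1;
	return r;
}
```

Using pos_after as the start instead would describe the area just past the data, which is the symptom the commit message reports.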
Al Viro [Sat, 21 Mar 2015 23:56:16 +0000 (19:56 -0400)]
switch kernel_sendmsg() and kernel_recvmsg() to iov_iter_kvec()
For kernel_sendmsg() that eliminates the need to play with setfs();
for kernel_recvmsg() it does *not* - a couple of callers are using
it with non-NULL ->msg_control, which would be treated as userland
address on recvmsg side of things.
In all cases we are really setting a kvec-backed iov_iter, though.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 21 Mar 2015 23:29:06 +0000 (19:29 -0400)]
net: switch importing msghdr from userland to {compat_,}import_iovec()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 21 Mar 2015 23:12:32 +0000 (19:12 -0400)]
net: switch sendto() and recvfrom() to import_single_range()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 9 Apr 2015 04:02:06 +0000 (00:02 -0400)]
Merge branch 'iov_iter' into for-davem
Al Viro [Thu, 9 Apr 2015 04:00:30 +0000 (00:00 -0400)]
Merge branch 'iocb' into for-davem
trivial conflict in net/socket.c and non-trivial one in crypto -
that one had evaded aio_complete() removal.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Wed, 8 Apr 2015 19:45:02 +0000 (15:45 -0400)]
ocfs2_file_write_iter: keep return value and current position update in sync
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Wed, 8 Apr 2015 19:41:17 +0000 (15:41 -0400)]
[regression] ocfs2: do *not* increment ->ki_pos twice
generic_file_direct_write() already does that. Broken by
"ocfs2: do not fallback to buffer I/O write if appending"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
David S. Miller [Tue, 7 Apr 2015 15:47:52 +0000 (11:47 -0400)]
Merge branch 'for-upstream' of git://git./linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:
====================
pull request: bluetooth-next 2015-04-04
Here's what's probably the last bluetooth-next pull request for 4.1:
- Fixes for LE advertising data & advertising parameters
- Fix for race condition with HCI_RESET flag
- New BNEPGETSUPPFEAT ioctl, needed for certification
- New HCI request callback type to get the resulting skb
- Cleanups to use BIT() macro wherever possible
- Consolidate Broadcom device entries in the btusb HCI driver
- Check for valid flags in CMTP, HIDP & BNEP
- Disallow local privacy & OOB data combo to prevent a potential race
- Expose SMP & ECDH selftest results through debugfs
- Expose current Device ID info through debugfs
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 7 Apr 2015 01:52:19 +0000 (21:52 -0400)]
Merge git://git./linux/kernel/git/davem/net
Conflicts:
drivers/net/ethernet/mellanox/mlx4/cmd.c
net/core/fib_rules.c
net/ipv4/fib_frontend.c
The fib_rules.c and fib_frontend.c conflicts were locking adjustments
in 'net' overlapping addition and removal of code in 'net-next'.
The mlx4 conflict was a bug fix in 'net' happening in the same
place a constant was being replaced with a more suitable macro.
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 6 Apr 2015 22:39:45 +0000 (15:39 -0700)]
Linux 4.0-rc7