J. Bruce Fields [Tue, 21 Mar 2006 04:24:13 +0000 (23:24 -0500)]
LOCKD: nlmsvc_traverse_blocks return is unused
Note that we never return non-zero.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
J. Bruce Fields [Tue, 21 Mar 2006 04:24:04 +0000 (23:24 -0500)]
SUNRPC,RPCSEC_GSS: fix krb5 sequence numbers.
Use a spinlock to ensure unique sequence numbers when creating krb5 gss tokens.
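As a rough user-space illustration (not the kernel code from this patch), the idea of handing out unique sequence numbers under a lock could be sketched as below. The struct, field, and function names are hypothetical, and a C11 atomic_flag stands in for the kernel spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-context sequence counter. */
struct seq_ctx {
    atomic_flag lock;     /* minimal spinlock, an analogy for the kernel one */
    uint32_t    seq_send; /* next sequence number to hand out */
};

#define SEQ_CTX_INIT { ATOMIC_FLAG_INIT, 0 }

/* Read and increment under the lock so no two callers see the same value. */
static uint32_t seq_ctx_next(struct seq_ctx *ctx)
{
    uint32_t seq;

    assert(ctx != NULL);
    while (atomic_flag_test_and_set_explicit(&ctx->lock, memory_order_acquire))
        ; /* spin until the lock is free */
    seq = ctx->seq_send++;
    atomic_flag_clear_explicit(&ctx->lock, memory_order_release);
    return seq;
}
```

Without the lock, two callers could both read the same seq_send before either increments it.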
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
J. Bruce Fields [Tue, 21 Mar 2006 04:23:42 +0000 (23:23 -0500)]
NFSv4: Don't list system.nfs4_acl for filesystems that don't support it.
Thanks to Frank Filz for pointing out that we list system.nfs4_acl extended
attribute even on filesystems where we don't actually support nfs4_acl.
This is inconsistent with, e.g., the ext3 POSIX ACL behaviour, and seems to
annoy cp.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
J. Bruce Fields [Tue, 21 Mar 2006 04:23:11 +0000 (23:23 -0500)]
SUNRPC,RPCSEC_GSS: remove unnecessary kmalloc of a checksum
Remove unnecessary kmalloc of temporary space to hold the md5 result; it's
small enough to just put on the stack.
This code may be called to process rpc's necessary to perform writes, so
there's a potential deadlock whenever we kmalloc() here. After this a
couple kmalloc()'s still remain, to be removed soon.
This also fixes a rare double-free on error noticed by coverity.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 23:11:10 +0000 (18:11 -0500)]
SUNRPC: Ensure rpc_call_async() always calls tk_ops->rpc_release()
Currently this will not happen if we exit before rpc_new_task() was called.
Also fix up rpc_run_task() to do the same (for consistency).
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:51 +0000 (13:44 -0500)]
SUNRPC: Fix memory barriers for req->rq_received
We need to ensure that all writes to the XDR buffers are done before
req->rq_received is visible to other processors.
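The ordering requirement can be sketched in user space with C11 release/acquire atomics. This is an analogy only: the kernel uses its own barrier primitives, and the struct and function names here are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

/* Hypothetical stand-in for an RPC request with a reply buffer. */
struct fake_req {
    char       buf[32];   /* plays the role of the XDR receive buffer */
    atomic_int received;  /* plays the role of req->rq_received */
};

static void fake_req_init(struct fake_req *req)
{
    memset(req->buf, 0, sizeof(req->buf));
    atomic_init(&req->received, 0);
}

/* Writer: fill the buffer first, then publish the length with release order. */
static void fake_complete(struct fake_req *req, const char *data, int len)
{
    assert(len > 0 && len <= (int)sizeof(req->buf));
    memcpy(req->buf, data, (size_t)len);
    atomic_store_explicit(&req->received, len, memory_order_release);
}

/* Reader: the acquire load pairs with the release store above, so a
 * non-zero length guarantees the buffer contents are visible too. */
static int fake_poll(struct fake_req *req, char *out)
{
    int len = atomic_load_explicit(&req->received, memory_order_acquire);

    if (len > 0)
        memcpy(out, req->buf, (size_t)len);
    return len;
}
```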
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:51 +0000 (13:44 -0500)]
NFS: Fix a race in nfs_sync_inode()
Kudos to Neil Brown for spotting the problem:
"in nfs_sync_inode, there is effectively the sequence:
nfs_wait_on_requests
nfs_flush_inode
nfs_commit_inode
This seems a bit racy to me as if the only requests are on the
->commit list, and nfs_commit_inode is called separately after
nfs_wait_on_requests completes, and before nfs_commit_inode start
(say: by nfs_write_inode) then none of these function will return
>0, yet there will be some pending request that aren't waited for."
The solution is to search for requests to wait upon, search for dirty
requests, and search for uncommitted requests while holding the
nfsi->req_lock.
The patch also cleans up nfs_sync_inode(), getting rid of the redundant
FLUSH_WAIT flag. It turns out that we were always setting it.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:50 +0000 (13:44 -0500)]
NFS: Clean up nfs_flush_list()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:50 +0000 (13:44 -0500)]
NFS: Fix a race with PG_private and nfs_release_page()
We don't need to set PG_private for readahead pages, since they never get
unlocked while I/O is in progress. However there is a small race in
nfs_readpage_release() whereby the page may be unlocked, and have
PG_private set.
Fix is to have PG_private set only for the case of writes...
Also fix a bug in nfs_clear_page_writeback(): Don't attempt to clear the
radix_tree tag if we've already deleted the radix tree entry.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:49 +0000 (13:44 -0500)]
NFSv4: Ensure the callback daemon flushes signals
If the callback daemon is signalled, but is unable to exit because it still
has users, then we need to flush signals. If not, then svc_recv() can
never sleep, and so we hang.
If we flush signals, then we also have to be prepared to resend them when
we want the thread to exit.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:49 +0000 (13:44 -0500)]
SUNRPC: Fix a 'Busy inodes' error in rpc_pipefs
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:48 +0000 (13:44 -0500)]
NFS, NLM: Allow blocking locks to respect signals
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:48 +0000 (13:44 -0500)]
NFS: Make nfs_fhget() return appropriate error values
Currently it returns NULL, which usually gets interpreted as ENOMEM. In
fact it can mean a host of issues.
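For readers unfamiliar with the convention, here is a user-space sketch of encoding an errno in a returned pointer instead of returning NULL. The helpers mimic the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() but are re-implemented here; fake_fhget() and the error codes it picks are hypothetical, not what nfs_fhget() actually returns.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno in the (never valid) top page of the address space. */
static inline void *err_ptr(long err) { return (void *)(intptr_t)err; }
static inline long ptr_err(const void *p) { return (long)(intptr_t)p; }
static inline int is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct fake_inode { int ino; };

/* Hypothetical lookup: each failure reports its own cause. */
static struct fake_inode *fake_fhget(struct fake_inode *slot, int have_attrs)
{
    if (!have_attrs)
        return err_ptr(-ENOENT);   /* assumption: missing attributes */
    if (slot == NULL)
        return err_ptr(-ENOMEM);   /* assumption: allocation failed */
    return slot;
}
```

A caller checks is_err() and propagates ptr_err() rather than guessing ENOMEM from a NULL.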
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:48 +0000 (13:44 -0500)]
NFSv4: Fix an oops in nfs4_fill_super
The mount statistics patches introduced a call to nfs_free_iostats that is
not only redundant, but actually causes an oops.
Also fix a memory leak due to the lack of a call to nfs_free_iostats on
unmount.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:47 +0000 (13:44 -0500)]
lockd: blocks should hold a reference to the nlm_file
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:47 +0000 (13:44 -0500)]
NFSv4: SETCLIENTID_CONFIRM should handle NFS4ERR_DELAY/NFS4ERR_RESOURCE
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:46 +0000 (13:44 -0500)]
NFSv4: Send the delegation stateid for SETATTR calls
In the case where we hold a delegation stateid, use that inside
SETATTR calls.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:46 +0000 (13:44 -0500)]
NFSv4: Ensure nfs_callback_down() calls svc_destroy()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:45 +0000 (13:44 -0500)]
lockd: Fix a typo in nlmsvc_grant_release()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:45 +0000 (13:44 -0500)]
lockd: Add helper for *_RES callbacks
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:45 +0000 (13:44 -0500)]
NLM: Add nlmclnt_release_call
Add a helper function to simplify the freeing of NLM client requests.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:44 +0000 (13:44 -0500)]
NLM: Fix nlmclnt_test to not copy private part of locks
The struct file_lock does not carry a properly initialised lock,
so don't copy it as if it were.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:44 +0000 (13:44 -0500)]
NLM: Simplify client locks
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:43 +0000 (13:44 -0500)]
NFS: O_DIRECT needs to use a completion
Now that we have aio writes, it is possible for dreq->outstanding to be
zero, but for the I/O not to have completed. Convert struct nfs_direct_req
to use a completion to signal when the I/O is done.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:43 +0000 (13:44 -0500)]
NFS: Clean up nfs_get_user_pages
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:42 +0000 (13:44 -0500)]
NFS: fix compiler warnings on 64-bit platforms
Introduced by NFS aio+dio patches.
Test plan:
Compile kernel with CONFIG_NFS enabled on 64-bit hardware.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:42 +0000 (13:44 -0500)]
SUNRPC: fix compile warnings on 64-bit platforms
Introduced by NFS metrics patch.
Test plan:
Compile kernel with CONFIG_NFS enabled on a 64-bit platform.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:41 +0000 (13:44 -0500)]
NLM: nlmclnt_cancel_callback should accept NLM_LCK_DENIED errors
NLM_LCK_DENIED is a valid error return for an NLM_CANCEL call by the
client.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:41 +0000 (13:44 -0500)]
lockd: Fix Oopses due to list manipulation errors.
The patch "stop abusing file_lock_list" introduces a couple of bugs since
the locks may be copied and need to be removed from the lists when they are
destroyed.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Christoph Hellwig [Mon, 20 Mar 2006 18:44:40 +0000 (13:44 -0500)]
lockd: stop abusing file_lock_list
Currently lockd directly accesses the file_lock_list from fs/locks.c.
It does so to mark locks granted or reclaimable. This is very
suboptimal, because a) lockd needs to poke into locks.c internals, and
b) it needs to iterate over all locks in the system for marking locks
granted or reclaimable.
This patch adds lists for granted and reclaimable locks to the nlm_host
structure instead, and adds locks to those.
nlmclnt_lock:
now adds the lock to h_granted instead of setting the
NFS_LCK_GRANTED flag, still O(1)
nlmclnt_mark_reclaim:
goes away completely, replaced by a list_splice_init.
Complexity reduced from O(locks in the system) to O(1)
reclaimer:
iterates over h_reclaim now, complexity reduced from
O(locks in the system) to O(locks per nlm_host)
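The O(1) claim for the splice can be illustrated with a minimal user-space re-implementation of a circular doubly-linked list. The helpers below mirror the shape of the kernel list API but are written from scratch for this sketch; list_count() is a hypothetical helper added only for checking.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/* Move every entry of src to the front of dst with a constant number of
 * pointer updates, then leave src empty. No per-entry walk is needed. */
static void list_splice_init(struct list_head *src, struct list_head *dst)
{
    struct list_head *first, *last, *at;

    assert(src != NULL && dst != NULL);
    if (list_empty(src))
        return;
    first = src->next;
    last = src->prev;
    at = dst->next;
    first->prev = dst;
    dst->next = first;
    last->next = at;
    at->prev = last;
    list_init(src);
}

/* Hypothetical helper: count entries (O(n), used only for verification). */
static size_t list_count(const struct list_head *h)
{
    size_t n = 0;
    const struct list_head *p;

    for (p = h->next; p != h; p = p->next)
        n++;
    return n;
}
```

Moving all granted locks to the reclaim list is then a single list_splice_init() call, regardless of how many locks exist.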
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:40 +0000 (13:44 -0500)]
lockd: Make lockd use rpc_new_client() instead of rpc_create_client
When doing NLM_GRANTED requests, lockd may end up blocking if we use
rpc_create_client() due to the synchronous call to rpc_ping(). Instead, use
rpc_new_client().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:39 +0000 (13:44 -0500)]
lockd: Make nlmsvc_create_block() use nlmsvc_lookup_host()
Currently it uses nlmclnt_lookup_host(), which puts the resulting host
structure on a different list.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:39 +0000 (13:44 -0500)]
lockd: Clean up of the server-side GRANTED code
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:39 +0000 (13:44 -0500)]
lockd: Add refcounting to struct nlm_block
Otherwise, the block may disappear from underneath us when in
nlmsvc_retry_blocked.
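A minimal user-space sketch of the get/put pattern follows. It is single-threaded and uses a plain int counter for brevity, whereas kernel code would need an atomic counter; struct fake_block and its helpers are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted block. */
struct fake_block {
    int refs;
};

/* A new block starts with one reference, owned by the creator. */
static struct fake_block *block_alloc(void)
{
    struct fake_block *b = malloc(sizeof(*b));

    if (b != NULL)
        b->refs = 1;
    return b;
}

/* Take an extra reference before using the block outside the owner's scope. */
static struct fake_block *block_get(struct fake_block *b)
{
    assert(b != NULL && b->refs > 0);
    b->refs++;
    return b;
}

/* Drop a reference; free on the last put. Returns 1 if the block was freed. */
static int block_put(struct fake_block *b)
{
    assert(b != NULL && b->refs > 0);
    if (--b->refs > 0)
        return 0;
    free(b);
    return 1;
}
```

A retry loop that holds its own reference can keep using the block even if the original owner drops its reference in the meantime.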
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:38 +0000 (13:44 -0500)]
lockd: Fix server-side lock blocking code
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:38 +0000 (13:44 -0500)]
lockd: posix_test_lock() should not call locks_copy_lock()
The caller of posix_test_lock() should never need to look at the lock
private data, so do not copy that information. This also means that there
is no need to call the fl_release_private methods.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:37 +0000 (13:44 -0500)]
NFS: Uninline nfs_writedata_(alloc|free) and nfs_readdata_(alloc|free)
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:37 +0000 (13:44 -0500)]
NFS: Debugging code for nfs_direct_(read|write)_schedule()
Make sure that we're doing our list accounting correctly.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:36 +0000 (13:44 -0500)]
NFS: O_DIRECT async IO may lose context
The struct nfs_direct_req currently keeps a pointer to the file descriptor
without referencing it. This may cause problems if the parent process is
killed.
The nfs_open_context should normally have all the information that we're
currently using the filp for, and unlike fput(), is safe to release from
an rpciod process context.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:36 +0000 (13:44 -0500)]
nfs: Use UNSTABLE + COMMIT for NFS O_DIRECT writes
Currently NFS O_DIRECT writes use FILE_SYNC so that a COMMIT is not
necessary. This simplifies the internal logic, but this could be a
difficult workload for some servers.
Instead, let's send UNSTABLE writes, and after they all complete, send a
COMMIT for the dirty range. After the COMMIT returns successfully, then do
the wake_up or fire off aio_complete().
Test plan:
Async direct I/O tests against Solaris (or any server that requires
committed unstable writes). Reboot server during test.
Based on an earlier patch by Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:35 +0000 (13:44 -0500)]
NFS: Make nfs_commit_alloc() extern
We need to use nfs_commit_alloc() in fs/nfs/direct.c.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:35 +0000 (13:44 -0500)]
NFS: fix data_update accounting in NFS direct I/O path
^C against "iozone -I" is hitting the assertion in nfs_clear_inode().
Test plan:
"iozone -i0 -I -a -c" against a slow server, then control C. This should
not cause an oops.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:34 +0000 (13:44 -0500)]
NFS: Replace atomic_t variables in nfs_direct_req with a single spin lock
Three atomic_t variables cause a lot of bus locking. Because they are all
used in the same places in the code, just use a single spin lock.
Now that the atomic_t variables are gone, we can remove the request size
limitation since the code no longer depends on the limited width of atomic_t
on some platforms.
Test plan:
Compile with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled. Millions of fsx
operations, iozone, OraSim.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:34 +0000 (13:44 -0500)]
NFS: clean up comments and tab damage in direct.c
Clean up tab damage and comments. Replace "file_offset" with more commonly
used "pos".
Test plan:
Compile with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:33 +0000 (13:44 -0500)]
NFS: support EIOCBQUEUED return in direct write path
For async iocb's, the NFS direct write path now returns EIOCBQUEUED,
and calls aio_complete when all the requested writes are finished. The
synchronous part of the NFS direct write path behaves exactly as it
was before.
Shared mapped NFS files will have some coherency difficulties when
accessed concurrently with aio+dio. Will need to explore how this
is handled in the local file system case.
Test plan:
aio-stress with "-O". OraSim.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:33 +0000 (13:44 -0500)]
NFS: make iocb available everywhere in direct write path
Pass the iocb argument all the way down to the direct write request
scheduler, and make it available in nfs_direct_write_result.
Test plan:
Compile the kernel with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled.
Millions of fsx-odirect ops. OraSim.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:32 +0000 (13:44 -0500)]
NFS: remove support for multi-segment iovs in the direct write path
Eliminate the persistent use of automatic storage in all parts of the
NFS client's direct write path to pave the way for introducing support
for aio against files opened with the O_DIRECT flag.
Test plan:
Compile the kernel with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled.
Millions of fsx-odirect ops. OraSim.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:32 +0000 (13:44 -0500)]
NFS: make direct write path generate write requests concurrently
Duplicate infrastructure from direct read path that will allow write
path to generate multiple write requests concurrently. This will
enable us to add support for aio in this path.
Temporarily we will lose the ability to do UNSTABLE writes followed by
a COMMIT in the direct write path. However, all applications I am
aware of that use NFS O_DIRECT currently write in relatively small
chunks, so this should not be inconvenient in any way.
Test plan:
Millions of fsx-odirect ops. OraSim.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:31 +0000 (13:44 -0500)]
NFS: create common routine for handling direct I/O completion
Factor out the common piece of completing an NFS direct I/O request.
Test plan:
Compile kernel with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:31 +0000 (13:44 -0500)]
NFS: create common routine for allocating nfs_direct_req
Factor out a small common piece of the path that allocates nfs_direct_req
structures.
Test plan:
Compile kernel with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:31 +0000 (13:44 -0500)]
NFS: create common routine for waiting for direct I/O to complete
We're about to add asynchrony to the NFS direct write path. Begin by
abstracting out the common pieces in the read path.
The first piece is nfs_direct_read_wait, which works the same whether the
process is waiting for a read or a write.
Test plan:
Compile kernel with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:30 +0000 (13:44 -0500)]
NFS: support EIOCBQUEUED return in direct read path
For async iocb's, the NFS direct read path should return EIOCBQUEUED and
call aio_complete when all the requested reads are finished. The
synchronous part of the NFS direct read path behaves exactly as it was
before.
Test plan:
aio-stress with "-O". OraSim.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:30 +0000 (13:44 -0500)]
NFS: make iocb available everywhere in direct read path
Pass the iocb argument all the way down to the direct read request
scheduler, and make it available in nfs_direct_read_result.
Test plan:
Compile the kernel with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled.
Millions of fsx-odirect ops. OraSim.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:29 +0000 (13:44 -0500)]
NFS: remove support for multi-segment iovs in the direct read path
Eliminate the persistent use of automatic storage in all parts of the NFS
client's direct read path to pave the way for introducing support for aio
against files opened with the O_DIRECT flag.
Test plan:
Compile the kernel with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled.
Millions of fsx-odirect ops. OraSim.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:29 +0000 (13:44 -0500)]
NFS: use size_t type for holding rsize bytes in NFS O_DIRECT read path
size_t is used for holding byte counts, so use it for variables storing rsize.
Note that the write path will be updated as we add support for async
O_DIRECT writes.
Test plan:
Need to verify that existing comparisons against new size_t variables behave
correctly.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:28 +0000 (13:44 -0500)]
NFS: update comments and function definitions in fs/nfs/direct.c
Update to latest coding style standards. Remove block comments on
statically defined functions, and place function definitions all on
one line.
Test plan:
Compile kernel with CONFIG_NFS and CONFIG_NFS_DIRECTIO.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:28 +0000 (13:44 -0500)]
NFS: clean up NFS client's a_ops->direct_IO method
The NFS client's a_ops->direct_IO method, nfs_direct_IO, is required to
be present to allow NFS files to be opened with O_DIRECT, but is never
called because the NFS client shunts reads and writes to files opened
with O_DIRECT directly to its own routines.
Gut the nfs_direct_IO function. This eliminates the only part of the
NFS client's direct I/O path that requires support for multi-segment
iovs, allowing further simplification in subsequent patches.
Test plan:
Compile the kernel with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled. Millions
of fsx-odirect ops. OraSim.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:27 +0000 (13:44 -0500)]
NFS: Cleanup of NFS read code
Same callback hierarchy inversion as for the NFS write calls. This patch is
not strictly speaking needed by the O_DIRECT code, but avoids confusing
differences between the asynchronous read and write code.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:27 +0000 (13:44 -0500)]
NFS: Cleanup of NFS write code in preparation for asynchronous o_direct
This patch inverts the callback hierarchy for NFS write calls.
Instead of having the NFSv2/v3/v4-specific code set up the RPC callback
ops, we allow the original caller to do so. This allows for more
flexibility w.r.t. how to set up and tear down the nfs_write_data
structure while still allowing the NFSv3/v4 code to perform error
handling.
The greater flexibility is needed by the asynchronous O_DIRECT code, which
wants to be able to hold on to the original nfs_write_data structures after
the WRITE RPC call has completed in order to be able to replay them if the
COMMIT call determines that the server has rebooted.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
J. Bruce Fields [Mon, 20 Mar 2006 18:44:26 +0000 (13:44 -0500)]
lockd: Remove FL_LOCKD flag
Currently lockd identifies its own locks using the FL_LOCKD flag. This
doesn't scale well to multiple lock managers--if we did this in nfsv4 too,
for example, we'd be left with only one free flag bit.
Instead, we just check whether the file manager ops (fl_lmops) set on this
lock are our own.
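The ownership check can be sketched as a pointer comparison against a manager's own ops table. The structures below are simplified, hypothetical stand-ins (the real struct file_lock and its ops table have more members).

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical ops table: one instance per lock manager. */
struct lock_manager_ops {
    const char *name;
};

struct fake_lock {
    const struct lock_manager_ops *lmops; /* NULL for locks with no manager */
};

static const struct lock_manager_ops lockd_ops = { "lockd" };
static const struct lock_manager_ops other_ops = { "other" };

/* A lock belongs to this manager iff it points at this manager's ops table.
 * No per-manager flag bit is consumed. */
static int is_lockd_lock(const struct fake_lock *fl)
{
    assert(fl != NULL);
    return fl->lmops == &lockd_ops;
}
```

Each additional lock manager only needs its own ops table, so the scheme scales without using up flag bits.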
The only use for this is in nlm_traverse_locks, which uses it to find locks
that need cleaning up when freeing a host or a file.
In the long run it might be nice to do reference counting instead of
traversing all the locks like this....
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Andy Adamson [Mon, 20 Mar 2006 18:44:26 +0000 (13:44 -0500)]
locks,lockd: fix race in nlmsvc_testlock
posix_test_lock() returns a pointer to a struct file_lock which is unprotected
and can be removed while in use by the caller. Move the conflicting lock from
the return to a parameter, and copy the conflicting lock.
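The interface change can be sketched in user space as follows. The lock structure, the overlap rule, and the function name are hypothetical simplifications; the point is only that the caller receives a private copy rather than a pointer into shared state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified byte-range lock. */
struct fake_lock {
    long start, end; /* inclusive range */
    int  owner;
};

/* Returns 1 and copies the first conflicting lock into *conflock,
 * or returns 0 and leaves *conflock untouched. A conflict here means
 * an overlapping range held by a different owner. */
static int fake_test_lock(const struct fake_lock *held, size_t nheld,
                          const struct fake_lock *req,
                          struct fake_lock *conflock)
{
    size_t i;

    assert(req != NULL && conflock != NULL);
    for (i = 0; i < nheld; i++) {
        if (held[i].owner != req->owner &&
            held[i].start <= req->end && req->start <= held[i].end) {
            *conflock = held[i]; /* copy, so later changes cannot affect it */
            return 1;
        }
    }
    return 0;
}
```

Because the caller owns the copy (typically on its stack), the original lock can be removed concurrently without leaving the caller with a dangling pointer.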
In most cases the caller ends up putting the copy of the conflicting lock on
the stack. On i386, sizeof(struct file_lock) appears to be about 100 bytes.
We're assuming that's reasonable.
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Andy Adamson [Mon, 20 Mar 2006 18:44:26 +0000 (13:44 -0500)]
locks: remove unused posix_block_lock
posix_lock_file() is used to add a blocked lock to Lockd's block, so
posix_block_lock() is no longer needed.
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Andy Adamson [Mon, 20 Mar 2006 18:44:25 +0000 (13:44 -0500)]
lockd: make nlmsvc_lock use only posix_lock_file
Reorganize nlmsvc_lock() to make full use of posix_lock_file(), which does
everything nlmsvc_lock() needs - no need to call posix_test_lock(),
posix_locks_deadlock(), or posix_block_lock() separately.
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Andy Adamson [Mon, 20 Mar 2006 18:44:25 +0000 (13:44 -0500)]
lockd: simplify nlmsvc_grant_blocked
Reorganize nlmsvc_grant_blocked() to make full use of posix_lock_file(). Note
that there's no need for separate calls to posix_test_lock(),
posix_locks_deadlock(), or posix_block_lock().
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Andy Adamson [Mon, 20 Mar 2006 18:44:24 +0000 (13:44 -0500)]
lockd: clean up nlmsvc_lock
Slightly more consistent dprintk error reporting, consolidate some up()'s.
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:24 +0000 (13:44 -0500)]
NFS: directory trace messages
Reuse NFSDBG_DIRCACHE and NFSDBG_LOOKUPCACHE to provide additional
diagnostic messages that trace the operation of the NFS client's
directory behavior. A few new messages are now generated when NFSDBG_VFS
is active, as well, to trace normal VFS activity. This compromise
provides better trace debugging for those who use pre-built kernels,
without adding a lot of extra noise to the standard debug settings.
Test-plan:
Enable NFS trace debugging with flags 1, 2, or 4. You should be able to
see different types of trace messages with each flag setting.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:23 +0000 (13:44 -0500)]
SUNRPC: minor cleanup
RPC_DEBUG_DATA no longer needed in net/sunrpc/xprt.c.
Test plan:
Compile kernel with CONFIG_NFS enabled.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:23 +0000 (13:44 -0500)]
SUNRPC: eliminate rpc_call()
Clean-up: replace rpc_call() helper with direct call to rpc_call_sync.
This makes NFSv2 and NFSv3 synchronous calls more computationally
efficient, and reduces stack consumption in functions that used to
invoke rpc_call more than once.
Test plan:
Compile kernel with CONFIG_NFS enabled. Connectathon on NFS version 2,
version 3, and version 4 mount points.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:22 +0000 (13:44 -0500)]
SUNRPC: display human-readable procedure name in rpc_iostats output
Add fields to the rpc_procinfo struct that allow the display of a
human-readable name for each procedure in the rpc_iostats output.
Also fix it so that the NFSv4 stats are broken up correctly by
sub-procedure number. NFSv4 uses only two real RPC procedures:
NULL, and COMPOUND.
Test plan:
Mount with NFSv2, NFSv3, and NFSv4, and do "cat /proc/self/mountstats".
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:22 +0000 (13:44 -0500)]
NFS: add RPC I/O statistics to /proc/self/mountstats
NFS client now shows various RPC I/O metrics in /proc/self/mountstats.
Test plan:
Mount/umount while doing "cat /proc/self/mountstats", multiple iterations
of connectathon locking suite. Test with NFS version 2, 3, and 4.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:22 +0000 (13:44 -0500)]
SUNRPC: provide a mechanism for collecting stats in the RPC client
Add a simple mechanism for collecting stats in the RPC client. Stats are
tabulated during xprt_release. Note that per_cpu shenanigans are not
required here because the RPC client already serializes on the transport
write lock.
Test plan:
Compile kernel with CONFIG_NFS enabled. Basic performance regression
testing with high-speed networking and high performance server.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:17 +0000 (13:44 -0500)]
SUNRPC: introduce per-task RPC iostats
Account for various things that occur while an RPC task is executed.
Separate timers for RPC round trip and RPC execution time show how
long RPC requests wait in queue before being sent. Eventually these
will be accumulated at xprt_release time in one place where they can
be viewed from userland.
Test plan:
Compile kernel with CONFIG_NFS enabled.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:16 +0000 (13:44 -0500)]
SUNRPC: add a handful of per-xprt counters
Monitor generic transport events. Add a transport switch callout to
format transport counters for export to user-land.
Test plan:
Compile kernel with CONFIG_NFS enabled.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:15 +0000 (13:44 -0500)]
SUNRPC: track length of RPC wait queues
RPC wait queue length will eventually be exported to userland via the RPC
iostats interface.
Test plan:
Compile kernel with CONFIG_NFS enabled.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:15 +0000 (13:44 -0500)]
NFS: report how long an NFS file system has been mounted
Add a field in nfs_server to record a timestamp when a mount succeeds.
Report the number of seconds the file system has been mounted via
nfs_show_stats().
Test plan:
Mount an NFS file system, watch the mountstats reports and compare with
clock time.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:14 +0000 (13:44 -0500)]
NFS: add hooks to account for NFSERR_JUKEBOX errors
Make an inode or an nfs_server struct available in the logic that handles
JUKEBOX/DELAY type errors so the NFS client can account for them.
This patch is split out from the main nfs iostat patch to highlight minor
architectural changes required to support this statistic.
Test plan:
None.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:14 +0000 (13:44 -0500)]
NFS: add I/O performance counters
Invoke the byte and event counter macros where we want to count bytes and
events.
Clean-up: fix a possible NULL dereference in nfs_lock, and simplify
nfs_file_open.
Test-plan:
fsx and iozone on UP and SMP systems, with and without pre-emption. Watch
for memory overwrite bugs, and performance loss (significantly more CPU
required per op).
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:13 +0000 (13:44 -0500)]
NFS: introduce mechanism for tracking NFS client metrics
Add a per-superblock performance counter facility to the NFS client. This
facility mimics the counters available for block devices and for
networking. Expose these new counters via the new /proc/self/mountstats
interface.
Thanks to Andrew Morton and Trond Myklebust for their review and comments.
Test plan:
fsx and iozone on UP and SMP systems, with and without pre-emption. Watch
for memory overwrite bugs, and performance loss (significantly more CPU
required per op).
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:13 +0000 (13:44 -0500)]
NFS: clean up some mount options
Get rid of "lock" and "posix", and spell out "vers=".
Test plan:
None.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:12 +0000 (13:44 -0500)]
NFS: show retransmit settings when displaying mount options
Sometimes it's important to know the exact RPC retransmit settings the
kernel is using for an NFS mount point. Add this facility to the NFS
client's show_options method.
Test plan:
Set various retransmit settings via the mount command, and check that the
settings are reflected in /proc/mounts.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:12 +0000 (13:44 -0500)]
VFS: New /proc file /proc/self/mountstats
Create a new file under /proc/self, called mountstats, where mounted file
systems can export information (configuration options, performance counters,
and so on). Use a mechanism similar to /proc/mounts and s_ops->show_options.
This mechanism does not violate namespace security, and is safe to use while
other processes are unmounting file systems.
Thanks to Mike Waychison for his review and comments.
Test plan:
Test concurrent mount/unmount operations while cat'ing /proc/self/mountstats.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Levent Serinol [Mon, 20 Mar 2006 18:44:11 +0000 (13:44 -0500)]
SUNRPC: more verbose output for rpc auth weak error
This patch prints the server IP address when the "server requires
stronger authentication" error occurs.
Signed-off-by: Levent Serinol <lserinol@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Goldwyn Rodrigues [Mon, 20 Mar 2006 18:44:11 +0000 (13:44 -0500)]
NFS: Code comments update in NFS
read_cache_mtime is no longer used in nfs_inode. This patch removes
references to read_cache_mtime from the code comments.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Ingo Molnar [Mon, 20 Mar 2006 18:44:11 +0000 (13:44 -0500)]
NFS: sem2mutex idmap.c
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Build and boot tested.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Eric Sesterhenn [Mon, 20 Mar 2006 18:44:10 +0000 (13:44 -0500)]
NFS: kzalloc conversion in fs/nfs
This converts fs/nfs to kzalloc() usage.
Compile tested with make allyesconfig.
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
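The kzalloc conversion above replaces an allocation followed by explicit zeroing with a single zeroing allocation. A minimal user-space sketch of the same pattern follows, with malloc()/memset() and calloc() standing in for kmalloc()/memset() and kzalloc(); struct demo_ctx and both helper names are hypothetical and not taken from fs/nfs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical structure, for illustration only. */
struct demo_ctx {
    int   state;
    void *private_data;
};

/* Before: allocate, then zero by hand (kernel analogue: kmalloc() + memset()). */
struct demo_ctx *demo_ctx_alloc_old(void)
{
    struct demo_ctx *ctx = malloc(sizeof(*ctx));

    if (ctx == NULL)
        return NULL;
    memset(ctx, 0, sizeof(*ctx));
    return ctx;
}

/* After: one call that returns zeroed memory (kernel analogue: kzalloc()). */
struct demo_ctx *demo_ctx_alloc_new(void)
{
    return calloc(1, sizeof(struct demo_ctx));
}
```

Both helpers return an equivalent zeroed object; the second form simply cannot forget the memset().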
Trond Myklebust [Mon, 20 Mar 2006 18:44:10 +0000 (13:44 -0500)]
NFSv4: Kill braindead gcc warnings
nfs4_open_revalidate: 'res' may be used uninitialized
nfs4_callback_compound: ‘hdr_res.nops’ may be used uninitialized
‘op_nr’ may be used uninitialized
encode_getattr_res: ‘savep’ may be used uninitialized
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:09 +0000 (13:44 -0500)]
NFSv4: Do not call rpciod_down() before call to destroy_nfsv4_state()
The reason is that the idmapper cleanup may call flush_workqueue() on
rpciod_workqueue.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:09 +0000 (13:44 -0500)]
SUNRPC: Ensure that rpc_mkpipe returns a refcounted dentry
If not, we cannot guarantee that idmap->idmap_dentry, gss_auth->dentry and
clnt->cl_dentry are valid dentries.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:08 +0000 (13:44 -0500)]
SUNRPC: Run rpci->queue_timeout on the rpciod workqueue instead of generic
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Olaf Kirch [Mon, 20 Mar 2006 18:44:08 +0000 (13:44 -0500)]
SUNRPC: Auto-load RPC authentication kernel modules
This patch adds a request_module call to rpcauth_create which will try
to auto-load the kernel module for the requested authentication flavor.
For kernels with modular sunrpc, this reduces the admin overhead for
the user.
Signed-off-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
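The auto-load behaviour described above is a lookup with a load attempt when the flavor is not yet registered. A minimal user-space sketch of that pattern follows, assuming a small fixed-size registry; every demo_* name is hypothetical, and the loader callback only stands in for the request_module() call mentioned in the patch description.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_FLAVORS 8

/* Hypothetical per-flavor operations table entry. */
struct demo_authops {
    unsigned int flavor;
};

/* Registry of loaded flavors, indexed by flavor number. */
static const struct demo_authops *demo_registry[DEMO_MAX_FLAVORS];

/* Hypothetical loader hook; stands in for request_module(). */
typedef void (*demo_loader_fn)(unsigned int flavor);

/* Called by a "module" when it is loaded. */
int demo_register(const struct demo_authops *ops)
{
    assert(ops != NULL);
    if (ops->flavor >= DEMO_MAX_FLAVORS)
        return -1;
    demo_registry[ops->flavor] = ops;
    return 0;
}

/* Look the flavor up; if it is missing, try to load it and look again. */
const struct demo_authops *demo_auth_create(unsigned int flavor,
                                            demo_loader_fn load)
{
    if (flavor >= DEMO_MAX_FLAVORS)
        return NULL;
    if (demo_registry[flavor] == NULL && load != NULL)
        load(flavor);
    return demo_registry[flavor];
}

/* Demo loader: pretends a module for the requested flavor exists. */
static struct demo_authops demo_loaded_ops;

void demo_loader(unsigned int flavor)
{
    demo_loaded_ops.flavor = flavor;
    demo_register(&demo_loaded_ops);
}
```

The caller never needs to know whether the flavor was already present or had to be loaded, which is the reduction in admin overhead the patch description refers to.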
Trond Myklebust [Mon, 20 Mar 2006 18:44:08 +0000 (13:44 -0500)]
NFS: reduce the number of false cache invalidations.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Jesper Juhl [Mon, 20 Mar 2006 18:44:07 +0000 (13:44 -0500)]
NFS: "const static" vs "static const" in nfs4
My previous "const static" vs "static const" cleanup missed a single case;
the patch below takes care of it.
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:07 +0000 (13:44 -0500)]
NFSv4: Don't invalidate cached attributes if change attribute is unchanged
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
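The rule in the commit above can be sketched as a comparison of the newly received change attribute against the cached one, invalidating only on a difference. This is a minimal illustration under assumed names; struct demo_inode_cache and demo_update_change_attr() are hypothetical and do not reflect the actual NFSv4 client structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-inode cache state. */
struct demo_inode_cache {
    uint64_t change_attr;  /* last change attribute seen from the server */
    bool     attrs_valid;  /* cached attributes may still be used */
};

/* Returns true if the cached attributes were invalidated. */
bool demo_update_change_attr(struct demo_inode_cache *c, uint64_t new_attr)
{
    assert(c != NULL);
    if (new_attr == c->change_attr)
        return false;          /* unchanged: keep the cached attributes */
    c->change_attr = new_attr;
    c->attrs_valid = false;    /* changed: force revalidation */
    return true;
}
```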
Trond Myklebust [Mon, 20 Mar 2006 18:44:06 +0000 (13:44 -0500)]
NFS: writes should not clobber utimes() calls
Ensure that we flush out writes in the case when someone calls utimes() in
order to set the file times.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:06 +0000 (13:44 -0500)]
lockd: Don't expose the process pid to the NLM server
Instead we use the nlm_lockowner->pid.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:05 +0000 (13:44 -0500)]
NLM: nlm_alloc_call should not immediately fail on signal
Currently, nlm_alloc_call tests for a signal before it even tries to
allocate memory.
Fix it so that it tries at least once.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
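The ordering fix described above, attempt the allocation first and only then consider a pending signal, can be sketched in user space as follows. The callback types, the max_attempts bound (added only so the sketch always terminates) and the demo_* helpers are assumptions for illustration; they are not the nlm_alloc_call() interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical hooks standing in for the allocator and the signal check. */
typedef void *(*demo_alloc_fn)(size_t size);
typedef bool (*demo_signal_fn)(void);

/*
 * Try the allocation at least once. A pending signal only ends the loop
 * after an attempt has already failed.
 */
void *demo_alloc_retry(size_t size, demo_alloc_fn try_alloc,
                       demo_signal_fn signal_pending,
                       unsigned int max_attempts)
{
    unsigned int i;

    assert(try_alloc != NULL && signal_pending != NULL);
    for (i = 0; i < max_attempts; i++) {
        void *p = try_alloc(size);

        if (p != NULL)
            return p;
        if (signal_pending())
            break;
        /* Real code would sleep here before retrying. */
    }
    return NULL;
}

/* Demo stand-ins used to exercise the function. */
void *demo_fail_alloc(size_t size)
{
    (void)size;
    return NULL;
}

bool demo_always_signalled(void)
{
    return true;
}
```

With the signal check placed before the first attempt, a caller with a signal already pending would get NULL even when memory is available; in this ordering it gets the allocation.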
Trond Myklebust [Mon, 20 Mar 2006 18:44:05 +0000 (13:44 -0500)]
VFS: Fix __posix_lock_file() copy of private lock area
The struct file_lock->fl_u area must be copied using the fl_copy_lock()
operation.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Neil Brown [Mon, 20 Mar 2006 18:44:04 +0000 (13:44 -0500)]
NFS: Fix buglet in fs/nfs/write.c
I've been reading through fs/nfs/write.c trying to track down a bug
that seems to be related to pages losing a refcount and getting
freed too early (you interested in detail??) and I spotted a little
bug which the following patch should fix.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:04 +0000 (13:44 -0500)]
NFS: Avoid races between writebacks and truncation
Currently, there is no serialisation between NFS asynchronous writebacks
and truncation at the page level due to the fact that nfs_sync_inode()
cannot lock the pages that it is about to write out.
This means that it is possible to be flushing out data (and calling something
like set_page_writeback()) while the page cache is busy evicting the page.
Oops...
Use the hooks provided in try_to_release_page() to ensure that dirty pages
are always written back to storage before we evict them.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:03 +0000 (13:44 -0500)]
NFS: Fix a busy inodes issue...
The nfs_open_context may live longer than the file descriptor that spawned
it, so it needs to carry a reference to the vfsmount. If not, then
generic_shutdown_super() may end up being called before reads and writes
have been flushed out.
Make a couple of functions static while we're at it...
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
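The lifetime rule in the commit above, where a context that can outlive its file descriptor pins the mount it belongs to, can be sketched with plain reference counts. All demo_* names are hypothetical, and the counters are not thread-safe; this is a single-threaded illustration rather than the kernel's vfsmount handling.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a mount with a reference count. */
struct demo_mount {
    int refcount;
};

/* Hypothetical stand-in for an open context that pins its mount. */
struct demo_open_context {
    struct demo_mount *mnt;
    int refcount;
};

struct demo_mount *demo_mntget(struct demo_mount *m)
{
    assert(m != NULL);
    m->refcount++;
    return m;
}

void demo_mntput(struct demo_mount *m)
{
    assert(m != NULL && m->refcount > 0);
    m->refcount--;
}

/* The context takes its own mount reference when it is created. */
void demo_ctx_init(struct demo_open_context *ctx, struct demo_mount *m)
{
    ctx->mnt = demo_mntget(m);
    ctx->refcount = 1;
}

void demo_ctx_get(struct demo_open_context *ctx)
{
    ctx->refcount++;
}

/* The mount reference is dropped only when the last context user is gone. */
void demo_ctx_put(struct demo_open_context *ctx)
{
    assert(ctx->refcount > 0);
    if (--ctx->refcount == 0) {
        demo_mntput(ctx->mnt);
        ctx->mnt = NULL;
    }
}
```

An in-flight write would hold a reference via demo_ctx_get(), so closing the file descriptor alone does not release the mount.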