Trond Myklebust [Mon, 20 Mar 2006 18:44:41 +0000 (13:44 -0500)]
lockd: Fix Oopses due to list manipulation errors.
The patch "stop abusing file_lock_list introduces a couple of bugs since
the locks may be copied and need to be removed from the lists when they are
destroyed.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Christoph Hellwig [Mon, 20 Mar 2006 18:44:40 +0000 (13:44 -0500)]
lockd: stop abusing file_lock_list
Currently lockd directly access the file_lock_list from fs/locks.c.
It does so to mark locks granted or reclaimable. This is very
suboptimal, because a) lockd needs to poke into locks.c internals, and
b) it needs to iterate over all locks in the system for marking locks
granted or reclaimable.
This patch adds lists for granted and reclaimable locks to the nlm_host
structure instead, and adds locks to those.
nlmclnt_lock:
now adds the lock to h_granted instead of setting the
NFS_LCK_GRANTED, still O(1)
nlmclnt_mark_reclaim:
goes away completely, replaced by a list_splice_init.
Complexity reduced from O(locks in the system) to O(1)
reclaimer:
iterates over h_reclaim now, complexity reduced from
O(locks in the system) to O(locks per nlm_host)
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:40 +0000 (13:44 -0500)]
lockd: Make lockd use rpc_new_client() instead of rpc_create_client
When doing NLM_GRANTED requests, lockd may end up blocking if we use
rpc_create_client() due to the synchronous call to rpc_ping(). Instead, use
rpc_new_client().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:39 +0000 (13:44 -0500)]
lockd: Make nlmsvc_create_block() use nlmsvc_lookup_host()
Currently it uses nlmclnt_lookup_host(), which puts the resulting host
structure on a different list.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:39 +0000 (13:44 -0500)]
lockd: Clean up of the server-side GRANTED code
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:39 +0000 (13:44 -0500)]
lockd: Add refcounting to struct nlm_block
Otherwise, the block may disappear from underneath us when in
nlmsvc_retry_blocked.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:38 +0000 (13:44 -0500)]
lockd: Fix server-side lock blocking code
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:38 +0000 (13:44 -0500)]
lockd: posix_test_lock() should not call locks_copy_lock()
The caller of posix_test_lock() should never need to look at the lock
private data, so do not copy that information. This also means that there
is no need to call the fl_release_private methods.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:37 +0000 (13:44 -0500)]
NFS: Uninline nfs_writedata_(alloc|free) and nfs_readdata_(alloc|free)
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:37 +0000 (13:44 -0500)]
NFS: Debugging code for nfs_direct_(read|write)_schedule()
Make sure that we're doing our list accounting correctly.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:36 +0000 (13:44 -0500)]
NFS: O_DIRECT async IO may lose context
The struct nfs_direct_req currently keeps a pointer to the file descriptor
without referencing it. This may cause problems if the parent process is
killed.
The nfs_open_context should normally have all the information that we're
currently using the filp for, and unlike fput(), is safe to release from
an rpciod process context.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:36 +0000 (13:44 -0500)]
nfs: Use UNSTABLE + COMMIT for NFS O_DIRECT writes
Currently NFS O_DIRECT writes use FILE_SYNC so that a COMMIT is not
necessary. This simplifies the internal logic, but this could be a
difficult workload for some servers.
Instead, let's send UNSTABLE writes, and after they all complete, send a
COMMIT for the dirty range. After the COMMIT returns successfully, then do
the wake_up or fire off aio_complete().
Test plan:
Async direct I/O tests against Solaris (or any server that requires
committed unstable writes). Reboot server during test.
Based on an earlier patch by Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:35 +0000 (13:44 -0500)]
NFS: Make nfs_commit_alloc() extern
We need to use nfs_commit_alloc() in fs/nfs/direct.c.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:35 +0000 (13:44 -0500)]
NFS: fix data_update accounting in NFS direct I/O path
^C against "iozone -I" is hitting the assertion in nfs_clear_inode().
Test plan:
"iozone -i0 -I -a -c" against a slow server, then control C. This should
not cause an oops.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:34 +0000 (13:44 -0500)]
NFS: Replace atomic_t variables in nfs_direct_req with a single spin lock
Three atomic_t variables cause a lot of bus locking. Because they are all
used in the same places in the code, just use a single spin lock.
Now that the atomic_t variables are gone, we can remove the request size
limitation since the code no longer depends on the limited width of atomic_t
on some platforms.
Test plan:
Compile with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled. Millions of fsx
operations, iozone, OraSim.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:34 +0000 (13:44 -0500)]
NFS: clean up comments and tab damage in direct.c
Clean up tab damage and comments. Replace "file_offset" with more commonly
used "pos".
Test plan:
Compile with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:33 +0000 (13:44 -0500)]
NFS: support EIOCBQUEUED return in direct write path
For async iocb's, the NFS direct write path now returns EIOCBQUEUED,
and calls aio_complete when all the requested writes are finished. The
synchronous part of the NFS direct write path behaves exactly as it
was before.
Shared mapped NFS files will have some coherency difficulties when
accessed concurrently with aio+dio. Will need to explore how this
is handled in the local file system case.
Test plan:
aio-stress with "-O". OraSim.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:33 +0000 (13:44 -0500)]
NFS: make iocb available everywhere in direct write path
Pass the iocb argument all the way down to the direct write request
scheduler, and make it available in nfs_direct_write_result.
Test plan:
Compile the kernel with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled.
Millions of fsx-odirect ops. OraSim.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:32 +0000 (13:44 -0500)]
NFS: remove support for multi-segment iovs in the direct write path
Eliminate the persistent use of automatic storage in all parts of the
NFS client's direct write path to pave the way for introducing support
for aio against files opened with the O_DIRECT flag.
Test plan:
Compile the kernel with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled.
Millions of fsx-odirect ops. OraSim.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:32 +0000 (13:44 -0500)]
NFS: make direct write path generate write requests concurrently
Duplicate infrastructure from direct read path that will allow write
path to generate multiple write requests concurrently. This will
enable us to add support for aio in this path.
Temporarily we will lose the ability to do UNSTABLE writes followed by
a COMMIT in the direct write path. However, all applications I am
aware of that use NFS O_DIRECT currently write in relatively small
chunks, so this should not be inconvenient in any way.
Test plan:
Millions of fsx-odirect ops. OraSim.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:31 +0000 (13:44 -0500)]
NFS: create common routine for handling direct I/O completion
Factor out the common piece of completing an NFS direct I/O request.
Test plan:
Compile kernel with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:31 +0000 (13:44 -0500)]
NFS: create common routine for allocating nfs_direct_req
Factor out a small common piece of the path that allocate nfs_direct_req
structures.
Test plan:
Compile kernel with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:31 +0000 (13:44 -0500)]
NFS: create common routine for waiting for direct I/O to complete
We're about to add asynchrony to the NFS direct write path. Begin by
abstracting out the common pieces in the read path.
The first piece is nfs_direct_read_wait, which works the same whether the
process is waiting for a read or a write.
Test plan:
Compile kernel with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:30 +0000 (13:44 -0500)]
NFS: support EIOCBQUEUED return in direct read path
For async iocb's, the NFS direct read path should return EIOCBQUEUED and
call aio_complete when all the requested reads are finished. The
synchronous part of the NFS direct read path behaves exactly as it was
before.
Test plan:
aio-stress with "-O". OraSim.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:30 +0000 (13:44 -0500)]
NFS: make iocb available everywhere in direct read path
Pass the iocb argument all the way down to the direct read request
scheduler, and make it available in nfs_direct_read_result.
Test plan:
Compile the kernel with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled.
Millions of fsx-odirect ops. OraSim.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:29 +0000 (13:44 -0500)]
NFS: remove support for multi-segment iovs in the direct read path
Eliminate the persistent use of automatic storage in all parts of the NFS
client's direct read path to pave the way for introducing support for aio
against files opened with the O_DIRECT flag.
Test plan:
Compile the kernel with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled.
Millions of fsx-odirect ops. OraSim.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:29 +0000 (13:44 -0500)]
NFS: use size_t type for holding rsize bytes in NFS O_DIRECT read path
size_t is used for holding byte counts, so use it for variables storing rsize.
Note that the write path will be updated as we add support for async
O_DIRECT writes.
Test plan:
Need to verify that existing comparisons against new size_t variables behave
correctly.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:28 +0000 (13:44 -0500)]
NFS: update comments and function definitions in fs/nfs/direct.c
Update to latest coding style standards. Remove block comments on
statically defined functions, and place function definitions all on
one line.
Test plan:
Compile kernel with CONFIG_NFS and CONFIG_NFS_DIRECTIO.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:28 +0000 (13:44 -0500)]
NFS: clean up NFS client's a_ops->direct_IO method
The NFS client's a_ops->direct_IO method, nfs_direct_IO, is required to
be present to allow NFS files to be opened with O_DIRECT, but is never
called because the NFS client shunts reads and writes to files opened
with O_DIRECT directly to its own routines.
Gut the nfs_direct_IO function. This eliminates the only part of the
NFS client's direct I/O path that requires support for multi-segment
iovs, allowing further simplification in subsequent patches.
Test plan:
Compile the kernel with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled. Millions
of fsx-odirect ops. OraSim.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:27 +0000 (13:44 -0500)]
NFS: Cleanup of NFS read code
Same callback hierarchy inversion as for the NFS write calls. This patch is
not strictly speaking needed by the O_DIRECT code, but avoids confusing
differences between the asynchronous read and write code.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:27 +0000 (13:44 -0500)]
NFS: Cleanup of NFS write code in preparation for asynchronous o_direct
This patch inverts the callback hierarchy for NFS write calls.
Instead of having the NFSv2/v3/v4-specific code set up the RPC callback
ops, we allow the original caller to do so. This allows for more
flexibility w.r.t. how to set up and tear down the nfs_write_data
structure while still allowing the NFSv3/v4 code to perform error
handling.
The greater flexibility is needed by the asynchronous O_DIRECT code, which
wants to be able to hold on to the original nfs_write_data structures after
the WRITE RPC call has completed in order to be able to replay them if the
COMMIT call determines that the server has rebooted.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
J. Bruce Fields [Mon, 20 Mar 2006 18:44:26 +0000 (13:44 -0500)]
lockd: Remove FL_LOCKD flag
Currently lockd identifies its own locks using the FL_LOCKD flag. This
doesn't scale well to multiple lock managers--if we did this in nfsv4 too,
for example, we'd be left with only one free flag bit.
Instead, we just check whether the file manager ops (fl_lmops) set on this
lock are our own.
The only use for this is in nlm_traverse_locks, which uses it to find locks
that need cleaning up when freeing a host or a file.
In the long run it might be nice to do reference counting instead of
traversing all the locks like this....
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Andy Adamson [Mon, 20 Mar 2006 18:44:26 +0000 (13:44 -0500)]
locks,lockd: fix race in nlmsvc_testlock
posix_test_lock() returns a pointer to a struct file_lock which is unprotected
and can be removed while in use by the caller. Move the conflicting lock from
the return to a parameter, and copy the conflicting lock.
In most cases the caller ends up putting the copy of the conflicting lock on
the stack. On i386, sizeof(struct file_lock) appears to be about 100 bytes.
We're assuming that's reasonable.
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Andy Adamson [Mon, 20 Mar 2006 18:44:26 +0000 (13:44 -0500)]
locks: remove unused posix_block_lock
posix_lock_file() is used to add a blocked lock to Lockd's block, so
posix_block_lock() is no longer needed.
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Andy Adamson [Mon, 20 Mar 2006 18:44:25 +0000 (13:44 -0500)]
lockd: make nlmsvc_lock use only posix_lock_file
Reorganize nlmsvc_lock() to make full use of posix_lock_file(), which does
eveything nlmsvc_lock() needs - no need to call posix_test_lock(),
posix_locks_deadlock(), or posix_block_lock() separately.
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Andy Adamson [Mon, 20 Mar 2006 18:44:25 +0000 (13:44 -0500)]
lockd: simplify nlmsvc_grant_blocked
Reorganize nlmsvc_grant_blocked() to make full use of posix_lock_file(). Note
that there's no need for separate calls to posix_test_lock(),
posix_locks_deadlock(), or posix_block_lock().
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Andy Adamson [Mon, 20 Mar 2006 18:44:24 +0000 (13:44 -0500)]
lockd: clean up nlmsvc_lock
Slightly more consistent dprintk error reporting, consolidate some up()'s.
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:24 +0000 (13:44 -0500)]
NFS: directory trace messages
Reuse NFSDBG_DIRCACHE and NFSDBG_LOOKUPCACHE to provide additional
diagnostic messages that trace the operation of the NFS client's
directory behavior. A few new messages are now generated when NFSDBG_VFS
is active, as well, to trace normal VFS activity. This compromise
provides better trace debugging for those who use pre-built kernels,
without adding a lot of extra noise to the standard debug settings.
Test-plan:
Enable NFS trace debugging with flags 1, 2, or 4. You should be able to
see different types of trace messages with each flag setting.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:23 +0000 (13:44 -0500)]
SUNRPC: minor cleanup
RPC_DEBUG_DATA no longer needed in net/sunrpc/xprt.c.
Test plan:
Compile kernel with CONFIG_NFS enabled.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:23 +0000 (13:44 -0500)]
SUNRPC: eliminate rpc_call()
Clean-up: replace rpc_call() helper with direct call to rpc_call_sync.
This makes NFSv2 and NFSv3 synchronous calls more computationally
efficient, and reduces stack consumption in functions that used to
invoke rpc_call more than once.
Test plan:
Compile kernel with CONFIG_NFS enabled. Connectathon on NFS version 2,
version 3, and version 4 mount points.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:22 +0000 (13:44 -0500)]
SUNRPC: display human-readable procedure name in rpc_iostats output
Add fields to the rpc_procinfo struct that allow the display of a
human-readable name for each procedure in the rpc_iostats output.
Also fix it so that the NFSv4 stats are broken up correctly by
sub-procedure number. NFSv4 uses only two real RPC procedures:
NULL, and COMPOUND.
Test plan:
Mount with NFSv2, NFSv3, and NFSv4, and do "cat /proc/self/mountstats".
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:22 +0000 (13:44 -0500)]
NFS: add RPC I/O statistics to /proc/self/mountstats
NFS client now shows various RPC I/O metrics in /proc/self/mountstats.
Test plan:
Mount/umount while doing "cat /proc/self/mountstats", multiple iterations
of connectathon locking suite. Test with NFS version 2, 3, and 4.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:22 +0000 (13:44 -0500)]
SUNRPC: provide a mechanism for collecting stats in the RPC client
Add a simple mechanism for collecting stats in the RPC client. Stats are
tabulated during xprt_release. Note that per_cpu shenanigans are not
required here because the RPC client already serializes on the transport
write lock.
Test plan:
Compile kernel with CONFIG_NFS enabled. Basic performance regression
testing with high-speed networking and high performance server.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:17 +0000 (13:44 -0500)]
SUNRPC: introduce per-task RPC iostats
Account for various things that occur while an RPC task is executed.
Separate timers for RPC round trip and RPC execution time show how
long RPC requests wait in queue before being sent. Eventually these
will be accumulated at xprt_release time in one place where they can
be viewed from userland.
Test plan:
Compile kernel with CONFIG_NFS enabled.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:16 +0000 (13:44 -0500)]
SUNRPC: add a handful of per-xprt counters
Monitor generic transport events. Add a transport switch callout to
format transport counters for export to user-land.
Test plan:
Compile kernel with CONFIG_NFS enabled.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:15 +0000 (13:44 -0500)]
SUNRPC: track length of RPC wait queues
RPC wait queue length will eventually be exported to userland via the RPC
iostats interface.
Test plan:
Compile kernel with CONFIG_NFS enabled.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:15 +0000 (13:44 -0500)]
NFS: report how long an NFS file system has been mounted
Add a field in nfs_server to record a timestamp when a mount succeeds.
Report the number of seconds the file system has been mounted via
nfs_show_stats().
Test plan:
Mount an NFS file system, watch the mountstats reports and compare with
clock time.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:14 +0000 (13:44 -0500)]
NFS: add hooks to account for NFSERR_JUKEBOX errors
Make an inode or an nfs_server struct available in the logic that handles
JUKEBOX/DELAY type errors so the NFS client can account for them.
This patch is split out from the main nfs iostat patch to highlight minor
architectural changes required to support this statistic.
Test plan:
None.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:14 +0000 (13:44 -0500)]
NFS: add I/O performance counters
Invoke the byte and event counter macros where we want to count bytes and
events.
Clean-up: fix a possible NULL dereference in nfs_lock, and simplify
nfs_file_open.
Test-plan:
fsx and iozone on UP and SMP systems, with and without pre-emption. Watch
for memory overwrite bugs, and performance loss (significantly more CPU
required per op).
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:13 +0000 (13:44 -0500)]
NFS: introduce mechanism for tracking NFS client metrics
Add a per-superblock performance counter facility to the NFS client. This
facility mimics the counters available for block devices and for
networking. Expose these new counters via the new /proc/self/mountstats
interface.
Thanks to Andrew Morton and Trond Myklebust for their review and comments.
Test plan:
fsx and iozone on UP and SMP systems, with and without pre-emption. Watch
for memory overwrite bugs, and performance loss (significantly more CPU
required per op).
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:13 +0000 (13:44 -0500)]
NFS: clean up some mount options
Get rid of "lock" and "posix", and spell out "vers=".
Test plan:
None.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:12 +0000 (13:44 -0500)]
NFS: show retransmit settings when displaying mount options
Sometimes it's important to know the exact RPC retransmit settings the
kernel is using for an NFS mount point. Add this facility to the NFS
client's show_options method.
Test plan:
Set various retransmit settings via the mount command, and check that the
settings are reflected in /proc/mounts.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 20 Mar 2006 18:44:12 +0000 (13:44 -0500)]
VFS: New /proc file /proc/self/mountstats
Create a new file under /proc/self, called mountstats, where mounted file
systems can export information (configuration options, performance counters,
and so on). Use a mechanism similar to /proc/mounts and s_ops->show_options.
This mechanism does not violate namespace security, and is safe to use while
other processes are unmounting file systems.
Thanks to Mike Waychison for his review and comments.
Test-plan:
Test concurrent mount/unmount operations while cat'ing /proc/self/mountstats.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Levent Serinol [Mon, 20 Mar 2006 18:44:11 +0000 (13:44 -0500)]
SUNRPC: more verbose output for rpc auth weak error
This patch adds server ip address to be printed out when "server
requires stronger authentication" error occured.
Signed-off-by: Levent Serinol <lserinol@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Goldwyn Rodrigues [Mon, 20 Mar 2006 18:44:11 +0000 (13:44 -0500)]
NFS: Code comments update in NFS
read_cache_mtime is no longer used in nfs_inode. This patch removes
references of read_cache_mtime in the code comments.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Ingo Molnar [Mon, 20 Mar 2006 18:44:11 +0000 (13:44 -0500)]
NFS: sem2mutex idmap.c
semaphore to mutex conversion.
the conversion was generated via scripts, and the result was validated
automatically via a script as well.
build and boot tested.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Eric Sesterhenn [Mon, 20 Mar 2006 18:44:10 +0000 (13:44 -0500)]
NFS: kzalloc conversion in fs/nfs
this converts fs/nfs to kzalloc() usage.
compile tested with make allyesconfig
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:10 +0000 (13:44 -0500)]
NFSv4: Kill braindead gcc warnings
nfs4_open_revalidate: 'res' may be used uninitialized
nfs4_callback_compound: ‘hdr_res.nops’ may be used uninitialized
'op_nr’ may be used uninitialized
encode_getattr_res: ‘savep’ may be used uninitialized
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:09 +0000 (13:44 -0500)]
NFSv4: Do not call rpciod_down() before call to destroy_nfsv4_state()
The reason is that the idmapper cleanup may call flush_workqueue() on
rpciod_workqueue.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:09 +0000 (13:44 -0500)]
SUNRPC: Ensure that rpc_mkpipe returns a refcounted dentry
If not, we cannot guarantee that idmap->idmap_dentry, gss_auth->dentry and
clnt->cl_dentry are valid dentries.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:08 +0000 (13:44 -0500)]
SUNRPC: Run rpci->queue_timeout on the rpciod workqueue instead of generic
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Olaf Kirch [Mon, 20 Mar 2006 18:44:08 +0000 (13:44 -0500)]
SUNRPC: Auto-load RPC authentication kernel modules
This patch adds a request_module call to rpcauth_create which will try
to auto-load the kernel module for the requested authentication flavor.
For kernels with modular sunrpc, this reduces the admin overhead for
the user.
Signed-off-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:08 +0000 (13:44 -0500)]
NFS: reduce the number of false cache invalidations.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Jesper Juhl [Mon, 20 Mar 2006 18:44:07 +0000 (13:44 -0500)]
NFS: "const static" vs "static const" in nfs4
My previous "const static" vs "static const" cleanup missed a single case,
patch below takes care of it.
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:07 +0000 (13:44 -0500)]
NFSv4: Don't invalidate cached attributes if change attribute is unchanged
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:06 +0000 (13:44 -0500)]
NFS: writes should not clobber utimes() calls
Ensure that we flush out writes in the case when someone calls utimes() in
order to set the file times.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:06 +0000 (13:44 -0500)]
lockd: Don't expose the process pid to the NLM server
Instead we use the nlm_lockowner->pid.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:05 +0000 (13:44 -0500)]
NLM: nlm_alloc_call should not immediately fail on signal
Currently, nlm_alloc_call tests for a signal before it even tries to
allocate memory.
Fix it so that it tries at least once.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:05 +0000 (13:44 -0500)]
VFS: Fix __posix_lock_file() copy of private lock area
The struct file_lock->fl_u area must be copied using the fl_copy_lock()
operation.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Neil Brown [Mon, 20 Mar 2006 18:44:04 +0000 (13:44 -0500)]
NFS: Fix buglet in fs/nfs/write.c
I've been reading through fs/nfs/write.c trying to track down a bug
that seems to be related to pages loosing a refcount and getting
freed too early (you interested in detail??) and I spotted a little
bug which the following patch should fix.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:04 +0000 (13:44 -0500)]
NFS: Avoid races between writebacks and truncation
Currently, there is no serialisation between NFS asynchronous writebacks
and truncation at the page level due to the fact that nfs_sync_inode()
cannot lock the pages that it is about to write out.
This means that it is possible to be flushing out data (and calling something
like set_page_writeback()) while the page cache is busy evicting the page.
Oops...
Use the hooks provided in try_to_release_page() to ensure that dirty pages
are always written back to storage before we evict them.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 20 Mar 2006 18:44:03 +0000 (13:44 -0500)]
NFS: Fix a busy inodes issue...
The nfs_open_context may live longer than the file descriptor that spawned
it, so it needs to carry a reference to the vfsmount. If not, then
generic_shutdown_super() may end up being called before reads and writes
have been flushed out.
Make a couple of functions static while we're at it...
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Linus Torvalds [Mon, 20 Mar 2006 05:53:29 +0000 (21:53 -0800)]
Linux 2.6.16
Andrea Arcangeli [Sun, 19 Mar 2006 18:04:17 +0000 (19:04 +0100)]
[PATCH] Remove obsolete CREDITS address
This address is going to be obsolete, so I should update it.
Linus Torvalds [Mon, 20 Mar 2006 05:12:00 +0000 (21:12 -0800)]
Merge branch 'upstream' of git://ftp.linux-mips.org/upstream-linus
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
[MIPS] SB1: Check for -mno-sched-prolog if building corelis debug kernel.
[MIPS] Sibyte: Fix race in sb1250_gettimeoffset().
[MIPS] Sibyte: Fix interrupt timer off by one bug.
[MIPS] Sibyte: Fix M_SCD_TIMER_INIT and M_SCD_TIMER_CNT wrong field width.
[MIPS] Protect more of timer_interrupt() by xtime_lock.
[MIPS] Work around bad code generation for <asm/io.h>.
[MIPS] Simple patch to power off DBAU1200
[MIPS] Fix DBAu1550 software power off.
[MIPS] local_r4k_flush_cache_page fix
[MIPS] SB1: Fix interrupt disable hazard.
[MIPS] Get rid of the IP22-specific code in arclib.
Update MAINTAINERS entry for MIPS.
Michael Chan [Sun, 19 Mar 2006 21:21:12 +0000 (13:21 -0800)]
[TG3]: 40-bit DMA workaround part 2
The 40-bit DMA workaround recently implemented for 5714, 5715, and
5780 needs to be expanded because there may be other tg3 devices
behind the EPB Express to PCIX bridge in the 5780 class device.
For example, some 4-port card or mother board designs have 5704 behind
the 5714.
All devices behind the EPB require the 40-bit DMA workaround.
Thanks to Chris Elmquist again for reporting the problem and testing
the patch.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ralf Baechle DL5RB [Sun, 19 Mar 2006 21:20:06 +0000 (13:20 -0800)]
[AX.25]: Fix potencial memory hole.
If the AX.25 dialect chosen by the sysadmin is set to DAMA master / 3
(or DAMA slave / 2, if CONFIG_AX25_DAMA_SLAVE=n) ax25_kick() will fall
through the switch statement without calling ax25_send_iframe() or any
other function that would eventually free skbn thus leaking the packet.
Fix by restricting the sysctl inferface to allow only actually supported
AX.25 dialects.
The system administration mistake needed for this to happen is rather
unlikely, so this is an uncritical hole.
Coverity #651.
Signed-off-by: Ralf Baechle DL5RB <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Krufky [Wed, 15 Mar 2006 05:36:13 +0000 (02:36 -0300)]
[PATCH] Kconfig: swap VIDEO_CX88_ALSA and VIDEO_CX88_DVB
VIDEO_CX88_ALSA should not be between VIDEO_CX88_DVB and
VIDEO_CX88_DVB_ALL_FRONTENDS
When cx88-alsa was added to cx88/Kconfig, it was added in between
VIDEO_CX88_DVB and VIDEO_CX88_DVB_ALL_FRONTENDS. This caused
undesireable effects to the appearance of the menu options in
menuconfig.
This fix reorders cx88-alsa and cx88-dvb in Kconfig, to match saa7134,
and restore the correct menuconfig appearance.
Signed-off-by: Michael Krufky <mkrufky@linuxtv.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Markus Rechberger [Tue, 7 Feb 2006 10:49:13 +0000 (08:49 -0200)]
[PATCH] Fixed em28xx based system lockup
Fixed em28xx based system lockup, device needs to be initialized before
starting the isoc transfer otherwise the system will completly lock up.
Signed-off-by: Markus Rechberger <mrechberger@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Oleg Nesterov [Sat, 18 Mar 2006 17:41:10 +0000 (20:41 +0300)]
[PATCH] disable unshare(CLONE_VM) for now
sys_unshare() does mmput(new_mm). This is not enough if we have
mm->core_waiters.
This patch is a temporary fix for soon to be released 2.6.16.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
[ Checked with Uli: "I'm not planning to use unshare(CLONE_VM). It's
not needed for any functionality planned so far. What we (as in Red
Hat) need unshare() for now is the filesystem side." ]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Ralf Baechle [Sat, 18 Mar 2006 16:59:31 +0000 (16:59 +0000)]
[MIPS] SB1: Check for -mno-sched-prolog if building corelis debug kernel.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Ralf Baechle [Wed, 15 Mar 2006 00:03:29 +0000 (00:03 +0000)]
[MIPS] Sibyte: Fix race in sb1250_gettimeoffset().
From Dave Johnson <djohnson+linuxmips@sw.starentnetworks.com>:
sb1250_gettimeoffset() simply reads the current cpu 0 timer remaining
value, however once this counter reaches 0 and the interrupt is raised,
it immediately resets and begins to count down again.
If sb1250_gettimeoffset() is called on cpu 1 via do_gettimeofday() after
the timer has reset but prior to cpu 0 processing the interrupt and
taking write_seqlock() in timer_interrupt() it will return a full value
(or close to it) causing time to jump backwards 1ms. Once cpu 0 handles
the interrupt and timer_interrupt() gets far enough along it will jump
forward 1ms.
Fix this problem by implementing mips_hpt_*() on sb1250 using a spare
timer unrelated to the existing periodic interrupt timers. It runs at
1Mhz with a full 23bit counter. This eliminated the custom
do_gettimeoffset() for sb1250 and allowed use of the generic
fixed_rate_gettimeoffset() using mips_hpt_*() and timerhi/timerlo.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Ralf Baechle [Tue, 14 Mar 2006 23:52:47 +0000 (23:52 +0000)]
[MIPS] Sibyte: Fix interrupt timer off by one bug.
From Dave Johnson <djohnson+linuxmips@sw.starentnetworks.com>:
The timers need to be loaded with 1 less than the desired interval not
the interval itself.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Ralf Baechle [Tue, 14 Mar 2006 23:47:35 +0000 (23:47 +0000)]
[MIPS] Sibyte: Fix M_SCD_TIMER_INIT and M_SCD_TIMER_CNT wrong field width.
From Dave Johnson <djohnson+linuxmips@sw.starentnetworks.com>:
Field width should be 23 bits not 20 bits.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Ralf Baechle [Tue, 14 Mar 2006 23:46:58 +0000 (23:46 +0000)]
[MIPS] Protect more of timer_interrupt() by xtime_lock.
From Dave Johnson <djohnson+linuxmips@sw.starentnetworks.com>:
* do_timer() expects the arch-specific handler to take the lock as it
modifies jiffies[_64] and xtime.
* writing timerhi/lo in timer_interrupt() will mess up
fixed_rate_gettimeoffset() which reads timerhi/lo.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Ralf Baechle [Wed, 15 Mar 2006 11:36:31 +0000 (11:36 +0000)]
[MIPS] Work around bad code generation for <asm/io.h>.
If a call to set_io_port_base() was being followed by usage of
mips_io_port_base in the same function gcc was possibly using the old
value due to some clever abuse of const. Adding a barrier will keep
the optimization and result in correct code with latest gcc.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Matej Kupljen [Wed, 30 Nov 2005 09:20:01 +0000 (10:20 +0100)]
[MIPS] Simple patch to power off DBAU1200
Signed-off-by: Matej Kupljen <matej.kupljen@ultra.si>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Sergei Shtylylov [Tue, 14 Mar 2006 04:20:00 +0000 (07:20 +0300)]
[MIPS] Fix DBAu1550 software power off.
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Atsushi Nemoto [Mon, 13 Mar 2006 09:23:03 +0000 (18:23 +0900)]
[MIPS] local_r4k_flush_cache_page fix
If dcache_size != icache_size or dcache_size != scache_size, or
set-associative cache, icache/scache does not flushed properly. Make
blast_?cache_page_indexed() masks its index value correctly. Also,
use physical address for physically indexed pcache/scache.
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Ralf Baechle [Mon, 13 Mar 2006 16:16:29 +0000 (16:16 +0000)]
[MIPS] SB1: Fix interrupt disable hazard.
The SB1 core has a three cycle interrupt disable hazard but we were
wrongly treating it as fully interlocked.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Ralf Baechle [Fri, 10 Mar 2006 19:47:17 +0000 (19:47 +0000)]
[MIPS] Get rid of the IP22-specific code in arclib.
This breaks the kernel build if sgiwd93 was configured as a module.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Ralf Baechle [Fri, 10 Mar 2006 13:47:21 +0000 (13:47 +0000)]
Update MAINTAINERS entry for MIPS.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Alexey Kuznetsov [Sat, 18 Mar 2006 00:05:43 +0000 (16:05 -0800)]
[NET]: Fix race condition in sk_wait_event().
It is broken, the condition is checked out of socket lock. It is
wonderful the bug survived for so long time.
[ This fixes bugzilla #6233:
race condition in tcp_sendmsg when connection became established ]
Signed-off-by: David S. Miller <davem@davemloft.net>
Hugh Dickins [Fri, 17 Mar 2006 07:04:09 +0000 (23:04 -0800)]
[PATCH] fix free swap cache latency
Lee Revell reported 28ms latency when process with lots of swapped memory
exits.
2.6.15 introduced a latency regression when unmapping: in accounting the
zap_work latency breaker, pte_none counted 1, pte_present PAGE_SIZE, but a
swap entry counted nothing at all. We think of pages present as the slow
case, but Lee's trace shows that free_swap_and_cache's radix tree lookup
can make a lot of work - and we could have been doing it many thousands of
times without a latency break.
Move the zap_work update up to account swap entries like pages present.
This does account non-linear pte_file entries, and unmap_mapping_range
skipping over swap entries, by the same amount even though they're quick:
but neither of those cases deserves complicating the code (and they're
treated no worse than they were in 2.6.14).
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Acked-by: Nick Piggin <npiggin@suse.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Sam Ravnborg [Fri, 17 Mar 2006 07:04:08 +0000 (23:04 -0800)]
[PATCH] kbuild: fix buffer overflow in modpost
Jiri Benc <jbenc@suse.cz> reported that modpost would stop with SIGABRT if
used with long filepaths.
The error looked like:
> Building modules, stage 2.
> MODPOST
> *** glibc detected *** scripts/mod/modpost: realloc(): invalid next size:
+0x0809f588 ***
> [...]
Fix this by allocating at least the required memory + SZ bytes each time.
Before we sometimes ended up allocating too little memory resuting in the
glibc detected bug above. Based on patch originally submitted by: Jiri
Benc <jbenc@suse.cz>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Peter Staubach [Fri, 17 Mar 2006 07:04:02 +0000 (23:04 -0800)]
[PATCH] nfsservctl(): remove user-triggerable printk
A user can use nfsservctl() to spam the logs.
This can happen because the arguments to the nfsservctl() system call are
versioned. This is a good thing. However, when a bad version is detected,
the kernel prints a message and then returns an error.
Signed-off-by: Peter Staubach <staubach@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Christoph Lameter [Fri, 17 Mar 2006 07:04:07 +0000 (23:04 -0800)]
[PATCH] fix race in pagevec_strip?
We can call try_to_release_page() with PagePrivate off and a valid
page->mapping This may cause all sorts of trouble for the filesystem
*_releasepage() handlers. XFS bombs out in that case.
Lock the page before checking for page private.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Kevin Corry [Fri, 17 Mar 2006 07:04:03 +0000 (23:04 -0800)]
[PATCH] dm stripe: Fix bounds
The dm-stripe target currently does not enforce that the size of a stripe
device be a multiple of the chunk-size. Under certain conditions, this can
lead to I/O requests going off the end of an underlying device. This
test-case shows one example.
echo "0 100 linear /dev/hdb1 0" | dmsetup create linear0
echo "0 100 linear /dev/hdb1 100" | dmsetup create linear1
echo "0 200 striped 2 32 /dev/mapper/linear0 0 /dev/mapper/linear1 0" | \
dmsetup create stripe0
dd if=/dev/zero of=/dev/mapper/stripe0 bs=1k
This will produce the output:
dd: writing '/dev/mapper/stripe0': Input/output error
97+0 records in
96+0 records out
And in the kernel log will be:
attempt to access beyond end of device
dm-0: rw=0, want=104, limit=100
The patch will check that the table size is a multiple of the stripe
chunk-size when the table is created, which will prevent the above striped
device from being created.
This should not affect tools like LVM or EVMS, since in all the cases I can
think of, striped devices are always created with the sizes being a
multiple of the chunk-size.
The size of a stripe device must be a multiple of its chunk-size.
(akpm: that typecast is quite gratuitous)
Signed-off-by: Kevin Corry <kevcorry@us.ibm.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Srivatsa Vaddagiri [Fri, 17 Mar 2006 07:04:06 +0000 (23:04 -0800)]
[PATCH] x86: check for online cpus before bringing them up
Bryce reported a bug wherein offlining CPU0 (on x86 box) and then
subsequently onlining it resulted in a lockup.
On x86, CPU0 is never offlined. The subsequent attempt to online CPU0
doesn't take that into account. It actually tries to bootup the already
booted CPU. Following patch fixes the problem (as acknowledged by Bryce).
Please consider for inclusion in 2.6.16.
Check if cpu is already online.
Signed-off-by: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Eric Van Hensbergen [Fri, 17 Mar 2006 07:04:04 +0000 (23:04 -0800)]
[PATCH] v9fs: fix overzealous dropping of dentry which breaks dcache
There is a d_drop in dir_release which caused problems as it invalidates
dcache entries too soon. This was likely a part of the wierd cwd behavior
folks were seeing.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>