Trond Myklebust [Wed, 11 Jun 2008 19:44:18 +0000 (15:44 -0400)]
NFS: Remove the BKL from the permission checking code
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Wed, 11 Jun 2008 17:26:14 +0000 (13:26 -0400)]
NFS: Remove attribute update related BKL references
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Wed, 11 Jun 2008 16:21:19 +0000 (12:21 -0400)]
NFS: Remove BKL requirement from attribute updates
The main problem is dealing with inode->i_size: we need to set the
inode->i_lock on all attribute updates, and so vmtruncate won't cut it.
Make an NFS-private version of vmtruncate that has the necessary locking
semantics.
The result should be that the following inode attribute updates are
protected by inode->i_lock
nfsi->cache_validity
nfsi->read_cache_jiffies
nfsi->attrtimeo
nfsi->attrtimeo_timestamp
nfsi->change_attr
nfsi->last_updated
nfsi->cache_change_attribute
nfsi->access_cache
nfsi->access_cache_entry_lru
nfsi->access_cache_inode_lru
nfsi->acl_access
nfsi->acl_default
nfsi->nfs_page_tree
nfsi->ncommit
nfsi->npages
nfsi->open_files
nfsi->silly_list
nfsi->acl
nfsi->open_states
inode->i_size
inode->i_atime
inode->i_mtime
inode->i_ctime
inode->i_nlink
inode->i_uid
inode->i_gid
The following is protected by dir->i_mutex
nfsi->cookieverf
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Wed, 11 Jun 2008 19:44:04 +0000 (15:44 -0400)]
NFS: Protect inode->i_nlink updates using inode->i_lock
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Felix Blyakher [Tue, 15 Jul 2008 17:40:22 +0000 (12:40 -0500)]
nfs: set correct fl_len in nlmclnt_test()
fcntl(F_GETLK) on an nfs client incorrectly returns
the values for the conflicting lock. fl_len value is
always 1.
If the conflicting lock is (0, 4095) the F_GETLK
request for (1024, 10) returns (0, 1), which doesn't
even cover the requested range, and is quite confusing.
The fix is trivial, set fl_end from the fl_end value
recieved from the nfs server.
Signed-off-by: Felix Blyakher <felixb@sgi.com>
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 14 Jul 2008 20:03:30 +0000 (16:03 -0400)]
SUNRPC: Support registering IPv6 interfaces with local rpcbind daemon
Introduce a new API to register RPC services on IPv6 interfaces to allow
the NFS server and lockd to advertise on IPv6 networks.
Unlike rpcb_register(), the new rpcb_v4_register() function uses rpcbind
protocol version 4 to contact the local rpcbind daemon. The version 4
SET/UNSET procedures allow services to register address families besides
AF_INET, register at specific network interfaces, and register transport
protocols besides UDP and TCP. All of this functionality is exposed via
the new rpcb_v4_register() kernel API.
A user-space rpcbind daemon implementation that supports version 4 of the
rpcbind protocol is required in order to make use of this new API.
Note that rpcbind version 3 is sufficient to support the new rpcbind
facilities listed above, but most extant implementations use version 4.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 14 Jul 2008 20:03:29 +0000 (16:03 -0400)]
SUNRPC: Refactor rpcb_register to make rpcbindv4 support easier
rpcbind version 4 registration will reuse part of rpcb_register, so just
split it out into a separate function now.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 14 Jul 2008 20:03:28 +0000 (16:03 -0400)]
SUNRPC: None of rpcb_create's callers wants a privileged source port
Clean up: Callers that required a privileged source port now use
rpcb_create_local(), so we can remove the @privileged argument from
rpcb_create().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 14 Jul 2008 20:03:27 +0000 (16:03 -0400)]
SUNRPC: Introduce a specific rpcb_create for contacting localhost
Add rpcb_create_local() for use by rpcb_register() and upcoming IPv6
registration functions.
Ensure any errors encountered by rpcb_create_local() are properly
reported.
We can also use a statically allocated constant loopback socket address
instead of one allocated on the stack and initialized every time the
function is called.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 14 Jul 2008 20:03:26 +0000 (16:03 -0400)]
SUNRPC: Use correct XDR encoding procedure for rpcbind SET/UNSET
The rpcbind versions 3 and 4 SET and UNSET procedures use the same
arguments as the GETADDR procedure.
While definitely a bug, this hasn't been a problem so far since the
kernel hasn't used version 3 or 4 SET and UNSET. But this will change
in just a moment.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 7 Jul 2008 16:18:53 +0000 (12:18 -0400)]
SUNRPC: Ensure our task is notified when an rpcbind call is done
If another task is busy in rpcb_getport_async number, it is more efficient
to have it wake us up when it has finished instead of arbitrarily sleeping
for 5 seconds.
Also ensure that rpcb_wake_rpcbind_waiters() is called regardless of
whether or not rpcb_getport_done() gets called.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Tue, 24 Jun 2008 23:28:02 +0000 (19:28 -0400)]
NFS: Allow either strict or sloppy mount option parsing
The kernel's NFS client mount option parser currently doesn't allow
unrecognized or incorrect mount options. This prevents misspellings or
incorrectly specified mount options from possibly causing silent data
corruption.
However, NFS mount options are not standardized, so different operating
systems can use differently spelled mount options to support similar
features, or can support mount options which no other operating system
supports.
"Sloppy" mount option parsing, which allows the parser to ignore any
option it doesn't recognize, is needed to support automounters that often
use maps that are shared between heterogenous operating systems.
The legacy mount command ignores the validity of the values of mount
options entirely, except for the "sec=" and "proto=" options. If an
incorrect value is specified, the out-of-range value is passed to the
kernel; if a value is specified that contains non-numeric characters,
it appears as though the legacy mount command sets that option to zero
(probably incorrect behavior in general).
In any case, this sets a precedent which we will partially follow for
the kernel mount option parser:
+ if "sloppy" is not set, the parser will be strict about both
unrecognized options (same as legacy) and invalid option
values (stricter than legacy)
+ if "sloppy" is set, the parser will ignore unrecognized
options and invalid option values (same as legacy)
An "invalid" option value in this case means that either the type
(integer, short, or string) or sign (for integer values) of the specified
value is incorrect.
This patch does two things: it changes the NFS client's mount option
parsing loop so that it parses the whole string instead of failing at
the first unrecognized option or invalid option value. An unrecognized
option or an invalid option value cause the option to be skipped.
Then, the patch adds a "sloppy" mount option that allows the parsing
to succeed anyway if there were any problems during parsing. When
parsing a set of options is complete, if there are errors and "sloppy"
was specified, return success anyway. Otherwise, only return success
if there are no errors.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Tue, 24 Jun 2008 20:33:54 +0000 (16:33 -0400)]
NFS4: Set security flavor default for NFSv4 mounts like other defaults
Set the default security flavor when we set the other mount option
default values for NFSv4. This cleans up the NFSv4 mount option parsing
path to look like the NFSv2/v3 one.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Tue, 24 Jun 2008 20:33:46 +0000 (16:33 -0400)]
NFS: Set security flavor default for NFSv2/3 mounts like other defaults
Set the default security flavor when we set the other mount option default
values. After this change, only the legacy user-space mount path needs to
set the NFS_MOUNT_SECFLAVOUR flag.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Tue, 24 Jun 2008 20:33:38 +0000 (16:33 -0400)]
NFS: Refactor logic for parsing NFS security flavor mount options
Clean up: Refactor the NFS mount option parsing function to extract the
security flavor parsing logic into a separate function.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Thu, 26 Jun 2008 21:47:12 +0000 (17:47 -0400)]
NFS: use documenting macro constants for initializing ac{reg, dir}{min, max}
Clean up.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Thu, 26 Jun 2008 21:47:05 +0000 (17:47 -0400)]
NFS: Move the nfs_set_port() call out of nfs_parse_mount_options()
The remount path does not need to set the port in the server address.
Since it's not really a part of option parsing, move the nfs_set_port()
call to nfs_parse_mount_options()'s callers.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Wed, 2 Jul 2008 18:43:47 +0000 (14:43 -0400)]
NFS: set transport defaults after mount option parsing is finished
Move the UDP/TCP default timeo/retrans settings for text mounts to
nfs_init_timeout_values(), which was were they were always being
initialised (and sanity checked) for binary mounts.
Document the default timeout values using appropriate #defines.
Ensure that we initialise and sanity check the transport protocols that
may have been specified by the user.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Wed, 25 Jun 2008 21:24:54 +0000 (17:24 -0400)]
SUNRPC: Use only rpcbind v2 for AF_INET requests
Some server vendors support the higher versions of rpcbind only for
AF_INET6. The kernel doesn't need to use v3 or v4 for AF_INET anyway,
so change the kernel's rpcbind client to query AF_INET servers over
rpcbind v2 only.
This has a few interesting benefits:
1. If the rpcbind request is going over TCP, and the server doesn't
support rpcbind versions 3 or 4, the client reduces by two the number
of ephemeral ports left in TIME_WAIT for each rpcbind request. This
will help during NFS mount storms.
2. The rpcbind interaction with servers that don't support rpcbind
versions 3 or 4 will use less network traffic. Also helpful
during mount storms.
3. We can eliminate the kernel build option that controls whether the
kernel's rpcbind client uses rpcbind version 3 and 4 for AF_INET
servers. Less complicated kernel configuration...
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Wed, 25 Jun 2008 21:24:47 +0000 (17:24 -0400)]
SUNRPC: Use GETADDR for rpcbind version 4 queries
Some rpcbind servers that do support rpcbind version 4 do not support
the GETVERSADDR procedure. Use GETADDR for querying rpcbind servers
via rpcbind version 4 instead of GETVERSADDR.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Wed, 25 Jun 2008 21:24:39 +0000 (17:24 -0400)]
SUNRPC: Use rpcbind version 2 GETPORT
Clean up: Change the version 2 procedure name to GETPORT. It's the same
procedure number as GETADDR, but version 2 implementations usually refer
to it as GETPORT.
This also now matches the procedure name used in the version 2 procedure
entry in the rpcb_next_version[] array, making it slightly less confusing.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Wed, 25 Jun 2008 21:24:31 +0000 (17:24 -0400)]
SUNRPC: Document some naked integers in rpcbind client
Clean up: Replace naked integers that represent rpcbind protocol versions.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Wed, 25 Jun 2008 21:24:23 +0000 (17:24 -0400)]
SUNRPC: More useful debugging output for rpcb client
Clean up dprintk's in rpcb client's XDR decoder functions.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Jeff Layton [Wed, 11 Jun 2008 14:03:11 +0000 (10:03 -0400)]
nfs4: fix potential race with rapid nfs_callback_up/down cycle
If the nfsv4 callback thread is rapidly brought up and down, it's
possible that nfs_callback_svc might never get a chance to run. If
this happens, the cleanup at thread exit might never occur, throwing
the refcounting off and nfs_callback_info in an incorrect state.
Move the clean functions into nfs_callback_down. Also change the
nfs_callback_info struct to track the svc_rqst rather than svc_serv
since we need to know that to call svc_exit_thread.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Jeff Layton [Wed, 11 Jun 2008 14:03:10 +0000 (10:03 -0400)]
nfs4: remove BKL from nfs_callback_up and nfs_callback_down
The nfs_callback_mutex is sufficient protection.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Benny Halevy [Tue, 24 Jun 2008 17:25:57 +0000 (20:25 +0300)]
nfs: initialize timeout variable in nfs4_proc_setclientid_confirm
gcc (4.3.0) rightfully warns about this:
/usr0/export/dev/bhalevy/git/linux-pnfs-bh-nfs41/fs/nfs/nfs4proc.c: In function \91nfs4_proc_setclientid_confirm\92:
/usr0/export/dev/bhalevy/git/linux-pnfs-bh-nfs41/fs/nfs/nfs4proc.c:2936: warning: \91timeout\92 may be used uninitialized in this function
nfs4_delay that's passed a pointer to 'timeout' is looking at its value
and sets it up to some value in the range: NFS4_POLL_RETRY_MIN..NFS4_POLL_RETRY_MAX
if (*timeout <= 0)
*timeout = NFS4_POLL_RETRY_MIN;
if (*timeout > NFS4_POLL_RETRY_MAX)
*timeout = NFS4_POLL_RETRY_MAX;
Therefore it will end up set to some sane, though rather indeterministic, value.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 23 Jun 2008 16:37:01 +0000 (12:37 -0400)]
NFS: handle interface identifiers in incoming IPv6 addresses
Add support in the kernel NFS client's address parser for interface
identifiers.
IPv6 link-local addresses require an additional "interface identifier",
which is a network device name or an integer that indexes the array of
local network interfaces. They are suffixed to the address with a '%'.
For example:
fe80::215:c5ff:fe3b:e1b2%2
indicates an interface index of 2. Or
fe80::215:c5ff:fe3b:e1b2%eth0
indicates that requests should be routed through the eth0 device.
Without the interface ID, link-local addresses are not usable for NFS.
Both the kernel NFS client mount option parser and the mount.nfs command
can take either form. The mount.nfs command always passes the address
through getnameinfo(3), which usually re-writes interface indices as
device names.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 23 Jun 2008 16:36:53 +0000 (12:36 -0400)]
NFS: Add string length argument to nfs_parse_server_address
To make nfs_parse_server_address() more generally useful, allow it to
accept input strings that are not terminated with '\0'.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 23 Jun 2008 16:36:45 +0000 (12:36 -0400)]
NFS: Support raw IPv6 address hostnames during NFS mount operation
Traditionally the mount command has looked for a ":" to separate the
server's hostname from the export path in the mounted on device name,
like this:
mount server:/export /mounted/on/dir
The server's hostname is "server" and the export path is "/export".
You can also substitute a specific IPv4 network address for the server
hostname, like this:
mount 192.168.0.55:/export /mounted/on/dir
Raw IPv6 addresses present a problem, however, because they look
something like this:
fe80::200:5aff:fe00:30b
Note the use of colons.
To get around the presence of colons, copy the Solaris convention used for
mounting IPv6 servers by address: wrap a raw IPv6 address with square
brackets.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Mon, 23 Jun 2008 16:36:37 +0000 (12:36 -0400)]
NFS: Use common device name parsing logic for NFSv4 and NFSv2/v3
To support passing a raw IPv6 address as a server hostname, we need to
expand the logic that handles splitting the passed-in device name into
a server hostname and export path
Start by pulling device name parsing out of the mount option validation
functions and into separate helper functions.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 17 Jun 2008 20:12:00 +0000 (16:12 -0400)]
NFS: Fix a dependency on CONFIG_NFS_V4 in nfs_remount
Fix the 'nfs4_fs_type' undeclared error in nfs_remount when compiling sans
NFSv4...
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Jeff Layton <jlayton@redhat.com>
Trond Myklebust [Fri, 13 Jun 2008 17:25:22 +0000 (13:25 -0400)]
NFS: Allow redirtying of a completed unstable write.
Currently, if an unstable write completes, we cannot redirty the page in
order to reflect a new change in the page data until after we've sent a
COMMIT request.
This patch allows a page rewrite to proceed without the unnecessary COMMIT
step, putting it immediately back onto the dirty page list, undoing the
VM unstable write accounting, and removing the NFS_PAGE_TAG_COMMIT tag from
the NFS radix tree.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Fri, 13 Jun 2008 16:12:32 +0000 (12:12 -0400)]
NFS: Clean up nfs_update_request()
Simplify the loop in nfs_update_request by moving into a separate function
the code that attempts to update an existing cached NFS write.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Thu, 12 Jun 2008 16:37:49 +0000 (12:37 -0400)]
NFS: missing newline in NFS mount debugging message
Clean up.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Thu, 12 Jun 2008 16:37:41 +0000 (12:37 -0400)]
NFS: Treat "intr" and "nointr" options as deprecated
Clean up: the "intr" and "nointr" mount options were recently retired.
Document this in the NFS mount option parser.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Thu, 12 Jun 2008 16:37:33 +0000 (12:37 -0400)]
NFS: Allow any value for the "retry" option
The kernel NFS mount option parser should ignore the retry= mount option
since it is meaningful only in user space. Today it expects a number
rather than arbitrary text, so it ignores the option if the value is
numeric, but chokes if there are other characters in the value.
Change it to allow any text (except ",") as its value.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Wed, 11 Jun 2008 21:39:04 +0000 (17:39 -0400)]
NFS: Ensure we zap only the access and acl caches when setting new acls
...and ensure that we obey the NFS_INO_INVALID_ACL flag when retrieving the
acls.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Wed, 11 Jun 2008 20:42:05 +0000 (16:42 -0400)]
NFS: Fix a warning in nfs4_async_handle_error
We're not modifying the nfs_server when we call nfs_inc_server_stats and
friends, so allow the compiler to pass 'const' pointers too.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Thu, 12 Jun 2008 16:32:25 +0000 (12:32 -0400)]
NFS: Move fs/nfs/iostat.h to include/linux
The fs/nfs/iostat.h header has definitions that were designed to be exposed
to user space. Move these definitions under include/linux so user space can
use the definitions in applications that read /proc/self/mountstats.
Also address a handful of coding style issues called out by checkpatch.pl in
fs/nfs/iostat.h.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Wed, 11 Jun 2008 20:32:46 +0000 (16:32 -0400)]
NFS: Remove the redundant file_open entry from struct nfs_rpc_ops
All instances are set to nfs_open(), so we should just remove the redundant
indirection. Ditto for the file_release op
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Fri, 6 Jun 2008 17:22:25 +0000 (13:22 -0400)]
SUNRPC: Ensure all transports set rq_xtime consistently
The RPC client uses the rq_xtime field in each RPC request to determine the
round-trip time of the request. Currently, the rq_xtime field is
initialized by each transport just before it starts enqueing a request to
be sent. However, transports do not handle initializing this value
consistently; sometimes they don't initialize it at all.
To make the measurement of request round-trip time consistent for all
RPC client transport capabilities, pull rq_xtime initialization into the
RPC client's generic transport logic. Now all transports will get a
standardized RTT measure automatically, from:
xprt_transmit()
to
xprt_complete_rqst()
This makes round-trip time calculation more accurate for the TCP transport.
The socket ->sendmsg() method can return "-EAGAIN" if the socket's output
buffer is full, so the TCP transport's ->send_request() method may call
the ->sendmsg() method repeatedly until it gets all of the request's bytes
queued in the socket's buffer.
Currently, the TCP transport sets the rq_xtime field every time through
that loop so the final value is the timestamp just before the *last* call
to the underlying socket's ->sendmsg() method. After this patch, the
rq_xtime field contains a timestamp that reflects the time just before the
*first* call to ->sendmsg().
This is consequential under heavy workloads because large requests often
take multiple ->sendmsg() calls to get all the bytes of a request queued.
The TCP transport causes the request to sleep until the remote end of the
socket has received enough bytes to clear space in the socket's local
output buffer. This delay can be quite significant.
The method introduced by this patch is a more accurate measure of RTT
for stream transports, since the server can cause enough back pressure
to delay (ie increase the latency of) requests from the client.
Additionally, this patch corrects the behavior of the RDMA transport, which
entirely neglected to initialize the rq_xtime field. RPC performance
metrics for RDMA transports now display correct RPC request round trip
times.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Tom Talpey <thomas.talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 10 Jun 2008 23:39:41 +0000 (19:39 -0400)]
NFS: Fix the ftruncate() credential problem
ftruncate() access checking is supposed to be performed at open() time,
just like reads and writes.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\\\"J. Bruce Fields\\\ [Mon, 9 Jun 2008 20:51:35 +0000 (16:51 -0400)]
rpc: minor cleanup of scheduler callback code
Try to make the comment here a little more clear and concise.
Also, this macro definition seems unnecessary.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\\\"J. Bruce Fields\\\ [Mon, 9 Jun 2008 20:51:34 +0000 (16:51 -0400)]
rpc: remove some unused macros
There used to be a print_hexl() function that used isprint(), now gone.
I don't know why NFS_NGROUPS and CA_RUN_AS_MACHINE were here.
I also don't know why another #define that's actually used was marked
"unused".
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\\\"J. Bruce Fields\\\ [Mon, 9 Jun 2008 20:51:33 +0000 (16:51 -0400)]
rpc: eliminate unused variable in auth_gss upcall code
Also, a minor comment grammar fix in the same file.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Olga Kornievskaia [Mon, 9 Jun 2008 20:51:31 +0000 (16:51 -0400)]
rpc: bring back cl_chatty
The cl_chatty flag alows us to control whether a given rpc client leaves
"server X not responding, timed out"
messages in the syslog. Such messages make sense for ordinary nfs
clients (where an unresponsive server means applications on the
mountpoint are probably hanging), but not for the callback client (which
can fail more commonly, with the only result just of disabling some
optimizations).
Previously cl_chatty was removed, do to lack of users; reinstate it, and
use it for the nfsd's callback client.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Jeff Layton [Tue, 10 Jun 2008 19:38:39 +0000 (15:38 -0400)]
NFS: implement option checking when remounting NFS filesystems (resend)
When remounting an NFS or NFS4 filesystem, the new NFS options are not
respected, yet the remount will still return success. This patch adds
a remount_fs sb op for NFS that checks any new nfs mount options against
the existing ones and fails the mount if any have changed.
This is only implemented for string-based mount options since doing
this with binary options isn't really feasible.
This is essentially the same as the original patch I sent out, but
adds a check to see if the addr= option has changed.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Adrian Bunk [Mon, 19 May 2008 22:08:00 +0000 (01:08 +0300)]
fs/nfs/nfsroot.c: remove CVS keyword
This patch removes a CVS keyword that wasn't updated for a long time
from a comment.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Wed, 11 Jun 2008 21:56:13 +0000 (17:56 -0400)]
SUNRPC: Remove obsolete messages during transport connect
Recent changes to the RPC client's transport connect logic make connect
status values ECONNREFUSED and ECONNRESET impossible.
Clean up xprt_connect_status() to account for these changes.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Wed, 11 Jun 2008 21:56:05 +0000 (17:56 -0400)]
NFS: Fix trace debugging nits in write.c
Clean up: fix a few dprintk messages that still need to show the RPC task ID
correctly, and be sure we use the preferred %lld or %llu instead of %Ld or
%Lu.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Wed, 11 Jun 2008 21:55:58 +0000 (17:55 -0400)]
NFS: Use NFSDBG_FILE for all fops
Clean up: some fops use NFSDBG_FILE, some use NFSDBG_VFS. Let's use
NFSDBG_FILE for all fops, and consistently report file names instead
of inode numbers.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Wed, 11 Jun 2008 21:55:50 +0000 (17:55 -0400)]
NFS: Add debugging facility for NFS aops
Recent work in fs/nfs/file.c neglected to add appropriate trace debugging
for the NFS client's address space operations.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Wed, 11 Jun 2008 21:55:42 +0000 (17:55 -0400)]
NFS: Make nfs_open methods consistent
Clean up: Report the same debugging info and count function calls the
same for files and directories in nfs_opendir() and nfs_file_open().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Wed, 11 Jun 2008 21:55:34 +0000 (17:55 -0400)]
NFS: Make nfs_llseek methods consistent
Clean up: Report the same debugging info in nfs_llseek_dir() and
nfs_llseek_file().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Tue, 27 May 2008 20:29:07 +0000 (16:29 -0400)]
NFS: Make nfs_fsync methods consistent
Clean up: Report the same debugging info, count function calls the same,
and use similar function naming in nfs_fsync_dir() and nfs_fsync().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Wed, 21 May 2008 21:09:41 +0000 (17:09 -0400)]
SUNRPC: Display some debugging information as text rather than numbers
In rpc_show_tasks(), display the program name, version number, procedure
name and tk_action as human-readable variable-length text fields rather
than columnar numbers.
Doing the symbol lookup here helps in cases where we have actual
debugging output from a kernel log, but don't have access to the kernel
image or RPC module that generated the output.
Sample output:
-pid- flgs status -client- --rqstp- -timeout ---ops--
5608 0001 -11
eeb42690 f6d93710 0
f8fa1764 nfsv3 WRITE a:call_transmit_status q:none
5609 0001 -11
eeb42690 f6d937e0 0
f8fa1764 nfsv3 WRITE a:call_status q:xprt_sending
5610 0001 -11
eeb42690 f6d93230 0
f8fa1764 nfsv3 WRITE a:call_status q:xprt_sending
5611 0001 -11
eeb42690 f6d93300 0
f8fa1764 nfsv3 WRITE a:call_status q:xprt_sending
5612 0001 -11
eeb42690 f6d93090 0
f8fa1764 nfsv3 WRITE a:call_status q:xprt_sending
5613 0001 -11
eeb42690 f6d933d0 0
f8fa1764 nfsv3 WRITE a:call_status q:xprt_sending
5614 0001 -11
eeb42690 f6d93cc0 0
f8fa1764 nfsv3 WRITE a:call_status q:xprt_sending
5615 0001 -11
eeb42690 f6d93a50 0
f8fa1764 nfsv3 WRITE a:call_status q:xprt_sending
5616 0001 -11
eeb42690 f6d93640 0
f8fa1764 nfsv3 WRITE a:call_status q:xprt_sending
5617 0001 -11
eeb42690 f6d93b20 0
f8fa1764 nfsv3 WRITE a:call_status q:xprt_sending
5618 0001 -11
eeb42690 f6d93160 0
f8fa1764 nfsv3 WRITE a:call_status q:xprt_sending
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Wed, 21 May 2008 21:09:33 +0000 (17:09 -0400)]
SUNRPC: Refactor rpc_show_tasks
Clean up: move the logic that displays each task to its own function.
This removes indentation and makes future changes easier.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Wed, 21 May 2008 21:09:26 +0000 (17:09 -0400)]
SUNRPC: Don't display the rpc_show_tasks header if there are no tasks
Clean up: don't display the rpc_show_tasks column header unless there is at
least one task to display. As far as I can tell, it is safe to let the
list_for_each_entry macro decide that each list is empty.
scripts/checkpatch.pl also wants a KERN_FOO at the start of any newly added
printk() calls, so this and subsequent patches will also add KERN_INFO.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Wed, 21 May 2008 21:09:19 +0000 (17:09 -0400)]
SUNRPC: Rename "call_" functions that are no longer FSM states
The RPC client uses a finite state machine to move RPC tasks through each
step of an RPC request. Each state is contained in a function in
net/sunrpc/clnt.c, and named call_foo.
Some of the functions named call_foo have changed over the past few years and
are no longer states in the FSM. These include: call_encode, call_header,
and call_verify. As a clean up, rename the functions that have changed.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Wed, 21 May 2008 21:09:12 +0000 (17:09 -0400)]
SUNRPC: Add a function to display the name of an RPC procedure
Improve debugging messages in call_start() and call_verify() by having
them show the RPC procedure name instead of the procedure number.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Wed, 21 May 2008 21:09:04 +0000 (17:09 -0400)]
NFS: Update help text for CONFIG_NFS_FS
Clean up: refresh the help text for Kconfig items related to the NFS
client. Remove obsolete URLs, and make the language consistent among
the options.
Also move the ROOT_NFS config option next to the options related to the
NFS client.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 10 Jun 2008 22:31:02 +0000 (18:31 -0400)]
NFS: do_setlk(): don't flush caches when we have a delegation
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 10 Jun 2008 22:31:01 +0000 (18:31 -0400)]
SUNRPC: Use GFP_NOFS when allocating credentials
Since the credentials may be allocated during the call to rpc_new_task(),
which again may be called by a memory allocator...
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Sun, 25 May 2008 16:59:19 +0000 (12:59 -0400)]
NFS: Revert commit
44dd151d
Revert commit
44dd151d "NFS: Don't mark a written page as uptodate until it
is on disk". While it is true that the write may fail, that is always the
case. There is no reason why we should treat data on pages that are not
already marked as PG_uptodate as being special. The only thing we gain is a
noticeable slowdown when re-reading these pages.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 10 Jun 2008 22:31:00 +0000 (18:31 -0400)]
NFS: Optimise append writes with holes
If a file is being extended, and we're creating a hole, we might as well
declare the entire page to be up to date.
This patch significantly improves the write performance for sparse files
in the case where lseek(SEEK_END) is used to append several non-contiguous
writes at intervals of < PAGE_SIZE.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 10 Jun 2008 22:30:11 +0000 (18:30 -0400)]
SUNRPC: An ENOMEM error from call_encode is always fatal
The special 'ENOMEM' case that was previously flagged as non-fatal is
bogus: auth_gss always returns EAGAIN for non-fatal errors, and may in fact
return ENOMEM in the special case where xdr_buf_read_netobj runs out of
preallocated buffer space (invariably a _fatal_ error, since there is no
provision for preallocating larger buffers).
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Thu, 15 May 2008 02:48:25 +0000 (19:48 -0700)]
SUNRPC: Ensure we exit early in case of an encode error
All errors from call_encode(), with exception of EAGAIN are fatal, so we
should immediately return instead of proceeding to xprt_transmit().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 20 May 2008 23:34:39 +0000 (19:34 -0400)]
NFS: Add correct bounds checking to NFSv2 locks
NFSv2 file locking currently fails the Connectathon tests, because the
calls to the VFS locking code do not return an EINVAL error if the
struct file_lock overflows the 32-bit boundaries.
The problem is due to the fact that we occasionally call helpers from
fs/locks.c in order to avoid RPC calls to the server when we know that a
local process holds the lock. These helpers are, of course, always
64-bit enabled, so EINVAL is not returned in cases when it would if
the call had gone to the NLM code.
For consistency, we therefore add support for a bounds-checking helper.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Thu, 5 Jun 2008 19:17:39 +0000 (15:17 -0400)]
NFS: Fix a preemption count leak in nfs_update_request
The commit
2785259631697ebb0749a3782cca206e2e542939 (nfs: use GFP_NOFS
preloads for radix-tree insertion) appears to have introduced a bug:
We only want to call radix_tree_preload() once after creating a request.
Calling it every time we loop after we created the request, will cause
preemption count leaks.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Nick Piggin <npiggin@suse.de>
Trond Myklebust [Fri, 20 Jun 2008 21:00:23 +0000 (17:00 -0400)]
NFS: Reduce the stack usage in NFSv3 create operations
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Fri, 20 Jun 2008 19:35:32 +0000 (15:35 -0400)]
NFS: Reduce the stack usage in NFSv4 create operations
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Linus Torvalds [Tue, 8 Jul 2008 19:40:57 +0000 (12:40 -0700)]
Merge branch 'hotfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'hotfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
SUNRPC: Fix an rpcbind breakage for the case of IPv6 lookups
SUNRPC: Fix a double-free in rpcbind
NFS: Fix readdir cache invalidation
Linus Torvalds [Tue, 8 Jul 2008 19:40:19 +0000 (12:40 -0700)]
Merge branch 'upstream' of git://ftp.linux-mips.org/upstream-linus
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
[MIPS] Fix 32bit kernels on R4k with 128 byte cache line size
[MIPS] Atlas, decstation: Fix section mismatches triggered by defconfigs
Jeff Mahoney [Tue, 8 Jul 2008 18:37:06 +0000 (14:37 -0400)]
reiserfs: discard prealloc in reiserfs_delete_inode
With the removal of struct file from the xattr code,
reiserfs_file_release() isn't used anymore, so the prealloc isn't
discarded. This causes hangs later down the line.
This patch adds it to reiserfs_delete_inode. In most cases it will be a
no-op due to it already having been called, but will avoid hangs with
xattrs.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Trond Myklebust [Tue, 1 Jul 2008 19:20:55 +0000 (15:20 -0400)]
SUNRPC: Fix an rpcbind breakage for the case of IPv6 lookups
Now that rpcb_next_version has been split into an IPv4 version and an IPv6
version, we Oops when rpcb_call_async attempts to look up the IPv6-specific
RPC procedure in rpcb_next_version.
Fix the Oops simply by having rpcb_getport_async pass the correct RPC
procedure as an argument.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 7 Jul 2008 16:18:52 +0000 (12:18 -0400)]
SUNRPC: Fix a double-free in rpcbind
It is wrong to be freeing up the rpcbind arguments if the call to
rpcb_call_async() fails, since they should already have been freed up by
rpcb_map_release().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Mon, 7 Jul 2008 17:26:10 +0000 (13:26 -0400)]
NFS: Fix readdir cache invalidation
invalidate_inode_pages2_range() takes page offset arguments, not byte
ranges.
Another thought is that individual pages might perhaps get evicted by VM
pressure, in which case we might perhaps want to re-read not only the
evicted page, but all subsequent pages too (in case the server returns
more/less data per page so that the alignment of the next entry
changes). We should therefore remove the condition that we only do this on
page->index==0.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Thomas Bogendoerfer [Tue, 8 Jul 2008 12:46:34 +0000 (14:46 +0200)]
[MIPS] Fix 32bit kernels on R4k with 128 byte cache line size
The generated copy_page for R4k CPU with a 128 byte cache line size used
Create Dirty Exclusive cache line operations even if only part of the
cache line was filled. This change avoids generating cache operations,
if only part of the cache line size is copied in one loop. It also
increases the maxmimum loop size, because the generated code even fits
into the available space for r4k CPUs with 128 byte cache line size.
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Shane McDonald [Sat, 5 Jul 2008 23:19:42 +0000 (17:19 -0600)]
[MIPS] Atlas, decstation: Fix section mismatches triggered by defconfigs
Resolve these mismatches by defining affected functions with the __cpuinit
attribute, rather than __init.
Signed-off-by: Shane McDonald <mcdonald.shane@gmail.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Linus Torvalds [Tue, 8 Jul 2008 18:19:11 +0000 (11:19 -0700)]
Merge git://git./linux/kernel/git/bart/ide-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6:
it8213: fix return value in it8213_init_one()
palm_bk3710: fix IDECLK period calculation
ide: add __ide_default_irq() inline helper
Bartlomiej Zolnierkiewicz [Tue, 8 Jul 2008 17:27:23 +0000 (19:27 +0200)]
it8213: fix return value in it8213_init_one()
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Sergei Shtylyov [Tue, 8 Jul 2008 17:27:22 +0000 (19:27 +0200)]
palm_bk3710: fix IDECLK period calculation
The driver uses completely bogus rounding formula for calculating period from
the IDECLK frequency which gives one-off period values (e.g. 11 ns with 100 MHz
IDECLK) which in turn can lead to overclocked IDE transfer timings. Actually,
rounding is just wrong in this case, so use a mere division for a safe result.
While at it, also:
- give 'ide_palm_clk' variable a more suitable name;
- get rid of the useless 'ideclkp' variable;
- drop the LISP stype 'p' postfix from the 'clkp' variable's name. :-)
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Cc: mcherkashin@ru.mvista.com
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Bartlomiej Zolnierkiewicz [Tue, 8 Jul 2008 17:27:22 +0000 (19:27 +0200)]
ide: add __ide_default_irq() inline helper
Add __ide_default_irq() inline helper and use it instead of
ide_default_irq() in ide-probe.c and ns87415.c (all host drivers
except IDE PCI ones always setup hwif->irq so it is enough to
check only for I/O bases 0x1f0 and 0x170).
This fixes post-2.6.25 regression since ide_default_irq()
define could shadow ide_default_irq() inline.
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Linus Torvalds [Tue, 8 Jul 2008 16:29:34 +0000 (09:29 -0700)]
Merge branch 'for-linus' of git://git390.osdl.marist.edu/linux-2.6
* 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6:
[S390] protect _PAGE_SPECIAL bit against mprotect
David Gibson [Tue, 8 Jul 2008 05:58:16 +0000 (15:58 +1000)]
Correct hash flushing from huge_ptep_set_wrprotect()
As Andy Whitcroft recently pointed out, the current powerpc version of
huge_ptep_set_wrprotect() has a bug. It just calls ptep_set_wrprotect()
which in turn calls pte_update() then hpte_need_flush() with the 'huge'
argument set to 0. This will cause hpte_need_flush() to flush the wrong
hash entries (of any). Andy's fix for this is already in the powerpc
tree as commit
016b33c4958681c24056abed8ec95844a0da80a3.
I have confirmed this is a real bug, not masked by some other
synchronization, with a new testcase for libhugetlbfs. A process write
a (MAP_PRIVATE) hugepage mapping, fork(), then alter the mapping and
have the child incorrectly see the second write.
Therefore, this should be fixed for 2.6.26, and for the stable tree.
Here is a suitable patch for 2.6.26, which I think will also be suitable
for the stable tree (neither of the headers in question has been changed
much recently).
It is cut down slighlty from Andy's original version, in that it does
not include a 32-bit version of huge_ptep_set_wrprotect(). Currently,
hugepages are not supported on any 32-bit powerpc platform. When they
are, a suitable 32-bit version can be added - the only 32-bit hardware
which supports hugepages does not use the conventional hashtable MMU and
so will have different needs anyway.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nick Piggin [Tue, 8 Jul 2008 09:31:06 +0000 (11:31 +0200)]
[S390] protect _PAGE_SPECIAL bit against mprotect
Stop mprotect's pte_modify from wiping out the s390 pte_special bit, which
caused oops thereafter when vm_normal_page thought X's abnormal was normal.
Debugged-by: Ryan Hope <rmh3093@gmail.com>
Debugged-by: Zan Lynx <zlynx@acm.org>
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Linus Torvalds [Mon, 7 Jul 2008 23:59:43 +0000 (16:59 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
Revert "PCI: Correct last two HP entries in the bfsort whitelist"
Jesse Barnes [Mon, 7 Jul 2008 16:55:26 +0000 (09:55 -0700)]
Revert "PCI: Correct last two HP entries in the bfsort whitelist"
This reverts commit
a1676072558854b95336c8f7db76b0504e909a0a. It duplicates
the change from
8d64c781f0c5fbfdf8016bd1634506ff2ad1376a and only one should be
applied, otherwise some of the Dell quirks are lost.
Thanks to Tony Camuso for catching this.
Acked-by: Tony Camuso <tcamuso@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Jeff Dike [Mon, 7 Jul 2008 17:36:56 +0000 (13:36 -0400)]
[UML] fix gcc ICEs and unresolved externs
There are various constraints on the use of unit-at-a-time:
- i386 uses no-unit-at-a-time for pre-4.0 (not 4.3)
- x86_64 uses unit-at-a-time always
Uli reported a crash on x86_64 with gcc 4.1.2 with unit-at-a-time,
resulting in commit
c0a18111e571138747a98af18b3a2124df56a0d1
Ingo reported a gcc internal error with gcc 4.3 with no-unit-at-a-timem,
resulting in
22eecde2f9034764a3fd095eecfa3adfb8ec9a98
Benny Halevy is seeing extern inlines not resolved with gcc 4.3 with
no-unit-at-a-time
This patch reintroduces unit-at-a-time for gcc >= 4.0, bringing back the
possibility of Uli's crash. If that happens, we'll debug it.
I started seeing both the internal compiler errors and unresolved
inlines on Fedora 9. This patch fixes both problems, without so far
reintroducing the crash reported by Uli.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Cc: Benny Halevy <bhalevy@panasas.com>
Cc: Adrian Bunk <bunk@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 7 Jul 2008 16:24:28 +0000 (09:24 -0700)]
Merge git://git./linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
can: add sanity checks
fs_enet: restore promiscuous and multicast settings in restart()
ibm_newemac: Fixes entry of short packets
ibm_newemac: Fixes kernel crashes when speed of cable connected changes
pasemi_mac: Access iph->tot_len with correct endianness
ehea: Access iph->tot_len with correct endianness
ehea: fix race condition
ehea: add MODULE_DEVICE_TABLE
ehea: fix might sleep problem
forcedeth: fix lockdep warning on ethtool -s
Add missing skb->dev assignment in Frame Relay RX code
bridge: fix use-after-free in br_cleanup_bridges()
tcp: fix a size_t < 0 comparison in tcp_read_sock
tcp: net/ipv4/tcp.c needs linux/scatterlist.h
libertas: support USB persistence on suspend/resume (resend)
iwlwifi: drop skb silently for Tx request in monitor mode
iwlwifi: fix incorrect 5GHz rates reported in monitor mode
Benjamin Herrenschmidt [Mon, 7 Jul 2008 06:39:50 +0000 (16:39 +1000)]
powerpc: Fix unterminated of_device_id array in legacy_serial.c
A recent patch to legacy_serial.c factored out some code by
using the of_match_node() facility to match a node against
an array of possible matches. However, the patch didn't properly
terminate the array causing potential crashes in cases where no
match is found. In addition, the name of the array was poorly
chosen for a static symbol making debugging harder.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 6 Jul 2008 23:43:12 +0000 (16:43 -0700)]
vsprintf: add support for '%pS' and '%pF' pointer formats
They print out a pointer in symbolic format, if possible (ie using
symbolic KALLSYMS information). The '%pS' format is for regular direct
pointers (which can point to data or code and that you find on the stack
during backtraces etc), while '%pF' is for C function pointer types.
On most architectures, the two mean exactly the same thing, but some
architectures use an indirect pointer for C function pointers, where the
function pointer points to a function descriptor (which in turn contains
the actual pointer to the code). The '%pF' code automatically does the
appropriate function descriptor dereference on such architectures.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 6 Jul 2008 23:24:57 +0000 (16:24 -0700)]
vsprintf: add infrastructure support for extended '%p' specifiers
This expands the kernel '%p' handling with an arbitrary alphanumberic
specifier extension string immediately following the '%p'. Right now
it's just being ignored, but the next commit will start adding some
specific pointer type extensions.
NOTE! The reason the extension is appended to the '%p' is to allow
minimal gcc type checking: gcc will still see the '%p' and will check
that the argument passed in is indeed a pointer, and yet will not
complain about the extended information that gcc doesn't understand
about (on the other hand, it also won't actually check that the pointer
type and the extension are compatible).
Alphanumeric characters were chosen because there is no sane existing
use for a string format with a hex pointer representation immediately
followed by alphanumerics (which is what such a format string would have
traditionally resulted in).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 6 Jul 2008 23:16:15 +0000 (16:16 -0700)]
vsprintf: split out '%p' handling logic
The actual code is the same, just split out into a helper function.
This makes it easier to read, and allows for simple future extension
of %p handling.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 6 Jul 2008 23:06:25 +0000 (16:06 -0700)]
vsprintf: split out '%s' handling logic
The actual code is the same, just split out into a helper function.
This makes it easier to read, and allows for future sharing of the
string code.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 6 Jul 2008 18:16:23 +0000 (11:16 -0700)]
Merge branch 'kvm-updates-2.6.26' of git://git./linux/kernel/git/avi/kvm
* 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm:
KVM: IOAPIC: Fix level-triggered irq injection hang
x86: KVM guest: Add memory clobber to hypercalls
Philipp Zabel [Sat, 5 Jul 2008 23:15:34 +0000 (01:15 +0200)]
pxamci: fix byte aligned DMA transfers
The pxa27x DMA controller defaults to 64-bit alignment. This caused
the SCR reads to fail (and, depending on card type, error out) when
card->raw_scr was not aligned on a 8-byte boundary.
For performance reasons all scatter-gather addresses passed to
pxamci_request should be aligned on 8-byte boundaries, but if
this can't be guaranteed, byte aligned DMA transfers in the
have to be enabled in the controller to get correct behaviour.
Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 6 Jul 2008 17:27:25 +0000 (10:27 -0700)]
Revert "USB: don't explicitly reenable root-hub status interrupts"
This reverts commit
e872154921a6b5256a3c412dd69158ac0b135176.
Andrey Borzenkov reports that it resulted in a totally hung machine for
him when loading the OHCI driver. Extensive netconsole capture with
SysRq output shows that modprobe gets stuck in ohci_hub_status_data()
when probing and enabling the OHCI controller, see for example
http://lkml.org/lkml/2008/7/5/236
for an analysis.
The problem appears to be an interrupt flood triggered by the commit
that gets reverted, and Andrey confirmed that the revert makes things
work for him again.
Reported-and-tested-by: Andrey Borzenkov <arvidjaar@mail.ru>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: David Brownell <david-b@pacbell.net>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mark McLoughlin [Fri, 4 Jul 2008 17:23:15 +0000 (18:23 +0100)]
KVM: IOAPIC: Fix level-triggered irq injection hang
The "remote_irr" variable is used to indicate an interrupt
which has been received by the LAPIC, but not acked.
In our EOI handler, we unset remote_irr and re-inject the
interrupt if the interrupt line is still asserted.
However, we do not set remote_irr here, leading to a
situation where if kvm_ioapic_set_irq() is called, then we go
ahead and call ioapic_service(). This means that IRR is
re-asserted even though the interrupt is currently in service
(i.e. LAPIC IRR is cleared and ISR/TMR set)
The issue with this is that when the currently executing
interrupt handler finishes and writes LAPIC EOI, then TMR is
unset and EOI sent to the IOAPIC. Since IRR is now asserted,
but TMR is not, then when the second interrupt is handled,
no EOI is sent and if there is any pending interrupt, it is
not re-injected.
This fixes a hang only seen while running mke2fs -j on an
8Gb virtio disk backed by a fully sparse raw file, with
aliguori "avoid fragmented virtio-blk transfers by copying"
changes.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Anthony Liguori [Thu, 3 Jul 2008 16:02:36 +0000 (19:02 +0300)]
x86: KVM guest: Add memory clobber to hypercalls
Hypercalls can modify arbitrary regions of memory. Make sure to indicate this
in the clobber list. This fixes a hang when using KVM_GUEST kernel built with
GCC 4.3.0.
This was originally spotted and analyzed by Marcelo.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>