J. Bruce Fields [Wed, 26 May 2010 21:46:00 +0000 (17:46 -0400)]
nfsd4: rename nfs4_rpc_args->nfsd4_cb_args
With apologies for the gratuitous rename, the new name seems more
helpful to me.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
J. Bruce Fields [Wed, 26 May 2010 21:40:53 +0000 (17:40 -0400)]
nfsd4: combine nfs4_rpc_args and nfsd4_cb_sequence
These two structs don't really need to be distinct as far as I can tell.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
J. Bruce Fields [Mon, 31 May 2010 23:09:40 +0000 (19:09 -0400)]
nfsd4: minor variable renaming (cb -> conn)
Now that we have both nfsd4_callback and nfsd4_cb_conn structures, I get
confused if variables of both types are always named cb....
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
J. Bruce Fields [Fri, 1 Oct 2010 19:40:01 +0000 (15:40 -0400)]
nfsd4: remove spkm3
Unfortunately, spkm3 never got very far; while interoperability with one
other implementation was demonstrated at some point, problems were found
with the spec that were deemed not worth fixing.
The kernel code is useless on its own without nfs-utils patches which
were never merged into nfs-utils, and were only ever available from
citi.umich.edu. They appear not to have been updated since 2005.
Therefore it seems safe to assume that this code has no users, and never
will.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
NeilBrown [Wed, 22 Sep 2010 02:55:06 +0000 (12:55 +1000)]
sunrpc: fix race in new cache_wait code.
If we set up to wait for a cache item to be filled in, and then find
that it is no longer pending, it could be that some other thread is
in 'cache_revisit_request' and has moved our request to its 'pending' list.
So when our setup_deferral calls cache_revisit_request it will find nothing to
put on the pending list, and do nothing.
We then return from cache_wait_req, thus leaving the 'sleeper'
on-stack structure open to being corrupted by subsequent stack usage.
However that 'sleeper' could still be on the 'pending' list that the
other thread is looking at and so any corruption could cause it to behave badly.
To avoid this race we simply take the same path as if the
'wait_for_completion_interruptible_timeout' was interrupted and if the
sleeper is no longer on the list (which it won't be) we wait on the
completion - which will ensure that any other cache_revisit_request
will have let go of the sleeper.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pavel Emelyanov [Wed, 29 Sep 2010 12:06:57 +0000 (16:06 +0400)]
sunrpc: Create sockets in net namespaces
The context is already known in all the sock_create callers.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pavel Emelyanov [Wed, 29 Sep 2010 12:06:32 +0000 (16:06 +0400)]
net: Export __sock_create
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pavel Emelyanov [Wed, 29 Sep 2010 12:05:43 +0000 (16:05 +0400)]
sunrpc: Tag rpc_xprt with net
The net is known from the xprt_create and this tagging will also
give un the context in the conntection workers where real sockets
are created.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pavel Emelyanov [Wed, 29 Sep 2010 12:05:12 +0000 (16:05 +0400)]
sunrpc: Add net to xprt_create
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pavel Emelyanov [Wed, 29 Sep 2010 12:04:45 +0000 (16:04 +0400)]
sunrpc: Add net to rpc_create_args
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pavel Emelyanov [Wed, 29 Sep 2010 12:04:18 +0000 (16:04 +0400)]
sunrpc: Pull net argument downto svc_create_socket
After this the socket creation in it knows the context.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pavel Emelyanov [Wed, 29 Sep 2010 12:03:50 +0000 (16:03 +0400)]
sunrpc: Add net argument to svc_create_xprt
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pavel Emelyanov [Wed, 29 Sep 2010 12:03:13 +0000 (16:03 +0400)]
sunrpc: Factor out rpc_xprt freeing
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pavel Emelyanov [Wed, 29 Sep 2010 12:02:43 +0000 (16:02 +0400)]
sunrpc: Factor out rpc_xprt allocation
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Benny Halevy [Thu, 30 Sep 2010 18:47:46 +0000 (20:47 +0200)]
nfsd4: adjust buflen for encoded attrs bitmap based on actual bitmap length
The existing code adjusted it based on the worst case scenario for the returned
bitmap and the best case scenario for the supported attrs attribute.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[bfields@redhat.com: removed likely/unlikely's]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Stephen Rothwell [Wed, 29 Sep 2010 04:16:57 +0000 (14:16 +1000)]
sunrpc: fix up rpcauth_remove_module section mismatch
On Wed, 29 Sep 2010 14:02:38 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> After merging the final tree, today's linux-next build (powerpc
> ppc44x_defconfig) produced tis warning:
>
> WARNING: net/sunrpc/sunrpc.o(.init.text+0x110): Section mismatch in reference from the function init_sunrpc() to the function .exit.text:rpcauth_remove_module()
> The function __init init_sunrpc() references
> a function __exit rpcauth_remove_module().
> This is often seen when error handling in the init function
> uses functionality in the exit path.
> The fix is often to remove the __exit annotation of
> rpcauth_remove_module() so it may be used outside an exit section.
>
> Probably caused by commit
2f72c9b73730c335381b13e2bd221abe1acea394
> ("sunrpc: The per-net skeleton").
This actually causes a build failure on a sparc32 defconfig build:
`rpcauth_remove_module' referenced in section `.init.text' of net/built-in.o: defined in discarded section `.exit.text' of net/built-in.o
I applied the following patch for today:
Fixes:
`rpcauth_remove_module' referenced in section `.init.text' of net/built-in.o: defined in discarded section `.exit.text' of net/built-in.o
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pavel Emelyanov [Mon, 27 Sep 2010 10:02:29 +0000 (14:02 +0400)]
sunrpc: Make the ip_map_cache be per-net
Everything that is required for that already exists:
* the per-net cache registration with respective proc entries
* the context (struct net) is available in all the users
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pavel Emelyanov [Mon, 27 Sep 2010 10:01:58 +0000 (14:01 +0400)]
sunrpc: Make the /proc/net/rpc appear in net namespaces
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pavel Emelyanov [Mon, 27 Sep 2010 10:01:27 +0000 (14:01 +0400)]
sunrpc: The per-net skeleton
Register empty per-net operations for the sunrpc layer.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pavel Emelyanov [Mon, 27 Sep 2010 10:00:49 +0000 (14:00 +0400)]
sunrpc: Tag svc_xprt with net
The transport representation should be per-net of course.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pavel Emelyanov [Mon, 27 Sep 2010 10:00:15 +0000 (14:00 +0400)]
sunrpc: Add routines that allow registering per-net caches
Existing calls do the same, but for the init_net.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pavel Emelyanov [Mon, 27 Sep 2010 09:59:48 +0000 (13:59 +0400)]
sunrpc: Add net to pure API calls
There are two calls that operate on ip_map_cache and are
directly called from the nfsd code. Other places will be
handled in a different way.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pavel Emelyanov [Mon, 27 Sep 2010 09:59:13 +0000 (13:59 +0400)]
sunrpc: Pass xprt to cached get/put routines
They do not require the rqst actually and having the xprt simplifies
further patching.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pavel Emelyanov [Mon, 27 Sep 2010 09:58:42 +0000 (13:58 +0400)]
sunrpc: Make xprt auth cache release work with the xprt
This is done in order to facilitate getting the ip_map_cache from
which to put the ip_map.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pavel Emelyanov [Mon, 27 Sep 2010 09:57:36 +0000 (13:57 +0400)]
sunrpc: Pass the ip_map_parse's cd to lower calls
The target is to have many ip_map_cache-s in the system. This particular
patch handles its usage by the ip_map_parse callback.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Fri, 24 Sep 2010 21:43:59 +0000 (17:43 -0400)]
nfsd: fix /proc/net/rpc/nfsd.export/content display
Note with "first" always 0, and "lastflags" initially 0, we always dump
a spurious set of 0 flags at the start, among other problems.
Fix. And attempt to make the code a little more obvious.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pavel Emelyanov [Thu, 23 Sep 2010 14:26:58 +0000 (18:26 +0400)]
nfsd: Export get_task_comm for nfsd
The git://linux-nfs.org/~bfields/linux.git nfsd-next branch doesn't
compile when nfsd is a module with the following error:
ERROR: "get_task_comm" [fs/nfsd/nfsd.ko] undefined!
Replace the get_task_comm call with direct comm access, which is
safe for current.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
NeilBrown [Wed, 22 Sep 2010 02:55:07 +0000 (12:55 +1000)]
nfsd: allow deprecated interface to be compiled out.
Add CONFIG_NFSD_DEPRECATED, default to y.
Only include deprecated interface if this is defined.
This allows distros to remove this interface before the official
removal, and allows developers to test without it.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
NeilBrown [Wed, 22 Sep 2010 02:55:07 +0000 (12:55 +1000)]
nfsd: formally deprecate legacy nfsd syscall interface
The syscall interface is has been replaced by a more flexible
interface since 2.6.0. It is time to work towards discarding
the old interface.
So add a entry in feature-removal-schedule.txt and print a warning
when the interface is used.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
NeilBrown [Wed, 22 Sep 2010 02:55:06 +0000 (12:55 +1000)]
sunrpc/cache: fix recent breakage of cache_clean_deferred
commit
6610f720e9e8103c22d1f1ccf8fbb695550a571f
broke cache_clean_deferred as entries are no longer added to the
pending list for subsequent revisiting.
So put those requests back on the pending list.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Bryan Schumaker [Tue, 21 Sep 2010 20:38:12 +0000 (16:38 -0400)]
lockd: Mostly remove BKL from the server
This patch removes all but one call to lock_kernel() from the server.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Andy Shevchenko [Tue, 21 Sep 2010 06:40:25 +0000 (09:40 +0300)]
sunrpc/cache: don't use custom hex_to_bin() converter
Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: linux-nfs@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
NeilBrown [Thu, 12 Aug 2010 07:04:08 +0000 (17:04 +1000)]
sunrpc/cache: change deferred-request hash table to use hlist.
Being a hash table, hlist is the best option.
There is currently some ugliness were we treat "->next == NULL" as
a special case to avoid having to initialise the whole array.
This change nicely gets rid of that case.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
NeilBrown [Thu, 12 Aug 2010 07:04:07 +0000 (17:04 +1000)]
svcauth_gss: replace a trivial 'switch' with an 'if'
Code like:
switch(xxx) {
case -error1:
case -error2:
..
return;
case 0:
stuff;
}
can more naturally be written:
if (xxx < 0)
return;
stuff;
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
NeilBrown [Thu, 12 Aug 2010 07:04:06 +0000 (17:04 +1000)]
nfsd/idmap: drop special request deferal in favour of improved default.
The idmap code manages request deferal by waiting for a reply from
userspace rather than putting the NFS request on a queue to be retried
from the start.
Now that the common deferal code does this there is no need for the
special code in idmap.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
NeilBrown [Thu, 12 Aug 2010 07:04:07 +0000 (17:04 +1000)]
nfsd: disable deferral for NFSv4
Now that a slight delay in getting a reply to an upcall doesn't
require deferring of requests, request deferral for all NFSv4
requests - the concept doesn't really fit with the v4 model.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
NeilBrown [Thu, 12 Aug 2010 07:04:07 +0000 (17:04 +1000)]
sunrpc: close connection when a request is irretrievably lost.
If we drop a request in the sunrpc layer, either due kmalloc failure,
or due to a cache miss when we could not queue the request for later
replay, then close the connection to encourage the client to retry sooner.
Note that if the drop happens in the NFS layer, NFSERR_JUKEBOX
(aka NFS4ERR_DELAY) is returned to guide the client concerning
replay.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Mon, 20 Sep 2010 02:55:06 +0000 (22:55 -0400)]
nfsd4: fix hang on fast-booting nfs servers
The last_close field of a cache_detail is initialized to zero, so the
condition
detail->last_close < seconds_since_boot() - 30
may be false even for a cache that was never opened.
However, we want to immediately fail upcalls to caches that were never
opened: in the case of the auth_unix_gid cache, especially, which may
never be opened by mountd (if the --manage-gids option is not set), we
want to fail the upcall immediately. Otherwise client requests will be
dropped unnecessarily on reboot.
Also document these conditions.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Mon, 20 Sep 2010 03:48:00 +0000 (23:48 -0400)]
Merge remote branch 'trond/bugfixes' into for-2.6.37
Without some client-side fixes, server testing is currently difficult.
Trond Myklebust [Sun, 12 Sep 2010 23:57:50 +0000 (19:57 -0400)]
SUNRPC: Fix the NFSv4 and RPCSEC_GSS Kconfig dependencies
The NFSv4 client's callback server calls svc_gss_principal(), which
is defined in the auth_rpcgss.ko
The NFSv4 server has the same dependency, and in addition calls
svcauth_gss_flavor(), gss_mech_get_by_pseudoflavor(),
gss_pseudoflavor_to_service() and gss_mech_put() from the same module.
The module auth_rpcgss itself has no dependencies aside from sunrpc,
so we only need to select RPCSEC_GSS.
Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Menyhart Zoltan [Sun, 12 Sep 2010 23:55:26 +0000 (19:55 -0400)]
statfs() gives ESTALE error
Hi,
An NFS client executes a statfs("file", &buff) call.
"file" exists / existed, the client has read / written it,
but it has already closed it.
user_path(pathname, &path) looks up "file" successfully in the
directory-cache and restarts the aging timer of the directory-entry.
Even if "file" has already been removed from the server, because the
lookupcache=positive option I use, keeps the entries valid for a while.
nfs_statfs() returns ESTALE if "file" has already been removed from the
server.
If the user application repeats the statfs("file", &buff) call, we
are stuck: "file" remains young forever in the directory-cache.
Signed-off-by: Zoltan Menyhart <Zoltan.Menyhart@bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
Trond Myklebust [Sun, 12 Sep 2010 23:55:26 +0000 (19:55 -0400)]
NFS: Fix a typo in nfs_sockaddr_match_ipaddr6
Reported-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
Miquel van Smoorenburg [Sun, 12 Sep 2010 23:55:26 +0000 (19:55 -0400)]
sunrpc: increase MAX_HASHTABLE_BITS to 14
The maximum size of the authcache is now set to 1024 (10 bits),
but on our server we need at least 4096 (12 bits). Increase
MAX_HASHTABLE_BITS to 14. This is a maximum of 16384 entries,
each containing a pointer (8 bytes on x86_64). This is
exactly the limit of kmalloc() (128K).
Signed-off-by: Miquel van Smoorenburg <mikevs@xs4all.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Bian Naimeng [Sun, 12 Sep 2010 23:55:26 +0000 (19:55 -0400)]
gss:spkm3 miss returning error to caller when import security context
spkm3 miss returning error to up layer when import security context,
it may be return ok though it has failed to import security context.
Signed-off-by: Bian Naimeng <biannm@cn.fujitsu.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Bian Naimeng [Sun, 12 Sep 2010 23:55:25 +0000 (19:55 -0400)]
gss:krb5 miss returning error to caller when import security context
krb5 miss returning error to up layer when import security context,
it may be return ok though it has failed to import security context.
Signed-off-by: Bian Naimeng <biannm@cn.fujitsu.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Fabio Olive Leite [Sun, 12 Sep 2010 23:55:25 +0000 (19:55 -0400)]
Remove incorrect do_vfs_lock message
The do_vfs_lock function on fs/nfs/file.c is only called if NLM is
not being used, via the -onolock mount option. Therefore it cannot
really be "out of sync with lock manager" when the local locking
function called returns an error, as there will be no corresponding
call to the NLM. For details, simply check the if/else on do_setlk
and do_unlk on fs/nfs/file.c.
Signed-Off-By: Fabio Olive Leite <fleite@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
J. Bruce Fields [Sun, 12 Sep 2010 23:55:25 +0000 (19:55 -0400)]
SUNRPC: cleanup state-machine ordering
This is just a minor cleanup: net/sunrpc/clnt.c clarifies the rpc client
state machine by commenting each state and by laying out the functions
implementing each state in the order that each state is normally
executed (in the absence of errors).
The previous patch "Fix null dereference in call_allocate" changed the
order of the states. Move the functions and update the comments to
reflect the change.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Sun, 12 Sep 2010 23:55:25 +0000 (19:55 -0400)]
SUNRPC: Fix a race in rpc_info_open
There is a race between rpc_info_open and rpc_release_client()
in that nothing stops a process from opening the file after
the clnt->cl_kref goes to zero.
Fix this by using atomic_inc_unless_zero()...
Reported-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
Trond Myklebust [Sun, 12 Sep 2010 23:55:25 +0000 (19:55 -0400)]
SUNRPC: Fix race corrupting rpc upcall
If rpc_queue_upcall() adds a new upcall to the rpci->pipe list just
after rpc_pipe_release calls rpc_purge_list(), but before it calls
gss_pipe_release (as rpci->ops->release_pipe(inode)), then the latter
will free a message without deleting it from the rpci->pipe list.
We will be left with a freed object on the rpc->pipe list. Most
frequent symptoms are kernel crashes in rpc.gssd system calls on the
pipe in question.
Reported-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
J. Bruce Fields [Sun, 12 Sep 2010 23:55:25 +0000 (19:55 -0400)]
Fix null dereference in call_allocate
In call_allocate we need to reach the auth in order to factor au_cslack
into the allocation.
As of
a17c2153d2e271b0cbacae9bed83b0eaa41db7e1 "SUNRPC: Move the bound
cred to struct rpc_rqst", call_allocate attempts to do this by
dereferencing tk_client->cl_auth, however this is not guaranteed to be
defined--cl_auth can be zero in the case of gss context destruction (see
rpc_free_auth).
Reorder the client state machine to bind credentials before allocating,
so that we can instead reach the auth through the cred.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
Linus Torvalds [Sun, 12 Sep 2010 23:07:37 +0000 (16:07 -0700)]
Linux 2.6.36-rc4
Randy Dunlap [Sat, 11 Sep 2010 22:55:26 +0000 (15:55 -0700)]
docbook: skip files with no docs since they generate scary warnings
Fix docbook templates that reference files that do not contain the
expected kernel-doc notation.
Fixes these warnings:
Warning(arch/x86/include/asm/unaligned.h): no structured comments found
Warning(lib/vsprintf.c): no structured comments found
These cause errors in the generated html output, like below, so drop
these lines.
Name
arch/x86/include/asm/unaligned.h - Document generation inconsistency
Oops
Warning
The template for this document tried to insert the structured comment from the file arch/x86/include/asm/unaligned.h at this point, but none was found. This dummy section is inserted to allow generation to continue.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Berg [Sat, 11 Sep 2010 22:55:22 +0000 (15:55 -0700)]
docbook: warn on unused doc entries
When you don't use !E or !I but only !F, then it's very easy to miss
including some functions, structs etc. in documentation. To help
finding which ones were missed, allow printing out the unused ones as
warnings.
For example, using this on mac80211 yields a lot of warnings like this:
Warning: didn't use docs for DOC: mac80211 workqueue
Warning: didn't use docs for ieee80211_max_queues
Warning: didn't use docs for ieee80211_bss_change
Warning: didn't use docs for ieee80211_bss_conf
when generating the documentation for it.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Berg [Sat, 11 Sep 2010 22:55:12 +0000 (15:55 -0700)]
kernel-doc: ignore case when stripping attributes
There are valid attributes that could have upper case letters, but we
still want to remove, like for example
__attribute__((aligned(NETDEV_ALIGN)))
as encountered in the wireless code.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 11 Sep 2010 22:50:53 +0000 (15:50 -0700)]
Merge branch 'pm-fixes' of git://git./linux/kernel/git/rafael/suspend-2.6
* 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
PM / Hibernate: Avoid hitting OOM during preallocation of memory
PM QoS: Correct pr_debug() misuse and improve parameter checks
PM: Prevent waiting forever on asynchronous resume after failing suspend
Linus Torvalds [Sat, 11 Sep 2010 19:17:02 +0000 (12:17 -0700)]
Merge git://git./linux/kernel/git/jejb/scsi-rc-fixes-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
[SCSI] fix use-after-free in scsi_init_io()
[SCSI] sd: fix medium-removal bug
[SCSI] qla2xxx: Update version number to 8.03.04-k0.
[SCSI] qla2xxx: Check for empty slot in request queue before posting Command type 6 request.
[SCSI] qla2xxx: Cover UNDERRUN case where SCSI status is set.
[SCSI] qla2xxx: Correctly set fw hung and complete only waiting mbx.
[SCSI] qla2xxx: Reset seconds_since_last_heartbeat correctly.
[SCSI] qla2xxx: make rport deletions explicit during vport removal
[SCSI] qla2xxx: Fix vport delete issues
[SCSI] sd, sym53c8xx: Remove warnings after vsprintf %pV introducation.
[SCSI] Fix warning: zero-length gnu_printf format string
[SCSI] hpsa: disable doorbell reset on reset_devices
[SCSI] be2iscsi: Fix for Login failure
[SCSI] fix bio.bi_rw handling
Rafael J. Wysocki [Sat, 11 Sep 2010 18:58:27 +0000 (20:58 +0200)]
PM / Hibernate: Avoid hitting OOM during preallocation of memory
There is a problem in hibernate_preallocate_memory() that it calls
preallocate_image_memory() with an argument that may be greater than
the total number of available non-highmem memory pages. If that's
the case, the OOM condition is guaranteed to trigger, which in turn
can cause significant slowdown to occur during hibernation.
To avoid that, make preallocate_image_memory() adjust its argument
before calling preallocate_image_pages(), so that the total number of
saveable non-highem pages left is not less than the minimum size of
a hibernation image. Change hibernate_preallocate_memory() to try to
allocate from highmem if the number of pages allocated by
preallocate_image_memory() is too low.
Modify free_unnecessary_pages() to take all possible memory
allocation patterns into account.
Reported-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Tested-by: M. Vefa Bicakci <bicave@superonline.com>
Linus Torvalds [Sat, 11 Sep 2010 15:06:38 +0000 (08:06 -0700)]
Merge git://git./linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (28 commits)
ipheth: remove incorrect devtype to WWAN
MAINTAINERS: Add CAIF
sctp: fix test for end of loop
KS8851: Correct RX packet allocation
udp: add rehash on connect()
net: blackhole route should always be recalculated
ipv4: Suppress lockdep-RCU false positive in FIB trie (3)
niu: Fix kernel buffer overflow for ETHTOOL_GRXCLSRLALL
ipvs: fix active FTP
gro: Re-fix different skb headrooms
via-velocity: Turn scatter-gather support back off.
ipv4: Fix reverse path filtering with multipath routing.
UNIX: Do not loop forever at unix_autobind().
PATCH: b44 Handle RX FIFO overflow better (simplified)
irda: off by one
3c59x: Fix deadlock in vortex_error()
netfilter: discard overlapping IPv6 fragment
ipv6: discard overlapping fragment
net: fix tx queue selection for bridged devices implementing select_queue
bonding: Fix jiffies overflow problems (again)
...
Fix up trivial conflicts due to the same cgroup API thinko fix going
through both Andrew and the networking tree. However, there were small
differences between the two, with Andrew's version generally being the
nicer one, and the one I merged first. So pick that one.
Conflicts in: include/linux/cgroup.h and kernel/cgroup.c
Linus Torvalds [Sat, 11 Sep 2010 15:01:09 +0000 (08:01 -0700)]
Merge git://git./linux/kernel/git/davem/sparc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
sparc: Kill all BKL usage.
Linus Torvalds [Sat, 11 Sep 2010 14:59:49 +0000 (07:59 -0700)]
Merge branch 'sched-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, tsc: Fix a preemption leak in restore_sched_clock_state()
sched: Move sched_avg_update() to update_cpu_load()
Peter Zijlstra [Fri, 10 Sep 2010 20:32:53 +0000 (22:32 +0200)]
x86, tsc: Fix a preemption leak in restore_sched_clock_state()
Doh, a real life genuine preemption leak..
This caused a suspend failure.
Reported-bisected-and-tested-by-the-invaluable: Jeff Chua <jeff.chua.linux@gmail.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Nico Schottelius <nico-linux-20100709@schottelius.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Florian Pritz <flo@xssn.at>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: <stable@kernel.org> # Greg, please apply after: cd7240c ("x86, tsc, sched: Recompute cyc2ns_offset's during resume from")
sleep states
LKML-Reference: <
1284150773.402.122.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Linus Torvalds [Sat, 11 Sep 2010 01:19:43 +0000 (18:19 -0700)]
Merge branch 'drm-intel-fixes' of git://git./linux/kernel/git/ickle/drm-intel
* 'drm-intel-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ickle/drm-intel:
drm/i915: don't enable self-refresh on Ironlake
drm/i915: Double check that the wait_request is not pending before warning
Revert "drm/i915: Warn if we run out of FIFO space for a mode"
Revert "drm/i915: Allow LVDS on pipe A on gen4+"
Revert "drm/i915: Enable RC6 on Ironlake."
Linus Torvalds [Sat, 11 Sep 2010 01:19:26 +0000 (18:19 -0700)]
Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: log IO completion workqueue is a high priority queue
xfs: prevent reading uninitialized stack memory
Peter Zijlstra [Fri, 10 Sep 2010 20:32:53 +0000 (22:32 +0200)]
x86, tsc: Fix a preemption leak in restore_sched_clock_state()
A real life genuine preemption leak..
Reported-and-tested-by: Jeff Chua <jeff.chua.linux@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mark gross [Thu, 9 Sep 2010 21:20:09 +0000 (23:20 +0200)]
PM QoS: Correct pr_debug() misuse and improve parameter checks
Correct some pr_debug() misuse and add a stronger parameter check to
pm_qos_write() for the ASCII hex value case. Thanks to Dan Carpenter
for pointing out the problem!
Signed-off-by: mark gross <markgross@thegnar.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Dave Chinner [Wed, 8 Sep 2010 09:00:22 +0000 (09:00 +0000)]
xfs: log IO completion workqueue is a high priority queue
The workqueue implementation in 2.6.36-rcX has changed, resulting
in the workqueues no longer having dedicated threads for work
processing. This has caused severe livelocks under heavy parallel
create workloads because the log IO completions have been getting
held up behind metadata IO completions. Hence log commits would
stall, memory allocation would stall because pages could not be
cleaned, and lock contention on the AIL during inode IO completion
processing was being seen to slow everything down even further.
By making the log Io completion workqueue a high priority workqueue,
they are queued ahead of all data/metadata IO completions and
processed before the data/metadata completions. Hence the log never
gets stalled, and operations needed to clean memory can continue as
quickly as possible. This avoids the livelock conditions and allos
the system to keep running under heavy load as per normal.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Roland McGrath [Wed, 8 Sep 2010 02:37:06 +0000 (19:37 -0700)]
execve: make responsive to SIGKILL with large arguments
An execve with a very large total of argument/environment strings
can take a really long time in the execve system call. It runs
uninterruptibly to count and copy all the strings. This change
makes it abort the exec quickly if sent a SIGKILL.
Note that this is the conservative change, to interrupt only for
SIGKILL, by using fatal_signal_pending(). It would be perfectly
correct semantics to let any signal interrupt the string-copying in
execve, i.e. use signal_pending() instead of fatal_signal_pending().
We'll save that change for later, since it could have user-visible
consequences, such as having a timer set too quickly make it so that
an execve can never complete, though it always happened to work before.
Signed-off-by: Roland McGrath <roland@redhat.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roland McGrath [Wed, 8 Sep 2010 02:36:28 +0000 (19:36 -0700)]
execve: improve interactivity with large arguments
This adds a preemption point during the copying of the argument and
environment strings for execve, in copy_strings(). There is already
a preemption point in the count() loop, so this doesn't add any new
points in the abstract sense.
When the total argument+environment strings are very large, the time
spent copying them can be much more than a normal user time slice.
So this change improves the interactivity of the rest of the system
when one process is doing an execve with very large arguments.
Signed-off-by: Roland McGrath <roland@redhat.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roland McGrath [Wed, 8 Sep 2010 02:35:49 +0000 (19:35 -0700)]
setup_arg_pages: diagnose excessive argument size
The CONFIG_STACK_GROWSDOWN variant of setup_arg_pages() does not
check the size of the argument/environment area on the stack.
When it is unworkably large, shift_arg_pages() hits its BUG_ON.
This is exploitable with a very large RLIMIT_STACK limit, to
create a crash pretty easily.
Check that the initial stack is not too large to make it possible
to map in any executable. We're not checking that the actual
executable (or intepreter, for binfmt_elf) will fit. So those
mappings might clobber part of the initial stack mapping. But
that is just userland lossage that userland made happen, not a
kernel problem.
Signed-off-by: Roland McGrath <roland@redhat.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 10 Sep 2010 15:02:45 +0000 (08:02 -0700)]
Merge branch 'kvm-updates/2.6.36' of git://git./virt/kvm/kvm
* 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: Perform hardware_enable in CPU_STARTING callback
KVM: i8259: fix migration
KVM: fix i8259 oops when no vcpus are online
KVM: x86 emulator: fix regression with cmpxchg8b on i386 hosts
Linus Torvalds [Fri, 10 Sep 2010 14:31:24 +0000 (07:31 -0700)]
Merge branch 'perf-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
tracing: t_start: reset FTRACE_ITER_HASH in case of seek/pread
perf symbols: Fix multiple initialization of symbol system
perf: Fix CPU hotplug
perf, trace: Fix module leak
tracing/kprobe: Fix handling of C-unlike argument names
tracing/kprobes: Fix handling of argument names
perf probe: Fix handling of arguments names
perf probe: Fix return probe support
tracing/kprobe: Fix a memory leak in error case
tracing: Do not allow llseek to set_ftrace_filter
David Howells [Fri, 10 Sep 2010 08:59:51 +0000 (09:59 +0100)]
KEYS: Fix bug in keyctl_session_to_parent() if parent has no session keyring
Fix a bug in keyctl_session_to_parent() whereby it tries to check the ownership
of the parent process's session keyring whether or not the parent has a session
keyring [CVE-2010-2960].
This results in the following oops:
BUG: unable to handle kernel NULL pointer dereference at
00000000000000a0
IP: [<
ffffffff811ae4dd>] keyctl_session_to_parent+0x251/0x443
...
Call Trace:
[<
ffffffff811ae2f3>] ? keyctl_session_to_parent+0x67/0x443
[<
ffffffff8109d286>] ? __do_fault+0x24b/0x3d0
[<
ffffffff811af98c>] sys_keyctl+0xb4/0xb8
[<
ffffffff81001eab>] system_call_fastpath+0x16/0x1b
if the parent process has no session keyring.
If the system is using pam_keyinit then it mostly protected against this as all
processes derived from a login will have inherited the session keyring created
by pam_keyinit during the log in procedure.
To test this, pam_keyinit calls need to be commented out in /etc/pam.d/.
Reported-by: Tavis Ormandy <taviso@cmpxchg8b.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Tavis Ormandy <taviso@cmpxchg8b.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Fri, 10 Sep 2010 08:59:46 +0000 (09:59 +0100)]
KEYS: Fix RCU no-lock warning in keyctl_session_to_parent()
There's an protected access to the parent process's credentials in the middle
of keyctl_session_to_parent(). This results in the following RCU warning:
===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
security/keys/keyctl.c:1291 invoked rcu_dereference_check() without protection!
other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 0
1 lock held by keyctl-session-/2137:
#0: (tasklist_lock){.+.+..}, at: [<
ffffffff811ae2ec>] keyctl_session_to_parent+0x60/0x236
stack backtrace:
Pid: 2137, comm: keyctl-session- Not tainted 2.6.36-rc2-cachefs+ #1
Call Trace:
[<
ffffffff8105606a>] lockdep_rcu_dereference+0xaa/0xb3
[<
ffffffff811ae379>] keyctl_session_to_parent+0xed/0x236
[<
ffffffff811af77e>] sys_keyctl+0xb4/0xb6
[<
ffffffff81001eab>] system_call_fastpath+0x16/0x1b
The code should take the RCU read lock to make sure the parents credentials
don't go away, even though it's holding a spinlock and has IRQ disabled.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 10 Sep 2010 14:26:27 +0000 (07:26 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
block: Range check cpu in blk_cpu_to_group
scatterlist: prevent invalid free when alloc fails
writeback: Fix lost wake-up shutting down writeback thread
writeback: do not lose wakeup events when forking bdi threads
cciss: fix reporting of max queue depth since init
block: switch s390 tape_block and mg_disk to elevator_change()
block: add function call to switch the IO scheduler from a driver
fs/bio-integrity.c: return -ENOMEM on kmalloc failure
bio-integrity.c: remove dependency on __GFP_NOFAIL
BLOCK: fix bio.bi_rw handling
block: put dev->kobj in blk_register_queue fail path
cciss: handle allocation failure
cfq-iosched: Documentation help for new tunables
cfq-iosched: blktrace print per slice sector stats
cfq-iosched: Implement tunable group_idle
cfq-iosched: Do group share accounting in IOPS when slice_idle=0
cfq-iosched: Do not idle if slice_idle=0
cciss: disable doorbell reset on reset_devices
blkio: Fix return code for mkdir calls
Linus Torvalds [Fri, 10 Sep 2010 14:24:51 +0000 (07:24 -0700)]
Merge branch 'at91-fixes-for-linus' of git://github.com/at91linux/linux-2.6-at91
* 'at91-fixes-for-linus' of git://github.com/at91linux/linux-2.6-at91:
AT91: at91sam9261ek: remove C99 comments but keep information
AT91: at91sam9261ek board: remove warnings related to use of SPI or SD/MMC
AT91: dm9000 initialization update
AT91: SAM9G45 - add a separate clock entry for every single TC block
AT91: clock: peripheral clocks can have other parent than mck
AT91: change dma resource index
Linus Torvalds [Fri, 10 Sep 2010 14:23:45 +0000 (07:23 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/tiwai/sound-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
ALSA: rawmidi: fix the get next midi device ioctl
ALSA: hda - Fix wrong HP pin detection in snd_hda_parse_pin_def_config()
ALSA: seq/oss - Fix double-free at error path of snd_seq_oss_open()
ALSA: msnd-classic: Fix invalid cfg parameter
ALSA: hda - Enable PC-beep for EeePC with ALC269 codec
ALSA: hda - Add errata initverb sequence for CS42xx codecs
ALSA: usb - Release capture substream URBs properly
ALSA: virtuoso: fix setting of Xonar DS line-in/mic-in controls
ALSA: virtuoso: work around missing reset in the Xonar DS Windows driver
ALSA: hda - Add quirk for Lenovo T400s
ALSA: usb-audio: fix detection of vendor-specific device protocol settings
ALSA: usb-audio: Assume first control interface is for audio
ALSA: hda - Add a new hp-laptop model for Conexant 5066, tested on HP G60
Jesse Barnes [Thu, 9 Sep 2010 18:58:02 +0000 (11:58 -0700)]
drm/i915: don't enable self-refresh on Ironlake
We don't know how to enable it safely, especially as outputs turn on and
off. When disabling LP1 we also need to make sure LP2 and 3 are already
disabled.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=29173
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=29082
Reported-by: Chris Lord <chris@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Tested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: stable@kernel.org
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Dan Rosenberg [Mon, 6 Sep 2010 22:24:57 +0000 (18:24 -0400)]
xfs: prevent reading uninitialized stack memory
The XFS_IOC_FSGETXATTR ioctl allows unprivileged users to read 12
bytes of uninitialized stack memory, because the fsxattr struct
declared on the stack in xfs_ioc_fsgetxattr() does not alter (or zero)
the 12-byte fsx_pad member before copying it back to the user. This
patch takes care of it.
Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Nicolas Ferre [Fri, 10 Sep 2010 12:36:06 +0000 (14:36 +0200)]
AT91: at91sam9261ek: remove C99 comments but keep information
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Nicolas Ferre [Fri, 10 Sep 2010 09:26:42 +0000 (11:26 +0200)]
AT91: at91sam9261ek board: remove warnings related to use of SPI or SD/MMC
The sd/mmc data structure is not used if SPI is selected. The configuration
of PIO on the board prevent from using both interfaces at the same time
(board dependent).
Remove the warnings at compilation time adding a preprocessor condition.
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Nicolas Ferre [Fri, 10 Sep 2010 09:38:43 +0000 (11:38 +0200)]
AT91: dm9000 initialization update
Add information in dm9000 mac/phy chip initialization:
- irq resource details
- platform data details
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Brian King [Fri, 10 Sep 2010 07:03:21 +0000 (09:03 +0200)]
block: Range check cpu in blk_cpu_to_group
While testing CPU DLPAR, the following problem was discovered.
We were DLPAR removing the first CPU, which in this case was
logical CPUs 0-3. CPUs 0-2 were already marked offline and
we were in the process of offlining CPU 3. After marking
the CPU inactive and offline in cpu_disable, but before the
cpu was completely idle (cpu_die), we ended up in __make_request
on CPU 3. There we looked at the topology map to see which CPU
to complete the I/O on and found no CPUs in the cpu_sibling_map.
This resulted in the block layer setting the completion cpu
to be NR_CPUS, which then caused an oops when we tried to
complete the I/O.
Fix this by sanity checking the value we return from blk_cpu_to_group
to be a valid cpu value.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Takashi Iwai [Fri, 10 Sep 2010 06:27:00 +0000 (08:27 +0200)]
Merge branch 'fix/hda' into for-linus
Ingo Molnar [Fri, 10 Sep 2010 06:05:34 +0000 (08:05 +0200)]
Merge branch 'tip/perf/urgent' of git://git./linux/kernel/git/rostedt/linux-2.6-trace into perf/urgent
Ingo Molnar [Fri, 10 Sep 2010 05:34:14 +0000 (07:34 +0200)]
Merge branch 'perf/urgent' of git://git./linux/kernel/git/acme/linux-2.6 into perf/urgent
David S. Miller [Fri, 10 Sep 2010 04:59:51 +0000 (21:59 -0700)]
Merge branch 'vhost-net' of git://git./linux/kernel/git/mst/vhost
Dan Williams [Wed, 8 Sep 2010 07:50:47 +0000 (07:50 +0000)]
ipheth: remove incorrect devtype to WWAN
The 'wwan' devtype is meant for devices that require preconfiguration
and *every* time setup before the ethernet interface can be used, like
cellular modems which require a series of setup commands on serial ports
or other mechanisms before the ethernet interface will handle packets.
As ipheth only requires one-per-hotplug pairing setup with no
preconfiguration (like APN, phone #, etc) and the network interface is
usable at any time after that initial setup, remove the incorrect
devtype wwan.
Signed-off-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Tue, 7 Sep 2010 20:33:24 +0000 (20:33 +0000)]
MAINTAINERS: Add CAIF
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 10 Sep 2010 03:28:19 +0000 (20:28 -0700)]
Merge branch 'upstream-linus' of git://git./linux/kernel/git/jgarzik/libata-dev
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
libata-sff: Reenable Port Multiplier after libata-sff remodeling.
libata: skip EH autopsy and recovery during suspend
ahci: AHCI and RAID mode SATA patch for Intel Patsburg DeviceIDs
ata_piix: IDE Mode SATA patch for Intel Patsburg DeviceIDs
libata,pata_via: revert ata_wait_idle() removal from ata_sff/via_tf_load()
ahci: fix hang on failed softreset
pata_artop: Fix device ID parity check
Chris Wright [Thu, 9 Sep 2010 23:34:59 +0000 (16:34 -0700)]
tracing: t_start: reset FTRACE_ITER_HASH in case of seek/pread
Be sure to avoid entering t_show() with FTRACE_ITER_HASH set without
having properly started the iterator to iterate the hash. This case is
degenerate and, as discovered by Robert Swiecki, can cause t_hash_show()
to misuse a pointer. This causes a NULL ptr deref with possible security
implications. Tracked as CVE-2010-3079.
Cc: Robert Swiecki <swiecki@google.com>
Cc: Eugene Teo <eugene@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Gwendal Grignou [Tue, 31 Aug 2010 23:20:36 +0000 (16:20 -0700)]
libata-sff: Reenable Port Multiplier after libata-sff remodeling.
Keep track of the link on the which the current request is in progress.
It allows support of links behind port multiplier.
Not all libata-sff is PMP compliant. Code for native BMDMA controller
does not take in accound PMP.
Tested on Marvell 7042 and Sil7526.
Signed-off-by: Gwendal Grignou <gwendal@google.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Tejun Heo [Tue, 7 Sep 2010 12:05:31 +0000 (14:05 +0200)]
libata: skip EH autopsy and recovery during suspend
For some mysterious reason, certain hardware reacts badly to usual EH
actions while the system is going for suspend. As the devices won't
be needed until the system is resumed, ask EH to skip usual autopsy
and recovery and proceed directly to suspend.
Signed-off-by: Tejun Heo <tj@kernel.org>
Tested-by: Stephan Diestelhorst <stephan.diestelhorst@amd.com>
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Seth Heasley [Thu, 9 Sep 2010 16:44:56 +0000 (09:44 -0700)]
ahci: AHCI and RAID mode SATA patch for Intel Patsburg DeviceIDs
This patch adds the Intel Patsburg (PCH) SATA AHCI and RAID Controller
DeviceIDs.
Signed-off-by: Seth Heasley <seth.heasley@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Seth Heasley [Thu, 9 Sep 2010 16:42:40 +0000 (09:42 -0700)]
ata_piix: IDE Mode SATA patch for Intel Patsburg DeviceIDs
This patch adds the Intel Patsburg (PCH) IDE mode SATA Controller DeviceIDs.
Signed-off-by: Seth Heasley <seth.heasley@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Tejun Heo [Thu, 9 Sep 2010 15:13:31 +0000 (17:13 +0200)]
libata,pata_via: revert ata_wait_idle() removal from ata_sff/via_tf_load()
Commit
978c0666 (libata: Remove excess delay in the tf_load path)
removed ata_wait_idle() from ata_sff_tf_load() and via_tf_load().
This caused obscure detection problems in sata_sil.
https://bugzilla.kernel.org/show_bug.cgi?id=16606
The commit was pure performance optimization. Revert it for now.
Reported-by: Dieter Plaetinck <dieter@plaetinck.be>
Reported-by: Jan Beulich <JBeulich@novell.com>
Bisected-by: gianluca <gianluca@sottospazio.it>
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Jorge Boncompte [DTI2] [Thu, 9 Sep 2010 23:38:19 +0000 (16:38 -0700)]
minix: fix regression in minix_mkdir()
Commit
9eed1fb721c ("minix: replace inode uid,gid,mode init with helper")
broke directory creation on minix filesystems.
Fix it by passing the needed mode flag to inode init helper.
Signed-off-by: Jorge Boncompte [DTI2] <jorge@dti2.net>
Cc: Dmitry Monakhov <dmonakhov@openvz.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@kernel.org> [2.6.35.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Thu, 9 Sep 2010 23:38:18 +0000 (16:38 -0700)]
mm: page allocator: drain per-cpu lists after direct reclaim allocation fails
When under significant memory pressure, a process enters direct reclaim
and immediately afterwards tries to allocate a page. If it fails and no
further progress is made, it's possible the system will go OOM. However,
on systems with large amounts of memory, it's possible that a significant
number of pages are on per-cpu lists and inaccessible to the calling
process. This leads to a process entering direct reclaim more often than
it should increasing the pressure on the system and compounding the
problem.
This patch notes that if direct reclaim is making progress but allocations
are still failing that the system is already under heavy pressure. In
this case, it drains the per-cpu lists and tries the allocation a second
time before continuing.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Lameter [Thu, 9 Sep 2010 23:38:17 +0000 (16:38 -0700)]
mm: page allocator: calculate a better estimate of NR_FREE_PAGES when memory is low and kswapd is awake
Ordinarily watermark checks are based on the vmstat NR_FREE_PAGES as it is
cheaper than scanning a number of lists. To avoid synchronization
overhead, counter deltas are maintained on a per-cpu basis and drained
both periodically and when the delta is above a threshold. On large CPU
systems, the difference between the estimated and real value of
NR_FREE_PAGES can be very high. If NR_FREE_PAGES is much higher than
number of real free page in buddy, the VM can allocate pages below min
watermark, at worst reducing the real number of pages to zero. Even if
the OOM killer kills some victim for freeing memory, it may not free
memory if the exit path requires a new page resulting in livelock.
This patch introduces a zone_page_state_snapshot() function (courtesy of
Christoph) that takes a slightly more accurate view of an arbitrary vmstat
counter. It is used to read NR_FREE_PAGES while kswapd is awake to avoid
the watermark being accidentally broken. The estimate is not perfect and
may result in cache line bounces but is expected to be lighter than the
IPI calls necessary to continually drain the per-cpu counters while kswapd
is awake.
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Thu, 9 Sep 2010 23:38:16 +0000 (16:38 -0700)]
mm: page allocator: update free page counters after pages are placed on the free list
When allocating a page, the system uses NR_FREE_PAGES counters to
determine if watermarks would remain intact after the allocation was made.
This check is made without interrupts disabled or the zone lock held and
so is race-prone by nature. Unfortunately, when pages are being freed in
batch, the counters are updated before the pages are added on the list.
During this window, the counters are misleading as the pages do not exist
yet. When under significant pressure on systems with large numbers of
CPUs, it's possible for processes to make progress even though they should
have been stalled. This is particularly problematic if a number of the
processes are using GFP_ATOMIC as the min watermark can be accidentally
breached and in extreme cases, the system can livelock.
This patch updates the counters after the pages have been added to the
list. This makes the allocator more cautious with respect to preserving
the watermarks and mitigates livelock possibilities.
[akpm@linux-foundation.org: avoid modifying incoming args]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KAMEZAWA Hiroyuki [Thu, 9 Sep 2010 23:38:14 +0000 (16:38 -0700)]
vmstat: update zone stat threshold when onlining a cpu
refresh_zone_stat_thresholds() calculates parameter based on the number of
online cpus. It's called at cpu offlining but needs to be called at
onlining, too.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>