J. Bruce Fields [Tue, 9 Apr 2013 21:42:28 +0000 (17:42 -0400)]
nfsd4: clean up validate_stateid
The logic here is better expressed with a switch statement.
While we're here, CLOSED stateids (or stateids of an unkown type--which
would indicate a server bug) should probably return nfserr_bad_stateid,
though this behavior shouldn't affect any non-buggy client.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Tue, 9 Apr 2013 15:34:36 +0000 (11:34 -0400)]
nfsd4: check backchannel attributes on create_session
Make sure the client gives us an adequate backchannel.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Mon, 8 Apr 2013 20:44:14 +0000 (16:44 -0400)]
nfsd4: fix forechannel attribute negotiation
Negotiation of the 4.1 session forechannel attributes is a mess. Fix:
- Move it all into check_forechannel_attrs instead of spreading
it between that, alloc_session, and init_forechannel_attrs.
- set a minimum "slotsize" so that our drc memory limits apply
even for small maxresponsesize_cached. This also fixes some
bugs when slotsize becomes <= 0.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Mon, 8 Apr 2013 19:42:12 +0000 (15:42 -0400)]
nfsd4: cleanup check_forechannel_attrs
Pass this struct by reference, not by value, and return an error instead
of a boolean to allow for future additions.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Fri, 29 Mar 2013 00:37:14 +0000 (20:37 -0400)]
nfsd4: don't close read-write opens too soon
Don't actually close any opens until we don't need them at all.
This means being left with write access when it's not really necessary,
but that's better than putting a file that might still have posix locks
held on it, as we have been.
Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Sun, 7 Apr 2013 17:28:16 +0000 (13:28 -0400)]
nfsd4: release lockowners on last unlock in 4.1 case
In the 4.1 case we're supposed to release lockowners as soon as they're
no longer used.
It would probably be more efficient to reference count them, but that's
slightly fiddly due to the need to have callbacks from locks.c to take
into account lock merging and splitting.
For most cases just scanning the inode's lock list on unlock for
matching locks will be sufficient.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Fri, 22 Mar 2013 22:03:49 +0000 (18:03 -0400)]
nfsd4: more sessions/open-owner-replay cleanup
More logic that's unnecessary in the 4.1 case.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Fri, 22 Mar 2013 21:44:19 +0000 (17:44 -0400)]
nfsd4: no need for replay_owner in sessions case
The replay_owner will never be used in the sessions case.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Sun, 7 Apr 2013 17:21:08 +0000 (13:21 -0400)]
nfsd4: remove some redundant comments
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Wei Yongjun [Tue, 9 Apr 2013 06:15:31 +0000 (14:15 +0800)]
nfsd: use kmem_cache_free() instead of kfree()
memory allocated by kmem_cache_alloc() should be freed using
kmem_cache_free(), not kfree().
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Mon, 1 Apr 2013 20:37:12 +0000 (16:37 -0400)]
nfsd4: cleanup handling of nfsv4.0 closed stateid's
Closed stateid's are kept around a little while to handle close replays
in the 4.0 case. So we stash them in the last-used stateid in the
oo_last_closed_stateid field of the open owner. We can free that in
encode_seqid_op_tail once the seqid on the open owner is next
incremented. But we don't want to do that on the close itself; so we
set NFS4_OO_PURGE_CLOSE flag set on the open owner, skip freeing it the
first time through encode_seqid_op_tail, then when we see that flag set
next time we free it.
This is unnecessarily baroque.
Instead, just move the logic that increments the seqid out of the xdr
code and into the operation code itself.
The justification given for the current placement is that we need to
wait till the last minute to be sure we know whether the status is a
sequence-id-mutating error or not, but examination of the code shows
that can't actually happen.
Reported-by: Yanchuan Nian <ycnian@gmail.com>
Tested-by: Yanchuan Nian <ycnian@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Thu, 21 Mar 2013 19:49:47 +0000 (15:49 -0400)]
nfsd4: remove unused nfs4_check_deleg argument
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Thu, 21 Mar 2013 19:19:33 +0000 (15:19 -0400)]
nfsd4: make del_recall_lru per-network-namespace
If nothing else this simplifies the nfs4_state_shutdown_net logic a tad.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Thu, 21 Mar 2013 15:21:50 +0000 (11:21 -0400)]
nfsd4: shut down more of delegation earlier
Once we've unhashed the delegation, it's only hanging around for the
benefit of an oustanding recall, which only needs the encoded
filehandle, stateid, and dl_retries counter. No point keeping the file
around any longer, or keeping it hashed.
This also fixes a race: calls to idr_remove should really be serialized
by the caller, but the nfs4_put_delegation call from the callback code
isn't taking the state lock.
(Better might be to cancel the callback before destroying the
delegation, and remove any need for reference counting--but I don't see
an easy way to cancel an rpc call.)
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Thu, 21 Mar 2013 14:59:29 +0000 (10:59 -0400)]
nfsd4: minor cb_recall simplification
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Alexey Khoroshilov [Fri, 22 Mar 2013 20:36:44 +0000 (00:36 +0400)]
SUNRPC/cache: add module_put() on error path in cache_open()
If kmalloc() fails in cache_open(), module cd->owner left locked.
The patch adds module_put(cd->owner) on this path.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
fanchaoting [Wed, 27 Mar 2013 08:31:18 +0000 (16:31 +0800)]
nfsd: remove /proc/fs/nfs when create /proc/fs/nfs/exports error
when create /proc/fs/nfs/exports error, we should remove /proc/fs/nfs,
if don't do it, it maybe cause Memory leak.
Signed-off-by: fanchaoting <fanchaoting@cn.fujitsu.com>
Reviewed-by: chendt.fnst <chendt.fnst@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
fanchaoting [Mon, 1 Apr 2013 13:07:22 +0000 (21:07 +0800)]
nfsd: don't run get_file if nfs4_preprocess_stateid_op return error
we should return error status directly when nfs4_preprocess_stateid_op
return error.
Signed-off-by: fanchaoting <fanchaoting@cn.fujitsu.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Jeff Layton [Tue, 2 Apr 2013 13:01:59 +0000 (09:01 -0400)]
nfsd: convert the file_hashtbl to a hlist
We only ever traverse the hash chains in the forward direction, so a
double pointer list head isn't really necessary.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Tue, 19 Mar 2013 16:05:39 +0000 (12:05 -0400)]
nfsd4: don't destroy in-use session
This changes session destruction to be similar to client destruction in
that attempts to destroy a session while in use (which should be rare
corner cases) result in DELAY. This simplifies things somewhat and
helps meet a coming 4.2 requirement.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Tue, 2 Apr 2013 02:23:49 +0000 (22:23 -0400)]
nfsd4: don't destroy in-use clients
When a setclientid_confirm or create_session confirms a client after a
client reboot, it also destroys any previous state held by that client.
The shutdown of that previous state must be careful not to free the
client out from under threads processing other requests that refer to
the client.
This is a particular problem in the NFSv4.1 case when we hold a
reference to a session (hence a client) throughout compound processing.
The server attempts to handle this by unhashing the client at the time
it's destroyed, then delaying the final free to the end. But this still
leaves some races in the current code.
I believe it's simpler just to fail the attempt to destroy the client by
returning NFS4ERR_DELAY. This is a case that should never happen
anyway.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Mon, 18 Mar 2013 21:31:30 +0000 (17:31 -0400)]
nfsd4: simplify bind_conn_to_session locking
The locking here is very fiddly, and there's no reason for us to be
setting cstate->session, since this is the only op in the compound.
Let's just take the state lock and drop the reference counting.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Thu, 14 Mar 2013 23:55:33 +0000 (19:55 -0400)]
nfsd4: fix destroy_session race
destroy_session uses the session and client without continuously holding
any reference or locks.
Put the whole thing under the state lock for now.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Thu, 14 Mar 2013 22:24:52 +0000 (18:24 -0400)]
nfsd4: clientid lookup cleanup
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Thu, 14 Mar 2013 22:20:01 +0000 (18:20 -0400)]
nfsd4: destroy_clientid simplification
I'm not sure what the check for clientid expiry was meant to do here.
The check for a matching session is redundant given the previous check
for state: a client without state is, in particular, a client without
sessions.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Thu, 14 Mar 2013 22:12:03 +0000 (18:12 -0400)]
nfsd4: remove some dprintk's
E.g. printk's that just report the return value from an op are
uninteresting as we already do that in the main proc_compound loop.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Tue, 12 Mar 2013 21:36:17 +0000 (17:36 -0400)]
nfsd4: STALE_STATEID cleanup
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Tue, 12 Mar 2013 14:12:37 +0000 (10:12 -0400)]
nfsd4: warn on odd create_session state
This should never happen.
(Note: the comparable case in setclientid_confirm *can* happen, since
updating a client record can result in both confirmed and unconfirmed
records with the same clientid.)
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
ycnian@gmail.com [Mon, 11 Mar 2013 00:46:14 +0000 (08:46 +0800)]
nfsd: fix bug on nfs4 stateid deallocation
NFS4_OO_PURGE_CLOSE is not handled properly. To avoid memory leak, nfs4
stateid which is pointed by oo_last_closed_stid is freed in nfsd4_close(),
but NFS4_OO_PURGE_CLOSE isn't cleared meanwhile. So the stateid released in
THIS close procedure may be freed immediately in the coming encoding function.
Sorry that Signed-off-by was forgotten in last version.
Signed-off-by: Yanchuan Nian <ycnian@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Yanchuan Nian [Mon, 11 Mar 2013 02:43:26 +0000 (10:43 +0800)]
nfsd: remove unused macro in nfsv4
lk_rflags is never used anywhere, and rflags is not defined in struct
nfsd4_lock.
Signed-off-by: Yanchuan Nian <ycnian@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Fri, 8 Mar 2013 14:30:43 +0000 (09:30 -0500)]
nfsd4: fix use-after-free of 4.1 client on connection loss
Once we drop the lock here there's nothing keeping the client around:
the only lock still held is the xpt_lock on this socket, but this socket
no longer has any connection with the client so there's no way for other
code to know we're still using the client.
The solution is simple: all nfsd4_probe_callback does is set a few
variables and queue some work, so there's no reason we can't just keep
it under the lock.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Thu, 7 Mar 2013 22:26:18 +0000 (17:26 -0500)]
nfsd4: fix race on client shutdown
Dropping the session's reference count after the client's means we leave
a window where the session's se_client pointer is NULL. An xpt_user
callback that encounters such a session may then crash:
[ 303.956011] BUG: unable to handle kernel NULL pointer dereference at
0000000000000318
[ 303.959061] IP: [<
ffffffff81481a8e>] _raw_spin_lock+0x1e/0x40
[ 303.959061] PGD
37811067 PUD
3d498067 PMD 0
[ 303.959061] Oops: 0002 [#8] PREEMPT SMP
[ 303.959061] Modules linked in: md5 nfsd auth_rpcgss nfs_acl snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc microcode psmouse snd_timer serio_raw pcspkr evdev snd soundcore i2c_piix4 i2c_core intel_agp intel_gtt processor button nfs lockd sunrpc fscache ata_generic pata_acpi ata_piix uhci_hcd libata btrfs usbcore usb_common crc32c scsi_mod libcrc32c zlib_deflate floppy virtio_balloon virtio_net virtio_pci virtio_blk virtio_ring virtio
[ 303.959061] CPU 0
[ 303.959061] Pid: 264, comm: nfsd Tainted: G D 3.8.0-ARCH+ #156 Bochs Bochs
[ 303.959061] RIP: 0010:[<
ffffffff81481a8e>] [<
ffffffff81481a8e>] _raw_spin_lock+0x1e/0x40
[ 303.959061] RSP: 0018:
ffff880037877dd8 EFLAGS:
00010202
[ 303.959061] RAX:
0000000000000100 RBX:
ffff880037a2b698 RCX:
ffff88003d879278
[ 303.959061] RDX:
ffff88003d879278 RSI:
dead000000100100 RDI:
0000000000000318
[ 303.959061] RBP:
ffff880037877dd8 R08:
ffff88003c5a0f00 R09:
0000000000000002
[ 303.959061] R10:
0000000000000001 R11:
0000000000000000 R12:
0000000000000000
[ 303.959061] R13:
0000000000000318 R14:
ffff880037a2b680 R15:
ffff88003c1cbe00
[ 303.959061] FS:
0000000000000000(0000) GS:
ffff88003fc00000(0000) knlGS:
0000000000000000
[ 303.959061] CS: 0010 DS: 0000 ES: 0000 CR0:
000000008005003b
[ 303.959061] CR2:
0000000000000318 CR3:
000000003d49c000 CR4:
00000000000006f0
[ 303.959061] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[ 303.959061] DR3:
0000000000000000 DR6:
00000000ffff0ff0 DR7:
0000000000000400
[ 303.959061] Process nfsd (pid: 264, threadinfo
ffff880037876000, task
ffff88003c1fd0a0)
[ 303.959061] Stack:
[ 303.959061]
ffff880037877e08 ffffffffa03772ec ffff88003d879000 ffff88003d879278
[ 303.959061]
ffff88003d879080 0000000000000000 ffff880037877e38 ffffffffa0222a1f
[ 303.959061]
0000000000107ac0 ffff88003c22e000 ffff88003d879000 ffff88003c1cbe00
[ 303.959061] Call Trace:
[ 303.959061] [<
ffffffffa03772ec>] nfsd4_conn_lost+0x3c/0xa0 [nfsd]
[ 303.959061] [<
ffffffffa0222a1f>] svc_delete_xprt+0x10f/0x180 [sunrpc]
[ 303.959061] [<
ffffffffa0223d96>] svc_recv+0xe6/0x580 [sunrpc]
[ 303.959061] [<
ffffffffa03587c5>] nfsd+0xb5/0x140 [nfsd]
[ 303.959061] [<
ffffffffa0358710>] ? nfsd_destroy+0x90/0x90 [nfsd]
[ 303.959061] [<
ffffffff8107ae00>] kthread+0xc0/0xd0
[ 303.959061] [<
ffffffff81010000>] ? perf_trace_xen_mmu_set_pte_at+0x50/0x100
[ 303.959061] [<
ffffffff8107ad40>] ? kthread_freezable_should_stop+0x70/0x70
[ 303.959061] [<
ffffffff814898ec>] ret_from_fork+0x7c/0xb0
[ 303.959061] [<
ffffffff8107ad40>] ? kthread_freezable_should_stop+0x70/0x70
[ 303.959061] Code: ff ff 5d c3 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 65 48 8b 04 25 f0 c6 00 00 48 89 e5 83 80 44 e0 ff ff 01 b8 00 01 00 00 <3e> 66 0f c1 07 0f b6 d4 38 c2 74 0f 66 0f 1f 44 00 00 f3 90 0f
[ 303.959061] RIP [<
ffffffff81481a8e>] _raw_spin_lock+0x1e/0x40
[ 303.959061] RSP <
ffff880037877dd8>
[ 303.959061] CR2:
0000000000000318
[ 304.001218] ---[ end trace
2d809cd4a7931f5a ]---
[ 304.001903] note: nfsd[264] exited with preempt_count 2
Reported-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Thu, 28 Feb 2013 20:51:49 +0000 (12:51 -0800)]
nfsd4: handle seqid-mutating open errors from xdr decoding
If a client sets an owner (or group_owner or acl) attribute on open for
create, and the mapping of that owner to an id fails, then we return
BAD_OWNER. But BAD_OWNER is a seqid-mutating error, so we can't
shortcut the open processing that case: we have to at least look up the
owner so we can find the seqid to bump.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Thu, 28 Feb 2013 19:55:46 +0000 (11:55 -0800)]
nfsd4: remove BUG_ON
This BUG_ON just crashes the thread a little earlier than it would
otherwise--it doesn't seem useful.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Jeff Layton [Wed, 27 Mar 2013 14:15:39 +0000 (10:15 -0400)]
nfsd: scale up the number of DRC hash buckets with cache size
We've now increased the size of the duplicate reply cache by quite a
bit, but the number of hash buckets has not changed. So, we've gone from
an average hash chain length of 16 in the old code to 4096 when the
cache is its largest. Change the code to scale out the number of buckets
with the max size of the cache.
At the same time, we also need to fix the hash function since the
existing one isn't really suitable when there are more than 256 buckets.
Move instead to use the stock hash_32 function for this. Testing on a
machine that had 2048 buckets showed that this gave a smaller
longest:average ratio than the existing hash function:
The formula here is longest hash bucket searched divided by average
number of entries per bucket at the time that we saw that longest
bucket:
old hash: 68/(39258/2048) == 3.547404
hash_32: 45/(33773/2048) == 2.728807
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Jeff Layton [Wed, 27 Mar 2013 14:15:39 +0000 (10:15 -0400)]
nfsd: keep stats on worst hash balancing seen so far
The typical case with the DRC is a cache miss, so if we keep track of
the max number of entries that we've ever walked over in a search, then
we should have a reasonable estimate of the longest hash chain that
we've ever seen.
With that, we'll also keep track of the total size of the cache when we
see the longest chain. In the case of a tie, we prefer to track the
smallest total cache size in order to properly gauge the worst-case
ratio of max vs. avg chain length.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Jeff Layton [Wed, 27 Mar 2013 14:15:38 +0000 (10:15 -0400)]
nfsd: add new reply_cache_stats file in nfsdfs
For presenting statistics relating to duplicate reply cache.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Jeff Layton [Wed, 27 Mar 2013 14:15:38 +0000 (10:15 -0400)]
nfsd: track memory utilization by the DRC
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Jeff Layton [Wed, 27 Mar 2013 14:15:37 +0000 (10:15 -0400)]
nfsd: break out comparator into separate function
Break out the function that compares the rqstp and checksum against a
reply cache entry. While we're at it, track the efficacy of the checksum
over the NFS data by tracking the cases where we would have incorrectly
matched a DRC entry if we had not tracked it or the length.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Jeff Layton [Wed, 27 Mar 2013 14:15:37 +0000 (10:15 -0400)]
nfsd: eliminate one of the DRC cache searches
The most common case is to do a search of the cache, followed by an
insert. In the case where we have to allocate an entry off the slab,
then we end up having to redo the search, which is wasteful.
Better optimize the code for the common case by eliminating the initial
search of the cache and always preallocating an entry. In the case of a
cache hit, we'll end up just freeing that entry but that's preferable to
an extra search.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Tue, 26 Mar 2013 18:11:13 +0000 (14:11 -0400)]
nfsd4: reject "negative" acl lengths
Since we only enforce an upper bound, not a lower bound, a "negative"
length can get through here.
The symptom seen was a warning when we attempt to a kmalloc with an
excessive size.
Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Kent Overstreet [Fri, 22 Mar 2013 18:18:24 +0000 (11:18 -0700)]
nfsd: fix bad offset use
vfs_writev() updates the offset argument - but the code then passes the
offset to vfs_fsync_range(). Since offset now points to the offset after
what was just written, this is probably not what was intended
Introduced by
face15025ffdf664de95e86ae831544154d26c9c "nfsd: use
vfs_fsync_range(), not O_SYNC, for stable writes".
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: stable@vger.kernel.org
Reviewed-by: Zach Brown <zab@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Jeff Layton [Fri, 15 Mar 2013 13:16:29 +0000 (09:16 -0400)]
nfsd: fix startup order in nfsd_reply_cache_init
If we end up doing "goto out_nomem" in this function, we'll call
nfsd_reply_cache_shutdown. That will attempt to walk the LRU list and
free entries, but that list may not be initialized yet if the server is
starting up for the first time. It's also possible for the shrinker to
kick in before we've initialized the LRU list.
Rearrange the initialization so that the LRU list_head and cache size
are initialized before doing any of the allocations that might fail.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Jeff Layton [Mon, 18 Mar 2013 14:49:07 +0000 (10:49 -0400)]
nfsd: only unhash DRC entries that are in the hashtable
It's not safe to call hlist_del() on a newly initialized hlist_node.
That leads to a NULL pointer dereference. Only do that if the entry
is hashed.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Linus Torvalds [Sun, 17 Mar 2013 22:59:32 +0000 (15:59 -0700)]
Linux 3.9-rc3
David Rientjes [Sun, 17 Mar 2013 22:49:10 +0000 (15:49 -0700)]
perf,x86: fix link failure for non-Intel configs
Commit
1d9d8639c063 ("perf,x86: fix kernel crash with PEBS/BTS after
suspend/resume") introduces a link failure since
perf_restore_debug_store() is only defined for CONFIG_CPU_SUP_INTEL:
arch/x86/power/built-in.o: In function `restore_processor_state':
(.text+0x45c): undefined reference to `perf_restore_debug_store'
Fix it by defining the dummy function appropriately.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 17 Mar 2013 22:44:43 +0000 (15:44 -0700)]
perf,x86: fix wrmsr_on_cpu() warning on suspend/resume
Commit
1d9d8639c063 ("perf,x86: fix kernel crash with PEBS/BTS after
suspend/resume") fixed a crash when doing PEBS performance profiling
after resuming, but in using init_debug_store_on_cpu() to restore the
DS_AREA mtrr it also resulted in a new WARN_ON() triggering.
init_debug_store_on_cpu() uses "wrmsr_on_cpu()", which in turn uses CPU
cross-calls to do the MSR update. Which is not really valid at the
early resume stage, and the warning is quite reasonable. Now, it all
happens to _work_, for the simple reason that smp_call_function_single()
ends up just doing the call directly on the CPU when the CPU number
matches, but we really should just do the wrmsr() directly instead.
This duplicates the wrmsr() logic, but hopefully we can just remove the
wrmsr_on_cpu() version eventually.
Reported-and-tested-by: Parag Warudkar <parag.lkml@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 17 Mar 2013 18:04:14 +0000 (11:04 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
"Eric's rcu barrier patch fixes a long standing problem with our
unmount code hanging on to devices in workqueue helpers. Liu Bo
nailed down a difficult assertion for in-memory extent mappings."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix warning of free_extent_map
Btrfs: fix warning when creating snapshots
Btrfs: return as soon as possible when edquot happens
Btrfs: return EIO if we have extent tree corruption
btrfs: use rcu_barrier() to wait for bdev puts at unmount
Btrfs: remove btrfs_try_spin_lock
Btrfs: get better concurrency for snapshot-aware defrag work
Liu Bo [Fri, 15 Mar 2013 14:46:39 +0000 (08:46 -0600)]
Btrfs: fix warning of free_extent_map
Users report that an extent map's list is still linked when it's actually
going to be freed from cache.
The story is that
a) when we're going to drop an extent map and may split this large one into
smaller ems, and if this large one is flagged as EXTENT_FLAG_LOGGING which means
that it's on the list to be logged, then the smaller ems split from it will also
be flagged as EXTENT_FLAG_LOGGING, and this is _not_ expected.
b) we'll keep ems from unlinking the list and freeing when they are flagged with
EXTENT_FLAG_LOGGING, because the log code holds one reference.
The end result is the warning, but the truth is that we set the flag
EXTENT_FLAG_LOGGING only during fsync.
So clear flag EXTENT_FLAG_LOGGING for extent maps split from a large one.
Reported-by: Johannes Hirte <johannes.hirte@fem.tu-ilmenau.de>
Reported-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Linus Torvalds [Sat, 16 Mar 2013 01:06:55 +0000 (18:06 -0700)]
Merge branch 'kbuild' of git://git./linux/kernel/git/mmarek/kbuild
Pull kbuild fix from Michal Marek:
"One fix for for make headers_install/headers_check to not require make
3.81. The requirement has been accidentally introduced in 3.7."
* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
kbuild: fix make headers_check with make 3.80
Linus Torvalds [Sat, 16 Mar 2013 01:05:37 +0000 (18:05 -0700)]
Merge tag 'for-3.9-rc3' of git://openrisc.net/jonas/linux
Pull OpenRISC bug fixes from Jonas Bonn:
- The GPIO descriptor work has exposed how broken the non-GPIOLIB bits
for OpenRISC were. We now require GPIOLIB as this is the preferred
way forward.
- The system.h split introduced a bug in llist.h for arches using
asm-generic/cmpxchg.h directly, which is currently only OpenRISC.
The patch here moves two defines from asm-generic/atomic.h to
asm-generic/cmpxchg.h to make things work as they should.
- The VIRT_TO_BUS selector was added for OpenRISC, but OpenRISC does
not have the virt_to_bus methods, so there's a patch to remove it
again.
* tag 'for-3.9-rc3' of git://openrisc.net/jonas/linux:
openrisc: remove HAVE_VIRT_TO_BUS
asm-generic: move cmpxchg*_local defs to cmpxchg.h
openrisc: require gpiolib
Linus Torvalds [Sat, 16 Mar 2013 01:04:38 +0000 (18:04 -0700)]
Merge tag 'char-misc-3.9-rc2' of git://git./linux/kernel/git/gregkh/char-misc
Pull char/misc fixes from Greg Kroah-Hartman:
"Here are some tiny fixes for the w1 drivers and the final removal
patch for getting rid of CONFIG_EXPERIMENTAL (all users of it are now
gone from your tree, this just drops the Kconfig item itself.)
All have been in the linux-next tree for a while"
* tag 'char-misc-3.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
final removal of CONFIG_EXPERIMENTAL
w1: fix oops when w1_search is called from netlink connector
w1-gpio: fix unused variable warning
w1-gpio: remove erroneous __exit and __exit_p()
ARM: w1-gpio: fix erroneous gpio requests
Linus Torvalds [Sat, 16 Mar 2013 00:35:49 +0000 (17:35 -0700)]
Merge tag 'sound-3.9' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of small fixes, as expected for the middle rc:
- A couple of fixes for potential NULL dereferences and out-of-range
array accesses revealed by static code parsers
- A fix for the wrong error handling detected by trinity
- A regression fix for missing audio on some MacBooks
- CA0132 DSP loader fixes
- Fix for EAPD control of IDT codecs on machines w/o speaker
- Fix a regression in the HD-audio widget list parser code
- Workaround for the NuForce UDH-100 USB audio"
* tag 'sound-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda - Fix missing EAPD/GPIO setup for Cirrus codecs
sound: sequencer: cap array index in seq_chn_common_event()
ALSA: hda/ca0132 - Remove extra setting of dsp_state.
ALSA: hda/ca0132 - Check download state of DSP.
ALSA: hda/ca0132 - Check if dspload_image succeeded.
ALSA: hda - Disable IDT eapd_switch if there are no internal speakers
ALSA: hda - Fix snd_hda_get_num_raw_conns() to return a correct value
ALSA: usb-audio: add a workaround for the NuForce UDH-100
ALSA: asihpi - fix potential NULL pointer dereference
ALSA: seq: Fix missing error handling in snd_seq_timer_open()
Linus Torvalds [Sat, 16 Mar 2013 00:35:03 +0000 (17:35 -0700)]
Merge branch 'fixes-for-3.9' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping
Pull DMA-mapping fix from Marek Szyprowski:
"An important fix for all ARM architectures which use ZONE_DMA.
Without it dma_alloc_* calls with GFP_ATOMIC flag might have allocated
buffers outsize DMA zone."
* 'fixes-for-3.9' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping:
ARM: DMA-mapping: add missing GFP_DMA flag for atomic buffer allocation
Linus Torvalds [Sat, 16 Mar 2013 00:34:01 +0000 (17:34 -0700)]
Merge tag 'mfd-fixes-3.9-1' of git://git./linux/kernel/git/sameo/mfd-fixes
Pull MFD fixes from Samuel Ortiz:
"This is the first batch of MFD fixes for 3.9.
With this one we have:
- An ab8500 build failure fix.
- An ab8500 device tree parsing fix.
- A fix for twl4030_madc remove routine to work properly (when
built-in).
- A fix for properly registering palmas interrupt handler.
- A fix for omap-usb init routine to actually write into the
hostconfig register.
- A couple of warning fixes for ab8500-gpadc and tps65912"
* tag 'mfd-fixes-3.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-fixes:
mfd: twl4030-madc: Remove __exit_p annotation
mfd: ab8500: Kill "reg" property from binding
mfd: ab8500-gpadc: Complain if we fail to enable vtvout LDO
mfd: wm831x: Don't forward declare enum wm831x_auxadc
mfd: twl4030-audio: Fix argument type for twl4030_audio_disable_resource()
mfd: tps65912: Declare and use tps65912_irq_exit()
mfd: palmas: Provide irq flags through DT/platform data
mfd: Make AB8500_CORE select POWER_SUPPLY to fix build error
mfd: omap-usb-host: Actually update hostconfig
Linus Torvalds [Sat, 16 Mar 2013 00:33:13 +0000 (17:33 -0700)]
Merge tag 'hwmon-for-linus' of git://git./linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
"Bug fixes for pmbus, ltc2978, and lineage-pem drivers
Added specific maintainer for some hwmon drivers"
* tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (pmbus/ltc2978) Fix temperature reporting
hwmon: (pmbus) Fix krealloc() misuse in pmbus_add_attribute()
hwmon: (lineage-pem) Add missing terminating entry for pem_[input|fan]_attributes
MAINTAINERS: Add maintainer for MAX6697, INA209, and INA2XX drivers
Stephane Eranian [Fri, 15 Mar 2013 13:26:07 +0000 (14:26 +0100)]
perf,x86: fix kernel crash with PEBS/BTS after suspend/resume
This patch fixes a kernel crash when using precise sampling (PEBS)
after a suspend/resume. Turns out the CPU notifier code is not invoked
on CPU0 (BP). Therefore, the DS_AREA (used by PEBS) is not restored properly
by the kernel and keeps it power-on/resume value of 0 causing any PEBS
measurement to crash when running on CPU0.
The workaround is to add a hook in the actual resume code to restore
the DS Area MSR value. It is invoked for all CPUS. So for all but CPU0,
the DS_AREA will be restored twice but this is harmless.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Takashi Iwai [Fri, 15 Mar 2013 13:23:32 +0000 (14:23 +0100)]
ALSA: hda - Fix missing EAPD/GPIO setup for Cirrus codecs
During the transition to the generic parser, the hook to the codec
specific automute function was forgotten. This resulted in the silent
output on some MacBooks.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Dan Carpenter [Fri, 15 Mar 2013 06:14:22 +0000 (09:14 +0300)]
sound: sequencer: cap array index in seq_chn_common_event()
"chn" here is a number between 0 and 255, but ->chn_info[] only has
16 elements so there is a potential write beyond the end of the
array.
If the seq_mode isn't SEQ_2 then we let the individual drivers
(either opl3.c or midi_synth.c) handle it. Those functions all
do a bounds check on "chn" so I haven't changed anything here.
The opl3.c driver has up to 18 channels and not 16.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Arnd Bergmann [Thu, 14 Mar 2013 21:56:38 +0000 (22:56 +0100)]
mfd: twl4030-madc: Remove __exit_p annotation
4740f73fe5 "mfd: remove use of __devexit" removed the __devexit annotation
on the twl4030_madc_remove function, but left an __exit_p() present on the
pointer to this function. Using __exit_p was as wrong with the devexit in
place as it is now, but now we get a gcc warning about an unused function.
In order for the twl4030_madc_remove to work correctly in built-in code, we
have to remove the __exit_p.
Cc: Bill Pemberton <wfp5p@virginia.edu>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Dylan Reid [Fri, 15 Mar 2013 00:27:46 +0000 (17:27 -0700)]
ALSA: hda/ca0132 - Remove extra setting of dsp_state.
spec->dsp_state is initialized to DSP_DOWNLOAD_INIT, no need to reset
and check it in ca0132_download_dsp().
Signed-off-by: Dylan Reid <dgreid@chromium.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Dylan Reid [Fri, 15 Mar 2013 00:27:45 +0000 (17:27 -0700)]
ALSA: hda/ca0132 - Check download state of DSP.
Instead of using the dspload_is_loaded() function, check the dsp_state
that is kept in the spec. The dspload_is_loaded() function returns
true if the DSP transfer was never started. This false-positive leads
to multiple second delays when ca0132_setup_efaults() times out on
each write.
Signed-off-by: Dylan Reid <dgreid@chromium.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Dylan Reid [Fri, 15 Mar 2013 00:27:44 +0000 (17:27 -0700)]
ALSA: hda/ca0132 - Check if dspload_image succeeded.
If dspload_image() fails, it was ignored and dspload_wait_loaded() was
still called. dsp_loaded should never be set to true in this case,
skip it. The check in dspload_wait_loaded() return true if the DSP is
loaded or if it never started.
Signed-off-by: Dylan Reid <dgreid@chromium.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Michel Lespinasse [Thu, 14 Mar 2013 23:50:02 +0000 (16:50 -0700)]
mm/fremap.c: fix possible oops on error path
The vm_flags introduced in
6d7825b10dbe ("mm/fremap.c: fix oops on error
path") is supposed to avoid a compiler warning about unitialized
vm_flags without changing the generated code.
However I am concerned that this is going to be very brittle, and fail
with some compiler versions. The failure could be either of:
- compiler could actually load vma->vm_flags before checking for the
!vma condition, thus reintroducing the oops
- compiler could optimize out the !vma check, since the pointer just got
dereferenced shortly before (so the compiler knows it can't be NULL!)
I propose reversing this part of the change and initializing vm_flags to 0
just to avoid the bogus uninitialized use warning.
Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Tommi Rantala <tt.rantala@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 14 Mar 2013 21:53:07 +0000 (14:53 -0700)]
Merge branch 'rcu/urgent' of git://git./linux/kernel/git/paulmck/linux-rcu
Pull fix for hlist_entry_safe() regression from Paul McKenney:
"This contains a single commit that fixes a regression in
hlist_entry_safe(). This macro references its argument twice, which
can cause NULL-pointer errors. This commit applies a gcc statement
expression, creating a temporary variable to avoid the double
reference. This has been posted to LKML at
https://lkml.org/lkml/2013/3/9/75.
Kudos to CAI Qian, whose testing uncovered this, to Eric Dumazet, who
spotted root cause, and to Li Zefan, who tested this commit."
* 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
list: Fix double fetch of pointer in hlist_entry_safe()
Paul E. McKenney [Sat, 9 Mar 2013 15:38:41 +0000 (07:38 -0800)]
list: Fix double fetch of pointer in hlist_entry_safe()
The current version of hlist_entry_safe() fetches the pointer twice,
once to test for NULL and the other to compute the offset back to the
enclosing structure. This is OK for normal lock-based use because in
that case, the pointer cannot change. However, when the pointer is
protected by RCU (as in "rcu_dereference(p)"), then the pointer can
change at any time. This use case can result in the following sequence
of events:
1. CPU 0 invokes hlist_entry_safe(), fetches the RCU-protected
pointer as sees that it is non-NULL.
2. CPU 1 invokes hlist_del_rcu(), deleting the entry that CPU 0
just fetched a pointer to. Because this is the last entry
in the list, the pointer fetched by CPU 0 is now NULL.
3. CPU 0 refetches the pointer, obtains NULL, and then gets a
NULL-pointer crash.
This commit therefore applies gcc's "({ })" statement expression to
create a temporary variable so that the specified pointer is fetched
only once, avoiding the above sequence of events. Please note that
it is the caller's responsibility to use rcu_dereference() as needed.
This allows RCU-protected uses to work correctly without imposing
any additional overhead on the non-RCU case.
Many thanks to Eric Dumazet for spotting root cause!
Reported-by: CAI Qian <caiqian@redhat.com>
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Li Zefan <lizefan@huawei.com>
Linus Torvalds [Thu, 14 Mar 2013 19:11:28 +0000 (12:11 -0700)]
Merge branch 'for_linus' of git://git./linux/kernel/git/jack/linux-fs
Pull ext2, ext3, reiserfs, quota fixes from Jan Kara:
"A fix for regression in ext2, and a format string issue in ext3. The
rest isn't too serious."
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
ext2: Fix BUG_ON in evict() on inode deletion
reiserfs: Use kstrdup instead of kmalloc/strcpy
ext3: Fix format string issues
quota: add missing use of dq_data_lock in __dquot_initialize
Liu Bo [Wed, 13 Mar 2013 13:43:03 +0000 (07:43 -0600)]
Btrfs: fix warning when creating snapshots
Creating snapshot passes extent_root to commit its transaction,
but it can lead to the warning of checking root for quota in
the __btrfs_end_transaction() when someone else is committing
the current transaction. Since we've recorded the needed root
in trans_handle, just use it to get rid of the warning.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Wang Shilong [Wed, 6 Mar 2013 11:51:47 +0000 (11:51 +0000)]
Btrfs: return as soon as possible when edquot happens
If one of qgroup fails to reserve firstly, we should return immediately,
it is unnecessary to continue check.
Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Josef Bacik [Fri, 8 Mar 2013 20:41:02 +0000 (15:41 -0500)]
Btrfs: return EIO if we have extent tree corruption
The callers of lookup_inline_extent_info all handle getting an error back
properly, so return an error if we have corruption instead of being a jerk and
panicing. Still WARN_ON() since this is kind of crucial and I've been seeing it
a bit too much recently for my taste, I think we're doing something wrong
somewhere. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Eric Sandeen [Sat, 9 Mar 2013 15:18:39 +0000 (15:18 +0000)]
btrfs: use rcu_barrier() to wait for bdev puts at unmount
Doing this would reliably fail with -EBUSY for me:
# mount /dev/sdb2 /mnt/scratch; umount /mnt/scratch; mkfs.btrfs -f /dev/sdb2
...
unable to open /dev/sdb2: Device or resource busy
because mkfs.btrfs tries to open the device O_EXCL, and somebody still has it.
Using systemtap to track bdev gets & puts shows a kworker thread doing a
blkdev put after mkfs attempts a get; this is left over from the unmount
path:
btrfs_close_devices
__btrfs_close_devices
call_rcu(&device->rcu, free_device);
free_device
INIT_WORK(&device->rcu_work, __free_device);
schedule_work(&device->rcu_work);
so unmount might complete before __free_device fires & does its blkdev_put.
Adding an rcu_barrier() to btrfs_close_devices() causes unmount to wait
until all blkdev_put()s are done, and the device is truly free once
unmount completes.
Cc: stable@vger.kernel.org
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Liu Bo [Mon, 11 Mar 2013 09:37:45 +0000 (09:37 +0000)]
Btrfs: remove btrfs_try_spin_lock
Remove a useless function declaration
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Liu Bo [Mon, 11 Mar 2013 09:20:58 +0000 (09:20 +0000)]
Btrfs: get better concurrency for snapshot-aware defrag work
Using spinning case instead of blocking will result in better concurrency
overall.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Guenter Roeck [Thu, 21 Feb 2013 18:27:54 +0000 (10:27 -0800)]
hwmon: (pmbus/ltc2978) Fix temperature reporting
On LTC2978, only READ_TEMPERATURE is supported. It reports
the internal junction temperature. This register is unpaged.
On LTC3880, READ_TEMPERATURE and READ_TEMPERATURE2 are supported.
READ_TEMPERATURE is paged and reports external temperatures.
READ_TEMPERATURE2 is unpaged and reports the internal junction
temperature.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org # 3.2+
Acked-by: Jean Delvare <khali@linux-fr.org>
David Henningsson [Thu, 14 Mar 2013 14:28:29 +0000 (15:28 +0100)]
ALSA: hda - Disable IDT eapd_switch if there are no internal speakers
If there are no internal speakers, we should not turn the eapd switch
off, because it might be necessary to keep high for Headphone.
BugLink: https://bugs.launchpad.net/bugs/1155016
Signed-off-by: David Henningsson <david.henningsson@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
David Woodhouse [Thu, 14 Mar 2013 13:30:20 +0000 (13:30 +0000)]
hwmon: (pmbus) Fix krealloc() misuse in pmbus_add_attribute()
If krealloc() returns NULL, it *doesn't* free the original. So any code
of the form 'foo = krealloc(foo, …);' is almost certainly a bug.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Axel Lin [Thu, 14 Mar 2013 08:27:18 +0000 (16:27 +0800)]
hwmon: (lineage-pem) Add missing terminating entry for pem_[input|fan]_attributes
Signed-off-by: Axel Lin <axel.lin@ingics.com>
Cc: stable@vger.kernel.org
Acked-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Marek Szyprowski [Tue, 26 Feb 2013 06:46:24 +0000 (07:46 +0100)]
ARM: DMA-mapping: add missing GFP_DMA flag for atomic buffer allocation
Atomic pool should always be allocated from DMA zone if such zone is
available in the system to avoid issues caused by limited dma mask of
any of the devices used for making an atomic allocation.
Reported-by: Krzysztof Halasa <khc@pm.waw.pl>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Stable <stable@vger.kernel.org> [v3.6+]
Linus Torvalds [Wed, 13 Mar 2013 22:47:50 +0000 (15:47 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/ebiederm/user-namespace
Pull namespace bugfixes from Eric Biederman:
"This tree includes a partial revert for "fs: Limit sys_mount to only
request filesystem modules." When I added the new style module aliases
to the filesystems I deleted the old ones. A bad move. It turns out
that distributions like Arch linux use module aliases when
constructing ramdisks. Which meant ultimately that an ext3 filesystem
mounted with ext4 would not result in the ext4 module being put into
the ramdisk.
The other change in this tree adds a handful of filesystem module
alias I simply failed to add the first time. Which inconvinienced a
few folks using cifs.
I don't want to inconvinience folks any longer than I have to so here
are these trivial fixes."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
fs: Readd the fs module aliases.
fs: Limit sys_mount to only request filesystem modules. (Part 3)
Linus Torvalds [Wed, 13 Mar 2013 22:21:57 +0000 (15:21 -0700)]
Merge branch 'akpm' (fixes from Andrew)
Merge misc fixes from Andrew Morton:
- A bunch of fixes
- Finish off the idr API conversions before someone starts to use the
old interfaces again.
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
idr: idr_alloc() shouldn't trigger lowmem warning when preloaded
UAPI: fix endianness conditionals in M32R's asm/stat.h
UAPI: fix endianness conditionals in linux/raid/md_p.h
UAPI: fix endianness conditionals in linux/acct.h
UAPI: fix endianness conditionals in linux/aio_abi.h
decompressors: fix typo "POWERPC"
mm/fremap.c: fix oops on error path
idr: deprecate idr_pre_get() and idr_get_new[_above]()
tidspbridge: convert to idr_alloc()
zcache: convert to idr_alloc()
mlx4: remove leftover idr_pre_get() call
workqueue: convert to idr_alloc()
nfsd: convert to idr_alloc()
nfsd: remove unused get_new_stid()
kernel/signal.c: use __ARCH_HAS_SA_RESTORER instead of SA_RESTORER
signal: always clear sa_restorer on execve
mm: remove_memory(): fix end_pfn setting
include/linux/res_counter.h needs errno.h
Tejun Heo [Wed, 13 Mar 2013 21:59:49 +0000 (14:59 -0700)]
idr: idr_alloc() shouldn't trigger lowmem warning when preloaded
GFP_NOIO is often used for idr_alloc() inside preloaded section as the
allocation mask doesn't really matter. If the idr tree needs to be
expanded, idr_alloc() first tries to allocate using the specified
allocation mask and if it fails falls back to the preloaded buffer. This
order prevent non-preloading idr_alloc() users from taking advantage of
preloading ones by using preload buffer without filling it shifting the
burden of allocation to the preload users.
Unfortunately, this allowed/expected-to-fail kmem_cache allocation ends up
generating spurious slab lowmem warning before succeeding the request from
the preload buffer.
This patch makes idr_layer_alloc() add __GFP_NOWARN to the first
kmem_cache attempt and try kmem_cache again w/o __GFP_NOWARN after
allocation from preload_buffer fails so that lowmem warning is generated
if not suppressed by the original @gfp_mask.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: David Teigland <teigland@redhat.com>
Tested-by: David Teigland <teigland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Wed, 13 Mar 2013 21:59:48 +0000 (14:59 -0700)]
UAPI: fix endianness conditionals in M32R's asm/stat.h
In the UAPI header files, __BIG_ENDIAN and __LITTLE_ENDIAN must be
compared against __BYTE_ORDER in preprocessor conditionals where these are
exposed to userspace (that is they're not inside __KERNEL__ conditionals).
However, in the main kernel the norm is to check for
"defined(__XXX_ENDIAN)" rather than comparing against __BYTE_ORDER and
this has incorrectly leaked into the userspace headers.
The definition of struct stat64 in M32R's asm/stat.h is wrong in this way.
Note that userspace will likely interpret the field order incorrectly as
the big-endian variant on little-endian machines - depending on header
inclusion order.
[!!!] NOTE [!!!] This patch may adversely change the userspace API. It might
be better to fix the ordering of st_blocks and __pad4 in struct stat64.
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Wed, 13 Mar 2013 21:59:47 +0000 (14:59 -0700)]
UAPI: fix endianness conditionals in linux/raid/md_p.h
In the UAPI header files, __BIG_ENDIAN and __LITTLE_ENDIAN must be
compared against __BYTE_ORDER in preprocessor conditionals where these are
exposed to userspace (that is they're not inside __KERNEL__ conditionals).
However, in the main kernel the norm is to check for
"defined(__XXX_ENDIAN)" rather than comparing against __BYTE_ORDER and
this has incorrectly leaked into the userspace headers.
The definition of struct mdp_superblock_s in linux/raid/md_p.h is wrong in
this way. Note that userspace will likely interpret the ordering of the
fields incorrectly as the big-endian variant on a little-endian machines -
depending on header inclusion order.
[!!!] NOTE [!!!] This patch may adversely change the userspace API. It might
be better to fix the ordering of events_hi, events_lo, cp_events_hi and
cp_events_lo in struct mdp_superblock_s / typedef mdp_super_t.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Wed, 13 Mar 2013 21:59:46 +0000 (14:59 -0700)]
UAPI: fix endianness conditionals in linux/acct.h
In the UAPI header files, __BIG_ENDIAN and __LITTLE_ENDIAN must be
compared against __BYTE_ORDER in preprocessor conditionals where these are
exposed to userspace (that is they're not inside __KERNEL__ conditionals).
However, in the main kernel the norm is to check for
"defined(__XXX_ENDIAN)" rather than comparing against __BYTE_ORDER and
this has incorrectly leaked into the userspace headers.
The definition of ACCT_BYTEORDER in linux/acct.h is wrong in this way.
Note that userspace will likely interpret this incorrectly as the
big-endian variant on little-endian machines - depending on header
inclusion order.
[!!!] NOTE [!!!] This patch may adversely change the userspace API. It might
be better to fix the value of ACCT_BYTEORDER.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Wed, 13 Mar 2013 21:59:45 +0000 (14:59 -0700)]
UAPI: fix endianness conditionals in linux/aio_abi.h
In the UAPI header files, __BIG_ENDIAN and __LITTLE_ENDIAN must be
compared against __BYTE_ORDER in preprocessor conditionals where these are
exposed to userspace (that is they're not inside __KERNEL__ conditionals).
However, in the main kernel the norm is to check for
"defined(__XXX_ENDIAN)" rather than comparing against __BYTE_ORDER and
this has incorrectly leaked into the userspace headers.
The definition of PADDED() in linux/aio_abi.h is wrong in this way. Note
that userspace will likely interpret this and thus the order of fields in
struct iocb incorrectly as the little-endian variant on big-endian
machines - depending on header inclusion order.
[!!!] NOTE [!!!] This patch may adversely change the userspace API. It might
be better to fix the ordering of aio_key and aio_reserved1 in struct iocb.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Benjamin LaHaise <bcrl@kvack.org>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paul Bolle [Wed, 13 Mar 2013 21:59:44 +0000 (14:59 -0700)]
decompressors: fix typo "POWERPC"
Commit
5dc49c75a26b ("decompressors: make the default XZ_DEC_* config
match the selected architecture") added
default y if POWERPC
to lib/xz/Kconfig. But there is no Kconfig symbol POWERPC. The most
general Kconfig symbol for the powerpc architecture is PPC. So let's
use that.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Cc: Florian Fainelli <florian@openwrt.org>
Cc: Lasse Collin <lasse.collin@tukaani.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Wed, 13 Mar 2013 21:59:43 +0000 (14:59 -0700)]
mm/fremap.c: fix oops on error path
If find_vma() fails, sys_remap_file_pages() will dereference `vma', which
contains NULL. Fix it by checking the pointer.
(We could alternatively check for err==0, but this seems more direct)
(The vm_flags change is to squish a bogus used-uninitialised warning
without adding extra code).
Reported-by: Tommi Rantala <tt.rantala@gmail.com>
Cc: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tejun Heo [Wed, 13 Mar 2013 21:59:42 +0000 (14:59 -0700)]
idr: deprecate idr_pre_get() and idr_get_new[_above]()
Now that all in-kernel users are converted to ues the new alloc
interface, mark the old interface deprecated. We should be able to
remove these in a few releases.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tejun Heo [Wed, 13 Mar 2013 21:59:41 +0000 (14:59 -0700)]
tidspbridge: convert to idr_alloc()
idr_get_new*() and friends are about to be deprecated. Convert to the
new idr_alloc() interface.
There are some peculiarities and possible bugs in the converted
functions. This patch preserves those.
* drv_insert_node_res_element() returns -ENOMEM on alloc failure,
-EFAULT if id space is exhausted. -EFAULT is at best misleading.
* drv_proc_insert_strm_res_element() is even weirder. It returns
-EFAULT if kzalloc() fails, -ENOMEM if idr preloading fails and
-EPERM if id space is exhausted. What's going on here?
* drv_proc_insert_strm_res_element() doesn't free *pstrm_res after
failure.
Only compile tested.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Víctor Manuel Jáquez Leal <vjaquez@igalia.com>
Cc: Rene Sapiens <rene.sapiens@ti.com>
Cc: Armando Uribe <x0095078@ti.com>
Cc: Omar Ramirez Luna <omar.ramirez@ti.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tejun Heo [Wed, 13 Mar 2013 21:59:40 +0000 (14:59 -0700)]
zcache: convert to idr_alloc()
idr_get_new*() and friends are about to be deprecated. Convert to the
new idr_alloc() interface.
Only compile tested.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tejun Heo [Wed, 13 Mar 2013 21:59:39 +0000 (14:59 -0700)]
mlx4: remove leftover idr_pre_get() call
Commit
6a9200603d76 ("IB/mlx4: convert to idr_alloc()") forgot to remove
idr_pre_get() call in mlx4_ib_cm_paravirt_init(). It's unnecessary and
idr_pre_get() will soon be deprecated. Remove it.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jack Morgenstein <jackm@dev.mellanox.co.il>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Roland Dreier <roland@purestorage.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tejun Heo [Wed, 13 Mar 2013 21:59:38 +0000 (14:59 -0700)]
workqueue: convert to idr_alloc()
idr_get_new*() and friends are about to be deprecated. Convert to the
new idr_alloc() interface.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tejun Heo [Wed, 13 Mar 2013 21:59:37 +0000 (14:59 -0700)]
nfsd: convert to idr_alloc()
idr_get_new*() and friends are about to be deprecated. Convert to the
new idr_alloc() interface.
Only compile-tested.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tejun Heo [Wed, 13 Mar 2013 21:59:36 +0000 (14:59 -0700)]
nfsd: remove unused get_new_stid()
get_new_stid() is no longer used since commit
3abdb607125 ("nfsd4:
simplify idr allocation"). Remove it.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Wed, 13 Mar 2013 21:59:34 +0000 (14:59 -0700)]
kernel/signal.c: use __ARCH_HAS_SA_RESTORER instead of SA_RESTORER
__ARCH_HAS_SA_RESTORER is the preferred conditional for use in 3.9 and
later kernels, per Kees.
Cc: Emese Revfy <re.emese@gmail.com>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Julien Tinnes <jln@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kees Cook [Wed, 13 Mar 2013 21:59:33 +0000 (14:59 -0700)]
signal: always clear sa_restorer on execve
When the new signal handlers are set up, the location of sa_restorer is
not cleared, leaking a parent process's address space location to
children. This allows for a potential bypass of the parent's ASLR by
examining the sa_restorer value returned when calling sigaction().
Based on what should be considered "secret" about addresses, it only
matters across the exec not the fork (since the VMAs haven't changed
until the exec). But since exec sets SIG_DFL and keeps sa_restorer,
this is where it should be fixed.
Given the few uses of sa_restorer, a "set" function was not written
since this would be the only use. Instead, we use
__ARCH_HAS_SA_RESTORER, as already done in other places.
Example of the leak before applying this patch:
$ cat /proc/$$/maps
...
7fb9f3083000-
7fb9f3238000 r-xp
00000000 fd:01 404469 .../libc-2.15.so
...
$ ./leak
...
7f278bc74000-
7f278be29000 r-xp
00000000 fd:01 404469 .../libc-2.15.so
...
1 0 (nil) 0x7fb9f30b94a0
2
4000000 (nil) 0x7f278bcaa4a0
3
4000000 (nil) 0x7f278bcaa4a0
4 0 (nil) 0x7fb9f30b94a0
...
[akpm@linux-foundation.org: use SA_RESTORER for backportability]
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Emese Revfy <re.emese@gmail.com>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Julien Tinnes <jln@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Toshi Kani [Wed, 13 Mar 2013 21:59:31 +0000 (14:59 -0700)]
mm: remove_memory(): fix end_pfn setting
remove_memory() calls walk_memory_range() with [start_pfn, end_pfn), where
end_pfn is exclusive in this range. Therefore, end_pfn needs to be set to
the next page of the end address.
Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Wu Jianguo <wujianguo@huawei.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Wed, 13 Mar 2013 21:59:30 +0000 (14:59 -0700)]
include/linux/res_counter.h needs errno.h
alpha allmodconfig:
In file included from mm/memcontrol.c:28:
include/linux/res_counter.h: In function 'res_counter_set_limit':
include/linux/res_counter.h:203: error: 'EBUSY' undeclared (first use in this function)
include/linux/res_counter.h:203: error: (Each undeclared identifier is reported only once
include/linux/res_counter.h:203: error: for each function it appears in.)
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Frederic Weisbecker <fweisbec@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Wed, 13 Mar 2013 22:03:48 +0000 (15:03 -0700)]
Merge tag 'usb-3.9-rc2' of git://git./linux/kernel/git/gregkh/usb
Pull USB fixes from Greg Kroah-Hartman:
"Here are a number of tiny USB fixes and new USB device ids for your
3.9 tree.
The "largest" one here is a revert of a usb-storage patch that turned
out to be incorrect, breaking existing users, which is never a good
thing. Everything else is pretty simple and small"
* tag 'usb-3.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (43 commits)
USB: quatech2: only write to the tty if the port is open.
qcserial: bind to DM/DIAG port on Gobi 1K devices
USB: cdc-wdm: fix buffer overflow
usb: serial: Add Rigblaster Advantage to device table
qcaux: add Franklin U600
usb: musb: core: fix possible build error with randconfig
usb: cp210x new Vendor/Device IDs
usb: gadget: pxa25x: fix disconnect reporting
usb: dwc3: ep0: fix sparc64 build
usb: c67x00 RetryCnt value in c67x00 TD should be 3
usb: Correction to c67x00 TD data length mask
usb: Makefile: fix drivers/usb/phy/ Makefile entry
USB: added support for Cinterion's products AH6 and PLS8
usb: gadget: fix omap_udc build errors
USB: storage: fix Huawei mode switching regression
USB: storage: in-kernel modeswitching is deprecated
tools: usb: ffs-test: Fix build failure
USB: option: add Huawei E5331
usb: musb: omap2430: fix sparse warning
usb: musb: omap2430: fix omap_musb_mailbox glue check again
...
Linus Torvalds [Wed, 13 Mar 2013 22:02:02 +0000 (15:02 -0700)]
Merge tag 'tty-3.9-rc2' of git://git./linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg Kroah-Hartman:
"Here are some tty/serial driver fixes for 3.9
We finally mute the annoying WARN_ON that lots of people are hitting
and it turns out isn't needed anymore. Also add a few new device ids
and a some other minor fixes."
* tag 'tty-3.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty: serial: fix typo "SERIAL_S3C2412"
serial: 8250: Keep 8250.<xxxx> module options functional after driver rename
tty: serial: fix typo "ARCH_S5P6450"
tty/8250_pnp: serial port detection regression since v3.7
serial: bcm63xx_uart: fix compilation after "TTY: switch tty_insert_flip_char"
serial: 8250_pci: add support for another kind of NetMos Technology PCI 9835 Multi-I/O Controller
Fix 4 port and add support for 8 port 'Unknown' PCI serial port cards
tty/serial: Add support for Altera serial port
tty: serial: vt8500: Unneccessary duplicated clock code removed
tty: serial: mpc5xxx: fix PSC clock name bug
TTY: disable debugging warning