Christoph Hellwig [Sun, 18 Jul 2010 21:17:11 +0000 (21:17 +0000)]
xfs simplify and speed up direct I/O completions
Our current handling of direct I/O completions is rather suboptimal,
because we defer it to a workqueue more often than needed, and we
perform a much to aggressive flush of the workqueue in case unwritten
extent conversions happen.
This patch changes the direct I/O reads to not even use a completion
handler, as we don't bother to use it at all, and to perform the unwritten
extent conversions in caller context for synchronous direct I/O.
For a small I/O size direct I/O workload on a consumer grade SSD, such as
the untar of a kernel tree inside qemu this patch gives speedups of
about 5%. Getting us much closer to the speed of a native block device,
or a fully allocated XFS file.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Sun, 18 Jul 2010 21:17:10 +0000 (21:17 +0000)]
xfs: move aio completion after unwritten extent conversion
If we write into an unwritten extent using AIO we need to complete the AIO
request after the extent conversion has finished. Without that a read could
race to see see the extent still unwritten and return zeros. For synchronous
I/O we already take care of that by flushing the xfsconvertd workqueue (which
might be a bit of overkill).
To do that add iocb and result fields to struct xfs_ioend, so that we can
call aio_complete from xfs_end_io after the extent conversion has happened.
Note that we need a new result field as io_error is used for positive errno
values, while the AIO code can return negative error values and positive
transfer sizes.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Sun, 18 Jul 2010 21:17:09 +0000 (21:17 +0000)]
direct-io: move aio_complete into ->end_io
Filesystems with unwritten extent support must not complete an AIO request
until the transaction to convert the extent has been commited. That means
the aio_complete calls needs to be moved into the ->end_io callback so
that the filesystem can control when to call it exactly.
This makes a bit of a mess out of dio_complete and the ->end_io callback
prototype even more complicated.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Alex Elder <aelder@sgi.com>
Dave Chinner [Mon, 26 Jul 2010 18:51:46 +0000 (13:51 -0500)]
xfs: fix big endian build
Commit
0fd7275cc42ab734eaa1a2c747e65479bd1e42af ("xfs: fix gcc 4.6
set but not read and unused statement warnings") failed to convert
some code inside XFS_NATIVE_HOST (big endian host code only) and
hence fails to build on such machines. Fix it.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Thu, 22 Jul 2010 02:52:08 +0000 (12:52 +1000)]
xfs: clean up xfs_bmap_get_bp
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Christoph Hellwig [Tue, 20 Jul 2010 07:51:31 +0000 (17:51 +1000)]
xfs: simplify xfs_truncate_file
xfs_truncate_file is only used for truncating quota files. Move it to
xfs_qm_syscalls.c so it can be marked static and take advatange of the
fact by removing the unused page cache validation and taking the iget
into the helper.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Christoph Hellwig [Tue, 20 Jul 2010 07:51:16 +0000 (17:51 +1000)]
xfs: kill the b_strat callback in xfs_buf
The b_strat callback is used by xfs_buf_iostrategy to perform additional
checks before submitting a buffer. It is used in xfs_bwrite and when
writing out delayed buffers. In xfs_bwrite it we can de-virtualize the
call easily as b_strat is set a few lines above the call to
xfs_buf_iostrategy. For the delayed buffers the rationale is a bit
more complicated:
- there are three callers of xfs_buf_delwri_queue, which places buffers
on the delwri list:
(1) xfs_bdwrite - this sets up b_strat, so it's fine
(2) xfs_buf_iorequest. None of the callers can have XBF_DELWRI set:
- xlog_bdstrat is only used for log buffers, which are never delwri
- _xfs_buf_read explicitly clears the delwri flag
- xfs_buf_iodone_work retries log buffers only
- xfsbdstrat - only used for reads, superblock writes without the
delwri flag, log I/O and file zeroing with explicitly allocated
buffers.
- xfs_buf_iostrategy - only calls xfs_buf_iorequest if b_strat is
not set
(3) xfs_buf_unlock
- only puts the buffer on the delwri list if the DELWRI flag is
already set. The DELWRI flag is only ever set in xfs_bwrite,
xfs_buf_iodone_callbacks, or xfs_trans_log_buf. For
xfs_buf_iodone_callbacks and xfs_trans_log_buf we require
an initialized buf item, which means b_strat was set to
xfs_bdstrat_cb in xfs_buf_item_init.
Conclusion: we can just get rid of the callback and replace it with
explicit calls to xfs_bdstrat_cb.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Christoph Hellwig [Tue, 20 Jul 2010 07:50:52 +0000 (17:50 +1000)]
xfs: remove obsolete osyncisosync mount option
Since Linux 2.6.33 the kernel has support for real O_SYNC, which made
the osyncisosync option a no-op. Warn the users about this and remove
the mount flag for it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Christoph Hellwig [Tue, 20 Jul 2010 07:31:01 +0000 (17:31 +1000)]
xfs: clean up filestreams helpers
Move xfs_filestream_peek_ag, xxfs_filestream_get_ag and xfs_filestream_put_ag
from xfs_filestream.h to xfs_filestream.c where it's only callers are, and
remove the inline marker while we're at it to let the compiler decide on the
inlining. Also don't return a value from xfs_filestream_put_ag because
we don't need it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Christoph Hellwig [Tue, 20 Jul 2010 07:54:45 +0000 (17:54 +1000)]
xfs: fix gcc 4.6 set but not read and unused statement warnings
[hch: dropped a few hunks that need structural changes instead]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Tony Luck [Tue, 20 Jul 2010 07:54:41 +0000 (17:54 +1000)]
xfs: Fix build when CONFIG_XFS_POSIX_ACL=n
When CONFIG_XFS_POSIX_ACL is not set "xfs_check_acl" is #defined
to NULL - which breaks the code attempting to add a tracepoint
on this function.
Only define the tracepoint when the function exists.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Kulikov Vasiliy [Tue, 20 Jul 2010 07:54:28 +0000 (17:54 +1000)]
xfs: fix unsigned underflow in xfs_free_eofblocks
map_len is unsigned. Checking map_len <= 0 is buggy when it should be
below zero. So, check exact expression instead of map_len.
Signed-off-by: Kulikov Vasiliy <segooon@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Dave Chinner [Tue, 20 Jul 2010 07:54:12 +0000 (17:54 +1000)]
xfs: use GFP_NOFS for page cache allocation
Avoid a lockdep warning by preventing page cache allocation from
recursing back into the filesystem during memory reclaim.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Dave Chinner [Tue, 20 Jul 2010 07:53:59 +0000 (17:53 +1000)]
xfs: fix memory reclaim recursion deadlock on locked inode buffer
Calling into memory reclaim with a locked inode buffer can deadlock
if memory reclaim tries to lock the inode buffer during inode
teardown. Convert the relevant memory allocations to use KM_NOFS to
avoid this deadlock condition.
Reported-by: Peter Watkins <treestem@gmail.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Dave Chinner [Tue, 20 Jul 2010 07:53:44 +0000 (17:53 +1000)]
xfs: fix xfs_trans_add_item() lockdep warnings
xfs_trans_add_item() is called with ip->i_ilock held, which means it
is unsafe for memory reclaim to recurse back into the filesystem
(ilock is required in writeback). Hence the allocation needs to be
KM_NOFS to avoid recursion.
Lockdep report indicating memory allocation being called with the
ip->i_ilock held is as follows:
[ 1749.866796] =================================
[ 1749.867788] [ INFO: inconsistent lock state ]
[ 1749.868327] 2.6.35-rc3-dgc+ #25
[ 1749.868741] ---------------------------------
[ 1749.868741] inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage.
[ 1749.868741] dd/2835 [HC0[0]:SC0[0]:HE1:SE1] takes:
[ 1749.868741] (&(&ip->i_lock)->mr_lock){++++?.}, at: [<
ffffffff813170fb>] xfs_ilock+0x10b/0x190
[ 1749.868741] {IN-RECLAIM_FS-W} state was registered at:
[ 1749.868741] [<
ffffffff810b3a97>] __lock_acquire+0x437/0x1450
[ 1749.868741] [<
ffffffff810b4b56>] lock_acquire+0xa6/0x160
[ 1749.868741] [<
ffffffff810a20b5>] down_write_nested+0x65/0xb0
[ 1749.868741] [<
ffffffff813170fb>] xfs_ilock+0x10b/0x190
[ 1749.868741] [<
ffffffff8134e819>] xfs_reclaim_inode+0x99/0x310
[ 1749.868741] [<
ffffffff8134f56b>] xfs_inode_ag_walk+0x8b/0x150
[ 1749.868741] [<
ffffffff8134f6bb>] xfs_inode_ag_iterator+0x8b/0xf0
[ 1749.868741] [<
ffffffff8134f7a8>] xfs_reclaim_inode_shrink+0x88/0x90
[ 1749.868741] [<
ffffffff81119d07>] shrink_slab+0x137/0x1a0
[ 1749.868741] [<
ffffffff8111bbe1>] balance_pgdat+0x421/0x6a0
[ 1749.868741] [<
ffffffff8111bf7d>] kswapd+0x11d/0x320
[ 1749.868741] [<
ffffffff8109ce56>] kthread+0x96/0xa0
[ 1749.868741] [<
ffffffff81035de4>] kernel_thread_helper+0x4/0x10
[ 1749.868741] irq event stamp:
4234335
[ 1749.868741] hardirqs last enabled at (
4234335): [<
ffffffff81147d25>] kmem_cache_free+0x115/0x220
[ 1749.868741] hardirqs last disabled at (
4234334): [<
ffffffff81147c4d>] kmem_cache_free+0x3d/0x220
[ 1749.868741] softirqs last enabled at (
4233112): [<
ffffffff81084dd2>] __do_softirq+0x142/0x260
[ 1749.868741] softirqs last disabled at (
4233095): [<
ffffffff81035edc>] call_softirq+0x1c/0x50
[ 1749.868741]
[ 1749.868741] other info that might help us debug this:
[ 1749.868741] 2 locks held by dd/2835:
[ 1749.868741] #0: (&(&ip->i_iolock)->mr_lock#2){+.+.+.}, at: [<
ffffffff81316edd>] xfs_ilock_nowait+0xed/0x200
[ 1749.868741] #1: (&(&ip->i_lock)->mr_lock){++++?.}, at: [<
ffffffff813170fb>] xfs_ilock+0x10b/0x190
[ 1749.868741]
[ 1749.868741] stack backtrace:
[ 1749.868741] Pid: 2835, comm: dd Not tainted 2.6.35-rc3-dgc+ #25
[ 1749.868741] Call Trace:
[ 1749.868741] [<
ffffffff810b1faa>] print_usage_bug+0x18a/0x190
[ 1749.868741] [<
ffffffff8104264f>] ? save_stack_trace+0x2f/0x50
[ 1749.868741] [<
ffffffff810b2400>] ? check_usage_backwards+0x0/0xf0
[ 1749.868741] [<
ffffffff810b2f11>] mark_lock+0x331/0x400
[ 1749.868741] [<
ffffffff810b3047>] mark_held_locks+0x67/0x90
[ 1749.868741] [<
ffffffff810b3111>] lockdep_trace_alloc+0xa1/0xe0
[ 1749.868741] [<
ffffffff81147419>] kmem_cache_alloc+0x39/0x1e0
[ 1749.868741] [<
ffffffff8133f954>] kmem_zone_alloc+0x94/0xe0
[ 1749.868741] [<
ffffffff8133f9be>] kmem_zone_zalloc+0x1e/0x50
[ 1749.868741] [<
ffffffff81335f02>] xfs_trans_add_item+0x72/0xb0
[ 1749.868741] [<
ffffffff81339e41>] xfs_trans_ijoin+0xa1/0xd0
[ 1749.868741] [<
ffffffff81319f82>] xfs_itruncate_finish+0x312/0x5d0
[ 1749.868741] [<
ffffffff8133cb87>] xfs_free_eofblocks+0x227/0x280
[ 1749.868741] [<
ffffffff8133cd18>] xfs_release+0x138/0x190
[ 1749.868741] [<
ffffffff813464c5>] xfs_file_release+0x15/0x20
[ 1749.868741] [<
ffffffff81150ebf>] fput+0x13f/0x260
[ 1749.868741] [<
ffffffff8114d8c2>] filp_close+0x52/0x80
[ 1749.868741] [<
ffffffff8114d9a9>] sys_close+0xb9/0x120
[ 1749.868741] [<
ffffffff81034ff2>] system_call_fastpath+0x16/0x1b
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Dave Chinner [Tue, 20 Jul 2010 07:53:25 +0000 (17:53 +1000)]
xfs: simplify and remove xfs_ireclaim
xfs_ireclaim has to get and put te pag structure because it is only
called with the inode to reclaim. The one caller of this function
already has a reference on the pag and a pointer to is, so move the
radix tree delete to the caller and remove xfs_ireclaim completely.
This avoids a xfs_perag_get/put on every inode being reclaimed.
The overhead was noticed in a bug report at:
https://bugzilla.kernel.org/show_bug.cgi?id=16348
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Dave Chinner [Tue, 20 Jul 2010 07:52:59 +0000 (17:52 +1000)]
xfs: don't block on buffer read errors
xfs_buf_read() fails to detect dispatch errors before attempting to
wait on sychronous IO. If there was an error, it will get stuck
forever, waiting for an I/O that was never started. Make sure the
error is detected correctly.
Further, such a failure can leave locked pages in the page cache
which will cause a later operation to hang on the page. Ensure that
we correctly process pages in the buffers when we get a dispatch
error.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Dave Chinner [Mon, 12 Jul 2010 06:40:58 +0000 (06:40 +0000)]
xfs: move inode shrinker unregister even earlier
I missed Dave Chinner's second revision of this change, and pushed
his first version out to the repository instead.
commit
a476c59ebb279d738718edc0e3fb76aab3687114
Author: Dave Chinner <dchinner@redhat.com>
This commit compensates for that by moving a block of code up a bit
further, with a result that matches the the effect of Dave's second
version.
Dave's first version was:
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Dave's second version was:
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Christoph Hellwig [Sat, 3 Jul 2010 09:21:17 +0000 (09:21 +0000)]
xfs: remove a dmapi leftover
The open_exec file operation is only added by the external dmapi
patch.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Mon, 28 Jun 2010 14:34:57 +0000 (10:34 -0400)]
xfs: writepage always has buffers
These days we always have buffers thanks to ->page_mkwrite. And we
already have an assert a few lines above tripping in case that was
not true due to a bug.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Mon, 28 Jun 2010 14:34:44 +0000 (10:34 -0400)]
xfs: allow writeback from kswapd
We only need disable I/O from direct or memcg reclaim.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Mon, 28 Jun 2010 14:34:34 +0000 (10:34 -0400)]
xfs: remove incorrect log write optimization
We do need a barrier for the first buffer of a split log write.
Otherwise we might incorrectly stamp the tail LSN into transactions
in the first part of the split write, or not flush data I/O before
updating the inode size.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Dave Chinner [Fri, 25 Jun 2010 01:08:40 +0000 (11:08 +1000)]
xfs: unregister inode shrinker before freeing filesystem structures
Currently we don't remove the XFS mount from the shrinker list until
late in the unmount path. By this time, we have already torn down
the internals of the filesystem (e.g. the per-ag structures), and
hence if the shrinker is executed between the teardown and the
unregistering, the shrinker will get NULL per-ag structure pointers
and panic trying to dereference them.
Fix this by removing the xfs mount from the shrinker list before
tearing down it's internal structures.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Thu, 24 Jun 2010 01:57:09 +0000 (11:57 +1000)]
xfs: split xfs_itrace_entry
Replace the xfs_itrace_entry catchall with specific trace points. For
most simple callers we now use the simple inode class, which used to
be the iget class, but add more details tracing for namespace events,
which now includes the name of the directory entries manipulated.
Remove the xfs_inactive trace point, which is a duplicate of the clear_inode
one, and the xfs_change_file_space trace point, which is immediately
followed by the more specific alloc/free space trace points.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Christoph Hellwig [Thu, 24 Jun 2010 01:52:50 +0000 (11:52 +1000)]
xfs: remove xfs_iput
xfs_iput is just a small wrapper for xfs_iunlock + IRELE. Having this
out of line wrapper means the trace events in those two can't track
their caller properly. So just remove the wrapper and opencode the
unlock + rele in the few callers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Christoph Hellwig [Thu, 24 Jun 2010 01:51:19 +0000 (11:51 +1000)]
xfs: remove xfs_iput_new
We never get an i_mode of 0 or a locked VFS inode until we pass in the
XFS_IGET_CREATE flag to xfs_iget, which makes xfs_iput_new equivalent to
xfs_iput for the only caller. In addition to that xfs_nfs_get_inode
does not even need to lock the inode given that the generation never changes
for a life inode, so just pass a 0 lock_flags to xfs_iget and release
the inode using IRELE in the error path.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Christoph Hellwig [Thu, 24 Jun 2010 01:50:22 +0000 (11:50 +1000)]
xfs: some iget tracing cleanups / fixes
The xfs_iget_alloc/found tracepoints are a bit misnamed and misplaced.
Rename them to xfs_iget_hit/xfs_iget_miss and move them to the beggining
of the xfs_iget_cache_hit/miss functions. Add a new xfs_iget_reclaim_fail
tracepoint for the case where we fail to re-initialize a VFS inode,
and add a second instance of the xfs_iget_skip tracepoint for the case
of a failed igrab() call.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Christoph Hellwig [Thu, 24 Jun 2010 01:49:12 +0000 (11:49 +1000)]
xfs: do not use emums for flags used in tracing
The tracing code can't print flags defined as enums. Most flags that
we want to print are defines as macros already, but move the few remaining
ones over to make the trace output more useful.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Christoph Hellwig [Thu, 24 Jun 2010 01:45:34 +0000 (11:45 +1000)]
xfs: remove explicit xfs_sync_data/xfs_sync_attr calls on umount
On the final put of a superblock the VFS already calls sync_filesystem
for us to write out all data and wait for it. No need to start another
asynchronous writeback inside ->put_super.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Christoph Hellwig [Thu, 24 Jun 2010 01:44:35 +0000 (11:44 +1000)]
xfs: small cleanups for xfs_iomap / __xfs_get_blocks
Remove the flags argument to __xfs_get_blocks as we can easily derive
it from the direct argument, and remove the unused BMAPI_MMAP flag.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Christoph Hellwig [Thu, 24 Jun 2010 01:42:19 +0000 (11:42 +1000)]
xfs: reduce stack usage in xfs_iomap
xfs_iomap passes a xfs_bmbt_irec pointer to xfs_iomap_write_direct and
xfs_iomap_write_allocate to give them the results of our read-only
xfs_bmapi query. Instead of allocating a new xfs_bmbt_irec on stack
for the next call to xfs_bmapi re use the one we got passed as it's not
used after this point.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Christoph Hellwig [Thu, 24 Jun 2010 01:39:25 +0000 (11:39 +1000)]
xfs: avoid synchronous transaction in xfs_fs_write_inode
We already rely on the fact that the sync code will cause a synchronous
log force later on (currently via xfs_fs_sync_fs -> xfs_quiesce_data ->
xfs_sync_data), so no need to do this here. This allows us to avoid
a lot of synchronous log forces during sync, which pays of especially
with delayed logging enabled. Some compilebench numbers that show
this:
xfs (delayed logging, 256k logbufs)
===================================
intial create 25.94 MB/s 25.75 MB/s 25.64 MB/s
create 8.54 MB/s 9.12 MB/s 9.15 MB/s
patch 2.47 MB/s 2.47 MB/s 3.17 MB/s
compile 29.65 MB/s 30.51 MB/s 27.33 MB/s
clean 90.92 MB/s 98.83 MB/s 128.87 MB/s
read tree 11.90 MB/s 11.84 MB/s 8.56 MB/s
read compiled 28.75 MB/s 29.96 MB/s 24.25 MB/s
delete tree 8.39 seconds 8.12 seconds 8.46 seconds
delete compiled 8.35 seconds 8.44 seconds 5.11 seconds
stat tree 6.03 seconds 5.59 seconds 5.19 seconds
stat compiled tree 9.00 seconds 9.52 seconds 8.49 seconds
xfs + write_inode log_force removal
===================================
intial create 25.87 MB/s 25.76 MB/s 25.87 MB/s
create 15.18 MB/s 14.80 MB/s 14.94 MB/s
patch 3.13 MB/s 3.14 MB/s 3.11 MB/s
compile 36.74 MB/s 37.17 MB/s 36.84 MB/s
clean 226.02 MB/s 222.58 MB/s 217.94 MB/s
read tree 15.14 MB/s 15.02 MB/s 15.14 MB/s
read compiled tree 29.30 MB/s 29.31 MB/s 29.32 MB/s
delete tree 6.22 seconds 6.14 seconds 6.15 seconds
delete compiled tree 5.75 seconds 5.92 seconds 5.81 seconds
stat tree 4.60 seconds 4.51 seconds 4.56 seconds
stat compiled tree 4.07 seconds 3.87 seconds 3.96 seconds
In addition to that also remove the delwri inode flush that is unessecary
now that bulkstat is always coherent.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Christoph Hellwig [Wed, 23 Jun 2010 23:46:01 +0000 (09:46 +1000)]
xfs: simplify xfs_vm_writepage
The writepage implementation in XFS still tries to deal with dirty but
unmapped buffers which used to caused by writes through shared mmaps. Since
the introduction of ->page_mkwrite these can't happen anymore, so remove the
code dealing with them.
Note that the all_bh variable which causes us to start I/O on all buffers on
the pages was controlled by the count of unmapped buffers, which also
included those not actually dirty. It's now unconditionally initialized to
0 but set to 1 for the case of small file size extensions. It probably can
be removed entirely, but that's left for another patch.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Christoph Hellwig [Wed, 23 Jun 2010 23:45:48 +0000 (09:45 +1000)]
xfs: simplify xfs_vm_releasepage
Currently the xfs releasepage implementation has code to deal with converting
delayed allocated and unwritten space. But we never get called for those as
we always convert delayed and unwritten space when cleaning a page, or drop
the state from the buffers in block_invalidatepage. We still keep a WARN_ON
on those cases for now, but remove all the case dealing with it, which allows
to fold xfs_page_state_convert into xfs_vm_writepage and remove the !startio
case from the whole writeback path.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Eric Sandeen [Wed, 23 Jun 2010 23:45:30 +0000 (09:45 +1000)]
xfs: fix corruption case for block size < page size
xfstests 194 first truncats a file back and then extends it again by
truncating it to a larger size. This causes discard_buffer to drop
the mapped, but not the uptodate bit and thus creates something that
xfs_page_state_convert takes for unmapped space created by mmap because
it doesn't check for the dirty bit, which also gets cleared by
discard_buffer and checked by other ->writepage implementations like
block_write_full_page. Handle this kind of buffers early, and unlike
Eric's first version of the patch simply ASSERT that the buffers is
dirty, given that the mmap write case can't happen anymore since the
introduction of ->page_mkwrite. The now dead code dealing with that
will be deleted in a follow on patch.
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Christoph Hellwig [Wed, 23 Jun 2010 08:11:15 +0000 (18:11 +1000)]
xfs: remove unused delta tracking code in xfs_bmapi
This code was introduced four years ago in commit
3e57ecf640428c01ba1ed8c8fc538447ada1715b without any review and has
been unused since. Remove it just as the rest of the code introduced
in that commit to reduce that stack usage and complexity in this central
piece of code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Christoph Hellwig [Wed, 23 Jun 2010 08:11:15 +0000 (18:11 +1000)]
xfs: remove unused XFS_BMAPI_ flags
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Christoph Hellwig [Wed, 23 Jun 2010 08:11:15 +0000 (18:11 +1000)]
xfs: remove the unused XFS_TRANS_NOSLEEP/XFS_TRANS_WAIT flags
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Christoph Hellwig [Wed, 23 Jun 2010 08:11:15 +0000 (18:11 +1000)]
xfs: remove the unused XFS_LOG_SLEEP and XFS_LOG_NOSLEEP flags
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Christoph Hellwig [Wed, 23 Jun 2010 08:11:15 +0000 (18:11 +1000)]
xfs: kill the unused xlog_debug variable
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Christoph Hellwig [Wed, 23 Jun 2010 08:11:15 +0000 (18:11 +1000)]
xfs: fix the xfs_log_iovec i_addr type
By making this member a void pointer we can get rid of a lot of pointless
casts.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Christoph Hellwig [Thu, 24 Jun 2010 01:36:58 +0000 (11:36 +1000)]
xfs: simplify inode to transaction joining
Currently we need to either call IHOLD or xfs_trans_ihold on an inode when
joining it to a transaction via xfs_trans_ijoin.
This patches instead makes xfs_trans_ijoin usable on it's own by doing
an implicity xfs_trans_ihold, which also allows us to drop the third
argument. For the case where we want to hold a reference on the inode
a xfs_trans_ijoin_ref wrapper is added which does the IHOLD and marks
the inode for needing an xfs_iput. In addition to the cleaner interface
to the caller this also simplifies the implementation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Christoph Hellwig [Wed, 23 Jun 2010 08:11:15 +0000 (18:11 +1000)]
xfs: simplify buffer pinning
Get rid of the xfs_buf_pin/xfs_buf_unpin/xfs_buf_ispin helpers and opencode
them in their only callers, just like we did for the inode pinning a while
ago. Also remove duplicate trace points - the bufitem tracepoints cover
all the information that is present in a buffer tracepoint.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Christoph Hellwig [Wed, 23 Jun 2010 08:11:15 +0000 (18:11 +1000)]
xfs: give li_cb callbacks the correct prototype
Stop the function pointer casting madness and give all the li_cb instances
correct prototype.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Christoph Hellwig [Wed, 23 Jun 2010 08:11:15 +0000 (18:11 +1000)]
xfs: give xfs_item_ops methods the correct prototypes
Stop the function pointer casting madness and give all the xfs_item_ops the
correct prototypes.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Christoph Hellwig [Wed, 23 Jun 2010 08:11:15 +0000 (18:11 +1000)]
xfs: merge iop_unpin_remove into iop_unpin
The unpin_remove item operation instances always share most of the
implementation with the respective unpin implementation. So instead
of keeping two different entry points add a remove flag to the unpin
operation and share the code more easily.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Christoph Hellwig [Wed, 23 Jun 2010 08:11:15 +0000 (18:11 +1000)]
xfs: simplify log item descriptor tracking
Currently we track log item descriptor belonging to a transaction using a
complex opencoded chunk allocator. This code has been there since day one
and seems to work around the lack of an efficient slab allocator.
This patch replaces it with dynamically allocated log item descriptors
from a dedicated slab pool, linked to the transaction by a linked list.
This allows to greatly simplify the log item descriptor tracking to the
point where it's just a couple hundred lines in xfs_trans.c instead of
a separate file. The external API has also been simplified while we're
at it - the xfs_trans_add_item and xfs_trans_del_item functions to add/
delete items from a transaction have been simplified to the bare minium,
and the xfs_trans_find_item function is replaced with a direct dereference
of the li_desc field. All debug code walking the list of log items in
a transaction is down to a simple list_for_each_entry.
Note that we could easily use a singly linked list here instead of the
double linked list from list.h as the fastpath only does deletion from
sequential traversal. But given that we don't have one available as
a library function yet I use the list.h functions for simplicity.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Christoph Hellwig [Wed, 23 Jun 2010 08:11:15 +0000 (18:11 +1000)]
xfs: remove unneeded #include statements
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Christoph Hellwig [Wed, 23 Jun 2010 08:11:15 +0000 (18:11 +1000)]
xfs: drop dmapi hooks
Dmapi support was never merged upstream, but we still have a lot of hooks
bloating XFS for it, all over the fast pathes of the filesystem.
This patch drops over 700 lines of dmapi overhead. If we'll ever get HSM
support in mainline at least the namespace events can be done much saner
in the VFS instead of the individual filesystem, so it's not like this
is much help for future work.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Linus Torvalds [Thu, 22 Jul 2010 19:13:38 +0000 (12:13 -0700)]
Linux 2.6.35-rc6
Linus Torvalds [Thu, 22 Jul 2010 18:46:15 +0000 (11:46 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: synaptics - relax capability ID checks on newer hardware
Input: twl40300-keypad - fix handling of "all ground" rows
Input: gamecon - reference correct pad in gc_psx_command()
Input: gamecon - reference correct input device in NES mode
Input: w90p910_keypad - change platfrom driver name to 'nuc900-kpi'
Input: i8042 - add Gigabyte Spring Peak to dmi_noloop_table
Input: qt2160 - rename kconfig symbol name
Linus Torvalds [Thu, 22 Jul 2010 18:45:57 +0000 (11:45 -0700)]
Merge branch 'drm-fixes' of git://git./linux/kernel/git/airlied/drm-2.6
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm/radeon/kms: add quirk to make HP DV5000 laptop resume
drm/radeon/kms: fix RADEON_INFO_CRTC_FROM_ID info ioctl
Fix ttm_page_alloc.c build breakage
drm/radeon/kms: fix legacy LVDS dpms sequence
drm/radeon/kms: drop taking lock around crtc lookup.
Linus Torvalds [Thu, 22 Jul 2010 18:45:23 +0000 (11:45 -0700)]
Merge git://git./linux/kernel/git/herbert/crypto-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: talitos - fix bug in sg_copy_end_to_buffer
Linus Torvalds [Thu, 22 Jul 2010 18:45:02 +0000 (11:45 -0700)]
Merge branch 'x86/auditsyscall' of git://git./linux/kernel/git/frob/linux-2.6-roland
* 'x86/auditsyscall' of git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-roland:
x86: auditsyscall: fix fastpath return value after reschedule
Linus Torvalds [Thu, 22 Jul 2010 18:44:26 +0000 (11:44 -0700)]
Merge branch 'for_linus' of git://git./linux/kernel/git/jwessel/linux-2.6-kgdb
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
sysrq,kdb: Use __handle_sysrq() for kdb's sysrq function
debug_core,kdb: fix kgdb_connected bit set in the wrong place
Fix merge regression from external kdb to upstream kdb
repair gdbstub to match the gdbserial protocol specification
kdb: break out of kdb_ll() when command is terminated
David Howells [Thu, 22 Jul 2010 11:53:18 +0000 (12:53 +0100)]
CIFS: Fix a malicious redirect problem in the DNS lookup code
Fix the security problem in the CIFS filesystem DNS lookup code in which a
malicious redirect could be installed by a random user by simply adding a
result record into one of their keyrings with add_key() and then invoking a
CIFS CFS lookup [CVE-2010-2524].
This is done by creating an internal keyring specifically for the caching of
DNS lookups. To enforce the use of this keyring, the module init routine
creates a set of override credentials with the keyring installed as the thread
keyring and instructs request_key() to only install lookup result keys in that
keyring.
The override is then applied around the call to request_key().
This has some additional benefits when a kernel service uses this module to
request a key:
(1) The result keys are owned by root, not the user that caused the lookup.
(2) The result keys don't pop up in the user's keyrings.
(3) The result keys don't come out of the quota of the user that caused the
lookup.
The keyring can be viewed as root by doing cat /proc/keys:
2a0ca6c3 I----- 1 perm
1f030000 0 0 keyring .dns_resolver: 1/4
It can then be listed with 'keyctl list' by root.
# keyctl list 0x2a0ca6c3
1 key in keyring:
726766307: --alswrv 0 0 dns_resolver: foo.bar.com
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-and-Tested-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Steve French <smfrench@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alex Deucher [Thu, 22 Jul 2010 03:54:35 +0000 (23:54 -0400)]
drm/radeon/kms: add quirk to make HP DV5000 laptop resume
Fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=29062
Reported-by: Andres Cimmarusti <acimmarusti@gmail.com>
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dmitry Torokhov [Wed, 21 Jul 2010 07:01:19 +0000 (00:01 -0700)]
Input: synaptics - relax capability ID checks on newer hardware
Older firmwares fixed the middle byte of the Synaptics capabilities
query to 0x47, but starting with firmware 7.5 the middle byte
represents submodel ID, sometimes also called "dash number".
Reported-and-tested-by: Miroslav Å ulc <fordfrog@gmail.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Roland McGrath [Thu, 22 Jul 2010 00:44:12 +0000 (17:44 -0700)]
x86: auditsyscall: fix fastpath return value after reschedule
In the CONFIG_AUDITSYSCALL fast-path for x86 64-bit system calls,
we can pass a bad return value and/or error indication for the
system call to audit_syscall_exit(). This happens when
TIF_NEED_RESCHED was set as the system call returned, so we went
out to schedule() and came back to the exit-audit fast-path. The
fix is to reload the user return value register from the pt_regs
before using it for audit_syscall_exit().
Both the 32-bit kernel's fast path and the 64-bit kernel's 32-bit
system call fast paths work slightly differently, so that they
always leave the fast path entirely to reschedule and don't return
there, so they don't have the analogous bugs.
Reported-by: Alexander Viro <aviro@redhat.com>
Signed-off-by: Roland McGrath <roland@redhat.com>
Jason Wessel [Thu, 22 Jul 2010 00:27:07 +0000 (19:27 -0500)]
sysrq,kdb: Use __handle_sysrq() for kdb's sysrq function
The kdb code should not toggle the sysrq state in case an end user
wants to try and resume the normal kernel execution.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Jason Wessel [Thu, 22 Jul 2010 00:27:07 +0000 (19:27 -0500)]
debug_core,kdb: fix kgdb_connected bit set in the wrong place
Immediately following an exit from the kdb shell the kgdb_connected
variable should be set to zero, unless there are breakpoints planted.
If the kgdb_connected variable is not zeroed out with kdb, it is
impossible to turn off kdb.
This patch is merely a work around for now, the real fix will check
for the breakpoints.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Jason Wessel [Thu, 22 Jul 2010 00:27:06 +0000 (19:27 -0500)]
Fix merge regression from external kdb to upstream kdb
In the process of merging kdb to the mainline, the kdb lsmod command
stopped printing the base load address of kernel modules. This is
needed for using kdb in conjunction with external tools such as gdb.
Simply restore the functionality by adding a kdb_printf for the base
load address of the kernel modules.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Jason Wessel [Thu, 22 Jul 2010 00:27:05 +0000 (19:27 -0500)]
repair gdbstub to match the gdbserial protocol specification
The gdbserial protocol handler should return an empty packet instead
of an error string when ever it responds to a command it does not
implement.
The problem cases come from a debugger client sending
qTBuffer, qTStatus, qSearch, qSupported.
The incorrect response from the gdbstub leads the debugger clients to
not function correctly. Recent versions of gdb will not detach correctly as a result of this behavior.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Dongdong Deng <dongdong.deng@windriver.com>
Martin Hicks [Thu, 22 Jul 2010 00:27:05 +0000 (19:27 -0500)]
kdb: break out of kdb_ll() when command is terminated
Without this patch the "ll" linked-list traversal command won't
terminate when you hit q/Q.
Signed-off-by: Martin Hicks <mort@sgi.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Alex Deucher [Wed, 21 Jul 2010 18:05:35 +0000 (14:05 -0400)]
drm/radeon/kms: fix RADEON_INFO_CRTC_FROM_ID info ioctl
Return the crtc_id, not the counter value. They are not
necessarily the same.
Cc: Jerome Glisse <glisse@freedesktop.org>
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Luck, Tony [Wed, 21 Jul 2010 17:15:39 +0000 (10:15 -0700)]
Fix ttm_page_alloc.c build breakage
The commit
1e8655f87333def92bb8215b423adc65403b08a5
drm/ttm: Fix build on architectures without AGP
looks at TTM_HAS_AGP before it has been set in ttm_bo_driver.h
Move the conditional inclusion of <asm/agp.h> *after* we have included
ttm_bo_driver.h
Signed-of-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Alex Deucher [Wed, 21 Jul 2010 23:37:21 +0000 (19:37 -0400)]
drm/radeon/kms: fix legacy LVDS dpms sequence
Add delay after turning off the LVDS encoder.
Fixes:
https://bugzilla.kernel.org/show_bug.cgi?id=16389
Tested-by: Jan Kreuzer <kontrollator@gmx.de>
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Wed, 21 Jul 2010 23:57:13 +0000 (09:57 +1000)]
drm/radeon/kms: drop taking lock around crtc lookup.
We only add/remove crtcs at driver load, you cannot remove when
the GPU is running a CS packet since the fd is open, when
GPU hotplugging on radeons actually is needed all this locking
needs a review and I've started re-working kms core locking to deal
with this better. But for now avoid long delays in CS processing when
hotplug detect is happening in a different thread.
this fixes a regression introduced with hotplug detection.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Linus Torvalds [Wed, 21 Jul 2010 16:31:15 +0000 (09:31 -0700)]
Merge branch 'urgent' of git://git./linux/kernel/git/brodo/pcmcia-2.6
* 'urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/pcmcia-2.6:
pcmcia: fix 'driver ... did not release config properly' warning
Linus Torvalds [Wed, 21 Jul 2010 16:30:59 +0000 (09:30 -0700)]
Merge branch 'shrinker' of git://git./linux/kernel/git/dgc/xfsdev
* 'shrinker' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/xfsdev:
mm: add context argument to shrinker callback to remaining shrinkers
Linus Torvalds [Wed, 21 Jul 2010 16:29:39 +0000 (09:29 -0700)]
Merge branch 'fix/asoc' of git://git./linux/kernel/git/tiwai/sound-2.6
* 'fix/asoc' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
ASoC: Select wm_hubs automatically for WM8994
ASoC: Remove duplicate AUX definition from WM8776
ASoC:: remove a redundant snd_soc_unregister_codec call in wm8988_register
ASoC: wm8727: add a missing return in wm8727_platform_probe
ASoC: fsi: fixup wrong value setting order of TDM
ASoC: fsi: fixup clock inversion operation
Linus Torvalds [Wed, 21 Jul 2010 16:28:50 +0000 (09:28 -0700)]
Merge git://git./linux/kernel/git/davem/sparc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
math-emu: correct test for downshifting fraction in _FP_FROM_INT()
perf: Add DWARF register lookup for sparc
MAINTAINERS: Add SBUS driver path to sparc entry.
drivers/sbus: Remove unnecessary casts of private_data
sparc: remove homegrown L1_CACHE_ALIGN macro
sparc64: fix the build error due to smp_kgdb_capture_client()
sparc64: Fix maybe_change_configuration() PCR setting.
arch/sparc/kernel: Eliminate what looks like a NULL pointer dereference
sparc64: Update defconfig.
sunsu: Fix use after free in su_remove().
sunserial: Don't call add_preferred_console() when console= is specified.
sparc32: Kill none_mask, it's bogus.
Linus Torvalds [Wed, 21 Jul 2010 16:25:42 +0000 (09:25 -0700)]
Fix up trivial spelling errors ('taht' -> 'that')
Pointed out by Lucas who found the new one in a comment in
setup_percpu.c. And then I fixed the others that I grepped
for.
Reported-by: Lucas <canolucas@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patrick McHardy [Tue, 20 Jul 2010 22:21:42 +0000 (15:21 -0700)]
pcmcia: fix 'driver ... did not release config properly' warning
Up to 2.6.34 pcmcia_release_irq() reset p_dev->_irq to 0 after releasing
the irq. The IRQ is now released in pcmcia_disable_device(), however
p_dev->_irq is not reset, triggering a warning in pcmcia_device_remove().
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Dave Chinner [Wed, 21 Jul 2010 05:33:01 +0000 (15:33 +1000)]
mm: add context argument to shrinker callback to remaining shrinkers
Add the shrinkers missed in the first conversion of the API in
commit
7f8275d0d660c146de6ee3017e1e2e594c49e820 ("mm: add context argument to
shrinker callback").
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Dmitry Torokhov [Wed, 21 Jul 2010 03:25:35 +0000 (20:25 -0700)]
Input: twl40300-keypad - fix handling of "all ground" rows
The Nokia RX51 board code (arch/arm/mach-omap2/board-rx51-peripherals.c)
defines a key map for the matrix keypad keyboard. The hardware seems to
use all of the 8 rows and 8 columns of the keypad, although not all
possible locations are used.
The TWL4030 supports keypads with at most 8 rows and 8 columns. Most keys
are defined with a row and column number between 0 and 7, except
KEY(0xff, 2, KEY_F9),
KEY(0xff, 4, KEY_F10),
KEY(0xff, 5, KEY_F11),
which represent keycodes that should be emitted when entire row is
connected to the ground. since the driver handles this case as if we
had an extra column in the key matrix. Unfortunately we do not allocate
enough space and end up owerwriting some random memory.
Reported-and-tested-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: stable@kernel.org
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Dmitry Torokhov [Wed, 21 Jul 2010 03:25:35 +0000 (20:25 -0700)]
Input: gamecon - reference correct pad in gc_psx_command()
Otherwise we won't see any events from the gamepad.
Addresses https://bugzilla.kernel.org/show_bug.cgi?id=16408
Reported-and-tested-by: Eugene Yudin <eugene.yudin@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Dmitry Torokhov [Wed, 21 Jul 2010 03:25:35 +0000 (20:25 -0700)]
Input: gamecon - reference correct input device in NES mode
We moved input devices from 'struct gc' to individial pads (struct
gc-pad), but gc_nes_process_packet() was still trying to use old
ones and crashing.
Cc: stable@kernel.org
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Mikael Pettersson [Wed, 21 Jul 2010 01:45:14 +0000 (18:45 -0700)]
math-emu: correct test for downshifting fraction in _FP_FROM_INT()
The kernel's math-emu code contains a macro _FP_FROM_INT() which is
used to convert an integer to a raw normalized floating-point value.
It does this basically in three steps:
1. Compute the exponent from the number of leading zero bits.
2. Downshift large fractions to put the MSB in the right position
for normalized fractions.
3. Upshift small fractions to put the MSB in the right position.
There is an boundary error in step 2, causing a fraction with its
MSB exactly one bit above the normalized MSB position to not be
downshifted. This results in a non-normalized raw float, which when
packed becomes a massively inaccurate representation for that input.
The impact of this depends on a number of arch-specific factors,
but it is known to have broken emulation of FXTOD instructions
on UltraSPARC III, which was originally reported as GCC bug 44631
<http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44631>.
Any arch which uses math-emu to emulate conversions from integers to
same-size floats may be affected.
The fix is simple: the exponent comparison used to determine if the
fraction should be downshifted must be "<=" not "<".
I'm sending a kernel module to test this as a reply to this message.
There are also SPARC user-space test cases in the GCC bug entry.
Signed-off-by: Mikael Pettersson <mikpe@it.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 21 Jul 2010 01:29:25 +0000 (18:29 -0700)]
Merge branch 'drm-fixes' of git://git./linux/kernel/git/airlied/drm-2.6
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm/r600: fix possible NULL pointer derefernce
drm/radeon/kms: add quirk for ASUS HD 3600 board
include/linux/vgaarb.h: add missing part of include guard
drm/nouveau: Fix crashes during fbcon init on single head cards.
drm/nouveau: fix pcirom vbios shadow breakage from acpi rom patch
drm/radeon/kms: fix shared ddc harder
drm/i915: enable low power render writes on GEN3 hardware.
drm/i915: Define MI_ARB_STATE bits
vmwgfx: return -EFAULT if copy_to_user fails
fb: handle allocation failure in alloc_apertures()
drm: radeon: check kzalloc() result
drm/ttm: Fix build on architectures without AGP
drm/radeon/kms: fix gtt MC base alignment on rs4xx/rs690/rs740 asics
drm/radeon/kms: fix possible mis-detection of sideport on rs690/rs740
drm/radeon/kms: fix legacy tv-out pal mode
Alex Deucher [Wed, 21 Jul 2010 00:29:32 +0000 (10:29 +1000)]
drm/r600: fix possible NULL pointer derefernce
Reported-by: Alexander Y. Fomichev <git.user@gmail.com>
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Alex Deucher [Tue, 20 Jul 2010 22:07:22 +0000 (18:07 -0400)]
drm/radeon/kms: add quirk for ASUS HD 3600 board
Connector is actually DVI rather than HDMI.
Reported-by: trapDoor <trapdoor6@gmail.com>
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
Doug Goldstein [Tue, 20 Jul 2010 22:22:25 +0000 (15:22 -0700)]
include/linux/vgaarb.h: add missing part of include guard
vgaarb.h was missing the #define of the #ifndef at the top for the guard
to prevent multiple #include's from causing re-define errors
Signed-off-by: Doug Goldstein <cardoe@gentoo.org>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Linus Torvalds [Tue, 20 Jul 2010 23:27:58 +0000 (16:27 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: do not include cap/dentry releases in replayed messages
ceph: reuse request message when replaying against recovering mds
ceph: fix creation of ipv6 sockets
ceph: fix parsing of ipv6 addresses
ceph: fix printing of ipv6 addrs
ceph: add kfree() to error path
ceph: fix leak of mon authorizer
ceph: fix message revocation
Linus Torvalds [Tue, 20 Jul 2010 23:27:34 +0000 (16:27 -0700)]
Merge branch 'linuxdocs' of git://git./linux/kernel/git/rdunlap/linux-docs
* 'linuxdocs' of git://git.kernel.org/pub/scm/linux/kernel/git/rdunlap/linux-docs:
documentation: fix almost duplicate filenames (IO/io-mapping.txt)
Linus Torvalds [Tue, 20 Jul 2010 23:26:42 +0000 (16:26 -0700)]
Merge git://git./linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (24 commits)
bridge: Partially disable netpoll support
tcp: fix crash in tcp_xmit_retransmit_queue
IPv6: fix CoA check in RH2 input handler (mip6_rthdr_input())
ibmveth: lost IRQ while closing/opening device leads to service loss
rt2x00: Fix lockdep warning in rt2x00lib_probe_dev()
vhost: avoid pr_err on condition guest can trigger
ipmr: Don't leak memory if fib lookup fails.
vhost-net: avoid flush under lock
net: fix problem in reading sock TX queue
net/core: neighbour update Oops
net: skb_tx_hash() fix relative to skb_orphan_try()
rfs: call sock_rps_record_flow() in tcp_splice_read()
xfrm: do not assume that template resolving always returns xfrms
hostap_pci: set dev->base_addr during probe
axnet_cs: use spin_lock_irqsave in ax_interrupt
dsa: Fix Kconfig dependencies.
act_nat: not all of the ICMP packets need an IP header payload
r8169: incorrect identifier for a 8168dp
Phonet: fix skb leak in pipe endpoint accept()
Bluetooth: Update sec_level/auth_type for already existing connections
...
Paul E. McKenney [Tue, 20 Jul 2010 20:24:34 +0000 (13:24 -0700)]
vfs: fix RCU-lockdep false positive due to /proc
If a single-threaded process does a file-descriptor operation, and some
other process accesses that same file descriptor via /proc, the current
rcu_dereference_check_fdtable() can give a false-positive RCU-lockdep
splat due to the reference count being increased by the /proc access after
the reference-count check in fget_light() but before the check in
rcu_dereference_check_fdtable().
This commit prevents this false positive by checking for a single-threaded
process. To avoid #include hell, this commit uses the wrapper for
thread_group_empty(current) defined by rcu_my_thread_group_empty()
provided in a separate commit.
Located-by: Miles Lane <miles.lane@gmail.com>
Located-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Marek Szyprowski [Tue, 20 Jul 2010 20:24:33 +0000 (13:24 -0700)]
sdhci-s3c: add missing remove function
System will crash sooner or later once the memory with the code of the
s3c-sdhci.ko module is reused for something else. I really have no idea
how the lack of remove function went unnoticed into the mainline code.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andres Salomon [Tue, 20 Jul 2010 20:24:32 +0000 (13:24 -0700)]
Andres has moved
My Collabora address is no longer enabled - update the MODULE_AUTHOR
fields of drivers to my current email address.
Signed-off-by: Andres Salomon <dilinger@queued.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yinghai Lu [Tue, 20 Jul 2010 20:24:31 +0000 (13:24 -0700)]
x86,nobootmem: make alloc_bootmem_node fall back to other node when 32bit numa is used
Borislav Petkov reported his 32bit numa system has problem:
[ 0.000000] Reserving total of 4c00 pages for numa KVA remap
[ 0.000000] kva_start_pfn ~ 32800 max_low_pfn ~ 375fe
[ 0.000000] max_pfn = 238000
[ 0.000000] 8202MB HIGHMEM available.
[ 0.000000] 885MB LOWMEM available.
[ 0.000000] mapped low ram: 0 -
375fe000
[ 0.000000] low ram: 0 -
375fe000
[ 0.000000] alloc (nid=8 100000 -
7ee00000) (
1000000 -
ffffffff) 1000 1000 =>
34e7000
[ 0.000000] alloc (nid=8 100000 -
7ee00000) (
1000000 -
ffffffff) 200 40 =>
34c9d80
[ 0.000000] alloc (nid=0 100000 -
7ee00000) (
1000000 -
ffffffffffffffff) 180 40 =>
34e6140
[ 0.000000] alloc (nid=1
80000000 -
c7e60000) (
1000000 -
ffffffffffffffff) 240 40 =>
80000000
[ 0.000000] BUG: unable to handle kernel paging request at
40000000
[ 0.000000] IP: [<
c2c8cff1>] __alloc_memory_core_early+0x147/0x1d6
[ 0.000000] *pdpt =
0000000000000000 *pde =
f000ff53f000ff00
...
[ 0.000000] Call Trace:
[ 0.000000] [<
c2c8b4f8>] ? __alloc_bootmem_node+0x216/0x22f
[ 0.000000] [<
c2c90c9b>] ? sparse_early_usemaps_alloc_node+0x5a/0x10b
[ 0.000000] [<
c2c9149e>] ? sparse_init+0x1dc/0x499
[ 0.000000] [<
c2c79118>] ? paging_init+0x168/0x1df
[ 0.000000] [<
c2c780ff>] ? native_pagetable_setup_start+0xef/0x1bb
looks like it allocates too much high address for bootmem.
Try to cut limit with get_max_mapped()
Reported-by: Borislav Petkov <borislav.petkov@amd.com>
Tested-by: Conny Seidel <conny.seidel@amd.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: <stable@kernel.org> [2.6.34.x]
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yinghai Lu [Tue, 20 Jul 2010 20:24:30 +0000 (13:24 -0700)]
x86, numa: fix boot without RAM on node0 again
Commit
e534c7c5f8d6 ("numa: x86_64: use generic percpu var
numa_node_id() implementation") broke numa systems that don't have ram
on node0 when MEMORY_HOTPLUG is enabled, because cpu_up() will call
cpu_to_node() before per_cpu(numa_node) is setup for APs.
When Node0 doesn't have RAM, on x86, cpus already round it to nearest
node with RAM in x86_cpu_to_node_map. and per_cpu(numa_node) is not set
up until in c_init for APs.
When later cpu_up() calling cpu_to_node() will get 0 again, and make it
online even there is no RAM on node0. so later all APs can not booted up,
and later will have panic.
[ 1.611101] On node 0 totalpages: 0
.........
[ 2.608558] On node 0 totalpages: 0
[ 2.612065] Brought up 1 CPUs
[ 2.615199] Total of 1 processors activated (3990.31 BogoMIPS).
...
93.225341] calling loop_init+0x0/0x1a4 @ 1
[ 93.229314] PERCPU: allocation failed, size=80 align=8, failed to populate
[ 93.246539] Pid: 1, comm: swapper Tainted: G W
2.6.35-rc4-tip-yh-04371-gd64e6c4-dirty #354
[ 93.264621] Call Trace:
[ 93.266533] [<
ffffffff81125e43>] pcpu_alloc+0x83a/0x8e7
[ 93.270710] [<
ffffffff81125f15>] __alloc_percpu+0x10/0x12
[ 93.285849] [<
ffffffff8140786c>] alloc_disk_node+0x94/0x16d
[ 93.291811] [<
ffffffff81407956>] alloc_disk+0x11/0x13
[ 93.306157] [<
ffffffff81503e51>] loop_alloc+0xa7/0x180
[ 93.310538] [<
ffffffff8277ef48>] loop_init+0x9b/0x1a4
[ 93.324909] [<
ffffffff8277eead>] ? loop_init+0x0/0x1a4
[ 93.329650] [<
ffffffff810001f2>] do_one_initcall+0x57/0x136
[ 93.345197] [<
ffffffff827486d0>] kernel_init+0x184/0x20e
[ 93.348146] [<
ffffffff81034954>] kernel_thread_helper+0x4/0x10
[ 93.365194] [<
ffffffff81c7cc3c>] ? restore_args+0x0/0x30
[ 93.369305] [<
ffffffff8274854c>] ? kernel_init+0x0/0x20e
[ 93.386011] [<
ffffffff81034950>] ? kernel_thread_helper+0x0/0x10
[ 93.392047] loop: out of memory
...
Try to assign per_cpu(numa_node) early
[akpm@linux-foundation.org: tidy up code comment]
Signed-off-by: Yinghai <yinghai@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Acked-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Anton Vorontsov [Tue, 20 Jul 2010 20:24:28 +0000 (13:24 -0700)]
edac: mpc85xx: add support for MPC8569 EDAC controllers
Simply add a proper ID into the device table.
Signed-off-by: Anton Vorontsov <avorontsov@mvista.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Peter Tyser <ptyser@xes-inc.com>
Cc: Dave Jiang <djiang@mvista.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Anton Vorontsov [Tue, 20 Jul 2010 20:24:27 +0000 (13:24 -0700)]
edac: mpc85xx: fix MPC85xx dependency
Since commit
5753c082f66eca5be81f6bda85c1718c5eea6ada ("powerpc/85xx:
Kconfig cleanup"), there is no MPC85xx Kconfig symbol anymore, so the
driver became non-selectable.
This patch fixes the issue by switching to PPC_85xx symbol.
Signed-off-by: Anton Vorontsov <avorontsov@mvista.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Peter Tyser <ptyser@xes-inc.com>
Cc: Dave Jiang <djiang@mvista.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nick Piggin [Tue, 20 Jul 2010 20:24:25 +0000 (13:24 -0700)]
mm/vmscan.c: fix mapping use after free
We need lock_page_nosync() here because we have no reference to the
mapping when taking the page lock.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Manfred Spraul [Tue, 20 Jul 2010 20:24:23 +0000 (13:24 -0700)]
ipc/sem.c: bugfix for semop() not reporting successful operation
The last change to improve the scalability moved the actual wake-up out of
the section that is protected by spin_lock(sma->sem_perm.lock).
This means that IN_WAKEUP can be in queue.status even when the spinlock is
acquired by the current task. Thus the same loop that is performed when
queue.status is read without the spinlock acquired must be performed when
the spinlock is acquired.
Thanks to kamezawa.hiroyu@jp.fujitsu.com for noticing lack of the memory
barrier.
Addresses https://bugzilla.kernel.org/show_bug.cgi?id=16255
[akpm@linux-foundation.org: clean up kerneldoc, checkpatch warning and whitespace]
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Reported-by: Luca Tettamanti <kronos.it@gmail.com>
Tested-by: Luca Tettamanti <kronos.it@gmail.com>
Reported-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Maciej Rutecki <maciej.rutecki@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Francisco Jerez [Sat, 10 Jul 2010 15:37:00 +0000 (17:37 +0200)]
drm/nouveau: Fix crashes during fbcon init on single head cards.
this fixes a regression since the fbcon rework.
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Ben Skeggs [Mon, 12 Jul 2010 03:15:44 +0000 (13:15 +1000)]
drm/nouveau: fix pcirom vbios shadow breakage from acpi rom patch
On nv50 it became impossible to attempt a PCI ROM shadow of the VBIOS,
which will break some setups.
This patch also removes the different ordering of shadow methods for
pre-nv50 chipsets. The reason for the different ordering was paranoia,
but it should hopefully be OK to try shadowing PRAMIN first.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Alex Deucher [Tue, 20 Jul 2010 15:27:54 +0000 (11:27 -0400)]
drm/radeon/kms: fix shared ddc harder
This fixes a regression caused by
b2ea4aa67bfd084834edd070e0a4a47857d6db59
due to the way shared ddc with multiple digital connectors was handled.
You generally have two cases where DDC lines are shared:
- HDMI + VGA
- HDMI + DVI-D
HDMI + VGA is easy to deal with because you can check the EDID for the
to see if the attached monitor is digital. A shared DDC line with two
digital connectors is more complex. You can't use the hdmi bits in the
EDID since they may not be there with DVI<->HDMI adapters. In this case
all we can do is check the HPD pins to see which is connected as we have
no way of knowing using the EDID.
Reported-by: trapdoor6@gmail.com
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
Randy Dunlap [Mon, 19 Jul 2010 22:20:27 +0000 (22:20 +0000)]
documentation: fix almost duplicate filenames (IO/io-mapping.txt)
Having both IO-mapping.txt and io-mapping.txt in Documentation/
was confusing and/or bothersome to some people, so rename
IO-mapping.txt to bus-virt-phys-mapping.txt. Also update
Documentation/00-INDEX for both of these files.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Kees Bakker <kees.bakker@xs4all.nl>
Cc: Keith Packard <keithp@keithp.com>
Linus Torvalds [Tue, 20 Jul 2010 15:22:15 +0000 (08:22 -0700)]
Merge git://git.infradead.org/users/cbou/battery-2.6.35
* git://git.infradead.org/users/cbou/battery-2.6.35:
ds2782_battery: Fix ds2782_get_capacity return value