Christoph Hellwig [Mon, 10 Oct 2011 16:52:53 +0000 (16:52 +0000)]
xfs: do not flush data workqueues in xfs_flush_buftarg
When we call xfs_flush_buftarg (generally from sync or umount) it already
is too late to flush the data workqueues, as I/O completion is signalled
for them and we are thus already done with the data we would flush here.
There are places where flushing them might be useful, but the current
sync interface doesn't give us that opportunity.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Mon, 10 Oct 2011 16:52:52 +0000 (16:52 +0000)]
xfs: remove XFS_bflush
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Mon, 10 Oct 2011 16:52:51 +0000 (16:52 +0000)]
xfs: remove xfs_buf_target_name
The calling convention that returns a pointer to a static buffer is
fairly nasty, so just opencode it in the only caller that is left.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Mon, 10 Oct 2011 16:52:50 +0000 (16:52 +0000)]
xfs: use xfs_ioerror_alert in xfs_buf_iodone_callbacks
Use xfs_ioerror_alert instead of opencoding a very similar error
message.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Mon, 10 Oct 2011 16:52:49 +0000 (16:52 +0000)]
xfs: clean up xfs_ioerror_alert
Instead of passing the block number and mount structure explicitly
get them off the bp and fix make the argument order more natural.
Also move it to xfs_buf.c and stop printing the device name given
that we already get the fs name as part of xfs_alert, and we know
what device is operates on because of the caller that gets printed,
finally rename it to xfs_buf_ioerror_alert and pass __func__ as
argument where it makes sense.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Mon, 10 Oct 2011 16:52:48 +0000 (16:52 +0000)]
xfs: clean up buffer allocation
Change _xfs_buf_initialize to allocate the buffer directly and rename it to
xfs_buf_alloc now that is the only buffer allocation routine. Also remove
the xfs_buf_deallocate wrapper around the kmem_zone_free calls for buffers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Mon, 10 Oct 2011 16:52:47 +0000 (16:52 +0000)]
xfs: remove buffers from the delwri list in xfs_buf_stale
For each call to xfs_buf_stale we call xfs_buf_delwri_dequeue either
directly before or after it, or are guaranteed by the surrounding
conditionals that we are never called on delwri buffers. Simply
this situation by moving the call to xfs_buf_delwri_dequeue into
xfs_buf_stale.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Mon, 10 Oct 2011 16:52:46 +0000 (16:52 +0000)]
xfs: remove XFS_BUF_STALE and XFS_BUF_SUPER_STALE
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Mon, 10 Oct 2011 16:52:45 +0000 (16:52 +0000)]
xfs: remove XFS_BUF_SET_VTYPE and XFS_BUF_SET_VTYPE_REF
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Mon, 10 Oct 2011 16:52:44 +0000 (16:52 +0000)]
xfs: remove XFS_BUF_FINISH_IOWAIT
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Mon, 10 Oct 2011 16:52:43 +0000 (16:52 +0000)]
xfs: remove xfs_get_buftarg_list
The code is unused and under a config option that doesn't exist, remove it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Wed, 14 Sep 2011 14:08:26 +0000 (14:08 +0000)]
xfs: fix buffer flushing during unmount
The code to flush buffers in the umount code is a bit iffy: we first
flush all delwri buffers out, but then might be able to queue up a
new one when logging the sb counts. On a normal shutdown that one
would get flushed out when doing the synchronous superblock write in
xfs_unmountfs_writesb, but we skip that one if the filesystem has
been shut down.
Fix this by moving the delwri list flushing until just before unmounting
the log, and while we're at it also remove the superflous delwri list
and buffer lru flusing for the rt and log device that can never have
cached or delwri buffers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Amit Sahrawat <amit.sahrawat83@gmail.com>
Tested-by: Amit Sahrawat <amit.sahrawat83@gmail.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Sun, 2 Oct 2011 14:25:16 +0000 (14:25 +0000)]
xfs: optimize fsync on directories
Directories are only updated transactionally, which means fsync only
needs to flush the log the inode is currently dirty, but not bother
with checking for dirty data, non-transactional updates, and most
importanly doesn't have to flush disk caches except as part of a
transaction commit.
While the first two optimizations can't easily be measured, the
latter actually makes a difference when doing lots of fsync that do
not actually have to commit the inode, e.g. because an earlier fsync
already pushed the log far enough.
The new xfs_dir_fsync is identical to xfs_nfs_commit_metadata except
for the prototype, but I'm not sure creating a common helper for the
two is worth it given how simple the functions are.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Dave Chinner [Fri, 30 Sep 2011 04:45:03 +0000 (04:45 +0000)]
xfs: reduce the number of log forces from tail pushing
The AIL push code will issue a log force on ever single push loop
that it exits and has encountered pinned items. It doesn't rescan
these pinned items until it revisits the AIL from the start. Hence
we only need to force the log once per walk from the start of the
AIL to the target LSN.
This results in numbers like this:
xs_push_ail_flush..... 1456
xs_log_force......... 1485
For an 8-way 50M inode create workload - almost all the log forces
are coming from the AIL pushing code.
Reduce the number of log forces by only forcing the log if the
previous walk found pinned buffers. This reduces the numbers to:
xs_push_ail_flush..... 665
xs_log_force......... 682
For the same test.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Dave Chinner [Fri, 30 Sep 2011 04:45:02 +0000 (04:45 +0000)]
xfs: Don't allocate new buffers on every call to _xfs_buf_find
Stats show that for an 8-way unlink @ ~80,000 unlinks/s we are doing
~1 million cache hit lookups to ~3000 buffer creates. That's almost
3 orders of magnitude more cahce hits than misses, so optimising for
cache hits is quite important. In the cache hit case, we do not need
to allocate a new buffer in case of a cache miss, so we are
effectively hitting the allocator for no good reason for vast the
majority of calls to _xfs_buf_find. 8-way create workloads are
showing similar cache hit/miss ratios.
The result is profiles that look like this:
samples pcnt function DSO
_______ _____ _______________________________ _________________
1036.00 10.0% _xfs_buf_find [kernel.kallsyms]
582.00 5.6% kmem_cache_alloc [kernel.kallsyms]
519.00 5.0% __memcpy [kernel.kallsyms]
468.00 4.5% __ticket_spin_lock [kernel.kallsyms]
388.00 3.7% kmem_cache_free [kernel.kallsyms]
331.00 3.2% xfs_log_commit_cil [kernel.kallsyms]
Further, there is a fair bit of work involved in initialising a new
buffer once a cache miss has occurred and we currently do that under
the rbtree spinlock. That increases spinlock hold time on what are
heavily used trees.
To fix this, remove the initialisation of the buffer from
_xfs_buf_find() and only allocate the new buffer once we've had a
cache miss. Initialise the buffer immediately after allocating it in
xfs_buf_get, too, so that is it ready for insert if we get another
cache miss after allocation. This minimises lock hold time and
avoids unnecessary allocator churn. The resulting profiles look
like:
samples pcnt function DSO
_______ _____ ___________________________ _________________
8111.00 9.1% _xfs_buf_find [kernel.kallsyms]
4380.00 4.9% __memcpy [kernel.kallsyms]
4341.00 4.8% __ticket_spin_lock [kernel.kallsyms]
3401.00 3.8% kmem_cache_alloc [kernel.kallsyms]
2856.00 3.2% xfs_log_commit_cil [kernel.kallsyms]
2625.00 2.9% __kmalloc [kernel.kallsyms]
2380.00 2.7% kfree [kernel.kallsyms]
2016.00 2.3% kmem_cache_free [kernel.kallsyms]
Showing a significant reduction in time spent doing allocation and
freeing from slabs (kmem_cache_alloc and kmem_cache_free).
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Mon, 19 Sep 2011 15:00:54 +0000 (15:00 +0000)]
xfs: simplify xfs_trans_ijoin* again
There is no reason to keep a reference to the inode even if we unlock
it during transaction commit because we never drop a reference between
the ijoin and commit. Also use this fact to merge xfs_trans_ijoin_ref
back into xfs_trans_ijoin - the third argument decides if an unlock
is needed now.
I'm actually starting to wonder if allowing inodes to be unlocked
at transaction commit really is worth the effort. The only real
benefit is that they can be unlocked earlier when commiting a
synchronous transactions, but that could be solved by doing the
log force manually after the unlock, too.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Sun, 18 Sep 2011 20:47:51 +0000 (20:47 +0000)]
xfs: unlock the inode before log force in xfs_change_file_space
Let the transaction commit unlock the inode before it potentially causes
a synchronous log force.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Sun, 18 Sep 2011 20:47:50 +0000 (20:47 +0000)]
xfs: unlock the inode before log force in xfs_fs_nfs_commit_metadata
Only read the LSN we need to push to with the ilock held, and then release
it before we do the log force to improve concurrency.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Mon, 19 Sep 2011 14:55:51 +0000 (14:55 +0000)]
xfs: unlock the inode before log force in xfs_fsync
Only read the LSN we need to push to with the ilock held, and then release
it before we do the log force to improve concurrency.
This also removes the only direct caller of _xfs_trans_commit, thus
allowing it to be merged into the plain xfs_trans_commit again.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Mon, 26 Sep 2011 09:14:34 +0000 (09:14 +0000)]
xfs: XFS_TRANS_SWAPEXT is not a valid flag for xfs_trans_commit
XFS_TRANS_SWAPEXT is a transaction type, not a flag for xfs_trans_commit, so
don't pass it in xfs_swap_extents.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Lukas Czerner [Wed, 21 Sep 2011 09:42:30 +0000 (09:42 +0000)]
xfs: fix possible overflow in xfs_ioc_trim()
In xfs_ioc_trim it is possible that computing the last allocation group
to discard might overflow for big start & len values, because the result
might be bigger then xfs_agnumber_t which is 32 bit long. Fix this by not
allowing the start and end block of the range to be beyond the end of the
file system.
Note that if the start is beyond the end of the file system we have to
return -EINVAL, but in the "end" case we have to truncate it to the fs
size.
Also introduce "end" variable, rather than using start+len which which
might be more confusing to get right as this bug shows.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Sun, 18 Sep 2011 20:41:07 +0000 (20:41 +0000)]
xfs: cleanup xfs_bmap.h
Convert all function prototypes to the short form used elsewhere,
and remove duplicates of comments already placed at the function
body.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Sun, 18 Sep 2011 20:41:06 +0000 (20:41 +0000)]
xfs: dont ignore error code from xfs_bmbt_update
Fix a case in xfs_bmap_add_extent_unwritten_real where we aren't
passing the returned error on.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Sun, 18 Sep 2011 20:41:05 +0000 (20:41 +0000)]
xfs: pass bmalloca to xfs_bmap_add_extent_hole_real
All the parameters passed to xfs_bmap_add_extent_hole_real() are in
the xfs_bmalloca structure now. Just pass the bmalloca parameter to
the function instead of 8 separate parameters.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Sun, 18 Sep 2011 20:41:04 +0000 (20:41 +0000)]
xfs: pass bmalloca to xfs_bmap_add_extent_delay_real
All the parameters passed to xfs_bmap_add_extent_delay_real() are in
the xfs_bmalloca structure now. Just pass the bmalloca parameter to
the function instead of 8 separate parameters.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Sun, 18 Sep 2011 20:41:02 +0000 (20:41 +0000)]
xfs: move logflags into bmalloca
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Dave Chinner [Sun, 18 Sep 2011 20:41:01 +0000 (20:41 +0000)]
xfs: move lastx and nallocs into bmalloca
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Dave Chinner [Sun, 18 Sep 2011 20:41:00 +0000 (20:41 +0000)]
xfs: move btree cursor into bmalloca
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Dave Chinner [Sun, 18 Sep 2011 20:40:59 +0000 (20:40 +0000)]
xfs: do not keep local copies of allocation ranges in xfs_bmapi_allocate
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Dave Chinner [Sun, 18 Sep 2011 20:40:58 +0000 (20:40 +0000)]
xfs: rename allocation range fields in struct xfs_bmalloca
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Dave Chinner [Sun, 18 Sep 2011 20:40:57 +0000 (20:40 +0000)]
xfs: move firstblock and bmap freelist cursor into bmalloca structure
Rather than passing the firstblock and freelist structure around,
embed it into the bmalloca structure and remove it from the function
parameters.
This also enables the minleft parameter to be set only once in
xfs_bmapi_write(), and the freelist cursor directly queried in
xfs_bmapi_allocate to clear it when the lowspace algorithm is
activated.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Dave Chinner [Sun, 18 Sep 2011 20:40:56 +0000 (20:40 +0000)]
xfs: move extent records into bmalloca structure
Rather that putting extent records on the stack and then pointing to
them in the bmalloca structure which is in the same stack frame, put
the extent records directly in the bmalloca structure. This reduces
the number of args that need to be passed around.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Dave Chinner [Sun, 18 Sep 2011 20:40:55 +0000 (20:40 +0000)]
xfs: pass bmalloca structure to xfs_bmap_isaeof
All the variables xfs_bmap_isaeof() is passed are contained within
the xfs_bmalloca structure. Pass that instead.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Sun, 18 Sep 2011 20:40:54 +0000 (20:40 +0000)]
xfs: remove xfs_bmap_add_extent
There is no real need to the xfs_bmap_add_extent, as the callers
know what kind of extents they need to it. Removing it means
duplicating the extents to btree conversion logic in three places,
but overall it's still much simpler code and quite a bit less code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Sun, 18 Sep 2011 20:40:53 +0000 (20:40 +0000)]
xfs: introduce xfs_bmap_last_extent
Add a common helper for finding the last extent in a file.
Largely based on a patch from Dave Chinner.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Dave Chinner [Sun, 18 Sep 2011 20:40:52 +0000 (20:40 +0000)]
xfs: rename xfs_bmapi to xfs_bmapi_write
Now that all the read-only users of xfs_bmapi have been converted to
use xfs_bmapi_read(), we can remove all the read-only handling cases
from xfs_bmapi().
Once this is done, rename xfs_bmapi to xfs_bmapi_write to reflect
the fact it is for allocation only. This enables us to kill the
XFS_BMAPI_WRITE flag as well.
Also clean up xfs_bmapi_write to the style used in the newly added
xfs_bmapi_read/delay functions.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Dave Chinner [Sun, 18 Sep 2011 20:40:51 +0000 (20:40 +0000)]
xfs: factor unwritten extent map manipulations out of xfs_bmapi
To further improve the readability of xfs_bmapi(), factor the
unwritten extent conversion out into a separate function. This
removes large block of logic from the xfs_bmapi() code loop and
makes it easier to see the operational logic flow for xfs_bmapi().
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Dave Chinner [Sun, 18 Sep 2011 20:40:50 +0000 (20:40 +0000)]
xfs: factor extent allocation out of xfs_bmapi
To further improve the readability of xfs_bmapi(), factor the extent
allocation out into a separate function. This removes a large block
of logic from the xfs_bmapi() code loop and makes it easier to see
the operational logic flow for xfs_bmapi().
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Sun, 18 Sep 2011 20:40:49 +0000 (20:40 +0000)]
xfs: do not use xfs_bmap_add_extent for adding delalloc extents
We can just call xfs_bmap_add_extent_hole_delay directly to add a
delayed allocated regions to the extent tree, instead of going
through all the complexities of xfs_bmap_add_extent that aren't
needed for this simple case.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Sun, 18 Sep 2011 20:40:48 +0000 (20:40 +0000)]
xfs: introduce xfs_bmapi_delay()
Delalloc reservations are much simpler than allocations, so give
them a separate bmapi-level interface. Using the previously added
xfs_bmapi_reserve_delalloc we get a function that is only minimally
more complicated than xfs_bmapi_read, which is far from the complexity
in xfs_bmapi. Also remove the XFS_BMAPI_DELAY code after switching
over the only user to xfs_bmapi_delay.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Sun, 18 Sep 2011 20:40:47 +0000 (20:40 +0000)]
xfs: factor delalloc reservations out of xfs_bmapi
Move the reservation of delayed allocations, and addition of delalloc
regions to the extent trees into a new helper function. For now
this adds some twisted goto logic to xfs_bmapi, but that will be
cleaned up in the following patches.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Dave Chinner [Sun, 18 Sep 2011 20:40:46 +0000 (20:40 +0000)]
xfs: remove xfs_bmapi_single()
Now we have xfs_bmapi_read, there is no need for xfs_bmapi_single().
Change the remaining caller over and kill the function.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Dave Chinner [Sun, 18 Sep 2011 20:40:45 +0000 (20:40 +0000)]
xfs: introduce xfs_bmapi_read()
xfs_bmapi() currently handles both extent map reading and
allocation. As a result, the code is littered with "if (wr)"
branches to conditionally do allocation operations if required.
This makes the code much harder to follow and causes significant
indent issues with the code.
Given that read mapping is much simpler than allocation, we can
split out read mapping from xfs_bmapi() and reuse the logic that
we have already factored out do do all the hard work of handling the
extent map manipulations. The results in a much simpler function for
the common extent read operations, and will allow the allocation
code to be simplified in another commit.
Once xfs_bmapi_read() is implemented, convert all the callers of
xfs_bmapi() that are only reading extents to use the new function.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Dave Chinner [Sun, 18 Sep 2011 20:40:44 +0000 (20:40 +0000)]
xfs: factor extent map manipulations out of xfs_bmapi
To further improve the readability of xfs_bmapi(), factor the pure
extent map manipulations out into separate functions. This removes
large blocks of logic from the xfs_bmapi() code loop and makes it
easier to see the operational logic flow for xfs_bmapi().
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Sun, 18 Sep 2011 20:40:43 +0000 (20:40 +0000)]
xfs: remove the nextents variable in xfs_bmapi
Instead of using a local variable that needs to updated when we modify
the extent map just check ifp->if_bytes directly where we use it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Sun, 18 Sep 2011 20:40:42 +0000 (20:40 +0000)]
xfs: remove impossible to read code in xfs_bmap_add_extent_delay_real
We already have the worst case blocks reserved, so xfs_icsb_modify_counters
won't fail in xfs_bmap_add_extent_delay_real. In fact we've had an assert
to catch this case since day and it never triggered. So remove the code
to try smaller reservations, and just return the error for that case in
addition to keeping the assert.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Sun, 18 Sep 2011 20:40:41 +0000 (20:40 +0000)]
xfs: remove the first extent special case in xfs_bmap_add_extent
Both xfs_bmap_add_extent_hole_delay and xfs_bmap_add_extent_hole_real
already contain code to handle the case where there is no extent to
merge with, which is effectively the same as the code duplicated here.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Mitsuo Hayasaka [Sat, 17 Sep 2011 13:38:38 +0000 (13:38 +0000)]
xfs: Return -EIO when xfs_vn_getattr() failed
An attribute of inode can be fetched via xfs_vn_getattr() in XFS.
Currently it returns EIO, not negative value, when it failed. As a
result, the system call returns not negative value even though an
error occured. The stat(2), ls and mv commands cannot handle this
error and do not work correctly.
This patch fixes this bug, and returns -EIO, not EIO when an error
is detected in xfs_vn_getattr().
Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Chandra Seetharaman [Thu, 8 Sep 2011 20:18:50 +0000 (20:18 +0000)]
xfs: Fix the incorrect comment in the header of _xfs_buf_find
Fix the incorrect comment in the header of the function
_xfs_buf_find().
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Chandra Seetharaman [Tue, 20 Sep 2011 13:56:55 +0000 (13:56 +0000)]
xfs: Check the return value of xfs_trans_get_buf()
Check the return value of xfs_trans_get_buf() and fail
appropriately.
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Chandra Seetharaman [Wed, 7 Sep 2011 19:37:54 +0000 (19:37 +0000)]
xfs: Check the return value of xfs_buf_get()
Check the return value of xfs_buf_get() and fail appropriately.
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Wed, 24 Aug 2011 05:59:25 +0000 (05:59 +0000)]
xfs: improve ioend error handling
Return unwritten extent conversion errors to aio_complete.
Skip both unwritten extent conversion and size updates if we had an
I/O error or the filesystem has been shut down.
Return -EIO to the aio/buffer completion handlers in case of a
forced shutdown.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Sat, 27 Aug 2011 14:42:53 +0000 (14:42 +0000)]
xfs: avoid direct I/O write vs buffered I/O race
Currently a buffered reader or writer can add pages to the pagecache
while we are waiting for the iolock in xfs_file_dio_aio_write. Prevent
this by re-checking mapping->nrpages after we got the iolock, and if
nessecary upgrade the lock to exclusive mode. To simplify this a bit
only take the ilock inside of xfs_file_aio_write_checks.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Sat, 27 Aug 2011 14:45:11 +0000 (14:45 +0000)]
xfs: avoid synchronous transactions when deleting attr blocks
Currently xfs_attr_inactive causes a synchronous transactions if we are
removing a file that has any extents allocated to the attribute fork, and
thus makes XFS extremely slow at removing files with out of line extended
attributes. The code looks a like a relict from the days before the busy
extent list, but with the busy extent list we avoid reusing data and attr
extents that have been freed but not commited yet, so this code is just
as superflous as the synchronous transactions for data blocks.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Tue, 23 Aug 2011 08:28:13 +0000 (08:28 +0000)]
xfs: remove i_iocount
We now have an i_dio_count filed and surrounding infrastructure to wait
for direct I/O completion instead of i_icount, and we have never needed
to iocount waits for buffered I/O given that we only set the page uptodate
after finishing all required work. Thus remove i_iocount, and replace
the actually needed waits with calls to inode_dio_wait.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Tue, 23 Aug 2011 08:28:12 +0000 (08:28 +0000)]
xfs: wait for I/O completion when writing out pages in xfs_setattr_size
The current code relies on the xfs_ioend_wait call later on to make sure
all I/O actually has completed. The xfs_ioend_wait call will go away soon,
so prepare for that by using the waiting filemap function.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Tue, 23 Aug 2011 08:28:11 +0000 (08:28 +0000)]
xfs: reduce ioend latency
There is no reason to queue up ioends for processing in user context
unless we actually need it. Just complete ioends that do not convert
unwritten extents or need a size update from the end_io context.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Tue, 23 Aug 2011 08:28:10 +0000 (08:28 +0000)]
xfs: defer AIO/DIO completions
We really shouldn't complete AIO or DIO requests until we have finished
the unwritten extent conversion and size update. This means fsync never
has to pick up any ioends as all work has been completed when signalling
I/O completion.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Tue, 23 Aug 2011 08:28:09 +0000 (08:28 +0000)]
xfs: remove dead ENODEV handling in xfs_destroy_ioend
No driver returns ENODEV from it bio completion handler, not has this
ever been documented. Remove the dead code dealing with it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Tue, 23 Aug 2011 08:28:08 +0000 (08:28 +0000)]
xfs: use the "delwri" terminology consistently
And also remove the strange local lock and delwri list pointers in a few
functions.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Tue, 23 Aug 2011 08:28:07 +0000 (08:28 +0000)]
xfs: let xfs_bwrite callers handle the xfs_buf_relse
Remove the xfs_buf_relse from xfs_bwrite and let the caller handle it to
mirror the delwri and read paths.
Also remove the mount pointer passed to xfs_bwrite, which is superflous now
that we have a mount pointer in the buftarg.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Tue, 23 Aug 2011 08:28:06 +0000 (08:28 +0000)]
xfs: call xfs_buf_delwri_queue directly
Unify the ways we add buffers to the delwri queue by always calling
xfs_buf_delwri_queue directly. The xfs_bdwrite functions is removed and
opencoded in its callers, and the two places setting XBF_DELWRI while a
buffer is locked and expecting xfs_buf_unlock to pick it up are converted
to call xfs_buf_delwri_queue directly, too. Also replace the
XFS_BUF_UNDELAYWRITE macro with direct calls to xfs_buf_delwri_dequeue
to make the explicit queuing/dequeuing more obvious.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Tue, 23 Aug 2011 08:28:05 +0000 (08:28 +0000)]
xfs: move more delwri setup into xfs_buf_delwri_queue
Do not transfer a reference held by the caller to the buffer on the list,
or decrement it in xfs_buf_delwri_queue, but instead grab a new reference
if needed, and let the caller drop its own reference. Also move setting
of the XBF_DELWRI and XBF_ASYNC flags into xfs_buf_delwri_queue, and
only do it if needed. Note that for now xfs_buf_unlock already has
XBF_DELWRI, but that will change in the following patches.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Tue, 23 Aug 2011 08:28:04 +0000 (08:28 +0000)]
xfs: remove the unlock argument to xfs_buf_delwri_queue
We can just unlock the buffer in the caller, and the decrement of b_hold
would also be needed in the !unlock, we just never hit that case currently
given that the caller handles that case.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Tue, 23 Aug 2011 08:28:03 +0000 (08:28 +0000)]
xfs: remove delwri buffer handling from xfs_buf_iorequest
We cannot ever reach xfs_buf_iorequest for a buffer with XBF_DELWRI set,
given that all write handlers make sure that the buffer is remove from
the delwri queue before, and we never do reads with the XBF_DELWRI flag
set (which the code would not handle correctly anyway).
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Dave Chinner [Thu, 25 Aug 2011 07:17:02 +0000 (07:17 +0000)]
xfs: don't serialise adjacent concurrent direct IO appending writes
For append write workloads, extending the file requires a certain
amount of exclusive locking to be done up front to ensure sanity in
things like ensuring that we've zeroed any allocated regions
between the old EOF and the start of the new IO.
For single threads, this typically isn't a problem, and for large
IOs we don't serialise enough for it to be a problem for two
threads on really fast block devices. However for smaller IO and
larger thread counts we have a problem.
Take 4 concurrent sequential, single block sized and aligned IOs.
After the first IO is submitted but before it completes, we end up
with this state:
IO 1 IO 2 IO 3 IO 4
+-------+-------+-------+-------+
^ ^
| |
| |
| |
| \- ip->i_new_size
\- ip->i_size
And the IO is done without exclusive locking because offset <=
ip->i_size. When we submit IO 2, we see offset > ip->i_size, and
grab the IO lock exclusive, because there is a chance we need to do
EOF zeroing. However, there is already an IO in progress that avoids
the need for IO zeroing because offset <= ip->i_new_size. hence we
could avoid holding the IO lock exlcusive for this. Hence after
submission of the second IO, we'd end up this state:
IO 1 IO 2 IO 3 IO 4
+-------+-------+-------+-------+
^ ^
| |
| |
| |
| \- ip->i_new_size
\- ip->i_size
There is no need to grab the i_mutex of the IO lock in exclusive
mode if we don't need to invalidate the page cache. Taking these
locks on every direct IO effective serialises them as taking the IO
lock in exclusive mode has to wait for all shared holders to drop
the lock. That only happens when IO is complete, so effective it
prevents dispatch of concurrent direct IO writes to the same inode.
And so you can see that for the third concurrent IO, we'd avoid
exclusive locking for the same reason we avoided the exclusive lock
for the second IO.
Fixing this is a bit more complex than that, because we need to hold
a write-submission local value of ip->i_new_size to that clearing
the value is only done if no other thread has updated it before our
IO completes.....
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Dave Chinner [Thu, 25 Aug 2011 07:17:01 +0000 (07:17 +0000)]
xfs: don't serialise direct IO reads on page cache checks
There is no need to grab the i_mutex of the IO lock in exclusive
mode if we don't need to invalidate the page cache. Taking these
locks on every direct IO effective serialises them as taking the IO
lock in exclusive mode has to wait for all shared holders to drop
the lock. That only happens when IO is complete, so effective it
prevents dispatch of concurrent direct IO reads to the same inode.
Fix this by taking the IO lock shared to check the page cache state,
and only then drop it and take the IO lock exclusively if there is
work to be done. Hence for the normal direct IO case, no exclusive
locking will occur.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Tested-by: Joern Engel <joern@logfs.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Linus Torvalds [Wed, 5 Oct 2011 01:11:50 +0000 (18:11 -0700)]
Linux 3.1-rc9
Linus Torvalds [Tue, 4 Oct 2011 17:37:06 +0000 (10:37 -0700)]
Merge git://github.com/davem330/net
* git://github.com/davem330/net:
pch_gbe: Fixed the issue on which a network freezes
pch_gbe: Fixed the issue on which PC was frozen when link was downed.
make PACKET_STATISTICS getsockopt report consistently between ring and non-ring
net: xen-netback: correctly restart Tx after a VM restore/migrate
bonding: properly stop queuing work when requested
can bcm: fix incomplete tx_setup fix
RDSRDMA: Fix cleanup of rds_iw_mr_pool
net: Documentation: Fix type of variables
ibmveth: Fix oops on request_irq failure
ipv6: nullify ipv6_ac_list and ipv6_fl_list when creating new socket
cxgb4: Fix EEH on IBM P7IOC
can bcm: fix tx_setup off-by-one errors
MAINTAINERS: tehuti: Alexander Indenbaum's address bounces
dp83640: reduce driver noise
ptp: fix L2 event message recognition
Linus Torvalds [Tue, 4 Oct 2011 16:59:22 +0000 (09:59 -0700)]
Merge branch 'fix/asoc' of git://github.com/tiwai/sound
* 'fix/asoc' of git://github.com/tiwai/sound:
ASoC: omap_mcpdm_remove cannot be __devexit
ASoC: Fix setting update bits for WM8753_LADC and WM8753_RADC
ASoC: use a valid device for dev_err() in Zylonite
Linus Torvalds [Tue, 4 Oct 2011 16:54:18 +0000 (09:54 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/radeon/kms: fix channel_remap setup (v2)
drm/radeon: Set cursor x/y to 0 when x/yorigin > 0.
drm/radeon: Update AVIVO cursor coordinate origin before x/yorigin calculation.
drm/radeon: Simplify cursor x/yorigin calculation.
drm/radeon/kms: fix cursor image off-by-one error
drm/radeon/kms: Fix logic error in DP HPD handler
drm/radeon/kms: add retry limits for native DP aux defer
drm/radeon/kms: fix regression in DP aux defer handling
Linus Torvalds [Tue, 4 Oct 2011 16:52:56 +0000 (09:52 -0700)]
Merge branch 'spi/merge' of git://git.secretlab.ca/git/linux-2.6
* 'spi/merge' of git://git.secretlab.ca/git/linux-2.6:
spi-topcliff-pch: Fix overrun issue
spi-topcliff-pch: Add recovery processing in case FIFO overrun error occurs
spi-topcliff-pch: Fix CPU read complete condition issue
spi-topcliff-pch: Fix SSN Control issue
spi-topcliff-pch: add tx-memory clear after complete transmitting
Jon Mason [Mon, 3 Oct 2011 14:50:20 +0000 (09:50 -0500)]
PCI: Disable MPS configuration by default
Add the ability to disable PCI-E MPS turning and using the BIOS
configured MPS defaults. Due to the number of issues recently
discovered on some x86 chipsets, make this the default behavior.
Also, add the option for peer to peer DMA MPS configuration. Peer to
peer DMA is outside the scope of this patch, but MPS configuration could
prevent it from working by having the MPS on one root port different
than the MPS on another. To work around this, simply make the system
wide MPS the smallest possible value (128B).
Signed-off-by: Jon Mason <mason@myri.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alex Deucher [Tue, 4 Oct 2011 14:46:34 +0000 (10:46 -0400)]
drm/radeon/kms: fix channel_remap setup (v2)
Most asics just use the hw default value which requires
no explicit programming. For those that need a different
value, the vbios will program it properly. As such,
there's no need to program these registers explicitly
in the driver. Changing MC_SHARED_CHREMAP requires a reload
of all data in vram otherwise its contents will be scambled.
Fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=40103
v2: drop now unused channel_remap functions.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Cc: stable@kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
Tomoya MORINAGA [Tue, 6 Sep 2011 08:16:38 +0000 (17:16 +0900)]
spi-topcliff-pch: Fix overrun issue
We found that adding load, Rx data sometimes drops.(with DMA transfer mode)
The cause is that before starting Rx-DMA processing, Tx-DMA processing starts.
This causes FIFO overrun occurs.
This patch fixes the issue by modifying FIFO tx-threshold and DMA descriptor
size like below.
Current this patch
Rx-descriptor 4Byte+12Byte*341 --> 12Byte*340-4Byte-12Byte
Rx-threshold (Not modified)
Tx-descriptor 4Byte+12Byte*341 --> 16Byte-12Byte*340
Rx-threshold 12Byte --> 2Byte
Signed-off-by: Tomoya MORINAGA <tomoya-linux@dsn.okisemi.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Tomoya MORINAGA [Tue, 6 Sep 2011 08:16:37 +0000 (17:16 +0900)]
spi-topcliff-pch: Add recovery processing in case FIFO overrun error occurs
Add recovery processing in case FIFO overrun error occurs with DMA transfer mode.
Signed-off-by: Tomoya MORINAGA <tomoya-linux@dsn.okisemi.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Tomoya MORINAGA [Tue, 6 Sep 2011 08:16:36 +0000 (17:16 +0900)]
spi-topcliff-pch: Fix CPU read complete condition issue
We found Rx data sometimes drops.(with non-DMA transfer mode)
The cause is read complete condition is not true.
This patch fixes the issue.
Signed-off-by: Tomoya MORINAGA <tomoya-linux@dsn.okisemi.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Tomoya MORINAGA [Tue, 6 Sep 2011 08:16:35 +0000 (17:16 +0900)]
spi-topcliff-pch: Fix SSN Control issue
During processing 1 command/data series,
SSN should keep LOW.
However, currently, SSN becomes HIGH.
This patch fixes the issue.
Signed-off-by: Tomoya MORINAGA <tomoya-linux@dsn.okisemi.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Tomoya MORINAGA [Tue, 6 Sep 2011 08:16:34 +0000 (17:16 +0900)]
spi-topcliff-pch: add tx-memory clear after complete transmitting
Currently, in case of reading date from SPI flash,
command is sent twice.
The cause is that tx-memory clear processing is missing .
This patch adds the tx-momory clear processing.
Signed-off-by: Tomoya MORINAGA <tomoya-linux@dsn.okisemi.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Takashi Iwai [Tue, 4 Oct 2011 01:09:14 +0000 (18:09 -0700)]
lis3: fix regression of HP DriveGuard with 8bit chip
Commit
2a7fade7e03 ("hwmon: lis3: Power on corrections") caused a
regression on HP laptops with 8bit chip. Writing CTRL2_BOOT_8B bit seems
clearing the BIOS setup, and no proper interrupt for DriveGuard will be
triggered any more.
Since the init code there is basically only for embedded devices, put a
pdata check so that the problematic initialization will be skipped for
hp_accel stuff.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Cc: Eric Piel <eric.piel@tremplin-utc.net>
Cc: Samu Onkalo <samu.p.onkalo@nokia.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 3 Oct 2011 19:54:56 +0000 (12:54 -0700)]
Merge branch 'hwmon-for-linus' of git://github.com/groeck/linux
* 'hwmon-for-linus' of git://github.com/groeck/linux:
hwmon: (coretemp) Avoid leaving around dangling pointer
hwmon: (coretemp) Fixup platform device ID change
Linus Torvalds [Mon, 3 Oct 2011 19:53:43 +0000 (12:53 -0700)]
Merge git://github.com/davem330/ide
* git://github.com/davem330/ide:
ide-disk: Fix request requeuing
Linus Torvalds [Mon, 3 Oct 2011 19:17:44 +0000 (12:17 -0700)]
Merge branch 'btrfs-3.0' of git://github.com/chrismason/linux
* 'btrfs-3.0' of git://github.com/chrismason/linux:
Btrfs: force a page fault if we have a shorty copy on a page boundary
Borislav Petkov [Mon, 3 Oct 2011 18:28:18 +0000 (14:28 -0400)]
ide-disk: Fix request requeuing
Simon Kirby reported that on his RAID setup with idedisk underneath
the box OOMs after a couple of days of runtime. Running with
CONFIG_DEBUG_KMEMLEAK pointed to idedisk_prep_fn() which unconditionally
allocates an ide_cmd struct. However, ide_requeue_and_plug() can be
called more than once per request, either from the request issue or the
IRQ handler path and do blk_peek_request() ends up in idedisk_prep_fn()
repeatedly, allocating a struct ide_cmd everytime and "forgetting" the
previous pointer.
Make sure the code reuses the old allocated chunk.
Reported-and-tested-by: Simon Kirby <sim@hostway.ca>
Cc: <stable@kernel.org> [ 39.x, 3.0.x ]
Link: http://marc.info/?l=linux-kernel&m=131667641517919
Link: http://lkml.kernel.org/r/20110922072643.GA27232@hostway.ca
Signed-off-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Toshiharu Okada [Sun, 25 Sep 2011 21:27:43 +0000 (21:27 +0000)]
pch_gbe: Fixed the issue on which a network freezes
The pch_gbe driver has an issue which a network stops,
when receiving traffic is high.
In the case, The link down and up are necessary to return a network.
This patch fixed this issue.
Signed-off-by: Toshiharu Okada <toshiharu-linux@dsn.okisemi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Toshiharu Okada [Sun, 25 Sep 2011 21:27:42 +0000 (21:27 +0000)]
pch_gbe: Fixed the issue on which PC was frozen when link was downed.
When a link was downed during network use,
there is an issue on which PC freezes.
This patch fixed this issue.
Signed-off-by: Toshiharu Okada <toshiharu-linux@dsn.okisemi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Willem de Bruijn [Fri, 30 Sep 2011 10:38:28 +0000 (10:38 +0000)]
make PACKET_STATISTICS getsockopt report consistently between ring and non-ring
This is a minor change.
Up until kernel 2.6.32, getsockopt(fd, SOL_PACKET, PACKET_STATISTICS,
...) would return total and dropped packets since its last invocation. The
introduction of socket queue overflow reporting [1] changed drop
rate calculation in the normal packet socket path, but not when using a
packet ring. As a result, the getsockopt now returns different statistics
depending on the reception method used. With a ring, it still returns the
count since the last call, as counts are incremented in tpacket_rcv and
reset in getsockopt. Without a ring, it returns 0 if no drops occurred
since the last getsockopt and the total drops over the lifespan of
the socket otherwise. The culprit is this line in packet_rcv, executed
on a drop:
drop_n_acct:
po->stats.tp_drops = atomic_inc_return(&sk->sk_drops);
As it shows, the new drop number it taken from the socket drop counter,
which is not reset at getsockopt. I put together a small example
that demonstrates the issue [2]. It runs for 10 seconds and overflows
the queue/ring on every odd second. The reported drop rates are:
ring: 16, 0, 16, 0, 16, ...
non-ring: 0, 15, 0, 30, 0, 46, 0, 60, 0 , 74.
Note how the even ring counts monotonically increase. Because the
getsockopt adds tp_drops to tp_packets, total counts are similarly
reported cumulatively. Long story short, reinstating the original code, as
the below patch does, fixes the issue at the cost of additional per-packet
cycles. Another solution that does not introduce per-packet overhead
is be to keep the current data path, record the value of sk_drops at
getsockopt() at call N in a new field in struct packetsock and subtract
that when reporting at call N+1. I'll be happy to code that, instead,
it's just more messy.
[1] http://patchwork.ozlabs.org/patch/35665/
[2] http://kernel.googlecode.com/files/test-packetsock-getstatistics.c
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Vrabel [Fri, 30 Sep 2011 06:37:51 +0000 (06:37 +0000)]
net: xen-netback: correctly restart Tx after a VM restore/migrate
If a VM is saved and restored (or migrated) the netback driver will no
longer process any Tx packets from the frontend. xenvif_up() does not
schedule the processing of any pending Tx requests from the front end
because the carrier is off. Without this initial kick the frontend
just adds Tx requests to the ring without raising an event (until the
ring is full).
This was caused by
47103041e91794acdbc6165da0ae288d844c820b (net:
xen-netback: convert to hw_features) which reordered the calls to
xenvif_up() and netif_carrier_on() in xenvif_connect().
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Gospodarek [Fri, 23 Sep 2011 10:53:34 +0000 (10:53 +0000)]
bonding: properly stop queuing work when requested
During a test where a pair of bonding interfaces using ARP monitoring
were both brought up and torn down (with an rmmod) repeatedly, a panic
in the timer code was noticed. I tracked this down and determined that
any of the bonding functions that ran as workqueue handlers and requeued
more work might not properly exit when the module was removed.
There was a flag protected by the bond lock called kill_timers that is
set when the interface goes down or the module is removed, but many of
the functions that monitor link status now unlock the bond lock to take
rtnl first. There is a chance that another CPU running the rmmod could
get the lock and set kill_timers after the first check has passed.
This patch does not allow any function to queue work that will make
itself run unless kill_timers is not set. I also noticed while doing
this work that bond_resend_igmp_join_requests did not have a check for
kill_timers, so I added the needed call there as well.
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Reported-by: Liang Zheng <lzheng@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michel Dänzer [Fri, 30 Sep 2011 15:16:53 +0000 (17:16 +0200)]
drm/radeon: Set cursor x/y to 0 when x/yorigin > 0.
Apart from the obvious cleanup, this should make the line
cursor_end = x - xorigin + w;
correct now.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Michel Dänzer [Fri, 30 Sep 2011 15:16:52 +0000 (17:16 +0200)]
drm/radeon: Update AVIVO cursor coordinate origin before x/yorigin calculation.
Fixes cursor disappearing prematurely when moving off a top/left edge which
is not located at the desktop top/left edge.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Cc: stable@kernel.org
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Michel Dänzer [Fri, 30 Sep 2011 15:16:51 +0000 (17:16 +0200)]
drm/radeon: Simplify cursor x/yorigin calculation.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Nicholas Miell [Fri, 30 Sep 2011 02:07:14 +0000 (19:07 -0700)]
drm/radeon/kms: fix cursor image off-by-one error
The mouse cursor hotspot calculation when the cursor is partially off the
top or left side of the screen was off by one.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=41158
Signed-off-by: Nicholas Miell <nmiell@gmail.com>
Reviewed-by: Michel Dänzer <michel@daenzer.net>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Alex Deucher [Mon, 3 Oct 2011 12:37:33 +0000 (08:37 -0400)]
drm/radeon/kms: Fix logic error in DP HPD handler
Only disable the pipe if the monitor is physically
disconnected. The previous logic also disabled the
pipe if the link was trained.
Fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=41248
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
Alex Deucher [Mon, 3 Oct 2011 13:13:46 +0000 (09:13 -0400)]
drm/radeon/kms: add retry limits for native DP aux defer
The previous code could potentially loop forever. Limit
the number of DP aux defer retries to 4 for native aux
transactions, same as i2c over aux transactions.
Noticed by: Brad Campbell <lists2009@fnarfbargle.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Brad Campbell <lists2009@fnarfbargle.com>
Cc: stable@kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
Alex Deucher [Mon, 3 Oct 2011 13:13:45 +0000 (09:13 -0400)]
drm/radeon/kms: fix regression in DP aux defer handling
An incorrect ordering in the error checking code lead
to DP aux defer being skipped in the aux native write
path. Move the bytes transferred check (ret == 0)
below the defer check.
Tracked down by: Brad Campbell <brad@fnarfbargle.com>
Fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=41121
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Brad Campbell <brad@fnarfbargle.com>
Cc: stable@kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
Linus Torvalds [Mon, 3 Oct 2011 02:23:44 +0000 (19:23 -0700)]
Merge branch 'for-linus' of git://git.infradead.org/users/sameo/mfd-2.6
* 'for-linus' of git://git.infradead.org/users/sameo/mfd-2.6:
mfd: Fix generic irq chip ack function name for jz4740-adc
Linus Torvalds [Mon, 3 Oct 2011 02:22:44 +0000 (19:22 -0700)]
Merge branch 'for-linus' of git://github.com/tiwai/sound
* 'for-linus' of git://github.com/tiwai/sound:
ALSA: hda - Fix a regression of the position-buffer check
Arnd Bergmann [Sun, 2 Oct 2011 14:45:31 +0000 (16:45 +0200)]
ASoC: omap_mcpdm_remove cannot be __devexit
omap_mcpdm_remove is used from asoc_mcpdm_probe, which is an
initcall, and must not be discarded when HOTPLUG is disabled.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Axel Lin [Sun, 2 Oct 2011 12:41:04 +0000 (20:41 +0800)]
ASoC: Fix setting update bits for WM8753_LADC and WM8753_RADC
Current code set update bits for WM8753_LDAC and WM8753_RDAC twice,
but missed setting update bits for WM8753_LADC and WM8753_RADC.
I think it is a copy-paste bug in commit 776065
"ASoC: codecs: wm8753: Fix register cache incoherency".
Signed-off-by: Axel Lin <axel.lin@gmail.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Cc: stable@kernel.org