Eryu Guan [Tue, 2 May 2017 20:54:47 +0000 (13:54 -0700)]
xfs: fix use-after-free in xfs_finish_page_writeback
Commit
28b783e47ad7 ("xfs: bufferhead chains are invalid after
end_page_writeback") fixed one use-after-free issue by
pre-calculating the loop conditionals before calling bh->b_end_io()
in the end_io processing loop, but it assigned 'next' pointer before
checking end offset boundary & breaking the loop, at which point the
bh might be freed already, and caused use-after-free.
This is caught by KASAN when running fstests generic/127 on sub-page
block size XFS.
[ 2517.244502] run fstests generic/127 at 2017-04-27 07:30:50
[ 2747.868840] ==================================================================
[ 2747.876949] BUG: KASAN: use-after-free in xfs_destroy_ioend+0x3d3/0x4e0 [xfs] at addr
ffff8801395ae698
...
[ 2747.918245] Call Trace:
[ 2747.920975] dump_stack+0x63/0x84
[ 2747.924673] kasan_object_err+0x21/0x70
[ 2747.928950] kasan_report+0x271/0x530
[ 2747.933064] ? xfs_destroy_ioend+0x3d3/0x4e0 [xfs]
[ 2747.938409] ? end_page_writeback+0xce/0x110
[ 2747.943171] __asan_report_load8_noabort+0x19/0x20
[ 2747.948545] xfs_destroy_ioend+0x3d3/0x4e0 [xfs]
[ 2747.953724] xfs_end_io+0x1af/0x2b0 [xfs]
[ 2747.958197] process_one_work+0x5ff/0x1000
[ 2747.962766] worker_thread+0xe4/0x10e0
[ 2747.966946] kthread+0x2d3/0x3d0
[ 2747.970546] ? process_one_work+0x1000/0x1000
[ 2747.975405] ? kthread_create_on_node+0xc0/0xc0
[ 2747.980457] ? syscall_return_slowpath+0xe6/0x140
[ 2747.985706] ? do_page_fault+0x30/0x80
[ 2747.989887] ret_from_fork+0x2c/0x40
[ 2747.993874] Object at
ffff8801395ae690, in cache buffer_head size: 104
[ 2748.001155] Allocated:
[ 2748.003782] PID = 8327
[ 2748.006411] save_stack_trace+0x1b/0x20
[ 2748.010688] save_stack+0x46/0xd0
[ 2748.014383] kasan_kmalloc+0xad/0xe0
[ 2748.018370] kasan_slab_alloc+0x12/0x20
[ 2748.022648] kmem_cache_alloc+0xb8/0x1b0
[ 2748.027024] alloc_buffer_head+0x22/0xc0
[ 2748.031399] alloc_page_buffers+0xd1/0x250
[ 2748.035968] create_empty_buffers+0x30/0x410
[ 2748.040730] create_page_buffers+0x120/0x1b0
[ 2748.045493] __block_write_begin_int+0x17a/0x1800
[ 2748.050740] iomap_write_begin+0x100/0x2f0
[ 2748.055308] iomap_zero_range_actor+0x253/0x5c0
[ 2748.060362] iomap_apply+0x157/0x270
[ 2748.064347] iomap_zero_range+0x5a/0x80
[ 2748.068624] iomap_truncate_page+0x6b/0xa0
[ 2748.073227] xfs_setattr_size+0x1f7/0xa10 [xfs]
[ 2748.078312] xfs_vn_setattr_size+0x68/0x140 [xfs]
[ 2748.083589] xfs_file_fallocate+0x4ac/0x820 [xfs]
[ 2748.088838] vfs_fallocate+0x2cf/0x780
[ 2748.093021] SyS_fallocate+0x48/0x80
[ 2748.097006] do_syscall_64+0x18a/0x430
[ 2748.101186] return_from_SYSCALL_64+0x0/0x6a
[ 2748.105948] Freed:
[ 2748.108189] PID = 8327
[ 2748.110816] save_stack_trace+0x1b/0x20
[ 2748.115093] save_stack+0x46/0xd0
[ 2748.118788] kasan_slab_free+0x73/0xc0
[ 2748.122969] kmem_cache_free+0x7a/0x200
[ 2748.127247] free_buffer_head+0x41/0x80
[ 2748.131524] try_to_free_buffers+0x178/0x250
[ 2748.136316] xfs_vm_releasepage+0x2e9/0x3d0 [xfs]
[ 2748.141563] try_to_release_page+0x100/0x180
[ 2748.146325] invalidate_inode_pages2_range+0x7da/0xcf0
[ 2748.152087] xfs_shift_file_space+0x37d/0x6e0 [xfs]
[ 2748.157557] xfs_collapse_file_space+0x49/0x120 [xfs]
[ 2748.163223] xfs_file_fallocate+0x2a7/0x820 [xfs]
[ 2748.168462] vfs_fallocate+0x2cf/0x780
[ 2748.172642] SyS_fallocate+0x48/0x80
[ 2748.176629] do_syscall_64+0x18a/0x430
[ 2748.180810] return_from_SYSCALL_64+0x0/0x6a
Fixed it by checking on offset against end & breaking out first,
dereference bh only if there're still bufferheads to process.
Signed-off-by: Eryu Guan <eguan@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Darrick J. Wong [Wed, 12 Apr 2017 19:26:07 +0000 (12:26 -0700)]
xfs: reserve enough blocks to handle btree splits when remapping
In xfs_reflink_end_cow, we erroneously reserve only enough blocks to
handle adding 1 extent. This is problematic if we fragment free space,
have to do CoW, and then have to perform multiple bmap btree expansions.
Furthermore, the BUI recovery routine doesn't reserve /any/ blocks to
handle btree splits, so log recovery fails after our first error causes
the filesystem to go down.
Therefore, refactor the transaction block reservation macros until we
have a macro that works for our deferred (re)mapping activities, and fix
both problems by using that macro.
With 1k blocks we can hit this fairly often in g/187 if the scratch fs
is big enough.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Brian Foster [Wed, 26 Apr 2017 15:30:40 +0000 (08:30 -0700)]
xfs: wait on new inodes during quotaoff dquot release
The quotaoff operation has a race with inode allocation that results
in a livelock. An inode allocation that occurs before the quota
status flags are updated acquires the appropriate dquots for the
inode via xfs_qm_vop_dqalloc(). It then inserts the XFS_INEW inode
into the perag radix tree, sometime later attaches the dquots to the
inode and finally clears the XFS_INEW flag. Quotaoff expects to
release the dquots from all inodes in the filesystem via
xfs_qm_dqrele_all_inodes(). This invokes the AG inode iterator,
which skips inodes in the XFS_INEW state because they are not fully
constructed. If the scan occurs after dquots have been attached to
an inode, but before XFS_INEW is cleared, the newly allocated inode
will continue to hold a reference to the applicable dquots. When
quotaoff invokes xfs_qm_dqpurge_all(), the reference count of those
dquot(s) remain elevated and the dqpurge scan spins indefinitely.
To address this problem, update the xfs_qm_dqrele_all_inodes() scan
to wait on inodes marked on the XFS_INEW state. We wait on the
inodes explicitly rather than skip and retry to avoid continuous
retry loops due to a parallel inode allocation workload. Since
quotaoff updates the quota state flags and uses a synchronous
transaction before the dqrele scan, and dquots are attached to
inodes after radix tree insertion iff quota is enabled, one INEW
waiting pass through the AG guarantees that the scan has processed
all inodes that could possibly hold dquot references.
Reported-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Brian Foster [Wed, 26 Apr 2017 15:30:39 +0000 (08:30 -0700)]
xfs: update ag iterator to support wait on new inodes
The AG inode iterator currently skips new inodes as such inodes are
inserted into the inode radix tree before they are fully
constructed. Certain contexts require the ability to wait on the
construction of new inodes, however. The fs-wide dquot release from
the quotaoff sequence is an example of this.
Update the AG inode iterator to support the ability to wait on
inodes flagged with XFS_INEW upon request. Create a new
xfs_inode_ag_iterator_flags() interface and support a set of
iteration flags to modify the iteration behavior. When the
XFS_AGITER_INEW_WAIT flag is set, include XFS_INEW flags in the
radix tree inode lookup and wait on them before the callback is
executed.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Brian Foster [Wed, 26 Apr 2017 15:30:39 +0000 (08:30 -0700)]
xfs: support ability to wait on new inodes
Inodes that are inserted into the perag tree but still under
construction are flagged with the XFS_INEW bit. Most contexts either
skip such inodes when they are encountered or have the ability to
handle them.
The runtime quotaoff sequence introduces a context that must wait
for construction of such inodes to correctly ensure that all dquots
in the fs are released. In anticipation of this, support the ability
to wait on new inodes. Wake the appropriate bit when XFS_INEW is
cleared.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Amir Goldstein [Fri, 28 Apr 2017 15:10:53 +0000 (08:10 -0700)]
xfs: publish UUID in struct super_block
Copy the uuid of the filesystem to struct super_block s_uuid field,
as several other filesystems already do. Copy regardless of the nouuid
mount option, because other filesystems also do not guaranty uniqueness
of the s_uuid field in super_block struct.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Lukas Czerner [Thu, 27 Apr 2017 15:59:36 +0000 (08:59 -0700)]
xfs: Allow user to kill fstrim process
fstrim can take really long time on big, slow device or on file system
with a lots of allocation groups. Currently there is no way for the user
to cancell the operation. This patch makes it possible for the user to
kill fstrim pocess by adding the check for fatal_signal_pending() in
xfs_trim_extents().
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Darrick J. Wong [Thu, 20 Apr 2017 22:09:05 +0000 (15:09 -0700)]
xfs: better log intent item refcount checking
Use ASSERTs on the log intent item refcounts so that we fail noisily if
anyone tries to double-free the item.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Brian Foster [Fri, 21 Apr 2017 19:40:44 +0000 (12:40 -0700)]
xfs: fix up quotacheck buffer list error handling
The quotacheck error handling of the delwri buffer list assumes the
resident buffers are locked and doesn't clear the _XBF_DELWRI_Q flag
on the buffers that are dequeued. This can lead to assert failures
on buffer release and possibly other locking problems.
Move this code to a delwri queue cancel helper function to
encapsulate the logic required to properly release buffers from a
delwri queue. Update the helper to clear the delwri queue flag and
call it from quotacheck.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Fri, 21 Apr 2017 18:24:42 +0000 (11:24 -0700)]
xfs: remove xfs_trans_ail_delete_bulk
xfs_iflush_done uses an on-stack variable length array to pass the log
items to be deleted to xfs_trans_ail_delete_bulk. On-stack VLAs are a
nasty gcc extension that can lead to unbounded stack allocations, but
fortunately we can easily avoid them by simply open coding
xfs_trans_ail_delete_bulk in xfs_iflush_done, which is the only caller
of it except for the single-item xfs_trans_ail_delete.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Fri, 21 Apr 2017 18:24:42 +0000 (11:24 -0700)]
xfs: don't use bool values in trace buffers
Using bool values produces sparse warnings of this form:
fs/xfs/./xfs_trace.h:2252:1: warning: odd constant _Bool cast (
ffffffffffffffff becomes 1)
fs/xfs/./xfs_trace.h:2252:1: warning: odd constant _Bool cast (
ffffffffffffffff becomes 1)
fs/xfs/./xfs_trace.h:2278:1: warning: odd constant _Bool cast (
ffffffffffffffff becomes 1)
fs/xfs/./xfs_trace.h:2278:1: warning: odd constant _Bool cast (
ffffffffffffffff becomes 1)
fs/xfs/./xfs_trace.h:2307:1: warning: odd constant _Bool cast (
ffffffffffffffff becomes 1)
Just use a char instead to fix those up.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Darrick J. Wong [Sun, 23 Apr 2017 17:45:21 +0000 (10:45 -0700)]
xfs: fix getfsmap userspace memory corruption while setting OF_LAST
At the end of a getfsmap call, we will set FMR_OF_LAST in the last
struct fsmap that was handed in by userspace if we've truly run out of
space mapping record (as opposed to simply running out of space in the
user array). Unfortunately, fmh_entries is the wrong check for whether
or not we've filled out anything in the user array because the ioctl
provides that fmh_count==0 sets fmh_entries without filling out the user
array. Therefore we end up writing things into user memory areas that we
weren't given, and kaboom.
Since Christoph amended the getfsmap structure to track the number of
fsmap entries we've actually filled out, use that as part of deciding if
we have to set the OF_LAST flag.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig [Fri, 21 Apr 2017 18:24:41 +0000 (11:24 -0700)]
xfs: fix __user annotations for xfs_ioc_getfsmap
By passing the whole fsmap_head structure and an index we can get the
user point annotations right for the embedded variable sized array
in struct fsmap_head.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: change idx to unsigned int]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Fri, 21 Apr 2017 18:24:40 +0000 (11:24 -0700)]
xfs: corruption needs to respect endianess too!
At least if we want to be able to recognize the pattern. Add a missing
byte swap to the corruption injection case in xlog_sync.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Fri, 21 Apr 2017 18:24:40 +0000 (11:24 -0700)]
xfs: use NULL instead of 0 to initialize a pointer in xfs_ioc_getfsmap
Found by sparse.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Fri, 21 Apr 2017 18:24:39 +0000 (11:24 -0700)]
xfs: use NULL instead of 0 to initialize a pointer in xfs_getfsmap
Found by sparse.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Thu, 20 Apr 2017 16:42:48 +0000 (09:42 -0700)]
xfs: simplify validation of the unwritten extent bit
XFS only supports the unwritten extent bit in the data fork, and only if
the file system has a version 5 superblock or the unwritten extent
feature bit.
We currently have two routines that validate the invariant:
xfs_check_nostate_extents which return -EFSCORRUPTED when it's not met,
and xfs_validate_extent that triggers and assert in debug build.
Both of them iterate over all extents of an inode fork when called,
which isn't very efficient.
This patch instead adds a new helper that verifies the invariant one
extent at a time, and calls it from the places where we iterate over
all extents to converted them from or two the in-memory format. The
callers then return -EFSCORRUPTED when reading invalid extents from
disk, or trigger an assert when writing them to disk.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Wed, 19 Apr 2017 22:19:49 +0000 (15:19 -0700)]
xfs: remove unused values from xfs_exntst_t
We only ever use the normal and unwritten states. And the actual
ondisk format (this enum isn't despite being in xfs_format.h) only
has space for the unwritten bit anyway.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Wed, 19 Apr 2017 22:19:45 +0000 (15:19 -0700)]
xfs: remove the unused XFS_MAXLINK_1 define
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Eric Sandeen [Wed, 19 Apr 2017 22:19:32 +0000 (15:19 -0700)]
xfs: more do_div cleanups
On some architectures do_div does the pointer compare
trick to make sure that we've sent it an unsigned 64-bit
number. (Why unsigned? I don't know.)
Fix up the few places that squawk about this; in
xfs_bmap_wants_extents() we just used a bare int64_t so change
that to unsigned.
In xfs_adjust_extent_unmap_boundaries() all we wanted was the
mod, and we have an xfs-specific function to handle that w/o
side effects, which includes proper casting for do_div.
In xfs_daddr_to_ag[b]no, we were using the wrong type anyway;
XFS_BB_TO_FSBT returns a block in the filesystem, so use
xfs_rfsblock_t not xfs_daddr_t, and gain the unsignedness
from that type as a bonus.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Eric Sandeen [Wed, 19 Apr 2017 19:55:57 +0000 (12:55 -0700)]
xfs: remove use of do_div with 32-bit dividend in quota
The kbuild test robot caught this; in debug code we have another
caller of do_div with a 32-bit dividend (j) which is caught now
that we are using the kernel-supplied do_div.
None of the values used here are 64-bit; just use simple division.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Hou Tao [Fri, 14 Apr 2017 18:43:27 +0000 (11:43 -0700)]
xfs: remove the trailing newline used in the fmt parameter of TP_printk
The trailing newlines wil lead to extra newlines in the trace file
which looks like the following output, so remove them.
>kworker/4:1H-1508 [004] .... 47879.101608: xfs_discard_extent: dev 8:0
>
>kworker/u16:2-238 [004] .... 47879.101725: xfs_extent_busy_clear: dev 8:0
Signed-off-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: fix the getfsmap tracepoints too]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Brian Foster [Thu, 20 Apr 2017 15:06:47 +0000 (08:06 -0700)]
xfs: prevent multi-fsb dir readahead from reading random blocks
Directory block readahead uses a complex iteration mechanism to map
between high-level directory blocks and underlying physical extents.
This mechanism attempts to traverse the higher-level dir blocks in a
manner that handles multi-fsb directory blocks and simultaneously
maintains a reference to the corresponding physical blocks.
This logic doesn't handle certain (discontiguous) physical extent
layouts correctly with multi-fsb directory blocks. For example,
consider the case of a 4k FSB filesystem with a 2 FSB (8k) directory
block size and a directory with the following extent layout:
EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
0: [0..7]: 88..95 0 (88..95) 8
1: [8..15]: 80..87 0 (80..87) 8
2: [16..39]: 168..191 0 (168..191) 24
3: [40..63]:
5242952..
5242975 1 (72..95) 24
Directory block 0 spans physical extents 0 and 1, dirblk 1 lies
entirely within extent 2 and dirblk 2 spans extents 2 and 3. Because
extent 2 is larger than the directory block size, the readahead code
erroneously assumes the block is contiguous and issues a readahead
based on the physical mapping of the first fsb of the dirblk. This
results in read verifier failure and a spurious corruption or crc
failure, depending on the filesystem format.
Further, the subsequent readahead code responsible for walking
through the physical table doesn't correctly advance the physical
block reference for dirblk 2. Instead of advancing two physical
filesystem blocks, the first iteration of the loop advances 1 block
(correctly), but the subsequent iteration advances 2 more physical
blocks because the next physical extent (extent 3, above) happens to
cover more than dirblk 2. At this point, the higher-level directory
block walking is completely off the rails of the actual physical
layout of the directory for the respective mapping table.
Update the contiguous dirblock logic to consider the current offset
in the physical extent to avoid issuing directory readahead to
unrelated blocks. Also, update the mapping table advancing code to
consider the current offset within the current dirblock to avoid
advancing the mapping reference too far beyond the dirblock.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Eric Sandeen [Thu, 13 Apr 2017 22:15:47 +0000 (15:15 -0700)]
xfs: handle array index overrun in xfs_dir2_leaf_readbuf()
Carlos had a case where "find" seemed to start spinning
forever and never return.
This was on a filesystem with non-default multi-fsb (8k)
directory blocks, and a fragmented directory with extents
like this:
0:[0,133646,2,0]
1:[2,195888,1,0]
2:[3,195890,1,0]
3:[4,195892,1,0]
4:[5,195894,1,0]
5:[6,195896,1,0]
6:[7,195898,1,0]
7:[8,195900,1,0]
8:[9,195902,1,0]
9:[10,195908,1,0]
10:[11,195910,1,0]
11:[12,195912,1,0]
12:[13,195914,1,0]
...
i.e. the first extent is a contiguous 2-fsb dir block, but
after that it is fragmented into 1 block extents.
At the top of the readdir path, we allocate a mapping array
which (for this filesystem geometry) can hold 10 extents; see
the assignment to map_info->map_size. During readdir, we are
therefore able to map extents 0 through 9 above into the array
for readahead purposes. If we count by 2, we see that the last
mapped index (9) is the first block of a 2-fsb directory block.
At the end of xfs_dir2_leaf_readbuf() we have 2 loops to fill
more readahead; the outer loop assumes one full dir block is
processed each loop iteration, and an inner loop that ensures
that this is so by advancing to the next extent until a full
directory block is mapped.
The problem is that this inner loop may step past the last
extent in the mapping array as it tries to reach the end of
the directory block. This will read garbage for the extent
length, and as a result the loop control variable 'j' may
become corrupted and never fail the loop conditional.
The number of valid mappings we have in our array is stored
in map->map_valid, so stop this inner loop based on that limit.
There is an ASSERT at the top of the outer loop for this
same condition, but we never made it out of the inner loop,
so the ASSERT never fired.
Huge appreciation for Carlos for debugging and isolating
the problem.
Debugged-and-analyzed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Tested-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Chandan Rajendra [Wed, 12 Apr 2017 18:03:20 +0000 (11:03 -0700)]
iomap_dio_rw: Prevent reading file data beyond iomap_dio->i_size
On a ppc64 machine executing overlayfs/019 with xfs as the lower and
upper filesystem causes the following call trace,
WARNING: CPU: 2 PID: 8034 at /root/repos/linux/fs/iomap.c:765 .iomap_dio_actor+0xcc/0x420
Modules linked in:
CPU: 2 PID: 8034 Comm: fsstress Tainted: G L 4.11.0-rc5-next-
20170405 #100
task:
c000000631314880 task.stack:
c0000003915d4000
NIP:
c00000000035a72c LR:
c00000000035a6f4 CTR:
c00000000035a660
REGS:
c0000003915d7570 TRAP: 0700 Tainted: G L (4.11.0-rc5-next-
20170405)
MSR:
800000000282b032 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI>
CR:
24004284 XER:
00000000
CFAR:
c0000000006f7190 SOFTE: 1
GPR00:
c00000000035a6f4 c0000003915d77f0 c0000000015a3f00 000000007c22f600
GPR04:
000000000022d000 0000000000002600 c0000003b2d56360 c0000003915d7960
GPR08:
c0000003915d7cd0 0000000000000002 0000000000002600 c000000000521cc0
GPR12:
0000000024004284 c00000000fd80a00 000000004b04ae64 ffffffffffffffff
GPR16:
000000001000ca70 0000000000000000 c0000003b2d56380 c00000000153d2b8
GPR20:
0000000000000010 c0000003bc87bac8 0000000000223000 000000000022f5ff
GPR24:
c0000003b2d56360 000000000000000c 0000000000002600 000000000022d000
GPR28:
0000000000000000 c0000003915d7960 c0000003b2d56360 00000000000001ff
NIP [
c00000000035a72c] .iomap_dio_actor+0xcc/0x420
LR [
c00000000035a6f4] .iomap_dio_actor+0x94/0x420
Call Trace:
[
c0000003915d77f0] [
c00000000035a6f4] .iomap_dio_actor+0x94/0x420 (unreliable)
[
c0000003915d78f0] [
c00000000035b9f4] .iomap_apply+0xf4/0x1f0
[
c0000003915d79d0] [
c00000000035c320] .iomap_dio_rw+0x230/0x420
[
c0000003915d7ae0] [
c000000000512a14] .xfs_file_dio_aio_read+0x84/0x160
[
c0000003915d7b80] [
c000000000512d24] .xfs_file_read_iter+0x104/0x130
[
c0000003915d7c10] [
c0000000002d6234] .__vfs_read+0x114/0x1a0
[
c0000003915d7cf0] [
c0000000002d7a8c] .vfs_read+0xac/0x1a0
[
c0000003915d7d90] [
c0000000002d96b8] .SyS_read+0x58/0x100
[
c0000003915d7e30] [
c00000000000b8e0] system_call+0x38/0xfc
Instruction dump:
78630020 7f831b78 7ffc07b4 7c7ce039 40820360 a13d0018 2f890003 419e0288
2f890004 419e00a0 2f890001 419e02a8 <
0fe00000>
3b80fffb 38210100 7f83e378
The above problem can also be recreated on a regular xfs filesystem
using the command,
$ fsstress -d /mnt -l 1000 -n 1000 -p 1000
The reason for the call trace is,
1. When 'reserving' blocks for delayed allocation , XFS reserves more
blocks (i.e. past file's current EOF) than required. This is done
because XFS assumes that userspace might write more data and hence
'reserving' more blocks might lead to the file's new data being
stored contiguously on disk.
2. The in-memory 'struct xfs_bmbt_irec' mapping the file's last extent would
then cover the prealloc-ed EOF blocks in addition to the regular blocks.
3. When flushing the dirty blocks to disk, we only flush data till the
file's EOF. But before writing out the dirty data, we allocate blocks
on the disk for holding the file's new data. This allocation includes
the blocks that are part of the 'prealloc EOF blocks'.
4. Later, when the last reference to the inode is being closed, XFS frees the
unused 'prealloc EOF blocks' in xfs_inactive().
In step 3 above, When allocating space on disk for the delayed allocation
range, the space allocator might sometimes allocate less blocks than
required. If such an allocation ends right at the current EOF of the
file, We will not be able to clear the "delayed allocation" flag for the
'prealloc EOF blocks', since we won't have dirty buffer heads associated
with that range of the file.
In such a situation if a Direct I/O read operation is performed on file
range [X, Y] (where X < EOF and Y > EOF), we flush dirty data in the
range [X, Y] and invalidate page cache for that range (Refer to
iomap_dio_rw()). Later for performing the Direct I/O read, XFS obtains
the extent items (which are still cached in memory) for the file
range. When doing so we are not supposed to get an extent item with
IOMAP_DELALLOC flag set, since the previous "flush" operation should
have converted any delayed allocation data in the range [X, Y]. Hence we
end up hitting a WARN_ON_ONCE(1) statement in iomap_dio_actor().
This commit fixes the bug by preventing the read operation from going
beyond iomap_dio->i_size.
Reported-by: Santhosh G <santhog4@linux.vnet.ibm.com>
Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Tue, 11 Apr 2017 23:45:57 +0000 (16:45 -0700)]
xfs: remove bmap block allocation retries
Now that reflink operations don't set the firstblock value we don't
need the workarounds for non-NULL firstblock values without a prior
allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Tue, 11 Apr 2017 23:45:56 +0000 (16:45 -0700)]
xfs: remove xfs_bmap_remap_alloc
The main thing that xfs_bmap_remap_alloc does is fixing the AGFL, similar
to what we do in the space allocator. But the reflink code doesn't touch
the allocation btree unlike the normal space allocator, so we couldn't
care less about the state of the AGFL.
So remove xfs_bmap_remap_alloc and just handle the di_nblocks update in
the caller.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Tue, 11 Apr 2017 23:45:55 +0000 (16:45 -0700)]
xfs: introduce xfs_bmapi_remap
Add a new helper to be used for reflink extent list additions instead of
funneling them through xfs_bmapi_write and overloading the firstblock
member in struct xfs_bmalloca and struct xfs_alloc_args.
With some small changes to xfs_bmap_remap_alloc this also means we do
not need a xfs_bmalloca structure for this case at all.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Tue, 11 Apr 2017 23:45:54 +0000 (16:45 -0700)]
xfs: pass individual arguments to xfs_bmap_add_extent_hole_real
For the reflink case we'd much rather pass the required arguments than
faking up a struct xfs_bmalloca.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Tue, 11 Apr 2017 23:45:53 +0000 (16:45 -0700)]
xfs: remove attr fork handling in xfs_bmap_finish_one
We never do COW operations for the attr fork, so don't pretend we handle
them.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Tue, 11 Apr 2017 23:45:52 +0000 (16:45 -0700)]
xfs: fix integer truncation in xfs_bmap_remap_alloc
bno should be a xfs_fsblock_t, which is 64-bit wides instead of a
xfs_aglock_t, which truncates the value to 32 bits.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Brian Foster [Tue, 11 Apr 2017 17:50:05 +0000 (10:50 -0700)]
xfs: drop iolock from reclaim context to appease lockdep
Lockdep complains about use of the iolock in inode reclaim context
because it doesn't understand that reclaim has the last reference to
the inode, and thus an iolock->reclaim->iolock deadlock is not
possible.
The iolock is technically not necessary in xfs_inactive() and was
only added to appease an assert in xfs_free_eofblocks(), which can
be called from other non-reclaim contexts. Therefore, just kill the
assert and drop the use of the iolock from reclaim context to quiet
lockdep.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Eric Sandeen [Tue, 11 Apr 2017 17:45:17 +0000 (10:45 -0700)]
xfs: remove custom do_div implementations
Long ago, all this gunk was added with a lament about problems
with gcc's do_div, and a fun recommendation in the changelog:
egcs-2.91.66 is the recommended compiler version for building XFS.
All this special stuff was needed to work around an old gcc bug,
apparently, and it's been there ever since.
There should be no need for this anymore, so remove it.
Remove the special 32-bit xfs_do_mod as well; just let the
kernel's do_div() handle all this.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Eric Sandeen [Thu, 6 Apr 2017 23:01:47 +0000 (16:01 -0700)]
xfs: simplify xfs_calc_dquots_per_chunk
ndquots is a 32-bit value, and we don't care
about the remainder; there is no reason to use do_div
here, it seems to be the result of a decade+ historical
accident.
Worse, the do_div implementation in userspace breaks
when fed a 32-bit dividend, so we commented it out there
in any case.
Change to simple division, and then we can change
userspace to match, and mandate a 64-bit dividend in
the do_div() in userspace as well.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Darrick J. Wong [Thu, 6 Apr 2017 23:00:39 +0000 (16:00 -0700)]
xfs: actually report xattr extents via iomap
Apparently FIEMAP for xattrs has been broken since we switched to
the iomap backend because of an incorrect check for xattr presence.
Also fix the broken locking.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig [Thu, 6 Apr 2017 23:00:11 +0000 (16:00 -0700)]
xfs: fold __xfs_trans_roll into xfs_trans_roll
No one cares about the low-level helper anymore.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Darrick J. Wong [Tue, 28 Mar 2017 21:56:38 +0000 (14:56 -0700)]
xfs: report realtime space information via the rtbitmap
Use the realtime bitmap to return free space information via getfsmap.
Eventually this will be superseded by the realtime rmapbt code.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Darrick J. Wong [Tue, 28 Mar 2017 21:56:37 +0000 (14:56 -0700)]
xfs: have getfsmap fall back to the freesp btrees when rmap is not present
If the reverse-mapping btree isn't available, fall back to the
free space btrees to provide partial reverse mapping information.
The online scrub tool can make use of even partial information to
speed up the data block scan.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Darrick J. Wong [Tue, 28 Mar 2017 21:56:37 +0000 (14:56 -0700)]
xfs: implement the GETFSMAP ioctl
Introduce a new ioctl that uses the reverse mapping btree to return
information about the physical layout of the filesystem.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Darrick J. Wong [Tue, 28 Mar 2017 21:56:36 +0000 (14:56 -0700)]
xfs: add a couple of queries to iterate free extents in the rtbitmap
Add _query_range and _query_all functions to the realtime bitmap
allocator. These two functions are similar in usage to the btree
functions with the same name and will be used for getfsmap and scrub.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Darrick J. Wong [Tue, 28 Mar 2017 21:56:35 +0000 (14:56 -0700)]
xfs: create a function to query all records in a btree
Create a helper function that will query all records in a btree.
This will be used by the online repair functions to examine every
record in a btree to rebuild a second btree.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Darrick J. Wong [Tue, 28 Mar 2017 21:56:35 +0000 (14:56 -0700)]
xfs: provide a query_range function for freespace btrees
Implement a query_range function for the bnobt and cntbt. This will
be used for getfsmap fallback if there is no rmapbt and by the online
scrub and repair code.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Darrick J. Wong [Tue, 28 Mar 2017 21:56:34 +0000 (14:56 -0700)]
xfs: plumb in needed functions for range querying of the freespace btrees
Plumb in the pieces (init_high_key, diff_two_keys) necessary to call
query_range on the free space btrees. Remove the debugging asserts
so that we can make queries starting from block 0.
While we're at it, merge the redundant "if (btnum ==" hunks.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Darrick J. Wong [Tue, 28 Mar 2017 21:56:34 +0000 (14:56 -0700)]
vfs: add common GETFSMAP ioctl definitions
Add the GETFSMAP headers to the VFS kernel headers
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Darrick J. Wong [Mon, 3 Apr 2017 22:17:57 +0000 (15:17 -0700)]
xfs: fix over-copying of getbmap parameters from userspace
In xfs_ioc_getbmap, we should only copy the fields of struct getbmap
from userspace, or else we end up copying random stack contents into the
kernel. struct getbmap is a strict subset of getbmapx, so a partial
structure copy should work fine.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Nikolay Borisov [Tue, 28 Mar 2017 21:55:15 +0000 (14:55 -0700)]
xfs: Remove obsolete declaration of xfs_buf_get_empty
This function has been removed ever since at least 3.12-era. No need to
keep its declaration in the header so nuke it.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Eric Sandeen [Tue, 28 Mar 2017 21:54:29 +0000 (14:54 -0700)]
xfs: fix up inode validation failure message
"xfs_iread: validation failed for inode 96 failed"
One "failed" seems like enough.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Tue, 28 Mar 2017 21:53:36 +0000 (14:53 -0700)]
xfs: remove the ISUNWRITTEN macro
Opencoding the trivial checks makes it much easier to read (and grep..).
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Tue, 28 Mar 2017 21:53:35 +0000 (14:53 -0700)]
xfs: factor out a xfs_bmap_is_real_extent helper
This checks for all the non-normal extent types, including handling both
encodings of delayed allocations.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Brian Foster [Tue, 28 Mar 2017 21:51:44 +0000 (14:51 -0700)]
xfs: use dedicated log worker wq to avoid deadlock with cil wq
The log covering background task used to be part of the xfssyncd
workqueue. That workqueue was removed as of commit
5889608df ("xfs:
syncd workqueue is no more") and the associated work item scheduled
to the xfs-log wq. The latter is used for log buffer I/O completion.
Since xfs_log_worker() can invoke a log flush, a deadlock is
possible between the xfs-log and xfs-cil workqueues. Consider the
following codepath from xfs_log_worker():
xfs_log_worker()
xfs_log_force()
_xfs_log_force()
xlog_cil_force()
xlog_cil_force_lsn()
xlog_cil_push_now()
flush_work()
The above is in xfs-log wq context and blocked waiting on the
completion of an xfs-cil work item. Concurrently, the cil push in
progress can end up blocked here:
xlog_cil_push_work()
xlog_cil_push()
xlog_write()
xlog_state_get_iclog_space()
xlog_wait(&log->l_flush_wait, ...)
The above is in xfs-cil context waiting on log buffer I/O
completion, which executes in xfs-log wq context. In this scenario
both workqueues are deadlocked waiting on eachother.
Add a new workqueue specifically for the high level log covering and
ail pushing worker, as was the case prior to commit
5889608df.
Diagnosed-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Darrick J. Wong [Mon, 3 Apr 2017 22:17:53 +0000 (15:17 -0700)]
xfs: fix kernel memory exposure problems
Fix a memory exposure problems in inumbers where we allocate an array of
structures with holes, fail to zero the holes, then blindly copy the
kernel memory contents (junk and all) into userspace.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Calvin Owens [Fri, 31 Mar 2017 14:57:55 +0000 (07:57 -0700)]
xfs: Honor FALLOC_FL_KEEP_SIZE when punching ends of files
When punching past EOF on XFS, fallocate(mode=PUNCH_HOLE|KEEP_SIZE) will
round the file size up to the nearest multiple of PAGE_SIZE:
calvinow@vm-disks/generic-xfs-1 ~$ dd if=/dev/urandom of=test bs=2048 count=1
calvinow@vm-disks/generic-xfs-1 ~$ stat test
Size: 2048 Blocks: 8 IO Block: 4096 regular file
calvinow@vm-disks/generic-xfs-1 ~$ fallocate -n -l 2048 -o 2048 -p test
calvinow@vm-disks/generic-xfs-1 ~$ stat test
Size: 4096 Blocks: 8 IO Block: 4096 regular file
Commit
3c2bdc912a1cc050 ("xfs: kill xfs_zero_remaining_bytes") replaced
xfs_zero_remaining_bytes() with calls to iomap helpers. The new helpers
don't enforce that [pos,offset) lies strictly on [0,i_size) when being
called from xfs_free_file_space(), so by "leaking" these ranges into
xfs_zero_range() we get this buggy behavior.
Fix this by reintroducing the checks xfs_zero_remaining_bytes() did
against i_size at the bottom of xfs_free_file_space().
Reported-by: Aaron Gao <gzh@fb.com>
Fixes:
3c2bdc912a1cc050 ("xfs: kill xfs_zero_remaining_bytes")
Cc: Christoph Hellwig <hch@lst.de>
Cc: Brian Foster <bfoster@redhat.com>
Cc: <stable@vger.kernel.org> # 4.8+
Signed-off-by: Calvin Owens <calvinowens@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Darrick J. Wong [Tue, 28 Mar 2017 21:51:10 +0000 (14:51 -0700)]
xfs: rework the inline directory verifiers
The inline directory verifiers should be called on the inode fork data,
which means after iformat_local on the read side, and prior to
ifork_flush on the write side. This makes the fork verifier more
consistent with the way buffer verifiers work -- i.e. they will operate
on the memory buffer that the code will be reading and writing directly.
Furthermore, revise the verifier function to return -EFSCORRUPTED so
that we don't flood the logs with corruption messages and assert
notices. This has been a particular problem with xfs/348, which
triggers the XFS_WANT_CORRUPTED_RETURN assertions, which halts the
kernel when CONFIG_XFS_DEBUG=y. Disk corruption isn't supposed to do
that, at least not in a verifier.
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
v2: get the inode d_ops the proper way
v3: describe the bug that this patch fixes; no code changes
Linus Torvalds [Sun, 26 Mar 2017 21:15:16 +0000 (14:15 -0700)]
Linux 4.11-rc4
Linus Torvalds [Sun, 26 Mar 2017 18:15:54 +0000 (11:15 -0700)]
Merge tag 'char-misc-4.11-rc4' of git://git./linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"A smattering of different small fixes for some random driver
subsystems. Nothing all that major, just resolutions for reported
issues and bugs.
All have been in linux-next with no reported issues"
* tag 'char-misc-4.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (21 commits)
extcon: int3496: Set the id pin to direction-input if necessary
extcon: int3496: Use gpiod_get instead of gpiod_get_index
extcon: int3496: Add dependency on X86 as it's Intel specific
extcon: int3496: Add GPIO ACPI mapping table
extcon: int3496: Rename GPIO pins in accordance with binding
vmw_vmci: handle the return value from pci_alloc_irq_vectors correctly
ppdev: fix registering same device name
parport: fix attempt to write duplicate procfiles
auxdisplay: img-ascii-lcd: add missing sentinel entry in img_ascii_lcd_matches
Drivers: hv: vmbus: Don't leak memory when a channel is rescinded
Drivers: hv: vmbus: Don't leak channel ids
Drivers: hv: util: don't forget to init host_ts.lock
Drivers: hv: util: move waiting for release to hv_utils_transport itself
vmbus: remove hv_event_tasklet_disable/enable
vmbus: use rcu for per-cpu channel list
mei: don't wait for os version message reply
mei: fix deadlock on mei reset
intel_th: pci: Add Gemini Lake support
intel_th: pci: Add Denverton SOC support
intel_th: Don't leak module refcount on failure to activate
...
Linus Torvalds [Sun, 26 Mar 2017 18:05:42 +0000 (11:05 -0700)]
Merge tag 'driver-core-4.11-rc4' of git://git./linux/kernel/git/gregkh/driver-core
Pull driver core fix from Greg KH:
"Here is a single kernfs fix for 4.11-rc4 that resolves a reported
issue.
It has been in linux-next with no reported issues"
* tag 'driver-core-4.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
kernfs: Check KERNFS_HAS_RELEASE before calling kernfs_release_file()
Linus Torvalds [Sun, 26 Mar 2017 18:03:42 +0000 (11:03 -0700)]
Merge tag 'tty-4.11-rc4' of git://git./linux/kernel/git/gregkh/tty
Pull tty/serial driver fixes from Greg KH:
"Here are some tty and serial driver fixes for 4.11-rc4.
One of these fix a long-standing issue in the ldisc code that was
found by Dmitry Vyukov with his great fuzzing work. The other fixes
resolve other reported issues, and there is one revert of a patch in
4.11-rc1 that wasn't correct.
All of these have been in linux-next for a while with no reported
issues"
* tag 'tty-4.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty: fix data race in tty_ldisc_ref_wait()
tty: don't panic on OOM in tty_set_ldisc()
Revert "tty: serial: pl011: add ttyAMA for matching pl011 console"
tty: acpi/spcr: QDF2400 E44 checks for wrong OEM revision
serial: 8250_dw: Fix breakage when HAVE_CLK=n
serial: 8250_dw: Honor clk_round_rate errors in dw8250_set_termios
Linus Torvalds [Sun, 26 Mar 2017 18:02:00 +0000 (11:02 -0700)]
Merge tag 'staging-4.11-rc4' of git://git./linux/kernel/git/gregkh/staging
Pull IIO driver fixes from Greg KH:
"Here are some small IIO driver fixes for 4.11-rc4 that resolve a
number of tiny reported issues. All of these have been in linux-next
for a while with no reported issues"
* tag 'staging-4.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
iio: imu: st_lsm6dsx: fix FIFO_CTRL2 overwrite during watermark configuration
iio: adc: ti_am335x_adc: fix fifo overrun recovery
iio: sw-device: Fix config group initialization
iio: magnetometer: ak8974: remove incorrect __exit markups
iio: hid-sensor-trigger: Change get poll value function order to avoid sensor properties losing after resume from S3
Linus Torvalds [Sun, 26 Mar 2017 17:52:52 +0000 (10:52 -0700)]
Merge tag 'usb-4.11-rc4' of git://git./linux/kernel/git/gregkh/usb
Pull USB/PHY fixes from Greg KH:
"Here are a number of small USB and PHY driver fixes for 4.11-rc4.
Nothing major here, just an bunch of small fixes, and a handfull of
good fixes from Johan for devices with crazy descriptors. There are a
few new device ids in here as well.
All of these have been in linux-next with no reported issues"
* tag 'usb-4.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (26 commits)
usb: gadget: f_hid: fix: Don't access hidg->req without spinlock held
usb: gadget: udc: remove pointer dereference after free
usb: gadget: f_uvc: Sanity check wMaxPacketSize for SuperSpeed
usb: gadget: f_uvc: Fix SuperSpeed companion descriptor's wBytesPerInterval
usb: gadget: acm: fix endianness in notifications
usb: dwc3: gadget: delay unmap of bounced requests
USB: serial: qcserial: add Dell DW5811e
usb: hub: Fix crash after failure to read BOS descriptor
ACM gadget: fix endianness in notifications
USB: usbtmc: fix probe error path
USB: usbtmc: add missing endpoint sanity check
USB: serial: option: add Quectel UC15, UC20, EC21, and EC25 modems
usb: musb: fix possible spinlock deadlock
usb: musb: dsps: fix iounmap in error and exit paths
usb: musb: cppi41: don't check early-TX-interrupt for Isoch transfer
usb-core: Add LINEAR_FRAME_INTR_BINTERVAL USB quirk
uwb: i1480-dfu: fix NULL-deref at probe
uwb: hwa-rc: fix NULL-deref at probe
USB: wusbcore: fix NULL-deref at probe
USB: uss720: fix NULL-deref at probe
...
Linus Torvalds [Sun, 26 Mar 2017 17:34:10 +0000 (10:34 -0700)]
Merge tag 'powerpc-4.11-6' of git://git./linux/kernel/git/powerpc/linux
Pull more powerpc fixes from Michael Ellerman:
"These are all pretty minor. The fix for idle wakeup would be a bad bug
but has not been observed in practice.
The update to the gcc-plugins docs was Cc'ed to Kees and Jon, Kees
OK'ed it going via powerpc and I didn't hear from Jon.
- cxl: Route eeh events to all slices for pci_channel_io_perm_failure state
- powerpc/64s: Fix idle wakeup potential to clobber registers
- Revert "powerpc/64: Disable use of radix under a hypervisor"
- gcc-plugins: update architecture list in documentation
Thanks to: Andrew Donnellan, Nicholas Piggin, Paul Mackerras, Vaibhav
Jain"
* tag 'powerpc-4.11-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
gcc-plugins: update architecture list in documentation
Revert "powerpc/64: Disable use of radix under a hypervisor"
powerpc/64s: Fix idle wakeup potential to clobber registers
cxl: Route eeh events to all slices for pci_channel_io_perm_failure state
Linus Torvalds [Sun, 26 Mar 2017 17:29:21 +0000 (10:29 -0700)]
Merge tag 'ext4_for_linus_stable' of git://git./linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Fix a memory leak on an error path, and two races when modifying
inodes relating to the inline_data and metadata checksum features"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix two spelling nits
ext4: lock the xattr block before checksuming it
jbd2: don't leak memory if setting up journal fails
ext4: mark inode dirty after converting inline directory
Linus Torvalds [Sat, 25 Mar 2017 22:36:56 +0000 (15:36 -0700)]
Merge tag 'fscrypt-for-linus_stable' of git://git./linux/kernel/git/tytso/fscrypt
Pull fscrypto fixes from Ted Ts'o:
"A code cleanup and bugfix for fs/crypto"
* tag 'fscrypt-for-linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt:
fscrypt: eliminate ->prepare_context() operation
fscrypt: remove broken support for detecting keyring key revocation
Linus Torvalds [Sat, 25 Mar 2017 22:31:50 +0000 (15:31 -0700)]
Merge tag 'hwmon-for-linus-v4.11-rc4' of git://git./linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
- bug fixes in asus_atk0110, it87 and max31790 drivers
- added missing API definition to hwmon core
* tag 'hwmon-for-linus-v4.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (asus_atk0110) fix uninitialized data access
hwmon: Add missing HWMON_T_ALARM
hwmon: (it87) Avoid registering the same chip on both SIO addresses
hwmon: (max31790) Set correct PWM value
Linus Torvalds [Sat, 25 Mar 2017 22:25:58 +0000 (15:25 -0700)]
Merge tag 'for-linus' of git://git./linux/kernel/git/dledford/rdma
Pull rdma fixes from Doug Ledford:
"This has been a slow -rc cycle for the RDMA subsystem. We really
haven't had a lot of rc fixes come in. This pull request is the first
of this entire rc cycle and it has all of the suitable fixes so far
and it's still only about 20 patches. The fix for the minor breakage
cause by the dma mapping patchset is in here, as well as a couple
other potential oops fixes, but the rest is more minor.
Summary:
- fix for dma_ops change in this kernel, resolving the s390, powerpc,
and IOMMU operation
- a few other oops fixes
- the rest are all minor fixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
IB/qib: fix false-postive maybe-uninitialized warning
RDMA/iser: Fix possible mr leak on device removal event
IB/device: Convert ib-comp-wq to be CPU-bound
IB/cq: Don't process more than the given budget
IB/rxe: increment msn only when completing a request
uapi: fix rdma/mlx5-abi.h userspace compilation errors
IB/core: Restore I/O MMU, s390 and powerpc support
IB/rxe: Update documentation link
RDMA/ocrdma: fix a type issue in ocrdma_put_pd_num()
IB/rxe: double free on error
RDMA/vmw_pvrdma: Activate device on ethernet link up
RDMA/vmw_pvrdma: Dont hardcode QP header page
RDMA/vmw_pvrdma: Cleanup unused variables
infiniband: Fix alignment of mmap cookies to support VIPT caching
IB/core: Protect against self-requeue of a cq work item
i40iw: Receive netdev events post INET_NOTIFIER state
Linus Torvalds [Sat, 25 Mar 2017 22:13:55 +0000 (15:13 -0700)]
Merge branch 'stable-4.11' of git://git.infradead.org/users/pcmoore/audit
Pull audit fix from Paul Moore:
"We've got an audit fix, and unfortunately it is big.
While I'm not excited that we need to be sending you something this
large during the -rcX phase, it does fix some very real, and very
tangled, problems relating to locking, backlog queues, and the audit
daemon connection.
This code has passed our testsuite without problem and it has held up
to my ad-hoc stress tests (arguably better than the existing code),
please consider pulling this as fix for the next v4.11-rcX tag"
* 'stable-4.11' of git://git.infradead.org/users/pcmoore/audit:
audit: fix auditd/kernel connection state tracking
Theodore Ts'o [Sat, 25 Mar 2017 21:33:31 +0000 (17:33 -0400)]
ext4: fix two spelling nits
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Theodore Ts'o [Sat, 25 Mar 2017 21:22:47 +0000 (17:22 -0400)]
ext4: lock the xattr block before checksuming it
We must lock the xattr block before calculating or verifying the
checksum in order to avoid spurious checksum failures.
https://bugzilla.kernel.org/show_bug.cgi?id=193661
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Linus Torvalds [Sat, 25 Mar 2017 17:34:56 +0000 (10:34 -0700)]
Merge tag 'clk-fixes-for-linus' of git://git./linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"A handful of Sunxi and Rockchip clk driver fixes and a core framework
one where we need to copy a string because we can't guarantee it isn't
freed sometime later"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: sunxi-ng: fix recalc_rate formula of NKMP clocks
clk: sunxi-ng: Fix div/mult settings for osc12M on A64
clk: rockchip: Make uartpll a child of the gpll on rk3036
clk: rockchip: add "," to mux_pll_src_apll_dpll_gpll_usb480m_p on rk3036
clk: core: Copy connection id
dt-bindings: arm: update Armada CP110 system controller binding
clk: sunxi-ng: sun6i: Fix enable bit offset for hdmi-ddc module clock
clk: sunxi: ccu-sun5i needs nkmp
clk: sunxi-ng: mp: Adjust parent rate for pre-dividers
Arnd Bergmann [Tue, 14 Mar 2017 12:18:45 +0000 (13:18 +0100)]
IB/qib: fix false-postive maybe-uninitialized warning
aarch64-linux-gcc-7 complains about code it doesn't fully understand:
drivers/infiniband/hw/qib/qib_iba7322.c: In function 'qib_7322_txchk_change':
include/asm-generic/bitops/non-atomic.h:105:35: error: 'shadow' may be used uninitialized in this function [-Werror=maybe-uninitialized]
The code is right, and despite trying hard, I could not come up with a version
that I liked better than just adding a fake initialization here to shut up the
warning.
Fixes:
f931551bafe1 ("IB/qib: Add new qib driver for QLogic PCIe InfiniBand adapters")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Mon, 27 Feb 2017 18:16:33 +0000 (20:16 +0200)]
RDMA/iser: Fix possible mr leak on device removal event
When the rdma device is removed, we must cleanup all
the rdma resources within the DEVICE_REMOVAL event
handler to let the device teardown gracefully. When
this happens with live I/O, some memory regions are
occupied. Thus, track them too and dereg all the mr's.
We are safe with mr access by iscsi_iser_cleanup_task.
Reported-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Wed, 8 Mar 2017 20:03:17 +0000 (22:03 +0200)]
IB/device: Convert ib-comp-wq to be CPU-bound
This workqueue is used by our storage target mode ULPs
via the new CQ API. Recent observations when working
with very high-end flash storage devices reveal that
UNBOUND workqueue threads can migrate between cpu cores
and even numa nodes (although some numa locality is accounted
for).
While this attribute can be useful in some workloads,
it does not fit in very nicely with the normal
run-to-completion model we usually use in our target-mode
ULPs and the block-mq irq<->cpu affinity facilities.
The whole block-mq concept is that the completion will
land on the same cpu where the submission was performed.
The fact that our submitter thread is migrating cpus
can break this locality.
We assume that as a target mode ULP, we will serve multiple
initiators/clients and we can spread the load enough without
having to use unbound kworkers.
Also, while we're at it, expose this workqueue via sysfs which
is harmless and can be useful for debug.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>--
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 16 Mar 2017 16:57:00 +0000 (18:57 +0200)]
IB/cq: Don't process more than the given budget
The caller might not want this overhead.
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
David Marchand [Fri, 24 Feb 2017 14:38:26 +0000 (15:38 +0100)]
IB/rxe: increment msn only when completing a request
According to C9-147, MSN should only be incremented when the last packet of
a multi packet request has been received.
"Logically, the requester associates a sequential Send Sequence Number
(SSN) with each WQE posted to the send queue. The SSN bears a one-
to-one relationship to the MSN returned by the responder in each re-
sponse packet. Therefore, when the requester receives a response, it in-
terprets the MSN as representing the SSN of the most recent request
completed by the responder to determine which send WQE(s) can be
completed."
Fixes:
8700e3e7c485 ("Soft RoCE driver")
Signed-off-by: David Marchand <david.marchand@6wind.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dmitry V. Levin [Fri, 24 Feb 2017 00:28:13 +0000 (03:28 +0300)]
uapi: fix rdma/mlx5-abi.h userspace compilation errors
Consistently use types from linux/types.h to fix the following
rdma/mlx5-abi.h userspace compilation errors:
/usr/include/rdma/mlx5-abi.h:69:25: error: 'u64' undeclared here (not in a function)
MLX5_LIB_CAP_4K_UAR = (u64)1 << 0,
/usr/include/rdma/mlx5-abi.h:69:29: error: expected ',' or '}' before numeric constant
MLX5_LIB_CAP_4K_UAR = (u64)1 << 0,
Include <linux/if_ether.h> to fix the following rdma/mlx5-abi.h
userspace compilation error:
/usr/include/rdma/mlx5-abi.h:286:12: error: 'ETH_ALEN' undeclared here (not in a function)
__u8 dmac[ETH_ALEN];
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Bart Van Assche [Tue, 7 Mar 2017 22:56:53 +0000 (22:56 +0000)]
IB/core: Restore I/O MMU, s390 and powerpc support
Avoid that the following error message is reported on the console
while loading an RDMA driver with I/O MMU support enabled:
DMAR: Allocating domain for mlx5_0 failed
Ensure that DMA mapping operations that use to_pci_dev() to
access to struct pci_dev see the correct PCI device. E.g. the s390
and powerpc DMA mapping operations use to_pci_dev() even with I/O
MMU support disabled.
This patch preserves the following changes of the DMA mapping updates
patch series:
- Introduction of dma_virt_ops.
- Removal of ib_device.dma_ops.
- Removal of struct ib_dma_mapping_ops.
- Removal of an if-statement from each ib_dma_*() operation.
- IB HW drivers no longer set dma_device directly.
Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reported-by: Parav Pandit <parav@mellanox.com>
Fixes: commit
99db9494035f ("IB/core: Remove ib_device.dma_device")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: parav@mellanox.com
Tested-by: parav@mellanox.com
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Leon Romanovsky [Tue, 28 Feb 2017 19:42:53 +0000 (21:42 +0200)]
IB/rxe: Update documentation link
All Soft-RoCE (rxe) is handled now in rdma-core user space library,
so the documentation. The patch below updates the documentation
link to that new location.
Reported-by: Josh Beavers <josh.beavers@gmail.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dan Carpenter [Thu, 23 Feb 2017 10:40:16 +0000 (13:40 +0300)]
RDMA/ocrdma: fix a type issue in ocrdma_put_pd_num()
We want to return zero on success or negative error codes. The type
should be int and not u8.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dan Carpenter [Wed, 8 Mar 2017 05:21:52 +0000 (08:21 +0300)]
IB/rxe: double free on error
"goto err;" has it's own kfree_skb() call so it's a double free. We
only need to free on the "goto exit;" path.
Fixes:
8700e3e7c485 ("Soft RoCE driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Aditya Sarwade [Thu, 23 Feb 2017 01:22:58 +0000 (17:22 -0800)]
RDMA/vmw_pvrdma: Activate device on ethernet link up
Restore device state when ethernet link changes to active.
Acked-by: George Zhang <georgezhang@vmware.com>
Acked-by: Jorgen Hansen <jhansen@vmware.com>
Acked-by: Bryan Tan <bryantan@vmware.com>
Signed-off-by: Aditya Sarwade <asarwade@vmware.com>
Signed-off-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Adit Ranadive [Thu, 23 Feb 2017 01:22:57 +0000 (17:22 -0800)]
RDMA/vmw_pvrdma: Dont hardcode QP header page
Moved the header page count to a macro.
Reported-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Adit Ranadive <aditr@vmware.com>
Reviewed-by: Aditya Sarwade <asarwade@vmware.com>
Tested-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Adit Ranadive [Thu, 23 Feb 2017 01:22:56 +0000 (17:22 -0800)]
RDMA/vmw_pvrdma: Cleanup unused variables
Removed the unused nreq and redundant index variables.
Moved hardcoded async and cq ring pages number to macro.
Reported-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Adit Ranadive <aditr@vmware.com>
Reviewed-by: Aditya Sarwade <asarwade@vmware.com>
Tested-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Linus Torvalds [Fri, 24 Mar 2017 21:39:36 +0000 (14:39 -0700)]
Merge tag 'vfio-v4.11-rc4' of git://github.com/awilliam/linux-vfio
Pull VFIO fix from Alex Williamson:
"Rework sanity check for mdev driver group notifier de-registration
(Alex Williamson)"
* tag 'vfio-v4.11-rc4' of git://github.com/awilliam/linux-vfio:
vfio: Rework group release notifier warning
Linus Torvalds [Fri, 24 Mar 2017 21:37:12 +0000 (14:37 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A few fixes for the current series that should go into -rc4. This
contains:
- a fix for a potential corruption of un-started requests from Ming.
- a blk-stat fix from Omar, ensuring we flush the stat batch before
checking nr_samples.
- a set of fixes from Sagi for the nvmeof family"
* 'for-linus' of git://git.kernel.dk/linux-block:
blk-mq: don't complete un-started request in timeout handler
nvme-loop: handle cpu unplug when re-establishing the controller
nvme-rdma: handle cpu unplug when re-establishing the controller
nvmet-rdma: Fix a possible uninitialized variable dereference
nvmet: confirm sq percpu has scheduled and switched to atomic
nvme-loop: fix a possible use-after-free when destroying the admin queue
blk-stat: fix blk_stat_sum() if all samples are batched
Linus Torvalds [Fri, 24 Mar 2017 21:35:39 +0000 (14:35 -0700)]
Merge tag 'ceph-for-4.11-rc4' of git://github.com/ceph/ceph-client
Pull ceph fix from Ilya Dryomov:
"A fix for a writeback deadlock caused by a GFP_KERNEL allocation on
the reclaim path, tagged for stable"
* tag 'ceph-for-4.11-rc4' of git://github.com/ceph/ceph-client:
libceph: force GFP_NOIO for socket allocations
Linus Torvalds [Fri, 24 Mar 2017 21:32:21 +0000 (14:32 -0700)]
Merge tag 'armsoc-fixes' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Arnd Bergmann:
- a couple of OMAP 4.11 regression fixes, including a boot regression
for SmartReflex, hypervisor mode in thumb2 mode, and reference
counting of device nodes
- a fix for cpu_idle on at91
- minor DT fixes on across several platforms: sunxi, bcm53xx, at91,
nsp, ns2, ux500, omap
- a fix to correct an API change in the reset controllers
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (22 commits)
arm64: dts: NS2: Add dma-coherent to relevant DT entries
reset: fix optional reset_control_get stubs to return NULL
ARM: sun8i: a23/a33: drop bl_en_pin GPIO pinmux in reference design DTSI
ARM: dts: sun7i: lamobo-r1: Fix CPU port RGMII settings
ARM: dts: NSP: GPIO reboot open-source
ARM: at91: pm: cpu_idle: switch DDR to power-down mode
ARM: dts: add the AB8500 clocks to the device tree
ARM: dts: imx6sx-udoo-neo: Fix reboot hang
ARM: sun8i: Fix the mali clock rate
ARM: dts: BCM5301X: Correct GIC_PPI interrupt flags
ARM: dts: BCM5301X: Fix memory start address
ARM: dts: BCM5301X: Fix UARTs on bcm953012k
Revert "ARM: at91/dt: sama5d2: Use new compatible for ohci node"
ARM: OMAP2+: Release device node after it is no longer needed.
ARM: OMAP2+: Fix device node reference counts
ARM: OMAP2+: Remove legacy gpmc-nand.c
ARM: OMAP2+: gpmc-onenand: propagate error on initialization failure
ARM: dts: am335x-pcm953: Fix legacy wakeup source binding
ARM: omap2plus_defconfig: Enable INPUT_MOUSEDEV as loadable modules
ARM: dts: am57xx-idk: tpic2810 is on I2C bus, not SPI
...
Linus Torvalds [Fri, 24 Mar 2017 21:29:23 +0000 (14:29 -0700)]
Merge tag 'for-linus-4.11b-rc4-tag' of git://git./linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"Fixes for PM under Xen"
* tag 'for-linus-4.11b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/acpi: upload PM state from init-domain to Xen
xen/acpi: Replace hard coded "ACPI0007"
Linus Torvalds [Fri, 24 Mar 2017 21:21:09 +0000 (14:21 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"There's a kaslr fix and then two patches to update our native and
compat syscall tables. Arnd asked that we take the addition of statx
to the asm-generic unistd.h via arm64, as he didn't have anything
queued in the asm-generic tree.
Summary:
- Fix mapping of kernel image under certain kaslr offsets
- Hook up new statx syscall in asm-generic syscall table
- Update compat syscall table to match arch/arm/ (pkeys and statx)"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: kaslr: Fix up the kernel image alignment
arm64: compat: Update compat syscalls
generic syscalls: Wire up statx syscall
Linus Torvalds [Fri, 24 Mar 2017 21:11:36 +0000 (14:11 -0700)]
Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
"This fixes regressions in the crypto ccp driver and the hwrng drivers
for amd and geode"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
hwrng: geode - Revert managed API changes
hwrng: amd - Revert managed API changes
crypto: ccp - Assign DMA commands to the channel's CCP
Jason Gunthorpe [Fri, 10 Mar 2017 18:34:20 +0000 (11:34 -0700)]
infiniband: Fix alignment of mmap cookies to support VIPT caching
When vmalloc_user is used to create memory that is supposed to be mmap'd
to user space, it is necessary for the mmap cookie (eg the offset) to be
aligned to SHMLBA.
This creates a situation where all virtual mappings of the same physical
page share the same virtual cache index and guarantees VIPT coherence.
Otherwise the cache is non-coherent and the kernel will not see writes
by userspace when reading the shared page (or vice-versa).
Reported-by: Josh Beavers <josh.beavers@gmail.com>
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Linus Torvalds [Fri, 24 Mar 2017 20:42:17 +0000 (13:42 -0700)]
Merge tag 'iommu-fixes-v4.11-rc3' of git://git./linux/kernel/git/joro/iommu
Pull IOMMU fixes from Joerg Roedel:
"A few fixes piled up:
- fix a NULL-ptr dereference that happens in VT-d on some platforms
- a fix for ARM MSI region reporting, so that a sane interface makes
it to a released kernel
- fixes for leaf-checking in ARM io-page-table code
- two fixes for IO/TLB flushing code on ARM Exynos platforms"
* tag 'iommu-fixes-v4.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu: Disambiguate MSI region types
iommu/exynos: Workaround FLPD cache flush issues for SYSMMU v5
iommu/exynos: Block SYSMMU while invalidating FLPD cache
iommu/vt-d: Fix NULL pointer dereference in device_to_iommu
iommu/io-pgtable-arm-v7s: Check for leaf entry before dereferencing it
iommu/io-pgtable-arm: Check for leaf entry before dereferencing it
Sagi Grimberg [Wed, 8 Mar 2017 20:00:52 +0000 (22:00 +0200)]
IB/core: Protect against self-requeue of a cq work item
We need to make sure that the cq work item does not
run when we are destroying the cq. Unlike flush_work,
cancel_work_sync protects against self-requeue of the
work item (which we can do in ib_cq_poll_work).
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>--
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Linus Torvalds [Fri, 24 Mar 2017 20:37:40 +0000 (13:37 -0700)]
Merge tag 'mmc-v4.11-rc2' of git://git./linux/kernel/git/ulfh/mmc
Pull mmc fixes from Ulf Hansson:
"Here are a couple of mmc fixes intended for v4.11 rc4.
MMC core:
- Fix initialization of HS400-ES eMMC cards
- A couple of fixes for the mmc block device driver
- Resolved a compiler warning
MMC host:
- sdhci: Do not disable IRQs while waiting for clock
- sdhci-pci: Do not disable IRQs in sdhci_intel_set_power
- sdhci-of-arasan: Fix incorrect timeout clock
- mediatek: Fix bug for setting wrong clock frequency
- sdhci-of-at91: Use regulator to fix cmd timeout errors
- ushc: Fix NULL-deref at probe
- rockchip-dw-mshc: Rename RK1108 to RV1108 in DT"
* tag 'mmc-v4.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci-pci: Do not disable interrupts in sdhci_intel_set_power
mmc: sdhci: Do not disable interrupts while waiting for clock
mmc: ushc: fix NULL-deref at probe
mmc: sdhci-of-at91: Support external regulators
mmc: core: mmc_blk_rw_cmd_err - remove unused variable
mmc: mediatek: Fixed bug where clock frequency could be set wrong
mmc: block: Fix cmd error reset failure path
mmc: block: Fix is_waiting_last_req set incorrectly
mmc: core: Fix access to HS400-ES devices
mmc: sdhci-of-arasan: fix incorrect timeout clock
dt-bindings: rockchip-dw-mshc: rename RK1108 to RV1108
Linus Torvalds [Fri, 24 Mar 2017 20:34:16 +0000 (13:34 -0700)]
Merge tag 'media/v4.11-3' of git://git./linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
- dvb-usb-firmware: don't do DMA on stack
- coda/imx-vdoa: platform_driver should not be const
- bdisp: Clean up file handle in open() error path
- exynos-gsc: Do not swap cb/cr for semi planar formats
* tag 'media/v4.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
[media] exynos-gsc: Do not swap cb/cr for semi planar formats
[media] bdisp: Clean up file handle in open() error path
[media] coda/imx-vdoa: platform_driver should not be const
[media] dvb-usb-firmware: don't do DMA on stack
Shiraz Saleem [Fri, 17 Mar 2017 23:30:07 +0000 (18:30 -0500)]
i40iw: Receive netdev events post INET_NOTIFIER state
Netdev notification events are de-registered only when all
client iwdev instances are removed. If a single client is closed
and re-opened, netdev events could arrive even before the Control
Queue-Pair (CQP) is created, causing a NULL pointer dereference crash
in i40iw_get_cqp_request. Fix this by allowing netdev event
notification only after we have reached the INET_NOTIFIER state with
respect to device initialization.
Reported-by: Stefan Assmann <sassmann@redhat.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Linus Torvalds [Fri, 24 Mar 2017 20:15:52 +0000 (13:15 -0700)]
Merge tag 'drm-fixes-for-v4.11-rc4' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
- one core drm/fbdev regression fix
- a set of i915 fixes including a few GVT related fixes, along with
some reset fixes
- one new PCI id for amdgpu, and some minor workaround regression
fixes
- .. and a set of exynos fixes, dropping support for an old unsupported
SoC, some vblank timing fixes, and an info leak fix
* tag 'drm-fixes-for-v4.11-rc4' of git://people.freedesktop.org/~airlied/linux: (34 commits)
drm/fb-helper: Allow var->x/yres(_virtual) < fb->width/height again
drm/i915: make context status notifier head be per engine
drm/i915: Avoid rcu_barrier() from reclaim paths (shrinker)
drm/exynos/dsi: make te-gpios optional
drm/exynos: Print kernel pointers in a restricted form
drm/exynos/decon5433: fix software trigger mask
drm/exynos/fimd: signal frame done interrupt at front porch
drm/exynos/decon5433: signal frame done interrupt at front porch
drm/exynos/decon5433: fix vblank event handling
drm/exynos: move crtc event handling to drivers callbacks
drm/exynos: Remove support for Exynos4415 (SoC not supported anymore)
drm/exynos/decon5433: & vs | typo
drm/amd/amdgpu: add POLARIS12 PCI ID
drm/i915/gvt: Fix gvt scheduler interval time
drm/i915/gvt: GVT pin/unpin shadow context
drm/i915/gvt: scan shadow indirect context image when valid
drm/i915/kvmgt: fix suspicious rcu dereference usage
drm/i915/gvt: add enable_execlists check before enable gvt
drm/i915/gvt: Remove bogus retry around i915_wait_request
drm/i915/gvt: correct the ggtt valid bit check in pipe control command
...
Arnd Bergmann [Fri, 24 Mar 2017 16:51:50 +0000 (17:51 +0100)]
Merge tag 'arm-soc/for-4.11/devicetree-arm64-fixes' of github.com/Broadcom/stblinux into fixes
Pull "Broadcom arm64 Device Tree fixes for 4.11" from Florian Fainelli:
This pull request contains Broadcom ARM64-based SoCs Device Tree fixes for 4.11,
please pull the following:
- Jon adds missing "dma-coherent" property to the Northstar 2 DTS include file
in order to fix both performance and cache problems for: PCIe, Ethernet,
PDC/mailbox, SATA3 and SDHCI
* tag 'arm-soc/for-4.11/devicetree-arm64-fixes' of http://github.com/Broadcom/stblinux:
arm64: dts: NS2: Add dma-coherent to relevant DT entries
Arnd Bergmann [Fri, 24 Mar 2017 16:49:40 +0000 (17:49 +0100)]
Merge tag 'arm-soc/for-4.11/devicetree-fixes-2' of github.com/Broadcom/stblinux into fixes
Pull "Broadcom arm Device Tree fixes for 4.11 (part 2)" from Florian Fainelli:
This pull request contains Broadcom ARM-based SoCs Device Tree fixes for 4.11,
please pull the following:
- Jon fixes a reboot issue on most Northstar Plus platforms by adding the
"open-source" property to the "gpio-restart" Device Tree nodes
* tag 'arm-soc/for-4.11/devicetree-fixes-2' of http://github.com/Broadcom/stblinux:
ARM: dts: NSP: GPIO reboot open-source
Linus Torvalds [Fri, 24 Mar 2017 03:00:39 +0000 (20:00 -0700)]
Merge tag 'pm-4.11-rc4' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"One of these is an intel_pstate regression fix and it is not a small
change, but it mostly removes code that shouldn't be there. That code
was acquired by mistake and has been a source of constant pain since
then, so the time has come to get rid of it finally. We have not seen
problems with this change in the lab, so fingers crossed.
The rest is more usual: one more intel_pstate commit removing useless
code, a cpufreq core fix to make it restore policy limits on CPU
online (which prevents the limits from being reset over system
suspend/resume), a schedutil cpufreq governor initialization fix to
make it actually work as advertised on all systems and an extra sanity
check in the cpuidle core to prevent crashes from happening if the
arch code messes things up.
Specifics:
- Make intel_pstate use one set of global P-state limits in the
active mode regardless of the scaling_governor settings for
individual CPUs instead of switching back and forth between two of
them in a way that is hard to control (Rafael Wysocki).
- Drop a useless function from intel_pstate to prevent it from
modifying the maximum supported frequency value unexpectedly which
may confuse the cpufreq core (Rafael Wysocki).
- Fix the cpufreq core to restore policy limits on CPU online so that
the limits are not reset over system suspend/resume, among other
things (Viresh Kumar).
- Fix the initialization of the schedutil cpufreq governor to make
the IO-wait boosting mechanism in it actually work on systems with
one CPU per cpufreq policy (Rafael Wysocki).
- Add a sanity check to the cpuidle core to prevent crashes from
happening if the architecture code initialization fails to set up
things as expected (Vaidyanathan Srinivasan)"
* tag 'pm-4.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: Restore policy min/max limits on CPU online
cpuidle: Validate cpu_dev in cpuidle_add_sysfs()
cpufreq: intel_pstate: Fix policy data management in passive mode
cpufreq: schedutil: Fix per-CPU structure initialization in sugov_start()
cpufreq: intel_pstate: One set of global limits in active mode
Linus Torvalds [Fri, 24 Mar 2017 02:51:06 +0000 (19:51 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
"Fixes to various USB drivers to validate existence of endpoints before
trying to use them, fixes to APLS v8 protocol, and a couple of i8042
quirks"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: ALPS - fix trackstick button handling on V8 devices
Input: ALPS - fix V8+ protocol handling (73 03 28)
Input: sur40 - validate number of endpoints before using them
Input: kbtab - validate number of endpoints before using them
Input: hanwang - validate number of endpoints before using them
Input: yealink - validate number of endpoints before using them
Input: ims-pcu - validate number of endpoints before using them
Input: cm109 - validate number of endpoints before using them
Input: iforce - validate number of endpoints before using them
Input: elan_i2c - add ASUS EeeBook X205TA special touchpad fw
Input: i8042 - add TUXEDO BU1406 (N24_25BU) to the nomux list
Input: synaptics-rmi4 - prevent null pointer dereference in f30
Input: i8042 - add noloop quirk for Dell Embedded Box PC 3000
Dave Airlie [Fri, 24 Mar 2017 01:05:06 +0000 (11:05 +1000)]
Merge branch 'drm-fixes-4.11' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
A few small fixes for 4.11
* 'drm-fixes-4.11' of git://people.freedesktop.org/~agd5f/linux:
drm/amd/amdgpu: add POLARIS12 PCI ID
drm/amdgpu: fix the clearing wb size
drm/amdgpu: reinstate oland workaround for sclk
drm/radeon: reinstate oland workaround for sclk