Jaegeuk Kim [Wed, 7 Oct 2015 16:46:37 +0000 (09:46 -0700)]
f2fs: add F2FS_GOING_DOWN_METAFLUSH to test power-failure
This patch introduces F2FS_GOING_DOWN_METAFLUSH which flushes meta pages like
SSA blocks and then blocks all the writes.
This can be used by power-failure tests.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Thu, 1 Oct 2015 23:42:55 +0000 (16:42 -0700)]
f2fs: merge meta writes as many possible
This patch tries to merge IOs as many as possible when background flusher
conducts flushing the dirty meta pages.
[Before]
...
2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 124320, size = 4096
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 124560, size = 32768
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 95720, size = 987136
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123928, size = 4096
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123944, size = 8192
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123968, size = 45056
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 124064, size = 4096
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 97648, size =
1007616
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123776, size = 8192
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123800, size = 32768
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 124624, size = 4096
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 99616, size = 921600
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123608, size = 4096
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123624, size = 77824
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123792, size = 4096
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123864, size = 32768
...
[After]
...
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 92168, size = 892928
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 93912, size = 753664
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 95384, size = 716800
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 96784, size = 712704
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 104160, size = 364544
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 104872, size = 356352
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 105568, size = 278528
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 106112, size = 319488
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 106736, size = 258048
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 107240, size = 270336
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 107768, size = 180224
...
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Mon, 5 Oct 2015 21:49:57 +0000 (14:49 -0700)]
f2fs: introduce a periodic checkpoint flow
This patch introduces a periodic checkpoint feature.
Note that, this is not enforcing to conduct checkpoints very strictly in terms
of trigger timing, instead just hope to help user experiences.
The default value is 60 seconds.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Mon, 5 Oct 2015 18:32:34 +0000 (11:32 -0700)]
f2fs: add a tracepoint for background gc
This patch introduces a tracepoint to monitor background gc behaviors.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Mon, 5 Oct 2015 18:02:54 +0000 (11:02 -0700)]
f2fs: introduce background_gc=sync mount option
This patch introduce background_gc=sync enabling synchronous cleaning in
background.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 5 Oct 2015 14:24:19 +0000 (22:24 +0800)]
f2fs: introduce a new ioctl F2FS_IOC_WRITE_CHECKPOINT
This patch introduce a new ioctl for those users who want to trigger
checkpoint from userspace through ioctl.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 5 Oct 2015 14:22:44 +0000 (22:22 +0800)]
f2fs: support synchronous gc in ioctl
This patch drops in batches gc triggered through ioctl, since user
can easily control the gc by designing the loop around the ->ioctl.
We support synchronous gc by forcing using FG_GC in f2fs_gc, so with
it, user can make sure that in this round all blocks gced were
persistent in the device until ioctl returned.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 5 Oct 2015 14:20:40 +0000 (22:20 +0800)]
f2fs: skip searching dirty map if dirty segment is not exist
When searching victim during gc, if there are no dirty segments in
filesystem, we will still take the time to search the whole dirty segment
map, it's not needed, it's better to skip in this condition.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 5 Oct 2015 14:19:24 +0000 (22:19 +0800)]
f2fs: fix to avoid redundant searching in dirty map during gc
When doing gc, we search a victim in dirty map, starting from position of
last victim, we will reset the current searching position until we touch
the end of dirty map, and then search the whole diryt map. So sometimes we
will search the range [victim, last] twice, it's redundant, this patch
avoids this issue.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 30 Sep 2015 09:38:48 +0000 (17:38 +0800)]
f2fs: use atomic64_t for extent cache hit stat
Our hit stat of extent cache will increase all the time until remount,
and we use atomic_t type for the stat variable, so it may easily incur
overflow when we query extent cache frequently in a long time running
fs.
So to avoid that, this patch uses atomic64_t for hit stat variables.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Tue, 22 Sep 2015 20:50:47 +0000 (13:50 -0700)]
f2fs: use vmalloc to handle -ENOMEM error
This patch introduces f2fs_kvmalloc to avoid -ENOMEM during mount.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Sat, 19 Sep 2015 00:33:00 +0000 (17:33 -0700)]
f2fs: should get a victim from retrials
If we do not call get_victim first, we cannot get a new victim for retrial
path.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 28 Sep 2015 09:42:24 +0000 (17:42 +0800)]
f2fs: fix to correct freed section number during gc
This patch fixes to maintain the right section count freed in garbage
collecting when triggering a foreground gc.
Besides, when a foreground gc is running on current selected section, once
we fail to gc one segment, it's better to abandon gcing the left segments
in current section, because anyway we will select next victim for
foreground gc, so gc on the left segments in previous section will become
overhead and also cause the long latency for caller.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 28 Sep 2015 09:46:01 +0000 (17:46 +0800)]
f2fs: fix to update {m,c}time correctly when truncating larger
This patch fixes to update ctime and atime correctly when truncating
larger in ->setattr.
The bug is reported by xfstest generic/313 as below:
generic/313 2s ... - output mismatch (see ./results/generic/313.out.bad)
--- tests/generic/313.out 2015-08-04 15:28:53.
430798882 +0800
+++ results/generic/313.out.bad 2015-09-28 17:04:27.
294278016 +0800
@@ -1,2 +1,4 @@
QA output created by 313
Silence is golden
+ctime not updated after truncate up
+mtime not updated after truncate up
...
(Run 'diff -u tests/generic/313.out tests/generic/313.out.bad' to see the entire diff)
Ran: generic/313
Failures: generic/313
Failed 1 of 1 tests
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Sat, 26 Sep 2015 02:34:50 +0000 (19:34 -0700)]
f2fs: do not skip dentry block writes
Previously, we skip dentry block writes when wbc is SYNC_NONE with no memory
pressure and the number of dirty pages is pretty small.
But, we didn't skip for normal data writes, which gives us not much big impact
on overall performance.
Moreover, by skipping some data writes, kworker falls into infinite loop to try
to write blocks, when many dir inodes have only one dentry block.
So, this patch removes skipping data writes.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Fri, 25 Sep 2015 09:54:56 +0000 (17:54 +0800)]
f2fs: remove unneeded f2fs_{,un}lock_op in do_recover_data()
Protecting recovery flow by using cp_rwsem is not needed, since we have
prevent triggering any checkpoint by locking cp_mutex previously.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 23 Sep 2015 01:25:43 +0000 (09:25 +0800)]
f2fs: fix incorrect bimodal calculation
In update_sit_info, we use div_u64 to handle 'u64 divide u64' case, but
div_u64 can only handle 32-bits divisor, so our divisor with u64 type
passed to div_u64 will overflow, result in the wrong calculation when
show debug info of f2fs as below:
BDF: 464, avg. vblocks: 23509
(BDF should never exceed 100)
So change to use div64_u64 to handle this case correctly.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Tue, 22 Sep 2015 13:07:47 +0000 (21:07 +0800)]
f2fs: introduce __try_update_largest_extent
This patch adds a new helper __try_update_largest_extent for cleanup.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Nicholas Krause [Mon, 21 Sep 2015 22:55:49 +0000 (18:55 -0400)]
f2fs: fix error handling for calls to various functions in the function recover_inline_data
This fixes error handling for calls to various functions in the
function recover_inline_data to check if these particular functions
either return a error code or the boolean value false to signal their
caller they have failed internally and if this arises return false
to signal failure immediately to the caller of recover_inline_data
as we cannot continue after failures to calling either the function
truncate_inline_inode or truncate_blocks.
Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Fri, 18 Sep 2015 08:55:26 +0000 (16:55 +0800)]
f2fs: disallow switch extent_cache option dynamically
Swith extent_cache option dynamically when remount may casue consistency
issue between extent cache and dnode page. Fix in this patch to avoid
that condition.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Fri, 18 Sep 2015 08:54:16 +0000 (16:54 +0800)]
f2fs: use correct flag in f2fs_map_blocks()
We introduce F2FS_GET_BLOCK_READ in commit
e2b4e2bc8865 ("f2fs: fix
incorrect mapping for bmap"), but forget to use this flag in the right
place, fix it.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 21 Sep 2015 12:17:52 +0000 (20:17 +0800)]
f2fs: fix to handle io error in ->direct_IO
Here is a oops reported as following message when testing generic/019 of
xfstest:
------------[ cut here ]------------
kernel BUG at /home/yuchao/git/f2fs-dev/segment.c:882!
invalid opcode: 0000 [#1] SMP
Modules linked in: zram lz4_compress lz4_decompress f2fs(O) ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4
nf_def
CPU: 2 PID: 25441 Comm: fio Tainted: G O 4.3.0-rc1+ #6
Hardware name: Hewlett-Packard HP Z220 CMT Workstation/1790, BIOS K51 v01.61 05/16/2013
task:
ffff8803f4e85580 ti:
ffff8803fd61c000 task.ti:
ffff8803fd61c000
RIP: 0010:[<
ffffffffa0784981>] [<
ffffffffa0784981>] new_curseg+0x321/0x330 [f2fs]
RSP: 0018:
ffff8803fd61f918 EFLAGS:
00010246
RAX:
00000000000007ed RBX:
0000000000000224 RCX:
000000000000001f
RDX:
0000000000000800 RSI:
ffffffffffffffff RDI:
ffff8803f56f4300
RBP:
ffff8803fd61f978 R08:
0000000000000000 R09:
0000000000000000
R10:
0000000000000024 R11:
ffff8800d23bbd78 R12:
ffff8800d0ef0000
R13:
0000000000000224 R14:
0000000000000000 R15:
0000000000000001
FS:
00007f827ff85700(0000) GS:
ffff88041ea80000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
ffffffffff600000 CR3:
00000003fef17000 CR4:
00000000001406e0
Stack:
000007ea00000002 0000000100000001 ffff8803f6456248 000007ed0000002b
0000000000000224 ffff880404d1aa20 ffff8803fd61f9c8 ffff8800d0ef0000
ffff8803f6456248 0000000000000001 00000000ffffffff ffffffffa078f358
Call Trace:
[<
ffffffffa0785b87>] allocate_segment_by_default+0x1a7/0x1f0 [f2fs]
[<
ffffffffa078322c>] allocate_data_block+0x17c/0x360 [f2fs]
[<
ffffffffa0779521>] __allocate_data_block+0x131/0x1d0 [f2fs]
[<
ffffffffa077a995>] f2fs_direct_IO+0x4b5/0x580 [f2fs]
[<
ffffffff811510ae>] generic_file_direct_write+0xae/0x160
[<
ffffffff811518f5>] __generic_file_write_iter+0xd5/0x1f0
[<
ffffffff81151e07>] generic_file_write_iter+0xf7/0x200
[<
ffffffff81319e38>] ? apparmor_file_permission+0x18/0x20
[<
ffffffffa0768480>] ? f2fs_fallocate+0x1190/0x1190 [f2fs]
[<
ffffffffa07684c6>] f2fs_file_write_iter+0x46/0x90 [f2fs]
[<
ffffffff8120b4fe>] aio_run_iocb+0x1ee/0x290
[<
ffffffff81700f7e>] ? mutex_lock+0x1e/0x50
[<
ffffffff8120a1d7>] ? aio_read_events+0x207/0x2b0
[<
ffffffff8120b913>] do_io_submit+0x373/0x630
[<
ffffffff8120a4f6>] ? SyS_io_getevents+0x56/0xb0
[<
ffffffff8120bbe0>] SyS_io_submit+0x10/0x20
[<
ffffffff81703857>] entry_SYSCALL_64_fastpath+0x12/0x6a
Code: 45 c8 48 8b 78 10 e8 9f 23 bf e0 41 8b 8c 24 cc 03 00 00 89 c7 31 d2 89 c6 89 d8 29 df f7 f1 29 d1 39 cf 0f 83 be fd ff ff eb
RIP [<
ffffffffa0784981>] new_curseg+0x321/0x330 [f2fs]
RSP <
ffff8803fd61f918>
---[ end trace
2e577d7f711ddb86 ]---
The reason is that: in the test of generic/019, we will trigger a manmade
IO error in block layer through debugfs, after that, prefree segment will
no longer be freed, because we always skip doing gc or checkpoint when
there occurs an IO error.
Meanwhile fio with aio engine generated a large number of direct IOs,
which continue allocating spaces in free segment until we run out of them,
eventually, results in panic in new_curseg as no more free segment was
found.
So, this patch changes to return EIO in direct_IO for this condition.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Thu, 17 Sep 2015 12:22:44 +0000 (20:22 +0800)]
f2fs: do in batches truncation in truncate_hole
truncate_data_blocks_range can do in batches truncation which makes all
changes in dnode page content, dnode page status, extent cache, block
count updating together.
But previously, truncate_hole() always truncates one block in dnode page
at a time by invoking truncate_data_blocks_range(,1), which make thing
slow.
This patch changes truncate_hole() to do in batches truncation for all
target blocks in one direct node inside truncate_data_blocks_range, which
can make our punch hole operation in ->fallocate more efficent.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Fan Li [Thu, 17 Sep 2015 10:42:06 +0000 (18:42 +0800)]
f2fs: optimize code of f2fs_update_extent_tree_range
Fix 2 potential problems:
1. when largest extent needs to be invalidated, it will be reset in
__drop_largest_extent, which makes __is_extent_same after always
return false, and largest extent unchanged. Now we update it properly.
2. when extent is split and the latter part remains in tree, next_en
should be the latter part instead of next extent of original extent.
It will cause merge failure if there is in-place update, although
there is not, I think this fix will still makes codes less ambiguous.
This patch also simplifies codes of invalidating extents, and optimizes the
procedues that split extent into two.
There are a few modifications after last patch:
1. prev_en now is updated properly.
2. more codes and branches are simplified.
Signed-off-by: Fan li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Fan Li [Thu, 17 Sep 2015 10:24:17 +0000 (18:24 +0800)]
f2fs: drop largest extent by range
now we update extent by range, fofs may not be on the largest
extent if the new extent overlaps with it. so add a new function
to drop largest extent properly.
Signed-off-by: Fan li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Wed, 16 Sep 2015 21:06:54 +0000 (14:06 -0700)]
f2fs: check end_io for metapages before making next checkpoint blocks
This patch avoids to produce new checkpoint blocks before the previous meta
pages were written completely.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Thu, 3 Sep 2015 20:38:23 +0000 (13:38 -0700)]
f2fs crypto: allocate buffer for decrypting filename
We got dentry pages from high_mem, and its address space directly goes into the
decryption path via f2fs_fname_disk_to_usr.
But, sg_init_one assumes the address is not from high_mem, so we can get this
panic since it doesn't call kmap_high but kunmap_high is triggered at the end.
kernel BUG at ../../../../../../kernel/mm/highmem.c:290!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
...
(kunmap_high+0xb0/0xb8) from [<
c0114534>] (__kunmap_atomic+0xa0/0xa4)
(__kunmap_atomic+0xa0/0xa4) from [<
c035f028>] (blkcipher_walk_done+0x128/0x1ec)
(blkcipher_walk_done+0x128/0x1ec) from [<
c0366c24>] (crypto_cbc_decrypt+0xc0/0x170)
(crypto_cbc_decrypt+0xc0/0x170) from [<
c0367148>] (crypto_cts_decrypt+0xc0/0x114)
(crypto_cts_decrypt+0xc0/0x114) from [<
c035ea98>] (async_decrypt+0x40/0x48)
(async_decrypt+0x40/0x48) from [<
c032ca34>] (f2fs_fname_disk_to_usr+0x124/0x304)
(f2fs_fname_disk_to_usr+0x124/0x304) from [<
c03056fc>] (f2fs_fill_dentries+0xac/0x188)
(f2fs_fill_dentries+0xac/0x188) from [<
c03059c8>] (f2fs_readdir+0x1f0/0x300)
(f2fs_readdir+0x1f0/0x300) from [<
c0218054>] (vfs_readdir+0x90/0xb4)
(vfs_readdir+0x90/0xb4) from [<
c0218418>] (SyS_getdents64+0x64/0xcc)
(SyS_getdents64+0x64/0xcc) from [<
c0105ba0>] (ret_fast_syscall+0x0/0x30)
Cc: <stable@vger.kernel.org>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Fri, 18 Sep 2015 08:51:51 +0000 (16:51 +0800)]
f2fs: reorganize f2fs_map_blocks
In this patch, we try to reorganize f2fs_map_blocks to make block mapping
flow more clear by using following structure:
/* check status of mapping */
if (unmapped) {
/* blkaddr == NULL_ADDR || blkaddr == NEW_ADDR */
if (create) {
/* write path, handle dio write case here */
alloc_and_map;
} else {
/*
* handle read cases from all call paths:
* 1. generic read;
* 2. dio read;
* 3. fiemap;
* 4. bmap
*/
}
}
/* map buffer_header */
Besides, this patch handles the missing case correctly for dio write:
When we fail in __allocate_data_blocks, then in f2fs_map_blocks, we will
not allocate blocks correctly for preallocated blocks, but returning with
an unmapped buffer head, which will result in failure of dio write.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Tue, 15 Sep 2015 21:46:43 +0000 (14:46 -0700)]
f2fs: declare f2fs_update_extent_tree_range as static
This function should be static.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Fri, 11 Sep 2015 06:43:52 +0000 (14:43 +0800)]
f2fs: fix overflow of size calculation
We have potential overflow issue when calculating size of object, when
we left shift index with PAGE_CACHE_SHIFT bits, if type of index has only
32-bits space in 32-bit architecture, left shifting will incur overflow,
i.e:
pgoff_t index = 0xFFFFFFFF;
loff_t size = index << PAGE_CACHE_SHIFT;
size: 0xFFFFF000
So we should cast index with 64-bits type to avoid this issue.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Fri, 11 Sep 2015 06:43:02 +0000 (14:43 +0800)]
f2fs: fix incorrect searching position when shrinking extent cache
When shrinking extent cache, we have two steps in the flow:
1) shrink objects which are unreferenced by inodes;
2) shrink objects from LRU list of extent cache.
In step 1, if we haven't shrunk enough number of objects, we will try
step 2, but before that we didn't update the searching position which
may point to last inode index in global extent tree, result in failing
to shrink objects by traversing the all inodes' extent tree.
In this patch, we reset searching position to beginning of global extent
tree for fixing.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Fri, 11 Sep 2015 06:39:02 +0000 (14:39 +0800)]
f2fs: verify file type early in f2fs_fallocate
This patch changes to verify file type early in f2fs_fallocate for
cleanup, meanwhile this also fixes to add missing verification for
expand_inode_data.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Sat, 12 Sep 2015 18:25:30 +0000 (11:25 -0700)]
f2fs: no need to lock for update_inode_page all the time
As comment says, we don't need to call f2fs_lock_op in write_inode to prevent
from producing dirty node pages all the time.
That happens only when there is not enough free sections and we can avoid that
by calling balance_fs in prior to that.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Sat, 12 Sep 2015 18:13:04 +0000 (11:13 -0700)]
f2fs: cover number of dirty node pages under node_write lock
This number is referenced by checkpoint under node_write lock.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Nicholas Krause [Sun, 6 Sep 2015 12:28:46 +0000 (08:28 -0400)]
f2fs: fix incorrect return statement in the function f2fs_ioc_release_volatile_write
This fixes the incorrect return statement at the end of the function
f2fs_ioc_release_volatile_write's body for returning zero as this is
incorrect due to the function call before this return statement to
the function punch_hole being able to fail and we should return this
function's return fail directly in order to signal to callers of the
function f2fs_ioc_release_volatile if a failure arises with this call
to punch_hole fails.
Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Sun, 6 Sep 2015 09:50:13 +0000 (17:50 +0800)]
f2fs: trace in batches extent info update
Rename trace_f2fs_update_extent_tree to trace_f2fs_update_extent_tree_range,
then expand and enable it to trace in batches extent info updates.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Linus Torvalds [Wed, 7 Oct 2015 17:17:46 +0000 (18:17 +0100)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"This addresses a couple of issues found with RT, a broken initrd
message in the console log and a simple performance fix for some MMC
workloads.
Summary:
- A couple of locking fixes for RT kernels
- Avoid printing bogus initrd warnings when initrd isn't present
- Performance fix for random mmap file readahead
- Typo fix"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: replace read_lock to rcu lock in call_break_hook
arm64: Don't relocate non-existent initrd
arm64: convert patch_lock to raw lock
arm64: readahead: fault retry breaks mmap file read random detection
arm64: debug: Fix typo in debug-monitors.c
Linus Torvalds [Wed, 7 Oct 2015 17:05:09 +0000 (18:05 +0100)]
Merge tag 'fbdev-fixes-4.3' of git://git./linux/kernel/git/tomba/linux
Pull fbdev fixes from Tomi Valkeinen:
- fbdev: Minor fixes to broadsheetfb, fsl-diu-fb, mb862xxfb, tridentfb,
omapfb
- display-timing: Fix memory leak in error path
* tag 'fbdev-fixes-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux:
video: of: fix memory leak
fbdev: broadsheetfb: fix memory leak
OMAPDSS: panel-sony-acx565akm: Export OF module alias information
fbdev: omap2: connector-dvi: use of_get_i2c_adapter_by_node interface
tridentfb: Fix set_lwidth on TGUI9440 and CYBER9320
tridentfb: fix hang on Blade3D with CONFIG_CC_OPTIMIZE_FOR_SIZE
video: fbdev: mb862xx: Fix module autoload for OF platform driver
video: fbdev: fsl: Fix the sleep function for FSL DIU module
Linus Torvalds [Wed, 7 Oct 2015 13:15:49 +0000 (14:15 +0100)]
Merge tag 'regmap-fix-v4.3-rc4' of git://git./linux/kernel/git/broonie/regmap
Pull regmap fixes from Mark Brown:
"A couple of fixes for the debugfs information on the register map,
fixing issues with very small reads potentially causing underflows and
wraparounds"
* tag 'regmap-fix-v4.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: debugfs: Don't bother actually printing when calculating max length
regmap: debugfs: Ensure we don't underflow when printing access masks
Linus Torvalds [Wed, 7 Oct 2015 13:13:10 +0000 (14:13 +0100)]
Merge tag 'spi-fix-v4.3-rc4' of git://git./linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A couple of very minor fixes, one for error handling in the Davinci
driver probe function and another making the Renesas sh-msiof DT
binding documentation correspond to what's actually implemented"
* tag 'spi-fix-v4.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: sh-msiof: Match renesas,rx-fifo-size in DT bindings doc with driver
spi: davinci: fix handling platform_get_irq result
Linus Torvalds [Wed, 7 Oct 2015 13:11:47 +0000 (14:11 +0100)]
Merge tag 'regulator-fix-v4.3-rc4' of git://git./linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"Two fixes here, one device specific fix for axp20x and a core fix for
cases where one regulator is supplying another which broke probe
deferral, substituting in a dummy regulator too aggressively"
* tag 'regulator-fix-v4.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: core: Handle probe deferral from DT when resolving supplies
regulator: axp20x: Fix enable bit indexes for DCDC4 and DCDC5
Sudip Mukherjee [Wed, 30 Sep 2015 09:54:08 +0000 (15:24 +0530)]
video: of: fix memory leak
If of_parse_display_timing() fails we are printing an error message and
jumping to the error path but we missed freeing "dt".
Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Mark Brown [Wed, 7 Oct 2015 10:43:39 +0000 (11:43 +0100)]
Merge remote-tracking branches 'spi/fix/davinci' and 'spi/fix/sh-msiof' into spi-linus
Linus Torvalds [Wed, 7 Oct 2015 08:52:42 +0000 (09:52 +0100)]
Merge branch 'strscpy' of git://git./linux/kernel/git/cmetcalf/linux-tile
Pull strscpy fixes from Chris Metcalf :
"This patch series fixes up a couple of architecture issues where
strscpy wasn't configured correctly (missing on h8300, duplicating
local and asm-generic copies on powerpc and tile).
It also adds a use of zero_bytemask() to the final store for strscpy
to avoid writing uninitialized data to the destination. However, to
make this work we had to add support for zero_bytemask() to the two
architectures that didn't have it (alpha and tile), because they were
providing their own local copies, but didn't provide the
zero_bytemask() that was previously only required when building with
CONFIG_DCACHE_WORD_ACCESS"
[ Side note: there is still no actual users of strscpy except for the
one preexisting use in arch/tile that predates the generic version.
So this is all about fixing the infrastructure so that we eventually
can start using it. - Linus ]
* 'strscpy' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
strscpy: zero any trailing garbage bytes in the destination
word-at-a-time.h: support zero_bytemask() on alpha and tile
word-at-a-time.h: fix some Kbuild files
Linus Torvalds [Wed, 7 Oct 2015 08:35:15 +0000 (09:35 +0100)]
Merge tag 'for-linus-
20151006' of git://git.infradead.org/linux-mtd
Pull MTD fixes from Brian Norris:
"A few MTD fixes:
- mxc_nand: a "refactoring only" change in 4.3-rc1 had some bad
pointer (array) arithmetic. Fix that
- sunxi_nand:
- Fix an old list manipulation / memory management bug in the device
release() code path
- Correct a few mistakes in OOB write support"
* tag 'for-linus-
20151006' of git://git.infradead.org/linux-mtd:
mxc_nand: fix copy_spare
mtd: nand: sunxi: fix sunxi_nand_chips_cleanup()
mtd: nand: sunxi: fix OOB handling in ->write_xxx() functions
Linus Torvalds [Wed, 7 Oct 2015 07:54:22 +0000 (08:54 +0100)]
Merge tag 'nfs-for-4.3-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
"Highlights include:
Bugfixes:
- Fix a use-after-free bug in the RPC/RDMA client
- Fix a write performance regression
- Fix up page writeback accounting
- Don't try to reclaim unused state owners
- Fix a NFSv4 nograce recovery hang
- reset states to use open_stateid when returning delegation
voluntarily
- Fix a tracepoint NULL-pointer dereference"
* tag 'nfs-for-4.3-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: Fix a tracepoint NULL-pointer dereference
nfs4: reset states to use open_stateid when returning delegation voluntarily
NFSv4: Fix a nograce recovery hang
NFSv4.1: nfs4_opendata_check_deleg needs to handle NFS4_OPEN_CLAIM_DELEG_CUR_FH
NFSv4: Don't try to reclaim unused state owners
NFS: Fix a write performance regression
NFS: Fix up page writeback accounting
xprtrdma: disconnect and flush cqs before freeing buffers
Linus Torvalds [Wed, 7 Oct 2015 07:32:38 +0000 (08:32 +0100)]
Revert "fs: do not prefault sys_write() user buffer pages"
This reverts commit
998ef75ddb5709bbea0bf1506cd2717348a3c647.
The commit itself does not appear to be buggy per se, but it is exposing
a bug in ext4 (and Ted thinks ext3 too, but we solved that by getting
rid of it). It's too late in the release cycle to really worry about
this, even if Dave Hansen has a patch that may actually fix the
underlying ext4 problem. We can (and should) revisit this for the next
release.
The problem is that moving the prefaulting later now exposes a special
case with partially successful writes that isn't handled correctly. And
the prefaulting likely isn't normally even that much of a performance
issue - it looks like at least one reason Dave saw this in his
performance tests is that he also ran them on Skylake that now supports
the new SMAP code, which makes the normally very cheap user space
prefaulting noticeably more expensive.
Bisected-and-acked-by: Ted Ts'o <tytso@mit.edu>
Analyzed-and-acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Anna Schumaker [Mon, 5 Oct 2015 20:43:26 +0000 (16:43 -0400)]
NFS: Fix a tracepoint NULL-pointer dereference
Running xfstest generic/013 with the tracepoint nfs:nfs4_open_file
enabled produces a NULL-pointer dereference when calculating fileid and
filehandle of the opened file. Fix this by checking if state is NULL
before trying to use the inode pointer.
Reported-by: Olga Kornievskaia <aglo@umich.edu>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Chris Metcalf [Tue, 6 Oct 2015 16:37:41 +0000 (12:37 -0400)]
strscpy: zero any trailing garbage bytes in the destination
It's possible that the destination can be shadowed in userspace
(as, for example, the perf buffers are now). So we should take
care not to leak data that could be inspected by userspace.
Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
Chris Metcalf [Tue, 6 Oct 2015 18:20:45 +0000 (14:20 -0400)]
word-at-a-time.h: support zero_bytemask() on alpha and tile
Both alpha and tile needed implementations of zero_bytemask.
The alpha version is untested.
Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
Chris Metcalf [Tue, 6 Oct 2015 17:35:10 +0000 (13:35 -0400)]
word-at-a-time.h: fix some Kbuild files
arch/tile added word-at-a-time.h after the patch that added generic-y
entries; the generic-y entry is now stale.
arch/h8300 is newer than the generic-y patch for word-at-a-time.h,
and needs a generic-y entry.
arch/powerpc seems to have gotten a generic-y entry by mistake in
the first patch; this change removes it.
Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
Yang Shi [Mon, 5 Oct 2015 21:32:51 +0000 (14:32 -0700)]
arm64: replace read_lock to rcu lock in call_break_hook
BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917
in_atomic(): 0, irqs_disabled(): 128, pid: 342, name: perf
1 lock held by perf/342:
#0: (break_hook_lock){+.+...}, at: [<
ffffffc0000851ac>] call_break_hook+0x34/0xd0
irq event stamp: 62224
hardirqs last enabled at (62223): [<
ffffffc00010b7bc>] __call_rcu.constprop.59+0x104/0x270
hardirqs last disabled at (62224): [<
ffffffc0000fbe20>] vprintk_emit+0x68/0x640
softirqs last enabled at (0): [<
ffffffc000097928>] copy_process.part.8+0x428/0x17f8
softirqs last disabled at (0): [< (null)>] (null)
CPU: 0 PID: 342 Comm: perf Not tainted 4.1.6-rt5 #4
Hardware name: linux,dummy-virt (DT)
Call trace:
[<
ffffffc000089968>] dump_backtrace+0x0/0x128
[<
ffffffc000089ab0>] show_stack+0x20/0x30
[<
ffffffc0007030d0>] dump_stack+0x7c/0xa0
[<
ffffffc0000c878c>] ___might_sleep+0x174/0x260
[<
ffffffc000708ac8>] __rt_spin_lock+0x28/0x40
[<
ffffffc000708db0>] rt_read_lock+0x60/0x80
[<
ffffffc0000851a8>] call_break_hook+0x30/0xd0
[<
ffffffc000085a70>] brk_handler+0x30/0x98
[<
ffffffc000082248>] do_debug_exception+0x50/0xb8
Exception stack(0xffffffc00514fe30 to 0xffffffc00514ff50)
fe20:
00000000 00000000 c1594680 0000007f
fe40:
ffffffff ffffffff 92063940 0000007f 0550dcd8 ffffffc0 00000000 00000000
fe60:
0514fe70 ffffffc0 000be1f8 ffffffc0 0514feb0 ffffffc0 0008948c ffffffc0
fe80:
00000004 00000000 0514fed0 ffffffc0 ffffffff ffffffff 9282a948 0000007f
fea0:
00000000 00000000 9282b708 0000007f c1592820 0000007f 00083914 ffffffc0
fec0:
00000000 00000000 00000010 00000000 00000064 00000000 00000001 00000000
fee0:
005101e0 00000000 c1594680 0000007f c1594740 0000007f ffffffd8 ffffff80
ff00:
00000000 00000000 00000000 00000000 c1594770 0000007f c1594770 0000007f
ff20:
00665e10 00000000 7f7f7f7f 7f7f7f7f 01010101 01010101 00000000 00000000
ff40:
928e4cc0 0000007f 91ff11e8 0000007f
call_break_hook is called in atomic context (hard irq disabled), so replace
the sleepable lock to rcu lock, replace relevant list operations to rcu
version and call synchronize_rcu() in unregister_break_hook().
And, replace write lock to spinlock in {un}register_break_hook.
Signed-off-by: Yang Shi <yang.shi@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Mark Rutland [Tue, 6 Oct 2015 17:24:37 +0000 (18:24 +0100)]
arm64: Don't relocate non-existent initrd
When booting a kernel without an initrd, the kernel reports that it
moves -1 bytes worth, having gone through the motions with initrd_start
equal to initrd_end:
Moving initrd from [
4080000000-
407fffffff] to [
9fff49000-
9fff48fff]
Prevent this by bailing out early when the initrd size is zero (i.e. we
have no initrd), avoiding the confusing message and other associated
work.
Fixes:
1570f0d7ab425c1e ("arm64: support initrd outside kernel linear map")
Cc: Mark Salter <msalter@redhat.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Linus Torvalds [Tue, 6 Oct 2015 14:05:02 +0000 (15:05 +0100)]
Merge tag 'for-linus-4.3b-rc4-tag' of git://git./linux/kernel/git/xen/tip
Pull xen bug fixes from David Vrabel:
- Fix VM save performance regression with x86 PV guests
- Make kexec work in x86 PVHVM guests (if Xen has the soft-reset ABI)
- Other minor fixes.
* tag 'for-linus-4.3b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/xen/p2m: hint at the last populated P2M entry
x86/xen: Do not clip xen_e820_map to xen_e820_map_entries when sanitizing map
x86/xen: Support kexec/kdump in HVM guests by doing a soft reset
xen/x86: Don't try to write syscall-related MSRs for PV guests
xen: use correct type for HYPERVISOR_memory_op()
Linus Torvalds [Tue, 6 Oct 2015 13:59:36 +0000 (14:59 +0100)]
Merge branch 'for-linus' of git://git./linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
"Three bug fixes and an update to the default configuration"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/defconfig: set SCSI_DH=y
s390/vtime: correct scaled cputime of partially idle CPUs
s390/boot/decompression: disable floating point in decompressor
s390/numa: use correct type for node_to_cpumask_map
Linus Torvalds [Tue, 6 Oct 2015 13:30:21 +0000 (14:30 +0100)]
Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull CIFS fixes from Steve French:
"Two fixes for problems pointed out by automated tools.
Thanks PaX/grsecurity team and Dan Carpenter (and the Smatch tool)"
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
[CIFS] Update cifs version number
[SMB3] Do not fall back to SMBWriteX in set_file_size error cases
[SMB3] Missing null tcon check
David Vrabel [Mon, 7 Sep 2015 16:14:08 +0000 (17:14 +0100)]
x86/xen/p2m: hint at the last populated P2M entry
With commit
633d6f17cd91ad5bf2370265946f716e42d388c6 (x86/xen: prepare
p2m list for memory hotplug) the P2M may be sized to accomdate a much
larger amount of memory than the domain currently has.
When saving a domain, the toolstack must scan all the P2M looking for
populated pages. This results in a performance regression due to the
unnecessary scanning.
Instead of reporting (via shared_info) the maximum possible size of
the P2M, hint at the last PFN which might be populated. This hint is
increased as new leaves are added to the P2M (in the expectation that
they will be used for populated entries).
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: <stable@vger.kernel.org> # 4.0+
Mark Brown [Tue, 6 Oct 2015 11:00:42 +0000 (12:00 +0100)]
Merge remote-tracking branch 'regulator/fix/axp20x' into regulator-linus
Mark Brown [Tue, 6 Oct 2015 11:00:38 +0000 (12:00 +0100)]
Merge remote-tracking branch 'regulator/fix/core' into regulator-linus
Yang Shi [Wed, 30 Sep 2015 18:23:12 +0000 (19:23 +0100)]
arm64: convert patch_lock to raw lock
When running kprobe test on arm64 rt kernel, it reports the below warning:
root@qemu7:~# modprobe kprobe_example
BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917
in_atomic(): 0, irqs_disabled(): 128, pid: 484, name: modprobe
CPU: 0 PID: 484 Comm: modprobe Not tainted 4.1.6-rt5 #2
Hardware name: linux,dummy-virt (DT)
Call trace:
[<
ffffffc0000891b8>] dump_backtrace+0x0/0x128
[<
ffffffc000089300>] show_stack+0x20/0x30
[<
ffffffc00061dae8>] dump_stack+0x1c/0x28
[<
ffffffc0000bbad0>] ___might_sleep+0x120/0x198
[<
ffffffc0006223e8>] rt_spin_lock+0x28/0x40
[<
ffffffc000622b30>] __aarch64_insn_write+0x28/0x78
[<
ffffffc000622e48>] aarch64_insn_patch_text_nosync+0x18/0x48
[<
ffffffc000622ee8>] aarch64_insn_patch_text_cb+0x70/0xa0
[<
ffffffc000622f40>] aarch64_insn_patch_text_sync+0x28/0x48
[<
ffffffc0006236e0>] arch_arm_kprobe+0x38/0x48
[<
ffffffc00010e6f4>] arm_kprobe+0x34/0x50
[<
ffffffc000110374>] register_kprobe+0x4cc/0x5b8
[<
ffffffbffc002038>] kprobe_init+0x38/0x7c [kprobe_example]
[<
ffffffc000084240>] do_one_initcall+0x90/0x1b0
[<
ffffffc00061c498>] do_init_module+0x6c/0x1cc
[<
ffffffc0000fd0c0>] load_module+0x17f8/0x1db0
[<
ffffffc0000fd8cc>] SyS_finit_module+0xb4/0xc8
Convert patch_lock to raw loc kto avoid this issue.
Although the problem is found on rt kernel, the fix should be applicable to
mainline kernel too.
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Yang Shi <yang.shi@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Mark Salyzyn [Mon, 21 Sep 2015 20:39:50 +0000 (21:39 +0100)]
arm64: readahead: fault retry breaks mmap file read random detection
This is the arm64 portion of commit
45cac65b0fcd ("readahead: fault
retry breaks mmap file read random detection"), which was absent from
the initial port and has since gone unnoticed. The original commit says:
> .fault now can retry. The retry can break state machine of .fault. In
> filemap_fault, if page is miss, ra->mmap_miss is increased. In the second
> try, since the page is in page cache now, ra->mmap_miss is decreased. And
> these are done in one fault, so we can't detect random mmap file access.
>
> Add a new flag to indicate .fault is tried once. In the second try, skip
> ra->mmap_miss decreasing. The filemap_fault state machine is ok with it.
With this change, Mark reports that:
> Random read improves by 250%, sequential read improves by 40%, and
> random write by 400% to an eMMC device with dm crypto wrapped around it.
Cc: Shaohua Li <shli@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Mark Salyzyn <salyzyn@android.com>
Signed-off-by: Riley Andrews <riandrews@android.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Yang Shi [Fri, 18 Sep 2015 21:09:00 +0000 (22:09 +0100)]
arm64: debug: Fix typo in debug-monitors.c
Fix comment typo: s/handers/handlers/
Signed-off-by: Yang Shi <yang.shi@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Linus Torvalds [Sun, 4 Oct 2015 15:57:17 +0000 (16:57 +0100)]
Linux 4.3-rc4
Linus Torvalds [Sun, 4 Oct 2015 15:31:13 +0000 (16:31 +0100)]
Merge branch 'strscpy' of git://git./linux/kernel/git/cmetcalf/linux-tile
Pull strscpy string copy function implementation from Chris Metcalf.
Chris sent this during the merge window, but I waffled back and forth on
the pull request, which is why it's going in only now.
The new "strscpy()" function is definitely easier to use and more secure
than either strncpy() or strlcpy(), both of which are horrible nasty
interfaces that have serious and irredeemable problems.
strncpy() has a useless return value, and doesn't NUL-terminate an
overlong result. To make matters worse, it pads a short result with
zeroes, which is a performance disaster if you have big buffers.
strlcpy(), by contrast, is a mis-designed "fix" for strlcpy(), lacking
the insane NUL padding, but having a differently broken return value
which returns the original length of the source string. Which means
that it will read characters past the count from the source buffer, and
you have to trust the source to be properly terminated. It also makes
error handling fragile, since the test for overflow is unnecessarily
subtle.
strscpy() avoids both these problems, guaranteeing the NUL termination
(but not excessive padding) if the destination size wasn't zero, and
making the overflow condition very obvious by returning -E2BIG. It also
doesn't read past the size of the source, and can thus be used for
untrusted source data too.
So why did I waffle about this for so long?
Every time we introduce a new-and-improved interface, people start doing
these interminable series of trivial conversion patches.
And every time that happens, somebody does some silly mistake, and the
conversion patch to the improved interface actually makes things worse.
Because the patch is mindnumbing and trivial, nobody has the attention
span to look at it carefully, and it's usually done over large swatches
of source code which means that not every conversion gets tested.
So I'm pulling the strscpy() support because it *is* a better interface.
But I will refuse to pull mindless conversion patches. Use this in
places where it makes sense, but don't do trivial patches to fix things
that aren't actually known to be broken.
* 'strscpy' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
tile: use global strscpy() rather than private copy
string: provide strscpy()
Make asm/word-at-a-time.h available on all architectures
Linus Torvalds [Sun, 4 Oct 2015 10:47:28 +0000 (11:47 +0100)]
Merge tag 'md/4.3-fixes' of git://neil.brown.name/md
Pull md fixes from Neil Brown:
"Assorted fixes for md in 4.3-rc.
Two tagged for -stable, and one is really a cleanup to match and
improve kmemcache interface.
* tag 'md/4.3-fixes' of git://neil.brown.name/md:
md/bitmap: don't pass -1 to bitmap_storage_alloc.
md/raid1: Avoid raid1 resync getting stuck
md: drop null test before destroy functions
md: clear CHANGE_PENDING in readonly array
md/raid0: apply base queue limits *before* disk_stack_limits
md/raid5: don't index beyond end of array in need_this_block().
raid5: update analysis state for failed stripe
md: wait for pending superblock updates before switching to read-only
Linus Torvalds [Sun, 4 Oct 2015 10:41:58 +0000 (11:41 +0100)]
Merge branch 'upstream' of git://git.linux-mips.org/ralf/upstream-linus
Pull MIPS updates from Ralf Baechle:
"This week's round of MIPS fixes:
- Fix JZ4740 build
- Fix fallback to GFP_DMA
- FP seccomp in case of ENOSYS
- Fix bootmem panic
- A number of FP and CPS fixes
- Wire up new syscalls
- Make sure BPF assembler objects can properly be disassembled
- Fix BPF assembler code for MIPS I"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
MIPS: scall: Always run the seccomp syscall filters
MIPS: Octeon: Fix kernel panic on startup from memory corruption
MIPS: Fix R2300 FP context switch handling
MIPS: Fix octeon FP context switch handling
MIPS: BPF: Fix load delay slots.
MIPS: BPF: Do all exports of symbols with FEXPORT().
MIPS: Fix the build on jz4740 after removing the custom gpio.h
MIPS: CPS: #ifdef on CONFIG_MIPS_MT_SMP rather than CONFIG_MIPS_MT
MIPS: CPS: Don't include MT code in non-MT kernels.
MIPS: CPS: Stop dangling delay slot from has_mt.
MIPS: dma-default: Fix 32-bit fall back to GFP_DMA
MIPS: Wire up userfaultfd and membarrier syscalls.
Linus Torvalds [Sun, 4 Oct 2015 10:40:09 +0000 (11:40 +0100)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"This update contains:
- Fix for a long standing race affecting /proc/irq/NNN
- One line fix for ARM GICV3-ITS counting the wrong data
- Warning silencing in ARM GICV3-ITS. Another GCC trying to be
overly clever issue"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic-v3-its: Count additional LPIs for the aliased devices
irqchip/gic-v3-its: Silence warning when its_lpi_alloc_chunks gets inlined
genirq: Fix race in register_irq_proc()
Markos Chandras [Fri, 25 Sep 2015 07:17:42 +0000 (08:17 +0100)]
MIPS: scall: Always run the seccomp syscall filters
The MIPS syscall handler code used to return -ENOSYS on invalid
syscalls. Whilst this is expected, it caused problems for seccomp
filters because the said filters never had the change to run since
the code returned -ENOSYS before triggering them. This caused
problems on the chromium testsuite for filters looking for invalid
syscalls. This has now changed and the seccomp filters are always
run even if the syscall is invalid. We return -ENOSYS once we
return from the seccomp filters. Moreover, similar codepaths have
been merged in the process which simplifies somewhat the overall
syscall code.
Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/11236/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Steve French [Sat, 3 Oct 2015 21:54:17 +0000 (16:54 -0500)]
[CIFS] Update cifs version number
Update modinfo cifs.ko version number to 2.08
Signed-off-by: Steve French <steve.french@primarydata.com>
Linus Torvalds [Sat, 3 Oct 2015 14:53:05 +0000 (10:53 -0400)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Fixes all around the map: W+X kernel mapping fix, WCHAN fixes, two
build failure fixes for corner case configs, x32 header fix and a
speling fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/headers/uapi: Fix __BITS_PER_LONG value for x32 builds
x86/mm: Set NX on gap between __ex_table and rodata
x86/kexec: Fix kexec crash in syscall kexec_file_load()
x86/process: Unify 32bit and 64bit implementations of get_wchan()
x86/process: Add proper bound checks in 64bit get_wchan()
x86, efi, kasan: Fix build failure on !KASAN && KMEMCHECK=y kernels
x86/hyperv: Fix the build in the !CONFIG_KEXEC_CORE case
x86/cpufeatures: Correct spelling of the HWP_NOTIFY flag
Linus Torvalds [Sat, 3 Oct 2015 14:51:41 +0000 (10:51 -0400)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull timer fixes from Ingo Molnar:
"An abs64() fix in the watchdog driver, and two clocksource driver
NO_IRQ assumption fixes"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource: Fix abs() usage w/ 64bit values
clocksource/drivers/keystone: Fix bad NO_IRQ usage
clocksource/drivers/rockchip: Fix bad NO_IRQ usage
Linus Torvalds [Sat, 3 Oct 2015 14:46:41 +0000 (10:46 -0400)]
Merge branch 'core-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull EFI fixes from Ingo Molnar:
"Two EFI fixes: one for x86, one for ARM, fixing a boot crash bug that
can trigger under newer EFI firmware"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
arm64/efi: Fix boot crash by not padding between EFI_MEMORY_RUNTIME regions
x86/efi: Fix boot crash by mapping EFI memmap entries bottom-up at runtime, instead of top-down
Linus Torvalds [Sat, 3 Oct 2015 14:39:31 +0000 (10:39 -0400)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Bunch of fixes all over the place, all pretty small: amdgpu, i915,
exynos, one qxl and one vmwgfx.
There is also a bunch of mst fixes, I left some cleanups in the series
as I didn't think it was worth splitting up the tested series"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (37 commits)
drm/dp/mst: add some defines for logical/physical ports
drm/dp/mst: drop cancel work sync in the mstb destroy path (v2)
drm/dp/mst: split connector registration into two parts (v2)
drm/dp/mst: update the link_address_sent before sending the link address (v3)
drm/dp/mst: fixup handling hotplug on port removal.
drm/dp/mst: don't pass port into the path builder function
drm/radeon: drop radeon_fb_helper_set_par
drm: handle cursor_set2 in restore_fbdev_mode
drm/exynos: Staticize local function in exynos_drm_gem.c
drm/exynos: fimd: actually disable dp clock
drm/exynos: dp: remove suspend/resume functions
drm/qxl: recreate the primary surface when the bo is not primary
drm/amdgpu: only print meaningful VM faults
drm/amdgpu/cgs: remove import_gpu_mem
drm/i915: Call non-locking version of drm_kms_helper_poll_enable(), v2
drm: Add a non-locking version of drm_kms_helper_poll_enable(), v2
drm/vmwgfx: Fix a command submission hang regression
drm/exynos: remove unused mode_fixup() code
drm/exynos: remove decon_mode_fixup()
drm/exynos: remove fimd_mode_fixup()
...
Linus Torvalds [Fri, 2 Oct 2015 21:53:25 +0000 (17:53 -0400)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input
Pull input layer fixes from Dmitry Torokhov:
"Fixes for two recent regressions (in Synaptics PS/2 and uinput
drivers) and some more driver fixups"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Revert "Input: synaptics - fix handling of disabling gesture mode"
Input: psmouse - fix data race in __ps2_command
Input: elan_i2c - add all valid ic type for i2c/smbus
Input: zhenhua - ensure we have BITREVERSE
Input: omap4-keypad - fix memory leak
Input: serio - fix blocking of parport
Input: uinput - fix crash when using ABS events
Input: elan_i2c - expand maximum product_id form 0xFF to 0xFFFF
Input: elan_i2c - add ic type 0x03
Input: elan_i2c - don't require known iap version
Input: imx6ul_tsc - fix controller name
Input: imx6ul_tsc - use the preferred method for kzalloc()
Input: imx6ul_tsc - check for negative return value
Input: imx6ul_tsc - propagate the errors
Input: walkera0701 - fix abs() calculations on 64 bit values
Input: mms114 - remove unneded semicolons
Input: pm8941-pwrkey - remove unneded semicolon
Input: fix typo in MT documentation
Input: cyapa - fix address of Gen3 devices in device tree documentation
John Stultz [Tue, 15 Sep 2015 01:05:20 +0000 (18:05 -0700)]
clocksource: Fix abs() usage w/ 64bit values
This patch fixes one cases where abs() was being used with 64-bit
nanosecond values, where the result may be capped at 32-bits.
This potentially could cause watchdog false negatives on 32-bit
systems, so this patch addresses the issue by using abs64().
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1442279124-7309-2-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Trond Myklebust [Fri, 2 Oct 2015 19:49:33 +0000 (15:49 -0400)]
Merge tag 'nfs-rdma-for-4.3-2' of git://git.linux-nfs.org/projects/anna/nfs-rdma
NFS: NFSoRDMA bugfix
Fixes a use-after-free bug.
Signed-off-by: Anna Schumaker <Anna.Schumaker@netapp.com>
Jeff Layton [Fri, 2 Oct 2015 17:14:37 +0000 (13:14 -0400)]
nfs4: reset states to use open_stateid when returning delegation voluntarily
When the client goes to return a delegation, it should always update any
nfs4_state currently set up to use that delegation stateid to instead
use the open stateid. It already does do this in some cases,
particularly in the state recovery code, but not currently when the
delegation is voluntarily returned (e.g. in advance of a RENAME). This
causes the client to try to continue using the delegation stateid after
the DELEGRETURN, e.g. in LAYOUTGET.
Set the nfs4_state back to using the open stateid in
nfs4_open_delegation_recall, just before clearing the
NFS_DELEGATED_STATE bit.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Benjamin Coddington [Thu, 1 Oct 2015 13:17:33 +0000 (09:17 -0400)]
NFSv4: Fix a nograce recovery hang
Since commit
5cae02f42793130e1387f4ec09c4d07056ce9fa5 an OPEN_CONFIRM should
have a privileged sequence in the recovery case to allow nograce recovery to
proceed for NFSv4.0.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Trond Myklebust [Fri, 2 Oct 2015 15:44:54 +0000 (11:44 -0400)]
NFSv4.1: nfs4_opendata_check_deleg needs to handle NFS4_OPEN_CLAIM_DELEG_CUR_FH
We need to warn against broken NFSv4.1 servers that try to hand out
delegations in response to NFS4_OPEN_CLAIM_DELEG_CUR_FH.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Trond Myklebust [Fri, 2 Oct 2015 15:11:16 +0000 (11:11 -0400)]
NFSv4: Don't try to reclaim unused state owners
Currently, we don't test if the state owner is in use before we try to
recover it. The problem is that if the refcount is zero, then the
state owner will be waiting on the lru list for garbage collection.
The expectation in that case is that if you bump the refcount, then
you must also remove the state owner from the lru list. Otherwise
the call to nfs4_put_state_owner will corrupt that list by trying
to add our state owner a second time.
Avoid the whole problem by just skipping state owners that hold no
state.
Reported-by: Andrew W Elble <aweits@rit.edu>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Trond Myklebust [Thu, 1 Oct 2015 22:38:27 +0000 (18:38 -0400)]
NFS: Fix a write performance regression
If all other conditions in nfs_can_extend_write() are met, and there
are no locks, then we should be able to assume close-to-open semantics
and the ability to extend our write to cover the whole page.
With this patch, the xfstests generic/074 test completes in 242s instead
of >1400s on my test rig.
Fixes:
bd61e0a9c852 ("locks: convert posix locks to file_lock_context")
Cc: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Trond Myklebust [Thu, 1 Oct 2015 15:36:38 +0000 (11:36 -0400)]
NFS: Fix up page writeback accounting
Currently, we are crediting all the calls to nfs_writepages_callback()
(i.e. the nfs_writepages() callback) to nfs_writepage(). Aside from
being inconsistent with the behaviour of the equivalent readpage/readpages
accounting, this also means that we cannot distinguish between bulk writes
and single page writebacks (which confuses the 'nfsiostat -p' tool).
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Linus Torvalds [Fri, 2 Oct 2015 18:54:16 +0000 (14:54 -0400)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Fix for transparent huge page change_protection() logic which was
inadvertently changing a huge pmd page into a pmd table entry.
- Function graph tracer panic fix caused by the return_to_handler code
corrupting the multi-regs function return value (composite types).
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: ftrace: fix function_graph tracer panic
arm64: Fix THP protection change logic
Linus Torvalds [Fri, 2 Oct 2015 18:51:46 +0000 (14:51 -0400)]
Merge branch 'for-linus' of git://git./linux/kernel/git/geert/linux-m68k
Pull m68k updates from Geert Uytterhoeven:
"Summary:
- Fix for accidental modification of arguments of syscall functions
- Wire up new syscalls
- Update defconfigs"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
m68k/defconfig: Update defconfigs for v4.3-rc1
m68k: Define asmlinkage_protect
m68k: Wire up membarrier
m68k: Wire up userfaultfd
m68k: Wire up direct socket calls
Marc Zyngier [Fri, 2 Oct 2015 15:44:06 +0000 (16:44 +0100)]
irqchip/gic-v3-its: Count additional LPIs for the aliased devices
When configuring the interrupt mapping for a new device, we
iterate over all the possible aliases to account for their
maximum MSI allocation. This was introduced by
e8137f4f5088
("irqchip: gicv3-its: Iterate over PCI aliases to generate ITS configuration").
Turns out that the code doing that is a bit braindead, and repeatedly
accounts for the same device over and over.
Fix this by counting the actual alias that is passed to us by the
core code.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: David Daney <ddaney.cavm@gmail.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Link: http://lkml.kernel.org/r/1443800646-8074-3-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Marc Zyngier [Fri, 2 Oct 2015 15:44:05 +0000 (16:44 +0100)]
irqchip/gic-v3-its: Silence warning when its_lpi_alloc_chunks gets inlined
More agressive inlining in recent versions of GCC have uncovered
a new set of warnings:
drivers/irqchip/irq-gic-v3-its.c: In function its_msi_prepare:
drivers/irqchip/irq-gic-v3-its.c:1148:26: warning: lpi_base may be used
uninitialized in this function [-Wmaybe-uninitialized]
dev->event_map.lpi_base = lpi_base;
^
drivers/irqchip/irq-gic-v3-its.c:1116:6: note: lpi_base was declared here
int lpi_base;
^
drivers/irqchip/irq-gic-v3-its.c:1149:25: warning: nr_lpis may be used
uninitialized in this function [-Wmaybe-uninitialized]
dev->event_map.nr_lpis = nr_lpis;
^
drivers/irqchip/irq-gic-v3-its.c:1117:6: note: nr_lpis was declared here
int nr_lpis;
^
The warning is fairly benign (there is no code path that could
actually use uninitialized variables), but let's silence it anyway
by zeroing the variables on the error path.
Reported-by: Alex Shi <alex.shi@linaro.org>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: David Daney <ddaney.cavm@gmail.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Link: http://lkml.kernel.org/r/1443800646-8074-2-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Linus Torvalds [Fri, 2 Oct 2015 18:46:15 +0000 (14:46 -0400)]
Merge tag 'dmaengine-fix-4.3-rc4' of git://git.infradead.org/users/vkoul/slave-dma
Pull dmaengine fixes from Vinod Koul:
"This contains fixes spread throughout the drivers, and also fixes one
more instance of privatecnt in dmaengine.
Driver fixes summary:
- bunch of pxa_dma fixes for reuse of descriptor issue, residue and
no-requestor
- odd fixes in xgene, idma, sun4i and zxdma
- at_xdmac fixes for cleaning descriptor and block addr mode"
* tag 'dmaengine-fix-4.3-rc4' of git://git.infradead.org/users/vkoul/slave-dma:
dmaengine: pxa_dma: fix residue corner case
dmaengine: pxa_dma: fix the no-requestor case
dmaengine: zxdma: Fix off-by-one for testing valid pchan request
dmaengine: at_xdmac: clean used descriptor
dmaengine: at_xdmac: change block increment addressing mode
dmaengine: dw: properly read DWC_PARAMS register
dmaengine: xgene-dma: Fix overwritting DMA tx ring
dmaengine: fix balance of privatecnt
dmaengine: sun4i: fix unsafe list iteration
dmaengine: idma64: improve residue estimation
dmaengine: xgene-dma: fix handling xgene_dma_get_ring_size result
dmaengine: pxa_dma: fix initial list move
Linus Torvalds [Fri, 2 Oct 2015 18:40:57 +0000 (14:40 -0400)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Another week, another round of fixes.
These have been brewing for a bit and in various iterations, but I
feel pretty comfortable about the quality of them. They fix real
issues. The pull request is mostly blk-mq related, and the only one
not fixing a real bug, is the tag iterator abstraction from Christoph.
But it's pretty trivial, and we'll need it for another fix soon.
Apart from the blk-mq fixes, there's an NVMe affinity fix from Keith,
and a single fix for xen-blkback from Roger fixing failure to free
requests on disconnect"
* 'for-linus' of git://git.kernel.dk/linux-block:
blk-mq: factor out a helper to iterate all tags for a request_queue
blk-mq: fix racy updates of rq->errors
blk-mq: fix deadlock when reading cpu_list
blk-mq: avoid inserting requests before establishing new mapping
blk-mq: fix q->mq_usage_counter access race
blk-mq: Fix use after of free q->mq_map
blk-mq: fix sysfs registration/unregistration race
blk-mq: avoid setting hctx->tags->cpumask before allocation
NVMe: Set affinity after allocating request queues
xen/blkback: free requests on disconnection
Dmitry Torokhov [Fri, 2 Oct 2015 17:31:32 +0000 (10:31 -0700)]
Revert "Input: synaptics - fix handling of disabling gesture mode"
This reverts commit
e51e38494a8ecc18650efb0c840600637891de2c: we
actually do want the device to work in extended W mode, as this is the
mode that allows us receiving multiple contact information.
Cc: stable@vger.kernel.org
Matt Bennett [Wed, 30 Sep 2015 04:40:42 +0000 (17:40 +1300)]
MIPS: Octeon: Fix kernel panic on startup from memory corruption
During development it was found that a number of builds would panic
during the kernel init process, more specifically in 'delayed_fput()'.
The panic showed the kernel trying to access a memory address of
'0xb7fdc00' while traversing the 'delayed_fput_list' structure.
Comparing this memory address to the value of the pointer used on
builds that did not panic confirmed that the pointer on crashing
builds must have been corrupted at some stage earlier in the init
process.
By traversing the list earlier and earlier in the code it was found
that 'plat_mem_setup()' was responsible for corrupting the list.
Specifically the line:
memory = cvmx_bootmem_phy_alloc(mem_alloc_size,
__pa_symbol(&__init_end), -1,
0x100000,
CVMX_BOOTMEM_FLAG_NO_LOCKING);
Which would eventually call:
cvmx_bootmem_phy_set_size(new_ent_addr,
cvmx_bootmem_phy_get_size
(ent_addr) -
(desired_min_addr -
ent_addr));
Where 'new_ent_addr'=0x4800000 (the address of 'delayed_fput_list')
and the second argument (size)=0xb7fdc00 (the address causing the
kernel panic). The job of this part of 'plat_mem_setup()' is to
allocate chunks of memory for the kernel to use. At the start of
each chunk of memory the size of the chunk is written, hence the
value 0xb7fdc00 is written onto memory at 0x4800000, therefore the
kernel panics when it goes back to access 'delayed_fput_list' later
on in the initialisation process.
On builds that were not crashing it was found that the compiler had
placed 'delayed_fput_list' at 0x4800008, meaning it wasn't corrupted
(but something else in memory was overwritten).
As can be seen in the first function call above the code begins to
allocate chunks of memory beginning from the symbol '__init_end'.
The MIPS linker script (vmlinux.lds.S) however defines the .bss
section to begin after '__init_end'. Therefore memory within the
.bss section is allocated to the kernel to use (System.map shows
'delayed_fput_list' and other kernel structures to be in .bss).
To stop the kernel panic (and the .bss section being corrupted)
memory should begin being allocated from the symbol '_end'.
Signed-off-by: Matt Bennett <matt.bennett@alliedtelesis.co.nz>
Acked-by: David Daney <david.daney@cavium.com>
Cc: linux-mips@linux-mips.org
Cc: aleksey.makarov@auriga.com
Patchwork: https://patchwork.linux-mips.org/patch/11251/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Paul Burton [Mon, 21 Sep 2015 17:07:42 +0000 (10:07 -0700)]
MIPS: Fix R2300 FP context switch handling
Commit
1a3d59579b9f ("MIPS: Tidy up FPU context switching") removed FP
context saving from the asm-written resume function in favour of reusing
existing code to perform the same task. However it only removed the FP
context saving code from the r4k_switch.S implementation of resume.
Remove it from the r2300_switch.S implementation too in order to prevent
attempting to save the FP context twice, which would likely lead to an
exception from the second save because the FPU had already been disabled
by the first save.
This patch has only been build tested, using rbtx49xx_defconfig.
Fixes:
1a3d59579b9f ("MIPS: Tidy up FPU context switching")
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: Maciej W. Rozycki <macro@linux-mips.org>
Cc: linux-kernel@vger.kernel.org
Cc: Manuel Lauss <manuel.lauss@gmail.com>
Patchwork: https://patchwork.linux-mips.org/patch/11167/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Paul Burton [Mon, 21 Sep 2015 17:07:41 +0000 (10:07 -0700)]
MIPS: Fix octeon FP context switch handling
Commit
1a3d59579b9f ("MIPS: Tidy up FPU context switching") removed FP
context saving from the asm-written resume function in favour of reusing
existing code to perform the same task. However it only removed the FP
context saving code from the r4k_switch.S implementation of resume.
Octeon uses its own implementation in octeon_switch.S, so remove FP
context saving there too in order to prevent attempting to save context
twice. That formerly led to an exception from the second save as follows
because the FPU had already been disabled by the first save:
do_cpu invoked from kernel context![#1]:
CPU: 0 PID: 2 Comm: kthreadd Not tainted 4.3.0-rc2-dirty #2
task:
800000041f84a008 ti:
800000041f864000 task.ti:
800000041f864000
$ 0 :
0000000000000000 0000000010008ce1 0000000000100000 ffffffffbfffffff
$ 4 :
800000041f84a008 800000041f84ac08 800000041f84c000 0000000000000004
$ 8 :
0000000000000001 0000000000000000 0000000000000000 0000000000000001
$12 :
0000000010008ce3 0000000000119c60 0000000000000036 800000041f864000
$16 :
800000041f84ac08 800000000792ce80 800000041f84a008 ffffffff81758b00
$20 :
0000000000000000 ffffffff8175ae50 0000000000000000 ffffffff8176c740
$24 :
0000000000000006 ffffffff81170300
$28 :
800000041f864000 800000041f867d90 0000000000000000 ffffffff815f3fa0
Hi :
0000000000fa8257
Lo :
ffffffffe15cfc00
epc :
ffffffff8112821c resume+0x9c/0x200
ra :
ffffffff815f3fa0 __schedule+0x3f0/0x7d8
Status:
10008ce2 KX SX UX KERNEL EXL
Cause :
1080002c (ExcCode 0b)
PrId :
000d0601 (Cavium Octeon+)
Modules linked in:
Process kthreadd (pid: 2, threadinfo=
800000041f864000, task=
800000041f84a008, tls=
0000000000000000)
Stack :
ffffffff81604218 ffffffff815f7e08 800000041f84a008 ffffffff811681b0
800000041f84a008 ffffffff817e9878 0000000000000000 ffffffff81770000
ffffffff81768340 ffffffff81161398 0000000000000001 0000000000000000
0000000000000000 ffffffff815f4424 0000000000000000 ffffffff81161d68
ffffffff81161be8 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 ffffffff8111e16c
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
...
Call Trace:
[<
ffffffff8112821c>] resume+0x9c/0x200
[<
ffffffff815f3fa0>] __schedule+0x3f0/0x7d8
[<
ffffffff815f4424>] schedule+0x34/0x98
[<
ffffffff81161d68>] kthreadd+0x180/0x198
[<
ffffffff8111e16c>] ret_from_kernel_thread+0x14/0x1c
Tested using cavium_octeon_defconfig on an EdgeRouter Lite.
Fixes:
1a3d59579b9f ("MIPS: Tidy up FPU context switching")
Reported-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: Aleksey Makarov <aleksey.makarov@auriga.com>
Cc: linux-kernel@vger.kernel.org
Cc: Chandrakala Chavva <cchavva@caviumnetworks.com>
Cc: David Daney <david.daney@cavium.com>
Cc: Leonid Rosenboim <lrosenboim@caviumnetworks.com>
Patchwork: https://patchwork.linux-mips.org/patch/11166/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Linus Torvalds [Fri, 2 Oct 2015 12:03:04 +0000 (08:03 -0400)]
Merge tag 'mmc-v4.3-rc3' of git://git.linaro.org/people/ulf.hansson/mmc
Pull MMC fixes from Ulf Hansson:
"Here are some mmc fixes intended for v4.3 rc4:
MMC core:
- Allow users of mmc_of_parse() to succeed when CONFIG_GPIOLIB is
unset
- Prevent infinite loop of re-tuning for CRC-errors for CMD19 and
CMD21
MMC host:
- pxamci: Fix issues with card detect
- sunxi: Fix clk-delay settings"
* tag 'mmc-v4.3-rc3' of git://git.linaro.org/people/ulf.hansson/mmc:
mmc: core: fix dead loop of mmc_retune
mmc: pxamci: fix card detect with slot-gpio API
mmc: sunxi: Fix clk-delay settings
mmc: core: Don't return an error for CD/WP GPIOs when GPIOLIB is unset
Linus Torvalds [Fri, 2 Oct 2015 11:59:29 +0000 (07:59 -0400)]
Merge git://git.infradead.org/intel-iommu
Pull IOVA fixes from David Woodhouse:
"The main fix here is the first one, fixing the over-allocation of
size-aligned requests. The other patches simply make the existing
IOVA code available to users other than the Intel VT-d driver, with no
functional change.
I concede the latter really *should* have been submitted during the
merge window, but since it's basically risk-free and people are
waiting to build on top of it and it's my fault I didn't get it in, I
(and they) would be grateful if you'd take it"
* git://git.infradead.org/intel-iommu:
iommu: Make the iova library a module
iommu: iova: Export symbols
iommu: iova: Move iova cache management to the iova library
iommu/iova: Avoid over-allocating when size-aligned
Li Bin [Wed, 30 Sep 2015 02:49:55 +0000 (10:49 +0800)]
arm64: ftrace: fix function_graph tracer panic
When function graph tracer is enabled, the following operation
will trigger panic:
mount -t debugfs nodev /sys/kernel
echo next_tgid > /sys/kernel/tracing/set_ftrace_filter
echo function_graph > /sys/kernel/tracing/current_tracer
ls /proc/
------------[ cut here ]------------
[ 198.501417] Unable to handle kernel paging request at virtual address
cb88537fdc8ba316
[ 198.506126] pgd =
ffffffc008f79000
[ 198.509363] [
cb88537fdc8ba316] *pgd=
00000000488c6003, *pud=
00000000488c6003, *pmd=
0000000000000000
[ 198.517726] Internal error: Oops:
94000005 [#1] SMP
[ 198.518798] Modules linked in:
[ 198.520582] CPU: 1 PID: 1388 Comm: ls Tainted: G
[ 198.521800] Hardware name: linux,dummy-virt (DT)
[ 198.522852] task:
ffffffc0fa9e8000 ti:
ffffffc0f9ab0000 task.ti:
ffffffc0f9ab0000
[ 198.524306] PC is at next_tgid+0x30/0x100
[ 198.525205] LR is at return_to_handler+0x0/0x20
[ 198.526090] pc : [<
ffffffc0002a1070>] lr : [<
ffffffc0000907c0>] pstate:
60000145
[ 198.527392] sp :
ffffffc0f9ab3d40
[ 198.528084] x29:
ffffffc0f9ab3d40 x28:
ffffffc0f9ab0000
[ 198.529406] x27:
ffffffc000d6a000 x26:
ffffffc000b786e8
[ 198.530659] x25:
ffffffc0002a1900 x24:
ffffffc0faf16c00
[ 198.531942] x23:
ffffffc0f9ab3ea0 x22:
0000000000000002
[ 198.533202] x21:
ffffffc000d85050 x20:
0000000000000002
[ 198.534446] x19:
0000000000000002 x18:
0000000000000000
[ 198.535719] x17:
000000000049fa08 x16:
ffffffc000242efc
[ 198.537030] x15:
0000007fa472b54c x14:
ffffffffff000000
[ 198.538347] x13:
ffffffc0fada84a0 x12:
0000000000000001
[ 198.539634] x11:
ffffffc0f9ab3d70 x10:
ffffffc0f9ab3d70
[ 198.540915] x9 :
ffffffc0000907c0 x8 :
ffffffc0f9ab3d40
[ 198.542215] x7 :
0000002e330f08f0 x6 :
0000000000000015
[ 198.543508] x5 :
0000000000000f08 x4 :
ffffffc0f9835ec0
[ 198.544792] x3 :
cb88537fdc8ba316 x2 :
cb88537fdc8ba306
[ 198.546108] x1 :
0000000000000002 x0 :
ffffffc000d85050
[ 198.547432]
[ 198.547920] Process ls (pid: 1388, stack limit = 0xffffffc0f9ab0020)
[ 198.549170] Stack: (0xffffffc0f9ab3d40 to 0xffffffc0f9ab4000)
[ 198.582568] Call trace:
[ 198.583313] [<
ffffffc0002a1070>] next_tgid+0x30/0x100
[ 198.584359] [<
ffffffc0000907bc>] ftrace_graph_caller+0x6c/0x70
[ 198.585503] [<
ffffffc0000907bc>] ftrace_graph_caller+0x6c/0x70
[ 198.586574] [<
ffffffc0000907bc>] ftrace_graph_caller+0x6c/0x70
[ 198.587660] [<
ffffffc0000907bc>] ftrace_graph_caller+0x6c/0x70
[ 198.588896] Code:
aa0003f5 2a0103f4 b4000102 91004043 (
885f7c60)
[ 198.591092] ---[ end trace
6a346f8f20949ac8 ]---
This is because when using function graph tracer, if the traced
function return value is in multi regs ([x0-x7]), return_to_handler
may corrupt them. So in return_to_handler, the parameter regs should
be protected properly.
Cc: <stable@vger.kernel.org> # 3.18+
Signed-off-by: Li Bin <huawei.libin@huawei.com>
Acked-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Ralf Baechle [Fri, 2 Oct 2015 07:48:57 +0000 (09:48 +0200)]
MIPS: BPF: Fix load delay slots.
The entire bpf_jit_asm.S is written in noreorder mode because "we know
better" according to a comment. This also prevented the assembler from
throwing in the required NOPs for MIPS I processors which have no
load-use interlock, thus the load's consumer might end up using the
old value of the register from prior to the load.
Fixed by putting the assembler in reorder mode for just the affected
load instructions. This is not enough for gas to actually try to be
clever by looking at the next instruction and inserting a nop only
when needed but as the comment said "we know better", so getting gas
to unconditionally emit a NOP is just right in this case and prevents
adding further ifdefery.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Ben Hutchings [Thu, 1 Oct 2015 00:40:43 +0000 (01:40 +0100)]
x86/headers/uapi: Fix __BITS_PER_LONG value for x32 builds
On x32, gcc predefines __x86_64__ but long is only 32-bit. Use
__ILP32__ to distinguish x32.
Fixes this compiler error in perf:
tools/include/asm-generic/bitops/__ffs.h: In function '__ffs':
tools/include/asm-generic/bitops/__ffs.h:19:8: error: right shift count >= width of type [-Werror=shift-count-overflow]
word >>= 32;
^
This isn't sufficient to build perf for x32, though.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1443660043.2730.15.camel@decadent.org.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
NeilBrown [Thu, 1 Oct 2015 06:03:38 +0000 (16:03 +1000)]
md/bitmap: don't pass -1 to bitmap_storage_alloc.
Passing -1 to bitmap_storage_alloc() causes page->index to be set to
-1, which is quite problematic.
So only pass ->cluster_slot if mddev_is_clustered().
Fixes:
b97e92574c0b ("Use separate bitmaps for each nodes in the cluster")
Cc: stable@vger.kernel.org (v4.1+)
Signed-off-by: NeilBrown <neilb@suse.com>
Jes Sorensen [Wed, 16 Sep 2015 14:20:05 +0000 (10:20 -0400)]
md/raid1: Avoid raid1 resync getting stuck
close_sync() needs to set conf->next_resync to a large, but safe value
below MaxSector and use it to determine whether or not to set
start_next_window in wait_barrier()
Solution suggested by Neil Brown.
Reported-by: Nate Dailey <nate.dailey@stratus.com>
Tested-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Julia Lawall [Sun, 13 Sep 2015 12:15:10 +0000 (14:15 +0200)]
md: drop null test before destroy functions
Remove unneeded NULL test.
The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@@ expression x; @@
-if (x != NULL)
\(kmem_cache_destroy\|mempool_destroy\|dma_pool_destroy\)(x);
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: NeilBrown <neilb@suse.com>