Yunlei He [Mon, 31 Aug 2015 09:15:10 +0000 (17:15 +0800)]
f2fs: upset segment_info repair
upset segment_info like this:
276000|161 0|0 4|70 3|0 3|0 0|0 0|91 4|0 4|232 4|39
276104|0 4|0 4|1 4|0 4|0 4|280 4|0 4|42 4|262 4|38
276204|179 4|89 4|39 4|24 4|0 4|96 4|3 4|428 4|0 4|118
276304|112 4|97 4|0 4|0 4|0 4|68 4|0 4|0 4|86 4|138
276404|0 4|0 0|166 5|39 4|101 0|111
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Fri, 28 Aug 2015 10:18:57 +0000 (18:18 +0800)]
f2fs: avoid accessing NULL pointer in f2fs_drop_largest_extent
If extent cache is disable, we will encounter oops when triggering direct
IO as below:
BUG: unable to handle kernel NULL pointer dereference at
0000000c
IP: [<
f0b9c61e>] f2fs_drop_largest_extent+0xe/0x30 [f2fs]
*pdpt =
000000002bb9a001 *pde =
0000000000000000
Oops: 0000 [#1] SMP
Modules linked in: f2fs(O) fuse bnep rfcomm bluetooth nfsd dm_crypt nfs_acl auth_rpcgss oid_registry nfs binfmt_misc fscache lockd
sunrpc grace snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer
snd_seq_device snd soundcore joydev psmouse hid_generic i2c_piix4 serio_raw ppdev mac_hid parport_pc lp parport ext4 jbd2 mbcache
usbhid hid e1000
CPU: 3 PID: 3608 Comm: dd Tainted: G O 4.2.0-rc4 #12
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
task:
ef161600 ti:
ebd5e000 task.ti:
ebd5e000
EIP: 0060:[<
f0b9c61e>] EFLAGS:
00010202 CPU: 3
EIP is at f2fs_drop_largest_extent+0xe/0x30 [f2fs]
EAX:
00000000 EBX:
ddebc000 ECX:
00000000 EDX:
00000000
ESI:
ebd5fdf8 EDI:
00000000 EBP:
ebd5fd58 ESP:
ebd5fd58
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0:
80050033 CR2:
0000000c CR3:
2c24ee40 CR4:
000006f0
Stack:
ebd5fda4 f0b8c005 00000000 00000001 00000000 f0b8c430 c816cd68 ddebc000
ddebc088 00001000 00000555 00000555 ffffffff c160bb00 00055501 00000000
00000000 00000100 00000000 ebd5fe20 f0b8c430 00000046 ef161600 00001000
Call Trace:
[<
f0b8c005>] __allocate_data_block+0x1a5/0x260 [f2fs]
[<
f0b8c430>] ? f2fs_direct_IO+0x370/0x440 [f2fs]
[<
c160bb00>] ? down_read+0x30/0x50
[<
f0b8c430>] f2fs_direct_IO+0x370/0x440 [f2fs]
[<
c113e115>] generic_file_direct_write+0xa5/0x260
[<
c10b53f8>] ? current_fs_time+0x18/0x50
[<
c113e38b>] __generic_file_write_iter+0xbb/0x210
[<
c113e50f>] ? generic_file_write_iter+0x2f/0x320
[<
c113e63c>] generic_file_write_iter+0x15c/0x320
[<
f0b77f29>] f2fs_file_write_iter+0x39/0x80 [f2fs]
[<
c11984d9>] __vfs_write+0xa9/0xe0
[<
c1199227>] vfs_write+0x97/0x180
[<
c119955b>] SyS_write+0x5b/0xd0
[<
c160dcd0>] sysenter_do_call+0x12/0x12
Code: 10 8b 50 1c 89 53 14 eb ca 8d 74 26 00 85 f6 74 86 eb a6 0f 0b 90 8d b4 26 00 00 00 00 55 89 e5 3e 8d 74 26 00 8b 80 d4 02 00
00 <8b> 48 0c 39 d1 77 0e 03 48 14 39 ca 73 07 c7 40 14 00 00 00 00
EIP: [<
f0b9c61e>] f2fs_drop_largest_extent+0xe/0x30 [f2fs] SS:ESP 0068:
ebd5fd58
CR2:
000000000000000c
---[ end trace
a38c07026a1afffd ]---
This is because when extent cache is disable, extent_tree pointer in struct
f2fs_inode_info should be NULL, but in f2fs_drop_largest_extent we access
this NULL pointer directly without checking state of extent cache, then,
the oops occurs. Let's fix it by checking state of extent cache before
accessing.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 26 Aug 2015 12:34:48 +0000 (20:34 +0800)]
f2fs: update extent tree in batches
This patch introduce a new helper f2fs_update_extent_tree_range which can
do extent mapping update at a specified range.
The main idea is:
1) punch all mapping info in extent node(s) which are at a specified range;
2) try to merge new extent mapping with adjacent node, or failing that,
insert the mapping into extent tree as a new node.
In order to see the benefit, I add a function for stating time stamping
count as below:
uint64_t rdtsc(void)
{
uint32_t lo, hi;
__asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
return (uint64_t)hi << 32 | lo;
}
My test environment is: ubuntu, intel i7-3770, 16G memory, 256g micron ssd.
truncation path: update extent cache from truncate_data_blocks_range
non-truncataion path: update extent cache from other paths
total: all update paths
a) Removing 128MB file which has one extent node mapping whole range of
file:
1. dd if=/dev/zero of=/mnt/f2fs/128M bs=1M count=128
2. sync
3. rm /mnt/f2fs/128M
Before:
total count average
truncation:
7651022 32768 233.49
Patched:
total count average
truncation: 3321 33 100.64
b) fsstress:
fsstress -d /mnt/f2fs -l 5 -n 100 -p 20
Test times: 5 times.
Before:
total count average
truncation:
5812480.6 20911.6 277.95
non-truncation:
7783845.6 13440.8 579.12
total:
13596326.2 34352.4 395.79
Patched:
total count average
truncation:
1281283.0 3041.6 421.25
non-truncation:
7355844.4 13662.8 538.38
total:
8637127.4 16704.4 517.06
1) For the updates in truncation path:
- we can see updating in batches leads total tsc and update count reducing
explicitly;
- besides, for a single batched updating, punching multiple extent nodes
in a loop, result in executing more operations, so our average tsc
increase intensively.
2) For the updates in non-truncation path:
- there is a little improvement, that is because for the scenario that we
just need to update in the head or tail of extent node, new interface
optimize to update info in extent node directly, rather than removing
original extent node for updating and then inserting that updated one
into cache as new node.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 24 Aug 2015 09:40:45 +0000 (17:40 +0800)]
f2fs: fix to release inode correctly
In following call stack, if unfortunately we lose all chances to truncate
inode page in remove_inode_page, eventually we will add the nid allocated
previously into free nid cache, this nid is with NID_NEW status and with
NEW_ADDR in its blkaddr pointer:
- f2fs_create
- f2fs_add_link
- __f2fs_add_link
- init_inode_metadata
- new_inode_page
- new_node_page
- set_node_addr(, NEW_ADDR)
- f2fs_init_acl failed
- remove_inode_page failed
- handle_failed_inode
- remove_inode_page failed
- iput
- f2fs_evict_inode
- remove_inode_page failed
- alloc_nid_failed cache a nid with valid blkaddr: NEW_ADDR
This may not only cause resource leak of previous inode, but also may cause
incorrect use of the previous blkaddr which is located in NO.nid node entry
when this nid is reused by others.
This patch tries to add this inode to orphan list if we fail to truncate
inode, so that we can obtain a second chance to release it in orphan
recovery flow.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 24 Aug 2015 09:39:42 +0000 (17:39 +0800)]
f2fs: handle f2fs_truncate error correctly
This patch fixes to return error number of f2fs_truncate, so that we
can handle the error correctly in callers.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 24 Aug 2015 09:36:25 +0000 (17:36 +0800)]
f2fs: avoid unneeded initializing when converting inline dentry
When converting inline dentry, we will zero out target dentry page before
duplicating data of inline dentry into target page, it become overhead
since inline dentry size is not small.
So this patch tries to remove unneeded initializing in the space of target
dentry page.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Zhang Zhen [Mon, 24 Aug 2015 02:41:32 +0000 (10:41 +0800)]
f2fs: atomically set inode->i_flags
According to commit
5f16f3225b06 ("ext4: atomically set inode->i_flags in
ext4_set_inode_flags()").
Signed-off-by: Zhang Zhen <zhenzhang.zhang@huawei.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Sat, 22 Aug 2015 06:37:18 +0000 (23:37 -0700)]
f2fs: fix wrong pointer access during try_to_free_nids
If we release the lock in list_for_each_entry_safe, we can lose the tmp
pointer by alloc_nid.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Thu, 20 Aug 2015 15:51:56 +0000 (08:51 -0700)]
f2fs: use __GFP_NOFAIL to avoid infinite loop
__GFP_NOFAIL can avoid retrying the whole path of kmem_cache_alloc and
bio_alloc.
And, it also fixes the use cases of GFP_ATOMIC correctly.
Suggested-by: Chao Yu <chao2.yu@samsung.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 19 Aug 2015 11:16:09 +0000 (19:16 +0800)]
f2fs: lookup neighbor extent nodes for merging later
In __lookup_extent_tree_ret we will not try to find neighbor nodes if
we find the target node, in this condition, we will lost the chance to
merge the new mapping with exist extent node later.
So our extent cache of inode will be fragmented after overwrite exist
file, we can see the number of extent node increases intensively in
following test case:
dd if=/dev/zero of=/mnt/f2fs/4m bs=4K count=1024
Extent Cache:
- Hit Count: L1-1:0 L1-2:0 L2:0
- Hit Ratio: 0% (0 / 3072)
- Inner Struct Count: tree: 1, node: 1
dd if=/dev/zero of=/mnt/f2fs/4m bs=4K count=1024 conv=notrunc
Extent Cache:
- Hit Count: L1-1:2048 L1-2:0 L2:0
- Hit Ratio: 33% (2048 / 6144)
- Inner Struct Count: tree: 1, node: 961
This patch fixes to lookup neighbors of target node for further
merging.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 19 Aug 2015 11:15:09 +0000 (19:15 +0800)]
f2fs: split __insert_extent_tree_ret for readability
This patch splits __insert_extent_tree_ret into __try_merge_extent_node &
__insert_extent_tree for code readability.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 19 Aug 2015 11:14:15 +0000 (19:14 +0800)]
f2fs: kill dead code in __insert_extent_tree
After commit
0f825ee6e873 ("f2fs: add new interfaces for extent tree"),
f2fs_init_extent_tree becomes the only caller of __insert_extent_tree, and
in f2fs_init_extent_tree, we will only insert extent node in an empty tree,
so __try_{back,front}_merge in __insert_extent_tree will never be called.
This patch removes these dead codes, besides, rename __insert_extent_tree
to __init_extent_tree for readability.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 19 Aug 2015 11:13:25 +0000 (19:13 +0800)]
f2fs: adjust showing of extent cache stat
This patch alters to replace total hit stat with rbtree hit stat,
and then adjust showing of extent cache stat:
Hit Count:
L1-1: for largest node hit count;
L1-2: for last cached node hit count;
L2: for extent node hit after lookuping in rbtree.
Hit Ratio:
ratio (hit count / total lookup count)
Inner Struct Count:
tree count, node count.
Before:
Extent Hit Ratio: 0 / 2
Extent Tree Count: 3
Extent Node Count: 2
Patched:
Exten Cacache:
- Hit Count: L1-1:4871 L1-2:2074 L2:208
- Hit Ratio: 1% (7153 / 550751)
- Inner Struct Count: tree: 26560, node: 11824
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 19 Aug 2015 11:12:20 +0000 (19:12 +0800)]
f2fs: add largest/cached stat in extent cache
This patch adds to stat the hit count of largest/cached node for showing
in debugfs.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 19 Aug 2015 11:11:19 +0000 (19:11 +0800)]
f2fs: fix incorrect mapping for bmap
The test step is like below:
1. touch file
2. truncate -s $((1024*1024)) file
3. fallocate -o 0 -l $((1024*1024)) file
4. fibmap.f2fs file
Our result of fibmap.f2fs showed below is not correct:
file_pos start_blk end_blk blks
0 -
937166132 -
937166132 1
4096 -
937166132 -
937166132 1
8192 -
937166132 -
937166132 1
12288 -
937166132 -
937166132 1
16384 -
937166132 -
937166132 1
20480 -
937166132 -
937166132 1
...
1040384 -
937166132 -
937166132 1
1044480 -
937166132 -
937166132 1
This is because f2fs_map_blocks will return with no error when meeting
a hole or preallocated block, the caller __get_data_block will map the
uninitialized variable value to bh->b_blocknr.
Unfortunately generic_block_bmap will neither check the return value of
get_data() nor check mapping info of buffer_head, result in returning
the random block address.
After fixing the issue, our result shows correctly:
file_pos start_blk end_blk blks
0 0 0 256
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 19 Aug 2015 11:02:02 +0000 (19:02 +0800)]
f2fs: add annotation for space utilization of regular/inline dentry
Add annotation to let us know more clearly about space utilization
information of regular dentry and inline dentry.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Fan Li [Tue, 18 Aug 2015 09:13:13 +0000 (17:13 +0800)]
f2fs: fix to update cached_en of extent tree properly
In f2fs_lookup_extent_tree, et->cached_en was read and updated with only
read lock held,
it could cause __lookup_extent_tree within return entirely wrong
extent_node, if other
thread update et->cached_en just before __lookup_extent_tree return.
However, there are two things about this patch that need to be noticed:
1. It does no good to arrange the order of concurrent read/write, the result
would still
be random in such case.
2. It's built on this assumption: the mix up of reads and writes on a single
pointer would
not make the pointer partially wrong at any time. Please let me know if I'm
wrong, thx.
Signed-off-by: Fan li <fanofcode.li@samsung.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Junesung Lee [Tue, 18 Aug 2015 13:42:15 +0000 (22:42 +0900)]
f2fs: fix typo
Fix typo.
Signed-off-by: Junesung Lee <junesoung412@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Sun, 16 Aug 2015 20:04:50 +0000 (13:04 -0700)]
f2fs: check the node block address of newly allocated nid
This patch adds a routine which checks the block address of newly allocated nid.
If an nid has already allocated by other thread due to subtle data races, it
will result in filesystem corruption.
So, it needs to check whether its block address was already allocated or not
in prior to nid allocation as the last chance.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Sun, 16 Aug 2015 19:38:15 +0000 (12:38 -0700)]
f2fs: go out for insert_inode_locked failure
We should not call unlock_new_inode when insert_inode_locked failed.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Sun, 16 Aug 2015 05:06:08 +0000 (22:06 -0700)]
f2fs: retry gc if one section is not successfully reclaimed
If FG_GC failed to reclaim one section, let's retry with another section
from the start, since we can get anoterh good candidate.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Sun, 16 Aug 2015 04:51:05 +0000 (21:51 -0700)]
f2fs: fix to cover lock_op for update_inode_page
Previously, update_inode_page is not called under f2fs_lock_op.
Instead we should call with f2fs_write_inode.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Sat, 15 Aug 2015 00:57:29 +0000 (17:57 -0700)]
f2fs: reuse nids more aggressively
If we can reuse nids as many as possible, we can mitigate producing obsolete
node pages in the page cache.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Fri, 14 Aug 2015 21:37:50 +0000 (14:37 -0700)]
f2fs: avoid garbage collecting already moved node blocks
If node blocks were already moved, we don't need to move them again.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Fri, 14 Aug 2015 18:43:56 +0000 (11:43 -0700)]
f2fs: handle failed bio allocation
As the below comment of bio_alloc_bioset, f2fs can allocate multiple bios at the
same time. So, we can't guarantee that bio is allocated all the time.
"
* When @bs is not NULL, if %__GFP_WAIT is set then bio_alloc will always be
* able to allocate a bio. This is due to the mempool guarantees. To make this
* work, callers must never allocate more than 1 bio at a time from this pool.
* Callers that need to allocate more than 1 bio must always submit the
* previously allocated bio for IO before attempting to allocate a new one.
* Failure to do so can cause deadlocks under memory pressure.
"
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Mon, 10 Aug 2015 22:01:12 +0000 (15:01 -0700)]
f2fs: increase the number of max hard links
This patch increases the number of maximum hard links for one file.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Wed, 12 Aug 2015 04:59:49 +0000 (21:59 -0700)]
f2fs: skip checkpoint if there is no dirty and prefree segments
We should avoid needless checkpoints when there is no dirty and prefree segment.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Tue, 28 Jul 2015 10:33:46 +0000 (18:33 +0800)]
f2fs: shrink free_nids entries
This patch introduces __count_free_nids/try_to_free_nids and registers
them in slab shrinker for shrinking under memory pressure.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 12 Aug 2015 09:48:21 +0000 (17:48 +0800)]
f2fs: avoid clear valid page
In f2fs_delete_entry, if last dirent is remove from the dentry page,
we will try to punch that page since it has no valid date in it.
But truncate_hole which is used for punching could fail because of
no memory or IO error, if that happened, we'd better skip clearing
this valid dentry page.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 12 Aug 2015 09:47:08 +0000 (17:47 +0800)]
MAINTAINERS: add myself as a dedicated reviewer of f2fs
I volunteer to be a dedicated reviewer of f2fs, add my email address in
maintainship entry of f2fs.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Tue, 11 Aug 2015 19:45:39 +0000 (12:45 -0700)]
f2fs: do not write any node pages related to orphan inodes
We should not write node pages when deleting orphan inodes.
In order to do that, we can eaisly set POR_DOING flag earlier before entering
orphan inode routine.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Tue, 11 Aug 2015 23:01:30 +0000 (16:01 -0700)]
f2fs: avoid a build warning
If F2FS_CHECK_FS is turned off, we can get a build warning for unused variable.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Fri, 7 Aug 2015 09:58:43 +0000 (17:58 +0800)]
f2fs: handle error of f2fs_iget correctly
In recover_orphan_inode, whenever f2fs_iget fail, we will make kernel panic,
but it's not reasonable, because f2fs_iget can fail due to a lot of reasons
including out of memory.
So we change error handling method as below:
a) when finding no entry for the orphan inode, bug_on for catching bugs;
b) for other reasons, report it to caller.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Tue, 11 Aug 2015 17:17:27 +0000 (10:17 -0700)]
f2fs: do not assign a new segment for dio under space shortage
If there is not enough free segment, we should not assign a new segment
explicitly. Otherwise, we can run out of free segment.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Fri, 7 Aug 2015 10:42:09 +0000 (18:42 +0800)]
f2fs: remove inmem radix tree
Previously, we use radix tree to index all registered page entries for
atomic file, but now we only use radix tree to see whether current page
is indexed or not, since the other user of radix tree is gone in commit
042b7816aaeb ("f2fs: remove unnecessary call to invalidate inmemory pages").
So in this patch, we try to use one more efficient way:
Introducing a macro ATOMIC_WRITTEN_PAGE, and setting it as page private
value to indicate page indexing status. By using this way, we can save
memory and lookup time.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Fri, 7 Aug 2015 10:39:32 +0000 (18:39 +0800)]
f2fs: report EINVAL for unalignment direct IO
We run ltp testcase with f2fs and obtain a TFAIL in diotest4, the result in
detail is as fallow:
dio04
<<<test_start>>>
tag=dio04 stime=
1432278894
cmdline="diotest4"
contacts=""
analysis=exit
<<<test_output>>>
diotest4 1 TPASS : Negative Offset
diotest4 2 TPASS : removed
diotest4 3 TFAIL : diotest4.c:129: write allows odd count.returns 1: Success
diotest4 4 TFAIL : diotest4.c:183: Odd count of read and write
diotest4 5 TPASS : Read beyond the file size
......
the result of ext4 with same environment:
dio04
<<<test_start>>>
tag=dio04 stime=
1432259643
cmdline="diotest4"
contacts=""
analysis=exit
<<<test_output>>>
diotest4 1 TPASS : Negative Offset
diotest4 2 TPASS : removed
diotest4 3 TPASS : Odd count of read and write
diotest4 4 TPASS : Read beyond the file size
......
The reason is that when triggering DIO in f2fs, we will return zero value
in ->direct_IO if writer's buffer offset, file offset and transfer size is
not alignment to block size of filesystem, resulting in falling back into
buffered write instead of returning -EINVAL.
This patch fixes that problem by returning correct error number for above
case, and removing the judgement condition in check_direct_IO to make sure
the verification will be enabled for direct reader too.
Besides, Jaegeuk Kim pointed out that there is expectional cases we should
always make direct-io falling back into buffered write, such as dio in
encrypted file.
Signed-off-by: Yunlei He <heyunlei@huawei.com>
[Chao Yu make small change and add detail description in commit message]
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Fri, 7 Aug 2015 10:36:06 +0000 (18:36 +0800)]
f2fs: report error of fill_zero
fill_zero can fail due to a lot of reason, but previously we do not handle
its return value, so its callers such as punch_hole/f2fs_zero_range may
report success, but actually can fail because of error occurs inside
fill_zero.
This patch fixes to report correct return value of fill_zero.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 5 Aug 2015 09:23:54 +0000 (17:23 +0800)]
f2fs: recover invalid/reserved block address for fsynced file
When testing with generic/101 in xfstests, error message outputed as below:
--- tests/generic/101.out
+++ results//generic/101.out.bad
@@ -10,10 +10,14 @@
File foo content after log replay:
0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
*
-
0200000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
+
0200000 bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
*
0372000
...
(Run 'diff -u tests/generic/101.out results/generic/101.out.bad' to see the entire diff)
The test flow is like below:
1. pwrite foo -S 0xaa 0 64K
2. pwrite foo -S 0xbb 64K 61K
3. sync
4. truncate foo 64K
5. truncate foo 125K
6. fsync foo
7. flakey drop writes
8. umount
After this test, we expect the data of recovered file will have the first
64k of data filling with value 0xaa and the next 61k of data filling with
value 0x00 because we have fsynced it before dropping writes in dm.
In f2fs, during recovering, we will only recover the valid block address
in direct node page if it is marked as a fsynced dnode, but block address
which means invalid/reserved (with value NULL_ADDR/NEW_ADDR) will not be
recovered. So, the file recovered shows its incorrect data 0xbb in range of
[61k, 125k].
In this patch, we fix to recover invalid/reserved block during recover flow.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Fan Li [Wed, 5 Aug 2015 07:52:16 +0000 (15:52 +0800)]
f2fs: use extent cache to optimize f2fs_reserve_block
In some cases, we only need the block address when we call
f2fs_reserve_block,
other fields of struct dnode_of_data aren't necessary.
We can try extent cache first for such cases in order to speed up the
process.
Signed-off-by: Fan li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Tue, 28 Jul 2015 10:36:47 +0000 (18:36 +0800)]
f2fs: invalidate temporary meta page
To avoid meeting garbage data in next free node block at the end of warm
node chain when doing recovery, we will try to zero out that invalid block.
If the device is not support discard, our way for zeroing out block is:
grabbing a temporary zeroed page in meta inode, then, issue write request
with this page.
But, we forget to release that temporary page, so our memory usage will
increase without gaining any hit ratio benefit, so it's better to free it
for saving memory.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Tue, 14 Jul 2015 10:14:06 +0000 (18:14 +0800)]
f2fs: fix to release inode page correctly
In following call path, we will pass a locked and referenced ipage
pointer to get_new_data_page:
- init_inode_metadata
- make_empty_dir
- get_new_data_page
There are two exit paths in get_new_data_page when error occurs:
1) grab_cache_page fails, ipage will not be released;
2) f2fs_reserve_block fails, ipage will be released in callee.
So, it's not consistent for error handling in get_new_data_page.
For f2fs_reserve_block, it's not very easy to change the rule
of error handling, since it's already complicated.
Here we deside to choose an easy way to fix this issue:
If any error occur in get_new_data_page, we will ensure releasing
ipage in this function.
The same issue is in f2fs_convert_inline_dir, fix that too.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Liu Xue [Mon, 27 Jul 2015 10:17:59 +0000 (10:17 +0000)]
f2fs: unify f2fs_bug_on when check blocks and segment
Replace BUG_ON with f2fs_bug_on to deal with
block and segment validity check failed.
Signed-off-by: Xue Liu <liuxueliu.liu@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 29 Jul 2015 09:33:13 +0000 (17:33 +0800)]
f2fs: freeze filesystem when fail to update meta page due to IO error
In get_meta_page, we guarantee no failure for the returned page,
but sometimes, IO error from device will incur returning an
non-updated page.
Then, we still use this page as updated one, exception could happen
when using this kind of page.
So in this condition, we'd better freeze fs by making fs readonly and
and stop doing checkpoint.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Fan Li [Tue, 4 Aug 2015 05:27:51 +0000 (13:27 +0800)]
f2fs: change the timing of f2fs_wait_on_page_writeback
some backing devices need pages to be stable during writeback. It doesn't
matter if
the page is completely overwritten or already uptodate, it needs to wait
before write.
Signed-off-by: Fan li <fanofcode.li@samsung.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Sat, 25 Jul 2015 07:52:52 +0000 (00:52 -0700)]
f2fs: handle error cases in commit_inmem_pages
This patch adds to handle error cases in commit_inmem_pages.
If an error occurs, it stops to write the pages and return the error right
away.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Fri, 24 Jul 2015 10:26:26 +0000 (18:26 +0800)]
f2fs: fix to build free nids from readaheaded nat pages
When there is no enough free nids in free nid cache, we will try to
readahead FREE_NID_PAGES:4 nat pages into page cache of meta_inode,
then, reading nat entries in nat page for adding free nids to free nid
cache.
But when traversing all nat pages we readaheaded in a circulation,
our exit condition is not set right, one more nat page will be scanned
without readaheading, resulting worse read performance.
This patch fixes to read the correct number nat pages to avoid bad
performance.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Fri, 24 Jul 2015 10:24:45 +0000 (18:24 +0800)]
f2fs: fix inline data/dentry stat number leak
If we clear inline data/dentry flag in handle_failed_inode, we will fail
to decline the stat count of inline data/dentry in f2fs_evict_inode due
to no flag in inode. So remove the wrong clearing.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Fri, 17 Jul 2015 10:06:35 +0000 (18:06 +0800)]
f2fs: convert inline data before set atomic/volatile flag
In f2fs_ioc_start_{atomic,volatile}_write, if we failed in converting
inline data, we will report error to user, but still remain atomic/volatile
flag in inode, it will impact further writes for this file. Fix it.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Fri, 17 Jul 2015 10:05:21 +0000 (18:05 +0800)]
f2fs: fix to wait all atomic written pages writeback
This patch fixes the incorrect range (0, LONG_MAX) which is used
in ranged fsync. If we use LONG_MAX as the parameter for indicating
the end of file we want to synchronize, in 32-bits architecture
machine, these datas after 4GB offset may not be persisted in
storage after ->fsync returned.
Here, we alter LONG_MAX to LLONG_MAX to fix this issue.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Fri, 17 Jul 2015 10:02:39 +0000 (18:02 +0800)]
f2fs: skip writing in ->writepages when no dirty pages exist
When flushing comes from background, if there is no dirty page in the
mapping of inode, we'd better to skip seeking dirty page from mapping
for writebacking.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Tiezhu Yang [Fri, 17 Jul 2015 04:56:00 +0000 (12:56 +0800)]
f2fs: optimize f2fs_write_cache_pages
The if statement "goto continue_unlock" is exactly the same when
each if condition is true that is depended on the value of both
"step" and "is_cold_data(page)" are 0 or 1. That means when the
value of "step" equals to "is_cold_data(page)", the if condition
is true and the if statement "goto continue_unlock" appears only
once, so it can be optimized to reduce the duplicated code.
Signed-off-by: Tiezhu Yang <kernelpatch@126.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Thu, 16 Jul 2015 10:19:02 +0000 (18:19 +0800)]
f2fs: fix double lock in handle_failed_inode
In handle_failed_inode, there is a potential deadlock which can happen
in below call path:
- f2fs_create
- f2fs_lock_op down_read(cp_rwsem)
- f2fs_add_link
- __f2fs_add_link
- init_inode_metadata
- f2fs_init_security failed
- truncate_blocks failed
- handle_failed_inode
- f2fs_truncate
- truncate_blocks(..,true)
- write_checkpoint
- block_operations
- f2fs_lock_all down_write(cp_rwsem)
- f2fs_lock_op down_read(cp_rwsem)
So in this path, we pass parameter to f2fs_truncate to make sure
cp_rwsem in truncate_blocks will not be locked again.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Thu, 16 Jul 2015 10:18:11 +0000 (18:18 +0800)]
f2fs: reduce region of cp_rwsem covered in f2fs_do_collapse
In f2fs_do_collapse, region cp_rwsem covered is large, since it will be
held until all blocks are left shifted, so if we try to collapse small
area at the beginning of large file, checkpoint who want to grab writer's
lock of cp_rwsem will be delayed for long time.
In order to avoid this condition, altering to lock/unlock cp_rwsem each
shift operation.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Fan Li [Wed, 15 Jul 2015 10:05:17 +0000 (18:05 +0800)]
f2fs: add new interfaces for extent tree
Add a lookup and a insertion interface for extent tree.
The new lookup return the insert position and the prev/next
extents closest to the offset we lookup when find no match.
The new insertion uses above parameters to improve performance.
There are three possible insertions after the lookup in
f2fs_update_extent_tree, two of them insert parts of removed extent
back to tree, since no merge happens during this process, new insertion
skips the merge check in this scanario; the another insertion inserts a
new extent to tree, new insertion uses prev/next extent and insert
position to insert this extent directly, and save the time of searching
down the tree.
As long as tree remains unchanged between lookup and insertion, this
would work fine. And the new lookup would be useful when add
multi-blocks extent support for insertion interface.
Signed-off-by: Fan li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Wed, 15 Jul 2015 20:08:21 +0000 (13:08 -0700)]
f2fs: callers take care of the page from bio error
This patch changes for a caller to handle the page after its bio gets an error.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 15 Jul 2015 09:29:49 +0000 (17:29 +0800)]
f2fs: use atomic_t to record hit ratio info of extent cache
Variables for recording extent cache ratio info were updated without
protection, this patch tries to alter them to atomic_t type for more
accurate stat.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 15 Jul 2015 09:28:53 +0000 (17:28 +0800)]
f2fs: stat inline xattr inode number
This patch adds to stat the number of inline xattr inode for
showing in debugfs.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Tue, 14 Jul 2015 01:31:24 +0000 (18:31 -0700)]
f2fs: use a page temporarily for encrypted gced page
That encrypted page is used temporarily, so we don't need to mark it accessed.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Tue, 14 Jul 2015 10:56:10 +0000 (18:56 +0800)]
f2fs: expose f2fs_write_cache_pages
If there are gced dirty pages and normal dirty pages in the mapping
of one inode, we might writeback them alternately with discontinuous
block address, resulting in low performance.
This patch introduces f2fs_write_cache_pages with codes copied from
write_cache_pages in mm/page-writeback.c.
In this function, we refactor flow with two steps:
1) writeback all cold type pages.
2) writeback all non-cold type pages.
By using this method, f2fs will writeback dirty pages with the same
temperature in bunch mode, it makes writeouted block being with
more continuous address, so they can be merged as much as possible
in f2fs bio cache, and also it will reduce the chance of submiting
small IO from block layer.
Test environment: 8g nokia sd card (very old sd card, but it shows
better effect when testing with this patch, and with a 32g kingston
sd card, I didn't see much more improvement).
Test step:
1. touch testfile;
2. truncate -s 512K testfile;
3. write all pages with odd index;
4. trigger gc by ioctl;
5. write all pages with even index;
6. time fsync testfile.
before:
real 0m0.402s
user 0m0.000s
sys 0m0.000s
after:
real 0m0.143s
user 0m0.004s
sys 0m0.004s
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 13 Jul 2015 09:45:19 +0000 (17:45 +0800)]
f2fs: correct return value of ->setxattr
This patch fixes to return correct error number of ->setxattr, which
is reported by xfstest tests/generic/026 as below:
generic/026 - output mismatch
--- tests/generic/026.out
+++ results/generic/026.out.bad
@@ -4,6 +4,6 @@
1 below acl max
acl max
1 above acl max
-chacl: cannot set access acl on "largeaclfile": Argument list too long
+chacl: cannot set access acl on "largeaclfile": Numerical result out of range
use 16 aces
use 17 aces
...
Ran: generic/026
Failures: generic/026
Failed 1 of 1 tests
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 13 Jul 2015 09:44:25 +0000 (17:44 +0800)]
f2fs: cleanup write_orphan_inodes
Previously, since 'commit
4531929e3922 ("f2fs: move grabing orphan
pages out of protection region")' was committed, in write_orphan_inodes(),
we will grab all meta page in a batch before we use them under spinlock,
so that we can avoid large time delay of grabbing meta pages under
spinlock.
Now, 'commit
d6c67a4fee86 ("f2fs: revmove spin_lock for
write_orphan_inodes")' remove the spinlock in write_orphan_inodes,
so there is no issue we describe above, we'd better recover to move
the grab operation to original place for readability.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 13 Jul 2015 09:43:19 +0000 (17:43 +0800)]
f2fs: warm up cold page after mmaped write
With cost-benifit method, background gc will consider old section with
fewer valid blocks as candidate victim, these old blocks in section will
be treated as cold data, and laterly will be moved into cold segment.
But if the gcing page is attached by user through buffered or mmaped
write, we should reset the page as non-cold one, because this page may
have more opportunity for further updating.
So fix to add clearing code for the missed 'mmap' case.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Fri, 10 Jul 2015 10:08:10 +0000 (18:08 +0800)]
f2fs: add new ioctl F2FS_IOC_GARBAGE_COLLECT
When background gc is off, the only way to trigger gc is executing
a force gc in some operations who wants to grab space in disk.
The executing condition is limited: to execute force gc, we should
wait for the time when there is almost no more free section for LFS
allocation. This seems not reasonable for our user who wants to
control triggering gc by himself.
This patch introduces F2FS_IOC_GARBAGE_COLLECT interface for
triggering garbage collection by using ioctl. It provides our users
one more option to trigger gc.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 8 Jul 2015 09:59:36 +0000 (17:59 +0800)]
f2fs: maintain extent cache in separated file
This patch moves extent cache related code from data.c into extent_cache.c
since extent cache is independent feature, and its codes are not relate to
others in data.c, it's better for us to maintain them in separated place.
There is no functionality change, but several small coding style fixes
including:
* rename __drop_largest_extent to f2fs_drop_largest_extent for exporting;
* rename misspelled word 'untill' to 'until';
* remove unneeded 'return' in the end of f2fs_destroy_extent_tree().
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Fan Li [Wed, 8 Jul 2015 08:02:54 +0000 (16:02 +0800)]
f2fs: don't try to split extents shorter than F2FS_MIN_EXTENT_LEN
Since only parts of extents longer than F2FS_MIN_EXTENT_LEN will
be kept in extent cache after split, extents already shorter than
F2FS_MIN_EXTENT_LEN don't need to try split at all.
Signed-off-by: Fan Li <fanofcode.li@samsung.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 8 Jul 2015 10:24:38 +0000 (18:24 +0800)]
f2fs: fix to update page flag
This patch fixes to update page flag (e.g. Uptodate/cold flag) in
->write_begin.
Otherwise, page will be non-uptodate when we try to write entire
page, and cold data flag in page will not be clean when gced page
is being rewritten.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Mon, 29 Jun 2015 23:34:39 +0000 (16:34 -0700)]
f2fs: shrink unreferenced extent_caches first
If an extent_tree entry has a zero reference count, we can drop it from the
cache in higher priority rather than currently referencing entries.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 6 Jul 2015 12:31:49 +0000 (20:31 +0800)]
f2fs: enhance multithread performance
In ->writepages, we use writepages mutex lock to serialize all block
address allocation and page submitting pairs from different inodes.
This method makes our delayed dirty pages of one inode being written
continously as many as possible.
But there is one problem that we did not submit current cached bio in
protection region of writepages mutex lock, so there is a small chance
that we submit the one of other thread's as below, resulting in
splitting more bios.
thread 1 thread 2
->writepages
lock(writepages)
->write_cache_pages
unlock(writepages)
lock(writepages)
->write_cache_pages
->f2fs_submit_merged_bio
->writepage
unlock(writepages)
fs_mark-6535 [002] .... 2242.270230: f2fs_submit_write_bio: dev = (1,0), WRITE_SYNC, DATA, sector =
5766152, size = 524288
fs_mark-6536 [000] .... 2242.270361: f2fs_submit_write_bio: dev = (1,0), WRITE_SYNC, DATA, sector =
5767176, size = 4096
fs_mark-6536 [000] .... 2242.270370: f2fs_submit_write_bio: dev = (1,0), WRITE_SYNC, NODE, sector =
8138112, size = 4096
fs_mark-6535 [002] .... 2242.270776: f2fs_submit_write_bio: dev = (1,0), WRITE_SYNC, DATA, sector =
5767184, size = 516096
This may really increase time of block layer works, and may cause
larger IO lantency.
This patch moves the submitting operation into region of writepages
mutex lock to avoid bio splits when concurrently writebacking is
intensive.
my test environment: virtual machine,
intel cpu i5 2500, 8GB size memory, 4GB size ramdisk
time fs_mark -t 16 -L 1 -s 524288 -S 1 -d /mnt/f2fs/
before:
real 0m4.244s
user 0m0.088s
sys 0m12.336s
after:
real 0m3.822s
user 0m0.072s
sys 0m10.760s
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 6 Jul 2015 12:30:40 +0000 (20:30 +0800)]
f2fs: restrict multimedia filename
When testing with fs_mark, some blocks were written out as cold
data which were mixed with warm data, resulting in splitting more
bios.
This is because fs_mark will create file with random filename as
below:
559551ee~~~~~~~~15Z29OCC05JCKQP60JQ42MKV
559551ee~~~~~~~~NZAZ6X8OA8LHIIP6XD0L58RM
559551ef~~~~~~~~B15YDSWAK789HPSDZKYTW6WM
559551f1~~~~~~~~2DAE5DPS79785BUNTFWBEMP3
559551f1~~~~~~~~1MYDY0BKSQCJPI32Q8C514RM
559551f1~~~~~~~~YQOTMAOMN5CVRFOUNI026MP4
559551f3~~~~~~~~1WF42LPRTQJNPPGR3EINKMPE
559551f3~~~~~~~~8Y2NRK7CEPPAA02LY936PJPG
They are regarded as cold file since their filename are ended with
multimedia files' extension, but this should be wrong as we only
match the extension of filename, not the whole one.
In this patch, we try to fix the format of multimedia filename to:
"filename + '.' + extension", then we set cold file only its
filename matches the format.
So after this change, it will reduce the probability we set the
wrong cold file, also it helps a little for fs_mark's performance
on f2fs.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 6 Jul 2015 12:29:46 +0000 (20:29 +0800)]
MAINTAINERS: add missed trace file for f2fs
This patch adds missed trace file in maintainer-ship of f2fs,
so it completes the description of files maintained in f2fs,
and also it allows people to find correct mailing list by using
get_maintainer.pl when only patching the trace file of f2fs.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Nicholas Krause [Wed, 1 Jul 2015 01:37:21 +0000 (21:37 -0400)]
f2fs: make the function check_dnode have a return type of bool and change it's name to is_alive
This makes the function check_dnode have a return type of bool
due to this particular function only ever returning either one
or zero as its return value and changes the name of the function
to is_alive in order to better explain this function's intended
work of checking if a dnode is still in use by the filesystem.
Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
[Jaegeuk Kim: change the return value check for the renamed function]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Mon, 29 Jun 2015 23:01:14 +0000 (16:01 -0700)]
f2fs: check the largest extent at look-up time
Because of the extent shrinker or other -ENOMEM scenarios, it cannot guarantee
that the largest extent would be cached in the tree all the time.
Instead of relying on extent_tree, we can simply check the cached one in extent
tree accordingly.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Sat, 20 Jun 2015 00:53:26 +0000 (17:53 -0700)]
f2fs: use extent_cache by default
We don't need to handle the duplicate extent information.
The integrated rule is:
- update on-disk extent with largest one tracked by in-memory extent_cache
- destroy extent_tree for the truncation case
- drop per-inode extent_cache by shrinker
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Fri, 26 Jun 2015 00:43:04 +0000 (17:43 -0700)]
f2fs: add noextent_cache mount option
This patch adds noextent_cache mount option.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Fri, 19 Jun 2015 20:41:23 +0000 (13:41 -0700)]
f2fs: shrink extent_cache entries
This patch registers shrinking extent_caches.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Fri, 19 Jun 2015 22:36:07 +0000 (15:36 -0700)]
f2fs: shrink nat_cache entries
This patch registers shrinking nat_cache entries.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Fri, 19 Jun 2015 19:01:21 +0000 (12:01 -0700)]
f2fs: introduce a shrinker for mounted fs
This patch introduces a shrinker targeting to reduce memory footprint consumed
by a number of in-memory f2fs data structures.
In addition, it newly adds:
- sbi->umount_mutex to avoid data races on shrinker and put_super
- sbi->shruinker_run_no to not revisit objects
Note that the basic implementation was copied from fs/ubifs/shrinker.c
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Tue, 23 Jun 2015 01:22:38 +0000 (18:22 -0700)]
f2fs: set cached_en after checking finally
This patch relocates cached_en not only to be covered by spin_lock, but also
to set once after checking out completely.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Tue, 16 Jun 2015 22:17:01 +0000 (15:17 -0700)]
f2fs: update on-disk extents even under extent_cache
Previously, f2fs_update_extent_cache() updates in-memory extent_cache all the
time, and then finally preserves its up-to-date extent into on-disk one during
f2fs_evict_inode.
But, in the following scenario:
1. mount
2. open & write an extent X
3. f2fs_evict_inode; on-disk extent is X
4. open & update the extent X with Y
5. sync; trigger checkpoint
6. power-cut
after power-on, f2fs should serve extent Y, but we have an on-disk extent X.
This causes a failure on xfstests/311.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Thu, 18 Jun 2015 21:17:04 +0000 (14:17 -0700)]
f2fs: fix wrong block address calculation for a split extent
This patch fixes wrong calculation on block address field when an extent is
split.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Wed, 17 Jun 2015 20:59:05 +0000 (13:59 -0700)]
f2fs: convert inline_data for various fallocate
For newly added fallocate types, it should convert inline_data before handling
block swapping.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Tue, 23 Jun 2015 17:36:08 +0000 (10:36 -0700)]
f2fs: avoid to use failed inode immediately
Before iput is called, the inode number used by a bad inode can be reassigned
to other new inode, resulting in any abnormal behaviors on the new inode.
This should not happen for the new inode.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Mon, 15 Jun 2015 21:52:29 +0000 (14:52 -0700)]
f2fs: avoid freed stat information
The write_checkpoint can update stat information, so we should destroy the stat
structure after it.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 29 Jun 2015 10:14:10 +0000 (18:14 +0800)]
f2fs: fix to record dirty page count for symlink
Dirty page can be exist in mapping of newly created symlink, but previously
we did not maintain the counting of dirty page for symlink like we maintained
for regular/directory, so the counting we lookuped should be wrong.
This patch adds missed dirty page counting for symlink to fix this issue.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Markus Elfring [Fri, 26 Jun 2015 15:28:55 +0000 (17:28 +0200)]
f2fs crypto: delete an unnecessary check before the function call "key_put"
The key_put() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Linus Torvalds [Tue, 4 Aug 2015 16:27:19 +0000 (09:27 -0700)]
Merge tag 'pci-v4.2-fixes-1' of git://git./linux/kernel/git/helgaas/pci
Pull PCI fix from Bjorn Helgaas:
"This is a trivial fix for a change that broke user program compilation
(QEMU in this case)"
* tag 'pci-v4.2-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: Restore PCI_MSIX_FLAGS_BIRMASK definition
Linus Torvalds [Tue, 4 Aug 2015 15:51:06 +0000 (08:51 -0700)]
Merge tag 'topic/mst-fixes-2015-08-04' of git://anongit.freedesktop.org/drm-intel
Pull drm mst fixes from Daniel Vetter:
"Special pull request for mst fixes since most of the patches touch
code outside of i915 proper. DRM parts have also been reviewed by
Thierry (nvidia) since Dave's enjoying vacations"
* tag 'topic/mst-fixes-2015-08-04' of git://anongit.freedesktop.org/drm-intel:
drm/atomic-helpers: Make encoder picking more robust
drm/dp-mst: Remove debug WARN_ON
drm/i915: Fixup dp mst encoder selection
drm/atomic-helper: Add an atomice best_encoder callback
Linus Torvalds [Tue, 4 Aug 2015 15:49:08 +0000 (08:49 -0700)]
Merge tag 'for-linus-4.2-rc5-tag' of git://git./linux/kernel/git/xen/tip
Pull xen bug fixes from David Vrabel:
- don't lose interrupts when offlining CPUs
- fix gntdev oops during unmap
- drop the balloon lock occasionally to allow domain create/destroy
* tag 'for-linus-4.2-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/events/fifo: Handle linked events when closing a port
xen: release lock occasionally during ballooning
xen/gntdevt: Fix race condition in gntdev_release()
Ross Lagerwall [Fri, 31 Jul 2015 13:30:42 +0000 (14:30 +0100)]
xen/events/fifo: Handle linked events when closing a port
An event channel bound to a CPU that was offlined may still be linked
on that CPU's queue. If this event channel is closed and reused,
subsequent events will be lost because the event channel is never
unlinked and thus cannot be linked onto the correct queue.
When a channel is closed and the event is still linked into a queue,
ensure that it is unlinked before completing.
If the CPU to which the event channel bound is online, spin until the
event is handled by that CPU. If that CPU is offline, it can't handle
the event, so clear the event queue during the close, dropping the
events.
This fixes the missing interrupts (and subsequent disk stalls etc.)
when offlining a CPU.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Linus Torvalds [Tue, 4 Aug 2015 13:57:32 +0000 (06:57 -0700)]
Merge branch 'rc-fixes' of git://git./linux/kernel/git/mmarek/kbuild
Pull kbuild fixes from Michal Marek:
"Two fixes for kbuild:
- The new ARCH_{CPP,A,C}FLAGS variables are reset before including
the arch Makefile
- Fix calling make modules_install twice when module compression is
enabled"
* 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
Makefile: Force gzip and xz on module install
kbuild: Do not pick up ARCH_{CPP,A,C}FLAGS from the environment
Daniel Vetter [Mon, 3 Aug 2015 15:24:11 +0000 (17:24 +0200)]
drm/atomic-helpers: Make encoder picking more robust
We've had a few issues with atomic where subtle bugs in the encoder
picking logic lead to accidental self-stealing of the encoder,
resulting in a NULL connector_state->crtc in update_connector_routing
and subsequent.
Linus applied some duct-tape for an mst regression in
commit
27667f4744fc5a0f3e50910e78740bac5670d18b
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date: Wed Jul 29 22:18:16 2015 -0700
i915: temporary fix for DP MST docking station NULL pointer dereference
But that was incomplete (the code will still oops when debuggin is
enabled) and mangled the state even further. So instead WARN and bail
out as the more future-proof option.
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Daniel Vetter [Mon, 3 Aug 2015 15:24:10 +0000 (17:24 +0200)]
drm/dp-mst: Remove debug WARN_ON
Apparently been in there since forever and fairly easy to hit when
hotplugging really fast. I can do that since my mst hub has a manual
button to flick the hpd line for reprobing. The resulting WARNING spam
isn't pretty.
Cc: Dave Airlie <airlied@gmail.com>
Cc: stable@vger.kernel.org
Reviewed-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Daniel Vetter [Mon, 3 Aug 2015 15:24:09 +0000 (17:24 +0200)]
drm/i915: Fixup dp mst encoder selection
In
commit
8c7b5ccb729870e606321b3703e2c2e698c49a95
Author: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Date: Tue Apr 21 17:13:19 2015 +0300
drm/i915: Use atomic helpers for computing changed flags
we've switched over to the atomic version to compute the
crtc->encoder->connector routing from the i915 variant. That one
relies upon the ->best_encoder callback, but the i915-private version
relied upon intel_find_encoder. Which didn't matter except for dp mst,
where the encoder depends upon the selected crtc.
Fix this functional bug by implemented a correct atomic-state based
encoder selector for dp mst.
Note that we can't get rid of the legacy best_encoder callback since
the fbdev emulation uses that still. That means it's incorrect there
still, but that's been the case ever since i915 dp mst support was
merged so not a regression. Best to fix that by converting fbdev over
to atomic too.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Daniel Vetter [Mon, 3 Aug 2015 15:24:08 +0000 (17:24 +0200)]
drm/atomic-helper: Add an atomice best_encoder callback
With legacy helpers all the routing was already set up when calling
best_encoder and so could be inspected. But with atomic it's staged,
hence we need a new atomic compliant callback for drivers which need
to inspect the requested state and can't just decided the best encoder
statically.
This is needed to fix up i915 dp mst where we need to pick the right
encoder depending upon the requested CRTC for the connector.
v2: Don't forget to amend the kerneldoc
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Acked-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Linus Torvalds [Mon, 3 Aug 2015 21:51:30 +0000 (14:51 -0700)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"A refcounting bugfix for the i2c-core, bugfixes for the generic bus
recovery algorithm and for its omap-user, making binary file
attributes for EEPROMs behave POSIX compliant, and a small typo fix
while we are here"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: fix leaked device refcount on of_find_i2c_* error path
i2c: Fix typo in i2c-bfin-twi.c
i2c: omap: fix bus recovery setup
i2c: core: only use set_scl for bus recovery after calling prepare_recovery
misc: eeprom: at24: clean up at24_bin_write()
i2c: slave eeprom: clean up sysfs bin attribute read()/write()
Linus Torvalds [Mon, 3 Aug 2015 18:09:07 +0000 (11:09 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/sage/ceph-client
Pull Ceph fixes from Sage Weil:
"There are two critical regression fixes for CephFS from Zheng, and an
RBD completion fix for layered images from Ilya"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
rbd: fix copyup completion race
ceph: always re-send cap flushes when MDS recovers
ceph: fix ceph_encode_locks_to_buffer()
Linus Torvalds [Mon, 3 Aug 2015 18:00:53 +0000 (11:00 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/linux-security
Pull security layer fix from James Morris:
"Yama initialization fix"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
Adding YAMA hooks also when YAMA is not stacked.
Linus Torvalds [Mon, 3 Aug 2015 17:53:58 +0000 (10:53 -0700)]
Merge git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
"This fixes the following issues:
- a bogus BUG_ON in ixp4xx that can be triggered by a dst buffer that
is an SG list.
- the error handling in hwrngd may cause a crash in case of an error.
- fix a race condition in qat registration when multiple devices are
present"
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
hwrng: core - correct error check of kthread_run call
crypto: ixp4xx - Remove bogus BUG_ON on scattered dst buffer
crypto: qat - Fix invalid synchronization between register/unregister sym algs
Linus Torvalds [Mon, 3 Aug 2015 17:25:32 +0000 (10:25 -0700)]
Merge tag 'fixes-for-linus' of git://git./linux/kernel/git/rusty/linux
Pull module fix from Rusty Russell:
"Single overzealous locking assertion fix"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
module: weaken locking assertion for oops path.
Salvatore Mesoraca [Mon, 3 Aug 2015 10:40:51 +0000 (12:40 +0200)]
Adding YAMA hooks also when YAMA is not stacked.
Without this patch YAMA will not work at all if it is chosen
as the primary LSM instead of being "stacked".
Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>