GitHub/moto-9609/android_kernel_motorola_exynos9610.git
7 years agof2fs: new helper cur_cp_crc() getting crc in f2fs_checkpoint
Kinglong Mee [Sat, 25 Feb 2017 11:53:39 +0000 (19:53 +0800)]
f2fs: new helper cur_cp_crc() getting crc in f2fs_checkpoint

There are four places that getting the crc value in f2fs_checkpoint,
just add a new helper cur_cp_crc for them.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: update the comment of default nr_pages to skipping
Kinglong Mee [Sat, 25 Feb 2017 11:32:21 +0000 (19:32 +0800)]
f2fs: update the comment of default nr_pages to skipping

Fixes: 2c237ebaa4 ("f2fs: avoid writing node/metapages during writes")
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: drop the duplicate pval in f2fs_getxattr
Kinglong Mee [Sat, 25 Feb 2017 11:23:40 +0000 (19:23 +0800)]
f2fs: drop the duplicate pval in f2fs_getxattr

Fixes: ba38c27eb9 ("f2fs: enhance lookup xattr")
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: Don't update the xattr data that same as the exist
Kinglong Mee [Sat, 25 Feb 2017 11:23:27 +0000 (19:23 +0800)]
f2fs: Don't update the xattr data that same as the exist

f2fs removes the old xattr data and appends the new data although
the new data is same as the exist.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: kill __is_extent_same
Chao Yu [Sat, 25 Feb 2017 09:29:54 +0000 (17:29 +0800)]
f2fs: kill __is_extent_same

Since commit ee6d182f2a19 ("f2fs: remove syncing inode page in all the
cases") delayed inode element updating from inode cache to node page
cache, so once largest cached extent is updated, we can make inode dirty
immediately instead of checking and updating it in the end of extent
cache update.

The above commit didn't clean up unneeded codes in extent_cache.c, let's
finish the job in this patch.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: avoid bggc->fggc when enough free segments are avaliable after cp
Hou Pengyang [Sat, 25 Feb 2017 03:57:38 +0000 (03:57 +0000)]
f2fs: avoid bggc->fggc when enough free segments are avaliable after cp

We use has_not_enough_free_secs to check if there are enough free segments,

     (free_sections(sbi) + freed) <=
        (node_secs + 2 * dent_secs + imeta_secs +
         reserved_sections(sbi) + needed);

Under scenario with large number of dirty nodes, these nodes would be flushed
during cp, as a result, right side of the inequality would be decreased, while
left side stays unchanged if these nodes are flushed in SSR way, which means
there are enough free segments after this cp.

For this case, we just do a bggc instead of fggc.

Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: select target segment with closer temperature in SSR mode
Chao Yu [Fri, 24 Feb 2017 10:46:00 +0000 (18:46 +0800)]
f2fs: select target segment with closer temperature in SSR mode

In SSR mode, we can allocate target segment which has different
temperature type from the type of current block, in order to avoid
mixing coldest and hottest data/node as much as possible, change
SSR allocation policy to select closer temperature for current
block prior.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: show simple call stack in fault injection message
Chao Yu [Sat, 25 Feb 2017 03:08:28 +0000 (11:08 +0800)]
f2fs: show simple call stack in fault injection message

Previously kernel message can show that in which function we do the
injection, but unfortunately, most of the caller are the same, for
tracking more information of injection path, it needs to show upper
caller's name. This patch supports that ability.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: no need lock_op in f2fs_write_inline_data
Yunlei He [Thu, 23 Feb 2017 12:31:20 +0000 (20:31 +0800)]
f2fs: no need lock_op in f2fs_write_inline_data

Similar as f2fs_write_inode, f2fs_write_inline_data just
mark inode page dirty, so it's no need to write inline data
under read lock of cp_rwsem.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: add bitmaps for empty or full NAT blocks
Jaegeuk Kim [Thu, 9 Feb 2017 18:38:09 +0000 (10:38 -0800)]
f2fs: add bitmaps for empty or full NAT blocks

This patches adds bitmaps to represent empty or full NAT blocks containing
free nid entries.

If we can find valid crc|cp_ver in the last block of checkpoint pack, we'll
use these bitmaps when building free nids. In order to avoid checkpointing
burden, up-to-date bitmaps will be flushed only during umount time. So,
normally we can get this gain, but when power-cut happens, we rely on fsck.f2fs
which recovers this bitmap again.

After this patch, we build free nids from nid #0 at mount time to make more
full NAT blocks, but in runtime, we check empty NAT blocks to load free nids
without loading any NAT pages from disk.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: replace rw semaphore extent_tree_lock with mutex lock
Yunlei He [Thu, 23 Feb 2017 11:39:59 +0000 (19:39 +0800)]
f2fs: replace rw semaphore extent_tree_lock with mutex lock

This patch replace rw semaphore extent_tree_lock with mutex lock
for no read cases with this lock.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: avoid m_flags overlay when allocating more data blocks
Kinglong Mee [Thu, 23 Feb 2017 11:55:05 +0000 (19:55 +0800)]
f2fs: avoid m_flags overlay when allocating more data blocks

When more than one data blocks are allocated, the F2FS_MAP_UNWRITTEN/MAPPED
flags will be overlapped by F2FS_MAP_NEW at the later times.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: remove unsafe bitmap checking
Hou Pengyang [Thu, 23 Feb 2017 09:18:06 +0000 (09:18 +0000)]
f2fs: remove unsafe bitmap checking

proc A:                      proc B:
- writeback_sb_inodes
- __writeback_single_inode
- do_writepages
- f2fs_write_node_pages
- f2fs_balance_fs_bg         - write_checkpoint
- build_free_nids            - flush_nat_entries
- __build_free_nids          - __flush_nat_entry_set
- ra_meta_pages              - get_next_nat_page
- current_nat_addr           - set_to_next_nat
[do nat_bitmap checking]     - f2fs_change_bit

For proc A, nat_bitmap and nat_bitmap_mir would be compared without lock_op and
nm_i->nat_tree_lock, while proc B is changing nat_bitmap/nat_bitmap_ver in cp.

So it is normal for nat_bitmap/nat_bitmap diffrence under such scenario.

This patch fix this by removing the monitoring point.

[Fix: 599a09b f2fs: check in-memory nat version bitmap]
Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: init local extent_info to avoid stale stack info in tp
Hou Pengyang [Thu, 23 Feb 2017 09:18:05 +0000 (09:18 +0000)]
f2fs: init local extent_info to avoid stale stack info in tp

To avoid such stale(fops, blk, len) info in f2fs_lookup_extent_tree_end tp

dio-23095 [005] ...1 17878.856859: f2fs_lookup_extent_tree_end:
dev = (259,30), ino = 856, pgofs = 0,
ext_info(fofs: 3441207040, blk: 4294967232, len: 3481143808)

Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: remove unnecessary condition check for write_checkpoint in f2fs_gc
Yunlong Song [Tue, 21 Feb 2017 12:43:48 +0000 (20:43 +0800)]
f2fs: remove unnecessary condition check for write_checkpoint in f2fs_gc

Since has_not_enough_free_secs(sbi, 0, 0) must be true if has_not_enough_
free_secs(sbi, sec_freed, 0) is true, write_checkpoint is sure to execute in
both conditions.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: check discard alignment only for SEQWRITE zones
Jaegeuk Kim [Thu, 23 Feb 2017 04:18:35 +0000 (20:18 -0800)]
f2fs: check discard alignment only for SEQWRITE zones

For converntional zones, we don't need to align discard commands to exact zone
size.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: wait for discard completion after submission
Jaegeuk Kim [Thu, 23 Feb 2017 03:58:23 +0000 (19:58 -0800)]
f2fs: wait for discard completion after submission

We don't need to wait for each discard commands when unmounting the image.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: much larger batched trim_fs job
Jaegeuk Kim [Thu, 23 Feb 2017 03:53:07 +0000 (19:53 -0800)]
f2fs: much larger batched trim_fs job

We have a kernel thread to issue discard commands, so we can increase the
number of batched discard sections. By default, now it becomes 4GB range.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: avoid very large discard command
Jaegeuk Kim [Thu, 23 Feb 2017 03:10:35 +0000 (19:10 -0800)]
f2fs: avoid very large discard command

This patch adds MAX_DISCARD_BLOCKS() to avoid issuing too much large single
discard command.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: do SSR for node segments more aggresively
Jaegeuk Kim [Thu, 23 Feb 2017 01:02:32 +0000 (17:02 -0800)]
f2fs: do SSR for node segments more aggresively

This patch gives more SSR chances for node blocks.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: find data segments across all the types
Jaegeuk Kim [Thu, 23 Feb 2017 01:10:18 +0000 (17:10 -0800)]
f2fs: find data segments across all the types

Previously, if type is CURSEG_HOT_DATA, we only check CURSEG_HOT_DATA only.
This patch fixes to search all the different types for SSR.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: do SSR in higher priority
Jaegeuk Kim [Thu, 23 Feb 2017 00:39:11 +0000 (16:39 -0800)]
f2fs: do SSR in higher priority

Let's check SSR in prior to LFS allocation.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: do SSR for data when there is enough free space
Yunlong Song [Wed, 22 Feb 2017 12:50:49 +0000 (20:50 +0800)]
f2fs: do SSR for data when there is enough free space

In allocate_segment_by_default(), need_SSR() already detected it's time to do
SSR. So, let's try to find victims for data segments more aggressively in time.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: node segment is prior to data segment selected victim
Hou Pengyang [Wed, 22 Feb 2017 10:28:59 +0000 (10:28 +0000)]
f2fs: node segment is prior to data segment selected victim

As data segment gc may lead dnode dirty, so the greedy cost for data segment
should be valid blocks * 2, that is data segment is prior to node segment.

Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: put allocate_segment after refresh_sit_entry
Yunlong Song [Tue, 21 Feb 2017 08:59:26 +0000 (16:59 +0800)]
f2fs: put allocate_segment after refresh_sit_entry

SIT information should be updated before segment allocation, since SSR needs
latest valid block information. Current code does not update the old_blkaddr
info in sit_entry, so adjust the allocate_segment to its proper location. Commit
5e443818fa0b2a2845561ee25bec181424fb2889 ("f2fs: handle dirty segments inside
refresh_sit_entry") puts it into wrong location.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: add ovp valid_blocks check for bg gc victim to fg_gc
Hou Pengyang [Thu, 16 Feb 2017 12:34:31 +0000 (12:34 +0000)]
f2fs: add ovp valid_blocks check for bg gc victim to fg_gc

For foreground gc, greedy algorithm should be adapted, which makes
this formula work well:

(2 * (100 / config.overprovision + 1) + 6)

But currently, we fg_gc have a prior to select bg_gc victim segments to gc
first, these victims are selected by cost-benefit algorithm, we can't guarantee
such segments have the small valid blocks, which may destroy the f2fs rule, on
the worstest case, would consume all the free segments.

This patch fix this by add a filter in check_bg_victims, if segment's has # of
valid blocks over overprovision ratio, skip such segments.

Cc: <stable@vger.kernel.org>
Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: do not wait for writeback in write_begin
Jaegeuk Kim [Fri, 17 Feb 2017 17:55:55 +0000 (09:55 -0800)]
f2fs: do not wait for writeback in write_begin

Otherwise we can get livelock like below.

[79880.428136] dbench          D    0 18405  18404 0x00000000
[79880.428139] Call Trace:
[79880.428142]  __schedule+0x219/0x6b0
[79880.428144]  schedule+0x36/0x80
[79880.428147]  schedule_timeout+0x243/0x2e0
[79880.428152]  ? update_sd_lb_stats+0x16b/0x5f0
[79880.428155]  ? ktime_get+0x3c/0xb0
[79880.428157]  io_schedule_timeout+0xa6/0x110
[79880.428161]  __lock_page+0xf7/0x130
[79880.428164]  ? unlock_page+0x30/0x30
[79880.428167]  pagecache_get_page+0x16b/0x250
[79880.428171]  grab_cache_page_write_begin+0x20/0x40
[79880.428182]  f2fs_write_begin+0xa2/0xdb0 [f2fs]
[79880.428192]  ? f2fs_mark_inode_dirty_sync+0x16/0x30 [f2fs]
[79880.428197]  ? kmem_cache_free+0x79/0x200
[79880.428203]  ? __mark_inode_dirty+0x17f/0x360
[79880.428206]  generic_perform_write+0xbb/0x190
[79880.428213]  ? file_update_time+0xa4/0xf0
[79880.428217]  __generic_file_write_iter+0x19b/0x1e0
[79880.428226]  f2fs_file_write_iter+0x9c/0x180 [f2fs]
[79880.428231]  __vfs_write+0xc5/0x140
[79880.428235]  vfs_write+0xb2/0x1b0
[79880.428238]  SyS_write+0x46/0xa0
[79880.428242]  entry_SYSCALL_64_fastpath+0x1e/0xad

Fixes: cae96a5c8ab6 ("f2fs: check io submission more precisely")
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: replace __get_victim by dirty_segments in FG_GC
Yunlei He [Fri, 17 Feb 2017 09:16:38 +0000 (17:16 +0800)]
f2fs: replace __get_victim by dirty_segments in FG_GC

In FG_GC process, it will search victim section twice. This will
cause some dirty section with less valid blocks skip garbage
collection.

section # 26425 : valid blocks # 3
142.037567: get_victim_by_default: victim 26425 : valid blocks # 3
142.037585: f2fs_get_victim: dev = (259,30), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 26425 ofs_unit = 1, pre_victim_secno = 26425, prefree = 0, free = 244
142.039494: f2fs_get_victim: dev = (259,30), type = Hot DATA, policy = (Background GC, SSR-mode, Greedy), victim = 19022 ofs_unit = 1, pre_victim_secno = 26425, prefree = 0, free = 24
142.070247: new_curseg: Debug: alloc new segment 26746
142.244341: f2fs_get_victim: dev = (259,30), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 26054 ofs_unit = 1, pre_victim_secno = 26054, prefree = 0, free = 243
142.254475: do_garbage_collect: Debug: FG_GC, seg_freed = 1
142.293131: f2fs_get_victim: dev = (259,30), type = Warm DATA, policy = (Background GC, SSR-mode, Greedy), victim = 23466 ofs_unit = 1, pre_victim_secno = -1, prefree = 0, free = 244
142.319001: f2fs_get_victim: dev = (259,30), type = Warm DATA, policy = (Background GC, SSR-mode, Greedy), victim = 23467 ofs_unit = 1, pre_victim_secno = -1, prefree = 0, free = 244
142.368879: get_victim_by_default: victim 26425 : valid blocks # 3
142.368894: f2fs_get_victim: dev = (259,30), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 26425 ofs_unit = 1, pre_victim_secno = 26425, prefree = 0, free = 244
142.378127: f2fs_get_victim: dev = (259,30), type = Hot DATA, policy = (Background GC, SSR-mode, Greedy), victim = 19612 ofs_unit = 1, pre_victim_secno = 26425, prefree = 0, free = 24
142.416917: new_curseg: Debug: alloc new segment 26054
142.656794: f2fs_get_victim: dev = (259,30), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 25404 ofs_unit = 1, pre_victim_secno = 25404, prefree = 0, free = 243
142.662139: do_garbage_collect: Debug: FG_GC, seg_freed = 1
142.684159: new_curseg: Debug: alloc new segment 25197
142.685059: get_victim_by_default: victim 26425 : valid blocks # 3
142.685079: f2fs_get_victim: dev = (259,30), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 26425 ofs_unit = 1, pre_victim_secno = 26425, prefree = 0, free = 243
142.701427: f2fs_get_victim: dev = (259,30), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 26238 ofs_unit = 1, pre_victim_secno = 26238, prefree = 0, free = 243
142.707105: do_garbage_collect: Debug: FG_GC, seg_freed = 1
142.802444: f2fs_get_victim: dev = (259,30), type = Warm DATA, policy = (Background GC, SSR-mode, Greedy), victim = 23473 ofs_unit = 1, pre_victim_secno = -1, prefree = 0, free = 244
142.804422: get_victim_by_default: victim 26425 : valid blocks # 3
142.804443: f2fs_get_victim: dev = (259,30), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 26425 ofs_unit = 1, pre_victim_secno = 26425, prefree = 0, free = 244
142.851567: f2fs_get_victim: dev = (259,30), type = Hot DATA, policy = (Background GC, SSR-mode, Greedy), victim = 19092 ofs_unit = 1, pre_victim_secno = 26425, prefree = 0, free = 24
142.865014: new_curseg: Debug: alloc new segment 26238
143.082245: f2fs_get_victim: dev = (259,30), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 26307 ofs_unit = 1, pre_victim_secno = 26307, prefree = 0, free = 244
143.088252: do_garbage_collect: Debug: FG_GC, seg_freed = 1
143.128307: new_curseg: Debug: alloc new segment 25404
143.181846: get_victim_by_default: victim 26425 : valid blocks # 3
143.181872: f2fs_get_victim: dev = (259,30), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 26425 ofs_unit = 1, pre_victim_secno = 26425, prefree = 0, free = 244

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: trace victim's cost selectecd by f2fs_gc
Jaegeuk Kim [Thu, 16 Feb 2017 18:35:41 +0000 (10:35 -0800)]
f2fs: trace victim's cost selectecd by f2fs_gc

This patch adds min_cost of each victims.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: fix multiple f2fs_add_link() calls having same name
Jaegeuk Kim [Tue, 14 Feb 2017 17:54:37 +0000 (09:54 -0800)]
f2fs: fix multiple f2fs_add_link() calls having same name

It turns out a stakable filesystem like sdcardfs in AOSP can trigger multiple
vfs_create() to lower filesystem. In that case, f2fs will add multiple dentries
having same name which breaks filesystem consistency.

Until upper layer fixes, let's work around by f2fs, which shows actually not
much performance regression.

Cc: <stable@vger.kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: show actual device info in tracepoints
Jaegeuk Kim [Wed, 15 Feb 2017 19:14:06 +0000 (11:14 -0800)]
f2fs: show actual device info in tracepoints

This patch shows actual device information in the tracepoints.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: use SSR for warm node as well
Jaegeuk Kim [Wed, 15 Feb 2017 03:32:51 +0000 (19:32 -0800)]
f2fs: use SSR for warm node as well

We have had node chains, but haven't used it so far due to stale node blocks.
Now, we have crc|cp_ver in node footer and give random cp_ver at format time,
we can start to use it again.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: enable inline_xattr by default
Chao Yu [Wed, 8 Feb 2017 09:39:44 +0000 (17:39 +0800)]
f2fs: enable inline_xattr by default

In android, since SElinux is enable, security policy will be appliedd for
each file, it stores in inode as an xattr entry, so it will take one 4k
size node block additionally for each file.

Let's enable inline_xattr by default in order to save storage space.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: introduce noinline_xattr mount option
Chao Yu [Wed, 15 Feb 2017 02:34:45 +0000 (10:34 +0800)]
f2fs: introduce noinline_xattr mount option

This patch introduces new mount option 'noinline_xattr', so we can disable
inline xattr functionality which is already set as a default mount option.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: avoid reading NAT page by get_node_info
Jaegeuk Kim [Tue, 14 Feb 2017 01:02:44 +0000 (17:02 -0800)]
f2fs: avoid reading NAT page by get_node_info

We've not seen this buggy case for a long time, so it's time to avoid this
unnecessary get_node_info() call which reading NAT page to cache nat entry.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: remove build_free_nids() during checkpoint
Jaegeuk Kim [Sat, 11 Feb 2017 18:46:44 +0000 (10:46 -0800)]
f2fs: remove build_free_nids() during checkpoint

Let's avoid build_free_nids() in checkpoint path.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: change recovery policy of xattr node block
Chao Yu [Wed, 8 Feb 2017 09:39:45 +0000 (17:39 +0800)]
f2fs: change recovery policy of xattr node block

Currently, if we call fsync after updating the xattr date belongs to the
file, f2fs needs to trigger checkpoint to keep xattr data consistent. But,
this policy cause low performance as checkpoint will block most foreground
operations and cause unneeded and unrelated IOs around checkpoint.

This patch will reuse regular file recovery policy for xattr node block,
so, we change to write xattr node block tagged with fsync flag to warm
area instead of cold area, and during recovery, we search warm node chain
for fsynced xattr block, and do the recovery.

So, for below application IO pattern, performance can be improved
obviously:
- touch file
- create/update/delete xattr entry in file
- fsync file

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: super: constify fscrypt_operations structure
Bhumika Goyal [Sat, 11 Feb 2017 10:20:46 +0000 (15:50 +0530)]
f2fs: super: constify fscrypt_operations structure

Declare fscrypt_operations structure as const as it is only stored in
the s_cop field of a super_block structure. This field is of type const,
so fscrypt_operations structure having this property can be made const
too.

File size before: fs/f2fs/super.o
   text    data     bss     dec     hex filename
  54131   31355     184   85670   14ea6 fs/f2fs/super.o

File size after: fs/f2fs/super.o
   text    data     bss     dec     hex filename
  54227   31259     184   85670   14ea6 fs/f2fs/super.o

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: show checkpoint version at mount time
Jaegeuk Kim [Thu, 9 Feb 2017 21:25:35 +0000 (13:25 -0800)]
f2fs: show checkpoint version at mount time

If we mounted f2fs successfully, let's show current checkpoint version.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: fix a typo in f2fs.txt
Tiezhu Yang [Tue, 7 Feb 2017 21:08:01 +0000 (05:08 +0800)]
f2fs: fix a typo in f2fs.txt

There is a typo "f2f2" in f2fs.txt, this patch fixes it.

Signed-off-by: Tiezhu Yang <kernelpatch@126.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: remove preflush for nobarrier case
Jaegeuk Kim [Mon, 6 Feb 2017 21:57:58 +0000 (13:57 -0800)]
f2fs: remove preflush for nobarrier case

This patch removes REQ_PREFLUSH in the nobarrier case.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: check last page index in cached bio to decide submission
Jaegeuk Kim [Thu, 2 Feb 2017 00:51:22 +0000 (16:51 -0800)]
f2fs: check last page index in cached bio to decide submission

If the cached bio has the last page's index, then we need to submit it.
Otherwise, we don't need to submit it and can wait for further IO merges.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: check io submission more precisely
Jaegeuk Kim [Sat, 4 Feb 2017 01:44:04 +0000 (17:44 -0800)]
f2fs: check io submission more precisely

This patch check IO submission more precisely than previous rough check.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: call internal __write_data_page directly
Jaegeuk Kim [Sat, 4 Feb 2017 01:18:00 +0000 (17:18 -0800)]
f2fs: call internal __write_data_page directly

This patch introduces __write_data_page to call it by f2fs_write_cache_pages
directly..

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: avoid out-of-order execution of atomic writes
Jaegeuk Kim [Fri, 3 Feb 2017 02:18:06 +0000 (18:18 -0800)]
f2fs: avoid out-of-order execution of atomic writes

We need to flush data writes before flushing last node block writes by using
FUA with PREFLUSH. We don't need to guarantee precedent node writes since if
those are not written, we can't reach to the last node block when scanning
node block chain during roll-forward recovery.
Afterwards f2fs_wait_on_page_writeback guarantees all the IO submission to
disk, which builds a valid node block chain.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: move write_node_page above fsync_node_pages
Jaegeuk Kim [Fri, 3 Feb 2017 02:27:17 +0000 (18:27 -0800)]
f2fs: move write_node_page above fsync_node_pages

This patch just moves write_node_page and introduces an inner function.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: move flush tracepoint
Jaegeuk Kim [Fri, 3 Feb 2017 00:40:55 +0000 (16:40 -0800)]
f2fs: move flush tracepoint

This patch moves the tracepoint location for flush command.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: show # of APPEND and UPDATE inodes
Jaegeuk Kim [Wed, 1 Feb 2017 23:40:11 +0000 (15:40 -0800)]
f2fs: show # of APPEND and UPDATE inodes

This patch shows cached # of APPEND and UPDATE inode entries.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: fix 446 coding style warnings in f2fs.h
DongOh Shin [Mon, 30 Jan 2017 18:55:18 +0000 (10:55 -0800)]
f2fs: fix 446 coding style warnings in f2fs.h

1) Nine coding style warnings below have been resolved:
"Missing a blank line after declarations"

2) 435 coding style warnings below have been resolved:
"function definition argument 'x' should also have an identifier name"

3) Two coding style warnings below have been resolved:
"macros should not use a trailing semicolon"

Signed-off-by: DongOh Shin <doscode.kr@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: fix 3 coding style errors in f2fs.h
DongOh Shin [Mon, 30 Jan 2017 18:55:17 +0000 (10:55 -0800)]
f2fs: fix 3 coding style errors in f2fs.h

Two coding style errors below have been resolved:
"Macros with complex values should be enclosed in parentheses"

And a coding style error below has been resolved:
"space prohibited before that ',' (ctx:WxW)"

Signed-off-by: DongOh Shin <doscode.kr@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: declare missing static function
Jaegeuk Kim [Sun, 29 Jan 2017 05:27:02 +0000 (14:27 +0900)]
f2fs: declare missing static function

We missed two functions declared as static functions.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: show the fault injection mount option
Kaixu Xia [Fri, 27 Jan 2017 01:35:37 +0000 (09:35 +0800)]
f2fs: show the fault injection mount option

This patch shows the fault injection mount option in
f2fs_show_options().

Signed-off-by: Kaixu Xia <xiakaixu@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: fix null pointer dereference when issuing flush in ->fsync
Chao Yu [Wed, 25 Jan 2017 02:52:40 +0000 (10:52 +0800)]
f2fs: fix null pointer dereference when issuing flush in ->fsync

We only allocate flush merge control structure sbi::sm_info::fcc_info when
flush_merge option is on, but in f2fs_issue_flush we still try to access
member of the control structure without that option, it incurs panic as
show below, fix it.

Call Trace:
 __remove_ino_entry+0xa9/0xc0 [f2fs]
 f2fs_do_sync_file.isra.27+0x214/0x6d0 [f2fs]
 f2fs_sync_file+0x18/0x20 [f2fs]
 vfs_fsync_range+0x3d/0xb0
 __do_page_fault+0x261/0x4d0
 do_fsync+0x3d/0x70
 SyS_fsync+0x10/0x20
 do_syscall_64+0x6e/0x180
 entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x7f18ce260de0
RSP: 002b:00007ffdd4589258 EFLAGS: 00000246 ORIG_RAX: 000000000000004a
RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f18ce260de0
RDX: 0000000000000006 RSI: 00000000016c0360 RDI: 0000000000000003
RBP: 00000000016c0360 R08: 000000000000ffff R09: 000000000000001f
R10: 00007ffdd4589020 R11: 0000000000000246 R12: 00000000016c0100
R13: 0000000000000000 R14: 00000000016c1f00 R15: 00000000016c0100
Code: fb 81 e3 00 08 00 00 48 89 45 a0 0f 1f 44 00 00 31 c0 85 db 75 27 41 81 e7 00 04 00 00 74 0c 41 8b 45 20 85 c0 0f 85 81 00 00 00 <f0> 41 ff 45 20 4c 89 e7 e8 f8 e9 ff ff f0 41 ff 4d 20 48 83 c4
RIP: f2fs_issue_flush+0x5b/0x170 [f2fs] RSP: ffffc90003b5fd78
CR2: 0000000000000020
---[ end trace a09314c24f037648 ]---

Reported-by: Shuoran Liu <liushuoran@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: fix to avoid overflow when left shifting page offset
Chao Yu [Wed, 25 Jan 2017 02:52:39 +0000 (10:52 +0800)]
f2fs: fix to avoid overflow when left shifting page offset

We use following method to calculate size with current page index:
size = index << PAGE_SHIFT
If type of index has only 32-bits size, left shifting will incur overflow,
which makes result incorrect.

So let's cast index with 64-bits type to avoid such issue.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: enhance lookup xattr
Chao Yu [Tue, 24 Jan 2017 12:39:51 +0000 (20:39 +0800)]
f2fs: enhance lookup xattr

Previously, in getxattr we will load all entries both in inline xattr and
xattr node block, and then do the lookup in all entries, but our lookup
flow shows low efficiency, since if we can lookup and hit in inline xattr
of inode page cache first, we don't need to load and lookup xattr node
block, which can obviously save cpu time and IO latency.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: initialize NULL to avoid warning]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agoDoc: f2fs: Fix typo in Documentation/filesystems/f2fs.txt
Masanari Iida [Tue, 24 Jan 2017 03:47:55 +0000 (12:47 +0900)]
Doc: f2fs: Fix typo in Documentation/filesystems/f2fs.txt

This patch fix a typo in f2fs.txt

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: fix a dead loop in f2fs_fiemap()
Wei Fang [Sun, 22 Jan 2017 04:21:02 +0000 (12:21 +0800)]
f2fs: fix a dead loop in f2fs_fiemap()

A dead loop can be triggered in f2fs_fiemap() using the test case
as below:

...
fd = open();
fallocate(fd, 0, 0, 4294967296);
ioctl(fd, FS_IOC_FIEMAP, fiemap_buf);
...

It's caused by an overflow in __get_data_block():
...
bh->b_size = map.m_len << inode->i_blkbits;
...
map.m_len is an unsigned int, and bh->b_size is a size_t which is 64 bits
on 64 bits archtecture, type conversion from an unsigned int to a size_t
will result in an overflow.

In the above-mentioned case, bh->b_size will be zero, and f2fs_fiemap()
will call get_data_block() at block 0 again an again.

Fix this by adding a force conversion before left shift.

Signed-off-by: Wei Fang <fangwei1@huawei.com>
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: do not preallocate blocks which has wrong buffer
Jaegeuk Kim [Fri, 13 Jan 2017 21:12:29 +0000 (13:12 -0800)]
f2fs: do not preallocate blocks which has wrong buffer

Sheng Yong reports needless preallocation if write(small_buffer, large_size)
is called.

In that case, f2fs preallocates large_size, but vfs returns early due to
small_buffer size. Let's detect it before preallocation phase in f2fs.

Reported-by: Sheng Yong <shengyong1@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: show # of on-going flush and discard bios
Jaegeuk Kim [Wed, 11 Jan 2017 18:20:04 +0000 (10:20 -0800)]
f2fs: show # of on-going flush and discard bios

This patch adds stat information for flush and discard commands.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: add a kernel thread to issue discard commands asynchronously
Jaegeuk Kim [Tue, 10 Jan 2017 04:32:07 +0000 (20:32 -0800)]
f2fs: add a kernel thread to issue discard commands asynchronously

This patch adds a kernel thread to issue discard commands.
It proposes three states, D_PREP, D_SUBMIT, and D_DONE to identify current
bio status.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: factor out discard command info into discard_cmd_control
Jaegeuk Kim [Wed, 11 Jan 2017 22:40:24 +0000 (14:40 -0800)]
f2fs: factor out discard command info into discard_cmd_control

This patch adds discard_cmd_control with the existing discarding controls.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: reorganize stat information
Jaegeuk Kim [Wed, 11 Jan 2017 18:21:15 +0000 (10:21 -0800)]
f2fs: reorganize stat information

This patch modifies stat information more clearly.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: clean up flush/discard command namings
Jaegeuk Kim [Mon, 9 Jan 2017 22:13:03 +0000 (14:13 -0800)]
f2fs: clean up flush/discard command namings

This patch simply cleans up the names for flush/discard commands.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: check in-memory sit version bitmap
Chao Yu [Sat, 7 Jan 2017 10:52:34 +0000 (18:52 +0800)]
f2fs: check in-memory sit version bitmap

This patch adds a mirror for sit version bitmap, and use it to detect
in-memory bitmap corruption which may be caused by bit-transition of
cache or memory overflow.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: check in-memory nat version bitmap
Chao Yu [Sat, 7 Jan 2017 10:52:01 +0000 (18:52 +0800)]
f2fs: check in-memory nat version bitmap

This patch adds a mirror for nat version bitmap, and use it to detect
in-memory bitmap corruption which may be caused by bit-transition of
cache or memory overflow.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: check in-memory block bitmap
Chao Yu [Sat, 7 Jan 2017 10:51:01 +0000 (18:51 +0800)]
f2fs: check in-memory block bitmap

This patch adds a mirror for valid block bitmap, and use it to detect
in-memory bitmap corruption which may be caused by bit-transition of
cache or memory overflow.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: introduce FI_ATOMIC_COMMIT
Chao Yu [Sat, 7 Jan 2017 10:50:26 +0000 (18:50 +0800)]
f2fs: introduce FI_ATOMIC_COMMIT

This patch introduces a new flag to indicate inode status of doing atomic
write committing, so that, we can keep atomic write status for inode
during atomic committing, then we can skip GCing pages of atomic write inode,
that avoids random GCed datas being mixed with current transaction, so
isolation of transaction can be kept.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: clean up with list_{first, last}_entry
Chao Yu [Sat, 7 Jan 2017 10:49:42 +0000 (18:49 +0800)]
f2fs: clean up with list_{first, last}_entry

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: return fs_trim if there is no candidate
Jaegeuk Kim [Fri, 30 Dec 2016 06:06:15 +0000 (22:06 -0800)]
f2fs: return fs_trim if there is no candidate

If there is no candidate to submit discard command during f2fs_trim_fs, let's
return without checkpoint.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: avoid needless checkpoint in f2fs_trim_fs
Jaegeuk Kim [Fri, 30 Dec 2016 00:58:54 +0000 (16:58 -0800)]
f2fs: avoid needless checkpoint in f2fs_trim_fs

The f2fs_trim_fs() doesn't need to do checkpoint if there are newly allocated
data blocks only which didn't change the critical checkpoint data such as nat
and sit entries.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: relax async discard commands more
Jaegeuk Kim [Thu, 29 Dec 2016 22:07:53 +0000 (14:07 -0800)]
f2fs: relax async discard commands more

This patch relaxes async discard commands to avoid waiting its end_io during
checkpoint.
Instead of waiting them during checkpoint, it will be done when actually reusing
them.

Test on initial partition of nvme drive.

 # time fstrim /mnt/test

Before : 6.158s
After : 4.822s

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: drop exist_data for inline_data when truncated to 0
Jaegeuk Kim [Wed, 4 Jan 2017 01:19:30 +0000 (17:19 -0800)]
f2fs: drop exist_data for inline_data when truncated to 0

A test program gets the SEEK_DATA with two values between
a new created file and the exist file on f2fs filesystem.

F2FS filesystem,  (the first "test1" is a new file)
SEEK_DATA size != 0 (offset = 8192)
SEEK_DATA size != 0 (offset = 4096)

PNFS filesystem, (the first "test1" is a new file)
SEEK_DATA size != 0 (offset = 4096)
SEEK_DATA size != 0 (offset = 4096)

int main(int argc, char **argv)
{
        char *filename = argv[1];
        int offset = 1, i = 0, fd = -1;

        if (argc < 2) {
                printf("Usage: %s f2fsfilename\n", argv[0]);
                return -1;
        }

        /*
        if (!access(filename, F_OK) || errno != ENOENT) {
                printf("Needs a new file for test, %m\n");
                return -1;
        }*/

        fd = open(filename, O_RDWR | O_CREAT, 0777);
        if (fd < 0) {
                printf("Create test file %s failed, %m\n", filename);
                return -1;
        }

        for (i = 0; i < 20; i++) {
                offset = 1 << i;
                ftruncate(fd, 0);
                lseek(fd, offset, SEEK_SET);
                write(fd, "test", 5);
                /* Get the alloc size by seek data equal zero*/
                if (lseek(fd, 0, SEEK_DATA)) {
                        printf("SEEK_DATA size != 0 (offset = %d)\n", offset);
                        break;
                }
        }

        close(fd);
        return 0;
}

Reported-and-Tested-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: don't allow encrypted operations without keys
Jaegeuk Kim [Thu, 29 Dec 2016 01:31:15 +0000 (17:31 -0800)]
f2fs: don't allow encrypted operations without keys

This patch fixes the renaming bug on encrypted filenames, which was pointed by

 (ext4: don't allow encrypted operations without keys)

Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: show the max number of atomic operations
Jaegeuk Kim [Wed, 28 Dec 2016 21:55:09 +0000 (13:55 -0800)]
f2fs: show the max number of atomic operations

This patch adds to show the max number of atomic operations which are
conducting concurrently.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: get io size bit from mount option
Jaegeuk Kim [Thu, 22 Dec 2016 01:09:19 +0000 (17:09 -0800)]
f2fs: get io size bit from mount option

This patch adds to set io_size_bits from mount option.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: support IO alignment for DATA and NODE writes
Jaegeuk Kim [Wed, 14 Dec 2016 18:12:56 +0000 (10:12 -0800)]
f2fs: support IO alignment for DATA and NODE writes

This patch implements IO alignment by filling dummy blocks in DATA and NODE
write bios. If we can guarantee, for example, 32KB or 64KB for such the IOs,
we can eliminate underlying dummy page problem which FTL conducts in order to
close MLC or TLC partial written pages.

Note that,
 - it requires "-o mode=lfs".
 - IO size should be power of 2, not exceed BIO_MAX_PAGES, 256.
 - read IO is still 4KB.
 - do checkpoint at fsync, if dummy NODE page was written.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: add submit_bio tracepoint
Jaegeuk Kim [Wed, 21 Dec 2016 20:13:03 +0000 (12:13 -0800)]
f2fs: add submit_bio tracepoint

This patch adds final submit_bio() tracepoint.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: fix wrong tracepoints for op and op_flags
Jaegeuk Kim [Wed, 21 Dec 2016 22:26:39 +0000 (14:26 -0800)]
f2fs: fix wrong tracepoints for op and op_flags

This patch fixes wrong tracepoints in terms of op and op_flags.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: reassign new segment for mode=lfs
Jaegeuk Kim [Wed, 21 Dec 2016 19:51:32 +0000 (11:51 -0800)]
f2fs: reassign new segment for mode=lfs

Otherwise we can remain wrong curseg->next_blkoff, resulting in fsck failure.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: fix a missing discard prefree segments
Yunlei He [Thu, 22 Dec 2016 03:46:24 +0000 (11:46 +0800)]
f2fs: fix a missing discard prefree segments

If userspace issue a fstrim with a range not involve prefree segments,
it will reuse these segments without discard. This patch fix it.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: use rb_entry_safe
Geliang Tang [Tue, 20 Dec 2016 13:57:42 +0000 (21:57 +0800)]
f2fs: use rb_entry_safe

Use rb_entry_safe() instead of open-coding it.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: add a case of no need to read a page in write begin
Yunlei He [Tue, 20 Dec 2016 03:11:35 +0000 (11:11 +0800)]
f2fs: add a case of no need to read a page in write begin

If the range we write cover the whole valid data in the last page,
we do not need to read it.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
[Jaegeuk Kim: nullify the remaining area (fix: xfstests/f2fs/001)]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: fix a problem of using memory after free
Yunlei He [Mon, 19 Dec 2016 12:10:48 +0000 (20:10 +0800)]
f2fs: fix a problem of using memory after free

This patch fix a problem of using memory after free
in function __try_merge_extent_node.

Fixes: 0f825ee6e873 ("f2fs: add new interfaces for extent tree")
Cc: <stable@vger.kernel.org>
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: remove unneeded condition
Dan Carpenter [Fri, 16 Dec 2016 08:18:15 +0000 (11:18 +0300)]
f2fs: remove unneeded condition

We checked that "inode" is not an error pointer earlier so there is
no need to check again here.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: don't cache nat entry if out of memory
Chao Yu [Tue, 13 Dec 2016 10:54:59 +0000 (18:54 +0800)]
f2fs: don't cache nat entry if out of memory

If we run out of memory, in cache_nat_entry, it's better to avoid loop
for allocating memory to cache nat entry, so in low memory scenario, for
read path of node block, I expect this can avoid unneeded latency.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: remove unused values in recover_fsync_data
Yunlei He [Tue, 13 Dec 2016 09:23:37 +0000 (17:23 +0800)]
f2fs: remove unused values in recover_fsync_data

This patch remove unused values in function recover_fsync_data

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agoMerge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa...
Linus Torvalds [Sat, 28 Jan 2017 23:09:23 +0000 (15:09 -0800)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
 "Two I2C driver bugfixes.

  The 'VLLS mode support' patch should have been entitled 'reconfigure
  pinctrl after suspend' to make the bugfix more clear. Sorry, I missed
  that, yet didn't want to rebase"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: imx-lpi2c: add VLLS mode support
  i2c: i2c-cadence: Initialize configuration before probing devices

7 years agoMerge tag 'nfs-for-4.10-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Linus Torvalds [Sat, 28 Jan 2017 19:50:17 +0000 (11:50 -0800)]
Merge tag 'nfs-for-4.10-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 "Stable patches:
   - NFSv4.1: Fix a deadlock in layoutget
   - NFSv4 must not bump sequence ids on NFS4ERR_MOVED errors
   - NFSv4 Fix a regression with OPEN EXCLUSIVE4 mode
   - Fix a memory leak when removing the SUNRPC module

  Bugfixes:
   - Fix a reference leak in _pnfs_return_layout"

* tag 'nfs-for-4.10-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  pNFS: Fix a reference leak in _pnfs_return_layout
  nfs: Fix "Don't increment lock sequence ID after NFS4ERR_MOVED"
  SUNRPC: cleanup ida information when removing sunrpc module
  NFSv4.0: always send mode in SETATTR after EXCLUSIVE4
  nfs: Don't increment lock sequence ID after NFS4ERR_MOVED
  NFSv4.1: Fix a deadlock in layoutget

7 years agoMerge tag 'md/4.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md
Linus Torvalds [Sat, 28 Jan 2017 19:09:04 +0000 (11:09 -0800)]
Merge tag 'md/4.10-rc6' of git://git./linux/kernel/git/shli/md

Pull MD fixes from Shaohua Li:
 "This fixes several corner cases for raid5 cache, which is merged into
  this cycle"

* tag 'md/4.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
  md/r5cache: disable write back for degraded array
  md/r5cache: shift complex rmw from read path to write path
  md/r5cache: flush data only stripes in r5l_recovery_log()
  md/raid5: move comment of fetch_block to right location
  md/r5cache: read data into orig_page for prexor of cached data
  md/raid5-cache: delete meaningless code

7 years agoMerge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Linus Torvalds [Sat, 28 Jan 2017 19:06:42 +0000 (11:06 -0800)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux

Pull arm64 fix from Catalin Marinas:
 "Fix kernel panic on ACPI-based systems where CPU capacity description
  is not currently handled"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: skip register_cpufreq_notifier on ACPI-based systems

7 years agoMerge tag 'arc-4.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Linus Torvalds [Sat, 28 Jan 2017 19:00:08 +0000 (11:00 -0800)]
Merge tag 'arc-4.10-rc6' of git://git./linux/kernel/git/vgupta/arc

Pull ARC fixes from Vineet Gupta:
 "Hopefully last set of changes for ARC for 4.10:

   - fix for unaligned access emulation corner case

   - fix for udelay loop inline asm regression

   - fix irq affinity finally for AXS103 board [Yuriy]

   - final fixes for setting IO-coherency sanely in SMP"

* tag 'arc-4.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARC: [arcompact] handle unaligned access delay slot corner case
  ARCv2: smp-boot: wake_flag polling by non-Masters needs to be uncached
  ARC: smp-boot: Decouple Non masters waiting API from jump to entry point
  ARCv2: MCIP: update the BCR per current changes
  ARC: udelay: fix inline assembler by adding LP_COUNT to clobber list
  ARCv2: MCIP: Deprecate setting of affinity in Device Tree

7 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Fri, 27 Jan 2017 20:54:16 +0000 (12:54 -0800)]
Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) GTP fixes from Andreas Schultz (missing genl module alias, clear IP
    DF on transmit).

 2) Netfilter needs to reflect the fwmark when sending resets, from Pau
    Espin Pedrol.

 3) nftable dump OOPS fix from Liping Zhang.

 4) Fix erroneous setting of VIRTIO_NET_HDR_F_DATA_VALID on transmit,
    from Rolf Neugebauer.

 5) Fix build error of ipt_CLUSTERIP when procfs is disabled, from Arnd
    Bergmann.

 6) Fix regression in handling of NETIF_F_SG in harmonize_features(),
    from Eric Dumazet.

 7) Fix RTNL deadlock wrt. lwtunnel module loading, from David Ahern.

 8) tcp_fastopen_create_child() needs to setup tp->max_window, from
    Alexey Kodanev.

 9) Missing kmemdup() failure check in ipv6 segment routing code, from
    Eric Dumazet.

10) Don't execute unix_bind() under the bindlock, otherwise we deadlock
    with splice. From WANG Cong.

11) ip6_tnl_parse_tlv_enc_lim() potentially reallocates the skb buffer,
    therefore callers must reload cached header pointers into that skb.
    Fix from Eric Dumazet.

12) Fix various bugs in legacy IRQ fallback handling in alx driver, from
    Tobias Regnery.

13) Do not allow lwtunnel drivers to be unloaded while they are
    referenced by active instances, from Robert Shearman.

14) Fix truncated PHY LED trigger names, from Geert Uytterhoeven.

15) Fix a few regressions from virtio_net XDP support, from John
    Fastabend and Jakub Kicinski.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (102 commits)
  ISDN: eicon: silence misleading array-bounds warning
  net: phy: micrel: add support for KSZ8795
  gtp: fix cross netns recv on gtp socket
  gtp: clear DF bit on GTP packet tx
  gtp: add genl family modules alias
  tcp: don't annotate mark on control socket from tcp_v6_send_response()
  ravb: unmap descriptors when freeing rings
  virtio_net: reject XDP programs using header adjustment
  virtio_net: use dev_kfree_skb for small buffer XDP receive
  r8152: check rx after napi is enabled
  r8152: re-schedule napi for tx
  r8152: avoid start_xmit to schedule napi when napi is disabled
  r8152: avoid start_xmit to call napi_schedule during autosuspend
  net: dsa: Bring back device detaching in dsa_slave_suspend()
  net: phy: leds: Fix truncated LED trigger names
  net: phy: leds: Break dependency of phy.h on phy_led_triggers.h
  net: phy: leds: Clear phy_num_led_triggers on failure to avoid crash
  net-next: ethernet: mediatek: change the compatible string
  Documentation: devicetree: change the mediatek ethernet compatible string
  bnxt_en: Fix RTNL lock usage on bnxt_get_port_module_status().
  ...

7 years agoMerge tag 'xfs-for-linus-4.10-rc6-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Linus Torvalds [Fri, 27 Jan 2017 20:44:32 +0000 (12:44 -0800)]
Merge tag 'xfs-for-linus-4.10-rc6-5' of git://git./fs/xfs/xfs-linux

Pull xfs uodates from Darrick Wong:
 "I have some more fixes this week: better input validation, corruption
  avoidance, build fixes, memory leak fixes, and a couple from Christoph
  to avoid an ENOSPC failure.

  Summary:
   - Fix race conditions in the CoW code
   - Fix some incorrect input validation checks
   - Avoid crashing fs by running out of space when freeing inodes
   - Fix toctou race wrt whether or not an inode has an attr
   - Fix build error on arm
   - Fix page refcount corruption when readahead fails
   - Don't corrupt userspace in the bmap ioctl"

* tag 'xfs-for-linus-4.10-rc6-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: prevent quotacheck from overloading inode lru
  xfs: fix bmv_count confusion w/ shared extents
  xfs: clear _XBF_PAGES from buffers when readahead page
  xfs: extsize hints are not unlikely in xfs_bmap_btalloc
  xfs: remove racy hasattr check from attr ops
  xfs: use per-AG reservations for the finobt
  xfs: only update mount/resv fields on success in __xfs_ag_resv_init
  xfs: verify dirblocklog correctly
  xfs: fix COW writeback race

7 years agoMerge branch 'for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mason...
Linus Torvalds [Fri, 27 Jan 2017 20:41:46 +0000 (12:41 -0800)]
Merge branch 'for-linus-4.10' of git://git./linux/kernel/git/mason/linux-btrfs

Pull btrfs updates from Chris Mason:
 "Some fixes that we've collected from the list.

  We still have one more pending to nail down a regression in lzo
  compression, but I wanted to get this batch out the door"

* 'for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: remove ->{get, set}_acl() from btrfs_dir_ro_inode_operations
  Btrfs: disable xattr operations on subvolume directories
  Btrfs: remove old tree_root case in btrfs_read_locked_inode()
  Btrfs: fix truncate down when no_holes feature is enabled
  Btrfs: Fix deadlock between direct IO and fast fsync
  btrfs: fix false enospc error when truncating heavily reflinked file

7 years agoMerge branch 'for-linus' of git://git.kernel.dk/linux-block
Linus Torvalds [Fri, 27 Jan 2017 20:36:39 +0000 (12:36 -0800)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "A set of fixes for this series. This contains:

   - Set of fixes for the nvme target code

   - A revert of patch from this merge window, causing a regression with
     WRITE_SAME on iSCSI targets at least.

   - A fix for a use-after-free in the new O_DIRECT bdev code.

   - Two fixes for the xen-blkfront driver"

* 'for-linus' of git://git.kernel.dk/linux-block:
  Revert "sd: remove __data_len hack for WRITE SAME"
  nvme-fc: use blk_rq_nr_phys_segments
  nvmet-rdma: Fix missing dma sync to nvme data structures
  nvmet: Call fatal_error from keep-alive timout expiration
  nvmet: cancel fatal error and flush async work before free controller
  nvmet: delete controllers deletion upon subsystem release
  nvmet_fc: correct logic in disconnect queue LS handling
  block: fix use after free in __blkdev_direct_IO
  xen-blkfront: correct maximum segment accounting
  xen-blkfront: feature flags handling adjustments

7 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Linus Torvalds [Fri, 27 Jan 2017 20:29:30 +0000 (12:29 -0800)]
Merge tag 'for-linus' of git://git./linux/kernel/git/dledford/rdma

Pull rdma fixes from Doug Ledford:
 "Second round of -rc fixes for 4.10.

  This -rc cycle has been slow for the rdma subsystem. I had already
  sent you the first batch before the Holiday break. After that, we kept
  only getting a few here or there. Up until this week, when I got a
  drop of 13 to one driver (qedr). So, here's the -rc patches I have. I
  currently have none held in reserve, so unless something new comes in,
  this is it until the next merge window opens.

  Summary:

   - series of iw_cxgb4 fixes to make it work with the drain cq API

   - one or two patches each to: srp, iser, cxgb3, vmw_pvrdma, umem,
     rxe, and ipoib

   - one big series (13 patches) for the new qedr driver"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (27 commits)
  RDMA/cma: Fix unknown symbol when CONFIG_IPV6 is not enabled
  IB/rxe: Prevent from completer to operate on non valid QP
  IB/rxe: Fix rxe dev insertion to rxe_dev_list
  IB/umem: Release pid in error and ODP flow
  RDMA/qedr: Dispatch port active event from qedr_add
  RDMA/qedr: Fix and simplify memory leak in PD alloc
  RDMA/qedr: Fix RDMA CM loopback
  RDMA/qedr: Fix formatting
  RDMA/qedr: Mark three functions as static
  RDMA/qedr: Don't reset QP when queues aren't flushed
  RDMA/qedr: Don't spam dmesg if QP is in error state
  RDMA/qedr: Remove CQ spinlock from CM completion handlers
  RDMA/qedr: Return max inline data in QP query result
  RDMA/qedr: Return success when not changing QP state
  RDMA/qedr: Add uapi header qedr-abi.h
  RDMA/qedr: Fix MTU returned from QP query
  RDMA/core: Add the function ib_mtu_int_to_enum
  IB/vmw_pvrdma: Fix incorrect cleanup on pvrdma_pci_probe error path
  IB/vmw_pvrdma: Don't leak info from alloc_ucontext
  IB/cxgb3: fix misspelling in header guard
  ...

7 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Linus Torvalds [Fri, 27 Jan 2017 20:25:26 +0000 (12:25 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/s390/linux

Pull s390 fixes from Martin Schwidefsky:
 "Another two bug fixes:

   - ptrace partial write information leak

   - a guest page hinting regression introduced with v4.6"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/mm: Fix cmma unused transfer from pgste into pte
  s390/ptrace: Preserve previous registers for short regset write

7 years agoMerge branch 'stable/for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 27 Jan 2017 20:17:07 +0000 (12:17 -0800)]
Merge branch 'stable/for-linus-4.10' of git://git./linux/kernel/git/konrad/swiotlb

Pull swiotlb fix from Konrad Rzeszutek Wilk:
 "An ARM fix in the Xen SWIOTLB - mainly the translation of physical to
  bus addresses was done just a tad too late"

* 'stable/for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb:
  swiotlb-xen: update dev_addr after swapping pages

7 years agoMerge tag 'vfio-v4.10-rc6' of git://github.com/awilliam/linux-vfio
Linus Torvalds [Fri, 27 Jan 2017 20:10:58 +0000 (12:10 -0800)]
Merge tag 'vfio-v4.10-rc6' of git://github.com/awilliam/linux-vfio

Pull VFIO fix from Alex Williamson:
 "mdev IOMMU groups are not yet compatible with the powerpc SPAPR IOMMU
  backend, detect and fail group attach (Greg Kurz)"

* tag 'vfio-v4.10-rc6' of git://github.com/awilliam/linux-vfio:
  vfio/spapr: fail tce_iommu_attach_group() when iommu_data is null

7 years agoRDMA/cma: Fix unknown symbol when CONFIG_IPV6 is not enabled
Jack Morgenstein [Sun, 15 Jan 2017 18:15:00 +0000 (20:15 +0200)]
RDMA/cma: Fix unknown symbol when CONFIG_IPV6 is not enabled

If IPV6 has not been enabled in the underlying kernel, we must avoid
calling IPV6 procedures in rdma_cm.ko.

This requires using "IS_ENABLED(CONFIG_IPV6)" in "if" statements
surrounding any code which calls external IPV6 procedures.

In the instance fixed here, procedure cma_bind_addr() called
ipv6_addr_type() -- which resulted in calling external procedure
__ipv6_addr_type().

Fixes: 6c26a77124ff ("RDMA/cma: fix IPv6 address resolution")
Cc: <stable@vger.kernel.org> # v4.2+
Cc: Spencer Baugh <sbaugh@catern.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>