GitHub/moto-9609/android_kernel_motorola_exynos9610.git
Chao Yu [Fri, 28 Apr 2017 05:56:08 +0000 (13:56 +0800)]
f2fs: introduce CP_TRIMMED_FLAG to avoid unneeded discard

Introduce CP_TRIMMED_FLAG to indicate that all invalid blocks were trimmed
before umount. Once we mount an image that contains the flag, we don't
record invalid blocks as undiscarded ones, so when fstrim is triggered,
we can avoid issuing redundant discard commands.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Thu, 27 Apr 2017 12:40:39 +0000 (20:40 +0800)]
f2fs: allow cpc->reason to indicate more than one reason

Use different bits of cpc->reason to indicate different statuses, so that
cpc->reason can indicate more than one reason at a time.
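
As a minimal sketch of the idea, each reason below owns one bit, so a single
field can carry several reasons at once and each one is tested with a bitwise
AND rather than an equality comparison. The flag names, their values, and the
cp_control_demo structure are hypothetical and are not copied from the f2fs
headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical checkpoint reasons, one bit each (names/values are illustrative). */
enum {
	CP_REASON_UMOUNT  = 1u << 0,
	CP_REASON_SYNC    = 1u << 1,
	CP_REASON_DISCARD = 1u << 2,
	CP_REASON_TRIMMED = 1u << 3,
};

/* Minimal stand-in for the checkpoint control structure. */
struct cp_control_demo {
	unsigned int reason;	/* bitwise OR of CP_REASON_* */
};

/* Add a reason without clearing the ones already recorded. */
static void cp_add_reason(struct cp_control_demo *cpc, unsigned int r)
{
	cpc->reason |= r;
}

/* Test one reason with "&"; an "==" comparison would miss combined reasons. */
static bool cp_has_reason(const struct cp_control_demo *cpc, unsigned int r)
{
	return (cpc->reason & r) != 0;
}
```

For example, an umount checkpoint that also trimmed everything would carry
CP_REASON_UMOUNT | CP_REASON_TRIMMED, and both checks succeed.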

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Hou Pengyang [Wed, 26 Apr 2017 16:17:21 +0000 (00:17 +0800)]
f2fs: release cp and dnode lock before IPU

We don't need to rewrite the page under cp_rwsem and dnode locks.

Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 26 Apr 2017 09:39:55 +0000 (17:39 +0800)]
f2fs: shrink size of struct discard_cmd

In order to shrink the size of struct discard_cmd, change the type of
@state in struct discard_cmd from int to unsigned char.
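
Whether narrowing a field actually saves memory depends on its neighbours and
on alignment padding. The hypothetical layouts below (not the real struct
discard_cmd) show a case where it does: on x86_64 the int variant occupies
24 bytes and the unsigned char variant 16, because the one-byte state fits
into what would otherwise be padding.

```c
#include <assert.h>

/* Hypothetical layout with a 4-byte int state (not the real struct discard_cmd). */
struct cmd_int_state {
	unsigned long long lstart;	/* 8 bytes */
	unsigned int len;		/* 4 bytes */
	unsigned short ref;		/* 2 bytes, followed by 2 bytes of padding */
	int state;			/* 4 bytes, needs 4-byte alignment */
};

/* Same fields, but the state is a single byte placed right after ref. */
struct cmd_char_state {
	unsigned long long lstart;
	unsigned int len;
	unsigned short ref;
	unsigned char state;
};

/* Expected on mainstream ABIs: the narrower state never makes the struct larger. */
static_assert(sizeof(struct cmd_char_state) <= sizeof(struct cmd_int_state),
	      "narrowing @state must not grow the struct");
```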

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 26 Apr 2017 09:39:54 +0000 (17:39 +0800)]
f2fs: don't hold cmd_lock during waiting discard command

Previously, we waited for the end I/O of a discard command while holding
cmd_lock, which could lead to long latency and worse concurrency.

So, in this patch, we add a reference count to the discard entry to prevent
the entry from being released by another thread; then we can avoid holding
the global cmd_lock while waiting for the discard to finish.
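
The waiting pattern can be sketched in user space with POSIX threads, using
hypothetical names (struct cmd, cmd_lock, cmd_wait): the waiter pins the entry
with a reference taken under the global lock, drops the lock for the slow
wait, and retakes it only to drop the reference, freeing the entry if it was
the last user. This illustrates the technique, not the f2fs code itself.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical command entry; field names are illustrative. */
struct cmd {
	int ref;			/* protected by cmd_lock */
	bool done;			/* protected by done_lock */
	pthread_mutex_t done_lock;
	pthread_cond_t done_cond;
};

/* Global lock protecting the command list and every ->ref. */
static pthread_mutex_t cmd_lock = PTHREAD_MUTEX_INITIALIZER;

/* Allocate an entry holding one reference for its owner (e.g. the list). */
static struct cmd *cmd_new(void)
{
	struct cmd *c = malloc(sizeof(*c));

	if (!c)
		return NULL;
	c->ref = 1;
	c->done = false;
	pthread_mutex_init(&c->done_lock, NULL);
	pthread_cond_init(&c->done_cond, NULL);
	return c;
}

/* Drop one reference; caller holds cmd_lock. Returns true if @c was freed. */
static bool cmd_put_locked(struct cmd *c)
{
	assert(c->ref > 0);
	if (--c->ref > 0)
		return false;
	pthread_cond_destroy(&c->done_cond);
	pthread_mutex_destroy(&c->done_lock);
	free(c);
	return true;
}

/* End-I/O path: mark the command done and wake any waiters. */
static void cmd_complete(struct cmd *c)
{
	pthread_mutex_lock(&c->done_lock);
	c->done = true;
	pthread_cond_broadcast(&c->done_cond);
	pthread_mutex_unlock(&c->done_lock);
}

/* Wait for @c to finish without holding cmd_lock across the (slow) wait. */
static void cmd_wait(struct cmd *c)
{
	pthread_mutex_lock(&cmd_lock);
	c->ref++;			/* pin: nobody can free @c while we sleep */
	pthread_mutex_unlock(&cmd_lock);

	pthread_mutex_lock(&c->done_lock);
	while (!c->done)
		pthread_cond_wait(&c->done_cond, &c->done_lock);
	pthread_mutex_unlock(&c->done_lock);

	pthread_mutex_lock(&cmd_lock);
	cmd_put_locked(c);		/* frees @c if the owner already dropped it */
	pthread_mutex_unlock(&cmd_lock);
}
```

The owner's reference (set to 1 in cmd_new) stands in for the entry's
membership in the command list; the caller of cmd_wait is assumed to have
found the entry while that reference was still held.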

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Wed, 26 Apr 2017 18:11:12 +0000 (11:11 -0700)]
f2fs: nullify fio->encrypted_page for each writes

This makes sure each write request has nullified encrypted_page pointer.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jin Qian [Tue, 25 Apr 2017 23:28:48 +0000 (16:28 -0700)]
f2fs: sanity check segment count

F2FS uses 4 bytes to represent a block address. As a result, the maximum
supported disk size is 16 TB, which equals 16 * 1024 * 1024 / 2 segments.
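
The arithmetic can be checked directly, assuming 4 KiB blocks and 512 blocks
(2 MiB) per segment: a 32-bit address names 2^32 blocks, i.e. 2^44 bytes
(16 TiB), or 2^23 = 8,388,608 segments. The macro names and
segment_count_ok() are hypothetical, not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed geometry: 4 KiB blocks, 512 blocks (2 MiB) per segment. */
#define DEMO_BLKSIZE_BYTES	4096ULL
#define DEMO_BLKS_PER_SEG	512ULL

/* A 32-bit block address can name at most 2^32 blocks. */
#define DEMO_MAX_BLOCKS		(1ULL << 32)
#define DEMO_MAX_BYTES		(DEMO_MAX_BLOCKS * DEMO_BLKSIZE_BYTES)	/* 16 TiB */
#define DEMO_MAX_SEGMENTS	(DEMO_MAX_BLOCKS / DEMO_BLKS_PER_SEG)	/* 16 * 1024 * 1024 / 2 */

/* Hypothetical superblock check in the spirit of the patch. */
static bool segment_count_ok(uint32_t segment_count)
{
	return segment_count <= DEMO_MAX_SEGMENTS;
}
```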

Signed-off-by: Jin Qian <jinqian@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Mon, 24 Apr 2017 22:20:16 +0000 (15:20 -0700)]
f2fs: introduce valid_ipu_blkaddr to clean up

This patch introduces valid_ipu_blkaddr to clean up checking block address for
inplace-update.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Hou Pengyang [Tue, 25 Apr 2017 12:45:13 +0000 (12:45 +0000)]
f2fs: lookup extent cache first under IPU scenario

If a page is cold, not atomically written, and need_ipu holds, there is a
high probability that IPU will be applied. For IPU, we check the extent
tree first to get the block address, instead of reading the dnode page,
which may cause a useless dnode I/O since IPU does not need to update the
dnode index.
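
A minimal sketch of the lookup order, using a hypothetical one-entry extent
cache and a stub that stands in for reading the dnode page (all names and the
fake address mapping are assumptions). A cache hit returns the old block
address without touching the dnode, which is all IPU needs since the address
does not change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent: file blocks [fofs, fofs + len) map to [blk, blk + len). */
struct extent_demo {
	uint32_t fofs;
	uint32_t blk;
	uint32_t len;
};

static unsigned int dnode_reads;	/* counts simulated dnode I/Os */

/* Stub for reading the dnode page; pretends the address is index + 1000. */
static uint32_t read_dnode_blkaddr(uint32_t index)
{
	dnode_reads++;
	return index + 1000;
}

static bool extent_lookup(const struct extent_demo *ext, uint32_t index, uint32_t *blk)
{
	if (ext == NULL || index < ext->fofs || index - ext->fofs >= ext->len)
		return false;
	*blk = ext->blk + (index - ext->fofs);
	return true;
}

/* IPU reuses the old address, so a cache hit avoids the dnode read entirely. */
static uint32_t ipu_get_blkaddr(const struct extent_demo *ext, uint32_t index)
{
	uint32_t blk;

	if (extent_lookup(ext, index, &blk))
		return blk;
	return read_dnode_blkaddr(index);
}
```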

Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Hou Pengyang [Tue, 25 Apr 2017 12:45:12 +0000 (12:45 +0000)]
f2fs: reconstruct code to write a data page

This patch introduces encrypt_one_page, which encrypts one data page before
submit_bio, and changes how need_inplace_update is used.

Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Tue, 25 Apr 2017 12:21:38 +0000 (20:21 +0800)]
f2fs: introduce __wait_discard_cmd

Just cleanup, no logic change.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Tue, 25 Apr 2017 12:21:37 +0000 (20:21 +0800)]
f2fs: introduce __issue_discard_cmd

Just cleanup, no logic change.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 24 Apr 2017 16:21:35 +0000 (00:21 +0800)]
f2fs: enable small discard by default

This patch enables 4K-granularity small discard by default when realtime
discard is on. In seriously fragmented space, small discards can then be
issued in time, so that invalid filesystem data no longer needlessly
occupies storage space, and the performance of the flash storage can
recover.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 24 Apr 2017 16:21:34 +0000 (00:21 +0800)]
f2fs: delay awaking discard thread

It's better to delay waking the discard thread while queuing discard
commands in checkpoint; this gives more chances to merge big and small
discards.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Yunlei He [Sat, 22 Apr 2017 10:06:26 +0000 (18:06 +0800)]
f2fs: seperate read nat page from nat_tree_lock

This patch separates the NAT page read I/O from nat_tree_lock.

- lock_page
 - get_node_info()
  - current_nat_addr
  ......            ->       write_checkpoint
  - get_meta_page

Because we lock the node page, we can make sure that no other thread
modifies this nid concurrently. So we only need to obtain current_nat_addr
under nat_tree_lock; the node info is always the same in both NAT packs.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Sheng Yong [Sat, 22 Apr 2017 02:39:20 +0000 (10:39 +0800)]
f2fs: fix multiple f2fs_add_link() having same name for inline dentry

Commit 88c5c13a5027 (f2fs: fix multiple f2fs_add_link() calls having
same name) does not cover the scenario where inline dentry is enabled.
In that case, F2FS_I(dir)->task will be NULL, and __f2fs_add_link will
look up dentries one more time.

This patch fixes it by moving the assignment of the current task to an
upper level to cover both normal and inline dentries.

Cc: <stable@vger.kernel.org>
Fixes: 88c5c13a5027 (f2fs: fix multiple f2fs_add_link() calls having same name)
Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Hou Pengyang [Fri, 21 Apr 2017 12:41:48 +0000 (12:41 +0000)]
f2fs: skip encrypted inode in ASYNC IPU policy

Async requests may be throttled in the block layer, so a page under async
writeback may stay in WRITE_BACK state for a long time.

For an encrypted inode, we need to wait on page writeback regardless of
whether the device supports BDI_CAP_STABLE_WRITES. This may result in a
longer wait on page writeback for async encrypted inode pages.

This patch skips IPU for updating writes to encrypted inodes.

Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Thu, 20 Apr 2017 20:51:57 +0000 (13:51 -0700)]
f2fs: fix out-of free segments

This patch also reverts d0db7703ac1 ("f2fs: do SSR in higher priority").

This patch fixes running out of free segments caused by creating many small
files, reproduced by:
1) mkfs -s 1 2G
2) mount
3) untar
 - produce 60000 small files in a burst
4) sync
 - flush node pages
 - flush imeta

Here, when we do f2fs_balance_fs, we missed the number of imeta blocks, so
we skipped checking has_not_enough_free_secs.

Another test is done by
1) mkfs -s 12 2G
2) mount
3) untar
 - produce 60000 small files in a burst
4) sync
 - flush node pages
 - flush imeta

In this case, this patch also fixes wrong block allocation under a large
section size.

Reported-by: William Brana <wbrana@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Thu, 20 Apr 2017 00:36:38 +0000 (17:36 -0700)]
f2fs: add parentheses for macro variables more

This patch adds parentheses around more macro variables in include/linux/f2fs_fs.h.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Arnd Bergmann [Wed, 19 Apr 2017 17:38:33 +0000 (19:38 +0200)]
f2fs: improve definition of statistic macros

With a recent addition of f2fs_lookup_extent_tree(), we get a warning about
the use of empty macros:

fs/f2fs/extent_cache.c: In function 'f2fs_lookup_extent_tree':
fs/f2fs/extent_cache.c:358:32: error: suggest braces around empty body in an 'else' statement [-Werror=empty-body]
   stat_inc_rbtree_node_hit(sbi);

A good way to avoid the warning and make the code more robust is to define
all no-op macros as 'do { } while (0)'.
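
A sketch of the problem and the fix: the macro names echo the warning above,
but the counter variables, the DEMO_CONFIG_STATS switch, and lookup_demo()
are hypothetical. With statistics compiled out, an empty macro body would turn
"else stat_inc_rbtree_node_hit(...);" into "else ;", which -Wempty-body
reports, while "do { } while (0)" expands to a complete statement that still
requires the trailing semicolon.

```c
#include <assert.h>

/* Hypothetical statistic counters. */
static int largest_node_hits;
static int rbtree_node_hits;

#ifdef DEMO_CONFIG_STATS
#define stat_inc_largest_node_hit(cnt)	((cnt)++)
#define stat_inc_rbtree_node_hit(cnt)	((cnt)++)
#else
/* No-op versions: still one statement, so "else <macro>;" stays well-formed. */
#define stat_inc_largest_node_hit(cnt)	do { } while (0)
#define stat_inc_rbtree_node_hit(cnt)	do { } while (0)
#endif

/* Hypothetical lookup shaped like the code that triggered the warning. */
static int lookup_demo(int hit_largest)
{
	if (hit_largest)
		stat_inc_largest_node_hit(largest_node_hits);
	else
		stat_inc_rbtree_node_hit(rbtree_node_hits);
	return hit_largest;
}
```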

Fixes: 54c2258cd63a ("f2fs: extract rb-tree operation infrastructure")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Tue, 18 Apr 2017 22:03:15 +0000 (15:03 -0700)]
f2fs: assign allocation hint for warm/cold data

This patch assigns the slower device region to the warm/cold data area more eagerly.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Tue, 18 Apr 2017 20:47:25 +0000 (13:47 -0700)]
f2fs: fix _IOW usage

This patch fixes wrong _IOW usage.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Thu, 13 Apr 2017 22:17:00 +0000 (15:17 -0700)]
f2fs: add ioctl to flush data from faster device to cold area

This patch adds an ioctl to flush data from the faster device to the cold
area. The user can give the device number and the number of segments to
move. Nothing is moved if there is only one device.

The parameter looks like:

struct f2fs_flush_device {
	u32 dev_num;		/* device number to flush */
	u32 segments;		/* # of segments to flush */
};
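
A user-space caller would encode this structure in an _IOW request; the third
argument of _IOW is the argument type, whose sizeof ends up in the command
number. The magic 0xf5, the command number 10, and the DEMO_ macro names are
assumptions for illustration; check the kernel's f2fs headers for the real
definitions.

```c
#include <assert.h>
#include <linux/ioctl.h>
#include <linux/types.h>

/* User-space mirror of the parameter structure from the commit message. */
struct f2fs_flush_device {
	__u32 dev_num;		/* device number to flush */
	__u32 segments;		/* # of segments to flush */
};

/* Assumed values, for illustration only. */
#define DEMO_F2FS_IOCTL_MAGIC		0xf5
#define DEMO_F2FS_IOC_FLUSH_DEVICE	\
	_IOW(DEMO_F2FS_IOCTL_MAGIC, 10, struct f2fs_flush_device)
```

A caller would then fill in the structure and pass its address as the third
argument of ioctl(2) on a file descriptor opened on the f2fs mount.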

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Hou Pengyang [Tue, 18 Apr 2017 11:57:16 +0000 (11:57 +0000)]
f2fs: introduce async IPU policy

This patch introduces an ASYNC IPU policy.

Under a scenario with a large number of async updates (e.g. log writing in
Android), the disk becomes seriously fragmented and GC is triggered more
frequently.

This patch uses IPU to rewrite async updating writes, since async writes
are NOT sensitive to I/O latency.

Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
Chao Yu [Tue, 18 Apr 2017 11:27:39 +0000 (19:27 +0800)]
f2fs: add undiscard blocks stat

This patch adds accounting of undiscarded blocks.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Chao Yu [Tue, 18 Apr 2017 11:23:39 +0000 (19:23 +0800)]
f2fs: unlock cp_rwsem early for IPU writes

For IPU writes, there won't be any updates in the dnode page since we
reuse the old block address instead of allocating a new one, so we
don't need to hold cp_rwsem while submitting IPU I/O.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Chao Yu [Mon, 17 Apr 2017 10:21:43 +0000 (18:21 +0800)]
f2fs: introduce __check_rb_tree_consistence

Introduce __check_rb_tree_consistence to check the consistency of the
rb-tree based discard cache at runtime.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Sat, 15 Apr 2017 06:09:38 +0000 (14:09 +0800)]
f2fs: trace __submit_discard_cmd

Add an event class f2fs_discard to introduce f2fs_queue_discard, then
use f2fs_{queue,issue}_discard to trace __{queue,submit}_discard_cmd.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Sat, 15 Apr 2017 06:09:37 +0000 (14:09 +0800)]
f2fs: in prior to issue big discard

Issue big discards first instead of discards of random size; we expect
this to help:
- quickly recycle unused large space in the flash storage device.
- give a chance to
  a) wait to merge small discards into bigger ones, or
  b) avoid issuing discards for blocks that have been reallocated by SSR.
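
A minimal sketch of the selection rule, assuming pending discards are kept in
a plain array (struct pending_discard and pick_biggest() are hypothetical):
each round issues the largest command still pending. A linear scan is used
only for clarity; a real implementation would more likely keep commands
bucketed by size so the biggest can be found without scanning.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pending discard: a block range waiting to be issued. */
struct pending_discard {
	unsigned int start;	/* first block */
	unsigned int len;	/* number of blocks */
	int issued;		/* nonzero once submitted */
};

/* Return the index of the largest unissued discard, or -1 if none remain. */
static int pick_biggest(const struct pending_discard *d, size_t n)
{
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		if (d[i].issued)
			continue;
		if (best < 0 || d[i].len > d[best].len)
			best = (int)i;
	}
	return best;
}
```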

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Sat, 15 Apr 2017 06:09:36 +0000 (14:09 +0800)]
f2fs: clean up discard_cmd_control structure

Avoid long variable name in discard_cmd_control structure, no logic
change.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Fri, 14 Apr 2017 15:24:55 +0000 (23:24 +0800)]
f2fs: use rb-tree to track pending discard commands

Introduce rb-tree based discard cache infrastructure to speed up lookup and
merge operation of discard entry.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: initialize dc to avoid build warning]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Fri, 14 Apr 2017 22:46:23 +0000 (15:46 -0700)]
f2fs: avoid dirty node pages in check_only recovery

In the check_only mode, we should not make any dirty node pages. Otherwise,
we can get this panic:

F2FS-fs (nvme0n1p1): Need to recover fsync data
------------[ cut here ]------------
kernel BUG at fs/f2fs/node.c:2204!
CPU: 7 PID: 19923 Comm: mount Tainted: G           OE   4.9.8 #2
RIP: 0010:[<ffffffffc0979c0b>]  [<ffffffffc0979c0b>] flush_nat_entries+0x43b/0x7d0 [f2fs]
Call Trace:
 [<ffffffffc096ddaa>] ? __f2fs_submit_merged_bio+0x5a/0xd0 [f2fs]
 [<ffffffffc096ddaa>] ? __f2fs_submit_merged_bio+0x5a/0xd0 [f2fs]
 [<ffffffffc096dddb>] ? __f2fs_submit_merged_bio+0x8b/0xd0 [f2fs]
 [<ffffffff860e450f>] ? up_write+0x1f/0x40
 [<ffffffffc096dddb>] ? __f2fs_submit_merged_bio+0x8b/0xd0 [f2fs]
 [<ffffffffc0969f04>] write_checkpoint+0x2f4/0xf20 [f2fs]
 [<ffffffff860e938d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffffc0960bc9>] ? f2fs_sync_fs+0x79/0x190 [f2fs]
 [<ffffffffc0960bc9>] ? f2fs_sync_fs+0x79/0x190 [f2fs]
 [<ffffffffc0960bd5>] f2fs_sync_fs+0x85/0x190 [f2fs]
 [<ffffffffc097b6de>] f2fs_balance_fs_bg+0x7e/0x1c0 [f2fs]
 [<ffffffffc0977b64>] f2fs_write_node_pages+0x34/0x350 [f2fs]
 [<ffffffff860e5f42>] ? __lock_is_held+0x52/0x70
 [<ffffffff861d9b31>] do_writepages+0x21/0x30
 [<ffffffff86298ce1>] __writeback_single_inode+0x61/0x760
 [<ffffffff86909127>] ? _raw_spin_unlock+0x27/0x40
 [<ffffffff8629a735>] writeback_single_inode+0xd5/0x190
 [<ffffffff8629a889>] write_inode_now+0x99/0xc0
 [<ffffffff86283876>] iput+0x1f6/0x2c0
 [<ffffffffc0964b52>] f2fs_fill_super+0xc32/0x10c0 [f2fs]
 [<ffffffff86266462>] mount_bdev+0x182/0x1b0
 [<ffffffffc0963f20>] ? f2fs_commit_super+0x100/0x100 [f2fs]
 [<ffffffffc0960da5>] f2fs_mount+0x15/0x20 [f2fs]
 [<ffffffff86266e08>] mount_fs+0x38/0x170
 [<ffffffff86288bab>] vfs_kern_mount+0x6b/0x160
 [<ffffffff8628bcfe>] do_mount+0x1be/0xd60

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Wed, 12 Apr 2017 19:02:00 +0000 (12:02 -0700)]
f2fs: fix not to set fsync/dentry mark

Otherwise, we can see stale fsync/dentry mark given by previous calls, resulting
in giving up roll-forward recovery due to wrong dentry mark.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Wed, 12 Apr 2017 17:01:33 +0000 (10:01 -0700)]
f2fs: allocate hot_data for atomic writes

We'd better allocate atomic writes to hot_data zone.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Wed, 12 Apr 2017 02:15:33 +0000 (19:15 -0700)]
f2fs: give time to flush dirty pages for checkpoint

If all the threads are waiting for checkpoint, we have no chance to flush
required dirty pages.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Wed, 12 Apr 2017 02:01:26 +0000 (19:01 -0700)]
f2fs: fix fs corruption due to zero inode page

This patch fixes the following scenario.

- f2fs_create/f2fs_mkdir             - write_checkpoint
 - f2fs_mark_inode_dirty_sync         - block_operations
                                       - f2fs_lock_all
                                       - f2fs_sync_inode_meta
                                        - f2fs_unlock_all
                                        - sync_inode_metadata
 - f2fs_lock_op
                                         - f2fs_write_inode
                                          - update_inode_page
                                           - get_node_page
                                             return -ENOENT
 - new_inode_page
  - fill_node_footer
 - f2fs_mark_inode_dirty_sync
 - ...
 - f2fs_unlock_op
                                          - f2fs_inode_synced
                                       - f2fs_lock_all
                                       - do_checkpoint

In this checkpoint, we can get an inode page that contains only zeros
apart from a valid node footer.

Cc: <stable@vger.kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 27 Mar 2017 10:14:05 +0000 (18:14 +0800)]
f2fs: shrink blk plug region

Don't use blk plug covering area where there won't be any IOs being issued.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Tue, 11 Apr 2017 01:25:22 +0000 (09:25 +0800)]
f2fs: extract rb-tree operation infrastructure

rb-tree lookup/update functions are deeply coupled with the extent cache
code, which makes these basic functions very hard to reuse. This patch
extracts a common rb-tree operation infrastructure for later reuse.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Sat, 8 Apr 2017 00:25:54 +0000 (17:25 -0700)]
f2fs: avoid frequent checkpoint during f2fs_gc

Now we're doing SSR more aggressively than ever before, so once we reach
reserved_segment, f2fs_balance_fs will call f2fs_gc, which triggers a
checkpoint every time. We must avoid that.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Fri, 7 Apr 2017 22:08:17 +0000 (15:08 -0700)]
f2fs: clean up some macros in terms of GET_SEGNO

This patch cleans several macros by introducing:
- BLKS_PER_SEC
- GET_SEC_FROM_SEG
- GET_SEG_FROM_SEC
- GET_ZONE_FROM_SEC
- GET_ZONE_FROM_SEG

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Fri, 7 Apr 2017 21:33:22 +0000 (14:33 -0700)]
f2fs: clean up get_valid_blocks with consistent parameter

This patch cleans up get_valid_blocks, which has no functional change.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Fri, 7 Apr 2017 21:27:07 +0000 (14:27 -0700)]
f2fs: use segment number for get_valid_blocks

This patch fixes callers to pass a segment number to get_valid_blocks.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Tomohiro Kusumi [Sat, 8 Apr 2017 23:11:36 +0000 (02:11 +0300)]
f2fs: guard macro variables with braces

Add braces around variables used within macros where it makes sense
to do so. Many of the macros in f2fs already do this. This commit does
not make any change that alters line numbers as a result of adding
braces, since that usually affects the binary via __LINE__.

Confirmed that fs/f2fs/f2fs.ko has no diff before/after this commit on
x86_64, to make sure there is no functional change and no unexpected
side effect due to callers' arithmetic within the existing code.

Signed-off-by: Tomohiro Kusumi <tkusumi@tuxera.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Tomohiro Kusumi [Wed, 5 Apr 2017 19:49:44 +0000 (22:49 +0300)]
f2fs: fix comment on f2fs_flush_merged_bios() after 86531d6b

Callers are to unlock the page on failure after 86531d6b.

Signed-off-by: Tomohiro Kusumi <tkusumi@tuxera.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 5 Apr 2017 10:26:26 +0000 (18:26 +0800)]
f2fs: prevent waiter encountering incorrect discard states

In f2fs_submit_discard_endio, we wake up the waiter before setting the
discard command state, so the waiter may see an incorrect state. Change
the order of complete() and the state update to fix this issue.
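
The required ordering can be sketched with C11 atomics standing in for the
kernel's completion (struct dc, dc_endio(), and dc_wait() are hypothetical):
the end-I/O side writes the command state first and only then publishes
completion with a release store, so a waiter whose acquire load sees the flag
is guaranteed to see the final state. Reversing the two steps would reproduce
the bug described above. The busy-wait only keeps the sketch short.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum dc_state { DC_SUBMIT, DC_DONE };

/* Hypothetical discard command; 'completed' plays the role of the completion. */
struct dc {
	enum dc_state state;	/* plain field, published by 'completed' */
	int error;
	atomic_bool completed;
};

/* End-I/O path: fill in the result first, then "complete()". */
static void dc_endio(struct dc *dc, int err)
{
	dc->error = err;
	dc->state = DC_DONE;
	/* Release store: the writes above become visible before the flag. */
	atomic_store_explicit(&dc->completed, true, memory_order_release);
}

/* Waiter: once the flag is seen with acquire, state and error are final. */
static int dc_wait(struct dc *dc)
{
	while (!atomic_load_explicit(&dc->completed, memory_order_acquire))
		;	/* a real waiter would sleep instead of spinning */
	return dc->error;
}
```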

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 5 Apr 2017 10:19:49 +0000 (18:19 +0800)]
f2fs: introduce f2fs_wait_discard_bios

Split f2fs_wait_discard_bios from f2fs_wait_discard_bio, just for cleanup,
no logic change.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 5 Apr 2017 10:19:48 +0000 (18:19 +0800)]
f2fs: split discard_cmd_list

Split discard_cmd_list into discard_{pend,wait}_list, so that while sending or
waiting for discard commands, we can avoid traversing unneeded entries in the
original list.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Tue, 4 Apr 2017 23:45:30 +0000 (16:45 -0700)]
Revert "f2fs: put allocate_segment after refresh_sit_entry"

This reverts commit 3436c4bdb30de421d46f58c9174669fbcfd40ce0.

This causes a leak when registering dirty segments. I reproduced the issue
with a modified postmark which injects a lot of file create/delete/update
operations and finally triggers a huge number of SSR allocations.

Cc: <stable@vger.kernel.org> # v4.10+
[Jaegeuk Kim: Change missing incorrect comment]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Tomohiro Kusumi [Tue, 4 Apr 2017 10:01:22 +0000 (13:01 +0300)]
f2fs: split make_dentry_ptr() into block and inline versions

Since callers statically know which type to use, make_dentry_ptr()
can simply be split into two inline functions. This way, less code is
inlined, there are fewer arguments, and no cast is needed.

Signed-off-by: Tomohiro Kusumi <tkusumi@tuxera.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Fri, 31 Mar 2017 04:02:46 +0000 (21:02 -0700)]
f2fs: submit bio of in-place-update pages

This patch tries to split in-place-update bios from sequential bios.

Suggested-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Kaixu Xia [Sat, 1 Apr 2017 18:39:48 +0000 (02:39 +0800)]
f2fs: remove the redundant variable definition

The variable 'i' has been defined before, so here we can
use it directly.

Signed-off-by: Kaixu Xia <xiakaixu@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Wed, 29 Mar 2017 01:07:38 +0000 (18:07 -0700)]
f2fs: avoid IO split due to mixed WB_SYNC_ALL and WB_SYNC_NONE

If two threads try to flush dirty pages in different inodes respectively,
f2fs_write_data_pages() will produce WRITE and WRITE_SYNC one at a time,
resulting in a lot of separate 4KB IOs.

So, this patch gives higher priority to WB_SYNC_ALL IOs and gathers write
IOs into a big WRITE_SYNC'ed bio.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Sat, 25 Mar 2017 00:05:13 +0000 (20:05 -0400)]
f2fs: write small sized IO to hot log

It is better to separate small and large IOs in order to get more
consecutive big writes.

The default threshold is set to 64KB, but it is configurable via sysfs/min_hot_blocks.
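
Expressed in blocks, the 64KB default corresponds to 16 blocks of 4KB. The
sketch below shows the kind of decision such a threshold drives; the names,
the inclusive comparison, and what is counted (dirty blocks of the file being
written) are assumptions rather than the exact f2fs logic.

```c
#include <assert.h>

#define DEMO_BLKSIZE		4096u
/* 64KB default threshold expressed in 4KB blocks (= 16). */
#define DEMO_MIN_HOT_BLOCKS	(64u * 1024u / DEMO_BLKSIZE)

enum demo_log { DEMO_LOG_HOT_DATA, DEMO_LOG_WARM_DATA };

/* Hypothetical policy: small writes go to the hot log, larger ones to the warm log. */
static enum demo_log pick_log(unsigned int nr_dirty_blocks, unsigned int min_hot_blocks)
{
	return nr_dirty_blocks <= min_hot_blocks ? DEMO_LOG_HOT_DATA : DEMO_LOG_WARM_DATA;
}
```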

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Tue, 28 Mar 2017 10:18:50 +0000 (18:18 +0800)]
f2fs: use bitmap in discard_entry

This patch uses a bitmap instead of an extent in struct discard_entry
to indicate the discard range in one segment. For fragmented space, this
implementation can reduce the memory footprint.
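
A minimal sketch of a per-segment bitmap, assuming 512 blocks per segment
(all names are hypothetical): the map always occupies 64 bytes no matter how
fragmented the invalid blocks are, whereas a list of extents needs one entry
per contiguous run and grows with fragmentation.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define DEMO_BLKS_PER_SEG	512u	/* assumed: 2 MiB segment of 4 KiB blocks */
#define DEMO_BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define DEMO_MAP_WORDS		((DEMO_BLKS_PER_SEG + DEMO_BITS_PER_WORD - 1) / DEMO_BITS_PER_WORD)

/* Hypothetical per-segment discard entry: one bit per block to discard. */
struct discard_entry_demo {
	unsigned int segno;
	unsigned long map[DEMO_MAP_WORDS];
};

static void de_init(struct discard_entry_demo *de, unsigned int segno)
{
	memset(de, 0, sizeof(*de));
	de->segno = segno;
}

static void de_set(struct discard_entry_demo *de, unsigned int blk)
{
	assert(blk < DEMO_BLKS_PER_SEG);
	de->map[blk / DEMO_BITS_PER_WORD] |= 1UL << (blk % DEMO_BITS_PER_WORD);
}

static int de_test(const struct discard_entry_demo *de, unsigned int blk)
{
	assert(blk < DEMO_BLKS_PER_SEG);
	return (de->map[blk / DEMO_BITS_PER_WORD] >> (blk % DEMO_BITS_PER_WORD)) & 1UL;
}

/* Count the blocks marked for discard in this segment. */
static unsigned int de_count(const struct discard_entry_demo *de)
{
	unsigned int n = 0;

	for (unsigned int b = 0; b < DEMO_BLKS_PER_SEG; b++)
		n += de_test(de, b);
	return n;
}
```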

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 27 Mar 2017 10:14:04 +0000 (18:14 +0800)]
f2fs: clean up destroy_discard_cmd_control

Remove an unneeded parameter and simplify the flow in
destroy_discard_cmd_control.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Sat, 25 Mar 2017 09:19:59 +0000 (17:19 +0800)]
f2fs: count discard command entry

Count discard command entries and show the number in debugfs; also fix
the total consumed memory footprint to include the cost of the discard
command cache.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Sat, 25 Mar 2017 09:19:58 +0000 (17:19 +0800)]
f2fs: show issued flush/discard count

Show historical count of flush command and discard command.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Tue, 21 Mar 2017 14:59:50 +0000 (10:59 -0400)]
f2fs: relax node version check for victim data in gc

- has_not_enough_free_secs
node_secs: 0  dent_secs: 0  freed:0  free_segments:103  reserved:104

          - f2fs_gc
             - get_victim_by_default
alloc_mode 0, gc_mode 1, max_search 2672, offset 4654, ofs_unit 1

                - do_garbage_collect
start_segno 3976, end_segno 3977   type 0

                  - is_alive
nid 22797, blkaddr 2131882, ofs_in_node 0, version 0x8/0x0

                   - gc_data_segment 766, segno 3976, block 512/426 not alive

So, this patch fixes a subtle corruption case where the node version does not
match the summary version, which results in an infinite loop in GC.

Reported-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Sat, 25 Mar 2017 01:08:56 +0000 (21:08 -0400)]
f2fs: start SSR much eariler to avoid FG_GC

This patch initiates SSR much earlier, resulting in less FG_GC.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Sat, 25 Mar 2017 00:41:45 +0000 (20:41 -0400)]
f2fs: allocate node and hot data in the beginning of partition

In order to give more spatial locality, this patch changes the block allocation
policy to assign the beginning of the partition to small and hot data/node
blocks. To do this, we set noheap allocation by default and introduce another
mount option, heap, to set it back.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Sat, 25 Mar 2017 07:03:02 +0000 (00:03 -0700)]
f2fs: fix wrong max cost initialization

This patch fixes the max cost, which was not increased when a previous patch
increased the cost of data segments in the greedy algorithm.

Cc: <stable@vger.kernel.org> # v4.10+
Fixes: b9cd20619 "f2fs: node segment is prior to data segment selected victim"
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Yunlei He [Mon, 13 Mar 2017 12:22:18 +0000 (20:22 +0800)]
f2fs: allow write page cache when writting cp

This patch allows writing data to normal files while a new checkpoint is
being written.

We relax three limitations in the write_begin path:
1. data allocation
2. node allocation
3. variables in checkpoint

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Thu, 23 Mar 2017 05:38:26 +0000 (13:38 +0800)]
f2fs: don't reserve additional space in xattr block

In this patch, we change xattr block disk layout as below:

Before:
xattr node block layout
+---------------------------------------------+---------------+-------------+
|           node block xattr entries          |   reserved    | node footer |
|                  4068 Bytes                 |    4 Bytes    |  24 Bytes   |

In memory layout
+--------------------+---------------------------------+--------------------+
|    inline xattr    |     node block xattr entries    |      reserved      |
|     200 Bytes      |         4068 Bytes              |      4 Bytes       |

After:
xattr node block layout
+-------------------------------------------------------------+-------------+
|                  node block xattr entries                   | node footer |
|                         4072 Bytes                          |  24 Bytes   |

In memory layout
+--------------------+---------------------------------+--------------------+
|    inline xattr    |     node block xattr entries    |      reserved      |
|     200 Bytes      |         4072 Bytes              |      4 Bytes       |

With this change, we don't need to reserve additional space in the node
block; the reserved space is kept only in the logical in-memory layout.
This enlarges the usable free space of the xattr node block.

As tested, generic/026 shows that the maximum number of stored xattr
entries increases from 531 to 532 when the inline_xattr option is enabled.
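
The byte counts in the layout diagrams can be cross-checked at compile time.
The DEMO_ macro names are hypothetical; the numbers are the ones shown in the
diagrams.

```c
#include <assert.h>

/* Sizes in bytes, taken from the layout diagrams. */
#define DEMO_BLOCK_SIZE		4096
#define DEMO_NODE_FOOTER_SIZE	24
#define DEMO_INLINE_XATTR_SIZE	200
#define DEMO_RESERVED_SIZE	4

/* Before: the on-disk block held entries, the reserved tail, and the footer. */
#define DEMO_OLD_ENTRIES	(DEMO_BLOCK_SIZE - DEMO_RESERVED_SIZE - DEMO_NODE_FOOTER_SIZE)
/* After: only the node footer is carved out of the on-disk block. */
#define DEMO_NEW_ENTRIES	(DEMO_BLOCK_SIZE - DEMO_NODE_FOOTER_SIZE)

/* The in-memory buffer still ends with the reserved area in both layouts. */
#define DEMO_OLD_INMEM	(DEMO_INLINE_XATTR_SIZE + DEMO_OLD_ENTRIES + DEMO_RESERVED_SIZE)
#define DEMO_NEW_INMEM	(DEMO_INLINE_XATTR_SIZE + DEMO_NEW_ENTRIES + DEMO_RESERVED_SIZE)

static_assert(DEMO_OLD_ENTRIES == 4068, "old on-disk xattr area");
static_assert(DEMO_NEW_ENTRIES == 4072, "new on-disk xattr area");
static_assert(DEMO_NEW_INMEM - DEMO_OLD_INMEM == DEMO_RESERVED_SIZE,
	      "entry area grows by exactly the old on-disk reserved bytes");
```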

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Thu, 23 Mar 2017 05:38:25 +0000 (13:38 +0800)]
f2fs: clean up xattr operation

1. don't allocate redundant memory in read_all_xattrs.
2. introduce RESERVED_XATTR_SIZE for cleanup.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Reviewed-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 22 Mar 2017 09:23:46 +0000 (17:23 +0800)]
f2fs: don't track volatile file in dirty inode list

Don't track volatile files in the dirty inode list; otherwise, with the
data_flush option, the background thread will enter an endless loop
flushing the journal file's pages.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 22 Mar 2017 09:23:45 +0000 (17:23 +0800)]
f2fs: show the max number of volatile operations

This patch shows the maximum number of volatile operations that are
conducted concurrently.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: fix race condition in between free nid allocator/initializer
Chao Yu [Wed, 22 Mar 2017 06:45:05 +0000 (14:45 +0800)]
f2fs: fix race condition in between free nid allocator/initializer

In the concurrent case below, an allocated nid can be loaded into the
free nid cache and allocated again.

Thread A                                   Thread B
- f2fs_create
 - f2fs_new_inode
  - alloc_nid
   - __insert_nid_to_list(ALLOC_NID_LIST)
                                           - f2fs_balance_fs_bg
                                            - build_free_nids
                                             - __build_free_nids
                                              - scan_nat_page
                                               - add_free_nid
                                                - __lookup_nat_cache
 - f2fs_add_link
  - init_inode_metadata
   - new_inode_page
    - new_node_page
     - set_node_addr
 - alloc_nid_done
  - __remove_nid_from_list(ALLOC_NID_LIST)
                                                - __insert_nid_to_list(FREE_NID_LIST)

This patch makes the NAT cache lookup and the free nid list operation
atomic to avoid this race condition.
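
The core of the fix is that two steps happen in one critical section: the
"is this nid already in use?" check and the insertion into the free list.
This is a minimal userspace sketch of that pattern. It uses a simplified
model: an in_use[] array stands in for the NAT cache, an is_free[] array
for the free nid list, and a pthread mutex for the kernel lock. struct
nid_state, try_add_free_nid() and alloc_free_nid() are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define SKETCH_MAX_NID 1024u   /* hypothetical size of the nid space */

struct nid_state {
    pthread_mutex_t lock;            /* one lock guards both arrays */
    bool in_use[SKETCH_MAX_NID];     /* stand-in for the NAT cache */
    bool is_free[SKETCH_MAX_NID];    /* stand-in for the free nid list */
};

/* Scanner path: cache nid as free only if nobody is using it.
 * Lookup and insert share one critical section, so an allocator
 * cannot slip in between them. */
bool try_add_free_nid(struct nid_state *s, unsigned int nid)
{
    bool added = false;

    assert(nid < SKETCH_MAX_NID);
    pthread_mutex_lock(&s->lock);
    if (!s->in_use[nid] && !s->is_free[nid]) {
        s->is_free[nid] = true;
        added = true;
    }
    pthread_mutex_unlock(&s->lock);
    return added;
}

/* Allocator path: take the first free nid and mark it in use
 * under the same lock. */
bool alloc_free_nid(struct nid_state *s, unsigned int *out)
{
    bool found = false;

    pthread_mutex_lock(&s->lock);
    for (unsigned int nid = 0; nid < SKETCH_MAX_NID; nid++) {
        if (s->is_free[nid]) {
            s->is_free[nid] = false;
            s->in_use[nid] = true;
            *out = nid;
            found = true;
            break;
        }
    }
    pthread_mutex_unlock(&s->lock);
    return found;
}
```

Suppose the lookup and the insert each took the lock separately. An
allocation could then complete between them, which is the window shown in
the trace above.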

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: use set_page_private macro in f2fs_trace_pid
Yunlei He [Wed, 22 Mar 2017 03:59:30 +0000 (11:59 +0800)]
f2fs: use set_page_private macro in f2fs_trace_pid

Use the set_page_private macro instead of operating on the page struct
directly.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: fix recording invalid last_victim
Chao Yu [Tue, 21 Mar 2017 12:09:45 +0000 (20:09 +0800)]
f2fs: fix recording invalid last_victim

During garbage collection, we record the offset of the segment that
follows the last victim. That offset is used as the start offset of the
next search.

However, in some corner cases the recorded offset may run past the end
of the main segment area, which causes an incorrect search in the
dirty_segmap bitmap. This patch applies a modulo operation to avoid this
issue.
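
This is a minimal sketch of the wrap-around. It assumes the next search
starts one segment after the last victim and that segment numbers run
from 0 to main_segs - 1. next_search_start() is a hypothetical helper,
not the f2fs function.

```c
#include <assert.h>

/* Hypothetical helper: where the next victim search should start.
 * Without the modulo, a victim in the last segment would yield
 * main_segs, one past the end of the dirty segment bitmap. */
unsigned int next_search_start(unsigned int last_victim,
                               unsigned int main_segs)
{
    assert(main_segs > 0 && last_victim < main_segs);
    return (last_victim + 1) % main_segs;
}
```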

Reported-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: more reasonable mem_size calculating of ino_entry
Kinglong Mee [Sat, 18 Mar 2017 01:26:13 +0000 (09:26 +0800)]
f2fs: more reasonable mem_size calculating of ino_entry

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: calculate the f2fs_stat_info into base_mem
Kinglong Mee [Sat, 18 Mar 2017 01:25:05 +0000 (09:25 +0800)]
f2fs: calculate the f2fs_stat_info into base_mem

The memory size of f2fs_stat_info should also be counted.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: avoid stat_inc_atomic_write for non-atomic file
Kinglong Mee [Sat, 18 Mar 2017 01:20:55 +0000 (09:20 +0800)]
f2fs: avoid stat_inc_atomic_write for non-atomic file

If filemap_write_and_wait_range fails, the FI_ATOMIC_FILE flag is
removed, so f2fs should not increment the atomic_write stat.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: add missing INMEM_REVOKE trace enum definition
Chao Yu [Fri, 17 Mar 2017 10:46:14 +0000 (18:46 +0800)]
f2fs: add missing INMEM_REVOKE trace enum definition

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: sanity check of crc_offset from raw checkpoint
Kinglong Mee [Wed, 15 Mar 2017 13:12:50 +0000 (21:12 +0800)]
f2fs: sanity check of crc_offset from raw checkpoint

A crc_offset pointing at or beyond the end of the block is invalid;
sanity-check it.
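
This is a minimal sketch of such a check. It assumes the rule is that the
32-bit CRC stored at crc_offset must fit entirely inside the block. The
exact bound used by the patch may differ. crc_offset_valid() is a
hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical validator for an on-disk checksum offset. The
 * comparison subtracts from blk_size, so a huge crc_offset read
 * from a corrupted image cannot overflow it. */
bool crc_offset_valid(size_t crc_offset, size_t blk_size)
{
    assert(blk_size >= sizeof(uint32_t));
    return crc_offset <= blk_size - sizeof(uint32_t);
}
```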

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: cleanup the disk level filename updating
Kinglong Mee [Fri, 10 Mar 2017 08:28:46 +0000 (16:28 +0800)]
f2fs: cleanup the disk level filename updating

As discussed with Jaegeuk and Chao,
"Once checkpoint is done, f2fs doesn't need to update there-in filename at all."

The disk-level filename is used in only one case:
1. create a file A under a dir
2. sync A
3. godown
4. umount
5. mount (roll_forward)

Only rename/cross_rename changes the filename. If that happens:
a. between steps 1 and 2, syncing A triggers a checkpoint, so the
   roll_forward at step 5 never happens.
b. after step 2, the roll_forward happens and file A is rolled forward
   to the same result as after step 1.

Therefore, updating the disk-level filename is useless; clean it up.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: cover update_free_nid_bitmap with nid_list_lock
Chao Yu [Mon, 13 Mar 2017 12:10:41 +0000 (20:10 +0800)]
f2fs: cover update_free_nid_bitmap with nid_list_lock

free_nid_bitmap and free_nid_count in update_free_nid_bitmap should be
updated atomically; cover them with nid_list_lock to avoid races in
concurrent scenarios.
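
This is a minimal userspace sketch of keeping a bitmap and its counter
consistent. It assumes a hypothetical struct free_nid_block and uses a
pthread mutex in place of nid_list_lock. The bit flip and the counter
update form one critical section, so another thread never sees a count
that disagrees with the bitmap. update_free_bit() is a hypothetical name.

```c
#include <assert.h>
#include <limits.h>
#include <pthread.h>
#include <stdbool.h>

#define SKETCH_NIDS_PER_BLOCK 64u   /* hypothetical */

struct free_nid_block {
    pthread_mutex_t lock;   /* stand-in for nid_list_lock */
    unsigned char bitmap[SKETCH_NIDS_PER_BLOCK / CHAR_BIT];
    unsigned int free_count;   /* invariant: number of set bits */
};

/* Set or clear the "free" bit of slot ofs and adjust the counter only
 * when the bit actually changes, all under one lock. */
void update_free_bit(struct free_nid_block *b, unsigned int ofs,
                     bool set_free)
{
    unsigned char mask;
    unsigned char *byte;

    assert(ofs < SKETCH_NIDS_PER_BLOCK);
    mask = (unsigned char)(1u << (ofs % CHAR_BIT));
    byte = &b->bitmap[ofs / CHAR_BIT];

    pthread_mutex_lock(&b->lock);
    if (set_free && !(*byte & mask)) {
        *byte |= mask;
        b->free_count++;
    } else if (!set_free && (*byte & mask)) {
        *byte &= (unsigned char)~mask;
        b->free_count--;
    }
    pthread_mutex_unlock(&b->lock);
}
```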

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Reviewed-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: fix bad prefetchw of NULL page
Kinglong Mee [Mon, 13 Mar 2017 08:35:13 +0000 (16:35 +0800)]
f2fs: fix bad prefetchw of NULL page

When called from f2fs_read_data_pages, f2fs_mpage_readpages gets
page == NULL, so prefetchw(&page->flags) operates on a NULL pointer.

Fixes: f1e8866016 ("f2fs: expose f2fs_mpage_readpages")
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: clear FI_DATA_EXIST flag in truncate_inline_inode
Kinglong Mee [Fri, 10 Mar 2017 12:43:20 +0000 (20:43 +0800)]
f2fs: clear FI_DATA_EXIST flag in truncate_inline_inode

Clear the FI_DATA_EXIST flag atomically in truncate_inline_inode. Since
the return value of truncate_inline_inode isn't used, remove it as well.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: move mnt_want_write_file after arguments checking
Kinglong Mee [Fri, 10 Mar 2017 09:55:07 +0000 (17:55 +0800)]
f2fs: move mnt_want_write_file after arguments checking

mnt_want_write_file is not needed for argument checking, so call it
only after the arguments have been checked.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: check new size by inode_newsize_ok in f2fs_insert_range
Kinglong Mee [Fri, 10 Mar 2017 09:54:52 +0000 (17:54 +0800)]
f2fs: check new size by inode_newsize_ok in f2fs_insert_range

inode_newsize_ok is better than checking only maxbytes, since it also
enforces other limits, e.g. the file size rlimit.
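
This is a minimal userspace model of the extra checks for a growing file.
A size beyond RLIMIT_FSIZE or beyond the filesystem's maximum file size
is rejected with -EFBIG. new_size_ok() is a hypothetical name. It takes
the limit as a parameter instead of reading it from the current process.
It does not model the shrinking case or the SIGXFSZ signal the kernel
helper sends.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/resource.h>

/* Hypothetical model: returns 0 if growing from cur_size to new_size
 * is allowed, -EFBIG otherwise. */
int new_size_ok(uint64_t cur_size, uint64_t new_size,
                uint64_t maxbytes, rlim_t fsize_limit)
{
    if (new_size <= cur_size)
        return 0;   /* shrinking is not modeled here */
    if (fsize_limit != RLIM_INFINITY && new_size > (uint64_t)fsize_limit)
        return -EFBIG;   /* per-process file size rlimit */
    if (new_size > maxbytes)
        return -EFBIG;   /* filesystem maximum file size */
    return 0;
}
```

Checking only maxbytes corresponds to the last test alone. The rlimit
branch is what that approach misses.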

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: avoid copy data to user-space if move file range fail
Kinglong Mee [Fri, 10 Mar 2017 09:54:26 +0000 (17:54 +0800)]
f2fs: avoid copy data to user-space if move file range fail

If the move file range operation returns an error, the data copied back
to user space is just a duplicate of the input, so skip the copy.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: drop duplicate new_size assign in f2fs_zero_range
Kinglong Mee [Fri, 10 Mar 2017 09:54:03 +0000 (17:54 +0800)]
f2fs: drop duplicate new_size assign in f2fs_zero_range

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: adjust the way of calculating nat block
Fan Li [Wed, 8 Mar 2017 05:39:16 +0000 (13:39 +0800)]
f2fs: adjust the way of calculating nat block

Use a slightly simpler expression to calculate the NAT block for a
given nid.

Signed-off-by: Fan Li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: add fault injection on f2fs_truncate
Jaegeuk Kim [Thu, 9 Mar 2017 23:24:24 +0000 (15:24 -0800)]
f2fs: add fault injection on f2fs_truncate

Inject a fault during f2fs_truncate().

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: check range before defragment
Sheng Yong [Wed, 8 Mar 2017 02:47:12 +0000 (10:47 +0800)]
f2fs: check range before defragment

This patch checks the range parameter passed via ioctl to avoid a range
that exceeds the max_file_blocks limit.
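
This is a minimal sketch of such a check. It assumes 4 KiB pages and a
byte-based start/len range as passed by the ioctl. check_defrag_range()
and SKETCH_PAGE_SHIFT are hypothetical. The explicit overflow test on
start + len is an addition of this sketch, not something the commit
message describes.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12   /* hypothetical: 4 KiB pages */

/* Returns 0 if [start, start + len) ends within max_file_blocks
 * pages, -EINVAL otherwise. */
int check_defrag_range(uint64_t start, uint64_t len,
                       uint64_t max_file_blocks)
{
    uint64_t end;

    if (start > UINT64_MAX - len)   /* start + len would overflow */
        return -EINVAL;
    end = start + len;
    if ((end >> SKETCH_PAGE_SHIFT) > max_file_blocks)
        return -EINVAL;
    return 0;
}
```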

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: use parameter max_items instead of PIDVEC_SIZE
Sheng Yong [Wed, 8 Mar 2017 02:47:11 +0000 (10:47 +0800)]
f2fs: use parameter max_items instead of PIDVEC_SIZE

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: add a punch discard command function
Yunlei He [Thu, 2 Mar 2017 02:36:20 +0000 (10:36 +0800)]
f2fs: add a punch discard command function

This patch adds a function to punch a discard command when one of its
segments is reused before the discard is issued. The reused segment is
split out of the multi-segment discard range, and the remaining, larger
range is still discarded.
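
This is a minimal sketch of the splitting step. It models a pending
discard as a hypothetical [start, start + len) block extent. Punching out
the reused range leaves at most two pieces, one in front of the hole and
one behind it, which can still be discarded. This illustrates interval
splitting; it is not the kernel's discard_cmd code, and struct extent and
punch_extent() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical extent: blocks [start, start + len). */
struct extent {
    uint64_t start;
    uint64_t len;
};

/* Remove [hole_start, hole_start + hole_len) from dc. Surviving pieces
 * are written to out[] and their count (0, 1 or 2) is returned. */
int punch_extent(struct extent dc, uint64_t hole_start, uint64_t hole_len,
                 struct extent out[2])
{
    uint64_t dc_end = dc.start + dc.len;
    uint64_t hole_end = hole_start + hole_len;
    int n = 0;

    assert(hole_len > 0);
    if (hole_end <= dc.start || hole_start >= dc_end) {
        out[n++] = dc;   /* no overlap: keep the whole extent */
        return n;
    }
    if (hole_start > dc.start)   /* piece in front of the hole */
        out[n++] = (struct extent){ dc.start, hole_start - dc.start };
    if (hole_end < dc_end)       /* piece behind the hole */
        out[n++] = (struct extent){ hole_end, dc_end - hole_end };
    return n;
}
```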

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: allocate a bio for discarding when actually issuing it
Jaegeuk Kim [Wed, 8 Mar 2017 02:02:02 +0000 (18:02 -0800)]
f2fs: allocate a bio for discarding when actually issuing it

Let's allocate a bio when issuing discard commands later.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: skip writeback meta pages if cp_mutex acquire failed
Yunlei He [Wed, 1 Mar 2017 10:07:10 +0000 (18:07 +0800)]
f2fs: skip writeback meta pages if cp_mutex acquire failed

Skip writing back meta pages if acquiring cp_mutex fails; the
checkpoint will flush the dirty pages instead.
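
This is a minimal userspace sketch of the skip-on-contention pattern. It
uses pthread_mutex_trylock() in place of the kernel's mutex_trylock().
try_writeback_meta() is hypothetical and only counts pages instead of
writing them.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical writeback hook: if a checkpoint holds cp_mutex, return
 * without writing and leave the dirty meta pages to the checkpoint. */
bool try_writeback_meta(pthread_mutex_t *cp_mutex,
                        unsigned int *pages_written,
                        unsigned int dirty_meta_pages)
{
    assert(cp_mutex != NULL && pages_written != NULL);
    if (pthread_mutex_trylock(cp_mutex) != 0)
        return false;   /* busy: the checkpoint will flush them */
    *pages_written += dirty_meta_pages;   /* stand-in for real I/O */
    pthread_mutex_unlock(cp_mutex);
    return true;
}
```

Using trylock rather than a blocking lock keeps the writeback path from
waiting on a checkpoint that will write the same pages anyway.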

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: show more precise message on orphan recovery failure
Jaegeuk Kim [Tue, 7 Mar 2017 21:54:56 +0000 (13:54 -0800)]
f2fs: show more precise message on orphan recovery failure

This case is not caused by fsck.f2fs; the user needs to retry the mount.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: remove dead macro PGOFS_OF_NEXT_DNODE
Kinglong Mee [Tue, 28 Feb 2017 13:34:37 +0000 (21:34 +0800)]
f2fs: remove dead macro PGOFS_OF_NEXT_DNODE

Fixes: 3cf4574705 ("f2fs: introduce get_next_page_offset to speed up SEEK_DATA")
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: drop duplicate radix tree lookup of nat_entry_set
Kinglong Mee [Tue, 28 Feb 2017 13:34:47 +0000 (21:34 +0800)]
f2fs: drop duplicate radix tree lookup of nat_entry_set

The nat entry set is taken from the set list for freeing, so looking
it up in the radix tree again is redundant.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
[Jaegeuk Kim: remove unnecessary f2fs_bug_on]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: make sure trace all f2fs_issue_flush
Kinglong Mee [Sat, 4 Mar 2017 14:13:10 +0000 (22:13 +0800)]
f2fs: make sure trace all f2fs_issue_flush

The issue-flush trace for the root device is missing; add it and trace
the result of the submission.

Fixes: d50aaeec90 ("f2fs: show actual device info in tracepoints")
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: don't allow volatile writes for non-regular file
Chao Yu [Fri, 17 Mar 2017 07:43:57 +0000 (15:43 +0800)]
f2fs: don't allow volatile writes for non-regular file

f2fs now supports volatile writes only for regular database journal files.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: don't allow atomic writes for not regular files
Jaegeuk Kim [Fri, 17 Mar 2017 02:04:15 +0000 (10:04 +0800)]
f2fs: don't allow atomic writes for not regular files

Atomic writes are supported only for regular database files.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: fix stale ATOMIC_WRITTEN_PAGE private pointer
Jaegeuk Kim [Fri, 17 Mar 2017 01:55:52 +0000 (09:55 +0800)]
f2fs: fix stale ATOMIC_WRITTEN_PAGE private pointer

When I intentionally forced atomic operations on, I hit the panic below,
because page->private was not cleared in f2fs_invalidate_page when it was
called during file truncation.

The panic occurs because a page with a NULL mapping still has
page->private set.

BUG: unable to handle kernel paging request at ffffffffffffffff
IP: drop_buffers+0x38/0xe0
PGD 5d00c067
PUD 5d00e067
PMD 0
CPU: 3 PID: 1648 Comm: fsstress Tainted: G      D    OE   4.10.0+ #5
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
task: ffff9151952863c0 task.stack: ffffaaec40db4000
RIP: 0010:drop_buffers+0x38/0xe0
RSP: 0018:ffffaaec40db74c8 EFLAGS: 00010292
Call Trace:
 ? page_referenced+0x8b/0x170
 try_to_free_buffers+0xc5/0xe0
 try_to_release_page+0x49/0x50
 shrink_page_list+0x8bc/0x9f0
 shrink_inactive_list+0x1dd/0x500
 ? shrink_active_list+0x2c0/0x430
 shrink_node_memcg+0x5eb/0x7c0
 shrink_node+0xe1/0x320
 do_try_to_free_pages+0xef/0x2e0
 try_to_free_pages+0xe9/0x190
 __alloc_pages_slowpath+0x390/0xe70
 __alloc_pages_nodemask+0x291/0x2b0
 alloc_pages_current+0x95/0x140
 __page_cache_alloc+0xc4/0xe0
 pagecache_get_page+0xab/0x2a0
 grab_cache_page_write_begin+0x20/0x40
 get_read_data_page+0x2e6/0x4c0 [f2fs]
 ? f2fs_mark_inode_dirty_sync+0x16/0x30 [f2fs]
 ? truncate_data_blocks_range+0x238/0x2b0 [f2fs]
 get_lock_data_page+0x30/0x190 [f2fs]
 __exchange_data_block+0xaaf/0xf40 [f2fs]
 f2fs_fallocate+0x418/0xd00 [f2fs]
 vfs_fallocate+0x157/0x220
 SyS_fallocate+0x48/0x80

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Chao Yu: use INMEM_INVALIDATE for better tracing]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: build stat_info before orphan inode recovery
Jaegeuk Kim [Tue, 7 Mar 2017 21:41:22 +0000 (13:41 -0800)]
f2fs: build stat_info before orphan inode recovery

f2fs_sync_fs() -> write_checkpoint() calls stat_inc_cp_count(sbi->stat_info),
which requires stat_info to be allocated.
Otherwise, we can hit:

[254042.598623]  ? count_shadow_nodes+0xa0/0xa0
[254042.598633]  f2fs_sync_fs+0x65/0xd0 [f2fs]
[254042.598645]  f2fs_balance_fs_bg+0xe4/0x1c0 [f2fs]
[254042.598657]  f2fs_write_node_pages+0x34/0x1a0 [f2fs]
[254042.598664]  ? pagevec_lookup_entries+0x1e/0x30
[254042.598673]  do_writepages+0x1e/0x30
[254042.598682]  __writeback_single_inode+0x45/0x330
[254042.598688]  writeback_single_inode+0xd7/0x190
[254042.598694]  write_inode_now+0x86/0xa0
[254042.598699]  iput+0x122/0x200
[254042.598709]  f2fs_fill_super+0xd4a/0x14d0 [f2fs]
[254042.598717]  mount_bdev+0x184/0x1c0
[254042.598934]  ? f2fs_commit_super+0x100/0x100 [f2fs]
[254042.599142]  f2fs_mount+0x15/0x20 [f2fs]
[254042.599349]  mount_fs+0x39/0x160
[254042.599554]  ? __alloc_percpu+0x15/0x20
[254042.599759]  vfs_kern_mount+0x67/0x110
[254042.599972]  do_mount+0x1bb/0xc80
[254042.600175]  ? memdup_user+0x42/0x60
[254042.600380]  SyS_mount+0x83/0xd0
[254042.600583]  entry_SYSCALL_64_fastpath+0x1e/0xad

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: fix the fault of calculating blkstart twice
Kinglong Mee [Wed, 8 Mar 2017 01:49:53 +0000 (09:49 +0800)]
f2fs: fix the fault of calculating blkstart twice

When the zone type is BLK_ZONE_TYPE_CONVENTIONAL, the blkstart is
calculated twice.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: fix the fault of checking F2FS_LINK_MAX for rename inode
Kinglong Mee [Sat, 4 Mar 2017 13:48:28 +0000 (21:48 +0800)]
f2fs: fix the fault of checking F2FS_LINK_MAX for rename inode

It is the parent directory's nlink that changes, not the renamed inode's.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: don't allow to get pino when filename is encrypted
Jaegeuk Kim [Tue, 7 Mar 2017 19:22:45 +0000 (11:22 -0800)]
f2fs: don't allow to get pino when filename is encrypted

After renaming an encrypted file, we have no way to get its
encrypted filename from its dentry.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>