Jaegeuk Kim [Thu, 4 Dec 2014 04:47:26 +0000 (20:47 -0800)]
f2fs: call radix_tree_preload before radix_tree_insert
This patch tries to fix:
BUG: using smp_processor_id() in preemptible [
00000000] code: f2fs_gc-254:0/384
(radix_tree_node_alloc+0x14/0x74) from [<
c033d8a0>] (radix_tree_insert+0x110/0x200)
(radix_tree_insert+0x110/0x200) from [<
c02e8264>] (gc_data_segment+0x340/0x52c)
(gc_data_segment+0x340/0x52c) from [<
c02e8658>] (f2fs_gc+0x208/0x400)
(f2fs_gc+0x208/0x400) from [<
c02e8a98>] (gc_thread_func+0x248/0x28c)
(gc_thread_func+0x248/0x28c) from [<
c0139944>] (kthread+0xa0/0xac)
(kthread+0xa0/0xac) from [<
c0105ef8>] (ret_from_fork+0x14/0x3c)
The reason is that f2fs calls radix_tree_insert under enabled preemption.
So, before calling it, we need to call radix_tree_preload.
Otherwise, we should use _GFP_WAIT for the radix tree, and use mutex or
semaphore to cover the radix tree operations.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Thu, 4 Dec 2014 05:15:10 +0000 (21:15 -0800)]
f2fs: use rw_semaphore for nat entry lock
Previoulsy, we used rwlock for nat_entry lock.
But, now we have a lot of complex operations in set_node_addr.
(e.g., allocating kernel memories, handling radix_trees, and so on)
So, this patches tries to change spinlock to rw_semaphore to give CPUs to other
threads.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Thu, 4 Dec 2014 00:40:28 +0000 (16:40 -0800)]
f2fs: fix missing kmem_cache_free
This patch fixes missing kmem_cache_free when handling errors.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Changman Lee [Fri, 28 Nov 2014 15:49:40 +0000 (15:49 +0000)]
f2fs: more fast lookup for gc_inode list
If there are many inodes that have data blocks in victim segment,
it takes long time to find a inode in gc_inode list.
Let's use radix_tree to reduce lookup time.
Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Changman Lee [Mon, 1 Dec 2014 07:29:58 +0000 (16:29 +0900)]
f2fs: cleanup redundant macro
We've already made fi and sbi for inode. Let's avoid duplicated work.
Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 1 Dec 2014 03:30:20 +0000 (11:30 +0800)]
f2fs: fix to return correct error number in f2fs_write_begin
Fix the wrong error number in error path of f2fs_write_begin.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Changman Lee [Thu, 27 Nov 2014 07:03:08 +0000 (16:03 +0900)]
f2fs: cleanup if-statement of phase in gc_data_segment
Little cleanup to distinguish each phase easily
Signed-off-by: Changman Lee <cm224.lee@samsung.com>
[Jaegeuk Kim: modify indentation for code readability]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Wed, 26 Nov 2014 01:27:38 +0000 (17:27 -0800)]
f2fs: fix to recover converted inline_data
If an inode has converted inline_data which was written to the disk, we should
set its inode flag for further fsync so that this inline_data can be recovered
from sudden power off.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Tue, 25 Nov 2014 19:34:02 +0000 (11:34 -0800)]
f2fs: make clean the page before writing
If a page is set to be written to the disk, we can make clean the page.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Changman Lee [Tue, 25 Nov 2014 03:44:24 +0000 (12:44 +0900)]
f2fs: no more dirty_nat_entires when flushing
After flushing dirty nat entries, it has to be no more dirty nat
entries.
Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Changman Lee [Tue, 25 Nov 2014 03:44:23 +0000 (12:44 +0900)]
f2fs: check dirty_nat_cnt before flushing nat entries in journal
It's meaningless to check dirty_nat_cnt after re-dirtying nat entries in
journal. And although there are rooms for dirty nat entires if dirty_nat_cnt
is zero, it's also meaningless to check __has_cursum_space.
Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Tue, 25 Nov 2014 18:59:45 +0000 (10:59 -0800)]
f2fs: fix deadlock during inline_data conversion
A deadlock can be occurred:
Thread 1] Thread 2]
- f2fs_write_data_pages - f2fs_write_begin
- lock_page(page #0)
- grab_cache_page(page #X)
- get_node_page(inode_page)
- grab_cache_page(page #0)
: to convert inline_data
- f2fs_write_data_page
- f2fs_write_inline_data
- get_node_page(inode_page)
In this case, trying to lock inode_page and page #0 causes deadlock.
In order to avoid this, this patch adds a rule for this locking policy,
which is that page #0 should be locked followed by inode_page lock.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Markus Elfring [Mon, 24 Nov 2014 14:52:00 +0000 (15:52 +0100)]
f2fs: fix typos for the word "destroy" in jump labels
Two jump labels were adjusted in the implementation of the
create_node_manager_caches() function because these identifiers
contained typos.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Sat, 22 Nov 2014 00:37:40 +0000 (16:37 -0800)]
f2fs: fix livelock calling f2fs_iget during f2fs_evict_inode
In f2fs_evict_inode,
commit_inmemory_pages
f2fs_gc
f2fs_iget
iget_locked
-> wait for inode free
Here, if the inode is same as the one to be evicted, f2fs should wait forever.
Actually, we should not call f2fs_balance_fs during f2fs_evict_inode to avoid
this.
But, the commit_inmem_pages calls f2fs_balance_fs by default, even if
f2fs_evict_inode wants to free inmemory pages only.
Hence, this patch adds to trigger f2fs_balance_fs only when there is something
to write.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Sat, 22 Nov 2014 00:36:28 +0000 (16:36 -0800)]
f2fs: introduce f2fs_dentry_kunmap to clean up
This patch introduces f2fs_dentry_kunmap to clean up dirty codes.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Changman Lee [Fri, 21 Nov 2014 06:42:07 +0000 (15:42 +0900)]
f2fs: fix wrong data structure when create slab
It used nat_entry_set when create slab for sit_entry_set.
Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Tue, 18 Nov 2014 18:50:21 +0000 (10:50 -0800)]
f2fs: call flush_dcache_page when the page was updated
Whenever f2fs updates mapped pages, it needs to call flush_dcache_page.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Wed, 19 Nov 2014 19:03:34 +0000 (11:03 -0800)]
f2fs: write SSA pages under memory pressure
Under memory pressure, we don't need to skip SSA page writes.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Wed, 19 Nov 2014 18:54:48 +0000 (10:54 -0800)]
f2fs: submit bio for node blocks in the reclaim path
If a node page is request to be written during the reclaiming path, we should
submit the bio to avoid pending to recliam it.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Tue, 18 Nov 2014 03:18:36 +0000 (11:18 +0800)]
f2fs: introduce struct inode_management to wrap inner fields
Now in f2fs, we have three inode cache: ORPHAN_INO, APPEND_INO, UPDATE_INO,
and we manage fields related to inode cache separately in struct f2fs_sb_info
for each inode cache type.
This makes codes a bit messy, so that this patch intorduce a new struct
inode_management to wrap inner fields as following which make codes more neat.
/* for inner inode cache management */
struct inode_management {
struct radix_tree_root ino_root; /* ino entry array */
spinlock_t ino_lock; /* for ino entry lock */
struct list_head ino_list; /* inode list head */
unsigned long ino_num; /* number of entries */
};
struct f2fs_sb_info {
...
struct inode_management im[MAX_INO_ENTRY]; /* manage inode cache */
...
}
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Tue, 18 Nov 2014 03:17:20 +0000 (11:17 +0800)]
f2fs: remove unneeded check code with option in f2fs_remount
Because we have checked the contrary condition in case of "if" judgment, we do
not need to check the condition again in case of "else" judgment. Let's remove
it.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Tue, 18 Nov 2014 03:16:01 +0000 (11:16 +0800)]
f2fs: avoid unable to restart gc thread in remount
In f2fs_remount, we will stop gc thread and set need_restart_gc as true when new
option is set without BG_GC, then if any error occurred in the following
procedure, we can restore to start the gc thread.
But after that, We will fail to restore gc thread in start_gc_thread as BG_GC is
not set in new option, so we'd better move this condition judgment out of
start_gc_thread to fix this issue.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Tue, 18 Nov 2014 00:14:11 +0000 (16:14 -0800)]
f2fs: put the inode page when error was occurred
We should put the inode page when error was occurred.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Tue, 18 Nov 2014 00:06:55 +0000 (16:06 -0800)]
f2fs: fix to call put_page at the error handling routine
The locked page should be released before returning the function.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Tue, 11 Nov 2014 22:10:01 +0000 (14:10 -0800)]
f2fs: convert inline_data when i_size becomes large
If i_size becomes large outside of MAX_INLINE_DATA, we shoud convert the inode.
Otherwise, we can make some dirty pages during the truncation, and those pages
will be written through f2fs_write_data_page.
At that moment, the inode has still inline_data, so that it tries to write non-
zero pages into inline_data area.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Tue, 11 Nov 2014 19:01:01 +0000 (11:01 -0800)]
f2fs: fix deadlock to grab 0'th data page
The scenario is like this.
One trhead triggers:
f2fs_write_data_pages
lock_page
f2fs_write_data_page
f2fs_lock_op <- wait
The other thread triggers:
f2fs_truncate
truncate_blocks
f2fs_lock_op
truncate_partial_data_page
lock_page <- wait for locking the page
This patch resolves this bug by relocating truncate_partial_data_page.
This function is just to truncate user data page and not related to FS
consistency as well.
And, we don't need to call truncate_inline_data. Rather than that,
f2fs_write_data_page will finally update inline_data later.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Tue, 11 Nov 2014 00:29:14 +0000 (16:29 -0800)]
f2fs: reduce the number of inline_data inode before clearing it
The # of inline_data inode is decreased only when it has inline_data.
After clearing the flag, we can't decreased the number.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Mon, 10 Nov 2014 06:15:31 +0000 (22:15 -0800)]
f2fs: implement -o dirsync
If a mount option has dirsync, we should call checkpoint for all the directory
operations.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Fri, 7 Nov 2014 01:23:08 +0000 (17:23 -0800)]
f2fs: do not skip any writes under memory pressure
Under memory pressure, let's avoid skipping data writes.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Fri, 7 Nov 2014 01:21:24 +0000 (17:21 -0800)]
f2fs: write node pages if checkpoint is not doing
It needs to write node pages if checkpoint is not doing in order to avoid
memory pressure.
Reviewed-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Thu, 6 Nov 2014 23:24:46 +0000 (15:24 -0800)]
f2fs: control the memory footprint used by ino entries
This patch adds to control the memory footprint used by ino entries.
This will conduct best effort, not strictly.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Thu, 6 Nov 2014 23:16:04 +0000 (15:16 -0800)]
f2fs: introduce the number of inode entries
This patch adds to monitor the number of ino entries.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Thu, 6 Nov 2014 04:05:53 +0000 (20:05 -0800)]
f2fs: disable roll-forward when active_logs = 2
The roll-forward mechanism should be activated when the number of active
logs is not 2.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Fri, 31 Oct 2014 05:47:03 +0000 (22:47 -0700)]
f2fs: introduce -o fastboot for reducing booting time only
If a system wants to reduce the booting time as a top priority, now we can
use a mount option, -o fastboot.
With this option, f2fs conducts a little bit slow write_checkpoint, but
it can avoid the node page reads during the next mount time.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Fri, 31 Oct 2014 02:01:10 +0000 (19:01 -0700)]
f2fs: remove unnecessary macro
Let's remove unused macro.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Wed, 29 Oct 2014 21:37:22 +0000 (14:37 -0700)]
f2fs: avoid race condition in handling wait_io
__submit_merged_bio f2fs_write_end_io f2fs_write_end_io
wait_io = X wait_io = x
complete(X) complete(X)
wait_io = NULL
wait_for_completion()
free(X)
spin_lock(X)
kernel panic
In order to avoid this, this patch removes the wait_io facility.
Instead, we can use wait_on_all_pages_writeback(sbi) to wait for end_ios.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Wed, 29 Oct 2014 05:27:59 +0000 (22:27 -0700)]
f2fs: send discard commands in larger extent
If there is a chance to make a huge sized discard command, we don't need
to split it out, since each blkdev_issue_discard should wait one at a
time.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Fri, 24 Oct 2014 02:48:09 +0000 (19:48 -0700)]
f2fs: revisit inline_data to avoid data races and potential bugs
This patch simplifies the inline_data usage with the following rule.
1. inline_data is set during the file creation.
2. If new data is requested to be written ranges out of inline_data,
f2fs converts that inode permanently.
3. There is no cases which converts non-inline_data inode to inline_data.
4. The inline_data flag should be changed under inode page lock.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jan Kara [Thu, 23 Oct 2014 12:40:20 +0000 (14:40 +0200)]
f2fs: remove pointless bit testing in f2fs_delete_entry()
There's no point in using test_and_clear_bit_le() when we don't use the
return value of the function. Just use clear_bit_le() instead.
Coverity-id:
1016434
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Mon, 27 Oct 2014 20:54:27 +0000 (13:54 -0700)]
f2fs: do not discard data protected by the previous checkpoint
We should not discard any data protected by the previous checkpoint all
the time.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Mon, 27 Oct 2014 05:59:27 +0000 (22:59 -0700)]
f2fs: flush_dcache_page for inline data
When reading inline data, we should call flush_dcache_page.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Mon, 27 Oct 2014 18:04:35 +0000 (11:04 -0700)]
f2fs: call write_checkpoint under disabled gc
During the write_checkpoint, we should avoid f2fs_gc trigger to avoid any
filesystem consistency.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jan Kara [Wed, 22 Oct 2014 13:21:47 +0000 (15:21 +0200)]
f2fs: fix possible data corruption in f2fs_write_begin()
f2fs_write_begin() doesn't initialize the 'dn' variable if the inode has
inline data. However it uses its contents to decide whether it should
just zero out the page or load data to it. Thus if we are unlucky we can
zero out page contents instead of loading inline data into a page.
CC: stable@vger.kernel.org
CC: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Gu Zheng [Mon, 20 Oct 2014 09:45:49 +0000 (17:45 +0800)]
f2fs: use current_sit_addr to replace the open code
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Gu Zheng [Mon, 20 Oct 2014 09:45:51 +0000 (17:45 +0800)]
f2fs: rename f2fs_set/clear_bit to f2fs_test_and_set/clear_bit
Rename f2fs_set/clear_bit to f2fs_test_and_set/clear_bit, which mean
set/clear bit and return the old value, for better readability.
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Gu Zheng [Mon, 20 Oct 2014 09:45:53 +0000 (17:45 +0800)]
f2fs: set raw_super default to NULL to avoid compile warning
Set raw_super default to NULL to avoid the possibly used
uninitialized warning, though we may never hit it in fact.
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Gu Zheng [Mon, 20 Oct 2014 09:45:50 +0000 (17:45 +0800)]
f2fs: introduce f2fs_change_bit to simplify the change bit logic
Introduce f2fs_change_bit to simplify the change bit logic in
function set_to_next_nat{sit}.
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Gu Zheng [Mon, 20 Oct 2014 09:45:52 +0000 (17:45 +0800)]
f2fs: remove the redundant function cond_clear_inode_flag
Use clear_inode_flag to replace the redundant cond_clear_inode_flag.
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Gu Zheng [Mon, 20 Oct 2014 09:45:48 +0000 (17:45 +0800)]
f2fs: remove the seems unneeded argument 'type' from __get_victim
Remove the unneeded argument 'type' from __get_victim, use
NO_CHECK_TYPE directly when calling v_ops->get_victim().
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jan Kara [Tue, 21 Oct 2014 12:07:33 +0000 (14:07 +0200)]
f2fs: avoid returning uninitialized value to userspace from f2fs_trim_fs()
If user specifies too low end sector for trimming, f2fs_trim_fs() will
use uninitialized value as a number of trimmed blocks and returns it to
userspace. Initialize number of trimmed blocks early to avoid the
problem.
Coverity-id:
1248809
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Tue, 21 Oct 2014 03:28:49 +0000 (20:28 -0700)]
f2fs: declare f2fs_convert_inline_dir as a static function
This patch declares f2fs_convert_inline_dir as a static function, which was
reported by kbuild test robot.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Sun, 19 Oct 2014 06:41:38 +0000 (23:41 -0700)]
f2fs: use kmap_atomic instead of kmap
For better performance, we need to use kmap_atomic instead of kmap.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Sun, 19 Oct 2014 06:06:41 +0000 (23:06 -0700)]
f2fs: reuse make_empty_dir code for inline_dentry
This patch introduces do_make_empty_dir to mitigate code redundancy
for inline_dentry.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Sun, 19 Oct 2014 05:52:52 +0000 (22:52 -0700)]
f2fs: introduce f2fs_dentry_ptr structure for code clean-up
This patch introduces f2fs_dentry_ptr structure for the use of a function
parameter in inline_dentry operations.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Thu, 16 Oct 2014 19:24:14 +0000 (12:24 -0700)]
f2fs: should not truncate any inline_dentry
If the inode has inline_dentry, we should not truncate any block indices.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Thu, 16 Oct 2014 04:29:51 +0000 (21:29 -0700)]
f2fs: reuse core function in f2fs_readdir for inline_dentry
This patch introduces a core function, f2fs_fill_dentries, to remove
redundant code in f2fs_readdir and f2fs_read_inline_dir.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Tue, 14 Oct 2014 17:29:50 +0000 (10:29 -0700)]
f2fs: fix counting inline_data inode numbers
This patch fixes wrongly counting inline_data inode numbers.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Tue, 14 Oct 2014 03:00:16 +0000 (20:00 -0700)]
f2fs: add stat info for inline_dentry inodes
This patch adds status information for inline_dentry inodes.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Tue, 14 Oct 2014 02:42:53 +0000 (19:42 -0700)]
f2fs: avoid deadlock on init_inode_metadata
Previously, init_inode_metadata does not hold any parent directory's inode
page. So, f2fs_init_acl can grab its parent inode page without any problem.
But, when we use inline_dentry, that page is grabbed during f2fs_add_link,
so that we can fall into deadlock condition like below.
INFO: task mknod:11006 blocked for more than 120 seconds.
Tainted: G OE 3.17.0-rc1+ #13
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mknod D
ffff88003fc94580 0 11006 11004 0x00000000
ffff880007717b10 0000000000000002 ffff88003c323220 ffff880007717fd8
0000000000014580 0000000000014580 ffff88003daecb30 ffff88003c323220
ffff88003fc94e80 ffff88003ffbb4e8 ffff880007717ba0 0000000000000002
Call Trace:
[<
ffffffff8173dc40>] ? bit_wait+0x50/0x50
[<
ffffffff8173d4cd>] io_schedule+0x9d/0x130
[<
ffffffff8173dc6c>] bit_wait_io+0x2c/0x50
[<
ffffffff8173da3b>] __wait_on_bit_lock+0x4b/0xb0
[<
ffffffff811640a7>] __lock_page+0x67/0x70
[<
ffffffff810acf50>] ? autoremove_wake_function+0x40/0x40
[<
ffffffff811652cc>] pagecache_get_page+0x14c/0x1e0
[<
ffffffffa029afa9>] get_node_page+0x59/0x130 [f2fs]
[<
ffffffffa02a63ad>] read_all_xattrs+0x24d/0x430 [f2fs]
[<
ffffffffa02a6ca2>] f2fs_getxattr+0x52/0xe0 [f2fs]
[<
ffffffffa02a7481>] f2fs_get_acl+0x41/0x2d0 [f2fs]
[<
ffffffff8122d847>] get_acl+0x47/0x70
[<
ffffffff8122db5a>] posix_acl_create+0x5a/0x150
[<
ffffffffa02a7759>] f2fs_init_acl+0x29/0xcb [f2fs]
[<
ffffffffa0286a8d>] init_inode_metadata+0x5d/0x340 [f2fs]
[<
ffffffffa029253a>] f2fs_add_inline_entry+0x12a/0x2e0 [f2fs]
[<
ffffffffa0286ea5>] __f2fs_add_link+0x45/0x4a0 [f2fs]
[<
ffffffffa028b5b6>] ? f2fs_new_inode+0x146/0x220 [f2fs]
[<
ffffffffa028b816>] f2fs_mknod+0x86/0xf0 [f2fs]
[<
ffffffff811e3ec1>] vfs_mknod+0xe1/0x160
[<
ffffffff811e4b26>] SyS_mknod+0x1f6/0x200
[<
ffffffff81741d7f>] tracesys+0xe1/0xe6
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Tue, 14 Oct 2014 02:34:26 +0000 (19:34 -0700)]
f2fs: fix to wait correct block type
The inode page needs to wait NODE block io.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Tue, 14 Oct 2014 00:26:14 +0000 (17:26 -0700)]
f2fs: reuse find_in_block code for find_in_inline_dir
This patch removes redundant copied code in find_in_inline_dir.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Mon, 13 Oct 2014 23:28:13 +0000 (16:28 -0700)]
f2fs: reuse room_for_filename for inline dentry operation
This patch introduces to reuse the existing room_for_filename for inline dentry
operation.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 24 Sep 2014 10:20:23 +0000 (18:20 +0800)]
f2fs: update f2fs documentation for inline dir support
This patch adds descriptions for the inline dir support in f2fs document.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 24 Sep 2014 10:19:10 +0000 (18:19 +0800)]
f2fs: enable inline dir handling
Add inline dir functions into normal dir ops' function to handle inline ops.
Besides, we enable inline dir mode when a new dir inode is created if
inline_data option is on.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 24 Sep 2014 10:17:53 +0000 (18:17 +0800)]
f2fs: add key function to handle inline dir
Adds Functions to implement inline dir init/lookup/insert/delete/convert ops.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: remove needless reserved area copy, pointed by Dan Carpenter]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 24 Sep 2014 10:17:04 +0000 (18:17 +0800)]
f2fs: export dir operations for inline dir
This patch exports some dir operations for inline dir, additionally introduces
f2fs_drop_nlink from f2fs_delete_entry for reusing by inline dir function.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 24 Sep 2014 10:16:13 +0000 (18:16 +0800)]
f2fs: add a new mount option for inline dir
Adds a new mount option 'inline_dentry' for inline dir.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 24 Sep 2014 10:15:19 +0000 (18:15 +0800)]
f2fs: add infra struct and helper for inline dir
This patch defines macro/inline dentry structure, and adds some helpers for
inline dir infrastructure.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Fri, 17 Oct 2014 21:14:16 +0000 (14:14 -0700)]
f2fs: avoid infinite loop at cp_error
This patch avoids an infinite loop in sync_dirty_inode_page when -EIO was
detected.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Thu, 16 Oct 2014 18:43:30 +0000 (11:43 -0700)]
f2fs: avoid build warning
This patch removes build warning.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Sun, 19 Oct 2014 08:43:23 +0000 (01:43 -0700)]
f2fs: fix to call f2fs_unlock_op
This patch fixes to call f2fs_unlock_op, which was missing before.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Sat, 18 Oct 2014 03:33:55 +0000 (20:33 -0700)]
f2fs: avoid to allocate when inline_data was written
The sceanrio is like this.
inline_data i_size page write_begin/vm_page_mkwrite
X 30 dirty_page
X 30 write to #4096 position
X 30 get_dnode_of_data wait for get_dnode_of_data
O 30 write inline_data
O 30 get_dnode_of_data
O 30 reserve data block
..
In this case, we have #0 = NEW_ADDR and inline_data as well.
We should not allow this condition for further access.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Sat, 18 Oct 2014 00:57:29 +0000 (17:57 -0700)]
f2fs: use highmem for directory pages
This patch fixes to use highmem for directory pages.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Wed, 15 Oct 2014 17:24:34 +0000 (10:24 -0700)]
f2fs: fix race conditon on truncation with inline_data
Let's consider the following scenario.
blkaddr[0] inline_data i_size i_blocks writepage truncate
NEW X 4096 2 dirty page #0
NEW X 0 change i_size
NEW X 0 2 f2fs_write_inline_data
NEW X 0 2 get_dnode_of_data
NEW X 0 2 truncate_data_blocks_range
NULL O 0 1 memcpy(inline_data)
NULL O 0 1 f2fs_put_dnode
NULL O 0 1 f2fs_truncate
NULL O 0 1 get_dnode_of_data
NULL O 0 1 *invalid block addr*
This patch adds checking inline_data flag during f2fs_truncate not to refer
corrupted block indices.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Wed, 15 Oct 2014 17:16:54 +0000 (10:16 -0700)]
f2fs: should truncate any allocated block for inline_data write
When trying to write inline_data, we should truncate any data block allocated
and pointed by the inode block.
We should consider the data index is not 0.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Thu, 9 Oct 2014 20:39:06 +0000 (13:39 -0700)]
f2fs: invalidate inmemory page
If user truncates file's data, we should truncate inmemory pages too.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Thu, 9 Oct 2014 20:19:53 +0000 (13:19 -0700)]
f2fs: do not make dirty any inmemory pages
This patch let inmemory pages be clean all the time.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Linus Torvalds [Mon, 3 Nov 2014 22:09:33 +0000 (14:09 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/geert/linux-m68k
Pull m68k update from Geert Uytterhoeven.
Just wiring up the bpf system call.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
m68k: Wire up bpf
Linus Torvalds [Mon, 3 Nov 2014 22:07:05 +0000 (14:07 -0800)]
Merge tag 'armsoc-for-rc3' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
"A surprisingly small batch of fixes for -rc3. Suspiciously small, I'd
say.
Anyway, most of this are a few defconfig updates. Some for omap to
deal with kernel binary size (moving ipv6 to module, etc). A larger
one for socfpga that refreshes with some churn, but also turns on a
few options that makes the newly-added board in my bootfarm usable for
testing.
OMAP3 will also now warn when booted with legacy (non-DT) boot
protocols, hopefully encouraging those who still care about some of
those platforms to submit DT support and report bugs where needed.
Nothing stops working though, this is just to warn for future
deprecation.
Beyond this, very few actual bugfixes. A PXA fix for DEBUG_LL boot
hangs, a missing terminting entry in a dt_match array on RealView a
MTD fix on OMAP with NAND"
[ Obviously missed rc3, will make rc4 instead ;) ]
* tag 'armsoc-for-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
MAINTAINERS: drop list entry for davinci
ARM: OMAP2+: Warn about deprecated legacy booting mode
ARM: omap2plus_defconfig: Fix errors with NAND BCH
ARM: multi_v7_defconfig: fix support for APQ8084
soc: versatile: Add terminating entry for realview_soc_of_match
ARM: ixp4xx: remove compilation warnings in io.h
MAINTAINERS: Add Soren as reviewer for Zynq
ARM: omap2plus_defconfig: Fix bloat caused by having ipv6 built-in
ARM: socfpga_defconfig: Update defconfig for SoCFPGA
ARM: pxa: fix hang on startup with DEBUG_LL
Linus Torvalds [Sun, 2 Nov 2014 23:01:51 +0000 (15:01 -0800)]
Linux 3.18-rc3
Linus Torvalds [Sun, 2 Nov 2014 22:45:52 +0000 (14:45 -0800)]
Merge tag 'for-linus-
20141102' of git://git.infradead.org/linux-mtd
Pull MTD fixes from Brian Norris:
"Three main MTD fixes for 3.18:
- A regression from 3.16 which was noticed in 3.17. With the
restructuring of the m25p80.c driver and the SPI NOR library
framework, we omitted proper listing of the SPI device IDs. This
means m25p80.c wouldn't auto-load (modprobe) properly when built as
a module. For now, we duplicate the device IDs into both modules.
- The OMAP / ELM modules were depending on an implicit link ordering.
Use deferred probing so that the new link order (in 3.18-rc) can
still allow for successful probing.
- Fix suspend/resume support for LH28F640BF NOR flash"
* tag 'for-linus-
20141102' of git://git.infradead.org/linux-mtd:
mtd: cfi_cmdset_0001.c: fix resume for LH28F640BF chips
mtd: omap: fix mtd devices not showing up
mtd: m25p80,spi-nor: Fix module aliases for m25p80
mtd: spi-nor: make spi_nor_scan() take a chip type name, not spi_device_id
mtd: m25p80: get rid of spi_get_device_id
Linus Torvalds [Sun, 2 Nov 2014 22:39:35 +0000 (14:39 -0800)]
Merge tag 'scsi-for-linus' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"This is a set of six patches consisting of:
- two MAINTAINER updates
- two scsi-mq fixs for the old parallel interface (not every request
is tagged and we need to set the right flags to populate the SPI
tag message)
- a fix for a memory leak in scatterlist traversal caused by a
preallocation update in 3.17
- an ipv6 fix for cxgbi"
[ The scatterlist fix also came in separately through the block layer tree ]
* tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
MAINTAINERS: ufs - remove self
MAINTAINERS: change hpsa and cciss maintainer
libcxgbi : support ipv6 address host_param
scsi: set REQ_QUEUE for the blk-mq case
Revert "block: all blk-mq requests are tagged"
lib/scatterlist: fix memory leak with scsi-mq
Linus Torvalds [Sun, 2 Nov 2014 22:27:30 +0000 (14:27 -0800)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Nothing too astounding or major: radeon, i915, vmwgfx, armada and
exynos.
Biggest ones:
- vmwgfx has one big locking regression fix
- i915 has come displayport fixes
- radeon has some stability and a memory alloc failure
- armada and exynos have some vblank fixes"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (24 commits)
drm/exynos: correct connector->dpms field before resuming
drm/exynos: enable vblank after DPMS on
drm/exynos: init kms poll at the end of initialization
drm/exynos: propagate plane initialization errors
drm/exynos: vidi: fix build warning
drm/exynos: remove explicit encoder/connector de-initialization
drm/exynos: init vblank with real number of crtcs
drm/vmwgfx: Filter out modes those cannot be supported by the current VRAM size.
drm/vmwgfx: Fix hash key computation
drm/vmwgfx: fix lock breakage
drm/i915/dp: only use training pattern 3 on platforms that support it
drm/radeon: remove some buggy dead code
drm/i915: Ignore VBT backlight check on Macbook 2, 1
drm/radeon: remove invalid pci id
drm/radeon: dpm fixes for asrock systems
radeon: clean up coding style differences in radeon_get_bios()
drm/radeon: Use drm_malloc_ab instead of kmalloc_array
drm/radeon/dpm: disable ulv support on SI
drm/i915: Fix GMBUSFREQ on vlv/chv
drm/i915: Ignore long hpds on eDP ports
...
Olof Johansson [Sun, 2 Nov 2014 21:36:05 +0000 (13:36 -0800)]
Merge tag 'fixes-against-v3.18-rc2' of git://git./linux/kernel/git/tmlind/linux-omap into fixes
Merge "omap fixes against v3.18-rc2" from Tony Lindgren:
Few fixes for omaps to enable NAND BCH so devices won't
produce errors when booted with omap2plus_defconfig, and
reduce bloat by making IPV6 a loadable module.
Also let's add a warning about legacy boot being deprecated
for omap3.
We now have things working with device tree, and only omap3 is
still booting in legacy mode. So hopefully this warning will
help move the remaining legacy mode users to boot with device
tree.
As the total reduction of code and static data is somewhere
around 20000 lines of code once we remove omap3 legacy mode
booting, we really do want to make omap3 to boot also in
device tree mode only over the next few merge cycles.
* tag 'fixes-against-v3.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: (407 commits)
ARM: OMAP2+: Warn about deprecated legacy booting mode
ARM: omap2plus_defconfig: Fix errors with NAND BCH
ARM: omap2plus_defconfig: Fix bloat caused by having ipv6 built-in
+ Linux 3.18-rc2
Signed-off-by: Olof Johansson <olof@lixom.net>
Linus Torvalds [Sun, 2 Nov 2014 20:56:20 +0000 (12:56 -0800)]
Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
- add the new bpf syscall to ARM.
- drop a redundant return statement in __iommu_alloc_remap()
- fix a performance issue noticed by Thomas Petazzoni with
kmap_atomic().
- fix an issue with the L2 cache OF parsing code which caused it to
incorrectly print warnings on each boot, and make the warning text
more consistent with the rest of the code
* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
ARM: 8180/1: mm: implement no-highmem fast path in kmap_atomic_pfn()
ARM: 8183/1: l2c: Improve l2c310_of_parse() error message
ARM: 8181/1: Drop extra return statement
ARM: 8182/1: l2c: Make l2x0_cache_size_of_parse() return 'int'
ARM: enable bpf syscall
Linus Torvalds [Sun, 2 Nov 2014 20:31:02 +0000 (12:31 -0800)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"A small set of x86 fixes. The most serious is an SRCU lockdep fix.
A bit late - needed some time to test the SRCU fix, which only came in
on Friday"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: vmx: defer load of APIC access page address during reset
KVM: nVMX: Disable preemption while reading from shadow VMCS
KVM: x86: Fix far-jump to non-canonical check
KVM: emulator: fix execution close to the segment limit
KVM: emulator: fix error code for __linearize
Dave Airlie [Sun, 2 Nov 2014 19:23:17 +0000 (05:23 +1000)]
Merge branch 'exynos-drm-fixes' of git://git./linux/kernel/git/daeinki/drm-exynos into drm-fixes
This pull-request includes some bug fixes and code cleanups.
Especially, this fixes the bind failure issue occurred when it tries
to re-bind Exynos drm driver after unbound, and the modetest failure
issue incurred by not having a pair to vblank on and off requests.
* 'exynos-drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos:
drm/exynos: correct connector->dpms field before resuming
drm/exynos: enable vblank after DPMS on
drm/exynos: init kms poll at the end of initialization
drm/exynos: propagate plane initialization errors
drm/exynos: vidi: fix build warning
drm/exynos: remove explicit encoder/connector de-initialization
drm/exynos: init vblank with real number of crtcs
Linus Torvalds [Sun, 2 Nov 2014 18:28:43 +0000 (10:28 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs
Pull VFS fixes from Al Viro:
"A bunch of assorted fixes, most of them followups to overlayfs merge"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
ovl: initialize ->is_cursor
Return short read or 0 at end of a raw device, not EIO
isofs: don't bother with ->d_op for normal case
isofs_cmp(): we'll never see a dentry for . or ..
overlayfs: fix lockdep misannotation
ovl: fix check for cursor
overlayfs: barriers for opening upper-layer directory
rcu: Provide counterpart to rcu_dereference() for non-RCU situations
staging: android: logger: Fix log corruption regression
Linus Torvalds [Sun, 2 Nov 2014 18:20:26 +0000 (10:20 -0800)]
irda: stop calling sk_prot->disconnect() on connection failure
The sk_prot is irda's own set of protocol handlers, so irda should
statically know what that function is anyway, without using an indirect
pointer. And as it happens, we know *exactly* what that pointer is
statically: it's NULL, because irda doesn't define a disconnect
operation.
So calling that function is doubly wrong, and will just cause an oops.
Reported-by: Martin Lang <mlg.hessigheim@gmail.com>
Cc: Samuel Ortiz <samuel@sortiz.org>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrzej Hajda [Fri, 10 Oct 2014 12:31:56 +0000 (14:31 +0200)]
drm/exynos: correct connector->dpms field before resuming
During system suspend after connector switch off its dpms field
is set to connector previous dpms state. To properly resume dpms field
should be set to its actual state (off) before resuming to previous dpms state.
Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Andrzej Hajda [Fri, 10 Oct 2014 12:31:55 +0000 (14:31 +0200)]
drm/exynos: enable vblank after DPMS on
Before DPMS off driver disables vblank.
It should be balanced by vblank enable after DPMS on.
The patch fixes issue with page_flip ioctl not being able
to acquire vblank counter introduced by patch:
drm: Always reject drm_vblank_get() after drm_vblank_off()
Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Andrzej Hajda [Fri, 10 Oct 2014 12:31:54 +0000 (14:31 +0200)]
drm/exynos: init kms poll at the end of initialization
HPD events can be generated by components even if drm_dev is not fully
initialized, to skip such events kms poll initialization should
be performed at the end of load callback followed directly by forced
connection detection.
Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Andrzej Hajda [Fri, 10 Oct 2014 12:31:53 +0000 (14:31 +0200)]
drm/exynos: propagate plane initialization errors
In case of error during plane initialization load callback
incorrectly return success, this patch fixes it.
Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Inki Dae [Tue, 7 Oct 2014 15:16:34 +0000 (00:16 +0900)]
drm/exynos: vidi: fix build warning
encoder object isn't used anymore so remove it.
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Andrzej Hajda [Mon, 22 Sep 2014 09:30:48 +0000 (11:30 +0200)]
drm/exynos: remove explicit encoder/connector de-initialization
All KMS objects are destroyed by drm_mode_config_cleanup in proper order
so component drivers should not care about it.
Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Andrzej Hajda [Tue, 7 Oct 2014 13:09:14 +0000 (22:09 +0900)]
drm/exynos: init vblank with real number of crtcs
Initialization of vblank with MAX_CRTC caused attempts
to disabling vblanks for non-existing crtcs in case
drm used fewer crtcs. The patch fixes it.
Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Paolo Bonzini [Sun, 2 Nov 2014 06:54:30 +0000 (07:54 +0100)]
KVM: vmx: defer load of APIC access page address during reset
Most call paths to vmx_vcpu_reset do not hold the SRCU lock. Defer loading
the APIC access page to the next vmentry.
This avoids the following lockdep splat:
[ INFO: suspicious RCU usage. ]
3.18.0-rc2-test2+ #70 Not tainted
-------------------------------
include/linux/kvm_host.h:474 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 0
1 lock held by qemu-system-x86/2371:
#0: (&vcpu->mutex){+.+...}, at: [<
ffffffffa037d800>] vcpu_load+0x20/0xd0 [kvm]
stack backtrace:
CPU: 4 PID: 2371 Comm: qemu-system-x86 Not tainted 3.18.0-rc2-test2+ #70
Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A12 01/10/2013
0000000000000001 ffff880209983ca8 ffffffff816f514f 0000000000000000
ffff8802099b8990 ffff880209983cd8 ffffffff810bd687 00000000000fee00
ffff880208a2c000 ffff880208a10000 ffff88020ef50040 ffff880209983d08
Call Trace:
[<
ffffffff816f514f>] dump_stack+0x4e/0x71
[<
ffffffff810bd687>] lockdep_rcu_suspicious+0xe7/0x120
[<
ffffffffa037d055>] gfn_to_memslot+0xd5/0xe0 [kvm]
[<
ffffffffa03807d3>] __gfn_to_pfn+0x33/0x60 [kvm]
[<
ffffffffa0380885>] gfn_to_page+0x25/0x90 [kvm]
[<
ffffffffa038aeec>] kvm_vcpu_reload_apic_access_page+0x3c/0x80 [kvm]
[<
ffffffffa08f0a9c>] vmx_vcpu_reset+0x20c/0x460 [kvm_intel]
[<
ffffffffa039ab8e>] kvm_vcpu_reset+0x15e/0x1b0 [kvm]
[<
ffffffffa039ac0c>] kvm_arch_vcpu_setup+0x2c/0x50 [kvm]
[<
ffffffffa037f7e0>] kvm_vm_ioctl+0x1d0/0x780 [kvm]
[<
ffffffff810bc664>] ? __lock_is_held+0x54/0x80
[<
ffffffff812231f0>] do_vfs_ioctl+0x300/0x520
[<
ffffffff8122ee45>] ? __fget+0x5/0x250
[<
ffffffff8122f0fa>] ? __fget_light+0x2a/0xe0
[<
ffffffff81223491>] SyS_ioctl+0x81/0xa0
[<
ffffffff816fed6d>] system_call_fastpath+0x16/0x1b
Reported-by: Takashi Iwai <tiwai@suse.de>
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Reviewed-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Tested-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Fixes:
38b9917350cb2946e368ba684cfc33d1672f104e
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Jan Kiszka [Wed, 8 Oct 2014 16:05:39 +0000 (18:05 +0200)]
KVM: nVMX: Disable preemption while reading from shadow VMCS
In order to access the shadow VMCS, we need to load it. At this point,
vmx->loaded_vmcs->vmcs and the actually loaded one start to differ. If
we now get preempted by Linux, vmx_vcpu_put and, on return, the
vmx_vcpu_load will work against the wrong vmcs. That can cause
copy_shadow_to_vmcs12 to corrupt the vmcs12 state.
Fix the issue by disabling preemption during the copy operation.
copy_vmcs12_to_shadow is safe from this issue as it is executed by
vmx_vcpu_run when preemption is already disabled before vmentry.
This bug is exposed by running Jailhouse within KVM on CPUs with
shadow VMCS support. Jailhouse never expects an interrupt pending
vmexit, but the bug can cause it if, after copy_shadow_to_vmcs12
is preempted, the active VMCS happens to have the virtual interrupt
pending flag set in the CPU-based execution controls.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Nadav Amit [Mon, 27 Oct 2014 22:03:43 +0000 (00:03 +0200)]
KVM: x86: Fix far-jump to non-canonical check
Commit
d1442d85cc30 ("KVM: x86: Handle errors when RIP is set during far
jumps") introduced a bug that caused the fix to be incomplete. Due to
incorrect evaluation, far jump to segment with L bit cleared (i.e., 32-bit
segment) and RIP with any of the high bits set (i.e, RIP[63:32] != 0) set may
not trigger #GP. As we know, this imposes a security problem.
In addition, the condition for two warnings was incorrect.
Fixes:
d1442d85cc30ea75f7d399474ca738e0bc96f715
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
[Add #ifdef CONFIG_X86_64 to avoid complaints of undefined behavior. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Dave Airlie [Sat, 1 Nov 2014 23:23:31 +0000 (09:23 +1000)]
Merge branch 'vmwgfx-fixes-3.18' of git://people.freedesktop.org/~thomash/linux
A critical 3.18 regression fix from Rob, (thanks!)
A fix to avoid advertizing modes we can't support from Sinclair
(welcome Sinclair!)
and a fix for an incorrect hash key computation from me that is
completely harmless, but can wait 'til the next merge window if necessary.
(I can't really bother stable with this one).
* 'vmwgfx-fixes-3.18' of git://people.freedesktop.org/~thomash/linux:
drm/vmwgfx: Filter out modes those cannot be supported by the current VRAM size.
drm/vmwgfx: Fix hash key computation
drm/vmwgfx: fix lock breakage