Haicheng Li [Mon, 6 May 2013 15:15:41 +0000 (23:15 +0800)]
f2fs: bugfix for alloc_nid_failed()
Directly drop the free_nid cache entry when nm_i->fcnt > 2 * MAX_FREE_NIDS.
Since the nm_i->free_nid_list_lock spinlock is NOT held between
consecutive calls to alloc_nid() and alloc_nid_failed(), other
threads may already have added new free nids to the free_nid_list during
this period.
We need to make sure nm_i->fcnt is never > 2 * MAX_FREE_NIDS.
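A minimal user-space sketch of the check described above. The struct, the
helper name and the value of MAX_FREE_NIDS are hypothetical stand-ins, not the
kernel's definitions, and the caller is assumed to hold the list lock.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_FREE_NIDS 4u	/* illustrative value, not the kernel's */

/* Hypothetical stand-in for the free nid bookkeeping in the nid manager. */
struct sketch_nm_info {
	unsigned int fcnt;	/* free nids currently available */
};

/*
 * Called after an allocation that took a nid has failed.
 * Returns true if the nid went back to the free list, false if it was
 * dropped because other threads already refilled the list meanwhile.
 * The caller is assumed to hold the free nid list lock.
 */
static bool sketch_alloc_nid_failed(struct sketch_nm_info *nm_i)
{
	if (nm_i->fcnt > 2 * MAX_FREE_NIDS)
		return false;	/* list is already over-full: drop the entry */

	nm_i->fcnt++;		/* make the nid available again */
	return true;
}
```

The point is only that the "give it back" path re-checks the count under the
lock instead of trusting the state seen at alloc_nid() time.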
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
[Jaegeuk Kim: fit the coding style]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Chris Fries [Thu, 2 May 2013 21:09:05 +0000 (16:09 -0500)]
f2fs: recover when journal contains deleted files
When recovering a journal file with fsync data for files that have
been deleted, don't bail out on recovery.
Signed-off-by: Chris Fries <C.Fries@motorola.com>
Reviewed-by: Russell Knize <rknize2@motorola.com>
Reviewed-by: Jason Hrycay <jason.hrycay@motorola.com>
[Jaegeuk Kim: fit the coding style]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Chris Fries [Thu, 2 May 2013 21:07:34 +0000 (16:07 -0500)]
f2fs: continue to mount after failing recovery
When unable to roll forward the journal, we shouldn't bail out and
refuse to mount; we should continue to attempt the mount. Bad recovery data
is likely unrecoverable at this point, and requiring the user to try
to mount again doesn't solve any issues.
Signed-off-by: Chris Fries <C.Fries@motorola.com>
Reviewed-by: Russell Knize <rknize2@motorola.com>
Reviewed-by: Jason Hrycay <jason.hrycay@motorola.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Jaegeuk Kim [Tue, 30 Apr 2013 02:33:27 +0000 (11:33 +0900)]
f2fs: avoid deadlock during evict after f2fs_gc
o Deadlock case #1
Thread 1:
- writeback_sb_inodes
- do_writepages
- f2fs_write_data_pages
- write_cache_pages
- f2fs_write_data_page
- f2fs_balance_fs
- wait mutex_lock(gc_mutex)
Thread 2:
- f2fs_balance_fs
- mutex_lock(gc_mutex)
- f2fs_gc
- f2fs_iget
- wait iget_locked(inode->i_lock)
Thread 3:
- do_unlinkat
- iput
- lock(inode->i_lock)
- evict
- inode_wait_for_writeback
o Deadlock case #2
Thread 1:
- __writeback_single_inode
: set I_SYNC
- do_writepages
- f2fs_write_data_page
- f2fs_balance_fs
- f2fs_gc
- iput
- evict
- inode_wait_for_writeback(I_SYNC)
In order to avoid this, even though iput is called with the zero-reference
count, we need to stop the eviction procedure if the inode is on writeback.
So this patch links f2fs_drop_inode which checks the I_SYNC flag.
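A simplified model of the decision such a drop_inode hook makes. The struct,
the flag value and the function name are hypothetical; the real hook works on
struct inode and the kernel's I_SYNC flag.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_I_SYNC 0x1u	/* stands in for the kernel's I_SYNC flag */

/* Hypothetical, heavily reduced view of an inode. */
struct sketch_inode {
	unsigned int state;	/* bit flags such as SKETCH_I_SYNC */
	unsigned int nlink;	/* remaining hard links */
};

/* Returns true if the inode may be evicted now, false to keep it cached. */
static bool sketch_drop_inode(const struct sketch_inode *inode)
{
	/*
	 * Writeback is in flight for this inode. Evicting it now would wait
	 * for that writeback, which may be the very context calling iput.
	 */
	if (inode->state & SKETCH_I_SYNC)
		return false;

	/* Otherwise drop inodes that have no links left. */
	return inode->nlink == 0;
}
```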
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Jaegeuk Kim [Mon, 29 Apr 2013 07:58:39 +0000 (16:58 +0900)]
f2fs: modify the number of issued pages to merge IOs
When testing f2fs on an SSD, I found some 128 page IOs followed by 1 page IO
were issued by f2fs_write_node_pages.
This means that some flows were mishandled, which degrades performance.
Previously, f2fs_write_node_pages determined the number of pages to be written,
nr_to_write, as follows.
1. The bio_get_nr_vecs returns 129 pages.
2. The bio_alloc makes room for 128 pages.
3. The initial 128 pages go into one bio.
4. The existing bio is submitted, and a new bio is prepared for the last 1 page.
5. Finally, sync_node_pages submits the last 1 page bio.
The problem comes from the use of bio_get_nr_vecs, so this patch replaces it
with max_hw_blocks using queue_max_sectors.
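The arithmetic behind the 128 + 1 split can be sketched as follows. Both helper
names are hypothetical, and nr_to_write is assumed to be greater than zero.

```c
#include <assert.h>

/* Number of bios needed when each bio holds at most bio_capacity pages. */
static unsigned int bios_needed(unsigned int nr_to_write,
				unsigned int bio_capacity)
{
	assert(bio_capacity > 0);
	return (nr_to_write + bio_capacity - 1) / bio_capacity;
}

/* Number of pages carried by the final bio (nr_to_write must be > 0). */
static unsigned int last_bio_pages(unsigned int nr_to_write,
				   unsigned int bio_capacity)
{
	unsigned int rem;

	assert(bio_capacity > 0 && nr_to_write > 0);
	rem = nr_to_write % bio_capacity;
	return rem ? rem : bio_capacity;
}
```

With nr_to_write = 129 and a capacity of 128, two bios are needed and the
second carries a single page. When nr_to_write is a multiple of the capacity,
the trailing small bio disappears.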
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Haicheng Li [Sun, 28 Apr 2013 11:16:07 +0000 (19:16 +0800)]
f2fs: remove useless #include <linux/proc_fs.h> as we're now using sysfs as debug entry.
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Haicheng Li [Sun, 28 Apr 2013 11:16:06 +0000 (19:16 +0800)]
f2fs: fix inconsistent using of NM_WOUT_THRESHOLD
try_to_free_nats() is usually called with parameter nr_shrink as
"nm_i->nat_cnt - NM_WOUT_THRESHOLD"
by flush_nat_entries() during checkpointing process.
However, this is inconsistent with the actual threshold check as
"if (nm_i->nat_cnt < 2 * NM_WOUT_THRESHOLD)"
, which will ignore the free_nats requests when
NM_WOUT_THRESHOLD < nm_i->nat_cnt < 2 * NM_WOUT_THRESHOLD
So fix the threshold check condition.
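A small sketch of the mismatch. The threshold value is illustrative, and the
"new" predicate is only one plausible form of the fix, assumed here for
illustration. This message does not spell out the exact condition.

```c
#include <assert.h>
#include <stdbool.h>

#define NM_WOUT_THRESHOLD 100u	/* illustrative value, not the kernel's */

/* What the checkpoint path asks for: everything above the threshold. */
static unsigned int shrink_request(unsigned int nat_cnt)
{
	return nat_cnt > NM_WOUT_THRESHOLD ? nat_cnt - NM_WOUT_THRESHOLD : 0;
}

/* Old check, as quoted above: requests below 2x the threshold are ignored. */
static bool old_check_ignores(unsigned int nat_cnt)
{
	return nat_cnt < 2 * NM_WOUT_THRESHOLD;
}

/* Assumed corrected check: ignore only when there is nothing to shrink. */
static bool new_check_ignores(unsigned int nat_cnt)
{
	return nat_cnt <= NM_WOUT_THRESHOLD;
}
```

For nat_cnt = 150 the caller requests 50 entries to be freed. The old check
ignores that request, while a check aligned with the caller honours it.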
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Jaegeuk Kim [Fri, 26 Apr 2013 02:55:17 +0000 (11:55 +0900)]
f2fs: check truncation of mapping after lock_page
We call lock_page when we need to update a page after readpage.
Between grab and lock page, the page can be truncated by another thread.
So, we should check the page after lock_page whether it was truncated or not.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Jaegeuk Kim [Thu, 25 Apr 2013 07:05:51 +0000 (16:05 +0900)]
f2fs: enhance alloc_nid and build_free_nids flows
In order to avoid build_free_nid lock contention, let's change the order of
function calls as follows.
At first, check whether there is enough free nids.
- If available, just get a free nid with spin_lock without any overhead.
- Otherwise, conduct build_free_nids.
: scan nat pages, journal nat entries, and nat cache entries.
We should be careful not to serve free nids that build_free_nids has produced
only midway through its scan.
We can get stable free nids only after build_free_nids is done.
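The order of calls can be sketched in user space as below. All names and the
batch size are hypothetical, and locking is left out. The rebuild publishes
its nids only once the whole batch is ready.

```c
#include <assert.h>

#define SKETCH_BATCH 4u	/* illustrative number of nids found per rebuild */

/* Hypothetical free nid cache. */
struct sketch_nid_cache {
	unsigned int nids[SKETCH_BATCH];
	unsigned int fcnt;	/* free nids ready to hand out */
	unsigned int next_scan;	/* next nid the rebuild will consider */
	unsigned int builds;	/* how many times the slow path ran */
};

/* Slow path: collect a full batch, then publish it in one step. */
static void sketch_build_free_nids(struct sketch_nid_cache *c)
{
	unsigned int i;

	for (i = 0; i < SKETCH_BATCH; i++)
		c->nids[i] = c->next_scan++;
	c->fcnt = SKETCH_BATCH;	/* visible only after the scan is done */
	c->builds++;
}

/* Fast path first: take a cached nid, and rebuild only when none is left. */
static unsigned int sketch_alloc_nid(struct sketch_nid_cache *c)
{
	if (c->fcnt == 0)
		sketch_build_free_nids(c);
	assert(c->fcnt > 0);
	return c->nids[--c->fcnt];
}
```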
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Jaegeuk Kim [Thu, 25 Apr 2013 04:24:33 +0000 (13:24 +0900)]
f2fs: add a tracepoint on f2fs_new_inode
This can help when debugging the free nid allocation flows.
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Jaegeuk Kim [Thu, 25 Apr 2013 04:21:12 +0000 (13:21 +0900)]
f2fs: check nid == 0 in add_free_nid
It is more obvious that add_free_nid checks whether the free nid is zero or not.
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Namjae Jeon [Thu, 25 Apr 2013 02:45:21 +0000 (11:45 +0900)]
f2fs: add REQ_META about metadata requests for submit
Adding REQ_META for all the metadata requests can help in improving the
FS performance, if the underlying device supports TAGGING.
So, considering the submit_bio path for all the f2fs requests, we can
add REQ_META for all the META requests.
As a precursor to this change we considered the commit
4265900e0be653f5b78baf2816857ef57cf1332f 'mmc: MMC-4.5 Data Tag Support'
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Jaegeuk Kim [Wed, 24 Apr 2013 04:19:56 +0000 (13:19 +0900)]
f2fs: give a chance to merge IOs by IO scheduler
Previously, background GC submits many 4KB read requests to load victim blocks
and/or its (i)node blocks.
...
f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb61, blkaddr = 0x3b964ed
f2fs_gc : block_rq_complete: 8,16 R () 499854968 + 8 [0]
f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb6f, blkaddr = 0x3b964ee
f2fs_gc : block_rq_complete: 8,16 R () 499854976 + 8 [0]
f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb79, blkaddr = 0x3b964ef
f2fs_gc : block_rq_complete: 8,16 R () 499854984 + 8 [0]
...
However, since many IOs are sequential, we can give the IO scheduler a chance
to merge them.
In order to do that, let's use blk_plug.
...
f2fs_gc : f2fs_iget: ino = 143
f2fs_gc : f2fs_readpage: ino = 143, page_index = 0x1c6, blkaddr = 0x2e6ee
f2fs_gc : f2fs_iget: ino = 143
f2fs_gc : f2fs_readpage: ino = 143, page_index = 0x1c7, blkaddr = 0x2e6ef
<idle> : block_rq_complete: 8,16 R () 1519616 + 8 [0]
<idle> : block_rq_complete: 8,16 R () 1519848 + 8 [0]
<idle> : block_rq_complete: 8,16 R () 1520432 + 96 [0]
<idle> : block_rq_complete: 8,16 R () 1520536 + 104 [0]
<idle> : block_rq_complete: 8,16 R () 1521008 + 112 [0]
<idle> : block_rq_complete: 8,16 R () 1521440 + 152 [0]
<idle> : block_rq_complete: 8,16 R () 1521688 + 144 [0]
<idle> : block_rq_complete: 8,16 R () 1522128 + 192 [0]
<idle> : block_rq_complete: 8,16 R () 1523256 + 328 [0]
...
Note that this issue should be addressed in checkpoint, and some readahead
flows too.
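A rough user-space model of what merging does to requests such as the three
8-sector reads in the first trace. The struct and function are hypothetical,
and the kernel's plugging and scheduler logic is considerably more involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical read request: start sector and length in sectors. */
struct sketch_req {
	unsigned long long sector;
	unsigned int nr_sectors;
};

/*
 * Merge requests that start exactly where the previous one ends.
 * Works in place on reqs[0..n) and returns the number of requests left.
 */
static size_t merge_sequential(struct sketch_req *reqs, size_t n)
{
	size_t out = 0, i;

	if (n == 0)
		return 0;

	for (i = 1; i < n; i++) {
		if (reqs[out].sector + reqs[out].nr_sectors == reqs[i].sector)
			reqs[out].nr_sectors += reqs[i].nr_sectors;	/* extend */
		else
			reqs[++out] = reqs[i];	/* gap: start a new request */
	}
	return out + 1;
}
```

Fed with 499854968 + 8, 499854976 + 8 and 499854984 + 8, this yields a single
request of 24 sectors. Merging is only possible if the requests are held back
long enough to be seen together, which is what plugging provides.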
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Jaegeuk Kim [Wed, 24 Apr 2013 04:00:14 +0000 (13:00 +0900)]
f2fs: avoid frequent background GC
If no victim segments are selected by background GC, let's wait
a little longer to collect dirty segments.
By default, let's give 5 minutes.
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Namjae Jeon [Tue, 23 Apr 2013 09:26:54 +0000 (18:26 +0900)]
f2fs: add tracepoints to debug checkpoint request
Add tracepoints to debug checkpoint request.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
[Jaegeuk: change expressions]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Namjae Jeon [Tue, 23 Apr 2013 08:51:43 +0000 (17:51 +0900)]
f2fs: add tracepoints for write page operations
Add tracepoints to debug the various page write operation
like data pages, meta pages.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
[Jaegeuk: remove unnecessary tracepoints]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Namjae Jeon [Tue, 23 Apr 2013 08:00:52 +0000 (17:00 +0900)]
f2fs: add tracepoints to debug the block allocation
Add tracepoints to debug the block allocation & fallocate.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
[Jaegeuk: enhance information]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Namjae Jeon [Tue, 23 Apr 2013 07:42:53 +0000 (16:42 +0900)]
f2fs: add tracepoints for GC threads
Add tracepoints for tracing the garbage collector
threads in f2fs with status of collection & type.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
[Jaegeuk: modify slightly to show information]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Namjae Jeon [Tue, 23 Apr 2013 07:38:02 +0000 (16:38 +0900)]
f2fs: add tracepoint for tracing the page i/o
Add tracepoints for page i/o operations and block allocation
tracing during page read operation.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
[Jaegeuk: combine and modify the tracepoint structures]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Namjae Jeon [Fri, 19 Apr 2013 16:28:52 +0000 (01:28 +0900)]
f2fs: add tracepoints for truncate operation
add tracepoints for tracing the truncate operations
like truncate node/data blocks, f2fs_truncate etc.
Tracepoints are added at entry and exit of operation
to trace the success & failure of operation.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
[Jaegeuk: combine and modify the tracepoint structures]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Namjae Jeon [Fri, 19 Apr 2013 16:28:40 +0000 (01:28 +0900)]
f2fs: add tracepoints for sync & inode operations
Add tracepoints in f2fs for tracing the syncing
operations like filesystem sync, file sync enter/exit.
It will help to trace the code in debugging scenarios.
Also add tracepoints for tracing the various inode operations
like building an inode, eviction of an inode, and link/unlink of
inodes.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
[Jaegeuk: combine and modify the tracepoint structures]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Namjae Jeon [Fri, 19 Apr 2013 16:27:21 +0000 (01:27 +0900)]
f2fs: make is_multimedia_file code align with its name
The conditions inside the function is_multimedia_file are the reverse
of what its name suggests, i.e., we need to negate the return value to actually
check if the file is a multimedia file. So, change the code and usage
path to align both the name and comparison conditions.
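A user-space sketch of a predicate whose return value matches its name. The
function name and signature are hypothetical, and the comparison is written
out by hand to stay within standard C.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Returns true when name ends with ext, ignoring case.
 * A true result means "yes, this is such a file", so callers do not
 * have to negate it.
 */
static bool name_has_extension(const char *name, const char *ext)
{
	size_t nlen = strlen(name);
	size_t elen = strlen(ext);
	size_t i;

	if (elen > nlen)
		return false;

	for (i = 0; i < elen; i++) {
		unsigned char a = (unsigned char)name[nlen - elen + i];
		unsigned char b = (unsigned char)ext[i];

		if (tolower(a) != tolower(b))
			return false;
	}
	return true;
}
```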
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Wei Yongjun [Fri, 12 Apr 2013 02:23:18 +0000 (10:23 +0800)]
f2fs: fix error return code in f2fs_fill_super()
Fix to return a negative error code from the error handling
case instead of 0, as returned elsewhere in this function.
Introduced by commit c0d39e (f2fs: fix return values from validate superblock)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Namjae Jeon [Sat, 6 Apr 2013 05:44:32 +0000 (14:44 +0900)]
f2fs: fix typo mistakes
Fix typo mistakes.
1. I think that it should be 'L' instead of 'V'.
2. and try to fix 'Front' instead of 'Frone'
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Jaegeuk Kim [Mon, 8 Apr 2013 07:01:00 +0000 (16:01 +0900)]
f2fs: write checkpoint before starting FG_GC
In order to be aware of prefree and free sections during FG_GC, let's start with
write_checkpoint().
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Zhihui Zhang [Sun, 7 Apr 2013 16:57:04 +0000 (12:57 -0400)]
f2fs: fix the logic of IS_DNODE()
If (ofs % (NIDS_PER_BLOCK + 1) == 0), the node is an indirect node block.
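A sketch of how such a check might look. Only the modulo condition comes from
this message. The fixed offsets for the first indirect nodes and the start of
the deeper region are assumptions about the node tree layout, and the function
takes the fan-out as a parameter so it can be tried with small numbers.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Returns true if the node at offset ofs is assumed to be a direct node.
 * Assumed layout: offsets 3, 4 + n and 5 + 2n are indirect nodes, and from
 * offset 6 + 2n onwards each indirect node is followed by n direct nodes.
 */
static bool sketch_is_dnode(unsigned int ofs, unsigned int nids_per_block)
{
	const unsigned int n = nids_per_block;

	assert(n > 0);

	if (ofs == 3 || ofs == 4 + n || ofs == 5 + 2 * n)
		return false;

	if (ofs >= 6 + 2 * n) {
		ofs -= 6 + 2 * n;
		/* Every (n + 1)-th slot in this region is an indirect node. */
		if (ofs % (n + 1) == 0)
			return false;
	}
	return true;
}
```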
Signed-off-by: Zhihui Zhang <zzhsuny@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Jaegeuk Kim [Thu, 22 Nov 2012 07:21:29 +0000 (16:21 +0900)]
f2fs: introduce a new global lock scheme
In the previous version, f2fs uses global locks according to the usage types,
such as directory operations, block allocation, block write, and so on.
Reference the following lock types in f2fs.h.
enum lock_type {
    RENAME,        /* for renaming operations */
    DENTRY_OPS,    /* for directory operations */
    DATA_WRITE,    /* for data write */
    DATA_NEW,      /* for data allocation */
    DATA_TRUNC,    /* for data truncate */
    NODE_NEW,      /* for node allocation */
    NODE_TRUNC,    /* for node truncate */
    NODE_WRITE,    /* for node write */
    NR_LOCK_TYPE,
};
In that case, we lose performance in a multi-threaded environment,
since each type of operation must be conducted one at a time.
In order to address the problem, let's share the locks globally with a mutex
array regardless of any types.
So, let users grab a mutex and perform their jobs in parallel as much as
possible.
For this, I propose a new global lock scheme as follows.
0. Data structure
- f2fs_sb_info -> mutex_lock[NR_GLOBAL_LOCKS]
- f2fs_sb_info -> node_write
1. mutex_lock_op(sbi)
- try to get an available lock from the array.
- returns the index of the acquired lock variable.
2. mutex_unlock_op(sbi, index of the lock)
- unlock the given index of the lock.
3. mutex_lock_all(sbi)
- grab all the locks in the array before the checkpoint.
4. mutex_unlock_all(sbi)
- release all the locks in the array after checkpoint.
5. block_operations()
- call mutex_lock_all()
- sync_dirty_dir_inodes()
- grab node_write
- sync_node_pages()
Note that,
the pairs of mutex_lock_op()/mutex_unlock_op() and
mutex_lock_all()/mutex_unlock_all() should be used together.
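A single-threaded model of the scheme, just to show how the two pairs
interact. The sketch_ names and the array size are hypothetical. Real code
would use mutexes and sleep where this model returns a failure.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_GLOBAL_LOCKS 4	/* illustrative size of the lock array */

/* Hypothetical stand-in for the lock array kept in the superblock info. */
struct sketch_sbi {
	bool held[NR_GLOBAL_LOCKS];
};

/* Take any free lock and return its index, or -1 if all are busy. */
static int sketch_lock_op(struct sketch_sbi *sbi)
{
	for (int i = 0; i < NR_GLOBAL_LOCKS; i++) {
		if (!sbi->held[i]) {
			sbi->held[i] = true;
			return i;
		}
	}
	return -1;	/* a real implementation would block here */
}

/* Release the lock previously returned by sketch_lock_op(). */
static void sketch_unlock_op(struct sketch_sbi *sbi, int idx)
{
	assert(idx >= 0 && idx < NR_GLOBAL_LOCKS && sbi->held[idx]);
	sbi->held[idx] = false;
}

/* Checkpoint side: succeeds only when no operation holds any lock. */
static bool sketch_lock_all(struct sketch_sbi *sbi)
{
	for (int i = 0; i < NR_GLOBAL_LOCKS; i++)
		if (sbi->held[i])
			return false;	/* a real implementation would wait */
	for (int i = 0; i < NR_GLOBAL_LOCKS; i++)
		sbi->held[i] = true;
	return true;
}

/* Release every lock after the checkpoint. */
static void sketch_unlock_all(struct sketch_sbi *sbi)
{
	for (int i = 0; i < NR_GLOBAL_LOCKS; i++)
		sbi->held[i] = false;
}
```

Up to NR_GLOBAL_LOCKS ordinary operations can proceed side by side, while the
checkpoint path has to own the whole array before it starts.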
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Jason Hrycay [Tue, 9 Apr 2013 01:16:44 +0000 (20:16 -0500)]
f2fs: move f2fs_balance_fs from truncate to punch_hole
Move the f2fs_balance_fs out of the truncate_hole function and only
perform that in punch_hole use case. The commit:
ed60b1644e7f7e5dd67d21caf7e4425dff05dad0
intended to do this but moved it into truncate_hole to cover more
cases. However, a deadlock scenario is possible when deleting an inode
entry under specific conditions:
f2fs_delete_entry()
  mutex_lock_op(sbi, DENTRY_OPS);
  truncate_hole()
    f2fs_balance_fs()
      mutex_lock(&sbi->gc_mutex);
      f2fs_gc()
        write_checkpoint()
          block_operations()
            mutex_lock_op(sbi, DENTRY_OPS);
Let's move it into the punch_hole case to cover the original intent of
avoiding it during fallocate's expand_inode_data case.
Change-Id: I29f8ea1056b0b88b70ba8652d901b6e8431bb27e
Signed-off-by: Jason Hrycay <jason.hrycay@motorola.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Jaegeuk Kim [Wed, 3 Apr 2013 13:19:03 +0000 (22:19 +0900)]
f2fs: reduce redundant spin_lock operations
This patch reduces redundant spin_lock operations in alloc_nid_failed().
The alloc_nid_failed() does not need to delete entry and add one again
by triggering spin_lock and spin_unlock redundantly.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Changman Lee [Wed, 3 Apr 2013 06:26:49 +0000 (15:26 +0900)]
f2fs: update f2fs.txt related with discard at mkfs
o mkfs.f2fs supports no discard option.
o fixed volume label size in 512 bytes.
Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
P J P [Wed, 3 Apr 2013 02:38:00 +0000 (11:38 +0900)]
f2fs: add NULL pointer check
Commit fa9150a84c replaces a call to generic_writepages() in
f2fs_write_data_pages() with write_cache_pages(), with a function pointer
argument pointing to the routine __f2fs_writepage.
-> https://git.kernel.org/linus/fa9150a84ca333f68127097c4fa1eda4b3913a22
This patch adds a NULL pointer check in f2fs_write_data_pages() to avoid
a possible NULL pointer dereference in case mapping->a_ops->writepage
is NULL.
Signed-off-by: P J P <ppandit@redhat.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Jaegeuk Kim [Mon, 1 Apr 2013 04:52:09 +0000 (13:52 +0900)]
f2fs: fix the bitmap consistency of dirty segments
Like below, there are 8 segment bitmaps for SSR victim candidates.
enum dirty_type {
    DIRTY_HOT_DATA,     /* dirty segments assigned as hot data logs */
    DIRTY_WARM_DATA,    /* dirty segments assigned as warm data logs */
    DIRTY_COLD_DATA,    /* dirty segments assigned as cold data logs */
    DIRTY_HOT_NODE,     /* dirty segments assigned as hot node logs */
    DIRTY_WARM_NODE,    /* dirty segments assigned as warm node logs */
    DIRTY_COLD_NODE,    /* dirty segments assigned as cold node logs */
    DIRTY,              /* to count # of dirty segments */
    PRE,                /* to count # of entirely obsolete segments */
    NR_DIRTY_TYPE
};
The upper 6 bitmaps indicate segments dirtied by the respective active log areas.
And, the DIRTY bitmap integrates all the 6 bitmaps.
For example,
o DIRTY_HOT_DATA : 1010000
o DIRTY_WARM_DATA: 0100000
o DIRTY_COLD_DATA: 0001000
o DIRTY_HOT_NODE : 0000010
o DIRTY_WARM_NODE: 0000001
o DIRTY_COLD_NODE: 0000000
In this case,
o DIRTY          : 1111011,
which means that we should strictly guarantee the consistency between DIRTY and
the other bitmaps.
However, the SSR mode selects victims freely from any log types, which can set
multiple bits across the various bitmap types.
So, this patch eliminates this inconsistency.
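The invariant in the example can be written down as a small check. The names
are hypothetical, each bitmap is squeezed into one unsigned int, and the
disjointness test reflects one reading of "multiple bits across the various
bitmap types", namely that a segment should be dirty in at most one log type.

```c
#include <assert.h>
#include <stdbool.h>

enum {
	HOT_DATA, WARM_DATA, COLD_DATA,
	HOT_NODE, WARM_NODE, COLD_NODE,
	NR_LOG_TYPES
};

/* OR of the six per-log bitmaps: the value DIRTY is expected to have. */
static unsigned int union_of_logs(const unsigned int maps[NR_LOG_TYPES])
{
	unsigned int all = 0;

	for (int t = 0; t < NR_LOG_TYPES; t++)
		all |= maps[t];
	return all;
}

/* True if no segment bit is set in more than one per-log bitmap. */
static bool logs_are_disjoint(const unsigned int maps[NR_LOG_TYPES])
{
	unsigned int seen = 0;

	for (int t = 0; t < NR_LOG_TYPES; t++) {
		if (seen & maps[t])
			return false;
		seen |= maps[t];
	}
	return true;
}

/* DIRTY is consistent when it equals the union of disjoint log bitmaps. */
static bool dirty_is_consistent(const unsigned int maps[NR_LOG_TYPES],
				unsigned int dirty)
{
	return logs_are_disjoint(maps) && union_of_logs(maps) == dirty;
}
```

With the six bitmaps from the example (0x50, 0x20, 0x08, 0x02, 0x01, 0x00),
the union is 0x7b, which is 1111011 in binary.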
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Jaegeuk Kim [Sun, 31 Mar 2013 23:32:21 +0000 (08:32 +0900)]
f2fs: avoid race for summary information
In order to do GC more reliably, I'd like to lock the victim summary page
until its GC is completed, and also prevent any checkpoint process.
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Jaegeuk Kim [Sun, 31 Mar 2013 04:58:51 +0000 (13:58 +0900)]
f2fs: allocate remained free segments in the LFS mode
This patch adds a new condition that allocates free segments in the current
active section even if SSR is needed.
Otherwise, f2fs cannot allocate the remaining free segments in the section, since
SSR finds dirty segments only.
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Jaegeuk Kim [Sun, 31 Mar 2013 04:49:18 +0000 (13:49 +0900)]
f2fs: check completion of foreground GC
The foreground GCs are triggered when there are not enough free sections.
So, we should not skip moving valid blocks in the victim segments.
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Jaegeuk Kim [Sun, 31 Mar 2013 04:26:03 +0000 (13:26 +0900)]
f2fs: change GC bitmaps to apply the section granularity
This patch removes a bitmap for victim segments selected by foreground GC, and
modifies the other bitmap for victim segments selected by background GC.
1) foreground GC bitmap
: We don't need to manage this, since we need only the one previous victim section
number instead of the whole victim history.
The f2fs uses the victim section number in order not to allocate currently
GC'ed section to current active logs.
2) background GC bitmap
: This bitmap is used to avoid selecting victims repeatedly by background GCs.
In addition, the victims are able to be selected by foreground GCs, since
there is no need to read victim blocks during foreground GCs.
By the fact that the foreground GC reclaims segments in a section unit, it'd
be better to manage this bitmap based on the section granularity.
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Jaegeuk Kim [Sun, 31 Mar 2013 03:59:53 +0000 (12:59 +0900)]
f2fs: allocate new segment aligned with sections
When allocating a new segment under the LFS mode, we should keep the section
boundary.
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Jaegeuk Kim [Sun, 31 Mar 2013 03:47:20 +0000 (12:47 +0900)]
f2fs: remove redundant lock_page calls
In get_node_page, we do not need to call lock_page all the time.
If the node page is cached as uptodate,
1. grab_cache_page locks the page,
2. read_node_page unlocks the page, and
3. lock_page is called for further process.
Let's avoid this.
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Jaegeuk Kim [Sun, 31 Mar 2013 03:39:49 +0000 (12:39 +0900)]
f2fs: introduce TOTAL_SECS macro
Let's use a macro to get the total number of sections.
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Jaegeuk Kim [Sun, 31 Mar 2013 03:30:04 +0000 (12:30 +0900)]
f2fs: do not use duplicate names in a macro
A macro should not use duplicate parameter names.
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Alexandru Gheorghiu [Thu, 28 Mar 2013 00:24:53 +0000 (02:24 +0200)]
f2fs: use kmemdup
Use kmemdup instead of kzalloc and memcpy.
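A user-space analogue of the same simplification. sketch_memdup is a
hypothetical helper that mirrors what kmemdup does, with malloc standing in
for the kernel allocator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Allocate len bytes and copy src into them in one step.
 * Returns NULL on allocation failure; the caller frees the result.
 * Zeroing the buffer first would be wasted work, since every byte is
 * overwritten by the copy anyway.
 */
static void *sketch_memdup(const void *src, size_t len)
{
	void *p = malloc(len);

	if (p)
		memcpy(p, src, len);
	return p;
}
```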
Signed-off-by: Alexandru Gheorghiu <gheorghiuandru@gmail.com>
Acked-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Jaegeuk Kim [Thu, 21 Mar 2013 06:21:57 +0000 (15:21 +0900)]
f2fs: fix to give correct parent inode number for roll forward
When we recover fsync'ed data after power-off-recovery, we should guarantee
that the parent inode number is correct for each direct inode block.
So, let's make the following rules.
- The fsync should do a checkpoint for all the inodes that have experienced hard
links.
- So, only normal files can be recovered by roll-forward.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Jaegeuk Kim [Thu, 21 Mar 2013 03:53:19 +0000 (12:53 +0900)]
f2fs: remain nat cache entries for further free nid allocation
In the checkpoint flow, the f2fs investigates the total nat cache entries.
Previously, if an entry has NULL_ADDR, f2fs drops the entry and adds the
obsolete nid to the free nid list.
However, this free nid will be reused soon, resulting in a miss on its nat entry.
In order to avoid this, we don't need to drop the nat cache entry at this moment.
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Jaegeuk Kim [Wed, 20 Mar 2013 05:58:38 +0000 (14:58 +0900)]
f2fs: do not skip writing file meta during fsync
This patch removes data_version check flow during the fsync call.
The original purpose for the use of data_version was to avoid writing inode
pages redundantly when fsync is called repeatedly.
However, when the user modifies file meta and then calls fsync, we should not
skip the fsync procedure.
So, let's remove this condition check and trust that the user triggers fsync
in the right manner.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Jaegeuk Kim [Wed, 20 Mar 2013 10:01:06 +0000 (19:01 +0900)]
f2fs: fix the recovery flow to handle errors correctly
We should handle errors during the recovery flow correctly.
For example, if we get -ENOMEM, we should report a mount failure instead of
conducting the remaining mount procedure.
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Masanari Iida [Mon, 18 Mar 2013 23:03:35 +0000 (08:03 +0900)]
f2fs: fix typo in comments
Correct spelling typo in comments
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Namjae Jeon [Sun, 17 Mar 2013 08:27:20 +0000 (17:27 +0900)]
f2fs: avoid BUG_ON from check_nid_range and update return path in do_read_inode
In function check_nid_range, there is no need to trigger BUG_ON and make the kernel stop.
Instead, it can just check the inode number and report EINVAL.
Update the return path in do_read_inode to use the return from check_nid_range.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
[Jaegeuk: replace BUG_ON with WARN_ON]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Namjae Jeon [Sun, 17 Mar 2013 08:26:53 +0000 (17:26 +0900)]
f2fs: fix return values from validate superblock
The validate superblock function does not return proper values.
When sb_bread fails, it should report EIO; otherwise, on a validation failure,
it should return EINVAL.
Returning '1' does not convey a proper message as the return value.
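A reduced sketch of the intended return convention. The struct and function
are hypothetical, a NULL argument models a failed sb_bread, and only the magic
number is checked here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_F2FS_MAGIC 0xF2F52010u	/* f2fs superblock magic number */

/* Hypothetical, heavily reduced raw superblock. */
struct sketch_super {
	uint32_t magic;
};

/*
 * Returns 0 if the superblock looks valid, -EIO if it could not be read
 * (raw == NULL), and -EINVAL if its contents fail validation.
 */
static int sketch_validate_super(const struct sketch_super *raw)
{
	if (raw == NULL)
		return -EIO;
	if (raw->magic != SKETCH_F2FS_MAGIC)
		return -EINVAL;
	return 0;
}
```

A caller can then pass the negative errno straight up instead of translating
a bare '1' into something meaningful.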
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Namjae Jeon [Sun, 17 Mar 2013 08:26:39 +0000 (17:26 +0900)]
f2fs: reorganize f2fs_setxattr
Make use of F2FS_NAME_LEN for name length checking, and
change the return conditions in a few places by
storing the error value in 'error' and using a common
exit path.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Namjae Jeon [Sun, 17 Mar 2013 08:26:14 +0000 (17:26 +0900)]
f2fs: notify when discard is not supported
Change f2fs so that a warning is emitted when an attempt is made to
mount a filesystem with the unsupported discard option.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Jaegeuk Kim [Sat, 16 Mar 2013 02:13:04 +0000 (11:13 +0900)]
f2fs: fix to call WRITE_FLUSH at the end of fsync
The fsync call should be ended after flushing the in-device caches.
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Jaegeuk Kim [Fri, 15 Mar 2013 23:34:37 +0000 (08:34 +0900)]
f2fs: fix not to allocate max_nid
The build_free_nid should not add free nids over nm_i->max_nid.
But, there was a hole that invalid free nid was added by the following scenario.
Let's suppose nm_i->max_nid = 150 and the last NAT page has 100 ~ 200 nids.
build_free_nids
- get_current_nat_page loads the last NAT page
- scan_nat_page can add 100 ~ 200 nids
-> Bug here!
So, when scanning an NAT page, we should check each candidate whether it is
over max_nid or not.
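The scenario above in a few lines of user-space C. The function is
hypothetical, the nid range is treated as inclusive, and max_nid itself is
assumed not to be allocatable, as the subject line suggests.

```c
#include <assert.h>

/*
 * Count how many nids in [first, last] a scan may add to the free list
 * when every candidate is checked against max_nid.
 */
static unsigned int count_scannable_nids(unsigned int first,
					 unsigned int last,
					 unsigned int max_nid)
{
	unsigned int nid, added = 0;

	for (nid = first; nid <= last; nid++) {
		if (nid >= max_nid)
			break;	/* nothing at or beyond max_nid is valid */
		added++;
	}
	return added;
}
```

For a page covering nids 100 ~ 200 and max_nid = 150, only 100 ~ 149 are
accepted. Without the per-candidate check, all 101 would have been added.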
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Jaegeuk Kim [Thu, 14 Mar 2013 00:24:32 +0000 (09:24 +0900)]
f2fs: fix return value of releasepage for node and data
If the return value of releasepage is equal to zero, the page cannot be reclaimed.
Instead, we should return 1 in order to reclaim clean pages.
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Jaegeuk Kim [Wed, 13 Mar 2013 23:49:58 +0000 (08:49 +0900)]
f2fs: scan next nat page to reuse free nids in there
When we build new free nids, let's scan the just next NAT page instead of
skipping a couple of previously scanned pages in order to reuse free nids in
there.
Otherwise, we can use too wide a range of nids even though several nids were
deallocated, and also their node pages can be cached in the node_inode's address
space.
This means that we can retain lots of clean pages in the main memory, which
induces mm's reclaiming overhead.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Jaegeuk Kim [Wed, 13 Mar 2013 08:49:22 +0000 (17:49 +0900)]
f2fs: should check the node page was truncated first
Currently, f2fs doesn't reclaim any node pages.
However, if we found that a node page was truncated by checking its block
address with zero during f2fs_write_node_page, we should not skip that node
page and return zero to reclaim it.
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Jaegeuk Kim [Fri, 8 Mar 2013 12:29:23 +0000 (21:29 +0900)]
f2fs: reduce unnecessary locking pages during read
This patch reduces redundant locking and unlocking pages during read operations.
In f2fs_readpage, let's use wait_on_page_locked() instead of lock_page.
And then, when we need to modify any data finally, let's lock the page so that
we can avoid lock contention.
[readpage rule]
- The f2fs_readpage returns unlocked page, or released page too in error cases.
- Its caller should handle read error, -EIO, after locking the page, which
indicates read completion.
- Its caller should check PageUptodate after grab_cache_page.
Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Namjae Jeon [Sat, 2 Mar 2013 03:41:31 +0000 (12:41 +0900)]
f2fs: avoid extra ++ while returning from get_node_path
In all the breaking conditions in get_node_path, 'n' is used to
track index in offset[] array, but while breaking out also, in all
paths n++ is done.
So, remove the ++ from breaking paths. Also, avoid
reset of 'level=0' in first case.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Jaegeuk Kim [Sun, 3 Mar 2013 04:58:05 +0000 (13:58 +0900)]
f2fs: align f2fs maximum name length to linux based filesystem
The maximum filename length supported in linux is 255 characters.
So let's follow that.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Namjae Jeon [Sat, 2 Mar 2013 03:40:50 +0000 (12:40 +0900)]
f2fs: optimize and change return path in lookup_free_nid_list
Optimize and change return path in lookup_free_nid_list
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Namjae Jeon [Sat, 2 Feb 2013 14:51:51 +0000 (23:51 +0900)]
f2fs: optimize get node page readahead part
We can remove the call to find_get_page, which gets a page from the cache
and checks whether it is up-to-date; instead, we can make use of the
grab_cache_page part itself to fetch the page from the cache.
So, remove the call and move the PageUptodate check to the proper place, and
also move the lock_page condition into the page_hit part.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Changman Lee [Tue, 19 Feb 2013 22:47:06 +0000 (07:47 +0900)]
f2fs: check the level before calling get_nid function
The caller of get_nid must be careful not to pass a value lower than
NODE_DIR1_BLOCK when level is zero.
Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Jaegeuk Kim [Tue, 26 Feb 2013 04:10:46 +0000 (13:10 +0900)]
f2fs: introduce readahead mode of node pages
Previously, f2fs reads several node pages ahead when get_dnode_of_data is called
with RDONLY_NODE flag.
And, this flag is set by the following functions.
- get_data_block_ro
- get_lock_data_page
- do_write_data_page
- truncate_blocks
- truncate_hole
However, this readahead mechanism was initially introduced for
get_data_block_ro, to enhance sequential read performance.
So, let's clarify all the cases with the additional modes as follows.
enum {
	ALLOC_NODE,		/* allocate a new node page if needed */
	LOOKUP_NODE,		/* look up a node without readahead */
	LOOKUP_NODE_RA,		/*
				 * look up a node with readahead called
				 * by get_data_block_ro.
				 */
};
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Jaegeuk Kim [Tue, 26 Feb 2013 03:43:46 +0000 (12:43 +0900)]
f2fs: read with READ_SYNC when getting dnode page
The get_node_page_ra tries to:
1. grab or read a target node page for the given nid,
2. then, call ra_node_page to read other adjacent node pages in advance.
So, when we try to read a target node page by #1, we should submit bio with
READ_SYNC instead of READA.
And, in #2, READA should be used.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Jaegeuk Kim [Wed, 13 Mar 2013 08:45:15 +0000 (17:45 +0900)]
f2fs: fix to unlock node page when it was truncated
If the node page was truncated, its block address became zero.
This means that we don't need to write the node page, but have to unlock
NODE_WRITE, decrease the number of dirty node pages, and then unlock_page
before returning the f2fs_write_node_page with zero.
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Changman Lee [Mon, 25 Feb 2013 08:38:02 +0000 (17:38 +0900)]
f2fs: fix overflow when calculating utilization on 32-bit
Use div_u64 to fix overflow when calculating utilization.
*long int* is 4 bytes on 32-bit, so (user blocks * 100) might
overflow if the disk size is over e.g. 512GB.
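The wrap-around described above can be reproduced in userspace with a
minimal sketch. The helper names are hypothetical, a 4 KiB block size is
assumed for the 512GB figure, and plain 64-bit division stands in for the
kernel's div_u64:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: percentage computed entirely in 32-bit arithmetic,
 * which is what happens when "long" is only 4 bytes wide. */
static uint32_t utilization_32(uint32_t valid_blocks, uint32_t user_blocks)
{
	assert(user_blocks != 0);
	/* valid_blocks * 100 wraps modulo 2^32 before the division */
	return valid_blocks * 100u / user_blocks;
}

/* Hypothetical helper: widen to 64 bits before multiplying, so the
 * intermediate product cannot wrap. */
static uint32_t utilization_64(uint32_t valid_blocks, uint32_t user_blocks)
{
	assert(user_blocks != 0);
	return (uint32_t)((uint64_t)valid_blocks * 100u / user_blocks);
}
```

With 134217728 blocks (512 GiB of 4 KiB blocks, all in use), the 32-bit
variant yields 4 while the widened variant yields 100.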
Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Linus Torvalds [Thu, 7 Mar 2013 23:57:38 +0000 (15:57 -0800)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
"Several boot fixes (MacBook, legacy EFI bootloaders), another
please-don't-brick fix, and some minor stuff."
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Do not try to sync identity map for non-mapped pages
x86, doc: Be explicit about what the x86 struct boot_params requires
x86: Don't clear efi_info even if the sentinel hits
x86, mm: Make sure to find a 2M free block for the first mapped area
x86: Fix 32-bit *_cpu_data initializers
efivarfs: return accurate error code in efivarfs_fill_super()
efivars: efivarfs_valid_name() should handle pstore syntax
efi: be more paranoid about available space when creating variables
iommu, x86: Add DMA remap fault reason
x86, smpboot: Remove unused variable
Linus Torvalds [Thu, 7 Mar 2013 22:55:54 +0000 (14:55 -0800)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Misc radeon, nouveau, mgag200 and intel fixes.
The intel fixes should contain the fix for the touchpad on the
Chromebook - hey I'm an input maintainer now!"
Hate to pee on your parade, Dave, but I don't think being an input
maintainer is necessarily something to strive for..
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (25 commits)
drm/tegra: drop "select DRM_HDMI"
drm: Documentation typo fixes
drm/mgag200: Bug fix: Renesas board now selects native resolution.
drm/mgag200: Reject modes that are too big for VRAM
drm/mgag200: 'fbdev_list' in 'struct mga_fbdev' is not used
drm/radeon: don't check mipmap alignment if MIP_ADDRESS is FMASK
drm/radeon: skip MC reset as it's probably not hung
drm/radeon: add primary dac adj quirk for R200 board
drm/radeon: don't set hpd, afmt interrupts when interrupts are disabled
drm/i915: Turn off hsync and vsync on ADPA when disabling crt
drm/i915: Fix incorrect definition of ADPA HSYNC and VSYNC bits
drm/i915: also disable south interrupts when handling them
drm/i915: enable irqs earlier when resuming
drm/i915: Increase the RC6p threshold.
DRM/i915: On G45 enable cursor plane briefly after enabling the display plane.
drm/nv50-: prevent some races between modesetting and page flipping
drm/nouveau/i2c: drop parent refcount when creating ports
drm/nv84: fix regression in page flipping
drm/nouveau: Fix typo in init_idx_addr_latched().
drm/nouveau: Disable AGP on PowerPC again.
...
Linus Torvalds [Thu, 7 Mar 2013 22:54:28 +0000 (14:54 -0800)]
Merge tag 'pm+acpi-3.9-rc2' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI and power management fixes from Rafael J Wysocki:
- Two fixes for the new intel_pstate driver from Dirk Brandewie.
- Fix for incorrect usage of the .find_bridge() callback from struct
acpi_bus_type in the USB core and subsequent removal of that callback
from Rafael J Wysocki.
- ACPI processor driver cleanups from Chen Gang and Syam Sidhardhan.
- ACPI initialization and error messages fix from Joe Perches.
- Operating Performance Points documentation improvement from Nishanth
Menon.
- Fixes for memory leaks and potential concurrency issues and sysfs
attributes leaks during device removal in the core device PM QoS code
from Rafael J Wysocki.
- Calxeda Highbank cpufreq driver simplification from Emilio López.
- cpufreq comment cleanup from Namhyung Kim.
- Fix for a section mismatch in Calxeda Highbank interprocessor
communication code from Mark Langsdorf (this is not a PM fix strictly
speaking, but the code in question went in through the PM tree).
* tag 'pm+acpi-3.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq / intel_pstate: Do not load on VM that does not report max P state.
cpufreq / intel_pstate: Fix intel_pstate_init() error path
ACPI / glue: Drop .find_bridge() callback from struct acpi_bus_type
ACPI / glue: Add .match() callback to struct acpi_bus_type
ACPI / porocessor: Beautify code, pr->id is u32 which is never < 0
ACPI / processor: Remove redundant NULL check before kfree
ACPI / Sleep: Avoid interleaved message on errors
PM / QoS: Remove device PM QoS sysfs attributes at the right place
PM / QoS: Fix concurrency issues and memory leaks in device PM QoS
cpufreq: highbank: do not initialize array with a loop
PM / OPP: improve introductory documentation
cpufreq: Fix a typo in comment
mailbox, pl320-ipc: remove __init from probe function
Paul Bolle [Tue, 5 Mar 2013 21:07:36 +0000 (22:07 +0100)]
drm/tegra: drop "select DRM_HDMI"
Commit ac24c2204a76e5b42aa103bf963ae0eda1b827f3 ("drm/tegra: Use generic
HDMI infoframe helpers") added "select DRM_HDMI" to the DRM_TEGRA
Kconfig entry. But there is no Kconfig symbol named DRM_HDMI. The select
statement for that symbol is a nop. Drop it.
What was needed to use HDMI functionality was to select HDMI (which this
entry already did through depending on DRM) and to include linux/hdmi.h
(which this commit also did).
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Acked-by: Thierry Reding <thierry.reding@avionic-design.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Christopher Harvey [Thu, 7 Mar 2013 15:42:25 +0000 (10:42 -0500)]
drm: Documentation typo fixes
Signed-off-by: Christopher Harvey <charvey@matrox.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Julia Lemire [Thu, 7 Mar 2013 15:41:03 +0000 (10:41 -0500)]
drm/mgag200: Bug fix: Renesas board now selects native resolution.
Renesas boards were consistently defaulting to the 1024x768 resolution,
regardless of the native resolution of the monitor plugged in. It was
determined that the EDID of the monitor was not being read. Since the
DAC is a shared line, in order to read from or write to it we must take
control of the DAC clock. This can be done by setting the proper
register to one.
This bug fix sets the register MGA1064_GEN_IO_CTL2 to one. The DAC
control line can be used to determine whether or not a new monitor has
been plugged in. But since the hotplug feature is not one we will
support, it has been decided to simply leave the register set to one.
Signed-off-by: Julia Lemire <jlemire@matrox.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Christopher Harvey [Tue, 26 Feb 2013 15:55:44 +0000 (10:55 -0500)]
drm/mgag200: Reject modes that are too big for VRAM
A monitor or a user could request a resolution greater than the
available VRAM for the backing framebuffer. This change checks the
required framebuffer size against the max VRAM size and rejects modes
if they are too big. This change can also remove a mode request passed
in via the video= parameter.
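As a rough illustration of the size check described above, here is a
minimal sketch. The function and parameter names are hypothetical and are
not taken from the mgag200 driver; it only assumes that the framebuffer
needs width * height * bytes-per-pixel bytes:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: does a mode's backing framebuffer fit in VRAM? */
static bool mode_fits_vram(uint32_t hdisplay, uint32_t vdisplay,
			   uint32_t bits_per_pixel, uint64_t vram_bytes)
{
	/* this sketch only handles whole-byte pixel formats */
	assert(bits_per_pixel % 8 == 0);

	/* widen first so the product cannot overflow 32 bits */
	uint64_t needed = (uint64_t)hdisplay * vdisplay * (bits_per_pixel / 8);

	return needed <= vram_bytes;
}
```

Under these assumptions, a 1920x1200 mode at 32 bpp needs 9216000 bytes
and would be rejected on a board with 8 MiB of VRAM, while 1024x768 at
32 bpp (3145728 bytes) would be accepted.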
Signed-off-by: Christopher Harvey <charvey@matrox.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Christopher Harvey [Tue, 26 Feb 2013 15:54:22 +0000 (10:54 -0500)]
drm/mgag200: 'fbdev_list' in 'struct mga_fbdev' is not used
Signed-off-by: Christopher Harvey <charvey@matrox.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Thu, 7 Mar 2013 22:28:22 +0000 (08:28 +1000)]
Merge branch 'drm-fixes-3.9' of git://people.freedesktop.org/~agd5f/linux into drm-next
Alex writes:
Radeon fixes pull. Not much to it.
- fix some splatter if the interrupt handler isn't registered
- Add a quirk for an old R200 board to fix washed out colors on the DAC
- Don't try and soft reset the MC when we reset the GPU. It usually doesn't
need it and doesn't always work reliably.
- A CS checker fix from Marek
* 'drm-fixes-3.9' of git://people.freedesktop.org/~agd5f/linux:
drm/radeon: don't check mipmap alignment if MIP_ADDRESS is FMASK
drm/radeon: skip MC reset as it's probably not hung
drm/radeon: add primary dac adj quirk for R200 board
drm/radeon: don't set hpd, afmt interrupts when interrupts are disabled
Linus Torvalds [Thu, 7 Mar 2013 21:47:18 +0000 (13:47 -0800)]
Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm
Pull ARM fixes from Russell King:
"Mainly a group of fixes, the only exception is the wiring up of the
kcmp syscall now that those patches went in during the last merge
window."
* 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
ARM: 7668/1: fix memset-related crashes caused by recent GCC (4.7.2) optimizations
ARM: 7667/1: perf: Fix section mismatch on armpmu_init()
ARM: 7666/1: decompressor: add -mno-single-pic-base for building the decompressor
ARM: 7665/1: Wire up kcmp syscall
ARM: 7664/1: perf: remove erroneous semicolon from event initialisation
ARM: 7663/1: perf: fix ARMv7 EVTYPE_MASK to include NSH bit
ARM: 7662/1: hw_breakpoint: reset debug logic on secondary CPUs in s2ram resume
ARM: 7661/1: mm: perform explicit branch predictor maintenance when required
ARM: 7660/1: tlb: add branch predictor maintenance operations
ARM: 7659/1: mm: make mm->context.id an atomic64_t variable
ARM: 7658/1: mm: fix race updating mm->context.id on ASID rollover
ARM: 7657/1: head: fix swapper and idmap population with LPAE and big-endian
ARM: 7655/1: smp_twd: make twd_local_timer_of_register() no-op for nosmp
ARM: 7652/1: mm: fix missing use of 'asid' to get asid value from mm->context.id
ARM: 7642/1: netx: bump IRQ offset to 64
H. Peter Anvin [Thu, 7 Mar 2013 21:25:10 +0000 (13:25 -0800)]
Merge tag 'efi-for-3.9-rc2' into x86/urgent
EFI changes for v3.9-rc2,
* Make the EFI variable code more paranoid about running out of
space in NVRAM, since this is the root cause of the recent issue
where machines refuse to boot - from Matthew Garrett.
* Some efivarfs patches that fix regressions introduced in v3.9-rc1.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Dave Hansen [Thu, 7 Mar 2013 16:31:51 +0000 (08:31 -0800)]
x86: Do not try to sync identity map for non-mapped pages
kernel_map_sync_memtype() is called from a variety of contexts. The
pat.c code that calls it seems to ensure that it is not called for
non-ram areas by checking via pat_pagerange_is_ram(). It is important
that it only be called on the actual identity map because there *IS*
no map to sync for highmem pages, or for memory holes.
The ioremap.c uses are not as careful as those from pat.c, and call
kernel_map_sync_memtype() on PCI space which is in the middle of the
kernel identity map _range_, but is not actually mapped.
This patch adds a check to kernel_map_sync_memtype() which probably
duplicates some of the checks already in pat.c. But, it is necessary
for the ioremap.c uses and shouldn't hurt other callers.
I have reproduced this bug and this patch fixes it for me and the
original bug reporter:
https://lkml.org/lkml/2013/2/5/396
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20130307163151.D9B58C4E@kernel.stglabs.ibm.com
Signed-off-by: Dave Hansen <dave@sr71.net>
Tested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Linus Torvalds [Thu, 7 Mar 2013 21:07:10 +0000 (13:07 -0800)]
Merge tag 'regulator-3.9-rc1' of git://git./linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"A few small things here and there, nothing major here really. The
conversion of twl4030ldo_ops to get_voltage_sel is a fix, as covered
in the commit log it fixes inconsistency in handling of the IS_UNSUP()
feature in the driver."
* tag 'regulator-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: fixed regulator_bulk_enable unwinding code
regulator: twl: Convert twl4030ldo_ops to get_voltage_sel
regulator: palmas: fix number of SMPS voltages
regulator: core: fix documentation error in regulator_allow_bypass
regulator: core: update kernel documentation for regulator_desc
regulator: db8500-prcmu - remove incorrect __exit markup
Linus Torvalds [Thu, 7 Mar 2013 21:06:21 +0000 (13:06 -0800)]
Merge tag 'regmap-v3.9-rc1' of git://git./linux/kernel/git/broonie/regmap
Pull regmap PM fix from Mark Brown:
"A simple fix to stop us leaking a runtime PM reference in the case
where we fail to enable a device."
* tag 'regmap-v3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: irq: call pm_runtime_put in pm_runtime_get_sync failed case
Linus Torvalds [Thu, 7 Mar 2013 20:47:24 +0000 (12:47 -0800)]
Merge tag 'ecryptfs-3.9-rc2-fixes' of git://git./linux/kernel/git/tyhicks/ecryptfs
Pull ecryptfs fixes from Tyler Hicks:
"Minor code cleanups and new Kconfig option to disable /dev/ecryptfs
The code cleanups fix up W=1 compiler warnings and some unnecessary
checks. The new Kconfig option, defaulting to N, allows the rarely
used eCryptfs kernel to userspace communication channel to be compiled
out. This may be the first step in it being eventually removed."
Hmm. I'm not sure whether these should be called "fixes", and it
probably should have gone in the merge window. But I'll let it slide.
* tag 'ecryptfs-3.9-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
eCryptfs: allow userspace messaging to be disabled
eCryptfs: Fix redundant error check on ecryptfs_find_daemon_by_euid()
ecryptfs: ecryptfs_msg_ctx_alloc_to_free(): remove kfree() redundant null check
eCryptfs: decrypt_pki_encrypted_session_key(): remove kfree() redundant null check
eCryptfs: remove unneeded checks in virt_to_scatterlist()
eCryptfs: Fix -Wmissing-prototypes warnings
eCryptfs: Fix -Wunused-but-set-variable warnings
eCryptfs: initialize payload_len in keystore.c
Linus Torvalds [Thu, 7 Mar 2013 20:46:25 +0000 (12:46 -0800)]
Merge tag 'hwmon-for-linus' of git://git./linux/kernel/git/groeck/linux-staging
Pull hwmon patches from Guenter Roeck:
"Bug fixes for sht15 and ltc2978 driver plus some documentation
updates"
* tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (sht15) Check return value of regulator_enable()
hwmon: (adt7410) Document ADT7420 support
hwmon: (pmbus/ltc2978) Use detected chip ID to select supported functionality
hwmon: (pmbus/ltc2978) Fix peak attribute handling
hwmon: (pmbus/ltc2978) Update datasheet links
hwmon: Update my e-mail address in driver documentation
Marek Olšák [Fri, 1 Mar 2013 12:40:31 +0000 (13:40 +0100)]
drm/radeon: don't check mipmap alignment if MIP_ADDRESS is FMASK
The MIP_ADDRESS state has 2 meanings. If the texture has one sample
per pixel, it's a pointer to the mipmap chain. If the texture has
multiple samples per pixel, it's a pointer to FMASK, a metadata buffer
needed for reading compressed MSAA textures. The mipmap
alignment rules do not apply to FMASK.
Signed-off-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Thu, 28 Feb 2013 15:03:08 +0000 (10:03 -0500)]
drm/radeon: skip MC reset as it's probably not hung
The MC is most likely busy (e.g., display requests), not hung
so no need to reset it. Doing an MC reset is tricky and not
particularly reliable. Fixes hangs in certain cases.
Reported-by: Josh Boyer <jwboyer@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Wed, 27 Feb 2013 17:01:58 +0000 (12:01 -0500)]
drm/radeon: add primary dac adj quirk for R200 board
vbios values are wrong leading to colors that are
too bright. Use the default values instead.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Alex Deucher [Tue, 26 Feb 2013 21:17:33 +0000 (16:17 -0500)]
drm/radeon: don't set hpd, afmt interrupts when interrupts are disabled
Avoids splatter if the interrupt handler is not registered due
to acceleration being disabled.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Cc: stable@vger.kernel.org
Ivan Djelic [Wed, 6 Mar 2013 19:09:27 +0000 (20:09 +0100)]
ARM: 7668/1: fix memset-related crashes caused by recent GCC (4.7.2) optimizations
Recent GCC versions (e.g. GCC-4.7.2) perform optimizations based on
assumptions about the implementation of memset and similar functions.
The current ARM optimized memset code does not return the value of
its first argument, as is usually expected from standard implementations.
For instance in the following function:
void debug_mutex_lock_common(struct mutex *lock, struct mutex_waiter *waiter)
{
memset(waiter, MUTEX_DEBUG_INIT, sizeof(*waiter));
waiter->magic = waiter;
INIT_LIST_HEAD(&waiter->list);
}
compiled as:
800554d0 <debug_mutex_lock_common>:
800554d0:       e92d4008        push    {r3, lr}
800554d4:       e1a00001        mov     r0, r1
800554d8:       e3a02010        mov     r2, #16 ; 0x10
800554dc:       e3a01011        mov     r1, #17 ; 0x11
800554e0:       eb04426e        bl      80165ea0 <memset>
800554e4:       e1a03000        mov     r3, r0
800554e8:       e583000c        str     r0, [r3, #12]
800554ec:       e5830000        str     r0, [r3]
800554f0:       e5830004        str     r0, [r3, #4]
800554f4:       e8bd8008        pop     {r3, pc}
GCC assumes memset returns the value of pointer 'waiter' in register r0, causing
register/memory corruption when it does not.
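The contract the compiler relies on can be shown with a small C sketch.
This is not the ARM assembly routine; sketch_memset is a hypothetical
byte-wise stand-in that only illustrates the rule "advance a working copy,
return the original first argument":

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical byte-wise memset. The destination pointer is never
 * modified; a separate working pointer walks the buffer, so the value
 * handed back is exactly the first argument. */
static void *sketch_memset(void *dest, int c, size_t n)
{
	unsigned char *p = dest;	/* working copy, advanced below */

	assert(dest != NULL || n == 0);
	while (n--)
		*p++ = (unsigned char)c;

	return dest;			/* what callers may reuse afterwards */
}
```

A caller compiled like the listing above may reuse the returned pointer in
place of 'waiter', which is only safe if the routine honours this rule.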
This patch fixes the return value of the assembly version of memset.
It adds a 'mov' instruction and merges an additional load+store into
existing load/store instructions.
For ease of review, here is a breakdown of the patch into 4 simple steps:
Step 1
======
Perform the following substitutions:
ip -> r8, then
r0 -> ip,
and insert 'mov ip, r0' as the first statement of the function.
At this point, we have a memset() implementation returning the proper result,
but corrupting r8 on some paths (the ones that were using ip).
Step 2
======
Make sure r8 is saved and restored when (! CALGN(1)+0) == 1:
save r8:
- str lr, [sp, #-4]!
+ stmfd sp!, {r8, lr}
and restore r8 on both exit paths:
- ldmeqfd sp!, {pc} @ Now <64 bytes to go.
+ ldmeqfd sp!, {r8, pc} @ Now <64 bytes to go.
(...)
tst r2, #16
stmneia ip!, {r1, r3, r8, lr}
- ldr lr, [sp], #4
+ ldmfd sp!, {r8, lr}
Step 3
======
Make sure r8 is saved and restored when (! CALGN(1)+0) == 0:
save r8:
- stmfd sp!, {r4-r7, lr}
+ stmfd sp!, {r4-r8, lr}
and restore r8 on both exit paths:
bgt 3b
- ldmeqfd sp!, {r4-r7, pc}
+ ldmeqfd sp!, {r4-r8, pc}
(...)
tst r2, #16
stmneia ip!, {r4-r7}
- ldmfd sp!, {r4-r7, lr}
+ ldmfd sp!, {r4-r8, lr}
Step 4
======
Rewrite register list "r4-r7, r8" as "r4-r8".
Signed-off-by: Ivan Djelic <ivan.djelic@parrot.com>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Dirk Behme <dirk.behme@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Peter Jones [Wed, 6 Mar 2013 18:00:23 +0000 (13:00 -0500)]
x86, doc: Be explicit about what the x86 struct boot_params requires
If the sentinel triggers, we do not want the boot loader authors to
just poke it and make the error go away, we want them to actually fix
the problem.
This should help avoid making the incorrect change in non-compliant
bootloaders.
[ hpa: dropped the Documentation/x86/boot.txt hunk pending
clarifications ]
Signed-off-by: Peter Jones <pjones@redhat.com>
Link: http://lkml.kernel.org/r/1362592823-28967-1-git-send-email-pjones@redhat.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Josh Boyer [Thu, 7 Mar 2013 04:23:30 +0000 (20:23 -0800)]
x86: Don't clear efi_info even if the sentinel hits
When boot_params->sentinel is set, all we really know is that some
undefined set of fields in struct boot_params contain garbage. In the
particular case of efi_info, however, there is a private magic for
that substructure, so it is generally safe to leave it even if the
bootloader is broken.
kexec (for which we did the initial analysis) did not initialize this
field, but of course all the EFI bootloaders do, and most EFI
bootloaders are broken in this respect (and should be fixed.)
Reported-by: Robin Holt <holt@sgi.com>
Link: http://lkml.kernel.org/r/CA%2B5PVA51-FT14p4CRYKbicykugVb=PiaEycdQ57CK2km_OQuRQ@mail.gmail.com
Tested-by: Josh Boyer <jwboyer@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Yinghai Lu [Thu, 7 Mar 2013 04:18:21 +0000 (20:18 -0800)]
x86, mm: Make sure to find a 2M free block for the first mapped area
Henrik reported that his MacAir 3.1 would not boot with
| commit 8d57470d8f859635deffe3919d7d4867b488b85a
| Date: Fri Nov 16 19:38:58 2012 -0800
|
| x86, mm: setup page table in top-down
It turns out that we do not calculate the real_end properly:
We try to get 2M size with 4K alignment, and later will round down
to 2M, so we will get less than 2M for the first mapping; in the extreme
case it could be only 4K. In Henrik's system it has (1M-32K), as the
last usable range is [mem 0x7f9db000-0x7fef8fff].
The problem is exposed when EFI booting leaves several holes, which
forces the mapping to use PTEs instead, as we only map usable areas.
To fix it, just make it be 2M aligned, so we can be guaranteed to be
able to use large pages to map it.
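The arithmetic can be sketched in userspace as follows. This is a
simplified model based only on the description above, with hypothetical
names; it assumes the first mapping runs from the 2M boundary below the
end of the block up to that end:

```c
#include <assert.h>
#include <stdint.h>

#define SZ_4K 0x1000ull
#define SZ_2M 0x200000ull

/* Round x down to a power-of-two alignment. */
static uint64_t round_down_pow2(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return x & ~(align - 1);
}

/* Hypothetical model: a 2M block was found at "addr" (4K aligned at
 * least); the first mapping covers [start, real_end), where start is
 * real_end - 1 rounded down to 2M. */
static uint64_t first_mapping_size(uint64_t addr)
{
	assert(addr % SZ_4K == 0);

	uint64_t real_end = addr + SZ_2M;
	uint64_t start = round_down_pow2(real_end - 1, SZ_2M);

	return real_end - start;
}
```

In this model a block at 0x10001000 (only 4K aligned) leaves a first
mapping of just 4K, while a 2M-aligned block at 0x10200000 yields the
full 2M.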
Reported-by: Henrik Rydberg <rydberg@euromail.se>
Bisected-by: Henrik Rydberg <rydberg@euromail.se>
Tested-by: Henrik Rydberg <rydberg@euromail.se>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/CAE9FiQX4nQ7_1kg5RL_vh56rmcSHXUi1ExrZX7CwED4NGMnHfg@mail.gmail.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Krzysztof Mazur [Sat, 2 Mar 2013 23:14:42 +0000 (00:14 +0100)]
x86: Fix 32-bit *_cpu_data initializers
The commit 27be457000211a6903968dfce06d5f73f051a217
('x86 idle: remove 32-bit-only "no-hlt" parameter, hlt_works_ok
flag') removed the hlt_works_ok flag from struct cpuinfo_x86, but
the boot_cpu_data and new_cpu_data initializers were not changed,
so the f00f_bug flag was set instead of fdiv_bug.
If CONFIG_X86_F00F_BUG is not set the f00f_bug flag is never
cleared.
To avoid such problems in the future, C99-style initialization is now
used.
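The failure mode and the remedy can be illustrated with a small sketch.
The struct below is a hypothetical, cut-down layout (only hlt_works_ok,
fdiv_bug and f00f_bug come from the text above; "reserved" is a made-up
placeholder), not the real struct cpuinfo_x86:

```c
#include <assert.h>

struct cpu_flags_sketch {
	/* signed char hlt_works_ok;	<- removed, as in the commit above */
	signed char reserved;
	signed char fdiv_bug;
	signed char f00f_bug;
};

/* Positional initializer still written for the old four-field layout
 * (hlt_works_ok = 1, reserved = 0, fdiv_bug = -1): every value now
 * lands one field too early. */
const struct cpu_flags_sketch positional = { 1, 0, -1 };

/* C99 designated initializer: immune to fields being added or removed. */
const struct cpu_flags_sketch designated = { .fdiv_bug = -1 };

/* Spells out where each value ends up. */
void show_difference(void)
{
	assert(positional.fdiv_bug == 0);	/* intended -1 is lost */
	assert(positional.f00f_bug == -1);	/* wrong flag gets set */
	assert(designated.fdiv_bug == -1);
	assert(designated.f00f_bug == 0);
}
```

Because each value is named, removing a member either leaves the
designated initializer correct or turns it into a compile error, rather
than silently shifting values.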
Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: len.brown@intel.com
Link: http://lkml.kernel.org/r/1362266082-2227-1-git-send-email-krzysiek@podlesie.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Dave Airlie [Thu, 7 Mar 2013 01:12:14 +0000 (11:12 +1000)]
Merge branch 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel into drm-next
A bunch of fixes, nothing truly horrible:
- Fix PCH irq handling race which resulted in missed gmbus/dp aux irqs
and subsequent fallout (Paulo)
- Fixup off-by-one in our hsw id table (Kenneth)
- Fixup ilk rc6 support (disabled by default), regression introduced in
3.8
- g4x plane w/a from Egbert Eich
- gen2/3/4 dpms suspend/standby fixes for VGA outputs from Patrik Jakobsson
- Workaround dying ivb machines with less aggressive rc6 values (Stéphane
Marchesin)
* 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel:
drm/i915: Turn off hsync and vsync on ADPA when disabling crt
drm/i915: Fix incorrect definition of ADPA HSYNC and VSYNC bits
drm/i915: also disable south interrupts when handling them
drm/i915: enable irqs earlier when resuming
drm/i915: Increase the RC6p threshold.
DRM/i915: On G45 enable cursor plane briefly after enabling the display plane.
drm/i915: Fix Haswell/CRW PCI IDs.
drm/i915: Don't clobber crtc->fb when queue_flip fails
drm/i915: wait_event_timeout's timeout is in jiffies
drm/i915: Fix missing variable initilization
Stephen Boyd [Tue, 5 Mar 2013 02:54:06 +0000 (03:54 +0100)]
ARM: 7667/1: perf: Fix section mismatch on armpmu_init()
WARNING: vmlinux.o(.text+0xfb80): Section mismatch in reference
from the function armpmu_register() to the function
.init.text:armpmu_init()
The function armpmu_register() references
the function __init armpmu_init().
This is often because armpmu_register lacks a __init
annotation or the annotation of armpmu_init is wrong.
Just drop the __init marking on armpmu_init() because
armpmu_register() no longer has an __init marking.
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Jonathan Austin [Mon, 4 Mar 2013 14:17:36 +0000 (15:17 +0100)]
ARM: 7666/1: decompressor: add -mno-single-pic-base for building the decompressor
Before jumping to (position independent) C-code from the decompressor's
assembler world we set-up the C environment. This setup currently does not
set r9, which for arm-none-uclinux-uclibceabi toolchains is by default
expected to be the PIC offset base register (IE should point to the
beginning of the GOT).
Currently, therefore, in order to build working kernels that use the
decompressor it is necessary to use an arm-linux-gnueabi toolchain, or
similar. uClinux toolchains cause a prefetch abort to occur at the beginning
of the decompress_kernel function.
This patch allows uClinux toolchains to build bootable zImages by forcing
the -mno-single-pic-base option, which ensures that the location of the GOT
is re-derived each time it is required, and r9 becomes free for use as a
general purpose register.
This has a small (4% in instruction terms) advantage over the alternative of
setting r9 to point to the GOT before calling into the C-world.
Signed-off-by: Jonathan Austin <jonathan.austin@arm.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Rafael J. Wysocki [Wed, 6 Mar 2013 22:42:05 +0000 (23:42 +0100)]
Merge branch 'pm-fixes' into fixes
* pm-fixes:
cpufreq / intel_pstate: Do not load on VM that does not report max P state.
cpufreq / intel_pstate: Fix intel_pstate_init() error path
PM / QoS: Remove device PM QoS sysfs attributes at the right place
PM / QoS: Fix concurrency issues and memory leaks in device PM QoS
cpufreq: highbank: do not initialize array with a loop
PM / OPP: improve introductory documentation
cpufreq: Fix a typo in comment
mailbox, pl320-ipc: remove __init from probe function
Rafael J. Wysocki [Wed, 6 Mar 2013 22:41:58 +0000 (23:41 +0100)]
Merge branch 'acpi-fixes' into fixes
* acpi-fixes:
ACPI / glue: Drop .find_bridge() callback from struct acpi_bus_type
ACPI / glue: Add .match() callback to struct acpi_bus_type
ACPI / porocessor: Beautify code, pr->id is u32 which is never < 0
ACPI / processor: Remove redundant NULL check before kfree
ACPI / Sleep: Avoid interleaved message on errors
Dirk Brandewie [Tue, 5 Mar 2013 22:15:26 +0000 (14:15 -0800)]
cpufreq / intel_pstate: Do not load on VM that does not report max P state.
It seems some VMs support the P state MSRs but return zeros. Fail
gracefully if we are running in this environment.
References: https://bugzilla.redhat.com/show_bug.cgi?id=916833
Reported-by: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Dirk Brandewie [Tue, 5 Mar 2013 22:15:27 +0000 (14:15 -0800)]
cpufreq / intel_pstate: Fix intel_pstate_init() error path
If cpufreq_register_driver() fails, just free the memory that has been
allocated and return. The intel_pstate_exit() function is removed: since
we are built-in only now, there is no reason for a module exit procedure.
Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Patrik Jakobsson [Tue, 5 Mar 2013 13:24:48 +0000 (14:24 +0100)]
drm/i915: Turn off hsync and vsync on ADPA when disabling crt
According to PRM we need to disable hsync and vsync even though ADPA is
disabled. The previous code did in fact do the opposite, so we fix it.
Signed-off-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56359
Tested-by: max <manikulin@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Matt Fleming [Tue, 5 Mar 2013 12:46:30 +0000 (12:46 +0000)]
efivarfs: return accurate error code in efivarfs_fill_super()
Joseph was hitting a failure case when mounting efivarfs which
resulted in an incorrect error message,
$ sudo mount -v /sys/firmware/efi/efivars
mount: Cannot allocate memory
triggered when efivarfs_valid_name() returned -EINVAL.
Make sure we pass accurate return values up the stack if
efivarfs_fill_super() fails to build inodes for EFI variables.
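A minimal sketch of the difference, using hypothetical stand-in functions
rather than the real efivarfs code, might look like this:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for building one inode: an invalid variable
 * name fails with -EINVAL. */
static int build_entry_sketch(bool name_is_valid)
{
	return name_is_valid ? 0 : -EINVAL;
}

/* Lossy variant: every failure is reported as -ENOMEM, which userspace
 * prints as "Cannot allocate memory". */
static int fill_super_lossy(bool name_is_valid)
{
	if (build_entry_sketch(name_is_valid) != 0)
		return -ENOMEM;
	return 0;
}

/* Accurate variant: the callee's error code is passed up unchanged. */
static int fill_super_accurate(bool name_is_valid)
{
	int err = build_entry_sketch(name_is_valid);

	assert(err <= 0);	/* kernel-style: zero or a negative errno */
	return err;
}
```

With an invalid name, the lossy variant returns -ENOMEM while the
accurate one returns -EINVAL, so the message shown to the user matches
the actual cause.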
Reported-by: Joseph Yasi <joe.yasi@gmail.com>
Reported-by: Lingzhu Xiang <lxiang@redhat.com>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: Jeremy Kerr <jk@ozlabs.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: <stable@vger.kernel.org> # v3.8
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Matt Fleming [Tue, 5 Mar 2013 07:40:16 +0000 (07:40 +0000)]
efivars: efivarfs_valid_name() should handle pstore syntax
Stricter validation was introduced with commit da27a24383b2b
("efivarfs: guid part of filenames are case-insensitive") and commit
47f531e8ba3b ("efivarfs: Validate filenames much more aggressively"),
which is necessary for the guid portion of efivarfs filenames, but we
don't need to be so strict with the first part, the variable name. The
UEFI specification doesn't impose any constraints on variable names
other than they be a NULL-terminated string.
The above commits caused a regression that resulted in users seeing
the following message,
$ sudo mount -v /sys/firmware/efi/efivars
mount: Cannot allocate memory
whenever pstore EFI variables were present in the variable store,
since their variable names failed to pass the following check,
/* GUID should be right after the first '-' */
if (s - 1 != strchr(str, '-'))
as a typical pstore filename is of the form, dump-type0-10-1-<guid>.
The fix is trivial: since the guid portion of the filename is GUID_LEN
bytes, we can use (len - GUID_LEN) to ensure the '-' character is
where we expect it to be.
(The bogus ENOMEM error value will be fixed in a separate patch.)
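The two separator checks can be compared with a short userspace sketch.
Function names are hypothetical, GUID_LEN is assumed to be 36 (the length
of a textual GUID), and the rest of the filename validation is omitted:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define GUID_LEN 36u	/* assumed: 32 hex digits plus 4 dashes */

/* Older check: the GUID must start right after the FIRST '-'. */
static bool separator_ok_old(const char *name)
{
	assert(name != NULL);
	size_t len = strlen(name);

	if (len < GUID_LEN + 2)
		return false;

	const char *s = name + len - GUID_LEN;
	return s - 1 == strchr(name, '-');
}

/* Newer check: the character just before the trailing GUID must be
 * '-', however many dashes the variable name itself contains. */
static bool separator_ok(const char *name)
{
	assert(name != NULL);
	size_t len = strlen(name);

	/* at least one name character, the '-', and the GUID */
	if (len < GUID_LEN + 2)
		return false;

	return name[len - GUID_LEN - 1] == '-';
}
```

A name such as dump-type0-10-1-<guid> fails the older check, because its
first '-' is inside the variable name, but passes the newer one; a name
without embedded dashes passes both.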
Reported-by: Joseph Yasi <joe.yasi@gmail.com>
Tested-by: Joseph Yasi <joe.yasi@gmail.com>
Reported-by: Lingzhu Xiang <lxiang@redhat.com>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: Jeremy Kerr <jk@ozlabs.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: <stable@vger.kernel.org> # v3.8
Signed-off-by: Matt Fleming <matt.fleming@intel.com>