Fabian Frederick [Fri, 8 Aug 2014 21:22:59 +0000 (14:22 -0700)]
fs/romfs/super.c: use pr_fmt in logging
- Remove "Error" in format logging (already in pr_ level)
- Use modulename in pr_fmt instead of ROMFS: in each pr_ callsites.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fabian Frederick [Fri, 8 Aug 2014 21:22:57 +0000 (14:22 -0700)]
fs/romfs/super.c: convert printk to pr_foo()
Use current logging functions. Coalesce formats.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fabian Frederick [Fri, 8 Aug 2014 21:22:55 +0000 (14:22 -0700)]
fs/cramfs/inode.c: use linux/uaccess.h
Fixes checkpatch warning:
WARNING: Use #include <linux/uaccess.h> instead of <asm/uaccess.h>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fabian Frederick [Fri, 8 Aug 2014 21:22:52 +0000 (14:22 -0700)]
fs/cramfs: code clean-up
Fixes some checkpatch errors/warnings:
WARNING: Missing a blank line after declarations
ERROR: spaces required around that '=' (ctx:VxV)
ERROR: "foo * bar" should be "foo *bar"
ERROR: space prohibited after that open parenthesis '('
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fabian Frederick [Fri, 8 Aug 2014 21:22:50 +0000 (14:22 -0700)]
fs/cramfs: use pr_fmt
Use module name for "cramfs: " prefix. (note that uncompress.c printk had
no prefix).
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fabian Frederick [Fri, 8 Aug 2014 21:22:48 +0000 (14:22 -0700)]
fs/cramfs: convert printk to pr_foo()
Use current logging functions. No level printk converted to pr_err
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thierry Fauck [Fri, 8 Aug 2014 21:22:46 +0000 (14:22 -0700)]
tools/testing/selftests/ptrace/peeksiginfo.c: add PAGE_SIZE definition
On IBM powerpc where multiple page size value are supported, current
ppc64 and ppc64el distro don't define the PAGE_SIZE variable in
/usr/include as this is a dynamic value retrieved by the getpagesize()
or sysconf() defined in unistd.h. The PAGE_SIZE variable sounds defined
when only one value is supported by the kernel.
As such, when the PAGE_SIZE definition doesn't exist system should
retrieve the dynamic value.
Signed-off-by: Thierry Fauck <thierry@linux.vnet.ibm.com>
Cc: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Himangi Saraogi [Fri, 8 Aug 2014 21:22:44 +0000 (14:22 -0700)]
kfifo: use BUG_ON
Use BUG_ON(x) rather than if(x) BUG();
The semantic patch that fixes this problem is as follows:
// <smpl>
@@ identifier x; @@
-if (!x) BUG();
+BUG_ON(!x);
// </smpl>
Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Cc: Stefani Seibold <stefani@seibold.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fabian Frederick [Fri, 8 Aug 2014 21:22:41 +0000 (14:22 -0700)]
fs/omfs/inode.c: replace count*size kzalloc by kcalloc
kcalloc manages count*sizeof overflow.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Acked-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fabian Frederick [Fri, 8 Aug 2014 21:22:35 +0000 (14:22 -0700)]
fs/pstore/ram_core.c: replace count*size kmalloc by kmalloc_array
kmalloc_array manages count*sizeof overflow.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Colin Cross <ccross@android.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fabian Frederick [Fri, 8 Aug 2014 21:22:33 +0000 (14:22 -0700)]
drivers/parport/parport_ip32.c: use PTR_ERR_OR_ZERO
replace IS_ERR/PTR_ERR
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Wolfram Sang <wsa@the-dreams.de>
Cc: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Josh Hunt [Fri, 8 Aug 2014 21:22:31 +0000 (14:22 -0700)]
panic: add TAINT_SOFTLOCKUP
This taint flag will be set if the system has ever entered a softlockup
state. Similar to TAINT_WARN it is useful to know whether or not the
system has been in a softlockup state when debugging.
[akpm@linux-foundation.org: apply the taint before calling panic()]
Signed-off-by: Josh Hunt <johunt@akamai.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fabian Frederick [Fri, 8 Aug 2014 21:22:29 +0000 (14:22 -0700)]
fs/bfs: use bfs prefix for dump_imap
All bfs related functions use bfs_ prefix. This patch also moves extern
declaration to bfs.h and removes prototype from inode.c
This fixes checkpatch warning:
WARNING: externs should be avoided in .c files
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: "Tigran A. Aivazian" <tigran@aivazian.fsnet.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Fri, 8 Aug 2014 21:22:26 +0000 (14:22 -0700)]
adfs: add __printf verification, fix format/argument mismatches
Might as well do the right thing.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fabian Frederick [Fri, 8 Aug 2014 21:22:24 +0000 (14:22 -0700)]
fs/adfs/dir_fplus.c: replace count*size kzalloc by kcalloc
kcalloc manages count*sizeof overflow.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fabian Frederick [Fri, 8 Aug 2014 21:22:22 +0000 (14:22 -0700)]
fs/adfs/dir_fplus.c: use ARRAY_SIZE instead of sizeof/sizeof[0]
Use kernel.h definition.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fabian Frederick [Fri, 8 Aug 2014 21:22:20 +0000 (14:22 -0700)]
kernel/gcov/fs.c: remove unnecessary null test before debugfs_remove
This fixes checkpatch warning:
WARNING: debugfs_remove(NULL) is safe this check is probably not required
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fabian Frederick [Fri, 8 Aug 2014 21:22:18 +0000 (14:22 -0700)]
fs/exofs/ore_raid.c: replace count*size kzalloc by kcalloc
kcalloc manages count*sizeof overflow.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Acked-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Fri, 8 Aug 2014 21:22:16 +0000 (14:22 -0700)]
sysctl: remove typedef ctl_table
Remove the final user, and the typedef itself.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wei Yang [Fri, 8 Aug 2014 21:22:14 +0000 (14:22 -0700)]
lib/rbtree.c: fix typo in comment of __rb_insert()
In case 1, it passes down the BLACK color from G to p and u, and maintains
the color of n. By doing so, it maintains the black height of the
sub-tree.
While in the comment, it marks the color of n to BLACK. This is a typo
and not consistents with the code.
This patch fixs this typo in comment.
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Michel Lespinasse <walken@google.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexandre Bounine [Fri, 8 Aug 2014 21:22:12 +0000 (14:22 -0700)]
rapidio/tsi721_dma: rework scatter-gather list handling
Rework Tsi721 RapidIO DMA engine support to allow handling data
scatter/gather lists longer than number of hardware buffer descriptors in
the DMA channel's descriptor list.
The current implementation of Tsi721 DMA transfers requires that number of
entries in a scatter/gather list provided by a caller of
dmaengine_prep_rio_sg() should not exceed number of allocated hardware
buffer descriptors.
This patch removes the limitation by processing long scatter/gather lists
by sections that can be transferred using hardware descriptor ring of
configured size. It also introduces a module parameter
"dma_desc_per_channel" to allow run-time configuration of Tsi721 hardware
buffer descriptor rings.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Stef van Os <stef.van.os@prodrive-technologies.com>
Cc: Vinod Koul <vinod.koul@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexandre Bounine [Fri, 8 Aug 2014 21:22:09 +0000 (14:22 -0700)]
rapidio: add new RapidIO DMA interface routines
Add RapidIO DMA interface routines that directly use reference to the mport
device object and/or target device destination ID as parameters.
This allows to perform RapidIO DMA transfer requests by modules that do not
have an access to the RapidIO device list.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Stef van Os <stef.van.os@prodrive-technologies.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrey Ryabinin [Fri, 8 Aug 2014 21:22:07 +0000 (14:22 -0700)]
lib/idr.c: fix out-of-bounds pointer dereference
I'm working on address sanitizer project for kernel. Recently we
started experiments with stack instrumentation, to detect out-of-bounds
read/write bugs on stack.
Just after booting I've hit out-of-bounds read on stack in idr_for_each
(and in __idr_remove_all as well):
struct idr_layer **paa = &pa[0];
while (id >= 0 && id <= max) {
...
while (n < fls(id)) {
n += IDR_BITS;
p = *--paa; <--- here we are reading pa[-1] value.
}
}
Despite the fact that after this dereference we are exiting out of loop
and never use p, such behaviour is undefined and should be avoided.
Fix this by moving pointer derference to the beggining of the loop,
right before we will use it.
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Alexey Preobrazhensky <preobr@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vitaly Kuznetsov [Fri, 8 Aug 2014 21:22:05 +0000 (14:22 -0700)]
fs/proc/vmcore.c:mmap_vmcore: skip non-ram pages reported by hypervisors
We have a special check in read_vmcore() handler to check if the page was
reported as ram or not by the hypervisor (pfn_is_ram()). However, when
vmcore is read with mmap() no such check is performed. That can lead to
unpredictable results, e.g. when running Xen PVHVM guest memcpy() after
mmap() on /proc/vmcore will hang processing HVMMEM_mmio_dm pages creating
enormous load in both DomU and Dom0.
Fix the issue by mapping each non-ram page to the zero page. Keep direct
path with remap_oldmem_pfn_range() to avoid looping through all pages on
bare metal.
The issue can also be solved by overriding remap_oldmem_pfn_range() in
xen-specific code, as remap_oldmem_pfn_range() was been designed for.
That, however, would involve non-obvious xen code path for all x86 builds
with CONFIG_XEN_PVHVM=y and would prevent all other hypervisor-specific
code on x86 arch from doing the same override.
[fengguang.wu@intel.com: remap_oldmem_pfn_checked() can be static]
[akpm@linux-foundation.org: clean up layout]
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vladimir Davydov [Fri, 8 Aug 2014 21:22:03 +0000 (14:22 -0700)]
kernel/fork.c: make mm_init_owner static
It's only used in fork.c:mm_init().
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vladimir Davydov [Fri, 8 Aug 2014 21:22:01 +0000 (14:22 -0700)]
fork: copy mm's vm usage counters under mmap_sem
If a forking process has a thread calling (un)mmap (silly but still),
the child process may have some of its mm's vm usage counters (total_vm
and friends) screwed up, because currently they are copied from oldmm
w/o holding any locks (memcpy in dup_mm).
This patch moves the counters initialization to dup_mmap() to be called
under oldmm->mmap_sem, which eliminates any possibility of race.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vladimir Davydov [Fri, 8 Aug 2014 21:21:58 +0000 (14:21 -0700)]
fork: reset mm->pinned_vm
mm->pinned_vm counts pages of mm's address space that were permanently
pinned in memory by increasing their reference counter. The counter was
introduced by commit
bc3e53f682d9 ("mm: distinguish between mlocked and
pinned pages"), while before it locked_vm had been used for such pages.
Obviously, we should reset the counter on fork if !CLONE_VM, just like
we do with locked_vm, but currently we don't. Let's fix it.
This patch will fix the contents of /proc/pid/status:VmPin.
ib_umem_get[infiniband] and perf_mmap still check pinned_vm against
RLIMIT_MEMLOCK. It's left from the times when pinned pages were accounted
under locked_vm, but today it looks wrong. It isn't clear how we should
deal with it.
We still have some drivers accounting pinned pages under mm->locked_vm -
this is what commit
bc3e53f682d9 was fighting against. It's
infiniband/usnic and vfio.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vladimir Davydov [Fri, 8 Aug 2014 21:21:56 +0000 (14:21 -0700)]
fork/exec: cleanup mm initialization
mm initialization on fork/exec is spread all over the place, which makes
the code look inconsistent.
We have mm_init(), which is supposed to init/nullify mm's internals, but
it doesn't init all the fields it should:
- on fork ->mmap,mm_rb,vmacache_seqnum,map_count,mm_cpumask,locked_vm
are zeroed in dup_mmap();
- on fork ->pmd_huge_pte is zeroed in dup_mm(), immediately before
calling mm_init();
- ->cpu_vm_mask_var ptr is initialized by mm_init_cpumask(), which is
called before mm_init() on both fork and exec;
- ->context is initialized by init_new_context(), which is called after
mm_init() on both fork and exec;
Let's consolidate all the initializations in mm_init() to make the code
look cleaner.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Fri, 8 Aug 2014 21:21:54 +0000 (14:21 -0700)]
proc: remove INF macro
If you're applying this patch, all /proc/$PID/* files were converted
to seq_file interface and this code became unused.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Fri, 8 Aug 2014 21:21:52 +0000 (14:21 -0700)]
proc: convert /proc/$PID/hardwall to seq_file interface
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Fri, 8 Aug 2014 21:21:50 +0000 (14:21 -0700)]
proc: convert /proc/$PID/io to seq_file interface
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Fri, 8 Aug 2014 21:21:48 +0000 (14:21 -0700)]
proc: convert /proc/$PID/oom_score to seq_file interface
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Fri, 8 Aug 2014 21:21:46 +0000 (14:21 -0700)]
proc: convert /proc/$PID/schedstat to seq_file interface
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Fri, 8 Aug 2014 21:21:44 +0000 (14:21 -0700)]
proc: convert /proc/$PID/wchan to seq_file interface
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Fri, 8 Aug 2014 21:21:41 +0000 (14:21 -0700)]
proc: convert /proc/$PID/cmdline to seq_file interface
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Fri, 8 Aug 2014 21:21:39 +0000 (14:21 -0700)]
proc: convert /proc/$PID/syscall to seq_file interface
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Fri, 8 Aug 2014 21:21:37 +0000 (14:21 -0700)]
proc: convert /proc/$PID/limits to seq_file interface
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Fri, 8 Aug 2014 21:21:35 +0000 (14:21 -0700)]
proc: convert /proc/$PID/auxv to seq_file interface
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Fri, 8 Aug 2014 21:21:33 +0000 (14:21 -0700)]
proc: more "const char *" pointers
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Fri, 8 Aug 2014 21:21:31 +0000 (14:21 -0700)]
proc: remove proc_tty_ldisc variable
/proc/tty/ldisc appear to be unused as a directory and
it had been always that way.
But it is userspace visible thing.
Cowardly remove only in-kernel variable holding it.
[akpm@linux-foundation.org: add comment]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Fri, 8 Aug 2014 21:21:29 +0000 (14:21 -0700)]
proc: make proc_subdir_lock static
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Fri, 8 Aug 2014 21:21:27 +0000 (14:21 -0700)]
proc: faster /proc/$PID lookup
Currently lookup for /proc/$PID first goes through spinlock and whole list
of misc /proc entries only to confirm that, yes, /proc/42 can not possibly
match random proc entry.
List is is several dozens entries long (52 entries on my setup).
None of this is necessary.
Try to convert dentry name to integer first.
If it works, it must be /proc/$PID.
If it doesn't, it must be random proc entry.
Based on patch from Al Viro.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Fri, 8 Aug 2014 21:21:25 +0000 (14:21 -0700)]
proc: add and remove /proc entry create checks
* remove proc_create(NULL, ...) check, let it oops
* warn about proc_create("", ...) and proc_create("very very long name", ...)
proc code keeps length as u8, no 256+ name length possible
* warn about proc_create("123", ...)
/proc/$PID and /proc/misc namespaces are separate things,
but dumb module might create funky a-la $PID entry.
* remove post mortem strchr('/') check
Triggering it implies either strchr() is buggy or memory corruption.
It should be VFS check anyway.
In reality, none of these checks will ever trigger,
it is preparation for the next patch.
Based on patch from Al Viro.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fabian Frederick [Fri, 8 Aug 2014 21:21:22 +0000 (14:21 -0700)]
proc: constify seq_operations
proc_uid_seq_operations, proc_gid_seq_operations and
proc_projid_seq_operations are only called in proc_id_map_open with
seq_open as const struct seq_operations so we can constify the 3
structures and update proc_id_map_open prototype.
text data bss dec hex filename
6817 404 1984 9205 23f5 kernel/user_namespace.o-before
6913 308 1984 9205 23f5 kernel/user_namespace.o-after
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fabian Frederick [Fri, 8 Aug 2014 21:21:20 +0000 (14:21 -0700)]
fs/proc/kcore.c: use PAGE_ALIGN instead of ALIGN(PAGE_SIZE)
Use mm.h definition.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ionut Alexa [Fri, 8 Aug 2014 21:21:18 +0000 (14:21 -0700)]
kernel/exit.c: fix coding style warnings and errors
Fixed coding style warnings and errors.
Signed-off-by: Ionut Alexa <ionut.m.alexa@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fabian Frederick [Fri, 8 Aug 2014 21:21:16 +0000 (14:21 -0700)]
fs/hpfs/dnode.c: fix suspect code indent
Fix 2 checkpatch warnings:
WARNING: suspect code indent for conditional statements
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fabian Frederick [Fri, 8 Aug 2014 21:21:14 +0000 (14:21 -0700)]
fs/reiserfs/xattr.c: fix blank line missing after declarations
Fix checkpatch warning:
WARNING: Missing a blank line after declarations
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fabian Frederick [Fri, 8 Aug 2014 21:21:12 +0000 (14:21 -0700)]
fs/reiserfs: use linux/uaccess.h
Fix checkpatch warning
WARNING: Use #include <linux/uaccess.h> instead of <asm/uaccess.h>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fabian Frederick [Fri, 8 Aug 2014 21:21:10 +0000 (14:21 -0700)]
fs/reiserfs: replace not-standard %Lu/%Ld
Fixes checkpatch warnings:
"WARNING: %Ld/%Lu are not-standard C, use %lld/%llu"
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fabian Frederick [Fri, 8 Aug 2014 21:21:08 +0000 (14:21 -0700)]
fs/ufs/inode.c: kernel-doc warning fixes
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fabian Frederick [Fri, 8 Aug 2014 21:21:05 +0000 (14:21 -0700)]
fs/ufs: convert UFSD printk to pr_debug
Convert no level printk to pr_debug in UFSD. DEBUG is defined with
CONFIG_UFS_DEBUG so pr_debug are emitted here.
Also fixing call to UFSD (add;)
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fabian Frederick [Fri, 8 Aug 2014 21:21:03 +0000 (14:21 -0700)]
fs/ufs/super.c: use va_format instead of buffer/vsnprintf
Remove error_buffer and use %pV
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fabian Frederick [Fri, 8 Aug 2014 21:21:01 +0000 (14:21 -0700)]
fs/ufs/super.c: use __func__ in logging
Replace approximate function name by __func__ using standard format
"function():"
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fabian Frederick [Fri, 8 Aug 2014 21:20:59 +0000 (14:20 -0700)]
fs/ufs: use pr_fmt
Replace UFS-fs, UFS-fs: and UFS: by pr_fmt with module name "ufs: "
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fabian Frederick [Fri, 8 Aug 2014 21:20:57 +0000 (14:20 -0700)]
fs/ufs: convert printk to pr_foo()
Use current logging functions.
- no level printk under CONFIG_UFS_DEBUG converted to pr_debug
- no level printk elsewhere converted to pr_err
- add DDEBUG flag in Makefile
- coalesce formats
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vyacheslav Dubeyko [Fri, 8 Aug 2014 21:20:55 +0000 (14:20 -0700)]
nilfs2: integrate sysfs support into driver
This patch integrates creation of sysfs groups and
attributes into NILFS file system driver.
It was found the issue with nilfs_sysfs_{create/delete}_snapshot_group
functions by Michael L Semon <mlsemon35@gmail.com> in the first
version of the patch:
BUG: sleeping function called from invalid context at kernel/locking/mutex.c:579
in_atomic(): 1, irqs_disabled(): 0, pid: 32676, name: umount.nilfs2
2 locks held by umount.nilfs2/32676:
#0: (&type->s_umount_key#21){++++..}, at: [<
790c18e2>] deactivate_super+0x37/0x58
#1: (&(&nilfs->ns_cptree_lock)->rlock){+.+...}, at: [<
791bf659>] nilfs_put_root+0x23/0x5a
Preemption disabled at:[<
791bf659>] nilfs_put_root+0x23/0x5a
CPU: 0 PID: 32676 Comm: umount.nilfs2 Not tainted 3.14.0+ #2
Hardware name: Dell Computer Corporation Dimension 2350/07W080, BIOS A01 12/17/2002
Call Trace:
dump_stack+0x4b/0x75
__might_sleep+0x111/0x16f
mutex_lock_nested+0x1e/0x3ad
kernfs_remove+0x12/0x26
sysfs_remove_dir+0x3d/0x62
kobject_del+0x13/0x38
nilfs_sysfs_delete_snapshot_group+0xb/0xd
nilfs_put_root+0x2a/0x5a
nilfs_detach_log_writer+0x1ab/0x2c1
nilfs_put_super+0x13/0x68
generic_shutdown_super+0x60/0xd1
kill_block_super+0x1d/0x60
deactivate_locked_super+0x22/0x3f
deactivate_super+0x3e/0x58
mntput_no_expire+0xe2/0x141
SyS_oldumount+0x70/0xa5
syscall_call+0x7/0xb
The reason of the issue was placement of
nilfs_sysfs_{create/delete}_snapshot_group() call under
nilfs->ns_cptree_lock protection. But this protection is unnecessary and
wrong solution. The second version of the patch fixes this issue.
[fengguang.wu@intel.com: nilfs_sysfs_create_mounted_snapshots_group can be static]
Reported-by: Michael L. Semon <mlsemon35@gmail.com>
Signed-off-by: Vyacheslav Dubeyko <Vyacheslav.Dubeyko@hgst.com>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Michael L. Semon <mlsemon35@gmail.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vyacheslav Dubeyko [Fri, 8 Aug 2014 21:20:52 +0000 (14:20 -0700)]
nilfs2: add /sys/fs/nilfs2/<device>/mounted_snapshots/<snapshot> group
This patch adds creation of <snapshot> group for every mounted
snapshot in /sys/fs/nilfs2/<device>/mounted_snapshots group.
The group contains details about mounted snapshot:
(1) inodes_count - show number of inodes for snapshot.
(2) blocks_count - show number of blocks for snapshot.
Signed-off-by: Vyacheslav Dubeyko <Vyacheslav.Dubeyko@hgst.com>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Michael L. Semon <mlsemon35@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vyacheslav Dubeyko [Fri, 8 Aug 2014 21:20:50 +0000 (14:20 -0700)]
nilfs2: add /sys/fs/nilfs2/<device>/mounted_snapshots group
This patch adds creation of /sys/fs/nilfs2/<device>/mounted_snapshots
group.
The mounted_snapshots group contains group for every
mounted snapshot.
Signed-off-by: Vyacheslav Dubeyko <Vyacheslav.Dubeyko@hgst.com>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Michael L. Semon <mlsemon35@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vyacheslav Dubeyko [Fri, 8 Aug 2014 21:20:48 +0000 (14:20 -0700)]
nilfs2: add /sys/fs/nilfs2/<device>/checkpoints group
This patch adds creation of /sys/fs/nilfs2/<device>/checkpoints
group.
The checkpoints group contains attributes that describe
details about volume's checkpoints:
(1) checkpoints_number - show number of checkpoints on volume.
(2) snapshots_number - show number of snapshots on volume.
(3) last_seg_checkpoint - show checkpoint number of the latest segment.
(4) next_checkpoint - show next checkpoint number.
Signed-off-by: Vyacheslav Dubeyko <Vyacheslav.Dubeyko@hgst.com>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Michael L. Semon <mlsemon35@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vyacheslav Dubeyko [Fri, 8 Aug 2014 21:20:46 +0000 (14:20 -0700)]
nilfs2: add /sys/fs/nilfs2/<device>/segments group
This patch adds creation of /sys/fs/nilfs2/<device>/segments
group.
The segments group contains attributes that describe
details about volume's segments:
(1) segments_number - show number of segments on volume.
(2) blocks_per_segment - show number of blocks in segment.
(3) clean_segments - show count of clean segments.
(4) dirty_segments - show count of dirty segments.
Signed-off-by: Vyacheslav Dubeyko <Vyacheslav.Dubeyko@hgst.com>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Michael L. Semon <mlsemon35@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vyacheslav Dubeyko [Fri, 8 Aug 2014 21:20:44 +0000 (14:20 -0700)]
nilfs2: add /sys/fs/nilfs2/<device>/segctor group
This patch adds creation of /sys/fs/nilfs2/<device>/segctor
group.
The segctor group contains attributes that describe
segctor thread activity details:
(1) last_pseg_block - show start block number of the latest segment.
(2) last_seg_sequence - show sequence value of the latest segment.
(3) last_seg_checkpoint - show checkpoint number of the latest segment.
(4) current_seg_sequence - show segment sequence counter.
(5) current_last_full_seg - show index number of the latest full segment.
(6) next_full_seg - show index number of the full segment index
to be used next.
(7) next_pseg_offset - show offset of next partial segment in
the current full segment.
(8) next_checkpoint - show next checkpoint number.
(9) last_seg_write_time - show write time of the last segment
in human-readable format.
(10) last_seg_write_time_secs - show write time of the last segment
in seconds.
(11) last_nongc_write_time - show write time of the last segment
not for cleaner operation in human-readable format.
(12) last_nongc_write_time_secs - show write time of the last segment
not for cleaner operation in seconds.
(13) dirty_data_blocks_count - show number of dirty data blocks.
Signed-off-by: Vyacheslav Dubeyko <Vyacheslav.Dubeyko@hgst.com>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Michael L. Semon <mlsemon35@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vyacheslav Dubeyko [Fri, 8 Aug 2014 21:20:42 +0000 (14:20 -0700)]
nilfs2: add /sys/fs/nilfs2/<device>/superblock group
This patch adds creation of /sys/fs/nilfs2/<device>/superblock
group.
The superblock group contains attributes that describe
superblock's details:
(1) sb_write_time - show previous write time of super block in
human-readable format.
(2) sb_write_time_secs - show previous write time of super block
in seconds.
(3) sb_write_count - show write count of super block.
(4) sb_update_frequency - show/set interval of periodical update
of superblock (in seconds). You can set preferable frequency of
superblock update by command:
echo <value> > /sys/fs/nilfs2/<device>/superblock/sb_update_frequency
Signed-off-by: Vyacheslav Dubeyko <Vyacheslav.Dubeyko@hgst.com>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Michael L. Semon <mlsemon35@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vyacheslav Dubeyko [Fri, 8 Aug 2014 21:20:39 +0000 (14:20 -0700)]
nilfs2: add /sys/fs/nilfs2/<device> group
This patch adds creation of /sys/fs/nilfs2/<device> group.
The <device> group contains attributes that describe file
system partition's details:
(1) revision - show NILFS file system revision.
(2) blocksize - show volume block size in bytes.
(3) device_size - show volume size in bytes.
(4) free_blocks - show count of free blocks on volume.
(5) uuid - show volume's UUID.
(6) volume_name - show volume's name.
Signed-off-by: Vyacheslav Dubeyko <Vyacheslav.Dubeyko@hgst.com>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Michael L. Semon <mlsemon35@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vyacheslav Dubeyko [Fri, 8 Aug 2014 21:20:37 +0000 (14:20 -0700)]
nilfs2: add /sys/fs/nilfs2/features group
This patchset implements creation of sysfs groups and attributes with
the purpose to show NILFS2 volume details, internal state of the driver
and to manage internal state of NILFS2 driver.
Sysfs is a virtual file system that exports information about devices
and drivers from the kernel device model to user space, and is also used
for configuration. NILFS2 is a complex file system that has segctor
thread, GC thread, checkpoint/snapshot model and so on. Sysfs namespace
provides native and easy way for: (1) getting info and statistics about
volume state; (2) getting info and configuration of internal subsystems
(segctor thread); (3) snapshots management.
Suggested patchset provides basis for managing segctor thread behaviour
and manipulation by snapshots. Currently, it informs only about segctor
thread's internal parameters and about mounted snapshots. But sysfs
interface can provide easy and simple way for deep management of segctor
thread and snapshots.
This patchset provides opportunity to manage interval of periodical
update of superblock (in seconds). Default value is 10 seconds. Now a
user can increase this value by means of
nilfs2/<device>/superblock/sb_update_frequency attribute in the case of
necessity.
Also the patchset provides opportunity to get information easily about
key volumes's parameters (free blocks, superblock write count,
superblock update frequency, latest segment info, dirty data blocks
count, count of clean segments, count of dirty segments and so on) in
real time manner. Such information can be used in scripts for subtle
management of filesystem.
Implemented functionality creates such groups:
(1) /sys/fs/nilfs2 - root group
(2) /sys/fs/nilfs2/features - group contains attributes that describe NILFS
file system driver features
(3) /sys/fs/nilfs2/<device> - group contains attributes that describe file
system partition's details
(4) /sys/fs/nilfs2/<device>/superblock - group contains attributes that describe
superblock's details
(5) /sys/fs/nilfs2/<device>/segctor - group contains attributes that describe
segctor thread activity details
(6) /sys/fs/nilfs2/<device>/segments - group contains attributes that describe
details about volume's segments
(7) /sys/fs/nilfs2/<device>/checkpoints - group contains attributes that describe
details about volume's checkpoints
(8) /sys/fs/nilfs2/<device>/mounted_snapshots - group contains group for every
mounted snapshot
(9) /sys/fs/nilfs2/<device>/mounted_snapshots/<snapshot> - group contains
details about mounted snapshot
This patch (of 9):
This patch adds code of creation /sys/fs/nilfs2 group and
/sys/fs/nilfs2/features group.
The features group contains attributes that describe NILFS
file system driver features:
(1) revision - show current revision of NILFS file system driver.
There are two formats of timestamp output - seconds and human-readable
format. Every showed timestamp has two sysfs files (time-<xxx> and
time-<xxx>-secs). One sysfs file (time-<xxx>) shows time in
human-readable format. Another sysfs file (time-<xxx>-secs) shows time in
seconds.
It was reported by Michael Semon that timestamp output in human-readable
format should be changed from "2014-4-12 14:5:38" to "2014-04-12
14:05:38". Second version of the patch fixes this issue.
Reported-by: Michael L. Semon <mlsemon35@gmail.com>
Signed-off-by: Vyacheslav Dubeyko <Vyacheslav.Dubeyko@hgst.com>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fabian Frederick [Fri, 8 Aug 2014 21:20:33 +0000 (14:20 -0700)]
fs/coda: use linux/uaccess.h
Fix checkpatch warning
WARNING: Use #include <linux/uaccess.h> instead of <asm/uaccess.h>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Jan Harkes <jaharkes@cs.cmu.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fabian Frederick [Fri, 8 Aug 2014 21:20:31 +0000 (14:20 -0700)]
fs/befs/linuxvfs.c: check superblock before dump operation
befs_dump_super_block was called between befs_load_sb and befs_check_sb.
It has been reported to crash (5/900) with null block testing.
This patch loads, checks and only dump superblock if it's a valid one
then brelse bh.
(befs_dump_super_block uses disk_sb (bh->b_data) so it seems we need to
call it before brelse(bh) but I don't know why befs_check_sb was called
after brelse. Another thing I don't understand is why this problem
appears now).
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Qi Yong [Fri, 8 Aug 2014 21:20:29 +0000 (14:20 -0700)]
minix zmap block counts calculation fix
The original minix zmap blocks calculation was correct, in the formula of:
sbi->s_nzones - sbi->s_firstdatazone + 1
It is
sp->s_zones - (sp->s_firstdatazone - 1)
in the minix3 source code.
But a later commit
016e8d44bc06 ("fs/minix: Verify bitmap block counts
before mounting") has changed it unfortunately as:
sbi->s_nzones - (sbi->s_firstdatazone + 1)
This would show free blocks one block less than the real when the total
data blocks are in "full zmap blocks plus one".
This patch corrects that zmap blocks calculation and tidy a printk
message while at it.
Signed-off-by: Qi Yong <qiyong@fc-cn.com>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thierry Reding [Fri, 8 Aug 2014 21:20:26 +0000 (14:20 -0700)]
drivers/rtc/rtc-tps65910.c: fix potential NULL-pointer dereference
The interrupt handler gets the driver data associated with the RTC
device and doesn't check it for validity. This can cause a NULL pointer
being dereferenced when and interrupt fires before the driver data was
properly set up.
Fix this by setting the driver data earlier (before the interrupt is
requested).
Signed-off-by: Thierry Reding <treding@nvidia.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hyogi Gim [Fri, 8 Aug 2014 21:20:24 +0000 (14:20 -0700)]
driver/rtc/class.c: check the error after rtc_read_time()
In rtc_suspend() and rtc_resume(), the error after rtc_read_time() is not
checked. If rtc device fail to read time, we cannot guarantee the
following process.
Add the verification code for returned rtc_read_time() error.
Signed-off-by: Hyogi Gim <hyogi.gim@lge.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Søren Andersen [Fri, 8 Aug 2014 21:20:22 +0000 (14:20 -0700)]
rtc: add pcf85063 support
Add support for the pcf85063 rtc chip.
[akpm@linux-foundation.org: fix comment typo, tweak conding style]
Signed-off-by: Soeren Andersen <san@rosetechnology.dk>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stuart Longland [Fri, 8 Aug 2014 21:20:20 +0000 (14:20 -0700)]
drivers/rtc/rtc-isl12022.c: device tree support
Add support for configuring the ISL12022 real-time clock via the Device
tree framework. This is based on what I've seen in the related ISL12057
driver, it has been tested and works on a Technologic Systems TS-7670
device which uses a ISL12020 RTC device, my device tree looks like this:
apbx@
80040000 {
i2c0: i2c@
80058000 {
pinctrl-names = "default";
pinctrl-0 = <&i2c0_pins_a>;
clock-frequency = <400000>;
status = "okay";
isl12022@0x6f {
compatible = "isl,isl12022";
reg = <0x6f>;
};
};
... etc
};
Signed-off-by: Stuart Longland <stuartl@vrt.com.au>
Cc: Roman Fietze <roman.fietze@telemotive.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vincent Donnefort [Fri, 8 Aug 2014 21:20:18 +0000 (14:20 -0700)]
drivers/rtc/rtc-pcf8563.c: add alarm support
This patch adds alarm support for the NXP PCF8563 chip.
Signed-off-by: Vincent Donnefort <vdonnefort@gmail.com>
Cc: Simon Guinot <simon.guinot@sequanux.org>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vincent Donnefort [Fri, 8 Aug 2014 21:20:16 +0000 (14:20 -0700)]
drivers/rtc/rtc-pcf8563.c: introduce read|write_block_data
This functions allow to factorize I2C I/O operations.
Signed-off-by: Vincent Donnefort <vdonnefort@gmail.com>
Cc: Simon Guinot <simon.guinot@sequanux.org>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mark Salter [Fri, 8 Aug 2014 21:20:14 +0000 (14:20 -0700)]
rtc: ia64: allow other architectures to use EFI RTC
Currently, the rtc-efi driver is restricted to ia64 only. Newer
architectures with EFI support may want to also use that driver. This
patch moves the platform device setup from ia64 into drivers/rtc and
allow any architecture with CONFIG_EFI=y to use the rtc-efi driver.
Signed-off-by: Mark Salter <msalter@redhat.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hyogi Gim [Fri, 8 Aug 2014 21:20:11 +0000 (14:20 -0700)]
drivers/rtc/interface.c: check the error after __rtc_read_time()
In __rtc_set_alarm(), the error after __rtc_read_time() is not checked.
If rtc device fail to read time, we cannot guarantee the following
process.
Add the verification code for returned __rtc_read_time() error.
Signed-off-by: Hyogi Gim <hyogi.gim@lge.com>
Acked-by: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Beulich [Fri, 8 Aug 2014 21:20:09 +0000 (14:20 -0700)]
drivers/rtc/rtc-efi.c: check for invalid data coming back from UEFI
In particular seeing zero in eft->month is problematic, as it results in
-1 (converted to unsigned int, i.e. yielding 0xffffffff) getting passed
to rtc_year_days(), where the value gets used as an array index
(normally resulting in a crash). This was observed with the driver
enabled on x86 on some Fujitsu system (with possibly not up to date
firmware, but anyway).
Perhaps efi_read_alarm() should not fail if neither enabled nor pending
are set, but the returned time is invalid?
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reported-by: Raymund Will <rw@suse.de>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Jingoo Han <jg1.han@samsung.com>
Acked-by: Lee, Chun-Yi <jlee@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexander Shiyan [Fri, 8 Aug 2014 21:20:07 +0000 (14:20 -0700)]
drivers/rtc/rtc-ds1742.c: revert "drivers/rtc/rtc-ds1742.c: remove redundant of_match_ptr() helper"
Commit
5516f0971793 ("drivers/rtc/rtc-ds1742.c: remove redundant
of_match_ptr() helper") has description as: "'ds1742_rtc_of_match' is
always compiled in. Hence the helper macro is not needed".
It is not true, of_match_ptr() macro makes of_device_id parameter unused
and this constant is declared with __maybe_unused attribute, so normally
this variable should be discarded by linker. This patch revers this
commit, since there are no reason to compile of_device_id constant for
non-DT systems.
Signed-off-by: Alexander Shiyan <shc_work@mail.ru>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jean Delvare [Fri, 8 Aug 2014 21:20:05 +0000 (14:20 -0700)]
drivers/rtc/Kconfig: add hardware dependency to rtc-moxart
The rtc-moxart driver is only useful on MOXA ART systems so it should
depend on ARCH_MOXART as all other MOXA ART drivers already do. Add
COMPILE_TEST as an alternative to allow for build testing on other
systems.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Jonas Jensen <jonas.jensen@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jean Delvare [Fri, 8 Aug 2014 21:20:03 +0000 (14:20 -0700)]
drivers/rtc/Kconfig: move DS2404 entry where it belongs
Move the DS2404 entry together with the other Dallas entries, and add
Maxim so that it looks nicer (and more correct, too.)
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Sven Schnelle <svens@stackframe.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Raghavendra Ganiga [Fri, 8 Aug 2014 21:20:01 +0000 (14:20 -0700)]
drivers/rtc/rtc-ds1343.c: add support of nvram for maxim dallas rtc ds1343
This is a patch to add support of nvram for maxim dallas rtc ds1343.
Signed-off-by: Raghavendra Chandra Ganiga <ravi23ganiga@gmail.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
NeilBrown [Fri, 8 Aug 2014 21:19:58 +0000 (14:19 -0700)]
autofs4: comment typo: remove a a doubled word
Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
NeilBrown [Fri, 8 Aug 2014 21:19:56 +0000 (14:19 -0700)]
autofs4: remove some unused inline functions
{__,}manage_dentry_{set,clear}_{automount,transit}
are 4 unused inline functions. Discard them.
Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
NeilBrown [Fri, 8 Aug 2014 21:19:54 +0000 (14:19 -0700)]
autofs4: don't take spinlock when not needed in autofs4_lookup_expiring
If the expiring_list is empty, we can avoid a costly spinlock in the
rcu-walk path through autofs4_d_manage (once the rest of the path
becomes rcu-walk friendly).
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
NeilBrown [Fri, 8 Aug 2014 21:19:52 +0000 (14:19 -0700)]
autofs4: remove a redundant assignment
The variable 'ino' already exists and already has the correct value.
The d_fsdata of a dentry is never changed after the d_fsdata is
instantiated, so this new assignment cannot be necessary.
It was introduced in commit
b5b801779d59 ("autofs4: Add d_manage()
dentry operation").
Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
NeilBrown [Fri, 8 Aug 2014 21:19:50 +0000 (14:19 -0700)]
autofs4: remove unused autofs4_ispending()
Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fabian Frederick [Fri, 8 Aug 2014 21:19:48 +0000 (14:19 -0700)]
kernel/test_kprobes.c: use current logging functions
- Add pr_fmt
- Coalesce formats
- Use current pr_foo() functions instead of printk
- Remove unnecessary "failed" display (already in log level).
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paul Gortmaker [Fri, 8 Aug 2014 21:19:46 +0000 (14:19 -0700)]
init: make rootdelay=N consistent with rootwait behaviour
Currently rootdelay=N and rootwait behave differently (aside from the
obvious unbounded wait duration) because they are at different places in
the init sequence.
The difference manifests itself for md devices because the call to
md_run_setup() lives between rootdelay and rootwait, so if you try to use
rootdelay=20 to try and allow a slow RAID0 array to assemble, you get
this:
[ 4.526011] sd 6:0:0:0: [sdc] Attached SCSI removable disk
[ 22.972079] md: Waiting for all devices to be available before autodetect
i.e. you've achieved nothing other than delaying the probing 20s, when
what you wanted was a 20s delay _after_ the probing for md devices was
initiated.
Here we move the rootdelay code to be right beside the rootwait code, so
that their behaviour is consistent.
It should be noted that in doing so, the actions based on the
saved_root_name[0] and initrd_load() were previously put on hold by
rootdelay=N and now currently will not be delayed. However, I think
consistent behaviour is more important than matching historical behaviour
of delaying the above two operations.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fabian Frederick [Fri, 8 Aug 2014 21:19:43 +0000 (14:19 -0700)]
fs/ramfs/file-nommu.c: replace count*size kzalloc by kcalloc
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Axel Lin <axel.lin@ingics.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Namhyung Kim [Fri, 8 Aug 2014 21:19:41 +0000 (14:19 -0700)]
kernel/kallsyms.c: fix %pB when there's no symbol at the address
__sprint_symbol() should restore original address when kallsyms_lookup()
failed to find a symbol. It's reported when dumpstack shows an address in
a dynamically allocated trampoline for ftrace.
[ 1314.612287] [<
ffffffff81700312>] dump_stack+0x45/0x56
[ 1314.612290] [<
ffffffff8125f5b0>] ? meminfo_proc_open+0x30/0x30
[ 1314.612293] [<
ffffffffa080a494>] kpatch_ftrace_handler+0x14/0xf0 [kpatch]
[ 1314.612306] [<
ffffffffa00160c4>] 0xffffffffa00160c3
You can see a difference in the hex address - c4 and c3. Fix it.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Reported-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fabian Frederick [Fri, 8 Aug 2014 21:19:39 +0000 (14:19 -0700)]
fs/efs/namei.c: return is not a function
Fix checkpatch errors:
"ERROR: return is not a function, parentheses are not required"
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fabian Frederick [Fri, 8 Aug 2014 21:19:35 +0000 (14:19 -0700)]
mm/zswap.c: add __init to zswap_entry_cache_destroy()
zswap_entry_cache_destroy() is only called by __init init_zswap().
This patch also fixes function name zswap_entry_cache_ s/destory/destroy
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Acked-by: Seth Jennings <sjennings@variantweb.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Fri, 8 Aug 2014 21:19:33 +0000 (14:19 -0700)]
mm: memcontrol: avoid charge statistics churn during page migration
Charge migration currently disables IRQs twice to update the charge
statistics for the old page and then again for the new page.
But migration is a seamless transition of a charge from one physical
page to another one of the same size, so this should be a non-event from
an accounting point of view. Leave the statistics alone.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Greg Thelen [Fri, 8 Aug 2014 21:19:31 +0000 (14:19 -0700)]
memcg: remove lookup_cgroup_page() prototype
Commit
6b208e3f6e35 ("mm: memcg: remove unused node/section info from
pc->flags") deleted lookup_cgroup_page() but left a prototype for it.
Kill the vestigial prototype.
Signed-off-by: Greg Thelen <gthelen@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vladimir Davydov [Fri, 8 Aug 2014 21:19:28 +0000 (14:19 -0700)]
page-cgroup: get rid of NR_PCG_FLAGS
It's not used anywhere today, so let's remove it.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vladimir Davydov [Fri, 8 Aug 2014 21:19:26 +0000 (14:19 -0700)]
page-cgroup: trivial cleanup
Add forward declarations for struct pglist_data, mem_cgroup.
Remove __init, __meminit from function prototypes and inline functions.
Remove redundant inclusion of bit_spinlock.h.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Fri, 8 Aug 2014 21:19:24 +0000 (14:19 -0700)]
mm: memcontrol: use page lists for uncharge batching
Pages are now uncharged at release time, and all sources of batched
uncharges operate on lists of pages. Directly use those lists, and
get rid of the per-task batching state.
This also batches statistics accounting, in addition to the res
counter charges, to reduce IRQ-disabling and re-enabling.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Fri, 8 Aug 2014 21:19:22 +0000 (14:19 -0700)]
mm: memcontrol: rewrite uncharge API
The memcg uncharging code that is involved towards the end of a page's
lifetime - truncation, reclaim, swapout, migration - is impressively
complicated and fragile.
Because anonymous and file pages were always charged before they had their
page->mapping established, uncharges had to happen when the page type
could still be known from the context; as in unmap for anonymous, page
cache removal for file and shmem pages, and swap cache truncation for swap
pages. However, these operations happen well before the page is actually
freed, and so a lot of synchronization is necessary:
- Charging, uncharging, page migration, and charge migration all need
to take a per-page bit spinlock as they could race with uncharging.
- Swap cache truncation happens during both swap-in and swap-out, and
possibly repeatedly before the page is actually freed. This means
that the memcg swapout code is called from many contexts that make
no sense and it has to figure out the direction from page state to
make sure memory and memory+swap are always correctly charged.
- On page migration, the old page might be unmapped but then reused,
so memcg code has to prevent untimely uncharging in that case.
Because this code - which should be a simple charge transfer - is so
special-cased, it is not reusable for replace_page_cache().
But now that charged pages always have a page->mapping, introduce
mem_cgroup_uncharge(), which is called after the final put_page(), when we
know for sure that nobody is looking at the page anymore.
For page migration, introduce mem_cgroup_migrate(), which is called after
the migration is successful and the new page is fully rmapped. Because
the old page is no longer uncharged after migration, prevent double
charges by decoupling the page's memcg association (PCG_USED and
pc->mem_cgroup) from the page holding an actual charge. The new bits
PCG_MEM and PCG_MEMSW represent the respective charges and are transferred
to the new page during migration.
mem_cgroup_migrate() is suitable for replace_page_cache() as well,
which gets rid of mem_cgroup_replace_page_cache(). However, care
needs to be taken because both the source and the target page can
already be charged and on the LRU when fuse is splicing: grab the page
lock on the charge moving side to prevent changing pc->mem_cgroup of a
page under migration. Also, the lruvecs of both pages change as we
uncharge the old and charge the new during migration, and putback may
race with us, so grab the lru lock and isolate the pages iff on LRU to
prevent races and ensure the pages are on the right lruvec afterward.
Swap accounting is massively simplified: because the page is no longer
uncharged as early as swap cache deletion, a new mem_cgroup_swapout() can
transfer the page's memory+swap charge (PCG_MEMSW) to the swap entry
before the final put_page() in page reclaim.
Finally, page_cgroup changes are now protected by whatever protection the
page itself offers: anonymous pages are charged under the page table lock,
whereas page cache insertions, swapin, and migration hold the page lock.
Uncharging happens under full exclusion with no outstanding references.
Charging and uncharging also ensure that the page is off-LRU, which
serializes against charge migration. Remove the very costly page_cgroup
lock and set pc->flags non-atomically.
[mhocko@suse.cz: mem_cgroup_charge_statistics needs preempt_disable]
[vdavydov@parallels.com: fix flags definition]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Tested-by: Jet Chen <jet.chen@intel.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Tested-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Fri, 8 Aug 2014 21:19:20 +0000 (14:19 -0700)]
mm: memcontrol: rewrite charge API
These patches rework memcg charge lifetime to integrate more naturally
with the lifetime of user pages. This drastically simplifies the code and
reduces charging and uncharging overhead. The most expensive part of
charging and uncharging is the page_cgroup bit spinlock, which is removed
entirely after this series.
Here are the top-10 profile entries of a stress test that reads a 128G
sparse file on a freshly booted box, without even a dedicated cgroup (i.e.
executing in the root memcg). Before:
15.36% cat [kernel.kallsyms] [k] copy_user_generic_string
13.31% cat [kernel.kallsyms] [k] memset
11.48% cat [kernel.kallsyms] [k] do_mpage_readpage
4.23% cat [kernel.kallsyms] [k] get_page_from_freelist
2.38% cat [kernel.kallsyms] [k] put_page
2.32% cat [kernel.kallsyms] [k] __mem_cgroup_commit_charge
2.18% kswapd0 [kernel.kallsyms] [k] __mem_cgroup_uncharge_common
1.92% kswapd0 [kernel.kallsyms] [k] shrink_page_list
1.86% cat [kernel.kallsyms] [k] __radix_tree_lookup
1.62% cat [kernel.kallsyms] [k] __pagevec_lru_add_fn
After:
15.67% cat [kernel.kallsyms] [k] copy_user_generic_string
13.48% cat [kernel.kallsyms] [k] memset
11.42% cat [kernel.kallsyms] [k] do_mpage_readpage
3.98% cat [kernel.kallsyms] [k] get_page_from_freelist
2.46% cat [kernel.kallsyms] [k] put_page
2.13% kswapd0 [kernel.kallsyms] [k] shrink_page_list
1.88% cat [kernel.kallsyms] [k] __radix_tree_lookup
1.67% cat [kernel.kallsyms] [k] __pagevec_lru_add_fn
1.39% kswapd0 [kernel.kallsyms] [k] free_pcppages_bulk
1.30% cat [kernel.kallsyms] [k] kfree
As you can see, the memcg footprint has shrunk quite a bit.
text data bss dec hex filename
37970 9892 400 48262 bc86 mm/memcontrol.o.old
35239 9892 400 45531 b1db mm/memcontrol.o
This patch (of 4):
The memcg charge API charges pages before they are rmapped - i.e. have an
actual "type" - and so every callsite needs its own set of charge and
uncharge functions to know what type is being operated on. Worse,
uncharge has to happen from a context that is still type-specific, rather
than at the end of the page's lifetime with exclusive access, and so
requires a lot of synchronization.
Rewrite the charge API to provide a generic set of try_charge(),
commit_charge() and cancel_charge() transaction operations, much like
what's currently done for swap-in:
mem_cgroup_try_charge() attempts to reserve a charge, reclaiming
pages from the memcg if necessary.
mem_cgroup_commit_charge() commits the page to the charge once it
has a valid page->mapping and PageAnon() reliably tells the type.
mem_cgroup_cancel_charge() aborts the transaction.
This reduces the charge API and enables subsequent patches to
drastically simplify uncharging.
As pages need to be committed after rmap is established but before they
are added to the LRU, page_add_new_anon_rmap() must stop doing LRU
additions again. Revive lru_cache_add_active_or_unevictable().
[hughd@google.com: fix shmem_unuse]
[hughd@google.com: Add comments on the private use of -EAGAIN]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Fri, 8 Aug 2014 21:19:17 +0000 (14:19 -0700)]
vm_is_stack: use for_each_thread() rather then buggy while_each_thread()
Aleksei hit the soft lockup during reading /proc/PID/smaps. David
investigated the problem and suggested the right fix.
while_each_thread() is racy and should die, this patch updates
vm_is_stack().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Aleksei Besogonov <alex.besogonov@gmail.com>
Tested-by: Aleksei Besogonov <alex.besogonov@gmail.com>
Suggested-by: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>