Eric Dumazet [Thu, 20 Dec 2012 16:00:27 +0000 (16:00 +0000)]
ip_gre: fix possible use after free
Once skb_realloc_headroom() is called, tiph might point to freed memory.
Cache tiph->ttl value before the reallocation, to avoid unexpected
behavior.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
Isaku Yamahata [Thu, 20 Dec 2012 15:12:52 +0000 (15:12 +0000)]
ip_gre: make ipgre_tunnel_xmit() not parse network header as IP unconditionally
ipgre_tunnel_xmit() parses network header as IP unconditionally.
But transmitting packets are not always IP packet. For example such packet
can be sent by packet socket with sockaddr_ll.sll_protocol set.
So make the function check if skb->protocol is IP.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Wed, 19 Dec 2012 21:48:45 +0000 (21:48 +0000)]
solos-pci: double lock in geos_gpio_store()
There is a typo here so we do a double lock instead of an unlock.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Trond Myklebust [Fri, 21 Dec 2012 16:02:32 +0000 (11:02 -0500)]
NFS: Kill fscache warnings when mounting without -ofsc
The fscache code will currently bleat a "non-unique superblock keys"
warning even if the user is mounting without the 'fsc' option.
There should be no reason to even initialise the superblock cache cookie
unless we're planning on using fscache for something, so ensure that we
check for the NFS_OPTION_FSCACHE flag before calling into the fscache
code.
Reported-by: Paweł Sikora <pawel.sikora@agmk.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: David Howells <dhowells@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Fri, 21 Dec 2012 12:15:05 +0000 (12:15 +0000)]
NFS: Provide stub nfs_fscache_wait_on_invalidate() for when CONFIG_NFS_FSCACHE=n
Provide a stub nfs_fscache_wait_on_invalidate() function for when
CONFIG_NFS_FSCACHE=n lest the following error appear:
fs/nfs/inode.c: In function 'nfs_invalidate_mapping':
fs/nfs/inode.c:887:2: error: implicit declaration of function 'nfs_fscache_wait_on_invalidate' [-Werror=implicit-function-declaration]
cc1: some warnings being treated as errors
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Reported-by: Vineet Gupta <Vineet.Gupta1@synopsys.com>
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 21 Dec 2012 05:30:12 +0000 (21:30 -0800)]
Merge tag 'vfio-for-v3.8-v2' of git://github.com/awilliam/linux-vfio
Pull vfio update from Alex Williamson.
* tag 'vfio-for-v3.8-v2' of git://github.com/awilliam/linux-vfio:
vfio-pci: Enable device before attempting reset
VFIO: fix out of order labels for error recovery in vfio_pci_init()
VFIO: use ACCESS_ONCE() to guard access to dev->driver
VFIO: unregister IOMMU notifier on error recovery path
vfio-pci: Re-order device reset
vfio: simplify kmalloc+copy_from_user to memdup_user
Linus Torvalds [Fri, 21 Dec 2012 04:11:52 +0000 (20:11 -0800)]
Merge branch 'for-next' of git://git.infradead.org/users/eparis/notify
Pull filesystem notification updates from Eric Paris:
"This pull mostly is about locking changes in the fsnotify system. By
switching the group lock from a spin_lock() to a mutex() we can now
hold the lock across things like iput(). This fixes a problem
involving unmounting a fs and having inodes be busy, first pointed out
by FAT, but reproducible with tmpfs.
This also restores signal driven I/O for inotify, which has been
broken since about 2.6.32."
Ugh. I *hate* the timing of this. It was rebased after the merge
window opened, and then left to sit with the pull request coming the day
before the merge window closes. That's just crap. But apparently the
patches themselves have been around for over a year, just gathering
dust, so now it's suddenly critical.
Fixed up semantic conflict in fs/notify/fdinfo.c as per Stephen
Rothwell's fixes from -next.
* 'for-next' of git://git.infradead.org/users/eparis/notify:
inotify: automatically restart syscalls
inotify: dont skip removal of watch descriptor if creation of ignored event failed
fanotify: dont merge permission events
fsnotify: make fasync generic for both inotify and fanotify
fsnotify: change locking order
fsnotify: dont put marks on temporary list when clearing marks by group
fsnotify: introduce locked versions of fsnotify_add_mark() and fsnotify_remove_mark()
fsnotify: pass group to fsnotify_destroy_mark()
fsnotify: use a mutex instead of a spinlock to protect a groups mark list
fanotify: add an extra flag to mark_remove_from_mask that indicates wheather a mark should be destroyed
fsnotify: take groups mark_lock before mark lock
fsnotify: use reference counting for groups
fsnotify: introduce fsnotify_get_group()
inotify, fanotify: replace fsnotify_put_group() with fsnotify_destroy_group()
Linus Torvalds [Fri, 21 Dec 2012 04:00:43 +0000 (20:00 -0800)]
Merge branch 'akpm' (Andrew's patch-bomb)
Merge the rest of Andrew's patches for -rc1:
"A bunch of fixes and misc missed-out-on things.
That'll do for -rc1. I still have a batch of IPC patches which still
have a possible bug report which I'm chasing down."
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (25 commits)
keys: use keyring_alloc() to create module signing keyring
keys: fix unreachable code
sendfile: allows bypassing of notifier events
SGI-XP: handle non-fatal traps
fat: fix incorrect function comment
Documentation: ABI: remove testing/sysfs-devices-node
proc: fix inconsistent lock state
linux/kernel.h: fix DIV_ROUND_CLOSEST with unsigned divisors
memcg: don't register hotcpu notifier from ->css_alloc()
checkpatch: warn on uapi #includes that #include <uapi/...
revert "rtc: recycle id when unloading a rtc driver"
mm: clean up transparent hugepage sysfs error messages
hfsplus: add error message for the case of failure of sync fs in delayed_sync_fs() method
hfsplus: rework processing of hfs_btree_write() returned error
hfsplus: rework processing errors in hfsplus_free_extents()
hfsplus: avoid crash on failed block map free
kcmp: include linux/ptrace.h
drivers/rtc/rtc-imxdi.c: must include <linux/spinlock.h>
mm: cma: WARN if freed memory is still in use
exec: do not leave bprm->interp on stack
...
Linus Torvalds [Fri, 21 Dec 2012 02:14:31 +0000 (18:14 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs
Pull VFS update from Al Viro:
"fscache fixes, ESTALE patchset, vmtruncate removal series, assorted
misc stuff."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (79 commits)
vfs: make lremovexattr retry once on ESTALE error
vfs: make removexattr retry once on ESTALE
vfs: make llistxattr retry once on ESTALE error
vfs: make listxattr retry once on ESTALE error
vfs: make lgetxattr retry once on ESTALE
vfs: make getxattr retry once on an ESTALE error
vfs: allow lsetxattr() to retry once on ESTALE errors
vfs: allow setxattr to retry once on ESTALE errors
vfs: allow utimensat() calls to retry once on an ESTALE error
vfs: fix user_statfs to retry once on ESTALE errors
vfs: make fchownat retry once on ESTALE errors
vfs: make fchmodat retry once on ESTALE errors
vfs: have chroot retry once on ESTALE error
vfs: have chdir retry lookup and call once on ESTALE error
vfs: have faccessat retry once on an ESTALE error
vfs: have do_sys_truncate retry once on an ESTALE error
vfs: fix renameat to retry on ESTALE errors
vfs: make do_unlinkat retry once on ESTALE errors
vfs: make do_rmdir retry once on ESTALE errors
vfs: add a flags argument to user_path_parent
...
Linus Torvalds [Fri, 21 Dec 2012 02:05:28 +0000 (18:05 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/signal
Pull signal handling cleanups from Al Viro:
"sigaltstack infrastructure + conversion for x86, alpha and um,
COMPAT_SYSCALL_DEFINE infrastructure.
Note that there are several conflicts between "unify
SS_ONSTACK/SS_DISABLE definitions" and UAPI patches in mainline;
resolution is trivial - just remove definitions of SS_ONSTACK and
SS_DISABLED from arch/*/uapi/asm/signal.h; they are all identical and
include/uapi/linux/signal.h contains the unified variant."
Fixed up conflicts as per Al.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
alpha: switch to generic sigaltstack
new helpers: __save_altstack/__compat_save_altstack, switch x86 and um to those
generic compat_sys_sigaltstack()
introduce generic sys_sigaltstack(), switch x86 and um to it
new helper: compat_user_stack_pointer()
new helper: restore_altstack()
unify SS_ONSTACK/SS_DISABLE definitions
new helper: current_user_stack_pointer()
missing user_stack_pointer() instances
Bury the conditionals from kernel_thread/kernel_execve series
COMPAT_SYSCALL_DEFINE: infrastructure
Linus Torvalds [Fri, 21 Dec 2012 01:56:23 +0000 (17:56 -0800)]
Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm
Pull ARM fixes from Russell King:
"A number of smallish fixes scattered around the ARM code. Probably
the most serious one is the one from Al addressing the missing locking
in the swap emulation code."
* 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
ARM: 7607/1: realview: fix private peripheral memory base for EB rev. B boards
ARM: 7606/1: cache: flush to LoUU instead of LoUIS on uniprocessor CPUs
ARM: missing ->mmap_sem around find_vma() in swp_emulate.c
ARM: 7605/1: vmlinux.lds: Move .notes section next to the rodata
ARM: 7602/1: Pass real "__machine_arch_type" variable to setup_machine_tags() procedure
ARM: 7600/1: include CONFIG_DEBUG_LL_INCLUDE rather than mach/debug-macro.S
Linus Torvalds [Fri, 21 Dec 2012 01:55:34 +0000 (17:55 -0800)]
Merge tag 'fixes2' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes part 2 from Olof Johansson:
"Here are a few more fixes for 3.8. Two branches of fixes for Samsung
platforms, including fixes for the audio build errors on all non-DT
platforms. There's also a fixup to the sunxi device-tree file renames
due to a bad patch application by me, and a fix for OMAP due to
function renames merged through the powerpc tree."
* tag 'fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: OMAP2+: Fix compillation error in mach-omap2/timer.c
ARM: sunxi: rename device tree source files
ARM: EXYNOS: Avoid passing the clks through platform data
ARM: S5PV210: Avoid passing the clks through platform data
ARM: S5P64X0: Add I2S clkdev support
ARM: S5PC100: Add I2S clkdev support
ARM: S3C64XX: Add I2S clkdev support
ARM: EXYNOS: Fix MSHC clocks instance names
ARM: EXYNOS: Fix NULL pointer dereference bug in SMDKV310
ARM: EXYNOS: Fix NULL pointer dereference bug in SMDK4X12
ARM: EXYNOS: Fix NULL pointer dereference bug in Origen
ARM: SAMSUNG: Add missing include guard to gpio-core.h
pinctrl: exynos5440/samsung: Staticize pcfgs
pinctrl: samsung: Fix a typo in pinctrl-samsung.h
ARM: EXYNOS: fix skip scu_enable() for EXYNOS5440
ARM: EXYNOS: fix GIC using for EXYNOS5440
ARM: EXYNOS: fix build error when MFC is not selected
Linus Torvalds [Fri, 21 Dec 2012 01:52:06 +0000 (17:52 -0800)]
Merge branch 'misc' of git://git./linux/kernel/git/mmarek/kbuild
Pull kbuild misc changes from Michal Marek:
"This is the non-critical part of kbuild
- scripts/kernel-doc requires a "Return:" section for non-void
functions
- ARCH=arm SUBARCH=... support for make tags
- COMPILED_SOURCE=1 support for make tags (only indexes .c files for
which a .o exists)
- New coccinelle check
- Option parsing fix for scripts/config"
* 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
scripts/config: Fix wrong "shift" for --keep-case
scripts/tags.sh: Support compiled source
scripts/tags.sh: Support subarch for ARM
scripts/coccinelle/misc/warn.cocci: use WARN
scripts/kernel-doc: check that non-void fcts describe their return value
Kernel-doc: Convention: Use a "Return" section to describe return values
David Howells [Thu, 20 Dec 2012 23:05:56 +0000 (15:05 -0800)]
keys: use keyring_alloc() to create module signing keyring
Use keyring_alloc() to create special keyrings now that it has
a permissions parameter rather than using key_alloc() +
key_instantiate_and_link().
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Thu, 20 Dec 2012 23:05:54 +0000 (15:05 -0800)]
keys: fix unreachable code
We set ret to NULL then test it. Remove the bogus test
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Scott Wolchok [Thu, 20 Dec 2012 23:05:52 +0000 (15:05 -0800)]
sendfile: allows bypassing of notifier events
do_sendfile() in fs/read_write.c does not call the fsnotify functions,
unlike its neighbors. This manifests as a lack of inotify ACCESS events
when a file is sent using sendfile(2).
Addresses
https://bugzilla.kernel.org/show_bug.cgi?id=12812
[akpm@linux-foundation.org: use fsnotify_modify(out.file), not fsnotify_access(), per Dave]
Signed-off-by: Alan Cox <alan@linux.intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Scott Wolchok <swolchok@umich.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Robin Holt [Thu, 20 Dec 2012 23:05:50 +0000 (15:05 -0800)]
SGI-XP: handle non-fatal traps
We found a user code which was raising a divide-by-zero trap. That trap
would lead to XPC connections between system-partitions being torn down
due to the die_chain notifier callouts it received.
This also revealed a different issue where multiple callers into
xpc_die_deactivate() would all attempt to do the disconnect in parallel
which would sometimes lock up but often overwhelm the console on very
large machines as each would print at least one line of output at the
end of the deactivate.
I reviewed all the users of the die_chain notifier and changed the code
to ignore the notifier callouts for reasons which will not actually lead
to a system to continue on to call die().
[akpm@linux-foundation.org: fix ia64]
Signed-off-by: Robin Holt <holt@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ravishankar N [Thu, 20 Dec 2012 23:05:46 +0000 (15:05 -0800)]
fat: fix incorrect function comment
fat_search_long() returns 0 on success, -ENOENT/ENOMEM on failure.
Change the function comment accordingly.
While at it, fix some trivial typos.
Signed-off-by: Ravishankar N <cyberax82@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Davidlohr Bueso [Thu, 20 Dec 2012 23:05:45 +0000 (15:05 -0800)]
Documentation: ABI: remove testing/sysfs-devices-node
This file is already documented in the stable ABI (see commit
5bbe1ec11fcf).
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Greg KH <greg@kroah.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Xiaotian Feng [Thu, 20 Dec 2012 23:05:44 +0000 (15:05 -0800)]
proc: fix inconsistent lock state
Lockdep found an inconsistent lock state when rcu is processing delayed
work in softirq. Currently, kernel is using spin_lock/spin_unlock to
protect proc_inum_ida, but proc_free_inum is called by rcu in softirq
context.
Use spin_lock_bh/spin_unlock_bh fix following lockdep warning.
=================================
[ INFO: inconsistent lock state ]
3.7.0 #36 Not tainted
---------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
swapper/1/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
(proc_inum_lock){+.?...}, at: proc_free_inum+0x1c/0x50
{SOFTIRQ-ON-W} state was registered at:
__lock_acquire+0x8ae/0xca0
lock_acquire+0x199/0x200
_raw_spin_lock+0x41/0x50
proc_alloc_inum+0x4c/0xd0
alloc_mnt_ns+0x49/0xc0
create_mnt_ns+0x25/0x70
mnt_init+0x161/0x1c7
vfs_caches_init+0x107/0x11a
start_kernel+0x348/0x38c
x86_64_start_reservations+0x131/0x136
x86_64_start_kernel+0x103/0x112
irq event stamp:
2993422
hardirqs last enabled at (
2993422): _raw_spin_unlock_irqrestore+0x55/0x80
hardirqs last disabled at (
2993421): _raw_spin_lock_irqsave+0x29/0x70
softirqs last enabled at (
2993394): _local_bh_enable+0x13/0x20
softirqs last disabled at (
2993395): call_softirq+0x1c/0x30
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(proc_inum_lock);
<Interrupt>
lock(proc_inum_lock);
*** DEADLOCK ***
no locks held by swapper/1/0.
stack backtrace:
Pid: 0, comm: swapper/1 Not tainted 3.7.0 #36
Call Trace:
<IRQ> [<
ffffffff810a40f1>] ? vprintk_emit+0x471/0x510
print_usage_bug+0x2a5/0x2c0
mark_lock+0x33b/0x5e0
__lock_acquire+0x813/0xca0
lock_acquire+0x199/0x200
_raw_spin_lock+0x41/0x50
proc_free_inum+0x1c/0x50
free_pid_ns+0x1c/0x50
put_pid_ns+0x2e/0x50
put_pid+0x4a/0x60
delayed_put_pid+0x12/0x20
rcu_process_callbacks+0x462/0x790
__do_softirq+0x1b4/0x3b0
call_softirq+0x1c/0x30
do_softirq+0x59/0xd0
irq_exit+0x54/0xd0
smp_apic_timer_interrupt+0x95/0xa3
apic_timer_interrupt+0x72/0x80
cpuidle_enter_tk+0x10/0x20
cpuidle_enter_state+0x17/0x50
cpuidle_idle_call+0x287/0x520
cpu_idle+0xba/0x130
start_secondary+0x2b3/0x2bc
Signed-off-by: Xiaotian Feng <dannyfeng@tencent.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Guenter Roeck [Thu, 20 Dec 2012 23:05:42 +0000 (15:05 -0800)]
linux/kernel.h: fix DIV_ROUND_CLOSEST with unsigned divisors
Commit
263a523d18bc ("linux/kernel.h: Fix warning seen with W=1 due to
change in DIV_ROUND_CLOSEST") fixes a warning seen with W=1 due to
change in DIV_ROUND_CLOSEST.
Unfortunately, the C compiler converts divide operations with unsigned
divisors to unsigned, even if the dividend is signed and negative (for
example, -10 / 5U =
858993457). The C standard says "If one operand has
unsigned int type, the other operand is converted to unsigned int", so
the compiler is not to blame. As a result, DIV_ROUND_CLOSEST(0, 2U) and
similar operations now return bad values, since the automatic conversion
of expressions such as "0 - 2U/2" to unsigned was not taken into
account.
Fix by checking for the divisor variable type when deciding which
operation to perform. This fixes DIV_ROUND_CLOSEST(0, 2U), but still
returns bad values for negative dividends divided by unsigned divisors.
Mark the latter case as unsupported.
One observed effect of this problem is that the s2c_hwmon driver reports
a value of
4198403 instead of 0 if the ADC reads 0.
Other impact is unpredictable. Problem is seen if the divisor is an
unsigned variable or constant and the dividend is less than (divisor/2).
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reported-by: Juergen Beisert <jbe@pengutronix.de>
Tested-by: Juergen Beisert <jbe@pengutronix.de>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: <stable@vger.kernel.org> [3.7.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tejun Heo [Thu, 20 Dec 2012 23:05:40 +0000 (15:05 -0800)]
memcg: don't register hotcpu notifier from ->css_alloc()
Commit
648bb56d076b ("cgroup: lock cgroup_mutex in cgroup_init_subsys()")
made cgroup_init_subsys() grab cgroup_mutex before invoking
->css_alloc() for the root css. Because memcg registers hotcpu notifier
from ->css_alloc() for the root css, this introduced circular locking
dependency between cgroup_mutex and cpu hotplug.
Fix it by moving hotcpu notifier registration to a subsys initcall.
======================================================
[ INFO: possible circular locking dependency detected ]
3.7.0-rc4-work+ #42 Not tainted
-------------------------------------------------------
bash/645 is trying to acquire lock:
(cgroup_mutex){+.+.+.}, at: [<
ffffffff8110c5b7>] cgroup_lock+0x17/0x20
but task is already holding lock:
(cpu_hotplug.lock){+.+.+.}, at: [<
ffffffff8109300f>] cpu_hotplug_begin+0x2f/0x60
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (cpu_hotplug.lock){+.+.+.}:
lock_acquire+0x97/0x1e0
mutex_lock_nested+0x61/0x3b0
get_online_cpus+0x3c/0x60
rebuild_sched_domains_locked+0x1b/0x70
cpuset_write_resmask+0x298/0x2c0
cgroup_file_write+0x1ef/0x300
vfs_write+0xa8/0x160
sys_write+0x52/0xa0
system_call_fastpath+0x16/0x1b
-> #0 (cgroup_mutex){+.+.+.}:
__lock_acquire+0x14ce/0x1d20
lock_acquire+0x97/0x1e0
mutex_lock_nested+0x61/0x3b0
cgroup_lock+0x17/0x20
cpuset_handle_hotplug+0x1b/0x560
cpuset_update_active_cpus+0xe/0x10
cpuset_cpu_inactive+0x47/0x50
notifier_call_chain+0x66/0x150
__raw_notifier_call_chain+0xe/0x10
__cpu_notify+0x20/0x40
_cpu_down+0x7e/0x2f0
cpu_down+0x36/0x50
store_online+0x5d/0xe0
dev_attr_store+0x18/0x30
sysfs_write_file+0xe0/0x150
vfs_write+0xa8/0x160
sys_write+0x52/0xa0
system_call_fastpath+0x16/0x1b
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(cpu_hotplug.lock);
lock(cgroup_mutex);
lock(cpu_hotplug.lock);
lock(cgroup_mutex);
*** DEADLOCK ***
5 locks held by bash/645:
#0: (&buffer->mutex){+.+.+.}, at: [<
ffffffff8123bab8>] sysfs_write_file+0x48/0x150
#1: (s_active#42){.+.+.+}, at: [<
ffffffff8123bb38>] sysfs_write_file+0xc8/0x150
#2: (x86_cpu_hotplug_driver_mutex){+.+...}, at: [<
ffffffff81079277>] cpu_hotplug_driver_lock+0x1
+7/0x20
#3: (cpu_add_remove_lock){+.+.+.}, at: [<
ffffffff81093157>] cpu_maps_update_begin+0x17/0x20
#4: (cpu_hotplug.lock){+.+.+.}, at: [<
ffffffff8109300f>] cpu_hotplug_begin+0x2f/0x60
stack backtrace:
Pid: 645, comm: bash Not tainted 3.7.0-rc4-work+ #42
Call Trace:
print_circular_bug+0x28e/0x29f
__lock_acquire+0x14ce/0x1d20
lock_acquire+0x97/0x1e0
mutex_lock_nested+0x61/0x3b0
cgroup_lock+0x17/0x20
cpuset_handle_hotplug+0x1b/0x560
cpuset_update_active_cpus+0xe/0x10
cpuset_cpu_inactive+0x47/0x50
notifier_call_chain+0x66/0x150
__raw_notifier_call_chain+0xe/0x10
__cpu_notify+0x20/0x40
_cpu_down+0x7e/0x2f0
cpu_down+0x36/0x50
store_online+0x5d/0xe0
dev_attr_store+0x18/0x30
sysfs_write_file+0xe0/0x150
vfs_write+0xa8/0x160
sys_write+0x52/0xa0
system_call_fastpath+0x16/0x1b
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Thu, 20 Dec 2012 23:05:37 +0000 (15:05 -0800)]
checkpatch: warn on uapi #includes that #include <uapi/...
Avoid specifying internal uapi #include paths with uapi/... as
userspace should not use and never see that.
Neaten message line wrapping above.
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: David Howells <dhowells@redhat.com>
Acked-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Thu, 20 Dec 2012 23:05:34 +0000 (15:05 -0800)]
revert "rtc: recycle id when unloading a rtc driver"
Revert commit
2830a6d20139df2198d63235df7957712adb28e5.
We already perform the ida_simple_remove() in rtc_device_release(),
which is an appropriate place. Commit
2830a6d20 ("rtc: recycle id when
unloading a rtc driver") caused the kernel to emit
ida_remove called for id=0 which is not allocated.
warnings when rtc_device_release() tries to release an alread-released
ID.
Let's restore things to their previous state and then work out why
Vincent's kernel wasn't calling rtc_device_release() - presumably a bug
in a specific sub-driver.
Reported-by: Lothar Waßmann <LW@KARO-electronics.de>
Acked-by: Alexander Holler <holler@ahsoftware.de>
Cc: Vincent Palatin <vpalatin@chromium.org>
Cc: <stable@vger.kernel.org> [3.7.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeremy Eder [Thu, 20 Dec 2012 23:05:32 +0000 (15:05 -0800)]
mm: clean up transparent hugepage sysfs error messages
Clarify error messages and correct a few typos in the transparent hugepage
sysfs init code.
Signed-off-by: Jeremy Eder <jeder@redhat.com>
Acked-by: Rafael Aquini <aquini@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vyacheslav Dubeyko [Thu, 20 Dec 2012 23:05:29 +0000 (15:05 -0800)]
hfsplus: add error message for the case of failure of sync fs in delayed_sync_fs() method
Add an error message for the case of failure of sync fs in
delayed_sync_fs() method.
Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Hin-Tak Leung <htl10@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vyacheslav Dubeyko [Thu, 20 Dec 2012 23:05:28 +0000 (15:05 -0800)]
hfsplus: rework processing of hfs_btree_write() returned error
Add to hfs_btree_write() a return of -EIO on failure of b-tree node
searching. Also add logic ofor processing errors from hfs_btree_write()
in hfsplus_system_write_inode() with a message about b-tree writing
failure.
[akpm@linux-foundation.org: reduce scope of `err', print errno on error]
Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Hin-Tak Leung <htl10@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vyacheslav Dubeyko [Thu, 20 Dec 2012 23:05:25 +0000 (15:05 -0800)]
hfsplus: rework processing errors in hfsplus_free_extents()
Currently, it doesn't process error codes from the hfsplus_block_free()
call in hfsplus_free_extents() method. Add some error code processing.
Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Hin-Tak Leung <htl10@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Thu, 20 Dec 2012 23:05:24 +0000 (15:05 -0800)]
hfsplus: avoid crash on failed block map free
If the read fails we kmap an error code. This doesn't end well. Instead
print a critical error and pray. This mirrors the rest of the fs
behaviour with critical error cases.
Acked-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Acked-by: Hin-Tak Leung <htl10@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cyrill Gorcunov [Thu, 20 Dec 2012 23:05:21 +0000 (15:05 -0800)]
kcmp: include linux/ptrace.h
This makes it compile on s390. After all the ptrace_may_access
(which we use this file) is declared exactly in linux/ptrace.h.
This is preparatory work to wire this syscall up on all archs.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Alexander Kartashov <alekskartashov@parallels.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jean Delvare [Thu, 20 Dec 2012 23:05:19 +0000 (15:05 -0800)]
drivers/rtc/rtc-imxdi.c: must include <linux/spinlock.h>
Add the missing header include for spinlocks, to avoid potential build
failures on specific architectures or configurations.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Marek Szyprowski [Thu, 20 Dec 2012 23:05:18 +0000 (15:05 -0800)]
mm: cma: WARN if freed memory is still in use
Memory returned to free_contig_range() must have no other references.
Let kernel to complain loudly if page reference count is not equal to 1.
[rientjes@google.com: support sparsemem]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kees Cook [Thu, 20 Dec 2012 23:05:16 +0000 (15:05 -0800)]
exec: do not leave bprm->interp on stack
If a series of scripts are executed, each triggering module loading via
unprintable bytes in the script header, kernel stack contents can leak
into the command line.
Normally execution of binfmt_script and binfmt_misc happens recursively.
However, when modules are enabled, and unprintable bytes exist in the
bprm->buf, execution will restart after attempting to load matching
binfmt modules. Unfortunately, the logic in binfmt_script and
binfmt_misc does not expect to get restarted. They leave bprm->interp
pointing to their local stack. This means on restart bprm->interp is
left pointing into unused stack memory which can then be copied into the
userspace argv areas.
After additional study, it seems that both recursion and restart remains
the desirable way to handle exec with scripts, misc, and modules. As
such, we need to protect the changes to interp.
This changes the logic to require allocation for any changes to the
bprm->interp. To avoid adding a new kmalloc to every exec, the default
value is left as-is. Only when passing through binfmt_script or
binfmt_misc does an allocation take place.
For a proof of concept, see DoTest.sh from:
http://www.halfdog.net/Security/2012/LinuxKernelBinfmtScriptStackDataDisclosure/
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: halfdog <me@halfdog.net>
Cc: P J P <ppandit@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Zhenzhong Duan [Thu, 20 Dec 2012 23:05:14 +0000 (15:05 -0800)]
drivers/firmware/dmi_scan.c: fetch dmi version from SMBIOS if it exists
The right dmi version is in SMBIOS if it's zero in DMI region
This issue was originally found from an oracle bug.
One customer noticed system UUID doesn't match between dmidecode & uek2.
- HP ProLiant BL460c G6 :
# cat /sys/devices/virtual/dmi/id/product_uuid
00000000-0000-4C48-3031-
4D5030333531
# dmidecode | grep -i uuid
UUID:
00000000-0000-484C-3031-
4D5030333531
From SMBIOS 2.6 on, spec use little-endian encoding for UUID other than
network byte order.
So we need to get dmi version to distinguish. If version is 0.0, the
real version is taken from the SMBIOS version. This is part of original
kernel comment in code.
[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Cc: Feng Jin <joe.jin@oracle.com>
Cc: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Zhenzhong Duan [Thu, 20 Dec 2012 23:05:13 +0000 (15:05 -0800)]
drivers/firmware/dmi_scan.c: check dmi version when get system uuid
As of version 2.6 of the SMBIOS specification, the first 3 fields of the
UUID are supposed to be little-endian encoded.
Also a minor fix to match variable meaning and mute checkpatch.pl
[akpm@linux-foundation.org: tweak code comment]
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Cc: Feng Jin <joe.jin@oracle.com>
Cc: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Josh Boyer [Thu, 20 Dec 2012 23:05:10 +0000 (15:05 -0800)]
Documentation: kernel-parameters.txt remove capability.disable
Remove the documentation for capability.disable. The code supporting
this parameter was removed with commit
5915eb53861c ("security: remove
dummy module")
Signed-off-by: Josh Boyer <jwboyer@redhat.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Rob Landley <rob@landley.net>
Cc: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sonny Rao [Thu, 20 Dec 2012 23:05:07 +0000 (15:05 -0800)]
mm: fix calculation of dirtyable memory
The system uses global_dirtyable_memory() to calculate number of
dirtyable pages/pages that can be allocated to the page cache. A bug
causes an underflow thus making the page count look like a big unsigned
number. This in turn confuses the dirty writeback throttling to
aggressively write back pages as they become dirty (usually 1 page at a
time). This generally only affects systems with highmem because the
underflowed count gets subtracted from the global count of dirtyable
memory.
The problem was introduced with
v3.2-4896-gab8fabd
Fix is to ensure we don't get an underflowed total of either highmem or
global dirtyable memory.
Signed-off-by: Sonny Rao <sonnyrao@chromium.org>
Signed-off-by: Puneet Kumar <puneetster@chromium.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Tested-by: Damien Wyart <damien.wyart@free.fr>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Minchan Kim [Thu, 20 Dec 2012 23:05:06 +0000 (15:05 -0800)]
compaction: fix build error in CMA && !COMPACTION
isolate_freepages_block() and isolate_migratepages_range() are used for
CMA as well as compaction so it breaks build for CONFIG_CMA &&
!CONFIG_COMPACTION.
This patch fixes it.
[akpm@linux-foundation.org: add "do { } while (0)", per Mel]
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Layton [Tue, 11 Dec 2012 17:10:18 +0000 (12:10 -0500)]
vfs: make lremovexattr retry once on ESTALE error
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jeff Layton [Tue, 11 Dec 2012 17:10:17 +0000 (12:10 -0500)]
vfs: make removexattr retry once on ESTALE
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jeff Layton [Tue, 11 Dec 2012 17:10:17 +0000 (12:10 -0500)]
vfs: make llistxattr retry once on ESTALE error
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jeff Layton [Tue, 11 Dec 2012 17:10:16 +0000 (12:10 -0500)]
vfs: make listxattr retry once on ESTALE error
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jeff Layton [Tue, 11 Dec 2012 17:10:16 +0000 (12:10 -0500)]
vfs: make lgetxattr retry once on ESTALE
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jeff Layton [Tue, 11 Dec 2012 17:10:16 +0000 (12:10 -0500)]
vfs: make getxattr retry once on an ESTALE error
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jeff Layton [Tue, 11 Dec 2012 17:10:15 +0000 (12:10 -0500)]
vfs: allow lsetxattr() to retry once on ESTALE errors
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jeff Layton [Tue, 11 Dec 2012 17:10:15 +0000 (12:10 -0500)]
vfs: allow setxattr to retry once on ESTALE errors
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jeff Layton [Tue, 11 Dec 2012 17:10:14 +0000 (12:10 -0500)]
vfs: allow utimensat() calls to retry once on an ESTALE error
Clearly, we can't handle the NULL filename case, but we can deal with
the case where there's a real pathname.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jeff Layton [Tue, 11 Dec 2012 17:10:14 +0000 (12:10 -0500)]
vfs: fix user_statfs to retry once on ESTALE errors
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jeff Layton [Tue, 11 Dec 2012 17:10:13 +0000 (12:10 -0500)]
vfs: make fchownat retry once on ESTALE errors
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jeff Layton [Tue, 11 Dec 2012 17:10:13 +0000 (12:10 -0500)]
vfs: make fchmodat retry once on ESTALE errors
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jeff Layton [Thu, 20 Dec 2012 22:08:32 +0000 (17:08 -0500)]
vfs: have chroot retry once on ESTALE error
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jeff Layton [Tue, 11 Dec 2012 17:10:12 +0000 (12:10 -0500)]
vfs: have chdir retry lookup and call once on ESTALE error
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jeff Layton [Tue, 11 Dec 2012 17:10:11 +0000 (12:10 -0500)]
vfs: have faccessat retry once on an ESTALE error
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jeff Layton [Tue, 11 Dec 2012 17:10:11 +0000 (12:10 -0500)]
vfs: have do_sys_truncate retry once on an ESTALE error
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jeff Layton [Tue, 11 Dec 2012 17:10:10 +0000 (12:10 -0500)]
vfs: fix renameat to retry on ESTALE errors
...as always, rename is the messiest of the bunch. We have to track
whether to retry or not via a separate flag since the error handling
is already quite complex.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jeff Layton [Thu, 20 Dec 2012 21:38:04 +0000 (16:38 -0500)]
vfs: make do_unlinkat retry once on ESTALE errors
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jeff Layton [Thu, 20 Dec 2012 21:28:33 +0000 (16:28 -0500)]
vfs: make do_rmdir retry once on ESTALE errors
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jeff Layton [Tue, 11 Dec 2012 17:10:09 +0000 (12:10 -0500)]
vfs: add a flags argument to user_path_parent
...so we can pass in LOOKUP_REVAL. For now, nothing does yet.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jeff Layton [Thu, 20 Dec 2012 21:15:38 +0000 (16:15 -0500)]
vfs: fix linkat to retry once on ESTALE errors
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jeff Layton [Tue, 11 Dec 2012 17:10:08 +0000 (12:10 -0500)]
vfs: fix symlinkat to retry on ESTALE errors
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jeff Layton [Thu, 20 Dec 2012 21:04:09 +0000 (16:04 -0500)]
vfs: fix mkdirat to retry once on an ESTALE error
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jeff Layton [Thu, 20 Dec 2012 21:00:10 +0000 (16:00 -0500)]
vfs: fix mknodat to retry on ESTALE errors
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jeff Layton [Tue, 11 Dec 2012 17:10:06 +0000 (12:10 -0500)]
vfs: turn is_dir argument to kern_path_create into a lookup_flags arg
Where we can pass in LOOKUP_DIRECTORY or LOOKUP_REVAL. Any other flags
passed in here are currently ignored.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jeff Layton [Tue, 11 Dec 2012 17:10:06 +0000 (12:10 -0500)]
vfs: fix readlinkat to retry on ESTALE
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jeff Layton [Tue, 11 Dec 2012 17:10:05 +0000 (12:10 -0500)]
vfs: make fstatat retry on ESTALE errors from getattr call
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jeff Layton [Thu, 20 Dec 2012 19:59:40 +0000 (14:59 -0500)]
vfs: add a retry_estale helper function to handle retries on ESTALE
This function is expected to be called from path-based syscalls to help
them decide whether to try the lookup and call again in the event that
they got an -ESTALE return back on an earier try.
Currently, we only retry the call once on an ESTALE error, but in the
event that we decide that that's not enough in the future, we should be
able to change the logic in this helper without too much effort.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 20 Dec 2012 23:49:14 +0000 (18:49 -0500)]
Merge branch 'fscache' of git://git./linux/kernel/git/dhowells/linux-fs into for-linus
NeilBrown [Fri, 9 Nov 2012 00:09:37 +0000 (16:09 -0800)]
vfs: d_obtain_alias() needs to use "/" as default name.
NFS appears to use d_obtain_alias() to create the root dentry rather than
d_make_root. This can cause 'prepend_path()' to complain that the root
has a weird name if an NFS filesystem is lazily unmounted. e.g. if
"/mnt" is an NFS mount then
{ cd /mnt; umount -l /mnt ; ls -l /proc/self/cwd; }
will cause a WARN message like
WARNING: at /home/git/linux/fs/dcache.c:2624 prepend_path+0x1d7/0x1e0()
...
Root dentry has weird name <>
to appear in kernel logs.
So change d_obtain_alias() to use "/" rather than "" as the anonymous
name.
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Alessio Igor Bogani [Thu, 13 Dec 2012 11:22:39 +0000 (12:22 +0100)]
vfs: Remove useless function prototypes
Commit
8e22cc88d68ca1a46d7d582938f979eb640ed30f removes the (un)lock_super
function definitions but forgets to remove their prototypes.
Signed-off-by: Alessio Igor Bogani <abogani@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Marco Stornelli [Sat, 15 Dec 2012 11:00:38 +0000 (12:00 +0100)]
documentation: drop vmtruncate
Removed vmtruncate
Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Marco Stornelli [Sat, 15 Dec 2012 11:00:02 +0000 (12:00 +0100)]
mm: drop vmtruncate
Removed vmtruncate
Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Marco Stornelli [Sat, 15 Dec 2012 10:59:20 +0000 (11:59 +0100)]
vfs: drop vmtruncate
Removed vmtruncate
Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Marco Stornelli [Sat, 15 Dec 2012 10:58:36 +0000 (11:58 +0100)]
ntfs: drop vmtruncate
Removed vmtruncate
Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Reviewed-by: Anton Altaparmakov <anton@tuxera.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Marco Stornelli [Sat, 15 Dec 2012 10:57:37 +0000 (11:57 +0100)]
nilfs2: drop vmtruncate
Removed vmtruncate
Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Marco Stornelli [Sat, 15 Dec 2012 10:57:03 +0000 (11:57 +0100)]
ncpfs: drop vmtruncate
Removed vmtruncate
Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Marco Stornelli [Sat, 15 Dec 2012 10:56:25 +0000 (11:56 +0100)]
minix: drop vmtruncate
Removed vmtruncate
Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Marco Stornelli [Sat, 15 Dec 2012 10:55:42 +0000 (11:55 +0100)]
logfs: drop vmtruncate
Removed vmtruncate
Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Marco Stornelli [Sat, 15 Dec 2012 10:55:07 +0000 (11:55 +0100)]
hfsplus: drop vmtruncate
Removed vmtruncate
Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Marco Stornelli [Sat, 15 Dec 2012 10:54:25 +0000 (11:54 +0100)]
jfs: drop vmtruncate
Removed vmtruncate
Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Marco Stornelli [Sat, 15 Dec 2012 10:53:50 +0000 (11:53 +0100)]
hpfs: drop vmtruncate
Removed vmtruncate
Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
David Howells [Fri, 14 Dec 2012 11:02:22 +0000 (11:02 +0000)]
FS-Cache: Clear remaining page count on retrieval cancellation
Provide fscache_cancel_op() with a pointer to a function it should invoke under
lock if it cancels an operation.
Use this to clear the remaining page count upon cancellation of a pending
retrieval operation so that fscache_release_retrieval_op() doesn't get an
assertion failure (see below). This can happen when a signal occurs, say from
CTRL-C being pressed during data retrieval.
FS-Cache: Assertion failed
3 == 0 is false
------------[ cut here ]------------
kernel BUG at fs/fscache/page.c:237!
invalid opcode: 0000 [#641] SMP
Modules linked in: cachefiles(F) nfsv4(F) nfsv3(F) nfsv2(F) nfs(F) fscache(F) auth_rpcgss(F) nfs_acl(F) lockd(F) sunrpc(F)
CPU 0
Pid: 6075, comm: slurp-q Tainted: GF D 3.7.0-rc8-fsdevel+ #411 /DG965RY
RIP: 0010:[<
ffffffffa007f328>] [<
ffffffffa007f328>] fscache_release_retrieval_op+0x75/0xff [fscache]
RSP: 0000:
ffff88001c6d7988 EFLAGS:
00010296
RAX:
000000000000000f RBX:
ffff880014cdfe00 RCX:
ffffffff6c102000
RDX:
ffffffff8102d1ad RSI:
ffffffff6c102000 RDI:
ffffffff8102d1d6
RBP:
ffff88001c6d7998 R08:
0000000000000002 R09:
0000000000000000
R10:
0000000000000000 R11:
0000000000000000 R12:
00000000fffffe00
R13:
ffff88001c6d7ab4 R14:
ffff88001a8638a0 R15:
ffff88001552b190
FS:
00007f877aaf0700(0000) GS:
ffff88003bc00000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
000000008005003b
CR2:
00007fff11378fd2 CR3:
000000001c6c6000 CR4:
00000000000007f0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000ffff0ff0 DR7:
0000000000000400
Process slurp-q (pid: 6075, threadinfo
ffff88001c6d6000, task
ffff88001c6c4080)
Stack:
ffffffffa007ec07 ffff880014cdfe00 ffff88001c6d79c8 ffffffffa007db4d
ffffffffa007ec07 ffff880014cdfe00 00000000fffffe00 ffff88001c6d7ab4
ffff88001c6d7a38 ffffffffa008116d 0000000000000000 ffff88001c6c4080
Call Trace:
[<
ffffffffa007ec07>] ? fscache_cancel_op+0x194/0x1cf [fscache]
[<
ffffffffa007db4d>] fscache_put_operation+0x135/0x2ed [fscache]
[<
ffffffffa007ec07>] ? fscache_cancel_op+0x194/0x1cf [fscache]
[<
ffffffffa008116d>] __fscache_read_or_alloc_pages+0x413/0x4bc [fscache]
[<
ffffffff810ac8ae>] ? __alloc_pages_nodemask+0x195/0x75c
[<
ffffffffa00aab0f>] __nfs_readpages_from_fscache+0x86/0x13d [nfs]
[<
ffffffffa00a5fe0>] nfs_readpages+0x186/0x1bd [nfs]
[<
ffffffff810d23c8>] ? alloc_pages_current+0xc7/0xe4
[<
ffffffff810a68b5>] ? __page_cache_alloc+0x84/0x91
[<
ffffffff810af912>] ? __do_page_cache_readahead+0xa6/0x2e0
[<
ffffffff810afaa3>] __do_page_cache_readahead+0x237/0x2e0
[<
ffffffff810af912>] ? __do_page_cache_readahead+0xa6/0x2e0
[<
ffffffff810afe3e>] ra_submit+0x1c/0x20
[<
ffffffff810b019b>] ondemand_readahead+0x359/0x382
[<
ffffffff810b0279>] page_cache_sync_readahead+0x38/0x3a
[<
ffffffff810a77b5>] generic_file_aio_read+0x26b/0x637
[<
ffffffffa00f1852>] ? nfs_mark_delegation_referenced+0xb/0xb [nfsv4]
[<
ffffffffa009cc85>] nfs_file_read+0xaa/0xcf [nfs]
[<
ffffffff810db5b3>] do_sync_read+0x91/0xd1
[<
ffffffff810dbb8b>] vfs_read+0x9b/0x144
[<
ffffffff810dbc78>] sys_read+0x44/0x75
[<
ffffffff81422892>] system_call_fastpath+0x16/0x1b
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Thu, 13 Dec 2012 20:03:13 +0000 (20:03 +0000)]
FS-Cache: Mark cancellation of in-progress operation
Mark as cancelled an operation that is in progress rather than pending at the
time it is cancelled, and call fscache_complete_op() to cancel an operation so
that blocked ops can be started.
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Fri, 7 Dec 2012 10:41:26 +0000 (10:41 +0000)]
FS-Cache: One of the write operation paths doesn't set the object state
In fscache_write_op(), if the object is determined to have become inactive or
to have lost its cookie, we don't move the operation state from in-progress,
and so an assertion in fscache_put_operation() fails with an assertion (see
below).
Instrumenting fscache_op_work_func() indicates that it called
fscache_write_op() before calling fscache_put_operation() - where the assertion
failed. The assertion at line 433 indicates that the operation state is
IN_PROGRESS rather than being COMPLETE or CANCELLED.
Instrumenting fscache_write_op() showed that it was being called on an object
that had had its cookie removed and that this was due to relinquishment of the
cookie by the netfs. At this point fscache no longer has access to the pages
of netfs data that were requested to be written, and so simply cancelling the
operation is the thing to do.
FS-Cache: Assertion failed
3 == 5 is false
------------[ cut here ]------------
kernel BUG at fs/fscache/operation.c:433!
invalid opcode: 0000 [#1] SMP
Modules linked in: cachefiles(F) nfsv4(F) nfsv3(F) nfsv2(F) nfs(F) fscache(F) auth_rpcgss(F) nfs_acl(F) lockd(F) sunrpc(F)
CPU 0
Pid: 1035, comm: kworker/u:3 Tainted: GF 3.7.0-rc8-fsdevel+ #411 /DG965RY
RIP: 0010:[<
ffffffffa007db22>] [<
ffffffffa007db22>] fscache_put_operation+0x11a/0x2ed [fscache]
RSP: 0018:
ffff88003e32bcf8 EFLAGS:
00010296
RAX:
000000000000000f RBX:
ffff88001818eb78 RCX:
ffffffff6c102000
RDX:
ffffffff8102d1ad RSI:
ffffffff6c102000 RDI:
ffffffff8102d1d6
RBP:
ffff88003e32bd18 R08:
0000000000000002 R09:
0000000000000000
R10:
0000000000000000 R11:
0000000000000000 R12:
ffffffffa00811da
R13:
0000000000000001 R14:
0000000100625d26 R15:
0000000000000000
FS:
0000000000000000(0000) GS:
ffff88003bc00000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
000000008005003b
CR2:
00007fff7dd31c68 CR3:
000000003d730000 CR4:
00000000000007f0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000ffff0ff0 DR7:
0000000000000400
Process kworker/u:3 (pid: 1035, threadinfo
ffff88003e32a000, task
ffff88003bb38080)
Stack:
ffffffff8102d1ad ffff88001818eb78 ffffffffa00811da 0000000000000001
ffff88003e32bd48 ffffffffa007f0ad ffff88001818eb78 ffffffff819583c0
ffff88003df24e00 ffff88003882c3e0 ffff88003e32bde8 ffffffff81042de0
Call Trace:
[<
ffffffff8102d1ad>] ? vprintk_emit+0x3c6/0x41a
[<
ffffffffa00811da>] ? __fscache_read_or_alloc_pages+0x4bc/0x4bc [fscache]
[<
ffffffffa007f0ad>] fscache_op_work_func+0xec/0x123 [fscache]
[<
ffffffff81042de0>] process_one_work+0x21c/0x3b0
[<
ffffffff81042d82>] ? process_one_work+0x1be/0x3b0
[<
ffffffffa007efc1>] ? fscache_operation_gc+0x23e/0x23e [fscache]
[<
ffffffff8104332e>] worker_thread+0x202/0x2df
[<
ffffffff8104312c>] ? rescuer_thread+0x18e/0x18e
[<
ffffffff81047c1c>] kthread+0xd0/0xd8
[<
ffffffff81421bfa>] ? _raw_spin_unlock_irq+0x29/0x3e
[<
ffffffff81047b4c>] ? __init_kthread_worker+0x55/0x55
[<
ffffffff814227ec>] ret_from_fork+0x7c/0xb0
[<
ffffffff81047b4c>] ? __init_kthread_worker+0x55/0x55
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Fri, 7 Dec 2012 18:08:02 +0000 (18:08 +0000)]
FS-Cache: Fix signal handling during waits
wait_on_bit() with TASK_INTERRUPTIBLE returns 1 rather than a negative error
code, so change what we check for. This means that the signal handling in
fscache_wait_for_retrieval_activation() should now work properly.
Without this, the following bug can be seen if CTRL-C is pressed during
fscache read operation:
FS-Cache: Assertion failed
2 == 3 is false
------------[ cut here ]------------
kernel BUG at fs/fscache/page.c:347!
invalid opcode: 0000 [#1] SMP
Modules linked in: cachefiles(F) nfsv4(F) nfsv3(F) nfsv2(F) nfs(F) fscache(F) auth_rpcgss(F) nfs_acl(F) lockd(F) sunrpc(F)
CPU 1
Pid: 15006, comm: slurp-q Tainted: GF 3.7.0-rc8-fsdevel+ #411 /DG965RY
RIP: 0010:[<
ffffffffa007fcb4>] [<
ffffffffa007fcb4>] fscache_wait_for_retrieval_activation+0x167/0x177 [fscache]
RSP: 0018:
ffff88002a4c39a8 EFLAGS:
00010292
RAX:
000000000000001a RBX:
ffff88002d3dc158 RCX:
0000000000008685
RDX:
ffffffff8102ccd6 RSI:
0000000000000001 RDI:
ffffffff8102d1d6
RBP:
ffff88002a4c39c8 R08:
0000000000000002 R09:
0000000000000000
R10:
ffffffff8163afa0 R11:
ffff88003bd11900 R12:
ffffffffa00868c8
R13:
ffff880028306458 R14:
ffff88002d3dc1b0 R15:
ffff88001372e538
FS:
00007f17426a0700(0000) GS:
ffff88003bd00000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
000000008005003b
CR2:
00007f1742494a44 CR3:
0000000031bd7000 CR4:
00000000000007e0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000ffff0ff0 DR7:
0000000000000400
Process slurp-q (pid: 15006, threadinfo
ffff88002a4c2000, task
ffff880023de3040)
Stack:
ffff88002d3dc158 ffff88001372e538 ffff88002a4c3ab4 ffff8800283064e0
ffff88002a4c3a38 ffffffffa0080f6d 0000000000000000 ffff880023de3040
ffff88002a4c3ac8 ffffffff810ac8ae ffff880028306458 ffff88002a4c3bc8
Call Trace:
[<
ffffffffa0080f6d>] __fscache_read_or_alloc_pages+0x24f/0x4bc [fscache]
[<
ffffffff810ac8ae>] ? __alloc_pages_nodemask+0x195/0x75c
[<
ffffffffa00aab0f>] __nfs_readpages_from_fscache+0x86/0x13d [nfs]
[<
ffffffffa00a5fe0>] nfs_readpages+0x186/0x1bd [nfs]
[<
ffffffff810d23c8>] ? alloc_pages_current+0xc7/0xe4
[<
ffffffff810a68b5>] ? __page_cache_alloc+0x84/0x91
[<
ffffffff810af912>] ? __do_page_cache_readahead+0xa6/0x2e0
[<
ffffffff810afaa3>] __do_page_cache_readahead+0x237/0x2e0
[<
ffffffff810af912>] ? __do_page_cache_readahead+0xa6/0x2e0
[<
ffffffff810afe3e>] ra_submit+0x1c/0x20
[<
ffffffff810b019b>] ondemand_readahead+0x359/0x382
[<
ffffffff810b0279>] page_cache_sync_readahead+0x38/0x3a
[<
ffffffff810a77b5>] generic_file_aio_read+0x26b/0x637
[<
ffffffffa00f1852>] ? nfs_mark_delegation_referenced+0xb/0xb [nfsv4]
[<
ffffffffa009cc85>] nfs_file_read+0xaa/0xcf [nfs]
[<
ffffffff810db5b3>] do_sync_read+0x91/0xd1
[<
ffffffff810dbb8b>] vfs_read+0x9b/0x144
[<
ffffffff810dbc78>] sys_read+0x44/0x75
[<
ffffffff81422892>] system_call_fastpath+0x16/0x1b
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Wed, 5 Dec 2012 16:31:49 +0000 (16:31 +0000)]
NFS4: Open files for fscaching
nfs4_file_open() should open files for fscaching.
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Wed, 5 Dec 2012 13:34:49 +0000 (13:34 +0000)]
FS-Cache: Add transition to handle invalidate immediately after lookup
Add a missing transition to the FS-Cache object state machine to handle an
invalidation event occuring between the back end completing the object lookup
by calling fscache_obtained_object() (which moves to state OBJECT_AVAILABLE)
and the backend returning to fscache_lookup_object() and thence to
fscache_object_state_machine() which then does a goto lookup_transit to handle
the transition - but lookup_transit doesn't handle EV_INVALIDATE.
Without this, the following BUG can be logged:
FS-Cache: Unsupported event 2 [5/f7] in state OBJECT_AVAILABLE
------------[ cut here ]------------
kernel BUG at fs/fscache/object.c:357!
Where event 2 is EV_INVALIDATE.
Signed-off-by: David Howells <dhowells@redhat.com>
Linus Torvalds [Thu, 20 Dec 2012 22:15:53 +0000 (14:15 -0800)]
Merge branch 'kbuild' of git://git./linux/kernel/git/mmarek/kbuild
Pull kbuild changes from Michal Marek:
"The kbuild changes are minimal this time:
- scripts/pnmlogo fix for some newer format
- minor top-level Makefile cleanup
- fix for a v3.5 regression with make clean M=<directory>"
* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
kbuild: Do not remove vmlinux when cleaning external module
scripts/pnmtologo: fix for plain PBM
kbuild: Remove reference to uninitialised variable
Tony Lindgren [Thu, 20 Dec 2012 19:50:34 +0000 (11:50 -0800)]
ARM: OMAP2+: Trivial fix for IOMMU merge issue
Commit
787314c35fbb ("Merge tag 'iommu-updates-v3.8' of
git://git./linux/kernel/git/joro/iommu") did not account for the changed
header location.
The headers were made local to mach-omap2 as they are specific to omap2+
only, and we wanted to get most of the #include <plat/*.h> headers fixed
up anyways for the ARM multiplatform support.
We attempted to avoid this kind of merge conflict early on by setting up
a minimal git branch shared by the arm-soc tree and the iommu tree, but
looks like we still hit a merge issue there as the branches got merged
as various topic branches.
Cc: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Wed, 5 Dec 2012 13:34:49 +0000 (13:34 +0000)]
NFS: nfs_migrate_page() does not wait for FS-Cache to finish with a page
nfs_migrate_page() does not wait for FS-Cache to finish with a page, probably
leading to the following bad-page-state:
BUG: Bad page state in process python-bin pfn:17d39b
page:
ffffea00053649e8 flags:
004000000000100c count:0 mapcount:0 mapping:(null)
index:38686 (Tainted: G B ---------------- )
Pid: 31053, comm: python-bin Tainted: G B ----------------
2.6.32-71.24.1.el6.x86_64 #1
Call Trace:
[<
ffffffff8111bfe7>] bad_page+0x107/0x160
[<
ffffffff8111ee69>] free_hot_cold_page+0x1c9/0x220
[<
ffffffff8111ef19>] __pagevec_free+0x59/0xb0
[<
ffffffff8104b988>] ? flush_tlb_others_ipi+0x128/0x130
[<
ffffffff8112230c>] release_pages+0x21c/0x250
[<
ffffffff8115b92a>] ? remove_migration_pte+0x28a/0x2b0
[<
ffffffff8115f3f8>] ? mem_cgroup_get_reclaim_stat_from_page+0x18/0x70
[<
ffffffff81122687>] ____pagevec_lru_add+0x167/0x180
[<
ffffffff811226f8>] __lru_cache_add+0x58/0x70
[<
ffffffff81122731>] lru_cache_add_lru+0x21/0x40
[<
ffffffff81123f49>] putback_lru_page+0x69/0x100
[<
ffffffff8115c0bd>] migrate_pages+0x13d/0x5d0
[<
ffffffff81122687>] ? ____pagevec_lru_add+0x167/0x180
[<
ffffffff81152ab0>] ? compaction_alloc+0x0/0x370
[<
ffffffff8115255c>] compact_zone+0x4cc/0x600
[<
ffffffff8111cfac>] ? get_page_from_freelist+0x15c/0x820
[<
ffffffff810672f4>] ? check_preempt_wakeup+0x1c4/0x3c0
[<
ffffffff8115290e>] compact_zone_order+0x7e/0xb0
[<
ffffffff81152a49>] try_to_compact_pages+0x109/0x170
[<
ffffffff8111e94d>] __alloc_pages_nodemask+0x5ed/0x850
[<
ffffffff814c9136>] ? thread_return+0x4e/0x778
[<
ffffffff81150d43>] alloc_pages_vma+0x93/0x150
[<
ffffffff81167ea5>] do_huge_pmd_anonymous_page+0x135/0x340
[<
ffffffff814cb6f6>] ? rwsem_down_read_failed+0x26/0x30
[<
ffffffff81136755>] handle_mm_fault+0x245/0x2b0
[<
ffffffff814ce383>] do_page_fault+0x123/0x3a0
[<
ffffffff814cbdf5>] page_fault+0x25/0x30
nfs_migrate_page() calls nfs_fscache_release_page() which doesn't actually wait
- even if __GFP_WAIT is set. The reason that doesn't wait is that
fscache_maybe_release_page() might deadlock the allocator as the work threads
writing to the cache may all end up sleeping on memory allocation.
However, I wonder if that is actually a problem. There are a number of things
I can do to deal with this:
(1) Make nfs_migrate_page() wait.
(2) Make fscache_maybe_release_page() honour the __GFP_WAIT flag.
(3) Set a timeout around the wait.
(4) Make nfs_migrate_page() return an error if the page is still busy.
For the moment, I'll select (2) and (4).
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
David Howells [Wed, 5 Dec 2012 13:34:48 +0000 (13:34 +0000)]
FS-Cache: Exclusive op submission can BUG if there's been an I/O error
The function to submit an exclusive op (fscache_submit_exclusive_op()) can BUG
if there's been an I/O error because it may see the parent cache object in an
unexpected state. It should only BUG if there hasn't been an I/O error.
In this case the problem was produced by remounting the cache partition to be
R/O. The EROFS state was detected and the cache was aborted, but not
everything handled the aborting correctly.
SysRq : Emergency Remount R/O
EXT4-fs (sda6): re-mounted. Opts: (null)
Emergency Remount complete
CacheFiles: I/O Error: Failed to update xattr with error -30
FS-Cache: Cache cachefiles stopped due to I/O error
------------[ cut here ]------------
kernel BUG at fs/fscache/operation.c:128!
invalid opcode: 0000 [#1] SMP
CPU 0
Modules linked in: cachefiles nfs fscache auth_rpcgss nfs_acl lockd sunrpc
Pid: 6612, comm: kworker/u:2 Not tainted 3.1.0-rc8-fsdevel+ #1093 /DG965RY
RIP: 0010:[<
ffffffffa00739c0>] [<
ffffffffa00739c0>] fscache_submit_exclusive_op+0x2ad/0x2c2 [fscache]
RSP: 0018:
ffff880000853d40 EFLAGS:
00010206
RAX:
ffff880038ac72a8 RBX:
ffff8800181f2260 RCX:
ffffffff81f2b2b0
RDX:
0000000000000001 RSI:
ffffffff8179a478 RDI:
ffff8800181f2280
RBP:
ffff880000853d60 R08:
0000000000000002 R09:
0000000000000000
R10:
0000000000000001 R11:
0000000000000001 R12:
ffff880038ac7268
R13:
ffff8800181f2280 R14:
ffff88003a359190 R15:
000000010122b162
FS:
0000000000000000(0000) GS:
ffff88003bc00000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
000000008005003b
CR2:
00000034cc4a77f0 CR3:
0000000010e96000 CR4:
00000000000006f0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000ffff0ff0 DR7:
0000000000000400
Process kworker/u:2 (pid: 6612, threadinfo
ffff880000852000, task
ffff880014c3c040)
Stack:
ffff8800181f2260 ffff8800181f2310 ffff880038ac7268 ffff8800181f2260
ffff880000853dc0 ffffffffa0072375 ffff880037ecfe00 ffff88003a359198
ffff880000853dc0 0000000000000246 0000000000000000 ffff88000a91d308
Call Trace:
[<
ffffffffa0072375>] fscache_object_work_func+0x792/0xe65 [fscache]
[<
ffffffff81047e44>] process_one_work+0x1eb/0x37f
[<
ffffffff81047de6>] ? process_one_work+0x18d/0x37f
[<
ffffffffa0071be3>] ? fscache_enqueue_dependents+0xd8/0xd8 [fscache]
[<
ffffffff810482e4>] worker_thread+0x15a/0x21a
[<
ffffffff8104818a>] ? rescuer_thread+0x188/0x188
[<
ffffffff8104bf96>] kthread+0x7f/0x87
[<
ffffffff813ad6f4>] kernel_thread_helper+0x4/0x10
[<
ffffffff81026b98>] ? finish_task_switch+0x45/0xc0
[<
ffffffff813abd1d>] ? retint_restore_args+0xe/0xe
[<
ffffffff8104bf17>] ? __init_kthread_worker+0x53/0x53
[<
ffffffff813ad6f0>] ? gs_change+0xb/0xb
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Wed, 5 Dec 2012 13:34:48 +0000 (13:34 +0000)]
FS-Cache: Limit the number of I/O error reports for a cache
Limit the number of I/O error reports for a cache to 1 to prevent massive
amounts of noise. After the first I/O error the cache is taken off line
automatically, so must be restarted to resume caching.
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Wed, 5 Dec 2012 13:34:47 +0000 (13:34 +0000)]
FS-Cache: Don't mask off the object event mask when printing it
Don't mask off the object event mask when printing it. That way it can be seen
if threre are bits set that shouldn't be.
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Wed, 5 Dec 2012 13:34:46 +0000 (13:34 +0000)]
FS-Cache: Initialise the object event mask with the calculated mask
Initialise the object event mask with the calculated mask rather than unmasking
undefined events also.
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Wed, 5 Dec 2012 13:34:46 +0000 (13:34 +0000)]
FS-Cache: Convert the object event ID #defines into an enum
Convert the fscache_object event IDs from #defines into an enum. Also add an
extra label to the enum to carry the event count and redefine the event mask
in terms of that.
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Wed, 5 Dec 2012 13:34:45 +0000 (13:34 +0000)]
CacheFiles: Add missing retrieval completions
CacheFiles is missing some calls to fscache_retrieval_complete() in the error
handling/collision paths of its reader functions.
This can be seen by the following assertion tripping in fscache_put_operation()
whereby the operation being destroyed is still in the in-progress state and has
not been cancelled or completed:
FS-Cache: Assertion failed
3 == 5 is false
------------[ cut here ]------------
kernel BUG at fs/fscache/operation.c:408!
invalid opcode: 0000 [#1] SMP
CPU 2
Modules linked in: xfs ioatdma dca loop joydev evdev
psmouse dcdbas pcspkr serio_raw i5000_edac edac_core i5k_amb shpchp
pci_hotplug sg sr_mod]
Pid: 8062, comm: httpd Not tainted 3.1.0-rc8 #1 Dell Inc. PowerEdge 1950/0DT097
RIP: 0010:[<
ffffffff81197b24>] [<
ffffffff81197b24>] fscache_put_operation+0x304/0x330
RSP: 0018:
ffff880062f739d8 EFLAGS:
00010296
RAX:
0000000000000025 RBX:
ffff8800c5122e84 RCX:
ffffffff81ddf040
RDX:
00000000ffffffff RSI:
0000000000000082 RDI:
ffffffff81ddef30
RBP:
ffff880062f739f8 R08:
0000000000000005 R09:
0000000000000000
R10:
0000000000000000 R11:
0000000000000003 R12:
ffff8800c5122e40
R13:
ffff880037a2cd20 R14:
ffff880087c7a058 R15:
ffff880087c7a000
FS:
00007f63dcf636e0(0000) GS:
ffff88022fc80000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
00007f0c0a91f000 CR3:
0000000062ec2000 CR4:
00000000000006e0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000ffff0ff0 DR7:
0000000000000400
Process httpd (pid: 8062, threadinfo
ffff880062f72000, task
ffff880087e58000)
Stack:
ffff880062f73bf8 0000000000000000 ffff880062f73bf8 ffff880037a2cd20
ffff880062f73a68 ffffffff8119aa7e ffff88006540e000 ffff880062f73ad4
ffff88008e9a4308 ffff880037a2cd20 ffff880062f73a48 ffff8800c5122e40
Call Trace:
[<
ffffffff8119aa7e>] __fscache_read_or_alloc_pages+0x1fe/0x530
[<
ffffffff81250780>] __nfs_readpages_from_fscache+0x70/0x1c0
[<
ffffffff8123142a>] nfs_readpages+0xca/0x1e0
[<
ffffffff815f3c06>] ? rpc_do_put_task+0x36/0x50
[<
ffffffff8122755b>] ? alloc_nfs_open_context+0x4b/0x110
[<
ffffffff815ecd1a>] ? rpc_call_sync+0x5a/0x70
[<
ffffffff810e7e9a>] __do_page_cache_readahead+0x1ca/0x270
[<
ffffffff810e7f61>] ra_submit+0x21/0x30
[<
ffffffff810e818d>] ondemand_readahead+0x11d/0x250
[<
ffffffff810e83b6>] page_cache_sync_readahead+0x36/0x60
[<
ffffffff810dffa4>] generic_file_aio_read+0x454/0x770
[<
ffffffff81224ce1>] nfs_file_read+0xe1/0x130
[<
ffffffff81121bd9>] do_sync_read+0xd9/0x120
[<
ffffffff8114088f>] ? mntput+0x1f/0x40
[<
ffffffff811238cb>] ? fput+0x1cb/0x260
[<
ffffffff81122938>] vfs_read+0xc8/0x180
[<
ffffffff81122af5>] sys_read+0x55/0x90
Reported-by: Mark Moseley <moseleymark@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Thu, 20 Dec 2012 21:52:38 +0000 (21:52 +0000)]
NFS: Use FS-Cache invalidation
Use the new FS-Cache invalidation facility from NFS to deal with foreign
changes being detected on the server rather than attempting to retire the old
cookie and get a new one.
The problem with the old method was that NFS did not wait for all outstanding
storage and retrieval ops on the cache to complete. There was no automatic
wait between the calls to ->readpages() and calls to invalidate_inode_pages2()
as the latter can only wait on locked pages that have been added to the
pagecache (which they haven't yet on entry to ->readpages()).
This was leading to oopses like the one below when an outstanding read got cut
off from its cookie by a premature release.
BUG: unable to handle kernel NULL pointer dereference at
00000000000000a8
IP: [<
ffffffffa0075118>] __fscache_read_or_alloc_pages+0x1dd/0x315 [fscache]
PGD
15889067 PUD
15890067 PMD 0
Oops: 0000 [#1] SMP
CPU 0
Modules linked in: cachefiles nfs fscache auth_rpcgss nfs_acl lockd sunrpc
Pid: 4544, comm: tar Not tainted 3.1.0-rc4-fsdevel+ #1064 /DG965RY
RIP: 0010:[<
ffffffffa0075118>] [<
ffffffffa0075118>] __fscache_read_or_alloc_pages+0x1dd/0x315 [fscache]
RSP: 0018:
ffff8800158799e8 EFLAGS:
00010246
RAX:
0000000000000000 RBX:
ffff8800070d41e0 RCX:
ffff8800083dc1b0
RDX:
0000000000000000 RSI:
ffff880015879960 RDI:
ffff88003e627b90
RBP:
ffff880015879a28 R08:
0000000000000002 R09:
0000000000000002
R10:
0000000000000001 R11:
ffff880015879950 R12:
ffff880015879aa4
R13:
0000000000000000 R14:
ffff8800083dc158 R15:
ffff880015879be8
FS:
00007f671e9d87c0(0000) GS:
ffff88003bc00000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
000000008005003b
CR2:
00000000000000a8 CR3:
000000001587f000 CR4:
00000000000006f0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000ffff0ff0 DR7:
0000000000000400
Process tar (pid: 4544, threadinfo
ffff880015878000, task
ffff880015875040)
Stack:
ffffffffa00b1759 ffff8800070dc158 ffff8800000213da ffff88002a286508
ffff880015879aa4 ffff880015879be8 0000000000000001 ffff88002a2866e8
ffff880015879a88 ffffffffa00b20be 00000000000200da ffff880015875040
Call Trace:
[<
ffffffffa00b1759>] ? nfs_fscache_wait_bit+0xd/0xd [nfs]
[<
ffffffffa00b20be>] __nfs_readpages_from_fscache+0x7e/0x13f [nfs]
[<
ffffffff81095fe7>] ? __alloc_pages_nodemask+0x156/0x662
[<
ffffffffa0098763>] nfs_readpages+0xee/0x187 [nfs]
[<
ffffffff81098a5e>] __do_page_cache_readahead+0x1be/0x267
[<
ffffffff81098942>] ? __do_page_cache_readahead+0xa2/0x267
[<
ffffffff81098d7b>] ra_submit+0x1c/0x20
[<
ffffffff8109900a>] ondemand_readahead+0x28b/0x29a
[<
ffffffff810990ce>] page_cache_sync_readahead+0x38/0x3a
[<
ffffffff81091d8a>] generic_file_aio_read+0x2ab/0x67e
[<
ffffffffa008cfbe>] nfs_file_read+0xa4/0xc9 [nfs]
[<
ffffffff810c22c4>] do_sync_read+0xba/0xfa
[<
ffffffff810a62c9>] ? might_fault+0x4e/0x9e
[<
ffffffff81177a47>] ? security_file_permission+0x7b/0x84
[<
ffffffff810c25dd>] ? rw_verify_area+0xab/0xc8
[<
ffffffff810c29a4>] vfs_read+0xaa/0x13a
[<
ffffffff810c2a79>] sys_read+0x45/0x6c
[<
ffffffff813ac37b>] system_call_fastpath+0x16/0x1b
Reported-by: Mark Moseley <moseleymark@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Thu, 20 Dec 2012 21:52:36 +0000 (21:52 +0000)]
CacheFiles: Implement invalidation
Implement invalidation for CacheFiles. This is in two parts:
(1) Provide an invalidation method (which just truncates the backing file).
(2) Abort attempts to copy anything read from the backing file whilst
invalidation is in progress.
Question: CacheFiles uses truncation in a couple of places. It has been using
notify_change() rather than sys_truncate() or something similar. This means
it bypasses a bunch of checks and suchlike that it possibly should be making
(security, file locking, lease breaking, vfsmount write). Should it be using
vfs_truncate() as added by a preceding patch or should it use notify_write()
and assume that anyone poking around in the cache files on disk gets
everything they deserve?
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Thu, 20 Dec 2012 21:52:36 +0000 (21:52 +0000)]
VFS: Make more complete truncate operation available to CacheFiles
Make a more complete truncate operation available to CacheFiles (including
security checks and suchlike) so that it can use this to clear invalidated
cache files.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Linus Torvalds [Thu, 20 Dec 2012 22:04:11 +0000 (14:04 -0800)]
Merge branch 'for-3.8' of git://linux-nfs.org/~bfields/linux
Pull nfsd update from Bruce Fields:
"Included this time:
- more nfsd containerization work from Stanislav Kinsbursky: we're
not quite there yet, but should be by 3.9.
- NFSv4.1 progress: implementation of basic backchannel security
negotiation and the mandatory BACKCHANNEL_CTL operation. See
http://wiki.linux-nfs.org/wiki/index.php/Server_4.0_and_4.1_issues
for remaining TODO's
- Fixes for some bugs that could be triggered by unusual compounds.
Our xdr code wasn't designed with v4 compounds in mind, and it
shows. A more thorough rewrite is still a todo.
- If you've ever seen "RPC: multiple fragments per record not
supported" logged while using some sort of odd userland NFS client,
that should now be fixed.
- Further work from Jeff Layton on our mechanism for storing
information about NFSv4 clients across reboots.
- Further work from Bryan Schumaker on his fault-injection mechanism
(which allows us to discard selective NFSv4 state, to excercise
rarely-taken recovery code paths in the client.)
- The usual mix of miscellaneous bugs and cleanup.
Thanks to everyone who tested or contributed this cycle."
* 'for-3.8' of git://linux-nfs.org/~bfields/linux: (111 commits)
nfsd4: don't leave freed stateid hashed
nfsd4: free_stateid can use the current stateid
nfsd4: cleanup: replace rq_resused count by rq_next_page pointer
nfsd: warn on odd reply state in nfsd_vfs_read
nfsd4: fix oops on unusual readlike compound
nfsd4: disable zero-copy on non-final read ops
svcrpc: fix some printks
NFSD: Correct the size calculation in fault_inject_write
NFSD: Pass correct buffer size to rpc_ntop
nfsd: pass proper net to nfsd_destroy() from NFSd kthreads
nfsd: simplify service shutdown
nfsd: replace boolean nfsd_up flag by users counter
nfsd: simplify NFSv4 state init and shutdown
nfsd: introduce helpers for generic resources init and shutdown
nfsd: make NFSd service structure allocated per net
nfsd: make NFSd service boot time per-net
nfsd: per-net NFSd up flag introduced
nfsd: move per-net startup code to separated function
nfsd: pass net to __write_ports() and down
nfsd: pass net to nfsd_set_nrthreads()
...
David Howells [Thu, 20 Dec 2012 21:52:36 +0000 (21:52 +0000)]
FS-Cache: Provide proper invalidation
Provide a proper invalidation method rather than relying on the netfs retiring
the cookie it has and getting a new one. The problem with this is that isn't
easy for the netfs to make sure that it has completed/cancelled all its
outstanding storage and retrieval operations on the cookie it is retiring.
Instead, have the cache provide an invalidation method that will cancel or wait
for all currently outstanding operations before invalidating the cache, and
will cause new operations to queue up behind that. Whilst invalidation is in
progress, some requests will be rejected until the cache can stack a barrier on
the operation queue to cause new operations to be deferred behind it.
Signed-off-by: David Howells <dhowells@redhat.com>