Tejun Heo [Sat, 7 Jul 2012 23:08:18 +0000 (16:08 -0700)]
cgroup: fix cgroup hierarchy umount race
48ddbe1946 "cgroup: make css->refcnt clearing on cgroup removal
optional" allowed a css to linger after the associated cgroup is
removed. As a css holds a reference on the cgroup's dentry, it means
that cgroup dentries may linger for a while.
Destroying a superblock which has dentries with positive refcnts is a
critical bug and triggers BUG() in vfs code. As each cgroup dentry
holds an s_active reference, any lingering cgroup has both its dentry
and the superblock pinned and thus preventing premature release of
superblock.
Unfortunately, after
48ddbe1946, there's a small window while
releasing a cgroup which is directly under the root of the hierarchy.
When a cgroup directory is released, vfs layer first deletes the
corresponding dentry and then invokes dput() on the parent, which may
recurse further, so when a cgroup directly below root cgroup is
released, the cgroup is first destroyed - which releases the s_active
it was holding - and then the dentry for the root cgroup is dput().
This creates a window where the root dentry's refcnt isn't zero but
superblock's s_active is. If umount happens before or during this
window, vfs will see the root dentry with non-zero refcnt and trigger
BUG().
Before
48ddbe1946, this problem didn't exist because the last dentry
reference was guaranteed to be put synchronously from rmdir(2)
invocation which holds s_active around the whole process.
Fix it by holding an extra superblock->s_active reference across
dput() from css release, which is the dput() path added by
48ddbe1946
and the only one which doesn't hold an extra s_active ref across the
final cgroup dput().
Signed-off-by: Tejun Heo <tj@kernel.org>
LKML-Reference: <
4FEEA5CB.
8070809@huawei.com>
Reported-by: shyju pv <shyju.pv@huawei.com>
Tested-by: shyju pv <shyju.pv@huawei.com>
Cc: Sasha Levin <levinsasha928@gmail.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Tejun Heo [Sat, 7 Jul 2012 22:55:47 +0000 (15:55 -0700)]
Revert "cgroup: superblock can't be released with active dentries"
This reverts commit
fa980ca87d15bb8a1317853f257a505990f3ffde. The
commit was an attempt to fix a race condition where a cgroup hierarchy
may be unmounted with positive dentry reference on root cgroup. While
the commit made the race condition slightly more difficult to trigger,
the race was still there and could be reliably triggered using a
different test case.
Revert the incorrect fix. The next commit will describe the race and
fix it correctly.
Signed-off-by: Tejun Heo <tj@kernel.org>
LKML-Reference: <
4FEEA5CB.
8070809@huawei.com>
Reported-by: shyju pv <shyju.pv@huawei.com>
Cc: Sasha Levin <levinsasha928@gmail.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Linus Torvalds [Sat, 7 Jul 2012 18:20:59 +0000 (11:20 -0700)]
Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm
Pull ARM fixes from Russell King:
"Last merge window, we had some updates from Al cleaning up the signal
restart handling. These have caused some problems on ARM, and while
Al has some fixes, we have some concerns with Al's patches but we've
been unsuccesful with discussing this.
We have got to the point where we need to do something, and we've
decided that the best solution is to revert the appropriate commits
until Al is able to reply to us.
Also included here are four patches to fix warnings that I've noticed
in my build system, and one fix for kprobes test code."
* 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
ARM: fix warning caused by wrongly typed arm_dma_limit
ARM: fix warnings about atomic64_read
ARM: 7440/1: kprobes: only test 'sub pc, pc, #1b-2b+8-2' on ARMv6
ARM: 7441/1: perf: return -EOPNOTSUPP if requested mode exclusion is unavailable
ARM: 7443/1: Revert "new way of handling ERESTART_RESTARTBLOCK"
ARM: 7442/1: Revert "remove unused restart trampoline"
ARM: fix set_domain() macro
ARM: fix mach-versatile/pci.c warning
Linus Torvalds [Fri, 6 Jul 2012 22:32:18 +0000 (15:32 -0700)]
Merge tag 'ecryptfs-3.5-rc6-fixes' of git://git./linux/kernel/git/tyhicks/ecryptfs
Pull eCryptfs fixes from Tyler Hicks:
"Fixes an incorrect access mode check when preparing to open a file in
the lower filesystem. This isn't an urgent fix, but it is simple and
the check was obviously incorrect.
Also fixes a couple important bugs in the eCryptfs miscdev interface.
These changes are low risk due to the small number of users that use
the miscdev interface. I was able to keep the changes minimal and I
have some cleaner, more complete changes queued up for the next merge
window that will build on these patches."
* tag 'ecryptfs-3.5-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
eCryptfs: Gracefully refuse miscdev file ops on inherited/passed files
eCryptfs: Fix lockdep warning in miscdev operations
eCryptfs: Properly check for O_RDONLY flag before doing privileged open
Linus Torvalds [Fri, 6 Jul 2012 20:59:50 +0000 (13:59 -0700)]
Merge git://git./linux/kernel/git/nab/target-pending
Pull target fixes from Nicholas Bellinger:
"Two minor target fixes. There is really nothing exciting and/or
controversial this time around.
There's one fix from MDR for a RCU debug warning message within tcm_fc
code (CC'ed to stable), and a small AC fix for qla_target.c based upon
a recent Coverity static report.
Also, there is one other outstanding virtio-scsi LUN scanning bugfix
that has been uncovered with the in-flight tcm_vhost driver over the
last days, and that needs to make it into 3.5 final too. This patch
has been posted to linux-scsi again here:
http://marc.info/?l=linux-scsi&m=
134160609212542&w=2
and I've asked James to include it in his next PULL request."
* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
qla2xxx: print the right array elements in qlt_async_event
tcm_fc: Resolve suspicious RCU usage warnings
Tyler Hicks [Mon, 11 Jun 2012 16:24:11 +0000 (09:24 -0700)]
eCryptfs: Gracefully refuse miscdev file ops on inherited/passed files
File operations on /dev/ecryptfs would BUG() when the operations were
performed by processes other than the process that originally opened the
file. This could happen with open files inherited after fork() or file
descriptors passed through IPC mechanisms. Rather than calling BUG(), an
error code can be safely returned in most situations.
In ecryptfs_miscdev_release(), eCryptfs still needs to handle the
release even if the last file reference is being held by a process that
didn't originally open the file. ecryptfs_find_daemon_by_euid() will not
be successful, so a pointer to the daemon is stored in the file's
private_data. The private_data pointer is initialized when the miscdev
file is opened and only used when the file is released.
https://launchpad.net/bugs/994247
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Tested-by: Sasha Levin <levinsasha928@gmail.com>
Alan Cox [Wed, 4 Jul 2012 15:35:35 +0000 (16:35 +0100)]
qla2xxx: print the right array elements in qlt_async_event
Based upon Alan's patch from Coverity scan id 793583, these debug
messages in qlt_async_event() should be starting from byte 0, which is
always the Asynchronous Event Status Code from the parent switch statement.
Also, rename reason_code -> login_code following the language used in
2500 FW spec for Port Database Changed (0x8014) -> Port Database Changed
Event Mailbox Register for mailbox[2].
Signed-off-by: Alan Cox <alan@linux.intel.com>
Cc: Chad Dupuis <chad.dupuis@qlogic.com>
Cc: Giridhar Malavali <giridhar.malavali@qlogic.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Mark Rustad [Tue, 26 Jun 2012 22:57:30 +0000 (15:57 -0700)]
tcm_fc: Resolve suspicious RCU usage warnings
Use rcu_dereference_protected to tell rcu that the ft_lport_lock
is held during ft_lport_create. This resolved "suspicious RCU usage"
warnings when debugging options are turned on.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Linus Torvalds [Fri, 6 Jul 2012 17:34:48 +0000 (10:34 -0700)]
Merge tag 'for-linus-
20120706' of git://git.infradead.org/linux-mtd
Pull two MTD fixes from David Woodhouse:
- Fix a logic error in OLPC CAFÉ NAND ready() function.
- Fix regression due to bitflip handling changes.
* tag 'for-linus-
20120706' of git://git.infradead.org/linux-mtd:
mtd: cafe_nand: fix an & vs | mistake
mtd: nand: initialize bitflip_threshold prior to BBT scanning
Andy Lutomirski [Thu, 5 Jul 2012 23:00:11 +0000 (16:00 -0700)]
mm: Hold a file reference in madvise_remove
Otherwise the code races with munmap (causing a use-after-free
of the vma) or with close (causing a use-after-free of the struct
file).
The bug was introduced by commit
90ed52ebe481 ("[PATCH] holepunch: fix
mmap_sem i_mutex deadlock")
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Miklos Szeredi <mszeredi@suse.cz>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: stable@vger.kernel.org
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 6 Jul 2012 17:04:39 +0000 (10:04 -0700)]
Merge branch 'fixes' of git://git./linux/kernel/git/jlbec/ocfs2
Pull ocfs2 fixes from Joel Becker.
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
aio: make kiocb->private NUll in init_sync_kiocb()
ocfs2: Fix bogus error message from ocfs2_global_read_info
ocfs2: for SEEK_DATA/SEEK_HOLE, return internal error unchanged if ocfs2_get_clusters_nocache() or ocfs2_inode_lock() call failed.
ocfs2: use spinlock irqsave for downconvert lock.patch
ocfs2: Misplaced parens in unlikley
ocfs2: clear unaligned io flag when dio fails
Linus Torvalds [Fri, 6 Jul 2012 17:02:12 +0000 (10:02 -0700)]
Merge git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French.
* git://git.samba.org/sfrench/cifs-2.6:
cifs: when server doesn't set CAP_LARGE_READ_X, cap default rsize at MaxBufferSize
cifs: fix parsing of password mount option
Linus Torvalds [Fri, 6 Jul 2012 16:50:39 +0000 (09:50 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input
Pull input layer fixes from Dmitry Torokhov:
"Two fixes for regressions in Wacom driver and fixes for drivers using
threaded IRQ framework without specifying IRQF_ONESHOT."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: request threaded-only IRQs with IRQF_ONESHOT
Input: wacom - don't retrieve touch_max when it is predefined
Input: wacom - fix retrieving touch_max bug
Input: fix input.h kernel-doc warning
Dan Carpenter [Sat, 9 Jun 2012 16:08:25 +0000 (19:08 +0300)]
mtd: cafe_nand: fix an & vs | mistake
The intent here was clearly to set result to true if the 0x40000000 flag
was set. But instead there was a | vs & typo and we always set result
to true.
Artem: check the spec at
wiki.laptop.org/images/5/5c/88ALP01_Datasheet_July_2007.pdf
and this fix looks correct.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Linus Torvalds [Thu, 5 Jul 2012 20:20:02 +0000 (13:20 -0700)]
Merge tag 'fixes-for-linus' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Arnd Bergmann:
"Small fixes on multiple ARM platforms
- A build regression from a previous fix on dove and mv78xx0
- Two fixes for recently (3.5-rc1) changed mmp/pxa code
- multiple omap2+ bug fixes
- two trivial fixes for i.MX
- one v3.5 regression for mxs"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: apx4devkit: fix FEC enabling PHY clock
ARM: OMAP2+: hwmod data: Fix wrong McBSP clock alias on OMAP4
ARM: OMAP4: hwmod data: temporarily comment out data for the usb_host_fs and aess IP blocks
ARM: Orion: Fix WDT compile for Dove and MV78xx0
ARM: mmp: remove mach/gpio-pxa.h
ARM: imx: assert SCC gate stays enabled
ARM: OMAP4: TWL6030: ensure sys_nirq1 is mux'd and wakeup enabled
ARM: OMAP2: Overo: init I2C before MMC to fix MMC suspend/resume failure
ARM: imx27_visstrim_m10: Do not include <asm/system.h>
ARM: pxa: hx4700: Fix basic suspend/resume
Linus Torvalds [Thu, 5 Jul 2012 20:16:21 +0000 (13:16 -0700)]
Merge git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fix from Marcelo Tosatti:
"Memory leak and oops on the x86 mmu code, and sanitization of the
KVM_IRQFD ioctl."
* git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: MMU: fix shrinking page from the empty mmu
KVM: fix fault page leak
KVM: Sanitize KVM_IRQFD flags
KVM: Add missing KVM_IRQFD API documentation
KVM: Pass kvm_irqfd to functions
Linus Torvalds [Thu, 5 Jul 2012 20:09:37 +0000 (13:09 -0700)]
Merge branch 'fixes-for-3.5' of git://git./linux/kernel/git/cooloney/linux-leds
Pull leds fix from Bryan Wu:
"Fix for heartbeat led trigger driver"
* 'fixes-for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds:
leds: heartbeat: fix bug on panic
Linus Torvalds [Thu, 5 Jul 2012 20:06:25 +0000 (13:06 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/mason/linux-btrfs
Pull btrfs updates from Chris Mason:
"I held off on my rc5 pull because I hit an oops during log recovery
after a crash. I wanted to make sure it wasn't a regression because
we have some logging fixes in here.
It turns out that a commit during the merge window just made it much
more likely to trigger directory logging instead of full commits,
which exposed an old bug.
The new backref walking code got some additional fixes. This should
be the final set of them.
Josef fixed up a corner where our O_DIRECT writes and buffered reads
could expose old file contents (not stale, just not the most recent).
He and Liu Bo fixed crashes during tree log recover as well.
Ilya fixed errors while we resume disk balancing operations on
readonly mounts."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: run delayed directory updates during log replay
Btrfs: hold a ref on the inode during writepages
Btrfs: fix tree log remove space corner case
Btrfs: fix wrong check during log recovery
Btrfs: use _IOR for BTRFS_IOC_SUBVOL_GETFLAGS
Btrfs: resume balance on rw (re)mounts properly
Btrfs: restore restriper state on all mounts
Btrfs: fix dio write vs buffered read race
Btrfs: don't count I/O statistic read errors for missing devices
Btrfs: resolve tree mod log locking issue in btrfs_next_leaf
Btrfs: fix tree mod log rewind of ADD operations
Btrfs: leave critical region in btrfs_find_all_roots as soon as possible
Btrfs: always put insert_ptr modifications into the tree mod log
Btrfs: fix tree mod log for root replacements at leaf level
Btrfs: support root level changes in __resolve_indirect_ref
Btrfs: avoid waiting for delayed refs when we must not
Linus Torvalds [Thu, 5 Jul 2012 18:53:47 +0000 (11:53 -0700)]
Merge branch 'fixes-for-grant' of git://sources.calxeda.com/kernel/linux
Pull DT fixes from Rob Herring:
"Mainly some documentation updates and 2 fixes:
- An export symbol fix for of_platform_populate from Stephen W.
- A fix for the order compatible entries are matched to ensure the
first compatible string is matched when there are multiple matches."
Normally these would go through Grant Likely (thus the "fixes-for-grant"
branch name), but Grant is in the middle of moving to Scotland, and is
practically offline until sometime in August. So pull directly from Rob.
* 'fixes-for-grant' of git://sources.calxeda.com/kernel/linux:
of: match by compatible property first
dt: mc13xxx.txt: Fix gpio number assignment
dt: fsl-fec.txt: Fix gpio number assignment
dt: fsl-mma8450.txt: Add missing 'reg' description
dt: fsl-imx-esdhc.txt: Fix gpio number assignment
dt: fsl-imx-cspi.txt: Fix comment about GPIOs used for chip selects
of: Add Avionic Design vendor prefix
of: export of_platform_populate()
Russell King [Thu, 5 Jul 2012 12:11:31 +0000 (13:11 +0100)]
ARM: fix warning caused by wrongly typed arm_dma_limit
arch/arm/mm/init.c: In function 'arm_memblock_init':
arch/arm/mm/init.c:380: warning: comparison of distinct pointer types lacks a cast
by fixing the typecast in its definition when DMA_ZONE is disabled.
This was missed in
4986e5c7c (ARM: mm: fix type of the arm_dma_limit
global variable).
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Thu, 5 Jul 2012 12:06:32 +0000 (13:06 +0100)]
ARM: fix warnings about atomic64_read
Fix:
net/netfilter/xt_connbytes.c: In function 'connbytes_mt':
net/netfilter/xt_connbytes.c:43: warning: passing argument 1 of 'atomic64_read' discards qualifiers from pointer target type
...
by adding the missing const.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Rabin Vincent [Wed, 4 Jul 2012 06:37:37 +0000 (07:37 +0100)]
ARM: 7440/1: kprobes: only test 'sub pc, pc, #1b-2b+8-2' on ARMv6
'sub pc, pc, #1b-2b+8-2' results in address<1:0> == '10'.
sub pc, pc, #const (== ADR pc, #const) performs an interworking branch
(BXWritePC()) on ARMv7+ and a simple branch (BranchWritePC()) on earlier
versions.
In ARM state, BXWritePC() is UNPREDICTABLE when address<1:0> == '10'.
In ARM state on ARMv6+, BranchWritePC() ignores address<1:0>. Before
ARMv6, BranchWritePC() is UNPREDICTABLE if address<1:0> != '00'
So the instruction is UNPREDICTABLE both before and after v6.
Acked-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Rabin Vincent <rabin.vincent@stericsson.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Arnd Bergmann [Thu, 5 Jul 2012 10:16:13 +0000 (12:16 +0200)]
Merge tag 'omap-fixes-for-v3.5-rc5' of git://git./linux/kernel/git/tmlind/linux-omap into fixes
PM related fixes for omaps mostly to get suspend/resume
working again.
* tag 'omap-fixes-for-v3.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: OMAP2+: hwmod data: Fix wrong McBSP clock alias on OMAP4
ARM: OMAP4: hwmod data: temporarily comment out data for the usb_host_fs and aess IP blocks
ARM: OMAP4: TWL6030: ensure sys_nirq1 is mux'd and wakeup enabled
ARM: OMAP2: Overo: init I2C before MMC to fix MMC suspend/resume failure
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Arnd Bergmann [Thu, 5 Jul 2012 09:06:36 +0000 (11:06 +0200)]
Merge branch 'mxs/fixes-for-3.5' of git://git.linaro.org/people/shawnguo/linux-2.6 into fixes
* 'mxs/fixes-for-3.5' of git://git.linaro.org/people/shawnguo/linux-2.6:
ARM: apx4devkit: fix FEC enabling PHY clock
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Will Deacon [Wed, 4 Jul 2012 17:15:42 +0000 (18:15 +0100)]
ARM: 7441/1: perf: return -EOPNOTSUPP if requested mode exclusion is unavailable
We currently return -EPERM if the user requests mode exclusion that is
not supported by the CPU. This looks pretty confusing from userspace
and is inconsistent with other architectures (ppc, x86).
This patch returns -EOPNOTSUPP instead.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Will Deacon [Wed, 4 Jul 2012 17:17:16 +0000 (18:17 +0100)]
ARM: 7443/1: Revert "new way of handling ERESTART_RESTARTBLOCK"
This reverts commit
6b5c8045ecc7e726cdaa2a9d9c8e5008050e1252.
Conflicts:
arch/arm/kernel/ptrace.c
The new syscall restarting code can lead to problems if we take an
interrupt in userspace just before restarting the svc instruction. If
a signal is delivered when returning from the interrupt, the
TIF_SYSCALL_RESTARTSYS will remain set and cause any syscalls executed
from the signal handler to be treated as a restart of the previously
interrupted system call. This includes the final sigreturn call, meaning
that we may fail to exit from the signal context. Furthermore, if a
system call made from the signal handler requires a restart via the
restart_block, it is possible to clear the thread flag and fail to
restart the originally interrupted system call.
The right solution to this problem is to perform the restarting in the
kernel, avoiding the possibility of handling a further signal before the
restart is complete. Since we're almost at -rc6, let's revert the new
method for now and aim for in-kernel restarting at a later date.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Will Deacon [Wed, 4 Jul 2012 17:16:30 +0000 (18:16 +0100)]
ARM: 7442/1: Revert "remove unused restart trampoline"
This reverts commit
fa18484d0947b976a769d15c83c50617493c81c1.
We need the restart trampoline back so that we can revert a related
problematic patch
6b5c8045ecc7e726cdaa2a9d9c8e5008050e1252 ("arm: new
way of handling ERESTART_RESTARTBLOCK").
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Wed, 4 Jul 2012 16:05:28 +0000 (17:05 +0100)]
ARM: fix set_domain() macro
Avoid polluting drivers with a set_domain() macro, which interferes with
structure member names:
drivers/net/wireless/ath/ath9k/dfs_pattern_detector.c:294:33: error: macro "set_domain" passed 2 arguments, but takes just 1
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Junxiao Bi [Wed, 27 Jun 2012 09:09:54 +0000 (17:09 +0800)]
aio: make kiocb->private NUll in init_sync_kiocb()
Ocfs2 uses kiocb.*private as a flag of unsigned long size. In
commit
a11f7e6 ocfs2: serialize unaligned aio, the unaligned
io flag is involved in it to serialize the unaligned aio. As
*private is not initialized in init_sync_kiocb() of do_sync_write(),
this unaligned io flag may be unexpectly set in an aligned dio.
And this will cause OCFS2_I(inode)->ip_unaligned_aio decreased
to -1 in ocfs2_dio_end_io(), thus the following unaligned dio
will hang forever at ocfs2_aiodio_wait() in ocfs2_file_aio_write().
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: stable@vger.kernel.org
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
Tony Lindgren [Thu, 5 Jul 2012 08:12:08 +0000 (01:12 -0700)]
Merge tag 'omap-fixes-b-for-3.5rc' of git://git./linux/kernel/git/pjw/omap-pending into fixes
A few more OMAP fixes for 3.5-rc. These fix some bugs with power
management and McBSP.
Lauri Hintsala [Thu, 5 Jul 2012 07:31:36 +0000 (10:31 +0300)]
ARM: apx4devkit: fix FEC enabling PHY clock
Ethernet stopped to work after mxs clk framework change.
Signed-off-by: Lauri Hintsala <lauri.hintsala@bluegiga.com>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Lars-Peter Clausen [Wed, 4 Jul 2012 20:02:56 +0000 (13:02 -0700)]
Input: request threaded-only IRQs with IRQF_ONESHOT
Since commit
1c6c69525b ("genirq: Reject bogus threaded irq requests")
threaded IRQs without a primary handler need to be requested with
IRQF_ONESHOT, otherwise the request will fail. This patch adds the
IRQF_ONESHOT to input drivers where it is missing. Not modified by
this patch are those drivers where the requested IRQ will always be a
nested IRQ (e.g. because it's part of an MFD), since for this special
case IRQF_ONESHOT is not required to be specified when requesting the
IRQ.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Russell King [Wed, 4 Jul 2012 16:04:57 +0000 (17:04 +0100)]
ARM: fix mach-versatile/pci.c warning
arch/arm/mach-versatile/pci.c: In function 'versatile_map_irq':
arch/arm/mach-versatile/pci.c:342: warning: unused variable 'devslot'
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Benoit Cousson [Wed, 4 Jul 2012 12:55:29 +0000 (06:55 -0600)]
ARM: OMAP2+: hwmod data: Fix wrong McBSP clock alias on OMAP4
The commit
503d0ea24d1d3dd3db95e5e0edd693da7a2a23eb
ARM: OMAP4: hwmod data: Add aliases for McBSP fclk clocks
added a wrong "prcm_clk" alias for PRCM clock whereas the McBSP
driver and previous OMAPs are using "prcm_fck".
It thus lead to the following warning.
[ 47.409729] omap-mcbsp: clks: could not clk_get() prcm_fck
Fix that by changing the opt_clk role to prcm_fck.
Reported-by: Misael Lopez Cruz <misael.lopez@ti.com>
Signed-off-by: Benoit Cousson <b-cousson@ti.com>
Cc: Peter Ujfalusi <peter.ujfalusi@ti.com>
Tested-by: Sebastien Guiriec <s-guiriec@ti.com>
Signed-off-by: Paul Walmsley <paul@pwsan.com>
Paul Walmsley [Wed, 4 Jul 2012 12:55:29 +0000 (06:55 -0600)]
ARM: OMAP4: hwmod data: temporarily comment out data for the usb_host_fs and aess IP blocks
The OMAP4 usb_host_fs (OHCI) and AESS IP blocks require some special
programming for them to enter idle. Without this programming, they
will prevent the rest of the chip from entering full chip idle.
To implement the idle programming cleanly, this will take some
coordination between maintainers. This is likely to take some time,
so it is probably best to leave this for 3.6 or 3.7. So, in the
meantime, prevent these IP blocks from being registered.
Later, once the appropriate support is available, this patch can be
reverted.
This second version comments out the IP block data since Benoît didn't
like removing it.
Signed-off-by: Paul Walmsley <paul@pwsan.com>
Cc: Benoît Cousson <b-cousson@ti.com>
Arnd Bergmann [Wed, 4 Jul 2012 11:49:58 +0000 (13:49 +0200)]
Merge branch 'fixes' of git://github.com/hzhuang1/linux into fixes
From Haojian Zhuang <haojian.zhuang@gmail.com>:
* 'fixes' of git://github.com/hzhuang1/linux:
ARM: mmp: remove mach/gpio-pxa.h
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Arnd Bergmann [Wed, 4 Jul 2012 11:46:05 +0000 (13:46 +0200)]
Merge tag 'v3.5-imx-fixes' of git://git.pengutronix.de/git/imx/linux-2.6 into fixes
From Sascha Hauer <s.hauer@pengutronix.de>:
ARM i.MX fixes for v3.5-rc5
* tag 'v3.5-imx-fixes' of git://git.pengutronix.de/git/imx/linux-2.6:
ARM: imx: assert SCC gate stays enabled
ARM: imx27_visstrim_m10: Do not include <asm/system.h>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Andrew Lunn [Fri, 29 Jun 2012 07:25:58 +0000 (09:25 +0200)]
ARM: Orion: Fix WDT compile for Dove and MV78xx0
Commit
0fa1f0609a0c1fe8b2be3c0089a2cb48f7fda521 (ARM: Orion: Fix
Virtual/Physical mixup with watchdog) broke the Dove & MV78xx0
build. Although these two SoC don't use the watchdog, the shared
platform code still needs to build. Add the necessary defines.
Cc: stable@vger.kernel.org
Reported-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Paul Bolle [Mon, 2 Jul 2012 21:40:14 +0000 (23:40 +0200)]
ARM: mmp: remove mach/gpio-pxa.h
Commit
157d2644cb0c1e71a18baaffca56d2b1d0ebf10f ("ARM: pxa: change gpio
to platform device") removed all includes of mach/gpio-pxa.h. It kept
this unused header in the tree. Using it can't work, as it itself
includes the non-existent header plat/gpio-pxa.h. This header can safely
be removed.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Acked-by: Eric Miao <eric.y.miao@gmail.com>
Signed-off-by: Haojian Zhuang <haojian.zhuang@gmail.com>
Alexander Holler [Tue, 3 Jul 2012 06:35:47 +0000 (14:35 +0800)]
leds: heartbeat: fix bug on panic
With commit
49dca5aebfdeadd4bf27b6cb4c60392147dc35a4 I introduced
a bug (visible if CONFIG_PROVE_RCU is enabled) which occures when a panic
has happened:
[ 1526.520230] ===============================
[ 1526.520230] [ INFO: suspicious RCU usage. ]
[ 1526.520230] 3.5.0-rc1+ #12 Not tainted
[ 1526.520230] -------------------------------
[ 1526.520230] /c/kernel-tests/mm/include/linux/rcupdate.h:436 Illegal context switch in RCU read-side critical section!
[ 1526.520230]
[ 1526.520230] other info that might help us debug this:
[ 1526.520230]
[ 1526.520230]
[ 1526.520230] rcu_scheduler_active = 1, debug_locks = 0
[ 1526.520230] 3 locks held by net.agent/3279:
[ 1526.520230] #0: (&mm->mmap_sem){++++++}, at: [<
ffffffff82f85962>] do_page_fault+0x193/0x390
[ 1526.520230] #1: (panic_lock){+.+...}, at: [<
ffffffff82ed2830>] panic+0x37/0x1d3
[ 1526.520230] #2: (rcu_read_lock){.+.+..}, at: [<
ffffffff810b9b28>] rcu_lock_acquire+0x0/0x29
[ 1526.520230]
[ 1526.520230] stack backtrace:
[ 1526.520230] Pid: 3279, comm: net.agent Not tainted 3.5.0-rc1+ #12
[ 1526.520230] Call Trace:
[ 1526.520230] [<
ffffffff810e1570>] lockdep_rcu_suspicious+0x109/0x112
[ 1526.520230] [<
ffffffff810bfe3a>] rcu_preempt_sleep_check+0x45/0x47
[ 1526.520230] [<
ffffffff810bfe5a>] __might_sleep+0x1e/0x19a
[ 1526.520230] [<
ffffffff82f8010e>] down_write+0x26/0x81
[ 1526.520230] [<
ffffffff8276a966>] led_trigger_unregister+0x1f/0x9c
[ 1526.520230] [<
ffffffff8276def5>] heartbeat_reboot_notifier+0x15/0x19
[ 1526.520230] [<
ffffffff82f85bf5>] notifier_call_chain+0x96/0xcd
[ 1526.520230] [<
ffffffff82f85cba>] __atomic_notifier_call_chain+0x8e/0xff
[ 1526.520230] [<
ffffffff81094b7c>] ? kmsg_dump+0x37/0x1eb
[ 1526.520230] [<
ffffffff82f85d3f>] atomic_notifier_call_chain+0x14/0x16
[ 1526.520230] [<
ffffffff82ed28e1>] panic+0xe8/0x1d3
[ 1526.520230] [<
ffffffff811473e2>] out_of_memory+0x15d/0x1d3
So in case of a panic, now just turn of the LED. Other approaches like
scheduling a work to unregister the trigger aren't working because there
isn't much which still runs after a panic occured (except timers).
Signed-off-by: Alexander Holler <holler@ahsoftware.de>
Signed-off-by: Bryan Wu <bryan.wu@canonical.com>
Uwe Kleine-König [Mon, 2 Jul 2012 09:30:34 +0000 (11:30 +0200)]
ARM: imx: assert SCC gate stays enabled
The SCC clock is needed in internal boot mode and so must keep enabled.
This same issue was fixed for the pre-common-clk code in commit
3d6e614 (mx35: Fix boot ROM hang in internal boot mode)
Cc: John Ogness <jogness@linutronix.de>
Cc: Hans J. Koch <hjk@hansjkoch.de>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Jan Kara [Fri, 10 Feb 2012 09:50:07 +0000 (10:50 +0100)]
ocfs2: Fix bogus error message from ocfs2_global_read_info
'status' variable in ocfs2_global_read_info() is always != 0 when leaving the
function because it happens to contain number of read bytes. Thus we always log
error message although everything is OK. Since all error cases properly call
mlog_errno() before jumping to out_err, there's no reason to call mlog_errno()
on exit at all. This is a fallout of
c1e8d35e (conversion of mlog_exit()
calls).
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
Jeff Liu [Thu, 9 Feb 2012 06:42:22 +0000 (14:42 +0800)]
ocfs2: for SEEK_DATA/SEEK_HOLE, return internal error unchanged if ocfs2_get_clusters_nocache() or ocfs2_inode_lock() call failed.
Hello,
Since ENXIO only means "offset beyond EOF" for SEEK_DATA/SEEK_HOLE,
Hence we should return the internal error unchanged if ocfs2_inode_lock() or
ocfs2_get_clusters_nocache() call failed rather than ENXIO.
Otherwise, it will confuse the user applications when they trying to understand the root cause.
Thanks Dave for pointing this out.
Thanks,
-Jeff
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
Srinivas Eeda [Tue, 31 Jan 2012 05:51:22 +0000 (21:51 -0800)]
ocfs2: use spinlock irqsave for downconvert lock.patch
When ocfs2dc thread holds dc_task_lock spinlock and receives soft IRQ it
deadlock itself trying to get same spinlock in ocfs2_wake_downconvert_thread.
Below is the stack snippet.
The patch disables interrupts when acquiring dc_task_lock spinlock.
ocfs2_wake_downconvert_thread
ocfs2_rw_unlock
ocfs2_dio_end_io
dio_complete
.....
bio_endio
req_bio_endio
....
scsi_io_completion
blk_done_softirq
__do_softirq
do_softirq
irq_exit
do_IRQ
ocfs2_downconvert_thread
[kthread]
Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
roel [Mon, 12 Dec 2011 22:40:51 +0000 (23:40 +0100)]
ocfs2: Misplaced parens in unlikley
Fix misplaced parentheses
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
Junxiao Bi [Wed, 27 Jun 2012 09:09:55 +0000 (17:09 +0800)]
ocfs2: clear unaligned io flag when dio fails
The unaligned io flag is set in the kiocb when an unaligned
dio is issued, it should be cleared even when the dio fails,
or it may affect the following io which are using the same
kiocb.
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Joel Becker <jlbec@evilplan.org>
Linus Torvalds [Wed, 4 Jul 2012 01:06:49 +0000 (18:06 -0700)]
Merge tag 'clk-fixes-for-linus' of git://git.linaro.org/people/mturquette/linux
Pull fix to common clk framework from Michael Turquette:
"The previous set of common clk fixes for -rc5 left an uninitialized
int which could lead to bad array indexing when switching clock
parents. The issue is fixed with a trivial change to the code flow in
__clk_set_parent."
* tag 'clk-fixes-for-linus' of git://git.linaro.org/people/mturquette/linux:
clk: fix parent validation in __clk_set_parent()
Linus Torvalds [Wed, 4 Jul 2012 01:05:35 +0000 (18:05 -0700)]
Merge tag 'md-3.5-fixes' of git://neil.brown.name/md
Pull raid10 build failure fix from NeilBrown:
"I really shouldn't do important things late in the day. It seems that
I get careless."
* tag 'md-3.5-fixes' of git://neil.brown.name/md:
md/raid10: fix careless build error
Linus Torvalds [Wed, 4 Jul 2012 01:01:54 +0000 (18:01 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking update from David Miller:
1) Fix RX sequence number handling in mwifiex, from Stone Piao.
2) Netfilter ipset mis-compares device names, fix from Florian
Westphal.
3) Fix route leak in ipv6 IPVS, from Eric Dumazet.
4) NFS fixes. Several buffer overflows in NCI layer from Dan
Rosenberg, and release sock OOPS'er fix from Eric Dumazet.
5) Fix WEP handling ath9k, we started using a bit the chip provides to
indicate undecrypted packets but that bit turns out to be unreliable
in certain configurations. Fix from Felix Fietkau.
6) Fix Kconfig dependency bug in wlcore, from Randy Dunlap.
7) New USB IDs for rtlwifi driver from Larry Finger.
8) Fix crashes in qmi_wwan usbnet driver when disconnecting, from Bjørn
Mork.
9) Gianfar driver programs coalescing settings properly in single queue
mode, but does not do so in multi-queue mode. Fix from Claudiu
Manoil.
10) Missing module.h include in davinci_cpdma.c, from Daniel Mack.
11) Need dummy handler for IPSET_CMD_NONE otherwise we crash in ipset if
we get this via nfnetlink, fix from Tomasz Bursztyka.
12) Missing RCU unlock in nfnetlink error path, also from Tomasz.
13) Fix divide by zero in igbvf when the user tries to set an RX
coalescing value of 0 usecs, from Mitch A Williams.
14) We can process SCTP sacks for the wrong transport, oops. Fix from
Neil Horman.
15) Remove hw IP payload checksumming from e1000e driver. This has zery
value in our stack, and turning it on creates a very unintuitive
restriction for users when using jumbo MTUs.
Specifically, when IP payload checksums are on you cannot use both
receive hashing offload and jumbo MTU. Fix from Bruce Allan.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (27 commits)
e1000e: remove use of IP payload checksum
sctp: be more restrictive in transport selection on bundled sacks
igbvf: fix divide by zero
netfilter: nfnetlink: fix missing rcu_read_unlock in nfnetlink_rcv_msg
netfilter: ipset: fix crash if IPSET_CMD_NONE command is sent
davinci_cpdma: include linux/module.h
gianfar: Fix RXICr/TXICr programming for multi-queue mode
net: Downgrade CAP_SYS_MODULE deprecated message from error to warning.
net: qmi_wwan: fix Oops while disconnecting
mwifiex: fix memory leak associated with IE manamgement
ath9k: fix panic caused by returning a descriptor we have queued for reuse
mac80211: correct behaviour on unrecognised action frames
ath9k: enable serialize_regmode for non-PCIE AR9287
rtlwifi: rtl8192cu: New USB IDs
NFC: Return from rawsock_release when sk is NULL
iwlwifi: fix activating inactive stations
wlcore: drop INET dependency
ath9k: fix dynamic WEP related regression
NFC: Prevent multiple buffer overflows in NCI
netfilter: update location of my trees
...
NeilBrown [Tue, 3 Jul 2012 23:35:35 +0000 (09:35 +1000)]
md/raid10: fix careless build error
build error introduced by commit
b357f04a67c2aeee8
That function doesn't get extra args until a later patch. Bother.
Reported-by: Fengguang Wu <wfg@linux.intel.com>
Reported-by: Simon Kirby <sim@hostway.ca>
Reported-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Tyler Hicks [Mon, 11 Jun 2012 17:21:34 +0000 (10:21 -0700)]
eCryptfs: Fix lockdep warning in miscdev operations
Don't grab the daemon mutex while holding the message context mutex.
Addresses this lockdep warning:
ecryptfsd/2141 is trying to acquire lock:
(&ecryptfs_msg_ctx_arr[i].mux){+.+.+.}, at: [<
ffffffffa029c213>] ecryptfs_miscdev_read+0x143/0x470 [ecryptfs]
but task is already holding lock:
(&(*daemon)->mux){+.+...}, at: [<
ffffffffa029c2ec>] ecryptfs_miscdev_read+0x21c/0x470 [ecryptfs]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&(*daemon)->mux){+.+...}:
[<
ffffffff810a3b8d>] lock_acquire+0x9d/0x220
[<
ffffffff8151c6da>] __mutex_lock_common+0x5a/0x4b0
[<
ffffffff8151cc64>] mutex_lock_nested+0x44/0x50
[<
ffffffffa029c5d7>] ecryptfs_send_miscdev+0x97/0x120 [ecryptfs]
[<
ffffffffa029b744>] ecryptfs_send_message+0x134/0x1e0 [ecryptfs]
[<
ffffffffa029a24e>] ecryptfs_generate_key_packet_set+0x2fe/0xa80 [ecryptfs]
[<
ffffffffa02960f8>] ecryptfs_write_metadata+0x108/0x250 [ecryptfs]
[<
ffffffffa0290f80>] ecryptfs_create+0x130/0x250 [ecryptfs]
[<
ffffffff811963a4>] vfs_create+0xb4/0x120
[<
ffffffff81197865>] do_last+0x8c5/0xa10
[<
ffffffff811998f9>] path_openat+0xd9/0x460
[<
ffffffff81199da2>] do_filp_open+0x42/0xa0
[<
ffffffff81187998>] do_sys_open+0xf8/0x1d0
[<
ffffffff81187a91>] sys_open+0x21/0x30
[<
ffffffff81527d69>] system_call_fastpath+0x16/0x1b
-> #0 (&ecryptfs_msg_ctx_arr[i].mux){+.+.+.}:
[<
ffffffff810a3418>] __lock_acquire+0x1bf8/0x1c50
[<
ffffffff810a3b8d>] lock_acquire+0x9d/0x220
[<
ffffffff8151c6da>] __mutex_lock_common+0x5a/0x4b0
[<
ffffffff8151cc64>] mutex_lock_nested+0x44/0x50
[<
ffffffffa029c213>] ecryptfs_miscdev_read+0x143/0x470 [ecryptfs]
[<
ffffffff811887d3>] vfs_read+0xb3/0x180
[<
ffffffff811888ed>] sys_read+0x4d/0x90
[<
ffffffff81527d69>] system_call_fastpath+0x16/0x1b
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Tyler Hicks [Tue, 12 Jun 2012 18:17:01 +0000 (11:17 -0700)]
eCryptfs: Properly check for O_RDONLY flag before doing privileged open
If the first attempt at opening the lower file read/write fails,
eCryptfs will retry using a privileged kthread. However, the privileged
retry should not happen if the lower file's inode is read-only because a
read/write open will still be unsuccessful.
The check for determining if the open should be retried was intended to
be based on the access mode of the lower file's open flags being
O_RDONLY, but the check was incorrectly performed. This would cause the
open to be retried by the privileged kthread, resulting in a second
failed open of the lower file. This patch corrects the check to
determine if the open request should be handled by the privileged
kthread.
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Dan Carpenter <dan.carpenter@oracle.com>
Linus Torvalds [Tue, 3 Jul 2012 22:51:22 +0000 (15:51 -0700)]
floppy: cancel any pending fd_timeouts before adding a new one
In commit
070ad7e793dc ("floppy: convert to delayed work and
single-thread wq") the 'fd_timeout' timer was converted to a delayed
work. However, the "del_timer(&fd_timeout)" was lost in the process,
and any previous pending timeouts would stay active when we then
re-queued the timeout.
This resulted in the floppy probe sequence having a (stale) 20s timeout
rather than the intended 3s timeout, and thus made booting with the
floppy driver (but no actual floppy controller) take much longer than it
should.
Of course, there's little reason for most people to compile the floppy
driver into the kernel at all, which is why most people never noticed.
Canceling the delayed work where we used to do the del_timer() fixes the
issue, and makes the floppy probing use the proper new timeout instead.
The three second timeout is still very wasteful, but better than the 20s
one.
Reported-and-tested-by: Andi Kleen <ak@linux.intel.com>
Reported-and-tested-by: Calvin Walton <calvin.walton@kepstin.ca>
Cc: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 3 Jul 2012 22:45:10 +0000 (15:45 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block bits from Jens Axboe:
"As vacation is coming up, thought I'd better get rid of my pending
changes in my for-linus branch for this iteration. It contains:
- Two patches for mtip32xx. Killing a non-compliant sysfs interface
and moving it to debugfs, where it belongs.
- A few patches from Asias. Two legit bug fixes, and one killing an
interface that is no longer in use.
- A patch from Jan, making the annoying partition ioctl warning a bit
less annoying, by restricting it to !CAP_SYS_RAWIO only.
- Three bug fixes for drbd from Lars Ellenberg.
- A fix for an old regression for umem, it hasn't really worked since
the plugging scheme was changed in 3.0.
- A few fixes from Tejun.
- A splice fix from Eric Dumazet, fixing an issue with pipe
resizing."
* 'for-linus' of git://git.kernel.dk/linux-block:
scsi: Silence unnecessary warnings about ioctl to partition
block: Drop dead function blk_abort_queue()
block: Mitigate lock unbalance caused by lock switching
block: Avoid missed wakeup in request waitqueue
umem: fix up unplugging
splice: fix racy pipe->buffers uses
drbd: fix null pointer dereference with on-congestion policy when diskless
drbd: fix list corruption by failing but already aborted reads
drbd: fix access of unallocated pages and kernel panic
xen/blkfront: Add WARN to deal with misbehaving backends.
blkcg: drop local variable @q from blkg_destroy()
mtip32xx: Create debugfs entries for troubleshooting
mtip32xx: Remove 'registers' and 'flags' from sysfs
blkcg: fix blkg_alloc() failure path
block: blkcg_policy_cfq shouldn't be used if !CONFIG_CFQ_GROUP_IOSCHED
block: fix return value on cfq_init() failure
mtip32xx: Remove version.h header file inclusion
xen/blkback: Copy id field when doing BLKIF_DISCARD.
Xiao Guangrong [Tue, 3 Jul 2012 06:31:43 +0000 (14:31 +0800)]
KVM: MMU: fix shrinking page from the empty mmu
Fix:
[ 3190.059226] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 3190.062224] IP: [<
ffffffffa02aac66>] mmu_page_zap_pte+0x10/0xa7 [kvm]
[ 3190.063760] PGD
104f50067 PUD
112bea067 PMD 0
[ 3190.065309] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 3190.066860] CPU 1
[ ...... ]
[ 3190.109629] Call Trace:
[ 3190.111342] [<
ffffffffa02aada6>] kvm_mmu_prepare_zap_page+0xa9/0x1fc [kvm]
[ 3190.113091] [<
ffffffffa02ab2f5>] mmu_shrink+0x11f/0x1f3 [kvm]
[ 3190.114844] [<
ffffffffa02ab25d>] ? mmu_shrink+0x87/0x1f3 [kvm]
[ 3190.116598] [<
ffffffff81150c9d>] ? prune_super+0x142/0x154
[ 3190.118333] [<
ffffffff8110a4f4>] ? shrink_slab+0x39/0x31e
[ 3190.120043] [<
ffffffff8110a687>] shrink_slab+0x1cc/0x31e
[ 3190.121718] [<
ffffffff8110ca1d>] do_try_to_free_pages
This is caused by shrinking page from the empty mmu, although we have
checked n_used_mmu_pages, it is useless since the check is out of mmu-lock
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Xiao Guangrong [Tue, 3 Jul 2012 06:31:01 +0000 (14:31 +0800)]
KVM: fix fault page leak
fault_page is forgot to be freed
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Rajendra Nayak [Tue, 3 Jul 2012 06:41:41 +0000 (12:11 +0530)]
clk: fix parent validation in __clk_set_parent()
The below commit introduced a bug in __clk_set_parent()
which could cause it to *skip* the parent validation
which makes sure the parent passed to the api is a valid
one.
commit
7975059db572eb47f0fb272a62afeae272a4b209
Author: Rajendra Nayak <rnayak@ti.com>
Date: Wed Jun 6 14:41:31 2012 +0530
clk: Allow late cache allocation for clk->parents
This was identified by the following compiler warning..
drivers/clk/clk.c: In function '__clk_set_parent':
drivers/clk/clk.c:1083:5: warning: 'i' may be used uninitialized in this function [-Wuninitialized]
.. as reported by Marc Kleine-Budde.
There were various options discussed on how to fix this, one
being initing 'i' to clk->num_parents, but the below approach
was found to be more appropriate as it also makes the 'parent
validation' code simpler to read.
Reported-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Rajendra Nayak <rnayak@ti.com>
Signed-off-by: Mike Turquette <mturquette@linaro.org>
Cc: stable@kernel.org
Linus Torvalds [Tue, 3 Jul 2012 18:10:18 +0000 (11:10 -0700)]
Merge tag 'sound-3.5' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Just a few driver-specific fixes for ASoC and HD-audio."
* tag 'sound-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda - Fix no sound from ALC662 after Windows reboot
ASoC: tlv320aic3x: Fix codec pll configure bug
ASoC: wm2200: Add missing BCLK rate
Linus Torvalds [Tue, 3 Jul 2012 18:08:16 +0000 (11:08 -0700)]
Merge tag 'dm-3.5-fixes' of git://git./linux/kernel/git/agk/linux-dm
Pull device-mapper fixes from Alasdair G Kergon:
"Four minor thin provisioning fixes and correct and update dm-verity
documentation."
* tag 'dm-3.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm:
dm: verity fix documentation
dm persistent data: fix allocation failure in space map checker init
dm persistent data: handle space map checker creation failure
dm persistent data: fix shadow_info_leak on dm_tm_destroy
dm thin: commit metadata before creating metadata snapshot
Linus Torvalds [Tue, 3 Jul 2012 17:59:37 +0000 (10:59 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"One regression fix, two radeon fixes (one for an oops), and an i915
fix to unload framebuffers earlier.
We originally were going to leave the i915 fix until -next, but grub2
in some situations causes vesafb/efifb to be loaded now, and this
causes big slowdowns, and I have reports in rawhide I'd like to have
fixed."
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/i915: kick any firmware framebuffers before claiming the gtt
drm: edid: Don't add inferred modes with higher resolution
drm/radeon: fix rare segfault
drm/radeon: fix VM page table setup on SI
Jeff Layton [Mon, 2 Jul 2012 11:24:25 +0000 (07:24 -0400)]
cifs: when server doesn't set CAP_LARGE_READ_X, cap default rsize at MaxBufferSize
When the server doesn't advertise CAP_LARGE_READ_X, then MS-CIFS states
that you must cap the size of the read at the client's MaxBufferSize.
Unfortunately, testing with many older servers shows that they often
can't service a read larger than their own MaxBufferSize.
Since we can't assume what the server will do in this situation, we must
be conservative here for the default. When the server can't do large
reads, then assume that it can't satisfy any read larger than its
MaxBufferSize either.
Luckily almost all modern servers can do large reads, so this won't
affect them. This is really just for older win9x and OS/2 era servers.
Also, note that this patch just governs the default rsize. The admin can
always override this if he so chooses.
Cc: <stable@vger.kernel.org> # 3.2
Reported-by: David H. Durgee <dhdurgee@acm.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steven French <sfrench@w500smf.(none)>
Linus Torvalds [Tue, 3 Jul 2012 17:40:43 +0000 (10:40 -0700)]
Merge tag 'md-3.5-fixes' of git://neil.brown.name/md
Pull md fixes from NeilBrown:
"md: collection of bug fixes for 3.5
You go away for 2 weeks vacation and what do you get when you come
back? Piles of bugs :-)
Some found by inspection, some by testing, some during use in the
field, and some while developing for the next window..."
* tag 'md-3.5-fixes' of git://neil.brown.name/md:
md: fix up plugging (again).
md: support re-add of recovering devices.
md/raid1: fix bug in read_balance introduced by hot-replace
raid5: delayed stripe fix
md/raid456: When read error cannot be recovered, record bad block
md: make 'name' arg to md_register_thread non-optional.
md/raid10: fix failure when trying to repair a read error.
md/raid5: fix refcount problem when blocked_rdev is set.
md:Add blk_plug in sync_thread.
md/raid5: In ops_run_io, inc nr_pending before calling md_wait_for_blocked_rdev
md/raid5: Do not add data_offset before call to is_badblock
md/raid5: prefer replacing failed devices over want-replacement devices.
md/raid10: Don't try to recovery unmatched (and unused) chunks.
Linus Torvalds [Tue, 3 Jul 2012 17:39:40 +0000 (10:39 -0700)]
Merge branch 'for-linus2' of git://git./linux/kernel/git/jmorris/linux-security
Pull security layer fixes from James Morris.
A documentation update, and a nommu build fix.
* 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
security: Fix nommu build.
security: document no_new_privs
Milan Broz [Tue, 3 Jul 2012 11:55:41 +0000 (12:55 +0100)]
dm: verity fix documentation
Veritysetup is now part of cryptsetup package.
Remove on-disk header description (which is not parsed in kernel)
and point users to cryptsetup where it the format is documented.
Mention units for block size paramaters.
Fix target line specification and dmsetup parameters.
Signed-off-by: Milan Broz <mbroz@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mike Snitzer [Tue, 3 Jul 2012 11:55:37 +0000 (12:55 +0100)]
dm persistent data: fix allocation failure in space map checker init
If CONFIG_DM_DEBUG_SPACE_MAPS is enabled and memory is fragmented and a
sufficiently-large metadata device is used in a thin pool then the space
map checker will fail to allocate the memory it requires.
Switch from kmalloc to vmalloc to allow larger virtually contiguous
allocations for the space map checker's internal count arrays.
Reported-by: Vivek Goyal <vgoyal@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mike Snitzer [Tue, 3 Jul 2012 11:55:35 +0000 (12:55 +0100)]
dm persistent data: handle space map checker creation failure
If CONFIG_DM_DEBUG_SPACE_MAPS is enabled and dm_sm_checker_create()
fails, dm_tm_create_internal() would still return success even though it
cleaned up all resources it was supposed to have created. This will
lead to a kernel crash:
general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
...
RIP: 0010:[<
ffffffff81593659>] [<
ffffffff81593659>] dm_bufio_get_block_size+0x9/0x20
Call Trace:
[<
ffffffff81599bae>] dm_bm_block_size+0xe/0x10
[<
ffffffff8159b8b8>] sm_ll_init+0x78/0xd0
[<
ffffffff8159c1a6>] sm_ll_new_disk+0x16/0xa0
[<
ffffffff8159c98e>] dm_sm_disk_create+0xfe/0x160
[<
ffffffff815abf6e>] dm_pool_metadata_open+0x16e/0x6a0
[<
ffffffff815aa010>] pool_ctr+0x3f0/0x900
[<
ffffffff8158d565>] dm_table_add_target+0x195/0x450
[<
ffffffff815904c4>] table_load+0xe4/0x330
[<
ffffffff815917ea>] ctl_ioctl+0x15a/0x2c0
[<
ffffffff81591963>] dm_ctl_ioctl+0x13/0x20
[<
ffffffff8116a4f8>] do_vfs_ioctl+0x98/0x560
[<
ffffffff8116aa51>] sys_ioctl+0x91/0xa0
[<
ffffffff81869f52>] system_call_fastpath+0x16/0x1b
Fix the space map checker code to return an appropriate ERR_PTR and have
dm_sm_disk_create() and dm_tm_create_internal() check for it with
IS_ERR.
Reported-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mike Snitzer [Tue, 3 Jul 2012 11:55:33 +0000 (12:55 +0100)]
dm persistent data: fix shadow_info_leak on dm_tm_destroy
Cleanup the shadow table before destroying the transaction manager.
Reference: leak was identified with kmemleak when running
test_discard_random_sectors in the thinp-test-suite.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Joe Thornber [Tue, 3 Jul 2012 11:55:31 +0000 (12:55 +0100)]
dm thin: commit metadata before creating metadata snapshot
Userland sometimes sees a corrupt metadata block if metadata is changing
rapidly when a metadata snapshot is reserved for userland, To make the
problem go away, commit before we take the metadata snapshot (which is a
sensible thing to do anyway).
The checksums mean userland spots this corruption immediately so there's
no risk of acting on incorrect data. No corruption exists from the
kernel's point of view, and thin_check passes after pool shutdown.
I believe this is to do with shared blocks at the first level of the
{device, mapping} btree. Prior to the metadata-snap support no sharing
at this level was possible, so this patch is only required after commit
cc8394d86f045b86ff303d3c9e4ce47d97148951 ("dm thin: provide userspace
access to pool metadata").
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Paul Mundt [Mon, 2 Jul 2012 05:34:11 +0000 (14:34 +0900)]
security: Fix nommu build.
The security + nommu configuration presently blows up with an undefined
reference to BDI_CAP_EXEC_MAP:
security/security.c: In function 'mmap_prot':
security/security.c:687:36: error: dereferencing pointer to incomplete type
security/security.c:688:16: error: 'BDI_CAP_EXEC_MAP' undeclared (first use in this function)
security/security.c:688:16: note: each undeclared identifier is reported only once for each function it appears in
include backing-dev.h directly to fix it up.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Daniel Vetter [Sun, 1 Jul 2012 15:09:42 +0000 (17:09 +0200)]
drm/i915: kick any firmware framebuffers before claiming the gtt
Especially vesafb likes to map everything as uc- (yikes), and if that
mapping hangs around still while we try to map the gtt as wc the
kernel will downgrade our request to uc-, resulting in abyssal
performance.
Unfortunately we can't do this as early as readon does (i.e. as the
first thing we do when initializing the hw) because our fb/mmio space
region moves around on a per-gen basis. So I've had to move it below
the gtt initialization, but that seems to work, too. The important
thing is that we do this before we set up the gtt wc mapping.
Now an altogether different question is why people compile their
kernels with vesafb enabled, but I guess making things just work isn't
bad per se ...
v2:
- s/radeondrmfb/inteldrmfb/
- fix up error handling
v3: Kill #ifdef X86, this is Intel after all. Noticed by Ben Widawsky.
v4: Jani Nikula complained about the pointless bool primary
initialization.
v5: Don't oops if we can't allocate, noticed by Chris Wilson.
v6: Resolve conflicts with agp rework and fixup whitespace.
This is commit
e188719a2891f01b3100d in drm-next.
Backport to 3.5 -fixes queue requested by Dave Airlie - due to grub
using vesa on fedora their initrd seems to load vesafb before loading
the real kms driver. So tons more people actually experience a
dead-slow gpu. Hence also the Cc: stable.
Cc: stable@vger.kernel.org
Reported-and-tested-by: "Kilarski, Bernard R" <bernard.r.kilarski@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Takashi Iwai [Tue, 3 Jul 2012 09:22:11 +0000 (11:22 +0200)]
drm: edid: Don't add inferred modes with higher resolution
When a monitor EDID doesn't give the preferred bit, driver assumes
that the mode with the higest resolution and rate is the preferred
mode. Meanwhile the recent changes for allowing more modes in the
GFT/CVT ranges give actually more modes, and some modes may be over
the native size. Thus such a mode would be picked up as the preferred
mode although it's no native resolution.
For avoiding such a problem, this patch limits the addition of
inferred modes by checking not to be greater than other modes.
Also, it checks the duplicated mode entry at the same time.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Jerome Glisse [Mon, 2 Jul 2012 16:40:54 +0000 (12:40 -0400)]
drm/radeon: fix rare segfault
In gem idle/busy ioctl the radeon object was derefenced after
drm_gem_object_unreference_unlocked which in case the object
have been destroyed lead to use of a possibly free pointer with
possibly wrong data.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
NeilBrown [Tue, 3 Jul 2012 07:45:31 +0000 (17:45 +1000)]
md: fix up plugging (again).
The value returned by "mddev_check_plug" is only valid until the
next 'schedule' as that will unplug things. This could happen at any
call to mempool_alloc.
So just calling mddev_check_plug at the start doesn't really make
sense.
So call it just before, or just after, queuing things for the thread.
As the action that happens at unplug is to wake the thread, this makes
lots of sense.
If we cannot add a plug (which requires a small GFP_ATOMIC alloc) we
wake thread immediately.
RAID5 is a bit different. Requests are queued for the thread and the
thread is woken by release_stripe. So we don't need to wake the
thread on failure.
However the thread doesn't perform certain actions when there is any
active plug, so it is important to install a plug before waking the
thread. So for RAID5 we install the plug *before* queuing the request
and waking the thread.
Without this patch it is possible for raid1 or raid10 to queue a
request without then waking the thread, resulting in the array locking
up.
Also change raid10 to only flush_pending_write when there are not
active plugs, just like raid1.
This patch is suitable for 3.0 or later. I plan to submit it to
-stable, but I'll like to let it spend a few weeks in mainline
first to be sure it is completely safe.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Tue, 3 Jul 2012 05:59:06 +0000 (15:59 +1000)]
md: support re-add of recovering devices.
We currently only allow a device to be re-added if it appear to be
in-sync. This is overly restrictive as it may be desirable to re-add
a device that is in the middle of recovery.
So remove the test for "InSync" - the test on rdev->raid_disk is
sufficient to ensure that the re-add will succeed.
Reported-by: Alexander Lyakas <alex.bolshoy@gmail.com>
Tested-by: Alexander Lyakas <alex.bolshoy@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Tue, 3 Jul 2012 05:58:42 +0000 (15:58 +1000)]
md/raid1: fix bug in read_balance introduced by hot-replace
When we added hot_replace we doubled the number of devices
that could be in a RAID1 array. So we doubled how far read_balance
would search. Unfortunately we didn't double the point at which
it looped back to the beginning - so it effectively loops over
all non-replacement disks twice.
This doesn't cause bad behaviour, but it pointless and means we
never read from replacement devices.
Signed-off-by: NeilBrown <neilb@suse.de>
Shaohua Li [Tue, 3 Jul 2012 05:57:19 +0000 (15:57 +1000)]
raid5: delayed stripe fix
There isn't locking setting STRIPE_DELAYED and STRIPE_PREREAD_ACTIVE bits, but
the two bits have relationship. A delayed stripe can be moved to hold list only
when preread active stripe count is below IO_THRESHOLD. If a stripe has both
the bits set, such stripe will be in delayed list and preread count not 0,
which will make such stripe never leave delayed list.
Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: NeilBrown <neilb@suse.de>
majianpeng [Tue, 3 Jul 2012 05:57:02 +0000 (15:57 +1000)]
md/raid456: When read error cannot be recovered, record bad block
We may not be able to fix a bad block if:
- the array is degraded
- the over-write fails.
In these cases we currently eject the device, but we should
record a bad block if possible.
Signed-off-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Tue, 3 Jul 2012 05:56:52 +0000 (15:56 +1000)]
md: make 'name' arg to md_register_thread non-optional.
Having the 'name' arg optional and defaulting to the current
personality name is no necessary and leads to errors, as when
changing the level of an array we can end up using the
name of the old level instead of the new one.
So make it non-optional and always explicitly pass the name
of the level that the array will be.
Reported-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Tue, 3 Jul 2012 05:55:33 +0000 (15:55 +1000)]
md/raid10: fix failure when trying to repair a read error.
commit
58c54fcca3bac5bf9290cfed31c76e4c4bfbabaf
md/raid10: handle further errors during fix_read_error better.
in 3.1 added "r10_sync_page_io" which takes an IO size in sectors.
But we were passing the IO size in bytes!!!
This resulting in bio_add_page failing, and empty request being sent
down, and a consequent BUG_ON in scsi_lib.
[fix missing space in error message at same time]
This fix is suitable for 3.1.y and later.
Cc: stable@vger.kernel.org
Reported-by: Christian Balzer <chibi@gol.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Linus Torvalds [Tue, 3 Jul 2012 02:52:25 +0000 (19:52 -0700)]
Merge branch 'merge' of git://git./linux/kernel/git/benh/powerpc
Pull a couple more powerpc fixes from Benjamin Herrenschmidt:
"Here are two more fixes that I "missed" when scrubbing patchwork last
week which are worth still having in 3.5."
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc/kvm: sldi should be sld
powerpc/xmon: Use cpumask iterator to avoid warning
Andy Lutomirski [Mon, 2 Jul 2012 21:03:58 +0000 (14:03 -0700)]
security: document no_new_privs
Document no_new_privs.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
NeilBrown [Tue, 3 Jul 2012 02:13:29 +0000 (12:13 +1000)]
md/raid5: fix refcount problem when blocked_rdev is set.
commit
43220aa0f22cd3ce5b30246d50ccd696d119edea
md/raid5: fix a hang on device failure.
fixed a hang, but introduced a refcounting in-balance so
that if the presence of bad-blocks ever caused an rdev to
be 'blocked' we would increment the refcount on the rdev and
never decrement it.
So added the needed rdev_dec_pending when md_wait_for_blocked_rdev
is not called.
Reported-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
majianpeng [Tue, 3 Jul 2012 02:12:26 +0000 (12:12 +1000)]
md:Add blk_plug in sync_thread.
Add blk_plug in sync_thread will increase the performance of sync.
Because sync_thread did not blk_plug,so when raid sync, the bio merge
not well.
Testing environment:
SATA controller: Intel Corporation 82801JI (ICH10 Family) SATA AHCI
Controller.
OS:Linux xxx 3.5.0-rc2+ #340 SMP Tue Jun 12 09:00:25 CST 2012
x86_64 x86_64 x86_64 GNU/Linux.
RAID5: four ST31000524NS disk.
Without blk_plug:recovery speed about 63M/Sec;
Add blk_plug:recovery speed about 120M/Sec.
Using blktrace:
blktrace -d /dev/sdb -w 60 -o -|blkparse -i -
without blk_plug:
Total (8,16):
Reads Queued: 309811, 1239MiB Writes Queued: 0, 0KiB
Read Dispatches: 283583, 1189MiB Write Dispatches: 0, 0KiB
Reads Requeued: 0 Writes Requeued: 0
Reads Completed: 273351, 1149MiB Writes Completed: 0, 0KiB
Read Merges: 23533, 94132KiB Write Merges: 0, 0KiB
IO unplugs: 0 Timer unplugs: 0
add blk_plug:
Total (8,16):
Reads Queued: 428697, 1714MiB Writes Queued: 0, 0KiB
Read Dispatches: 3954, 1714MiB Write Dispatches: 0, 0KiB
Reads Requeued: 0 Writes Requeued: 0
Reads Completed: 3956, 1715MiB Writes Completed: 0, 0KiB
Read Merges: 424743, 1698MiB Write Merges: 0, 0KiB
IO unplugs: 0 Timer unplugs: 3384
The ratio of merge will be markedly increased.
Signed-off-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
majianpeng [Tue, 3 Jul 2012 02:11:54 +0000 (12:11 +1000)]
md/raid5: In ops_run_io, inc nr_pending before calling md_wait_for_blocked_rdev
In ops_run_io(), the call to md_wait_for_blocked_rdev will decrement
nr_pending so we lose the reference we hold on the rdev.
So atomic_inc it first to maintain the reference.
This bug was introduced by commit
73e92e51b7969ef5477d
md/raid5. Don't write to known bad block on doubtful devices.
which appeared in 3.0, so patch is suitable for stable kernels since
then.
Cc: stable@vger.kernel.org
Signed-off-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
majianpeng [Tue, 12 Jun 2012 00:31:10 +0000 (08:31 +0800)]
md/raid5: Do not add data_offset before call to is_badblock
In chunk_aligned_read() we are adding data_offset before calling
is_badblock. But is_badblock also adds data_offset, so that is bad.
So move the addition of data_offset to after the call to
is_badblock.
This bug was introduced by commit
31c176ecdf3563140e639
md/raid5: avoid reading from known bad blocks.
which first appeared in 3.0. So that patch is suitable for any
-stable kernel from 3.0.y onwards. However it will need minor
revision for most of those (as the comment didn't appear until
recently).
Cc: stable@vger.kernel.org
Signed-off-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Tue, 3 Jul 2012 01:46:53 +0000 (11:46 +1000)]
md/raid5: prefer replacing failed devices over want-replacement devices.
If a RAID5 has both a failed device and a device marked as
'WantReplacement', then we should preferentially replace the failed
device.
However the current code replaces whichever is found first.
So split into 2 loops, check fail failed/missing first, and only check
for WantReplacement if nothing is failed or missing.
Reported-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Tue, 3 Jul 2012 00:37:30 +0000 (10:37 +1000)]
md/raid10: Don't try to recovery unmatched (and unused) chunks.
If a RAID10 has an odd number of chunks - as might happen when there
are an odd number of devices - the last chunk has no pair and so is
not mirrored. We don't store data there, but when recovering the last
device in an array we retry to recover that last chunk from a
non-existent location. This results in an error, and the recovery
aborts.
When we get to that last chunk we should just stop - there is nothing
more to do anyway.
This bug has been present since the introduction of RAID10, so the
patch is appropriate for any -stable kernel.
Cc: stable@vger.kernel.org
Reported-by: Christian Balzer <chibi@gol.com>
Tested-by: Christian Balzer <chibi@gol.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Alex Williamson [Fri, 29 Jun 2012 15:56:24 +0000 (09:56 -0600)]
KVM: Sanitize KVM_IRQFD flags
We only know of one so far.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Alex Williamson [Fri, 29 Jun 2012 15:56:16 +0000 (09:56 -0600)]
KVM: Add missing KVM_IRQFD API documentation
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Alex Williamson [Fri, 29 Jun 2012 15:56:08 +0000 (09:56 -0600)]
KVM: Pass kvm_irqfd to functions
Prune this down to just the struct kvm_irqfd so we can avoid
changing function definition for every flag or field we use.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Arnd Bergmann [Mon, 2 Jul 2012 21:14:29 +0000 (23:14 +0200)]
Merge branch 'fixes' of git://github.com/hzhuang1/linux into fixes
* 'fixes' of git://github.com/hzhuang1/linux:
ARM: pxa: hx4700: Fix basic suspend/resume
Chris Mason [Mon, 2 Jul 2012 19:29:53 +0000 (15:29 -0400)]
Btrfs: run delayed directory updates during log replay
While we are resolving directory modifications in the
tree log, we are triggering delayed metadata updates to
the filesystem btrees.
This commit forces the delayed updates to run so the
replay code can find any modifications done. It stops
us from crashing because the directory deleltion replay
expects items to be removed immediately from the tree.
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
cc: stable@kernel.org
Josef Bacik [Wed, 27 Jun 2012 21:18:41 +0000 (17:18 -0400)]
Btrfs: hold a ref on the inode during writepages
We can race with unlink and not actually be able to do our igrab in
btrfs_add_ordered_extent. This will result in all sorts of problems.
Instead of doing the complicated work to try and handle returning an error
properly from btrfs_add_ordered_extent, just hold a ref to the inode during
writepages. If we cannot grab a ref we know we're freeing this inode anyway
and can just drop the dirty pages on the floor, because screw them we're
going to invalidate them anyway. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Josef Bacik [Wed, 27 Jun 2012 19:10:56 +0000 (15:10 -0400)]
Btrfs: fix tree log remove space corner case
The tree log stuff can have allocated space that we end up having split
across a bitmap and a real extent. The free space code does not deal with
this, it assumes that if it finds an extent or bitmap entry that the entire
range must fall within the entry it finds. This isn't necessarily the case,
so rework the remove function so it can handle this case properly. This
fixed two panics the user hit, first in the case where the space was
initially in a bitmap and then in an extent entry, and then the reverse
case. Thanks,
Reported-and-tested-by: Shaun Reich <sreich@kde.org>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Liu Bo [Tue, 26 Jun 2012 03:59:09 +0000 (21:59 -0600)]
Btrfs: fix wrong check during log recovery
When we're evicting an inode during log recovery, we need to ensure that the inode
is not in orphan state any more, which means inode's run_time flags has _no_
BTRFS_INODE_HAS_ORPHAN_ITEM. Thus, the BUG_ON was triggered because of a wrong
check for the flags.
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Alexander Block [Mon, 25 Jun 2012 21:36:12 +0000 (15:36 -0600)]
Btrfs: use _IOR for BTRFS_IOC_SUBVOL_GETFLAGS
We used the wrong ioctl macro for the getflags ioctl before.
As we don't have the set/getflags ioctls in the user space ioctl.h
at the moment, it's safe to fix it now.
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Alexander Block <ablock84@googlemail.com>
Ilya Dryomov [Fri, 22 Jun 2012 18:24:13 +0000 (12:24 -0600)]
Btrfs: resume balance on rw (re)mounts properly
This introduces btrfs_resume_balance_async(), which, given that
restriper state was recovered earlier by btrfs_recover_balance(),
resumes balance in btrfs-balance kthread.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Fri, 22 Jun 2012 18:24:12 +0000 (12:24 -0600)]
Btrfs: restore restriper state on all mounts
Fix a bug that triggered asserts in btrfs_balance() in both normal and
resume modes -- restriper state was not properly restored on read-only
mounts. This factors out resuming code from btrfs_restore_balance(),
which is now also called earlier in the mount sequence to avoid the
problem of some early writes getting the old profile.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Josef Bacik [Tue, 19 Jun 2012 14:59:00 +0000 (10:59 -0400)]
Btrfs: fix dio write vs buffered read race
Miao pointed out there's a problem with mixing dio writes and buffered
reads. If the read happens between us invalidating the page range and
actually locking the extent we can bring in pages into page cache. Then
once the write finishes if somebody tries to read again it will just find
uptodate pages and we'll read stale data. So we need to lock the extent and
check for uptodate bits in the range. If there are uptodate bits we need to
unlock and invalidate again. This will keep this race from happening since
we will hold the extent locked until we create the ordered extent, and then
teh read side always waits for ordered extents. There was also a race in
how we updated i_size, previously we were relying on the generic DIO stuff
to adjust the i_size after the DIO had completed, but this happens outside
of the extent lock which means reads could come in and not see the updated
i_size. So instead move this work into where we create the extents, and
then this way the update ordered i_size stuff works properly in the endio
handlers. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Stefan Behrens [Thu, 14 Jun 2012 14:42:31 +0000 (16:42 +0200)]
Btrfs: don't count I/O statistic read errors for missing devices
It is normal behaviour of the low level btrfs function btrfs_map_bio()
to complete a bio with -EIO if the device is missing, instead of just
preventing the bio creation in an earlier step.
This used to cause I/O statistic read error increments and annoying
printk_ratelimited messages. This commit fixes the issue.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Reported-by: Carey Underwood <cwillu@cwillu.com>