Linus Torvalds [Fri, 4 Oct 2013 16:04:26 +0000 (09:04 -0700)]
Merge tag 'arm64-stable' of git://git./linux/kernel/git/cmarinas/linux-aarch64
Pull ARM64 fixes/updates from Catalin Marinas:
- Bug-fixes (get_user/put_user, incorrect register width for ASID,
FPSIMD initialisation)
- Kconfig clean-up
- defconfig update
* tag 'arm64-stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64:
arm64: Remove duplicate DEBUG_STACK_USAGE config
arm64: include VIRTIO_{MMIO,BLK} in defconfig
arm64: include EXT4 in defconfig
arm64: fix possible invalid FPSIMD initialization state
arm64: use correct register width when retrieving ASID
arm64: avoid multiple evaluation of ptr in get_user/put_user()
Linus Torvalds [Fri, 4 Oct 2013 16:03:51 +0000 (09:03 -0700)]
Merge branch 'upstream' of git://git.linux-mips.org/ralf/upstream-linus
Pull MIPS fixes from Ralf Baechle:
"Two small fixes for 3.12 only this week. I have a few more fixes
pending but those are conceptually more complex so will have to wait
for a bit longer"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
MIPS: Fix forgotten preempt_enable() when CPU has inclusive pcaches
MIPS: Alchemy: MTX-1: fix incorrect placement of __initdata tag
Linus Torvalds [Fri, 4 Oct 2013 16:03:07 +0000 (09:03 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Two simplefb fixes"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/simplefb: Mark framebuffer mem-resources as IORESOURCE_BUSY to avoid bootup warning
x86/simplefb: Fix overflow causing bogus fall-back
Linus Torvalds [Fri, 4 Oct 2013 16:02:35 +0000 (09:02 -0700)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull irq fix from Ingo Molnar:
"Frederic's minimal fix for hardirq/softirq nesting crashes"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irq: Force hardirq exit's softirq processing on its own stack
Linus Torvalds [Thu, 3 Oct 2013 15:56:41 +0000 (08:56 -0700)]
Merge git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm NULL deref fix from Gleb Natapov.
* git://git.kernel.org/pub/scm/virt/kvm/kvm:
Fix NULL dereference in gfn_to_hva_prot()
Linus Torvalds [Thu, 3 Oct 2013 15:55:50 +0000 (08:55 -0700)]
Merge branch 'for-curr' of git://git./linux/kernel/git/vgupta/arc
Pull ARC fix from Vineet Gupta:
"Chrisitian found/fixed issue with SA_SIGINFO based signal handler
corrupting the user space registers post after signal handling"
* 'for-curr' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: Fix signal frame management for SA_SIGINFO
Linus Torvalds [Thu, 3 Oct 2013 15:54:39 +0000 (08:54 -0700)]
Merge branch 'merge' of git://git./linux/kernel/git/benh/powerpc
Pull powerpc fixes from Ben Herrenschmidt:
"Here are a few powerpc fixes, all aimed at -stable, found in part
thanks to the ramping up of a major distro testing and in part thanks
to the LE guys hitting all sort interesting corner cases.
The most scary are probably the register clobber issues in
csum_partial_copy_generic(), especially since Anton even had a test
case for that thing, which didn't manage to hit the bugs :-)
Another highlight is that memory hotplug should work again with these
fixes.
Oh and the vio modalias one is worse than the cset implies as it
upsets distro installers, so I've been told at least, which is why I'm
shooting it to stable"
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc/tm: Switch out userspace PPR and DSCR sooner
powerpc/tm: Turn interrupts hard off in tm_reclaim()
powerpc/perf: Fix handling of FAB events
powerpc/vio: Fix modalias_show return values
powerpc/iommu: Use GFP_KERNEL instead of GFP_ATOMIC in iommu_init_table()
powerpc/sysfs: Disable writing to PURR in guest mode
powerpc: Restore registers on error exit from csum_partial_copy_generic()
powerpc: Fix parameter clobber in csum_partial_copy_generic()
powerpc: Fix memory hotplug with sparse vmemmap
Michael Neuling [Thu, 26 Sep 2013 03:29:09 +0000 (13:29 +1000)]
powerpc/tm: Switch out userspace PPR and DSCR sooner
When we do a treclaim or trecheckpoint we end up running with userspace
PPR and DSCR values. Currently we don't do anything special to avoid
running with user values which could cause a severe performance
degradation.
This patch moves the PPR and DSCR save and restore around treclaim and
trecheckpoint so that we run with user values for a much shorter period.
More care is taken with the PPR as it's impact is greater than the DSCR.
This is similar to user exceptions, where we run HTM_MEDIUM early to
ensure that we don't run with a userspace PPR values in the kernel.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: <stable@vger.kernel.org> # 3.9+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Michael Neuling [Wed, 2 Oct 2013 07:15:15 +0000 (17:15 +1000)]
powerpc/tm: Turn interrupts hard off in tm_reclaim()
We can't take IRQs in tm_reclaim as we might have a bogus r13 and r1.
This turns IRQs hard off in this function.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: <stable@vger.kernel.org> # 3.9+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Michael Ellerman [Wed, 2 Oct 2013 08:04:06 +0000 (18:04 +1000)]
powerpc/perf: Fix handling of FAB events
Commit
4df4899 "Add power8 EBB support" included a bug in the handling
of the FAB_CRESP_MATCH and FAB_TYPE_MATCH fields.
These values are pulled out of the event code using EVENT_THR_CTL_SHIFT,
however we were then or'ing that value directly into MMCR1.
This meant we were failing to set the FAB fields correctly, and also
potentially corrupting the value for PMC4SEL. Leading to no counts for
the FAB events and incorrect counts for PMC4.
The fix is simply to shift left the FAB value correctly before or'ing it
with MMCR1.
Reported-by: Sooraj Ravindran Nair <soonair3@in.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Cc: <stable@vger.kernel.org> # 3.10+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Prarit Bhargava [Mon, 23 Sep 2013 13:33:36 +0000 (09:33 -0400)]
powerpc/vio: Fix modalias_show return values
modalias_show() should return an empty string on error, not -ENODEV.
This causes the following false and annoying error:
> find /sys/devices -name modalias -print0 | xargs -0 cat >/dev/null
cat: /sys/devices/vio/4000/modalias: No such device
cat: /sys/devices/vio/4001/modalias: No such device
cat: /sys/devices/vio/4002/modalias: No such device
cat: /sys/devices/vio/4004/modalias: No such device
cat: /sys/devices/vio/modalias: No such device
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: <stable@vger.kernel.org>
Nishanth Aravamudan [Tue, 1 Oct 2013 21:04:53 +0000 (14:04 -0700)]
powerpc/iommu: Use GFP_KERNEL instead of GFP_ATOMIC in iommu_init_table()
Under heavy (DLPAR?) stress, we tripped this panic() in
arch/powerpc/kernel/iommu.c::iommu_init_table():
page = alloc_pages_node(nid, GFP_ATOMIC, get_order(sz));
if (!page)
panic("iommu_init_table: Can't allocate %ld bytes\n", sz);
Before the panic() we got a page allocation failure for an order-2
allocation. There appears to be memory free, but perhaps not in the
ATOMIC context. I looked through all the call-sites of
iommu_init_table() and didn't see any obvious reason to need an ATOMIC
allocation. Most call-sites in fact have an explicit GFP_KERNEL
allocation shortly before the call to iommu_init_table(), indicating we
are not in an atomic context. There is some indirection for some paths,
but I didn't see any locks indicating that GFP_KERNEL is inappropriate.
With this change under the same conditions, we have not been able to
reproduce the panic.
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: <stable@vger.kernel.org>
Madhavan Srinivasan [Tue, 1 Oct 2013 19:04:10 +0000 (00:34 +0530)]
powerpc/sysfs: Disable writing to PURR in guest mode
arch/powerpc/kernel/sysfs.c exports PURR with write permission.
This may be valid for kernel in phyp mode. But writing to
the file in guest mode causes crash due to a priviledge violation
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: <stable@vger.kernel.org>
Paul E. McKenney [Tue, 1 Oct 2013 07:11:35 +0000 (17:11 +1000)]
powerpc: Restore registers on error exit from csum_partial_copy_generic()
The csum_partial_copy_generic() function saves the PowerPC non-volatile
r14, r15, and r16 registers for the main checksum-and-copy loop.
Unfortunately, it fails to restore them upon error exit from this loop,
which results in silent corruption of these registers in the presumably
rare event of an access exception within that loop.
This commit therefore restores these register on error exit from the loop.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: stable@vger.kernel.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Paul E. McKenney [Tue, 1 Oct 2013 06:54:05 +0000 (16:54 +1000)]
powerpc: Fix parameter clobber in csum_partial_copy_generic()
The csum_partial_copy_generic() uses register r7 to adjust the remaining
bytes to process. Unfortunately, r7 also holds a parameter, namely the
address of the flag to set in case of access exceptions while reading
the source buffer. Lacking a quantum implementation of PowerPC, this
commit instead uses register r9 to do the adjusting, leaving r7's
pointer uncorrupted.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: stable@vger.kernel.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Nathan Fontenot [Fri, 27 Sep 2013 15:18:09 +0000 (10:18 -0500)]
powerpc: Fix memory hotplug with sparse vmemmap
Previous commit
46723bfa540... introduced a new config option
HAVE_BOOTMEM_INFO_NODE that ended up breaking memory hot-remove for ppc
when sparse vmemmap is not defined.
This patch defines HAVE_BOOTMEM_INFO_NODE for ppc and adds the call to
register_page_bootmem_info_node. Without this we get a BUG_ON for memory
hot remove in put_page_bootmem().
This also adds a stub for register_page_bootmem_memmap to allow ppc to build
with sparse vmemmap defined. Leaving this as a stub is fine since the same
vmemmap addresses are also handled in vmemmap_populate and as such are
properly mapped.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: <stable@vger.kernel.org> [v3.9+]
Gleb Natapov [Tue, 1 Oct 2013 16:58:36 +0000 (19:58 +0300)]
Fix NULL dereference in gfn_to_hva_prot()
gfn_to_memslot() can return NULL or invalid slot. We need to check slot
validity before accessing it.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
David Herrmann [Wed, 2 Oct 2013 14:41:04 +0000 (16:41 +0200)]
x86/simplefb: Mark framebuffer mem-resources as IORESOURCE_BUSY to avoid bootup warning
IORESOURCE_BUSY is used to mark temporary driver mem-resources
instead of global regions. This suppresses warnings if regions
overlap with a region marked as BUSY.
This was always the case for VESA/VGA/EFI framebuffer regions so
do the same for simplefb regions. The reason we do this is to
allow device handover to real GPU drivers like
i915/radeon/nouveau which get the same regions via PCI BARs.
Maybe at some point we will be able to unregister platform
devices properly during the handover. In this case the simplefb
region would get removed before the new region is created.
However, this is currently not the case and would require rather
huge changes in remove_conflicting_framebuffers(). Add the BUSY
marker now and try to eventually rewrite the handover for a next release.
Also see kernel/resource.c for more information:
/*
* if a resource is "BUSY", it's not a hardware resource
* but a driver mapping of such a resource; we don't want
* to warn for those; some drivers legitimately map only
* partial hardware resources. (example: vesafb)
*/
This suppresses warnings like:
------------[ cut here ]------------
WARNING: CPU: 2 PID: 199 at arch/x86/mm/ioremap.c:171 __ioremap_caller+0x2e3/0x390()
Info: mapping multiple BARs. Your kernel is fine.
Call Trace:
dump_stack+0x54/0x8d
warn_slowpath_common+0x7d/0xa0
warn_slowpath_fmt+0x4c/0x50
iomem_map_sanity_check+0xac/0xe0
__ioremap_caller+0x2e3/0x390
ioremap_wc+0x32/0x40
i915_driver_load+0x670/0xf50 [i915]
...
Reported-by: Tom Gundersen <teg@jklm.no>
Tested-by: Tom Gundersen <teg@jklm.no>
Tested-by: Pavel Roskin <proski@gnu.org>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Link: http://lkml.kernel.org/r/1380724864-1757-1-git-send-email-dh.herrmann@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Linus Torvalds [Thu, 3 Oct 2013 04:48:32 +0000 (21:48 -0700)]
Merge tag 'fixes-for-linus' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
"We have a fairly large batch of fixes this time around, mostly just
due to various platforms all having a fix or two more than usual.
Worth pointing out are:
- A fix for EDMA on Davinci/OMAP where channel allocation broke with
the DT conversion. Due to some miscommunication we didn't
understand the impact of the breakage, so we were pushing back on
it for 3.12, but it sounds like it's actually breaking quite a few
people out there.
- A bunch of fixes for Marvell platforms, some straggling fixes for
merge window fallout and some fixes for a couple of the platforms
(Netgear RN102 in particular).
- A fix for a race between multi-cluster power management and cpu
hotplug on Versatile Express.
And a bunch of other smaller fixes that all add up.
We'll be switching over into stricter regressions-only mode from here
on out"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (31 commits)
ARM: multi_v7_defconfig: add SDHCI for i.MX
bus: mvebu-mbus: Fix optional pcie-mem/io-aperture properties
ARM: mvebu: add missing DT Mbus ranges and relocate PCIe DT nodes for RN102
ARM: at91: sam9g45: shutdown ddr1 too when rebooting
MAINTAINERS: ARM: SIRF: use kernel.org mail box
MAINTAINERS: ARM: SIRF: add missed drivers into maintain list
ARM: edma: Fix clearing of unused list for DT DMA resources
ARM: vexpress: tc2: fix hotplug/idle/kexec race on cluster power down
ARM: dts: sirf: fix interrupt and dma prop of VIP for prima2 and atlas6
ARM: dts: sirf: fix the ranges of peri-iobrg of prima2
ARM: dts: makefile: build atlas6-evb.dtb for ARCH_ATLAS6
ARM: dts: sirf: fix fifosize, clks, dma channels for UART
ARM: mvebu: Add DT entry for ReadyNAS 102 to use gpio-poweroff driver
ARM: mvebu: fix ReadyNAS 102 Power button GPIO to make it active high
ARM: mach-integrator: Add stub for pci_v3_early_init() for !CONFIG_PCI
ARM: shmobile: Remove #gpio-ranges-cells DT property
gpio: rcar: Remove #gpio-range-cells DT property usage
ARM: shmobile: armadillo: fixup ether pinctrl naming
ARM: shmobile: Lager: add Micrel KSZ8041 PHY fixup
ARM: shmobile: update SDHI DT compatibility string to the <unit>-<soc> format
...
Christian Ruppert [Wed, 2 Oct 2013 09:13:38 +0000 (11:13 +0200)]
ARC: Fix signal frame management for SA_SIGINFO
Previously, when a signal was registered with SA_SIGINFO, parameters 2
and 3 of the signal handler were written to registers r1 and r2 before
the register set was saved. This led to corruption of these two
registers after returning from the signal handler (the wrong values were
restored).
With this patch, registers are now saved before any parameters are
passed, thus maintaining the processor state from before signal entry.
Signed-off-by: Christian Ruppert <christian.ruppert@abilis.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Linus Torvalds [Thu, 3 Oct 2013 03:58:33 +0000 (20:58 -0700)]
Merge git://git./linux/kernel/git/davem/sparc
Pull sparc fixes from David Miller:
"Couple of small bug fixes:
1) strlcpy in ldom_reboot() is still not quite right, use sprintf
instead from Kees Cook.
2) Generic hugetlb interface pte checks should use the widest return
type, otherwise high bits can get chopped off.
3) Fix build with PCI MSI enabled on 32-bit sparc"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc: fix MSI build failure on Sparc32
sparc: remove deprecated IRQF_DISABLED
mm: Fix generic hugetlb pte check return type.
sparc: fix ldom_reboot buffer overflow harder
Olof Johansson [Thu, 3 Oct 2013 03:55:05 +0000 (20:55 -0700)]
Merge tag 'fixes-3.12-2' of git://git.infradead.org/linux-mvebu into fixes
From Jason Cooper:
mvebu fixes for v3.12 (round 2)
- mvebu
- fix ReadyNAS 102 power button (needs to be active high)
- fix ReadyNAS 102 automated rebooting (prevent hang) by add gpio-poweroff
node
- fix booting ReadyNAS 102 by adding MBus ranges and PCIe DT nodes
- mvebu-mbus: prevent PCIe driver from continuing with corrupted resource
* tag 'fixes-3.12-2' of git://git.infradead.org/linux-mvebu:
bus: mvebu-mbus: Fix optional pcie-mem/io-aperture properties
ARM: mvebu: add missing DT Mbus ranges and relocate PCIe DT nodes for RN102
ARM: mvebu: Add DT entry for ReadyNAS 102 to use gpio-poweroff driver
ARM: mvebu: fix ReadyNAS 102 Power button GPIO to make it active high
Signed-off-by: Olof Johansson <olof@lixom.net>
Olof Johansson [Mon, 30 Sep 2013 00:34:45 +0000 (17:34 -0700)]
ARM: multi_v7_defconfig: add SDHCI for i.MX
Turn on SDHCI for i.MX support so machines can boot with local rootfs
on SD. Tested on a Wandboard Quad.
Signed-off-by: Olof Johansson <olof@lixom.net>
Reviewed-by: Fabio Estevam <fabio.estevam@freescale.com>
Thomas Petazzoni [Wed, 11 Sep 2013 10:32:05 +0000 (12:32 +0200)]
sparc: fix MSI build failure on Sparc32
Commit
ebd97be635 ('PCI: remove ARCH_SUPPORTS_MSI kconfig option')
removes the ARCH_SUPPORTS_MSI Kconfig option that allowed
architectures to indicate whether they support PCI MSI or not. Now,
PCI MSI support can be compiled in on any architecture thanks to the
use of weak functions thanks to
4287d824f265 ('PCI: use weak functions
for MSI arch-specific functions').
So, architecture specific code is now responsible to ensure that its
PCI MSI code builds in all cases, or be appropriately conditionally
compiled.
On Sparc, the MSI support is only provided for Sparc64, so the
ARCH_SUPPORTS_MSI kconfig option was only selected for SPARC64, and
not for the Sparc architecture as a whole. Therefore, removing
ARCH_SUPPORTS_MSI broke Sparc32 configurations with CONFIG_PCI_MSI=y,
because the Sparc-specific MSI code is not designed to be built on
Sparc32.
To solve this, this commit ensures that the Sparc MSI code is only
built on Sparc64. This is done thanks to a new Kconfig Makefile helper
option SPARC64_PCI_MSI, modeled after the existing SPARC64_PCI. The
SPARC64_PCI_MSI option is an hidden option that is true when both
Sparc64 PCI support is enabled and MSI is enabled. The
arch/sparc/kernel/pci_msi.c file is now only built when
SPARC64_PCI_MSI is true.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Opdenacker [Sat, 7 Sep 2013 07:38:09 +0000 (09:38 +0200)]
sparc: remove deprecated IRQF_DISABLED
This patch proposes to remove the IRQF_DISABLED flag from sparc architecture
code. It's a NOOP since 2.6.35 and it will be removed one day.
Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Miller [Wed, 2 Oct 2013 18:25:09 +0000 (14:25 -0400)]
mm: Fix generic hugetlb pte check return type.
The include/asm-generic/hugetlb.h stubs that just vector huge_pte_*()
calls to the pte_*() implementations won't work in certain situations.
x86 and sparc, for example, return "unsigned long" from the bit
checks, and just go "return pte_val(pte) & PTE_BIT_FOO;"
But since huge_pte_*() returns 'int', if any high bits on 64-bit are
relevant, they get chopped off.
The net effect is that we can loop forever trying to COW a huge page,
because the huge_pte_write() check signals false all the time.
Reported-by: Gurudas Pai <gurudas.pai@oracle.com>
Tested-by: Gurudas Pai <gurudas.pai@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: David Rientjes <rientjes@google.com>
Kees Cook [Wed, 2 Oct 2013 05:13:34 +0000 (22:13 -0700)]
sparc: fix ldom_reboot buffer overflow harder
The length argument to strlcpy was still wrong. It could overflow the end of
full_boot_str by 5 bytes. Instead of strcat and strlcpy, just use snprint.
Reported-by: Brad Spengler <spender@grsecurity.net>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Boyd [Tue, 1 Oct 2013 20:48:43 +0000 (21:48 +0100)]
arm64: Remove duplicate DEBUG_STACK_USAGE config
This config item already exists generically in lib/Kconfig.debug.
Remove the duplicate config in arm64.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Linus Torvalds [Wed, 2 Oct 2013 16:38:17 +0000 (09:38 -0700)]
Merge git://git.kvack.org/~bcrl/aio-next
Pull aio use-after-free fix from Ben LaHaise.
* git://git.kvack.org/~bcrl/aio-next:
aio: fix use-after-free in aio_migratepage
Linus Torvalds [Wed, 2 Oct 2013 16:36:10 +0000 (09:36 -0700)]
Merge tag 'sound-3.12' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"All small, mostly driver-specific fixes: a few ASoC driver fixes
(trivial stable fixes, sgtl5000 fixes), one DPCM fix, an old AC97 ID,
and a fix for HD-audio Conexant GPIO"
* tag 'sound-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda - Fix GPIO for Acer Aspire 3830TG
ALSA: ac97: Add ID for TI TLV320AIC27 codec
ASoC: imx-sgtl5000: Fix uninitialized pointer use in error path
ASoC: imx-sgtl5000: do not use devres on a foreign device
ASoC: blackfin: Add missing break statement to bf6xx
ASoC: 88pm860x: array overflow in snd_soc_put_volsw_2r_st()
ASoC: ab8500-codec: info leak in anc_status_control_put()
ASoC: max98095: a couple array underflows
ASoC: core: Only add platform DAI widgets once.
Linus Torvalds [Wed, 2 Oct 2013 16:34:47 +0000 (09:34 -0700)]
Merge tag 'pinctrl-v3.12-2' of git://git./linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
- Various build warning fixes.
- Correct the S5P pin count.
- Handle BIAS_DEFAULT properly in the Palmas driver.
* tag 'pinctrl-v3.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: palmas: do not abort pin configuration for BIAS_DEFAULT
pinctrl: Correct number of pins for s5pv210
pinctrl: remove an unnecessary cast
pinctrl: fix pinconf_dbg_config_write return type
pinctrl: tegra114: Remove MODULE_ALIAS
Yoichi Yuasa [Wed, 2 Oct 2013 06:03:03 +0000 (15:03 +0900)]
MIPS: Fix forgotten preempt_enable() when CPU has inclusive pcaches
[ 1.904000] BUG: scheduling while atomic: swapper/1/0x00000002
[ 1.908000] Modules linked in:
[ 1.916000] CPU: 0 PID: 1 Comm: swapper Not tainted 3.12.0-rc2-lemote-los.git-
5318619-dirty #1
[ 1.920000] Stack :
0000000031aac000 ffffffff810d0000 0000000000000052 ffffffff802730a4
0000000000000000 0000000000000001 ffffffff810cdf90 ffffffff810d0000
ffffffff8068b968 ffffffff806f5537 ffffffff810cdf90 980000009f0782e8
0000000000000001 ffffffff80720000 ffffffff806b0000 980000009f078000
980000009f290000 ffffffff805f312c 980000009f05b5d8 ffffffff80233518
980000009f05b5e8 ffffffff80274b7c 980000009f078000 ffffffff8068b968
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 980000009f05b520 0000000000000000 ffffffff805f2f6c
0000000000000000 ffffffff80700000 ffffffff80700000 ffffffff806fc758
ffffffff80700000 ffffffff8020be98 ffffffff806fceb0 ffffffff805f2f6c
...
[ 2.028000] Call Trace:
[ 2.032000] [<
ffffffff8020be98>] show_stack+0x80/0x98
[ 2.036000] [<
ffffffff805f2f6c>] __schedule_bug+0x44/0x6c
[ 2.040000] [<
ffffffff805fac58>] __schedule+0x518/0x5b0
[ 2.044000] [<
ffffffff805f8a58>] schedule_timeout+0x128/0x1f0
[ 2.048000] [<
ffffffff80240314>] msleep+0x3c/0x60
[ 2.052000] [<
ffffffff80495400>] do_probe+0x238/0x3a8
[ 2.056000] [<
ffffffff804958b0>] ide_probe_port+0x340/0x7e8
[ 2.060000] [<
ffffffff80496028>] ide_host_register+0x2d0/0x7a8
[ 2.064000] [<
ffffffff8049c65c>] ide_pci_init_two+0x4e4/0x790
[ 2.068000] [<
ffffffff8049f9b8>] amd74xx_probe+0x148/0x2c8
[ 2.072000] [<
ffffffff803f571c>] pci_device_probe+0xc4/0x130
[ 2.076000] [<
ffffffff80478f60>] driver_probe_device+0x98/0x270
[ 2.080000] [<
ffffffff80479298>] __driver_attach+0xe0/0xe8
[ 2.084000] [<
ffffffff80476ab0>] bus_for_each_dev+0x78/0xe0
[ 2.088000] [<
ffffffff80478468>] bus_add_driver+0x230/0x310
[ 2.092000] [<
ffffffff80479b44>] driver_register+0x84/0x158
[ 2.096000] [<
ffffffff80200504>] do_one_initcall+0x104/0x160
Signed-off-by: Yoichi Yuasa <yuasa@linux-mips.org>
Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Tested-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Cc: linux-mips@linux-mips.org
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Patchwork: https://patchwork.linux-mips.org/patch/5941/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Ingo Molnar [Wed, 2 Oct 2013 05:53:01 +0000 (07:53 +0200)]
Merge branch 'irq/urgent-v2' of git://git./linux/kernel/git/frederic/linux-dynticks into irq/urgent
Pull a hardirq-nesting fix from Frederic Weisbecker.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tom Gundersen [Tue, 1 Oct 2013 16:18:40 +0000 (18:18 +0200)]
x86/simplefb: Fix overflow causing bogus fall-back
On my MacBook Air lfb_size is 4M, which makes the bitshit
overflow (to 256GB - larger than 32 bits), meaning we fall
back to efifb unnecessarily.
Cast to u64 to avoid the overflow.
Signed-off-by: Tom Gundersen <teg@jklm.no>
Reviewed-by: David Herrmann <dh.herrmann@gmail.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Stephen Warren <swarren@nvidia.com>
Cc: Stephen Warren <swarren@wwwdotorg.org>
Link: http://lkml.kernel.org/r/1380644320-1026-1-git-send-email-teg@jklm.no
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Linus Torvalds [Tue, 1 Oct 2013 19:58:48 +0000 (12:58 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking changes from David Miller:
1) Multiply in netfilter IPVS can overflow when calculating destination
weight. From Simon Kirby.
2) Use after free fixes in IPVS from Julian Anastasov.
3) SFC driver bug fixes from Daniel Pieczko.
4) Memory leak in pcan_usb_core failure paths, from Alexey Khoroshilov.
5) Locking and encapsulation fixes to serial line CAN driver, from
Andrew Naujoks.
6) Duplex and VF handling fixes to bnx2x driver from Yaniv Rosner,
Eilon Greenstein, and Ariel Elior.
7) In lapb, if no other packets are outstanding, T1 timeouts actually
stall things and no packet gets sent. Fix from Josselin Costanzi.
8) ICMP redirects should not make it to the socket error queues, from
Duan Jiong.
9) Fix bugs in skge DMA mapping error handling, from Nikulas Patocka.
10) Fix setting of VLAN priority field on via-rhine driver, from Roget
Luethi.
11) Fix TX stalls and VLAN promisc programming in be2net driver from
Ajit Khaparde.
12) Packet padding doesn't get handled correctly in new usbnet SG
support code, from Ming Lei.
13) Fix races in netdevice teardown wrt. network namespace closing.
From Eric W. Biederman.
14) Fix potential missed initialization of net_secret if not TCP
connections are openned. From Eric Dumazet.
15) Cinterion PLXX product ID in qmi_wwan driver is wrong, from
Aleksander Morgado.
16) skb_cow_head() can change skb->data and thus packet header pointers,
don't use stale ip_hdr reference in ip_tunnel code.
17) Backend state transition handling fixes in xen-netback, from Paul
Durrant.
18) Packet offset for AH protocol is handled wrong in flow dissector,
from Eric Dumazet.
19) Taking down an fq packet scheduler instance can leave stale packets
in the queues, fix from Eric Dumazet.
20) Fix performance regressions introduced by TCP Small Queues. From
Eric Dumazet.
21) IPV6 GRE tunneling code calculates max_headroom incorrectly, from
Hannes Frederic Sowa.
22) Multicast timer handlers in ipv4 and ipv6 can be the last and final
reference to the ipv4/ipv6 specific network device state, so use the
reference put that will check and release the object if the
reference hits zero. From Salam Noureddine.
23) Fix memory corruption in ip_tunnel driver, and use skb_push()
instead of __skb_push() so that similar bugs are less hard to find.
From Steffen Klassert.
24) Add forgotten hookup of rtnl_ops in SIT and ip6tnl drivers, from
Nicolas Dichtel.
25) fq scheduler doesn't accurately rate limit in certain circumstances,
from Eric Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (103 commits)
pkt_sched: fq: rate limiting improvements
ip6tnl: allow to use rtnl ops on fb tunnel
sit: allow to use rtnl ops on fb tunnel
ip_tunnel: Remove double unregister of the fallback device
ip_tunnel_core: Change __skb_push back to skb_push
ip_tunnel: Add fallback tunnels to the hash lists
ip_tunnel: Fix a memory corruption in ip_tunnel_xmit
qlcnic: Fix SR-IOV configuration
ll_temac: Reset dma descriptors indexes on ndo_open
skbuff: size of hole is wrong in a comment
ipv6 mcast: use in6_dev_put in timer handlers instead of __in6_dev_put
ipv4 igmp: use in_dev_put in timer handlers instead of __in_dev_put
ethernet: moxa: fix incorrect placement of __initdata tag
ipv6: gre: correct calculation of max_headroom
powerpc/83xx: gianfar_ptp: select 1588 clock source through dts file
Revert "powerpc/83xx: gianfar_ptp: select 1588 clock source through dts file"
bonding: Fix broken promiscuity reference counting issue
tcp: TSQ can use a dynamic limit
dm9601: fix IFF_ALLMULTI handling
pkt_sched: fq: qdisc dismantle fixes
...
Linus Torvalds [Tue, 1 Oct 2013 19:57:59 +0000 (12:57 -0700)]
Merge git://git./linux/kernel/git/davem/sparc
Pull sparc fix from David Miller:
"Just a single bug fix to a regression added during some strlcpy()
conversions"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc64: Fix buggy strlcpy() conversion in ldom_reboot().
Linus Torvalds [Tue, 1 Oct 2013 17:28:11 +0000 (10:28 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs
Pull vfs lru leak fix from Al Viro:
"The fix in "super: fix for destroy lrus" didn't - they need to be
destroyed, all right, but that's the wrong place..."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs/super.c: fix lru_list leak for real
Linus Torvalds [Tue, 1 Oct 2013 17:25:10 +0000 (10:25 -0700)]
Merge git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull two KVM fixes from Gleb Natapov.
* git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: VMX: do not check bit 12 of EPT violation exit qualification when undefined
ARM: kvm: rename cpu_reset to avoid name clash
Al Viro [Tue, 1 Oct 2013 17:11:21 +0000 (13:11 -0400)]
fs/super.c: fix lru_list leak for real
Freeing ->s_{inode,dentry}_lru in deactivate_locked_super() is wrong;
the right place is destroy_super(). As it is, we leak them if sget()
decides that new superblock it has allocated (and never shown to
anybody) isn't needed and should be freed.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jason Gunthorpe [Tue, 17 Sep 2013 20:11:04 +0000 (14:11 -0600)]
bus: mvebu-mbus: Fix optional pcie-mem/io-aperture properties
If the property was not specified then the returned resource had a
resource_size(..) == 1, rather than 0. The PCI-E driver checks for 0 so it
blindly continues on with a corrupted resource.
The regression was introduced into v3.12 by:
11be654 PCI: mvebu: Adapt to the new device tree layout
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
Eric Dumazet [Tue, 1 Oct 2013 16:10:16 +0000 (09:10 -0700)]
pkt_sched: fq: rate limiting improvements
FQ rate limiting suffers from two problems, reported
by Steinar :
1) FQ enforces a delay when flow quantum is exhausted in order
to reduce cpu overhead. But if packets are small, current
delay computation is slightly wrong, and observed rates can
be too high.
Steinar had this problem because he disabled TSO and GSO,
and default FQ quantum is 2*1514.
(Of course, I wish recent TSO auto sizing changes will help
to not having to disable TSO in the first place)
2) maxrate was not used for forwarded flows (skbs not attached
to a socket)
Tested:
tc qdisc add dev eth0 root est 1sec 4sec fq maxrate 8Mbit
netperf -H lpq84 -l 1000 &
sleep 10 ; tc -s qdisc show dev eth0
qdisc fq 8003: root refcnt 32 limit 10000p flow_limit 100p buckets 1024
quantum 3028 initial_quantum 15140 maxrate 8000Kbit
Sent
16819357 bytes 11258 pkt (dropped 0, overlimits 0 requeues 0)
rate 7831Kbit 653pps backlog 7570b 5p requeues 0
44 flows (43 inactive, 1 throttled), next packet delay
2977352 ns
0 gc, 0 highprio, 5545 throttled
lpq83:~# tcpdump -p -i eth0 host lpq84 -c 12
09:02:52.079484 IP lpq83 > lpq84: .
1389536928:
1389538376(1448) ack
3808678021 win 457 <nop,nop,timestamp 961812
572609068>
09:02:52.079499 IP lpq83 > lpq84: . 1448:2896(1448) ack 1 win 457 <nop,nop,timestamp 961812
572609068>
09:02:52.079906 IP lpq84 > lpq83: . ack 2896 win 16384 <nop,nop,timestamp
572609080 961812>
09:02:52.082568 IP lpq83 > lpq84: . 2896:4344(1448) ack 1 win 457 <nop,nop,timestamp 961815
572609071>
09:02:52.082581 IP lpq83 > lpq84: . 4344:5792(1448) ack 1 win 457 <nop,nop,timestamp 961815
572609071>
09:02:52.083017 IP lpq84 > lpq83: . ack 5792 win 16384 <nop,nop,timestamp
572609083 961815>
09:02:52.085678 IP lpq83 > lpq84: . 5792:7240(1448) ack 1 win 457 <nop,nop,timestamp 961818
572609074>
09:02:52.085693 IP lpq83 > lpq84: . 7240:8688(1448) ack 1 win 457 <nop,nop,timestamp 961818
572609074>
09:02:52.086117 IP lpq84 > lpq83: . ack 8688 win 16384 <nop,nop,timestamp
572609086 961818>
09:02:52.088792 IP lpq83 > lpq84: . 8688:10136(1448) ack 1 win 457 <nop,nop,timestamp 961821
572609077>
09:02:52.088806 IP lpq83 > lpq84: . 10136:11584(1448) ack 1 win 457 <nop,nop,timestamp 961821
572609077>
09:02:52.089217 IP lpq84 > lpq83: . ack 11584 win 16384 <nop,nop,timestamp
572609090 961821>
Reported-by: Steinar H. Gunderson <sesse@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Tue, 1 Oct 2013 16:05:00 +0000 (18:05 +0200)]
ip6tnl: allow to use rtnl ops on fb tunnel
rtnl ops where introduced by
c075b13098b3 ("ip6tnl: advertise tunnel param via
rtnl"), but I forget to assign rtnl ops to fb tunnels.
Now that it is done, we must remove the explicit call to
unregister_netdevice_queue(), because the fallback tunnel is added to the queue
in ip6_tnl_destroy_tunnels() when checking rtnl_link_ops of all netdevices (this
is valid since commit
0bd8762824e7 ("ip6tnl: add x-netns support")).
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Tue, 1 Oct 2013 16:04:59 +0000 (18:04 +0200)]
sit: allow to use rtnl ops on fb tunnel
rtnl ops where introduced by
ba3e3f50a0e5 ("sit: advertise tunnel param via
rtnl"), but I forget to assign rtnl ops to fb tunnels.
Now that it is done, we must remove the explicit call to
unregister_netdevice_queue(), because the fallback tunnel is added to the queue
in sit_destroy_tunnels() when checking rtnl_link_ops of all netdevices (this
is valid since commit
5e6700b3bf98 ("sit: add support of x-netns")).
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 1 Oct 2013 16:42:28 +0000 (12:42 -0400)]
Merge branch 'ip_tunnel'
ip_tunnel bug fixes from Steffen Klassert.
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert [Tue, 1 Oct 2013 09:37:37 +0000 (11:37 +0200)]
ip_tunnel: Remove double unregister of the fallback device
When queueing the netdevices for removal, we queue the
fallback device twice in ip_tunnel_destroy(). The first
time when we queue all netdevices in the namespace and
then again explicitly. Fix this by removing the explicit
queueing of the fallback device.
Bug was introduced when network namespace support was added
with commit
6c742e714d8 ("ipip: add x-netns support").
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert [Tue, 1 Oct 2013 09:35:51 +0000 (11:35 +0200)]
ip_tunnel_core: Change __skb_push back to skb_push
Git commit
0e6fbc5b ("ip_tunnels: extend iptunnel_xmit()")
moved the IP header installation to iptunnel_xmit() and
changed skb_push() to __skb_push(). This makes possible
bugs hard to track down, so change it back to skb_push().
Cc: Pravin Shelar <pshelar@nicira.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert [Tue, 1 Oct 2013 09:34:48 +0000 (11:34 +0200)]
ip_tunnel: Add fallback tunnels to the hash lists
Currently we can not update the tunnel parameters of
the fallback tunnels because we don't find them in the
hash lists. Fix this by adding them on initialization.
Bug was introduced with commit
c544193214
("GRE: Refactor GRE tunneling code.")
Cc: Pravin Shelar <pshelar@nicira.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert [Tue, 1 Oct 2013 09:33:59 +0000 (11:33 +0200)]
ip_tunnel: Fix a memory corruption in ip_tunnel_xmit
We might extend the used aera of a skb beyond the total
headroom when we install the ipip header. Fix this by
calling skb_cow_head() unconditionally.
Bug was introduced with commit
c544193214
("GRE: Refactor GRE tunneling code.")
Cc: Pravin Shelar <pshelar@nicira.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 1 Oct 2013 16:39:35 +0000 (12:39 -0400)]
Merge branch 'master' of git://git./linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:
====================
The following patchset contains Netfilter/IPVS fixes for your net
tree, they are:
* Fix BUG_ON splat due to malformed TCP packets seen by synproxy, from
Patrick McHardy.
* Fix possible weight overflow in lblc and lblcr schedulers due to
32-bits arithmetics, from Simon Kirby.
* Fix possible memory access race in the lblc and lblcr schedulers,
introduced when it was converted to use RCU, two patches from
Julian Anastasov.
* Fix hard dependency on CPU 0 when reading per-cpu stats in the
rate estimator, from Julian Anastasov.
* Fix race that may lead to object use after release, when invoking
ipvsadm -C && ipvsadm -R, introduced when adding RCU, from Julian
Anastasov.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Manish Chopra [Tue, 1 Oct 2013 06:23:48 +0000 (02:23 -0400)]
qlcnic: Fix SR-IOV configuration
o Interface needs to be brought down and up while configuring SR-IOV.
Protect interface up/down using rtnl_lock()/rtnl_unlock()
Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ricardo Ribalda [Tue, 1 Oct 2013 06:17:10 +0000 (08:17 +0200)]
ll_temac: Reset dma descriptors indexes on ndo_open
The dma descriptors indexes are only initialized on the probe function.
If a packet is on the buffer when temac_stop is called, the dma
descriptors indexes can be left on a incorrect state where no other
package can be sent.
So an interface could be left in an usable state after ifdow/ifup.
This patch makes sure that the descriptors indexes are in a proper
status when the device is open.
Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Frederic Weisbecker [Mon, 23 Sep 2013 22:50:25 +0000 (00:50 +0200)]
irq: Force hardirq exit's softirq processing on its own stack
The commit
facd8b80c67a3cf64a467c4a2ac5fb31f2e6745b
("irq: Sanitize invoke_softirq") converted irq exit
calls of do_softirq() to __do_softirq() on all architectures,
assuming it was only used there for its irq disablement
properties.
But as a side effect, the softirqs processed in the end
of the hardirq are always called on the inline current
stack that is used by irq_exit() instead of the softirq
stack provided by the archs that override do_softirq().
The result is mostly safe if the architecture runs irq_exit()
on a separate irq stack because then softirqs are processed
on that same stack that is near empty at this stage (assuming
hardirq aren't nesting).
Otherwise irq_exit() runs in the task stack and so does the softirq
too. The interrupted call stack can be randomly deep already and
the softirq can dig through it even further. To add insult to the
injury, this softirq can be interrupted by a new hardirq, maximizing
the chances for a stack overrun as reported in powerpc for example:
do_IRQ: stack overflow: 1920
CPU: 0 PID: 1602 Comm: qemu-system-ppc Not tainted 3.10.4-300.1.fc19.ppc64p7 #1
Call Trace:
[
c0000000050a8740] .show_stack+0x130/0x200 (unreliable)
[
c0000000050a8810] .dump_stack+0x28/0x3c
[
c0000000050a8880] .do_IRQ+0x2b8/0x2c0
[
c0000000050a8930] hardware_interrupt_common+0x154/0x180
--- Exception: 501 at .cp_start_xmit+0x3a4/0x820 [8139cp]
LR = .cp_start_xmit+0x390/0x820 [8139cp]
[
c0000000050a8d40] .dev_hard_start_xmit+0x394/0x640
[
c0000000050a8e00] .sch_direct_xmit+0x110/0x260
[
c0000000050a8ea0] .dev_queue_xmit+0x260/0x630
[
c0000000050a8f40] .br_dev_queue_push_xmit+0xc4/0x130 [bridge]
[
c0000000050a8fc0] .br_dev_xmit+0x198/0x270 [bridge]
[
c0000000050a9070] .dev_hard_start_xmit+0x394/0x640
[
c0000000050a9130] .dev_queue_xmit+0x428/0x630
[
c0000000050a91d0] .ip_finish_output+0x2a4/0x550
[
c0000000050a9290] .ip_local_out+0x50/0x70
[
c0000000050a9310] .ip_queue_xmit+0x148/0x420
[
c0000000050a93b0] .tcp_transmit_skb+0x4e4/0xaf0
[
c0000000050a94a0] .__tcp_ack_snd_check+0x7c/0xf0
[
c0000000050a9520] .tcp_rcv_established+0x1e8/0x930
[
c0000000050a95f0] .tcp_v4_do_rcv+0x21c/0x570
[
c0000000050a96c0] .tcp_v4_rcv+0x734/0x930
[
c0000000050a97a0] .ip_local_deliver_finish+0x184/0x360
[
c0000000050a9840] .ip_rcv_finish+0x148/0x400
[
c0000000050a98d0] .__netif_receive_skb_core+0x4f8/0xb00
[
c0000000050a99d0] .netif_receive_skb+0x44/0x110
[
c0000000050a9a70] .br_handle_frame_finish+0x2bc/0x3f0 [bridge]
[
c0000000050a9b20] .br_nf_pre_routing_finish+0x2ac/0x420 [bridge]
[
c0000000050a9bd0] .br_nf_pre_routing+0x4dc/0x7d0 [bridge]
[
c0000000050a9c70] .nf_iterate+0x114/0x130
[
c0000000050a9d30] .nf_hook_slow+0xb4/0x1e0
[
c0000000050a9e00] .br_handle_frame+0x290/0x330 [bridge]
[
c0000000050a9ea0] .__netif_receive_skb_core+0x34c/0xb00
[
c0000000050a9fa0] .netif_receive_skb+0x44/0x110
[
c0000000050aa040] .napi_gro_receive+0xe8/0x120
[
c0000000050aa0c0] .cp_rx_poll+0x31c/0x590 [8139cp]
[
c0000000050aa1d0] .net_rx_action+0x1dc/0x310
[
c0000000050aa2b0] .__do_softirq+0x158/0x330
[
c0000000050aa3b0] .irq_exit+0xc8/0x110
[
c0000000050aa430] .do_IRQ+0xdc/0x2c0
[
c0000000050aa4e0] hardware_interrupt_common+0x154/0x180
--- Exception: 501 at .bad_range+0x1c/0x110
LR = .get_page_from_freelist+0x908/0xbb0
[
c0000000050aa7d0] .list_del+0x18/0x50 (unreliable)
[
c0000000050aa850] .get_page_from_freelist+0x908/0xbb0
[
c0000000050aa9e0] .__alloc_pages_nodemask+0x21c/0xae0
[
c0000000050aaba0] .alloc_pages_vma+0xd0/0x210
[
c0000000050aac60] .handle_pte_fault+0x814/0xb70
[
c0000000050aad50] .__get_user_pages+0x1a4/0x640
[
c0000000050aae60] .get_user_pages_fast+0xec/0x160
[
c0000000050aaf10] .__gfn_to_pfn_memslot+0x3b0/0x430 [kvm]
[
c0000000050aafd0] .kvmppc_gfn_to_pfn+0x64/0x130 [kvm]
[
c0000000050ab070] .kvmppc_mmu_map_page+0x94/0x530 [kvm]
[
c0000000050ab190] .kvmppc_handle_pagefault+0x174/0x610 [kvm]
[
c0000000050ab270] .kvmppc_handle_exit_pr+0x464/0x9b0 [kvm]
[
c0000000050ab320] kvm_start_lightweight+0x1ec/0x1fc [kvm]
[
c0000000050ab4f0] .kvmppc_vcpu_run_pr+0x168/0x3b0 [kvm]
[
c0000000050ab9c0] .kvmppc_vcpu_run+0xc8/0xf0 [kvm]
[
c0000000050aba50] .kvm_arch_vcpu_ioctl_run+0x5c/0x1a0 [kvm]
[
c0000000050abae0] .kvm_vcpu_ioctl+0x478/0x730 [kvm]
[
c0000000050abc90] .do_vfs_ioctl+0x4ec/0x7c0
[
c0000000050abd80] .SyS_ioctl+0xd4/0xf0
[
c0000000050abe30] syscall_exit+0x0/0x98
Since this is a regression, this patch proposes a minimalistic
and low-risk solution by blindly forcing the hardirq exit processing of
softirqs on the softirq stack. This way we should reduce significantly
the opportunities for task stack overflow dug by softirqs.
Longer term solutions may involve extending the hardirq stack coverage to
irq_exit(), etc...
Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: #3.9.. <stable@vger.kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@au1.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@au1.ibm.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Nicolas Dichtel [Mon, 30 Sep 2013 12:16:41 +0000 (14:16 +0200)]
skbuff: size of hole is wrong in a comment
Since commit
c93bdd0e03e8 ("netvm: allow skb allocation to use PFMEMALLOC
reserves"), hole size is one bit less than what is written in the comment.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 1 Oct 2013 05:31:05 +0000 (22:31 -0700)]
Merge branch 'fixes-for-3.12' of git://gitorious.org/linux-can/linux-can
Signed-off-by: David S. Miller <davem@davemloft.net>
Salam Noureddine [Sun, 29 Sep 2013 20:41:34 +0000 (13:41 -0700)]
ipv6 mcast: use in6_dev_put in timer handlers instead of __in6_dev_put
It is possible for the timer handlers to run after the call to
ipv6_mc_down so use in6_dev_put instead of __in6_dev_put in the
handler function in order to do proper cleanup when the refcnt
reaches 0. Otherwise, the refcnt can reach zero without the
inet6_dev being destroyed and we end up leaking a reference to
the net_device and see messages like the following,
unregister_netdevice: waiting for eth0 to become free. Usage count = 1
Tested on linux-3.4.43.
Signed-off-by: Salam Noureddine <noureddine@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Salam Noureddine [Sun, 29 Sep 2013 20:39:42 +0000 (13:39 -0700)]
ipv4 igmp: use in_dev_put in timer handlers instead of __in_dev_put
It is possible for the timer handlers to run after the call to
ip_mc_down so use in_dev_put instead of __in_dev_put in the handler
function in order to do proper cleanup when the refcnt reaches 0.
Otherwise, the refcnt can reach zero without the in_device being
destroyed and we end up leaking a reference to the net_device and
see messages like the following,
unregister_netdevice: waiting for eth0 to become free. Usage count = 1
Tested on linux-3.4.43.
Signed-off-by: Salam Noureddine <noureddine@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bartlomiej Zolnierkiewicz [Mon, 30 Sep 2013 13:18:27 +0000 (15:18 +0200)]
ethernet: moxa: fix incorrect placement of __initdata tag
__initdata tag should be placed between the variable name and equal
sign for the variable to be placed in the intended .init.data section.
In this particular case __initdata is incorrect as moxart_mac_driver
can be used after the driver gets initialized.
Also while at it static-ize moxart_mac_driver.
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hannes Frederic Sowa [Sun, 29 Sep 2013 03:40:50 +0000 (05:40 +0200)]
ipv6: gre: correct calculation of max_headroom
gre_hlen already accounts for sizeof(struct ipv6_hdr) + gre header,
so initialize max_headroom to zero. Otherwise the
if (encap_limit >= 0) {
max_headroom += 8;
mtu -= 8;
}
increments an uninitialized variable before max_headroom was reset.
Found with coverity: 728539
Cc: Dmitry Kozlov <xeb@mail.ru>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Aida Mynzhasova [Fri, 27 Sep 2013 13:40:27 +0000 (17:40 +0400)]
powerpc/83xx: gianfar_ptp: select 1588 clock source through dts file
Currently IEEE 1588 timer reference clock source is determined through
hard-coded value in gianfar_ptp driver. This patch allows to select ptp
clock source by means of device tree file node.
For instance:
fsl,cksel = <0>;
for using external (TSEC_TMR_CLK input) high precision timer
reference clock.
Other acceptable values:
<1> : eTSEC system clock
<2> : eTSEC1 transmit clock
<3> : RTC clock input
When this attribute isn't used, eTSEC system clock will serve as
IEEE 1588 timer reference clock.
Signed-off-by: Aida Mynzhasova <aida.mynzhasova@skitlab.ru>
Acked-by: Kumar Gala <galak@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 1 Oct 2013 04:16:17 +0000 (21:16 -0700)]
Revert "powerpc/83xx: gianfar_ptp: select 1588 clock source through dts file"
This reverts commit
894116bd0e9b7749a0c4b6c62dec13c2a0ccef68.
I applied the wrong version of this patch, correct
version coming up.
Signed-off-by: David S. Miller <davem@davemloft.net>
Neil Horman [Fri, 27 Sep 2013 16:22:15 +0000 (12:22 -0400)]
bonding: Fix broken promiscuity reference counting issue
Recently grabbed this report:
https://bugzilla.redhat.com/show_bug.cgi?id=
1005567
Of an issue in which the bonding driver, with an attached vlan encountered the
following errors when bond0 was taken down and back up:
dummy1: promiscuity touches roof, set promiscuity failed. promiscuity feature of
device might be broken.
The error occurs because, during __bond_release_one, if we release our last
slave, we take on a random mac address and issue a NETDEV_CHANGEADDR
notification. With an attached vlan, the vlan may see that the vlan and bond
mac address were in sync, but no longer are. This triggers a call to dev_uc_add
and dev_set_rx_mode, which enables IFF_PROMISC on the bond device. Then, when
we complete __bond_release_one, we use the current state of the bond flags to
determine if we should decrement the promiscuity of the releasing slave. But
since the bond changed promiscuity state during the release operation, we
incorrectly decrement the slave promisc count when it wasn't in promiscuous mode
to begin with, causing the above error
Fix is pretty simple, just cache the bonding flags at the start of the function
and use those when determining the need to set promiscuity.
This is also needed for the ALLMULTI flag
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: Mark Wu <wudxw@linux.vnet.ibm.com>
CC: "David S. Miller" <davem@davemloft.net>
Reported-by: Mark Wu <wudxw@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 27 Sep 2013 10:28:54 +0000 (03:28 -0700)]
tcp: TSQ can use a dynamic limit
When TCP Small Queues was added, we used a sysctl to limit amount of
packets queues on Qdisc/device queues for a given TCP flow.
Problem is this limit is either too big for low rates, or too small
for high rates.
Now TCP stack has rate estimation in sk->sk_pacing_rate, and TSO
auto sizing, it can better control number of packets in Qdisc/device
queues.
New limit is two packets or at least 1 to 2 ms worth of packets.
Low rates flows benefit from this patch by having even smaller
number of packets in queues, allowing for faster recovery,
better RTT estimations.
High rates flows benefit from this patch by allowing more than 2 packets
in flight as we had reports this was a limiting factor to reach line
rate. [ In particular if TX completion is delayed because of coalescing
parameters ]
Example for a single flow on 10Gbp link controlled by FQ/pacing
14 packets in flight instead of 2
$ tc -s -d qd
qdisc fq 8001: dev eth0 root refcnt 32 limit 10000p flow_limit 100p
buckets 1024 quantum 3028 initial_quantum 15140
Sent
1168459366606 bytes
771822841 pkt (dropped 0, overlimits 0
requeues
6822476)
rate 9346Mbit 771713pps backlog
953820b 14p requeues
6822476
2047 flow, 2046 inactive, 1 throttled, delay 15673 ns
2372 gc, 0 highprio, 0 retrans,
9739249 throttled, 0 flows_plimit
Note that sk_pacing_rate is currently set to twice the actual rate, but
this might be refined in the future when a flow is in congestion
avoidance.
Additional change : skb->destructor should be set to tcp_wfree().
A future patch (for linux 3.13+) might remove tcp_limit_output_bytes
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnaud Ebalard [Mon, 30 Sep 2013 22:19:16 +0000 (00:19 +0200)]
ARM: mvebu: add missing DT Mbus ranges and relocate PCIe DT nodes for RN102
When
5e12a613 and
0cd3754a were introduced, Netgear ReadyNAS 102 .dts
file was queued for inclusion and missed the update to have Mbus (and
then BootROM) ranges properties declared. It also missed the relocation
of Armada 370/XP PCIe DT nodes introduced by
14fd8ed0 after
de1af8d4.
This patch fixes that which makes 3.12-rc3 bootable on the NAS.
Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
Linus Torvalds [Tue, 1 Oct 2013 00:10:26 +0000 (17:10 -0700)]
Merge tag 'nfs-for-3.12-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
- Stable fix for Oopses in the pNFS files layout driver
- Fix a regression when doing a non-exclusive file create on NFSv4.x
- NFSv4.1 security negotiation fixes when looking up the root
filesystem
- Fix a memory ordering issue in the pNFS files layout driver
* tag 'nfs-for-3.12-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: Give "flavor" an initial value to fix a compile warning
NFSv4.1: try SECINFO_NO_NAME flavs until one works
NFSv4.1: Ensure memory ordering between nfs4_ds_connect and nfs4_fl_prepare_ds
NFSv4.1: nfs4_fl_prepare_ds - fix bugs when the connect attempt fails
NFSv4: Honour the 'opened' parameter in the atomic_open() filesystem method
Peter Korsgaard [Mon, 30 Sep 2013 21:28:20 +0000 (23:28 +0200)]
dm9601: fix IFF_ALLMULTI handling
Pass-all-multicast is controlled by bit 3 in RX control, not bit 2
(pass undersized frames).
Reported-by: Joseph Chang <joseph_chang@davicom.com.tw>
Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 30 Sep 2013 21:32:32 +0000 (14:32 -0700)]
Merge branch 'akpm' (fixes from Andrew Morton)
Merge misc fixes from Andrew Morton.
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (22 commits)
pidns: fix free_pid() to handle the first fork failure
ipc,msg: prevent race with rmid in msgsnd,msgrcv
ipc/sem.c: update sem_otime for all operations
mm/hwpoison: fix the lack of one reference count against poisoned page
mm/hwpoison: fix false report on 2nd attempt at page recovery
mm/hwpoison: fix test for a transparent huge page
mm/hwpoison: fix traversal of hugetlbfs pages to avoid printk flood
block: change config option name for cmdline partition parsing
mm/mlock.c: prevent walking off the end of a pagetable in no-pmd configuration
mm: avoid reinserting isolated balloon pages into LRU lists
arch/parisc/mm/fault.c: fix uninitialized variable usage
include/asm-generic/vtime.h: avoid zero-length file
nilfs2: fix issue with race condition of competition between segments for dirty blocks
Documentation/kernel-parameters.txt: replace kernelcore with Movable
mm/bounce.c: fix a regression where MS_SNAP_STABLE (stable pages snapshotting) was ignored
kernel/kmod.c: check for NULL in call_usermodehelper_exec()
ipc/sem.c: synchronize the proc interface
ipc/sem.c: optimize sem_lock()
ipc/sem.c: fix race in sem_lock()
mm/compaction.c: periodically schedule when freeing pages
...
Oleg Nesterov [Mon, 30 Sep 2013 20:45:27 +0000 (13:45 -0700)]
pidns: fix free_pid() to handle the first fork failure
"case 0" in free_pid() assumes that disable_pid_allocation() should
clear PIDNS_HASH_ADDING before the last pid goes away.
However this doesn't happen if the first fork() fails to create the
child reaper which should call disable_pid_allocation().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Davidlohr Bueso [Mon, 30 Sep 2013 20:45:26 +0000 (13:45 -0700)]
ipc,msg: prevent race with rmid in msgsnd,msgrcv
This fixes a race in both msgrcv() and msgsnd() between finding the msg
and actually dealing with the queue, as another thread can delete shmid
underneath us if we are preempted before acquiring the
kern_ipc_perm.lock.
Manfred illustrates this nicely:
Assume a preemptible kernel that is preempted just after
msq = msq_obtain_object_check(ns, msqid)
in do_msgrcv(). The only lock that is held is rcu_read_lock().
Now the other thread processes IPC_RMID. When the first task is
resumed, then it will happily wait for messages on a deleted queue.
Fix this by checking for if the queue has been deleted after taking the
lock.
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Reported-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: <stable@vger.kernel.org> [3.11]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Manfred Spraul [Mon, 30 Sep 2013 20:45:25 +0000 (13:45 -0700)]
ipc/sem.c: update sem_otime for all operations
In commit
0a2b9d4c7967 ("ipc/sem.c: move wake_up_process out of the
spinlock section"), the update of semaphore's sem_otime(last semop time)
was moved to one central position (do_smart_update).
But since do_smart_update() is only called for operations that modify
the array, this means that wait-for-zero semops do not update sem_otime
anymore.
The fix is simple:
Non-alter operations must update sem_otime.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Reported-by: Jia He <jiakernel@gmail.com>
Tested-by: Jia He <jiakernel@gmail.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wanpeng Li [Mon, 30 Sep 2013 20:45:24 +0000 (13:45 -0700)]
mm/hwpoison: fix the lack of one reference count against poisoned page
The lack of one reference count against poisoned page for hwpoison_inject
w/o hwpoison_filter enabled result in hwpoison detect -1 users still
referenced the page, however, the number should be 0 except the poison
handler held one after successfully unmap. This patch fix it by hold one
referenced count against poisoned page for hwpoison_inject w/ and w/o
hwpoison_filter enabled.
Before patch:
[ 71.902112] Injecting memory failure at pfn 224706
[ 71.902137] MCE 0x224706: dirty LRU page recovery: Failed
[ 71.902138] MCE 0x224706: dirty LRU page still referenced by -1 users
After patch:
[ 94.710860] Injecting memory failure at pfn 215b68
[ 94.710885] MCE 0x215b68: dirty LRU page recovery: Recovered
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wanpeng Li [Mon, 30 Sep 2013 20:45:23 +0000 (13:45 -0700)]
mm/hwpoison: fix false report on 2nd attempt at page recovery
If the page is poisoned by software injection w/ MF_COUNT_INCREASED
flag, there is a false report during the 2nd attempt at page recovery
which is not truthful.
This patch fixes it by reporting the first attempt to try free buddy
page recovery if MF_COUNT_INCREASED is set.
Before patch:
[ 346.332041] Injecting memory failure at pfn 200010
[ 346.332189] MCE 0x200010: free buddy, 2nd try page recovery: Delayed
After patch:
[ 297.742600] Injecting memory failure at pfn 200010
[ 297.742941] MCE 0x200010: free buddy page recovery: Delayed
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wanpeng Li [Mon, 30 Sep 2013 20:45:22 +0000 (13:45 -0700)]
mm/hwpoison: fix test for a transparent huge page
PageTransHuge() can't guarantee the page is a transparent huge page
since it returns true for both transparent huge and hugetlbfs pages.
This patch fixes it by checking the page is also !hugetlbfs page.
Before patch:
[ 121.571128] Injecting memory failure at pfn 23a200
[ 121.571141] MCE 0x23a200: huge page recovery: Delayed
[ 140.355100] MCE: Memory failure is now running on 0x23a200
After patch:
[ 94.290793] Injecting memory failure at pfn 23a000
[ 94.290800] MCE 0x23a000: huge page recovery: Delayed
[ 105.722303] MCE: Software-unpoisoned page 0x23a000
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wanpeng Li [Mon, 30 Sep 2013 20:45:21 +0000 (13:45 -0700)]
mm/hwpoison: fix traversal of hugetlbfs pages to avoid printk flood
madvise_hwpoison won't check if the page is small page or huge page and
traverses in small page granularity against the range unconditionally,
which result in a printk flood "MCE xxx: already hardware poisoned" if
the page is a huge page.
This patch fixes it by using compound_order(compound_head(page)) for
huge page iterator.
Testcase:
#define _GNU_SOURCE
#include <stdlib.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <errno.h>
#define PAGES_TO_TEST 3
#define PAGE_SIZE 4096 * 512
int main(void)
{
char *mem;
int i;
mem = mmap(NULL, PAGES_TO_TEST * PAGE_SIZE,
PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, 0, 0);
if (madvise(mem, PAGES_TO_TEST * PAGE_SIZE, MADV_HWPOISON) == -1)
return -1;
munmap(mem, PAGES_TO_TEST * PAGE_SIZE);
return 0;
}
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paul Gortmaker [Mon, 30 Sep 2013 20:45:19 +0000 (13:45 -0700)]
block: change config option name for cmdline partition parsing
Recently commit
bab55417b10c ("block: support embedded device command
line partition") introduced CONFIG_CMDLINE_PARSER. However, that name
is too generic and sounds like it enables/disables generic kernel boot
arg processing, when it really is block specific.
Before this option becomes a part of a full/final release, add the BLK_
prefix to it so that it is clear in absence of any other context that it
is block specific.
In addition, fix up the following less critical items:
- help text was not really at all helpful.
- index file for Documentation was not updated
- add the new arg to Documentation/kernel-parameters.txt
- clarify wording in source comments
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Cai Zhiyong <caizhiyong@huawei.com>
Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vlastimil Babka [Mon, 30 Sep 2013 20:45:18 +0000 (13:45 -0700)]
mm/mlock.c: prevent walking off the end of a pagetable in no-pmd configuration
The function __munlock_pagevec_fill() introduced in commit
7a8010cd3627
("mm: munlock: manual pte walk in fast path instead of
follow_page_mask()") uses pmd_addr_end() for restricting its operation
within current page table.
This is insufficient on architectures/configurations where pmd is folded
and pmd_addr_end() just returns the end of the full range to be walked.
In this case, it allows pte++ to walk off the end of a page table
resulting in unpredictable behaviour.
This patch fixes the function by using pgd_addr_end() and pud_addr_end()
before pmd_addr_end(), which will yield correct page table boundary on
all configurations. This is similar to what existing page walkers do
when walking each level of the page table.
Additionaly, the patch clarifies a comment for get_locked_pte() call in the
function.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Reviewed-by: Bob Liu <bob.liu@oracle.com>
Cc: Jörn Engel <joern@logfs.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michel Lespinasse <walken@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rafael Aquini [Mon, 30 Sep 2013 20:45:16 +0000 (13:45 -0700)]
mm: avoid reinserting isolated balloon pages into LRU lists
Isolated balloon pages can wrongly end up in LRU lists when
migrate_pages() finishes its round without draining all the isolated
page list.
The same issue can happen when reclaim_clean_pages_from_list() tries to
reclaim pages from an isolated page list, before migration, in the CMA
path. Such balloon page leak opens a race window against LRU lists
shrinkers that leads us to the following kernel panic:
BUG: unable to handle kernel NULL pointer dereference at
0000000000000028
IP: [<
ffffffff810c2625>] shrink_page_list+0x24e/0x897
PGD
3cda2067 PUD
3d713067 PMD 0
Oops: 0000 [#1] SMP
CPU: 0 PID: 340 Comm: kswapd0 Not tainted
3.12.0-rc1-22626-g4367597 #87
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
RIP: shrink_page_list+0x24e/0x897
RSP: 0000:
ffff88003da499b8 EFLAGS:
00010286
RAX:
0000000000000000 RBX:
ffff88003e82bd60 RCX:
00000000000657d5
RDX:
0000000000000000 RSI:
000000000000031f RDI:
ffff88003e82bd40
RBP:
ffff88003da49ab0 R08:
0000000000000001 R09:
0000000081121a45
R10:
ffffffff81121a45 R11:
ffff88003c4a9a28 R12:
ffff88003e82bd40
R13:
ffff88003da0e800 R14:
0000000000000001 R15:
ffff88003da49d58
FS:
0000000000000000(0000) GS:
ffff88003fc00000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
00000000067d9000 CR3:
000000003ace5000 CR4:
00000000000407b0
Call Trace:
shrink_inactive_list+0x240/0x3de
shrink_lruvec+0x3e0/0x566
__shrink_zone+0x94/0x178
shrink_zone+0x3a/0x82
balance_pgdat+0x32a/0x4c2
kswapd+0x2f0/0x372
kthread+0xa2/0xaa
ret_from_fork+0x7c/0xb0
Code: 80 7d 8f 01 48 83 95 68 ff ff ff 00 4c 89 e7 e8 5a 7b 00 00 48 85 c0 49 89 c5 75 08 80 7d 8f 00 74 3e eb 31 48 8b 80 18 01 00 00 <48> 8b 74 0d 48 8b 78 30 be 02 00 00 00 ff d2 eb
RIP [<
ffffffff810c2625>] shrink_page_list+0x24e/0x897
RSP <
ffff88003da499b8>
CR2:
0000000000000028
---[ end trace
703d2451af6ffbfd ]---
Kernel panic - not syncing: Fatal exception
This patch fixes the issue, by assuring the proper tests are made at
putback_movable_pages() & reclaim_clean_pages_from_list() to avoid
isolated balloon pages being wrongly reinserted in LRU lists.
[akpm@linux-foundation.org: clarify awkward comment text]
Signed-off-by: Rafael Aquini <aquini@redhat.com>
Reported-by: Luiz Capitulino <lcapitulino@redhat.com>
Tested-by: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Rik van Riel <riel@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Felipe Pena [Mon, 30 Sep 2013 20:45:14 +0000 (13:45 -0700)]
arch/parisc/mm/fault.c: fix uninitialized variable usage
The FAULT_FLAG_WRITE flag has been set based on uninitialized variable.
Fixes a regression added by commit
759496ba6407 ("arch: mm: pass
userspace fault flag to generic fault handler")
Signed-off-by: Felipe Pena <felipensp@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Mon, 30 Sep 2013 20:45:13 +0000 (13:45 -0700)]
include/asm-generic/vtime.h: avoid zero-length file
patch(1) can't handle zero-length files - it appears to simply not create
the file, so my powerpc build fails.
Put something in here to make life easier.
Cc: Hugh Dickins <hughd@google.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vyacheslav Dubeyko [Mon, 30 Sep 2013 20:45:12 +0000 (13:45 -0700)]
nilfs2: fix issue with race condition of competition between segments for dirty blocks
Many NILFS2 users were reported about strange file system corruption
(for example):
NILFS: bad btree node (blocknr=185027): level = 0, flags = 0x0, nchildren = 768
NILFS error (device sda4): nilfs_bmap_last_key: broken bmap (inode number=11540)
But such error messages are consequence of file system's issue that takes
place more earlier. Fortunately, Jerome Poulin <jeromepoulin@gmail.com>
and Anton Eliasson <devel@antoneliasson.se> were reported about another
issue not so recently. These reports describe the issue with segctor
thread's crash:
BUG: unable to handle kernel paging request at
0000000000004c83
IP: nilfs_end_page_io+0x12/0xd0 [nilfs2]
Call Trace:
nilfs_segctor_do_construct+0xf25/0x1b20 [nilfs2]
nilfs_segctor_construct+0x17b/0x290 [nilfs2]
nilfs_segctor_thread+0x122/0x3b0 [nilfs2]
kthread+0xc0/0xd0
ret_from_fork+0x7c/0xb0
These two issues have one reason. This reason can raise third issue
too. Third issue results in hanging of segctor thread with eating of
100% CPU.
REPRODUCING PATH:
One of the possible way or the issue reproducing was described by
Jermoe me Poulin <jeromepoulin@gmail.com>:
1. init S to get to single user mode.
2. sysrq+E to make sure only my shell is running
3. start network-manager to get my wifi connection up
4. login as root and launch "screen"
5. cd /boot/log/nilfs which is a ext3 mount point and can log when NILFS dies.
6. lscp | xz -9e > lscp.txt.xz
7. mount my snapshot using mount -o cp=
3360839,ro /dev/vgUbuntu/root /mnt/nilfs
8. start a screen to dump /proc/kmsg to text file since rsyslog is killed
9. start a screen and launch strace -f -o find-cat.log -t find
/mnt/nilfs -type f -exec cat {} > /dev/null \;
10. start a screen and launch strace -f -o apt-get.log -t apt-get update
11. launch the last command again as it did not crash the first time
12. apt-get crashes
13. ps aux > ps-aux-crashed.log
13. sysrq+W
14. sysrq+E wait for everything to terminate
15. sysrq+SUSB
Simplified way of the issue reproducing is starting kernel compilation
task and "apt-get update" in parallel.
REPRODUCIBILITY:
The issue is reproduced not stable [60% - 80%]. It is very important to
have proper environment for the issue reproducing. The critical
conditions for successful reproducing:
(1) It should have big modified file by mmap() way.
(2) This file should have the count of dirty blocks are greater that
several segments in size (for example, two or three) from time to time
during processing.
(3) It should be intensive background activity of files modification
in another thread.
INVESTIGATION:
First of all, it is possible to see that the reason of crash is not valid
page address:
NILFS [nilfs_segctor_complete_write]:2100 bh->b_count 0, bh->b_blocknr
13895680, bh->b_size
13897727, bh->b_page
0000000000001a82
NILFS [nilfs_segctor_complete_write]:2101 segbuf->sb_segnum 6783
Moreover, value of b_page (0x1a82) is 6786. This value looks like segment
number. And b_blocknr with b_size values look like block numbers. So,
buffer_head's pointer points on not proper address value.
Detailed investigation of the issue is discovered such picture:
[-----------------------------SEGMENT 6783-------------------------------]
NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
NILFS [nilfs_segctor_do_construct]:2336 nilfs_segctor_assign
NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector
111149024, segbuf->sb_segnum 6783
[-----------------------------SEGMENT 6784-------------------------------]
NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
NILFS [nilfs_lookup_dirty_data_buffers]:782 bh->b_count 1, bh->b_page
ffffea000709b000, page->index 0, i_ino
1033103, i_size
25165824
NILFS [nilfs_lookup_dirty_data_buffers]:783 bh->b_assoc_buffers.next
ffff8802174a6798, bh->b_assoc_buffers.prev
ffff880221cffee8
NILFS [nilfs_segctor_do_construct]:2336 nilfs_segctor_assign
NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
NILFS [nilfs_segbuf_submit_bh]:575 bh->b_count 1, bh->b_page
ffffea000709b000, page->index 0, i_ino
1033103, i_size
25165824
NILFS [nilfs_segbuf_submit_bh]:576 segbuf->sb_segnum 6784
NILFS [nilfs_segbuf_submit_bh]:577 bh->b_assoc_buffers.next
ffff880218a0d5f8, bh->b_assoc_buffers.prev
ffff880218bcdf50
NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector
111150080, segbuf->sb_segnum 6784, segbuf->sb_nbio 0
[----------] ditto
NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector
111164416, segbuf->sb_segnum 6784, segbuf->sb_nbio 15
[-----------------------------SEGMENT 6785-------------------------------]
NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
NILFS [nilfs_lookup_dirty_data_buffers]:782 bh->b_count 2, bh->b_page
ffffea000709b000, page->index 0, i_ino
1033103, i_size
25165824
NILFS [nilfs_lookup_dirty_data_buffers]:783 bh->b_assoc_buffers.next
ffff880219277e80, bh->b_assoc_buffers.prev
ffff880221cffc88
NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
NILFS [nilfs_segbuf_submit_bh]:575 bh->b_count 2, bh->b_page
ffffea000709b000, page->index 0, i_ino
1033103, i_size
25165824
NILFS [nilfs_segbuf_submit_bh]:576 segbuf->sb_segnum 6785
NILFS [nilfs_segbuf_submit_bh]:577 bh->b_assoc_buffers.next
ffff880218a0d5f8, bh->b_assoc_buffers.prev
ffff880222cc7ee8
NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector
111165440, segbuf->sb_segnum 6785, segbuf->sb_nbio 0
[----------] ditto
NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector
111177728, segbuf->sb_segnum 6785, segbuf->sb_nbio 12
NILFS [nilfs_segctor_do_construct]:2399 nilfs_segctor_wait
NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6783
NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6784
NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6785
NILFS [nilfs_segctor_complete_write]:2100 bh->b_count 0, bh->b_blocknr
13895680, bh->b_size
13897727, bh->b_page
0000000000001a82
BUG: unable to handle kernel paging request at
0000000000001a82
IP: [<
ffffffffa024d0f2>] nilfs_end_page_io+0x12/0xd0 [nilfs2]
Usually, for every segment we collect dirty files in list. Then, dirty
blocks are gathered for every dirty file, prepared for write and
submitted by means of nilfs_segbuf_submit_bh() call. Finally, it takes
place complete write phase after calling nilfs_end_bio_write() on the
block layer. Buffers/pages are marked as not dirty on final phase and
processed files removed from the list of dirty files.
It is possible to see that we had three prepare_write and submit_bio
phases before segbuf_wait and complete_write phase. Moreover, segments
compete between each other for dirty blocks because on every iteration
of segments processing dirty buffer_heads are added in several lists of
payload_buffers:
[SEGMENT 6784]: bh->b_assoc_buffers.next
ffff880218a0d5f8, bh->b_assoc_buffers.prev
ffff880218bcdf50
[SEGMENT 6785]: bh->b_assoc_buffers.next
ffff880218a0d5f8, bh->b_assoc_buffers.prev
ffff880222cc7ee8
The next pointer is the same but prev pointer has changed. It means
that buffer_head has next pointer from one list but prev pointer from
another. Such modification can be made several times. And, finally, it
can be resulted in various issues: (1) segctor hanging, (2) segctor
crashing, (3) file system metadata corruption.
FIX:
This patch adds:
(1) setting of BH_Async_Write flag in nilfs_segctor_prepare_write()
for every proccessed dirty block;
(2) checking of BH_Async_Write flag in
nilfs_lookup_dirty_data_buffers() and
nilfs_lookup_dirty_node_buffers();
(3) clearing of BH_Async_Write flag in nilfs_segctor_complete_write(),
nilfs_abort_logs(), nilfs_forget_buffer(), nilfs_clear_dirty_page().
Reported-by: Jerome Poulin <jeromepoulin@gmail.com>
Reported-by: Anton Eliasson <devel@antoneliasson.se>
Cc: Paul Fertser <fercerpav@gmail.com>
Cc: ARAI Shun-ichi <hermes@ceres.dti.ne.jp>
Cc: Piotr Szymaniak <szarpaj@grubelek.pl>
Cc: Juan Barry Manuel Canham <Linux@riotingpacifist.net>
Cc: Zahid Chowdhury <zahid.chowdhury@starsolutions.com>
Cc: Elmer Zhang <freeboy6716@gmail.com>
Cc: Kenneth Langga <klangga@gmail.com>
Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Weiping Pan [Mon, 30 Sep 2013 20:45:10 +0000 (13:45 -0700)]
Documentation/kernel-parameters.txt: replace kernelcore with Movable
Han Pingtian found a typo in Documentation/kernel-parameters.txt about
"kernelcore=", that "kernelcore" should be replaced with "Movable" here.
Signed-off-by: Weiping Pan <wpan@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Darrick J. Wong [Mon, 30 Sep 2013 20:45:09 +0000 (13:45 -0700)]
mm/bounce.c: fix a regression where MS_SNAP_STABLE (stable pages snapshotting) was ignored
The "force" parameter in __blk_queue_bounce was being ignored, which
means that stable page snapshots are not always happening (on ext3).
This of course leads to DIF disks reporting checksum errors, so fix this
regression.
The regression was introduced in commit
6bc454d15004 ("bounce: Refactor
__blk_queue_bounce to not use bi_io_vec")
Reported-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: <stable@vger.kernel.org> [3.10+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tetsuo Handa [Mon, 30 Sep 2013 20:45:08 +0000 (13:45 -0700)]
kernel/kmod.c: check for NULL in call_usermodehelper_exec()
If /proc/sys/kernel/core_pattern contains only "|", a NULL pointer
dereference happens upon core dump because argv_split("") returns
argv[0] == NULL.
This bug was once fixed by commit
264b83c07a84 ("usermodehelper: check
subprocess_info->path != NULL") but was by error reintroduced by commit
7f57cfa4e2aa ("usermodehelper: kill the sub_info->path[0] check").
This bug seems to exist since 2.6.19 (the version which core dump to
pipe was added). Depending on kernel version and config, some side
effect might happen immediately after this oops (e.g. kernel panic with
2.6.32-358.18.1.el6).
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Manfred Spraul [Mon, 30 Sep 2013 20:45:07 +0000 (13:45 -0700)]
ipc/sem.c: synchronize the proc interface
The proc interface is not aware of sem_lock(), it instead calls
ipc_lock_object() directly. This means that simple semop() operations
can run in parallel with the proc interface. Right now, this is
uncritical, because the implementation doesn't do anything that requires
a proper synchronization.
But it is dangerous and therefore should be fixed.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Manfred Spraul [Mon, 30 Sep 2013 20:45:06 +0000 (13:45 -0700)]
ipc/sem.c: optimize sem_lock()
Operations that need access to the whole array must guarantee that there
are no simple operations ongoing. Right now this is achieved by
spin_unlock_wait(sem->lock) on all semaphores.
If complex_count is nonzero, then this spin_unlock_wait() is not
necessary, because it was already performed in the past by the thread
that increased complex_count and even though sem_perm.lock was dropped
inbetween, no simple operation could have started, because simple
operations cannot start when complex_count is non-zero.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Mike Galbraith <bitbucket@online.de>
Cc: Rik van Riel <riel@redhat.com>
Reviewed-by: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Manfred Spraul [Mon, 30 Sep 2013 20:45:04 +0000 (13:45 -0700)]
ipc/sem.c: fix race in sem_lock()
The exclusion of complex operations in sem_lock() is insufficient: after
acquiring the per-semaphore lock, a simple op must first check that
sem_perm.lock is not locked and only after that test check
complex_count. The current code does it the other way around - and that
creates a race. Details are below.
The patch is a complete rewrite of sem_lock(), based in part on the code
from Mike Galbraith. It removes all gotos and all loops and thus the
risk of livelocks.
I have tested the patch (together with the next one) on my i3 laptop and
it didn't cause any problems.
The bug is probably also present in 3.10 and 3.11, but for these kernels
it might be simpler just to move the test of sma->complex_count after
the spin_is_locked() test.
Details of the bug:
Assume:
- sma->complex_count = 0.
- Thread 1: semtimedop(complex op that must sleep)
- Thread 2: semtimedop(simple op).
Pseudo-Trace:
Thread 1: sem_lock(): acquire sem_perm.lock
Thread 1: sem_lock(): check for ongoing simple ops
Nothing ongoing, thread 2 is still before sem_lock().
Thread 1: try_atomic_semop()
<<< preempted.
Thread 2: sem_lock():
static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
int nsops)
{
int locknum;
again:
if (nsops == 1 && !sma->complex_count) {
struct sem *sem = sma->sem_base + sops->sem_num;
/* Lock just the semaphore we are interested in. */
spin_lock(&sem->lock);
/*
* If sma->complex_count was set while we were spinning,
* we may need to look at things we did not lock here.
*/
if (unlikely(sma->complex_count)) {
spin_unlock(&sem->lock);
goto lock_array;
}
<<<<<<<<<
<<< complex_count is still 0.
<<<
<<< Here it is preempted
<<<<<<<<<
Thread 1: try_atomic_semop() returns, notices that it must sleep.
Thread 1: increases sma->complex_count.
Thread 1: drops sem_perm.lock
Thread 2:
/*
* Another process is holding the global lock on the
* sem_array; we cannot enter our critical section,
* but have to wait for the global lock to be released.
*/
if (unlikely(spin_is_locked(&sma->sem_perm.lock))) {
spin_unlock(&sem->lock);
spin_unlock_wait(&sma->sem_perm.lock);
goto again;
}
<<< sem_perm.lock already dropped, thus no "goto again;"
locknum = sops->sem_num;
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Mike Galbraith <bitbucket@online.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: <stable@vger.kernel.org> [3.10+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Rientjes [Mon, 30 Sep 2013 20:45:03 +0000 (13:45 -0700)]
mm/compaction.c: periodically schedule when freeing pages
We've been getting warnings about an excessive amount of time spent
allocating pages for migration during memory compaction without
scheduling. isolate_freepages_block() already periodically checks for
contended locks or the need to schedule, but isolate_freepages() never
does.
When a zone is massively long and no suitable targets can be found, this
iteration can be quite expensive without ever doing cond_resched().
Check periodically for the need to reschedule while the compaction free
scanner iterates.
Signed-off-by: David Rientjes <rientjes@google.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dan Aloni [Mon, 30 Sep 2013 20:45:02 +0000 (13:45 -0700)]
fs/binfmt_elf.c: prevent a coredump with a large vm_map_count from Oopsing
A high setting of max_map_count, and a process core-dumping with a large
enough vm_map_count could result in an NT_FILE note not being written,
and the kernel crashing immediately later because it has assumed
otherwise.
Reproduction of the oops-causing bug described here:
https://lkml.org/lkml/2013/8/30/50
Rge ussue originated in commit
2aa362c49c31 ("coredump: extend core dump
note section to contain file names of mapped file") from Oct 4, 2012.
This patch make that section optional in that case. fill_files_note()
should signify the error, and also let the info struct in
elf_core_dump() be zero-initialized so that we can check for the
optionally written note.
[akpm@linux-foundation.org: avoid abusing E2BIG, remove a couple of not-really-needed local variables]
[akpm@linux-foundation.org: fix sparse warning]
Signed-off-by: Dan Aloni <alonid@stratoscale.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Reported-by: Martin MOKREJS <mmokrejs@gmail.com>
Tested-by: Martin MOKREJS <mmokrejs@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joonyoung Shim [Mon, 30 Sep 2013 20:45:01 +0000 (13:45 -0700)]
revert "mm/memory-hotplug: fix lowmem count overflow when offline pages"
This reverts commit
cea27eb2a202 ("mm/memory-hotplug: fix lowmem count
overflow when offline pages").
The fixed bug by commit
cea27eb was fixed to another way by commit
3dcc0571cd64 ("mm: correctly update zone->managed_pages"). That commit
enhances memory_hotplug.c to adjust totalhigh_pages when hot-removing
memory, for details please refer to:
http://marc.info/?l=linux-mm&m=
136957578620221&w=2
As a result, commit
cea27eb2a202 currently causes duplicated decreasing
of totalhigh_pages, thus the revert.
Signed-off-by: Joonyoung Shim <jy0922.shim@samsung.com>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Jiang Liu <liuj97@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Dumazet [Fri, 27 Sep 2013 21:20:01 +0000 (14:20 -0700)]
pkt_sched: fq: qdisc dismantle fixes
fq_reset() should drops all packets in queue, including
throttled flows.
This patch moves code from fq_destroy() to fq_reset()
to do the cleaning.
fq_change() must stop calling fq_dequeue() if all remaining
packets are from throttled flows.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 26 Sep 2013 15:44:06 +0000 (08:44 -0700)]
net: flow_dissector: fix thoff for IPPROTO_AH
In commit
8ed781668dd49 ("flow_keys: include thoff into flow_keys for
later usage"), we missed that existing code was using nhoff as a
temporary variable that could not always contain transport header
offset.
This is not a problem for TCP/UDP because port offset (@poff)
is 0 for these protocols.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Daniel Borkmann <dborkman@redhat.com>
Cc: Nikolay Aleksandrov <nikolay@redhat.com>
Acked-by: Nikolay Aleksandrov <nikolay@redhat.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Liu [Thu, 26 Sep 2013 11:41:53 +0000 (12:41 +0100)]
MAINTAINERS: add myself as maintainer of xen-netback
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Durrant [Thu, 26 Sep 2013 11:09:52 +0000 (12:09 +0100)]
xen-netback: Handle backend state transitions in a more robust way
When the frontend state changes netback now specifies its desired state to
a new function, set_backend_state(), which transitions through any
necessary intermediate states.
This fixes an issue observed with some old Windows frontend drivers where
they failed to transition through the Closing state and netback would not
behave correctly.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Marks [Wed, 25 Sep 2013 22:12:55 +0000 (15:12 -0700)]
ipv6: Fix preferred_lft not updating in some cases
Consider the scenario where an IPv6 router is advertising a fixed
preferred_lft of 1800 seconds, while the valid_lft begins at 3600
seconds and counts down in realtime.
A client should reset its preferred_lft to 1800 every time the RA is
received, but a bug is causing Linux to ignore the update.
The core problem is here:
if (prefered_lft != ifp->prefered_lft) {
Note that ifp->prefered_lft is an offset, so it doesn't decrease over
time. Thus, the comparison is always (1800 != 1800), which fails to
trigger an update.
The most direct solution would be to compute a "stored_prefered_lft",
and use that value in the comparison. But I think that trying to filter
out unnecessary updates here is a premature optimization. In order for
the filter to apply, both of these would need to hold:
- The advertised valid_lft and preferred_lft are both declining in
real time.
- No clock skew exists between the router & client.
So in this patch, I've set "update_lft = 1" unconditionally, which
allows the surrounding code to be greatly simplified.
Signed-off-by: Paul Marks <pmarks@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pravin B Shelar [Wed, 25 Sep 2013 16:57:47 +0000 (09:57 -0700)]
ip_tunnel: Do not use stale inner_iph pointer.
While sending packet skb_cow_head() can change skb header which
invalidates inner_iph pointer to skb header. Following patch
avoid using it. Found by code inspection.
This bug was introduced by commit
0e6fbc5b6c6218 (ip_tunnels: extend
iptunnel_xmit()).
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Aleksander Morgado [Wed, 25 Sep 2013 15:02:36 +0000 (17:02 +0200)]
net: qmi_wwan: fix Cinterion PLXX product ID
Cinterion PLXX LTE devices have a 0x0060 product ID, not 0x12d1.
The blacklisting in the serial/option driver does actually use the correct PID,
as per commit
8ff10bdb14a52e3f25d4ce09e0582a8684c1a6db ('USB: Blacklisted
Cinterion's PLxx WWAN Interface').
CC: Hans-Christoph Schemmel <hans-christoph.schemmel@gemalto.com>
CC: Christian Schmiedl <christian.schmiedl@gemalto.com>
CC: Nicolaus Colberg <nicolaus.colberg@gemalto.com>
Signed-off-by: Aleksander Morgado <aleksander@lanedo.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Acked-by: Christian Schmiedl <christian.schmiedl@gemalto.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Aida Mynzhasova [Wed, 25 Sep 2013 07:24:23 +0000 (11:24 +0400)]
powerpc/83xx: gianfar_ptp: select 1588 clock source through dts file
Currently IEEE 1588 timer reference clock source is determined through
hard-coded value in gianfar_ptp driver. This patch allows to select ptp
clock source by means of device tree file node.
For instance:
fsl,cksel = <0>;
for using external (TSEC_TMR_CLK input) high precision timer
reference clock.
Other acceptable values:
<1> : eTSEC system clock
<2> : eTSEC1 transmit clock
<3> : RTC clock input
When this attribute isn't used, eTSEC system clock will serve as
IEEE 1588 timer reference clock.
Signed-off-by: Aida Mynzhasova <aida.mynzhasova@skitlab.ru>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pravin B Shelar [Tue, 24 Sep 2013 17:25:40 +0000 (10:25 -0700)]
vxlan: Use RCU apis to access sk_user_data.
Use of RCU api makes vxlan code easier to understand. It also
fixes bug due to missing ACCESS_ONCE() on sk_user_data dereference.
In rare case without ACCESS_ONCE() compiler might omit vs on
sk_user_data dereference.
Compiler can use vs as alias for sk->sk_user_data, resulting in
multiple sk_user_data dereference in rcu read context which
could change.
CC: Jesse Gross <jesse@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 30 Sep 2013 18:13:33 +0000 (11:13 -0700)]
Merge tag 'regulator-v3.12-rc3' of git://git./linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"Quite a few fixes here, mostly small driver specific ones.
The stand out thing is a fix for errors generating the documentation
from Randy Dunlap, otherwise unless you're using the driver in
question there should be no impact"
* tag 'regulator-v3.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: ti-abb: Fix bias voltage glitch in transition to bypass mode
regulator: wm831x-ldo: Fix max_uV for gp_ldo and aldo linear range settings
regulator: wm8350: correct the max_uV of LDO
regulator: fix fatal kernel-doc error
regulator: palmas: Remove wrong comment for the equation calculating num_voltages
regulator: da9063: Fix PTR_ERR/ERR_PTR mismatch
regulator: palmas: configure enable time for LDOs
regulator: palmas: fix the n_voltages for smps to 122
Linus Torvalds [Mon, 30 Sep 2013 18:12:20 +0000 (11:12 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/linux-security
Pull apparmor fixes from James Morris:
"Bugfixes for the Apparmor code for regressions introduced in the 3.12
pull request"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
apparmor: fix suspicious RCU usage warning in policy.c/policy.h
apparmor: Use shash crypto API interface for profile hashes
Linus Torvalds [Mon, 30 Sep 2013 18:11:28 +0000 (11:11 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs
Pull assorted vfs fixes from Al Viro:
"A couple of bug fixes + removal of dead code in afs ->d_revalidate()"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
afs: dget_parent() can't return a negative dentry
ocfs2: needs ->d_lock to poke in ->d_parent->d_inode from ->d_revalidate()
sysv: Add forgotten superblock lock init for v7 fs