Jussi Kivilinna [Sun, 7 Apr 2013 13:43:41 +0000 (16:43 +0300)]
crypto: gcm - make GMAC work when dst and src are different
The GMAC code assumes that dst==src, which causes problems when trying to add
rfc4543(gcm(aes)) test vectors.
So fix this code to work when the source and destination buffers are different.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Alexander Clouter [Sun, 31 Mar 2013 16:34:51 +0000 (17:34 +0100)]
hwrng: timeriomem - added devicetree hooks
This patch allows timeriomem_rng to be used via devicetree.
Signed-off-by: Alexander Clouter <alex@digriz.org.uk>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Alexander Clouter [Sun, 31 Mar 2013 16:34:50 +0000 (17:34 +0100)]
hwrng: timeriomem - update to support more than one device
timeriomem_rng only supports a single device instance. This patch
enables multiple timeriomem_rng devices to coexist and adds
some additional error checking.
Signed-off-by: Alexander Clouter <alex@digriz.org.uk>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Sandy Wu [Fri, 29 Mar 2013 00:05:44 +0000 (17:05 -0700)]
crypto: crc32-pclmul - Use gas macro for pclmulqdq
The build fails when CONFIG_CRYPTO_CRC32C_INTEL=y because older versions
of binutils do not support the pclmulqdq instruction. The
PCLMULQDQ gas macro is used instead.
Signed-off-by: Sandy Wu <sandyw@twitter.com>
Cc: stable@vger.kernel.org # 3.8+
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Lubomir Rintel [Thu, 28 Mar 2013 06:19:38 +0000 (07:19 +0100)]
hwrng: bcm2835 - Add Broadcom BCM2835 RNG driver
This adds a driver for the random number generator present on the Broadcom BCM2835 SoC,
used in Raspberry Pi and Roku 2 devices.
Signed-off-by: Dom Cobley <popcornmix@gmail.com>
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Tested-by: Stephen Warren <swarren@wwwdotorg.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Stephen Warren <swarren@wwwdotorg.org>
Cc: Matt Mackall <mpm@selenic.com>
Cc: linux-rpi-kernel@lists.infradead.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Kim Phillips [Tue, 26 Mar 2013 23:10:15 +0000 (18:10 -0500)]
crypto: caam - static constify error data
checkstack reports report_deco_status() and report_ccb_status() as
particularly excessive stack users. Move their lookup tables
off the stack and put them in .rodata.
Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
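The pattern described in the caam commit above can be sketched in plain userspace C. This is only an illustration: the struct, the function name status_text() and the table entries are assumptions made up for the example, not the driver's actual data.

```c
#include <stddef.h>

struct status_entry {
    unsigned char code;
    const char *text;
};

/* Hypothetical helper: map a status code to a human-readable message. */
const char *status_text(unsigned char code)
{
    /*
     * 'static const' gives the table static storage duration, so the
     * compiler can place it in read-only data instead of building it
     * on the stack every time the function is called.
     */
    static const struct status_entry table[] = {
        { 0x00, "no error" },
        { 0x01, "invalid descriptor" },  /* illustrative entries only */
        { 0x02, "hardware timeout" },
    };
    size_t i;

    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
        if (table[i].code == code)
            return table[i].text;
    return "unknown error";
}
```

Dropping the 'static' keyword would make the table an automatic variable again, which is the kind of stack usage that checkstack flags.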
Kim Phillips [Tue, 26 Mar 2013 23:10:14 +0000 (18:10 -0500)]
crypto: caam - change key gen functions to return signed int
commit 2af8f4a "crypto: caam - coccicheck fixes" added error
return values yet neglected to change the type from unsigned.
Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
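To see why the return type in the caam commit above matters, consider this minimal sketch. The names gen_key() and gen_key_unsigned() are hypothetical and the bodies are placeholders; only the return types mirror the situation the commit describes.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Pre-fix shape (hypothetical name): -EINVAL is converted to a large
 * positive value, so a "ret < 0" test on the unsigned result is never true.
 */
unsigned int gen_key_unsigned(unsigned char *out, size_t len)
{
    if (out == NULL || len == 0)
        return -EINVAL;
    out[0] = 0;     /* placeholder, not real key material */
    return 0;
}

/* Post-fix shape: a signed return type carries the negative errno intact. */
int gen_key(unsigned char *out, size_t len)
{
    if (out == NULL || len == 0)
        return -EINVAL;
    out[0] = 0;     /* placeholder, not real key material */
    return 0;
}
```

A caller that stores the unsigned result in an unsigned variable and checks it against zero would treat the failure as success, which is why the signed prototype is the safer choice.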
Tim Chen [Tue, 26 Mar 2013 21:00:02 +0000 (14:00 -0700)]
crypto: sha512 - Create module providing optimized SHA512 routines using SSSE3, AVX or AVX2 instructions.
We added glue code and config options to create a crypto
module that uses SSSE3/AVX/AVX2 optimized SHA512 x86_64 assembly routines.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tim Chen [Tue, 26 Mar 2013 20:59:55 +0000 (13:59 -0700)]
crypto: sha512 - Optimized SHA512 x86_64 assembly routine using AVX2 RORX instruction.
Provides SHA512 x86_64 assembly routine optimized with SSE, AVX and
AVX2's RORX instructions. Speedup of 70% or more has been
measured over the generic implementation.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tim Chen [Tue, 26 Mar 2013 20:59:46 +0000 (13:59 -0700)]
crypto: sha512 - Optimized SHA512 x86_64 assembly routine using AVX instructions.
Provides SHA512 x86_64 assembly routine optimized with SSE and AVX instructions.
Speedup of 60% or more has been measured over the generic implementation.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tim Chen [Tue, 26 Mar 2013 20:59:37 +0000 (13:59 -0700)]
crypto: sha512 - Optimized SHA512 x86_64 assembly routine using Supplemental SSE3 instructions.
Provides SHA512 x86_64 assembly routine optimized with SSSE3 instructions.
Speedup of 40% or more has been measured over the generic implementation.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tim Chen [Tue, 26 Mar 2013 20:59:25 +0000 (13:59 -0700)]
crypto: sha512 - Expose generic sha512 routine to be callable from other modules
Other SHA512 routines may need to use the generic routine when
the FPU is not available.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tim Chen [Tue, 26 Mar 2013 20:59:17 +0000 (13:59 -0700)]
crypto: sha256 - Create module providing optimized SHA256 routines using SSSE3, AVX or AVX2 instructions.
We added glue code and config options to create a crypto
module that uses SSSE3/AVX/AVX2 optimized SHA256 x86_64 assembly routines.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tim Chen [Tue, 26 Mar 2013 20:59:10 +0000 (13:59 -0700)]
crypto: sha256 - Optimized sha256 x86_64 routine using AVX2's RORX instructions
Provides SHA256 x86_64 assembly routine optimized with SSE, AVX and
AVX2's RORX instructions. Speedup of 70% or more has been
measured over the generic implementation.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tim Chen [Tue, 26 Mar 2013 20:59:05 +0000 (13:59 -0700)]
crypto: sha256 - Optimized sha256 x86_64 assembly routine with AVX instructions.
Provides SHA256 x86_64 assembly routine optimized with SSE and AVX instructions.
Speedup of 60% or more has been measured over the generic implementation.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tim Chen [Tue, 26 Mar 2013 20:58:58 +0000 (13:58 -0700)]
crypto: sha256 - Optimized sha256 x86_64 assembly routine using Supplemental SSE3 instructions.
Provides SHA256 x86_64 assembly routine optimized with SSSE3 instructions.
Speedup of 40% or more has been measured over the generic implementation.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tim Chen [Tue, 26 Mar 2013 20:58:49 +0000 (13:58 -0700)]
crypto: sha256 - Expose SHA256 generic routine to be callable externally.
Other SHA256 routines may need to use the generic routine when
the FPU is not available.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Jussi Kivilinna [Sun, 24 Mar 2013 13:34:07 +0000 (15:34 +0200)]
crypto: x86 - build AVX block cipher implementations only if assembler supports AVX instructions
These modules require AVX support in the assembler, so add a new check to the
Makefile for this.
The other option would be to use CONFIG_AS_AVX inside the source files, but that
would result in dummy/empty/no-functionality modules being created.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Jussi Kivilinna [Sun, 24 Mar 2013 12:32:01 +0000 (14:32 +0200)]
crypto: x86/crc32-pclmul - assembly clean-ups: use ENTRY/ENDPROC
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Wei Yongjun [Fri, 22 Mar 2013 13:18:44 +0000 (21:18 +0800)]
crypto: ux500 - fix error return code in hash_dma_final()
Fix to return a negative error code from the error handling
case instead of 0, as returned elsewhere in this function.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Sachin Kamat [Thu, 14 Mar 2013 10:16:58 +0000 (15:46 +0530)]
crypto: picoxcell - Use of_match_ptr() macro
This eliminates having an #ifdef returning NULL for the case
when OF is disabled.
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Acked-by: Jamie Iles <jamie@jamieiles.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Fabio Estevam [Wed, 13 Mar 2013 03:57:27 +0000 (00:57 -0300)]
hwrng: mxc-rnga - Use devm_ioremap_resource()
Using devm_ioremap_resource() can make the code cleaner and simpler.
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Vakul Garg [Tue, 12 Mar 2013 08:39:24 +0000 (14:09 +0530)]
crypto: caam - Fix missing init of '.type' in AEAD algos.
The following AEAD algo templates are updated for '.type' initialization:
(a) authenc(hmac(sha224),cbc(aes))
(b) authenc(hmac(sha384),cbc(aes))
(c) authenc(hmac(sha224),cbc(des3_ede))
(d) authenc(hmac(sha384),cbc(des3_ede))
(e) authenc(hmac(sha224),cbc(des))
(f) authenc(hmac(sha384),cbc(des))
Signed-off-by: Vakul Garg <vakul@freescale.com>
Reviewed-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Andy Fleming <afleming@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Vakul Garg [Tue, 12 Mar 2013 08:25:21 +0000 (13:55 +0530)]
crypto: caam - set RDB bit in security configuration register
This change is required for post SEC-5.0 devices which have RNG4.
Setting RDB in the security configuration register lets CAAM fill the
"Random Data Buffer" with a single request. The Random Data
Buffer is large enough for ten packets to get their IVs from a single
request. If the Random Data Buffer is not enabled, each IV causes a
separate request, and the RNG4 hardware cannot keep up, resulting in lower
IPsec throughput when random IVs are used.
Signed-off-by: Vakul Garg <vakul@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Jingoo Han [Tue, 12 Mar 2013 05:46:22 +0000 (14:46 +0900)]
hwrng: exynos - add CONFIG_PM_SLEEP/CONFIG_PM_RUNTIME to suspend/resume
This patch adds CONFIG_PM_SLEEP to suspend/resume functions to fix
the following build warning when CONFIG_PM_SLEEP is not selected.
drivers/char/hw_random/exynos-rng.c:147:12: warning: 'exynos_rng_runtime_suspend' defined but not used [-Wunused-function]
drivers/char/hw_random/exynos-rng.c:157:12: warning: 'exynos_rng_runtime_resume' defined but not used [-Wunused-function]
Add CONFIG_PM_RUNTIME to suspend/resume functions to fix the build
error. It is because UNIVERSAL_DEV_PM_OPS macro is related to both
CONFIG_PM_SLEEP and CONFIG_PM_RUNTIME.
drivers/char/hw_random/exynos-rng.c:167:8: error: 'exynos_rng_runtime_suspend' undeclared here (not in a function)
drivers/char/hw_random/exynos-rng.c:167:8: error: 'exynos_rng_runtime_resume' undeclared here (not in a function)
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Reported-by: David Rientjes <rientjes@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Mihnea Dobrescu-Balaur [Mon, 11 Mar 2013 10:48:10 +0000 (12:48 +0200)]
crypto: ux500 - replace kmalloc and then memcpy with kmemdup
Signed-off-by: Mihnea Dobrescu-Balaur <mihneadb@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
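The kmalloc-then-memcpy replacement in the ux500 commit above follows a common pattern. Below is a rough userspace analogue, with malloc() standing in for kmalloc(). The name dup_bytes() is made up for this sketch; the kernel helper itself is kmemdup(src, len, gfp).

```c
#include <stdlib.h>
#include <string.h>

/*
 * Userspace stand-in for the allocate-and-copy helper: one call replaces
 * the open-coded "allocate, check, memcpy" sequence.
 * Returns NULL if the allocation fails; the caller frees the result.
 */
void *dup_bytes(const void *src, size_t len)
{
    void *p = malloc(len);

    if (p != NULL)
        memcpy(p, src, len);
    return p;
}
```

Besides being shorter, the single helper removes the chance of the allocation size and the copy size drifting apart.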
Javier Martin [Fri, 1 Mar 2013 11:37:53 +0000 (12:37 +0100)]
crypto: sahara - Add driver for SAHARA2 accelerator.
SAHARA2 HW module is included in the i.MX27 SoC from
Freescale. It is capable of performing cipher algorithms
such as AES and 3DES, as well as hashing and RNG.
This driver provides support for AES-CBC and AES-ECB
for now.
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Javier Martin <javier.martin@vista-silicon.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tang Chen [Thu, 7 Mar 2013 10:38:17 +0000 (18:38 +0800)]
hwrng: Fix a wrong comment in Documentation/hw_random.txt
Judging from the comment, there are three reasons for removing request_mem_region.
Change "two" to "three" in the comment.
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Acked-by: Rob Landley <rob@landley.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Paul Bolle [Tue, 5 Mar 2013 13:33:16 +0000 (14:33 +0100)]
crypto: caam - fix typo "CRYPTO_AHASH"
The Kconfig entry for CAAM's hash algorithm implementations has always
selected CRYPTO_AHASH. But there's no corresponding Kconfig symbol.
It seems it was intended to select CRYPTO_HASH, like other crypto
drivers do. That would apparently (indirectly) select CRYPTO_HASH2,
which would enable the ahash functionality this driver uses.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Reviewed-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Sachin Kamat [Mon, 4 Mar 2013 09:39:43 +0000 (15:09 +0530)]
crypto: omap-sham - Use module_platform_driver macro
module_platform_driver() makes the code simpler by eliminating boilerplate
code.
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Sachin Kamat [Mon, 4 Mar 2013 09:39:42 +0000 (15:09 +0530)]
crypto: omap-aes - Use module_platform_driver macro
module_platform_driver() makes the code simpler by eliminating boilerplate
code.
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Joel A Fernandes [Tue, 26 Feb 2013 16:04:32 +0000 (10:04 -0600)]
crypto: omap-aes - Use pm_runtime_put instead of pm_runtime_put_sync in tasklet
After DMA is complete, the omap_aes_finish_req function is called as
a part of the done_task tasklet. This context is atomic, so any pm
functions called from it must not sleep.
The patch replaces a call to pm_runtime_put_sync (which can sleep) with
pm_runtime_put thus fixing a kernel panic observed on AM33xx SoC during
AES operation.
Tested on an AM33xx SoC device (beaglebone board).
To reproduce the problem, I used the tcrypt kernel module as:
modprobe tcrypt sec=2 mode=500
Signed-off-by: Joel A Fernandes <joelagnel@ti.com>
Cc: David S. Miller <davem@davemloft.net>
Acked-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Joel A Fernandes [Tue, 26 Feb 2013 16:04:31 +0000 (10:04 -0600)]
crypto: omap-sham - Use pm_runtime_put instead of pm_runtime_put_sync in tasklet
After DMA is complete, the omap_sham_finish_req function is called as
a part of the done_task tasklet. This context is atomic, so any pm
functions called from it must not sleep.
The patch replaces a call to pm_runtime_put_sync (which can sleep) with
pm_runtime_put thus fixing a kernel panic observed on AM33xx SoC during
SHA operation.
Tested on an AM33xx SoC device (beaglebone board).
To reproduce the problem, I used the tcrypt kernel module as:
modprobe tcrypt sec=2 mode=403
Signed-off-by: Joel A Fernandes <joelagnel@ti.com>
Cc: David S. Miller <davem@davemloft.net>
Acked-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Syam Sidhardhan [Sun, 24 Feb 2013 22:27:39 +0000 (03:57 +0530)]
crypto: bfin_crc - Fix possible NULL pointer dereference
If we define dev_dbg(), then there is a possible NULL pointer
dereference.
Signed-off-by: Syam Sidhardhan <s.syam@samsung.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Mathias Krause [Sun, 24 Feb 2013 13:09:12 +0000 (14:09 +0100)]
crypto: user - constify netlink dispatch table
There is no need to modify the netlink dispatch table at runtime and
making it const even makes the resulting object file slightly smaller.
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tim Chen [Thu, 21 Feb 2013 19:04:22 +0000 (11:04 -0800)]
crypto: crc32c - Update the links to the white papers on CRC32C calculations with PCLMULQDQ instructions.
Herbert,
The following patch updates the stale link to the CRC32C white paper
that was referenced.
Tim
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Nicolas Royer [Wed, 20 Feb 2013 16:10:26 +0000 (17:10 +0100)]
crypto: atmel-sha - add support for latest release of the IP (0x410)
Updates from IP release 0x320 to 0x400:
- add DMA support (previous IP revision uses PDC)
- add DMA double input buffer support
- add SHA224 support
Update from IP release 0x400 to 0x410:
- add SHA384 and SHA512 support
Signed-off-by: Nicolas Royer <nicolas@eukrea.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Acked-by: Eric Bénard <eric@eukrea.com>
Tested-by: Eric Bénard <eric@eukrea.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Nicolas Royer [Wed, 20 Feb 2013 16:10:25 +0000 (17:10 +0100)]
crypto: atmel-tdes - add support for latest release of the IP (0x700)
Update from previous IP release (0x600):
- add DMA support (previous IP release uses PDC)
Signed-off-by: Nicolas Royer <nicolas@eukrea.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Acked-by: Eric Bénard <eric@eukrea.com>
Tested-by: Eric Bénard <eric@eukrea.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Nicolas Royer [Wed, 20 Feb 2013 16:10:24 +0000 (17:10 +0100)]
crypto: atmel-aes - add support for latest release of the IP (0x130)
Updates from previous IP release (0x120):
- add cfb64 support
- add DMA double input buffer support
Signed-off-by: Nicolas Royer <nicolas@eukrea.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Acked-by: Eric Bénard <eric@eukrea.com>
Tested-by: Eric Bénard <eric@eukrea.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Nicolas Royer [Wed, 20 Feb 2013 16:10:23 +0000 (17:10 +0100)]
ARM: AT91SAM9G45: same platform data structure for all crypto peripherals
Only AES uses DMA on AT91SAM9G45 (TDES and SHA use PDC).
However, the latest Atmel TDES and SHA IP releases use DMA instead of PDC,
so the Atmel TDES and SHA drivers need DMA platform data for those IP releases.
The goal of this patch is to use the same platform data structure for all Atmel
crypto peripherals. This structure contains information about the DMA interface.
Signed-off-by: Nicolas Royer <nicolas@eukrea.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Acked-by: Eric Bénard <eric@eukrea.com>
Tested-by: Eric Bénard <eric@eukrea.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Tue, 26 Feb 2013 09:52:15 +0000 (17:52 +0800)]
crypto: crc32c - Kill pointless CRYPTO_CRC32C_X86_64 option
This bool option can never be set to anything other than y. So
let's just kill it.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Linus Torvalds [Mon, 25 Feb 2013 23:56:15 +0000 (15:56 -0800)]
Merge git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto update from Herbert Xu:
"Here is the crypto update for 3.9:
- Added accelerated implementation of crc32 using pclmulqdq.
- Added test vector for fcrypt.
- Added support for OMAP4/AM33XX cipher and hash.
- Fixed loose crypto_user input checks.
- Misc fixes"
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (43 commits)
crypto: user - ensure user supplied strings are nul-terminated
crypto: user - fix empty string test in report API
crypto: user - fix info leaks in report API
crypto: caam - Added property fsl,sec-era in SEC4.0 device tree binding.
crypto: use ERR_CAST
crypto: atmel-aes - adjust duplicate test
crypto: crc32-pclmul - Kill warning on x86-32
crypto: x86/twofish - assembler clean-ups: use ENTRY/ENDPROC, localize jump labels
crypto: x86/sha1 - assembler clean-ups: use ENTRY/ENDPROC
crypto: x86/serpent - use ENTRY/ENDPROC for assember functions and localize jump targets
crypto: x86/salsa20 - assembler cleanup, use ENTRY/ENDPROC for assember functions and rename ECRYPT_* to salsa20_*
crypto: x86/ghash - assembler clean-up: use ENDPROC at end of assember functions
crypto: x86/crc32c - assembler clean-up: use ENTRY/ENDPROC
crypto: cast6-avx: use ENTRY()/ENDPROC() for assembler functions
crypto: cast5-avx: use ENTRY()/ENDPROC() for assembler functions and localize jump targets
crypto: camellia-x86_64/aes-ni: use ENTRY()/ENDPROC() for assembler functions and localize jump targets
crypto: blowfish-x86_64: use ENTRY()/ENDPROC() for assembler functions and localize jump targets
crypto: aesni-intel - add ENDPROC statements for assembler functions
crypto: x86/aes - assembler clean-ups: use ENTRY/ENDPROC, localize jump targets
crypto: testmgr - add test vector for fcrypt
...
Linus Torvalds [Mon, 25 Feb 2013 23:47:03 +0000 (15:47 -0800)]
Merge tag 'please-pull-vm_unwrapped' of git://git./linux/kernel/git/aegl/linux
Pull ia64 update from Tony Luck:
"ia64 vm patch series that was cooking in -mm tree"
* tag 'please-pull-vm_unwrapped' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
mm: use vm_unmapped_area() in hugetlbfs on ia64 architecture
mm: use vm_unmapped_area() on ia64 architecture
Linus Torvalds [Mon, 25 Feb 2013 23:45:29 +0000 (15:45 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/linux-security
Pull security subsystem fixes from James Morris:
"From Mimi:
Both of these patches are bug fixes for patches, which were
upstreamed in this open window. The first patch addresses a merge
issue. The second patch addresses a CONFIG_BLOCK dependency."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
block: fix part_pack_uuid() build error
ima: "remove enforce checking duplication" merge fix
Linus Torvalds [Mon, 25 Feb 2013 23:43:21 +0000 (15:43 -0800)]
Merge tag 'ktest-v3.9' of git://git./linux/kernel/git/rostedt/linux-ktest
Pull ktest update from Steven Rostedt:
"Added ability to have all builds test warnings.
Fixed failing reboot when the reboot produces a non fatal error.
Config reading fixes and other cleanups"
* tag 'ktest-v3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest:
ktest: Remove indexes from warnings check
ktest: Ignore warnings during reboot
ktest: Search for linux banner for successful reboot
ktest: Add make_warnings_file and process full warnings
ktest: Allow a test option to use its default option
ktest: Strip off '\n' when reading which files were modified
ktest: Do not require CONSOLE for build or install bisects
Linus Torvalds [Mon, 25 Feb 2013 23:41:43 +0000 (15:41 -0800)]
Merge tag 'modules-next-for-linus' of git://git./linux/kernel/git/rusty/linux
Pull module update from Rusty Russell:
"The sweeping change is to make add_taint() explicitly indicate whether
to disable lockdep, but it's a mechanical change."
* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
MODSIGN: Add option to not sign modules during modules_install
MODSIGN: Add -s <signature> option to sign-file
MODSIGN: Specify the hash algorithm on sign-file command line
MODSIGN: Simplify Makefile with a Kconfig helper
module: clean up load_module a little more.
modpost: Ignore ARC specific non-alloc sections
module: constify within_module_*
taint: add explicit flag to show whether lock dep is still OK.
module: printk message when module signature fail taints kernel.
Mimi Zohar [Mon, 25 Feb 2013 04:42:37 +0000 (23:42 -0500)]
block: fix part_pack_uuid() build error
Commit "85865c1 ima: add policy support for file system uuid"
introduced a CONFIG_BLOCK dependency. This patch defines a
wrapper called blk_part_pack_uuid(), which returns -EINVAL,
when CONFIG_BLOCK is not defined.
security/integrity/ima/ima_policy.c:538:4: error: implicit declaration
of function 'part_pack_uuid' [-Werror=implicit-function-declaration]
Changelog v2:
- Reference commit number in patch description
Changelog v1:
- rename ima_part_pack_uuid() to blk_part_pack_uuid()
- resolve scripts/checkpatch.pl warnings
Changelog v0:
- fix UUID scripts/Lindent msgs
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: David Rientjes <rientjes@google.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: James Morris <james.l.morris@oracle.com>
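The wrapper approach in the part_pack_uuid() commit above is a common way to keep callers compiling when an optional subsystem is configured out. Here is a minimal sketch under stated assumptions: HAVE_BLOCK_LAYER is a hypothetical stand-in for CONFIG_BLOCK, and pack_uuid() and blk_pack_uuid() are illustrative names with placeholder bodies, not the kernel's functions.

```c
#include <errno.h>

#ifdef HAVE_BLOCK_LAYER
/* Placeholder for the real parser, which only exists with block support. */
int pack_uuid(const char *str, unsigned char *out)
{
    (void)str;
    out[0] = 0;
    return 0;
}

/* With block support, the wrapper simply forwards to the real function. */
int blk_pack_uuid(const char *str, unsigned char *out)
{
    return pack_uuid(str, out);
}
#else
/*
 * Without block support the wrapper still exists, so callers compile,
 * but it reports failure instead of doing any work.
 */
int blk_pack_uuid(const char *str, unsigned char *out)
{
    (void)str;
    (void)out;
    return -EINVAL;
}
#endif
```

Callers then use only the wrapper and handle the error return, rather than carrying their own #ifdef blocks.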
Mimi Zohar [Mon, 25 Feb 2013 04:42:36 +0000 (23:42 -0500)]
ima: "remove enforce checking duplication" merge fix
Commit "750943a ima: remove enforce checking duplication" combined
the 'in IMA policy' and 'enforcing file integrity' checks. For
the non-file, kernel module verification, a specific check for
'enforcing file integrity' was not added. This patch adds the
check.
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Linus Torvalds [Mon, 25 Feb 2013 04:00:58 +0000 (20:00 -0800)]
Merge tag 'mfd-3.9-1' of git://git./linux/kernel/git/sameo/mfd-2.6
Pull MFD updates from Samuel Ortiz:
"This is the MFD pull request for the 3.9 merge window.
No new drivers this time, but a bunch of fairly big cleanups:
- Roger Quadros worked on a OMAP USBHS and TLL platform data
consolidation, OMAP5 support and clock management code cleanup.
- The first step of a major sync for the ab8500 driver from Lee
Jones. In particular, the debugfs and the sysct interfaces got
extended and improved.
- Peter Ujfalusi sent a nice patchset for cleaning and fixing the
twl-core driver, with a much needed module id lookup code
improvement.
- The regular wm5102 and arizona cleanups and fixes from Mark Brown.
- Laxman Dewangan extended the palmas APIs in order to implement the
palmas GPIO and rt drivers.
- Laxman also added DT support for the tps65090 driver.
- The Intel SCH and ICH drivers got a couple fixes from Aaron Sierra
and Darren Hart.
- Linus Walleij patchset for the ab8500 driver allowed ab8500 and
ab9540 based devices to switch to the new abx500 pin-ctrl driver.
- The max8925 now has device tree and irqdomain support thanks to
Qing Xu.
- The recently added rtsx driver got a few cleanups and fixes for a
better card detection code path and now also supports the RTS5227
chipset, thanks to Wei Wang and Roger Tseng."
* tag 'mfd-3.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6: (109 commits)
mfd: lpc_ich: Use devres API to allocate private data
mfd: lpc_ich: Add Device IDs for Intel Wellsburg PCH
mfd: lpc_sch: Accomodate partial population of the MFD devices
mfd: da9052-i2c: Staticize da9052_i2c_fix()
mfd: syscon: Fix sparse warning
mfd: twl-core: Fix kernel panic on boot
mfd: rtsx: Fix issue that booting OS with SD card inserted
mfd: ab8500: Fix compile error
mfd: Add missing GENERIC_HARDIRQS dependecies
Documentation: Add docs for max8925 dt
mfd: max8925: Add dts
mfd: max8925: Support dt for backlight
mfd: max8925: Fix onkey driver irq base
mfd: max8925: Fix mfd device register failure
mfd: max8925: Add irqdomain for dt
mfd: vexpress: Allow vexpress-sysreg to self-initialise
mfd: rtsx: Support RTS5227
mfd: rtsx: Implement driving adjustment to device-dependent callbacks
mfd: vexpress: Add pseudo-GPIO based LEDs
mfd: ab8500: Rename ab8500 to abx500 for hwmon driver
...
Linus Torvalds [Mon, 25 Feb 2013 01:35:10 +0000 (17:35 -0800)]
Merge branch 'v4l_for_linus' of git://git./linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:
- Some cleanups at V4L2 documentation
- new drivers: ts2020 frontend, ov9650 sensor, s5c73m3 sensor,
sh-mobile veu mem2mem driver, radio-ma901, davinci_vpfe staging
driver
- Lots of missing MAINTAINERS entries added
- several em28xx driver improvements, including its conversion to
videobuf2
- several fixups on drivers to make them comply better with the API
- DVB core: add support for DVBv5 stats, allowing the implementation of
statistics for new standards like ISDB
- mb86a20s: add statistics to the driver
- lots of new board additions, cleanups, and driver improvements.
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (596 commits)
[media] media: Add 0x3009 USB PID to ttusb2 driver (fixed diff)
[media] rtl28xxu: Add USB IDs for Compro VideoMate U620F
[media] em28xx: add usb id for terratec h5 rev. 3
[media] media: rc: gpio-ir-recv: add support for device tree parsing
[media] mceusb: move check earlier to make smatch happy
[media] radio-si470x doc: add info about v4l2-ctl and sox+alsa
[media] staging: media: Remove unnecessary OOM messages
[media] sh_vou: Use vou_dev instead of vou_file wherever possible
[media] sh_vou: Use video_drvdata()
[media] drivers/media/platform/soc_camera/pxa_camera.c: use devm_ functions
[media] mt9t112: mt9t111 format set up differs from mt9t112
[media] sh-mobile-ceu-camera: fix SHARPNESS control default
Revert "[media] fc0011: Return early, if the frequency is already tuned"
[media] cx18/ivtv: fix regression: remove __init from a non-init function
[media] em28xx: fix analog streaming with USB bulk transfers
[media] stv0900: remove unnecessary null pointer check
[media] fc0011: Return early, if the frequency is already tuned
[media] fc0011: Add some sanity checks and cleanups
[media] fc0011: Fix xin value clamping
Revert "[media] [PATH,1/2] mxl5007 move reset to attach"
...
Linus Torvalds [Mon, 25 Feb 2013 01:32:15 +0000 (17:32 -0800)]
Merge tag 'upstream-linus' of git://git./linux/kernel/git/jgarzik/libata-dev
Pull libata updates from Jeff Garzik:
1) apply, and then revert, the sysfs export of ATA host controller
number. Discussion was continuing after patch application, trying to
figure out how to best mesh exported data with the installers,
boot-time agents and other parties that want this info.
2) Merge Zero-Power Optical Device Driver (ZPODD) support, bringing the
wonderfulness of sane power management to your CD/DVD device.
Includes one SCSI-subsystem patch (with appropriate ACKs), adding
runtime PM support to 'sr' driver. That is the ZPODD interaction
bits.
Patchset went through some 13 revisions before it got here; kudos to
Intel for persistence.
3) pata_samsung_cf: use devm_clk_get()
4) more ata_piix, ahci PCI IDs
5) Add SATA driver for R-Car SoC
6) Convert libata to use devm_ioremap_resource (Note: I think Greg sent
this to you, also)
7) Set proper Sense Key (SK) in the SCSI simulator when ATA passthrough
indicates check condition. Google and specification hawks everywhere
shall rejoice.
* tag 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev: (22 commits)
[libata] fix smatch warning for zpodd_wake_dev
[libata] Set proper SK when CK_COND is set.
[libata] Convert to devm_ioremap_resource()
libata: add R-Car SATA driver
ahci: Add Device IDs for Intel Wellsburg PCH
ata_piix: Add Device IDs for Intel Wellsburg PCH
[SCSI] remove can_power_off flag from scsi_device
[libata] scsi: no poll when ODD is powered off
[SCSI] sr: support runtime pm
ahci: AHCI-mode SATA patch for Intel Avoton DeviceIDs
ata_piix: IDE-mode SATA patch for Intel Avoton DeviceIDs
[libata] PM code cleanup for ata port
[libata] pm: differentiate system and runtime pm for ata port
Revert "libata: export host controller number thru /sys"
libata: do not suspend port if normal ODD is attached
libata: expose pm qos flags for ata device
libata: handle power transition of ODD
libata: check zero power ready status for ZPODD
libata: move acpi notification code to zpodd
libata: identify and init ZPODD devices
...
Nicolas Pitre [Mon, 25 Feb 2013 01:06:09 +0000 (20:06 -0500)]
tty vt: fix character insertion overflow
Commit 81732c3b2fed ("tty vt: Fix line garbage in virtual console on
command line edition") broke insert_char() in multiple ways. Then
commit b1a925f44a3a ("tty vt: Fix a regression in command line edition")
partially fixed it. However, the buffer being moved is still too large
and overflowing beyond the end of the current line, corrupting existing
characters on the next line.
Example test case:
echo -e "abc\nde\x1b[A\x1b[4h \x1b[4l\x1b[B"
Expected result:
ab c
de
Current result:
ab c
e
Needless to say that this is very annoying when inserting words in the
middle of paragraphs with certain text editors.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Cc: Jean-François Moine <moinejf@free.fr>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 25 Feb 2013 00:06:13 +0000 (16:06 -0800)]
Merge tag 'stable/for-linus-3.9-rc0-tag' of git://git./linux/kernel/git/konrad/xen
Pull Xen update from Konrad Rzeszutek Wilk:
"This has two new ACPI drivers for Xen - a physical CPU offline/online
and a memory hotplug. The way this works is that ACPI kicks the
drivers and they make the appropriate hypercall to the hypervisor to
tell it that there is a new CPU or memory. There are also some changes to
the Xen ARM ABIs and a couple of fixes. One particularly nasty bug in
the Xen PV spinlock code was fixed by Stefan Bader - and has been
there since 2.6.32!
Features:
- Xen ACPI memory and CPU hotplug drivers - allowing Xen hypervisor
to be aware of new CPU and new DIMMs
- Cleanups
Bug-fixes:
- Fixes a long-standing bug in the PV spinlock wherein we did not
kick VCPUs that were in a tight loop.
- Fixes in the error paths for the event channel machinery"
Fix up a few semantic conflicts with the ACPI interface changes in
drivers/xen/xen-acpi-{cpu,mem}hotplug.c.
* tag 'stable/for-linus-3.9-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen: event channel arrays are xen_ulong_t and not unsigned long
xen: Send spinlock IPI to all waiters
xen: introduce xen_remap, use it instead of ioremap
xen: close evtchn port if binding to irq fails
xen-evtchn: correct comment and error output
xen/tmem: Add missing %s in the printk statement.
xen/acpi: move xen_acpi_get_pxm under CONFIG_XEN_DOM0
xen/acpi: ACPI cpu hotplug
xen/acpi: Move xen_acpi_get_pxm to Xen's acpi.h
xen/stub: driver for CPU hotplug
xen/acpi: ACPI memory hotplug
xen/stub: driver for memory hotplug
xen: implement updated XENMEM_add_to_physmap_range ABI
xen/smp: Move the common CPU init code a bit to prep for PVH patch.
Linus Torvalds [Sun, 24 Feb 2013 21:07:18 +0000 (13:07 -0800)]
Merge tag 'kvm-3.9-1' of git://git./virt/kvm/kvm
Pull KVM updates from Marcelo Tosatti:
"KVM updates for the 3.9 merge window, including x86 real mode
emulation fixes, stronger memory slot interface restrictions, mmu_lock
spinlock hold time reduction, improved handling of large page faults
on shadow, initial APICv HW acceleration support, s390 channel IO
based virtio, amongst others"
* tag 'kvm-3.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (143 commits)
Revert "KVM: MMU: lazily drop large spte"
x86: pvclock kvm: align allocation size to page size
KVM: nVMX: Remove redundant get_vmcs12 from nested_vmx_exit_handled_msr
x86 emulator: fix parity calculation for AAD instruction
KVM: PPC: BookE: Handle alignment interrupts
booke: Added DBCR4 SPR number
KVM: PPC: booke: Allow multiple exception types
KVM: PPC: booke: use vcpu reference from thread_struct
KVM: Remove user_alloc from struct kvm_memory_slot
KVM: VMX: disable apicv by default
KVM: s390: Fix handling of iscs.
KVM: MMU: cleanup __direct_map
KVM: MMU: remove pt_access in mmu_set_spte
KVM: MMU: cleanup mapping-level
KVM: MMU: lazily drop large spte
KVM: VMX: cleanup vmx_set_cr0().
KVM: VMX: add missing exit names to VMX_EXIT_REASONS array
KVM: VMX: disable SMEP feature when guest is in non-paging mode
KVM: Remove duplicate text in api.txt
Revert "KVM: MMU: split kvm_mmu_free_page"
...
Linus Torvalds [Sun, 24 Feb 2013 02:50:11 +0000 (18:50 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/signal
Pull signal handling cleanups from Al Viro:
"This is the first pile; another one will come a bit later and will
contain SYSCALL_DEFINE-related patches.
- a bunch of signal-related syscalls (both native and compat)
unified.
- a bunch of compat syscalls switched to COMPAT_SYSCALL_DEFINE
(fixing several potential problems with missing argument
validation, while we are at it)
- a lot of now-pointless wrappers killed
- a couple of architectures (cris and hexagon) forgot to save
altstack settings into sigframe, even though they used the
(uninitialized) values in sigreturn; fixed.
- microblaze fixes for delivery of multiple signals arriving at once
- saner set of helpers for signal delivery introduced, several
architectures switched to using those."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (143 commits)
x86: convert to ksignal
sparc: convert to ksignal
arm: switch to struct ksignal * passing
alpha: pass k_sigaction and siginfo_t using ksignal pointer
burying unused conditionals
make do_sigaltstack() static
arm64: switch to generic old sigaction() (compat-only)
arm64: switch to generic compat rt_sigaction()
arm64: switch compat to generic old sigsuspend
arm64: switch to generic compat rt_sigqueueinfo()
arm64: switch to generic compat rt_sigpending()
arm64: switch to generic compat rt_sigprocmask()
arm64: switch to generic sigaltstack
sparc: switch to generic old sigsuspend
sparc: COMPAT_SYSCALL_DEFINE does all sign-extension as well as SYSCALL_DEFINE
sparc: kill sign-extending wrappers for native syscalls
kill sparc32_open()
sparc: switch to use of generic old sigaction
sparc: switch sys_compat_rt_sigaction() to COMPAT_SYSCALL_DEFINE
mips: switch to generic sys_fork() and sys_clone()
...
Linus Torvalds [Sun, 24 Feb 2013 01:50:35 +0000 (17:50 -0800)]
Merge branch 'akpm' (more incoming from Andrew)
Merge second patch-bomb from Andrew Morton:
- A little DM fix
- the MM queue
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (154 commits)
ksm: allocate roots when needed
mm: cleanup "swapcache" in do_swap_page
mm,ksm: swapoff might need to copy
mm,ksm: FOLL_MIGRATION do migration_entry_wait
ksm: shrink 32-bit rmap_item back to 32 bytes
ksm: treat unstable nid like in stable tree
ksm: add some comments
tmpfs: fix mempolicy object leaks
tmpfs: fix use-after-free of mempolicy object
mm/fadvise.c: drain all pagevecs if POSIX_FADV_DONTNEED fails to discard all pages
mm: export mmu notifier invalidates
mm: accelerate mm_populate() treatment of THP pages
mm: use long type for page counts in mm_populate() and get_user_pages()
mm: accurately document nr_free_*_pages functions with code comments
HWPOISON: change order of error_states[]'s elements
HWPOISON: fix misjudgement of page_action() for errors on mlocked pages
memcg: stop warning on memcg_propagate_kmem
net: change type of virtio_chan->p9_max_pages
vmscan: change type of vm_total_pages to unsigned long
fs/nfsd: change type of max_delegations, nfsd_drc_max_mem and nfsd_drc_mem_used
...
Hugh Dickins [Sat, 23 Feb 2013 00:36:12 +0000 (16:36 -0800)]
ksm: allocate roots when needed
It is a pity to have MAX_NUMNODES+MAX_NUMNODES tree roots statically
allocated, particularly when very few users will ever actually tune
merge_across_nodes 0 to use more than 1+1 of those trees. Not a big
deal (only 16kB wasted on each machine with CONFIG_MAXSMP), but a pity.
Start off with 1+1 statically allocated, then if merge_across_nodes is
ever tuned, allocate for nr_node_ids+nr_node_ids. Do not attempt to
free up the extra if it's tuned back, that would be a waste of effort.
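The allocation strategy above can be sketched in user space as follows. This is a simplified model under stated assumptions: struct rb_root is a stand-in type, calloc() stands in for a kernel allocator, and the function name ksm_alloc_roots is hypothetical rather than taken from mm/ksm.c.

```c
#include <stdlib.h>

struct rb_root { void *rb_node; };	/* stand-in for the kernel type */

/* One statically allocated root of each kind covers the default case. */
static struct rb_root one_stable_tree[1];
static struct rb_root one_unstable_tree[1];
static struct rb_root *root_stable_tree = one_stable_tree;
static struct rb_root *root_unstable_tree = one_unstable_tree;

/*
 * Called the first time per-node trees are requested. Allocates one
 * zeroed buffer holding nr_node_ids stable roots followed by
 * nr_node_ids unstable roots. Returns 0 on success, -1 on failure.
 * The buffer is deliberately never freed if the tunable is set back.
 */
int ksm_alloc_roots(unsigned int nr_node_ids)
{
	struct rb_root *buf;

	if (root_stable_tree != one_stable_tree)
		return 0;	/* already allocated by an earlier call */

	buf = calloc(2 * (size_t)nr_node_ids, sizeof(*buf));
	if (!buf)
		return -1;

	root_stable_tree = buf;
	root_unstable_tree = buf + nr_node_ids;
	/* Carry over whatever the single unstable tree already holds. */
	root_unstable_tree[0] = one_unstable_tree[0];
	return 0;
}
```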
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Petr Holasek <pholasek@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Izik Eidus <izik.eidus@ravellosystems.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Sat, 23 Feb 2013 00:36:10 +0000 (16:36 -0800)]
mm: cleanup "swapcache" in do_swap_page
I dislike the way in which "swapcache" gets used in do_swap_page():
there is always a page from swapcache there (even if maybe uncached by
the time we lock it), but tests are made according to "swapcache".
Rework that with "page != swapcache", as has been done in unuse_pte().
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Petr Holasek <pholasek@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Izik Eidus <izik.eidus@ravellosystems.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Sat, 23 Feb 2013 00:36:09 +0000 (16:36 -0800)]
mm,ksm: swapoff might need to copy
Before establishing that KSM page migration was the cause of my
WARN_ON_ONCE(page_mapped(page))s, I suspected that they came from the
lack of a ksm_might_need_to_copy() in swapoff's unuse_pte() - which in
many respects is equivalent to faulting in a page.
In fact I've never caught that as the cause: but in theory it does at
least need the KSM_RUN_UNMERGE check in ksm_might_need_to_copy(), to
avoid bringing a KSM page back in when it's not supposed to be.
I intended to copy how it's done in do_swap_page(), but have a strong
aversion to how "swapcache" ends up being used there: rework it with
"page != swapcache".
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Petr Holasek <pholasek@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Izik Eidus <izik.eidus@ravellosystems.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Sat, 23 Feb 2013 00:36:07 +0000 (16:36 -0800)]
mm,ksm: FOLL_MIGRATION do migration_entry_wait
In "ksm: remove old stable nodes more thoroughly" I said that I'd never
seen its WARN_ON_ONCE(page_mapped(page)). True at the time of writing,
but it soon appeared once I tried fuller tests on the whole series.
It turned out to be due to the KSM page migration itself:
unmerge_and_remove_all_rmap_items() failed to locate and replace all the KSM pages,
because of that hiatus in page migration when old pte has been replaced
by migration entry, but not yet by new pte. follow_page() finds no page
at that instant, but a KSM page reappears shortly after, without a
fault.
Add FOLL_MIGRATION flag, so follow_page() can do migration_entry_wait()
for KSM's break_cow(). I'd have preferred to avoid another flag, and do
it every time, in case someone else makes the same easy mistake; but did
not find another transgressor (the common get_user_pages() is of course
safe), and cannot be sure that every follow_page() caller is prepared to
sleep - ia64's xencomm_vtop()? Now, THP's wait_split_huge_page() can
already sleep there, since anon_vma locking was changed to mutex, but
maybe that's somehow excluded.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Petr Holasek <pholasek@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Izik Eidus <izik.eidus@ravellosystems.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Sat, 23 Feb 2013 00:36:06 +0000 (16:36 -0800)]
ksm: shrink 32-bit rmap_item back to 32 bytes
Think of struct rmap_item as an extension of struct page (restricted to
MADV_MERGEABLE areas): there may be a lot of them, we need to keep them
small, especially on 32-bit architectures of limited lowmem.
Siting "int nid" after "unsigned int checksum" works nicely on 64-bit,
making no change to its 64-byte struct rmap_item; but bloats the 32-bit
struct rmap_item from (nicely cache-aligned) 32 bytes to 36 bytes, which
rounds up to 40 bytes once allocated from slab. We'd better avoid that.
Hey, I only just remembered that the anon_vma pointer in struct
rmap_item has no purpose until the rmap_item is hung from a stable tree
node (which has its own nid field); and rmap_item's nid field has no purpose
other than to say which tree root to tell rb_erase() when unlinking from an
unstable tree.
Double them up in a union. There's just one place where we set anon_vma
early (when we already hold mmap_sem): now we must remove tree_rmap_item
from its unstable tree there, before overwriting nid. No need to
spatter BUG()s around: we'd be seeing oopses if this were wrong.
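The size arithmetic above can be checked with a small model of the 32-bit layout. This is a sketch under assumptions: pointers and unsigned long are modelled as 4-byte integers, and the field list (rmap_list, mm, address, node) is a plausible simplification rather than a copy of the real struct rmap_item.

```c
#include <stdint.h>

/* 32-bit stand-ins: pointers and unsigned long are 4 bytes each. */
typedef uint32_t ptr32;

struct rb_node32 { ptr32 parent_color, right, left; };	/* 12 bytes */

/* Layout with nid as a separate field after checksum: 36 bytes. */
struct rmap_item_separate {
	ptr32 rmap_list;
	ptr32 anon_vma;
	ptr32 mm;
	uint32_t address;
	uint32_t checksum;
	int32_t nid;
	struct rb_node32 node;
};

/* Layout with anon_vma and nid sharing storage: back to 32 bytes. */
struct rmap_item_union {
	ptr32 rmap_list;
	union {
		ptr32 anon_vma;	/* used once hung from a stable tree node */
		int32_t nid;	/* used while in an unstable tree */
	} u;
	ptr32 mm;
	uint32_t address;
	uint32_t checksum;
	struct rb_node32 node;
};
```

Because the two members are never needed at the same time, the union costs nothing in functionality; the price is the ordering constraint described above (leave the unstable tree before anon_vma overwrites nid).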
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Petr Holasek <pholasek@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Izik Eidus <izik.eidus@ravellosystems.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Sat, 23 Feb 2013 00:36:05 +0000 (16:36 -0800)]
ksm: treat unstable nid like in stable tree
An inconsistency emerged in reviewing the NUMA node changes to KSM: when
meeting a page from the wrong NUMA node in a stable tree, we say that
it's okay for comparisons, but not as a leaf for merging; whereas when
meeting a page from the wrong NUMA node in an unstable tree, we bail out
immediately.
Now, it might be that a wrong NUMA node in an unstable tree is more
likely to correlate with instability (different content, with rbnode
now misplaced) than page migration; but even so, we are accustomed to
instability in the unstable tree.
Without strong evidence for which strategy is generally better, I'd
rather be consistent with what's done in the stable tree: accept a page
from the wrong NUMA node for comparison, but not as a leaf for merging.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Petr Holasek <pholasek@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Izik Eidus <izik.eidus@ravellosystems.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Sat, 23 Feb 2013 00:36:03 +0000 (16:36 -0800)]
ksm: add some comments
Added slightly more detail to the Documentation of merge_across_nodes, a
few comments in areas indicated by review, and renamed get_ksm_page()'s
argument from "locked" to "lock_it". No functional change.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Petr Holasek <pholasek@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Izik Eidus <izik.eidus@ravellosystems.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Greg Thelen [Sat, 23 Feb 2013 00:36:02 +0000 (16:36 -0800)]
tmpfs: fix mempolicy object leaks
Fix several mempolicy leaks in the tmpfs mount logic. These leaks are
slow - on the order of one object leaked per mount attempt.
Leak 1 (umount doesn't free mpol allocated in mount):
while true; do
mount -t tmpfs -o mpol=interleave,size=100M nodev /mnt
umount /mnt
done
Leak 2 (errors parsing remount options will leak mpol):
mount -t tmpfs -o size=100M nodev /mnt
while true; do
mount -o remount,mpol=interleave,size=x /mnt 2> /dev/null
done
umount /mnt
Leak 3 (multiple mpol per mount leak mpol):
while true; do
mount -t tmpfs -o mpol=interleave,mpol=interleave,size=100M nodev /mnt
umount /mnt
done
This patch fixes all of the above. I could have broken the patch into
three pieces but it seemed easier to review as one.
[akpm@linux-foundation.org: fix handling of mpol_parse_str() errors, per Hugh]
Signed-off-by: Greg Thelen <gthelen@google.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Greg Thelen [Sat, 23 Feb 2013 00:36:01 +0000 (16:36 -0800)]
tmpfs: fix use-after-free of mempolicy object
The tmpfs remount logic preserves filesystem mempolicy if the mpol=M
option is not specified in the remount request. A new policy can be
specified if mpol=M is given.
Before this patch remounting an mpol bound tmpfs without specifying
mpol= mount option in the remount request would set the filesystem's
mempolicy object to a freed mempolicy object.
To reproduce the problem boot a DEBUG_PAGEALLOC kernel and run:
# mkdir /tmp/x
# mount -t tmpfs -o size=100M,mpol=interleave nodev /tmp/x
# grep /tmp/x /proc/mounts
nodev /tmp/x tmpfs rw,relatime,size=102400k,mpol=interleave:0-3 0 0
# mount -o remount,size=200M nodev /tmp/x
# grep /tmp/x /proc/mounts
nodev /tmp/x tmpfs rw,relatime,size=204800k,mpol=??? 0 0
# note ? garbage in mpol=... output above
# dd if=/dev/zero of=/tmp/x/f count=1
# panic here
Panic:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [< (null)>] (null)
[...]
Oops: 0010 [#1] SMP DEBUG_PAGEALLOC
Call Trace:
mpol_shared_policy_init+0xa5/0x160
shmem_get_inode+0x209/0x270
shmem_mknod+0x3e/0xf0
shmem_create+0x18/0x20
vfs_create+0xb5/0x130
do_last+0x9a1/0xea0
path_openat+0xb3/0x4d0
do_filp_open+0x42/0xa0
do_sys_open+0xfe/0x1e0
compat_sys_open+0x1b/0x20
cstar_dispatch+0x7/0x1f
Non-debug kernels will not crash immediately because referencing the
dangling mpol will not cause a fault. Instead the filesystem will
reference a freed mempolicy object, which will cause unpredictable
behavior.
The problem boils down to a dropped mpol reference below if
shmem_parse_options() does not allocate a new mpol:
config = *sbinfo
shmem_parse_options(data, &config, true)
mpol_put(sbinfo->mpol)
sbinfo->mpol = config.mpol /* BUG: saves unreferenced mpol */
This patch avoids the crash by not releasing the mempolicy if
shmem_parse_options() doesn't create a new mpol.
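A minimal sketch of that reference-handling rule, assuming a toy refcount model. All names here (struct sbinfo, parse_options, remount) are illustrative stand-ins, and mpol_put() only marks the object as freed instead of releasing memory so that the outcome can be inspected.

```c
#include <stdbool.h>
#include <stddef.h>

struct mempolicy {
	int refcnt;
	bool freed;	/* set instead of calling free(), for inspection */
};

struct sbinfo {
	struct mempolicy *mpol;
};

void mpol_put(struct mempolicy *pol)
{
	if (pol && --pol->refcnt == 0)
		pol->freed = true;
}

/*
 * Stand-in for option parsing: new_mpol is the policy given by mpol=M,
 * or NULL when the remount request did not mention mpol at all.
 */
void parse_options(struct sbinfo *config, struct mempolicy *new_mpol)
{
	config->mpol = new_mpol;
}

void remount(struct sbinfo *sbinfo, struct mempolicy *new_mpol)
{
	struct sbinfo config = *sbinfo;

	config.mpol = NULL;	/* makes "no mpol= given" detectable */
	parse_options(&config, new_mpol);

	/* Release and replace the old policy only if a new one exists. */
	if (config.mpol) {
		mpol_put(sbinfo->mpol);
		sbinfo->mpol = config.mpol;
	}
}
```

In this model a remount without mpol= leaves the existing policy and its reference count alone, while a remount with a new policy drops exactly one reference on the old one.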
How far back does this issue go? I see it in both 2.6.36 and 3.3. I did
not look back further.
Signed-off-by: Greg Thelen <gthelen@google.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Sat, 23 Feb 2013 00:35:59 +0000 (16:35 -0800)]
mm/fadvise.c: drain all pagevecs if POSIX_FADV_DONTNEED fails to discard all pages
Rob van der Heij reported the following (paraphrased) in private mail.
The scenario is that I want to avoid backups to fill up the page
cache and purge stuff that is more likely to be used again (this is
with s390x Linux on z/VM, so I don't give it as much memory that
we don't care anymore). So I have something with LD_PRELOAD that
intercepts the close() call (from tar, in this case) and issues
a posix_fadvise() just before closing the file.
This mostly works, except for small files (less than 14 pages)
that remain in page cache after the fact.
Unfortunately Rob has not had a chance to test this exact patch but the
test program below should be reproducing the problem he described.
The issue is the per-cpu pagevecs for LRU additions. If the pages are
added by one CPU but fadvise() is called on another then the pages
remain resident as the invalidate_mapping_pages() only drains the local
pagevecs via its call to pagevec_release(). The user-visible effect is
that a program that uses fadvise() properly is not obeyed.
A possible fix for this is to put the necessary smarts into
invalidate_mapping_pages() to globally drain the LRU pagevecs if a
pagevec page could not be discarded. The downside with this is that an
inode cache shrink would send a global IPI and memory pressure
potentially causing global IPI storms is very undesirable.
Instead, this patch adds a check during fadvise(POSIX_FADV_DONTNEED) to
check if invalidate_mapping_pages() discarded all the requested pages.
If only a subset of the pages is discarded it drains the LRU pagevecs and tries
again. If the second attempt fails, it assumes it is due to the pages
being mapped, locked or dirty and does not care. With this patch, an
application using fadvise() correctly will be obeyed but there is a
downside that a malicious application can force the kernel to send
global IPIs and increase overhead.
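The retry logic can be sketched with a toy model of the page cache. This is an illustration under assumptions, not the kernel code: struct cache_model and the helpers below are hypothetical, with "parked" standing for pages still sitting in another CPU's pagevec.

```c
struct cache_model {
	unsigned long resident;	/* pages of the file still in page cache */
	unsigned long parked;	/* of those, pages held in a remote pagevec;
				 * must not exceed resident */
	int global_drains;	/* how often the expensive drain ran */
};

/* Drops every page that is not parked; returns the number dropped. */
unsigned long invalidate_pages(struct cache_model *c)
{
	unsigned long dropped = c->resident - c->parked;

	c->resident = c->parked;
	return dropped;
}

/* Models the global drain: parked pages become ordinary LRU pages. */
void drain_all(struct cache_model *c)
{
	c->parked = 0;
	c->global_drains++;
}

void fadvise_dontneed(struct cache_model *c)
{
	unsigned long wanted = c->resident;
	unsigned long count = invalidate_pages(c);

	/* Partial result: drain once and try again, then give up. */
	if (count < wanted) {
		drain_all(c);
		invalidate_pages(c);
	}
}
```

The drain only runs when the first pass comes up short, so the common case where all pages were added on the calling CPU pays no extra cost.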
If accepted, I would like this to be considered as a -stable candidate.
It's not an urgent issue but it's a system call that is not working as
advertised which is weak.
The following test program demonstrates the problem. It should never
report that pages are still resident but will without this patch. It
assumes that CPU 0 and 1 exist.
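/*
 * The headers and FILESIZE_PAGES are missing from the text above; the
 * lines below are assumptions added so that the program builds.
 */
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#define FILESIZE_PAGES 5 /* assumed value, below the 14 pages mentioned above */
int main() {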
int fd;
int pagesize = getpagesize();
ssize_t written = 0, expected;
char *buf;
unsigned char *vec;
int resident, i;
cpu_set_t set;
/* Prepare a buffer for writing */
expected = FILESIZE_PAGES * pagesize;
buf = malloc(expected + 1);
if (buf == NULL) {
printf("ENOMEM\n");
exit(EXIT_FAILURE);
}
buf[expected] = 0;
memset(buf, 'a', expected);
/* Prepare the mincore vec */
vec = malloc(FILESIZE_PAGES);
if (vec == NULL) {
printf("ENOMEM\n");
exit(EXIT_FAILURE);
}
/* Bind ourselves to CPU 0 */
CPU_ZERO(&set);
CPU_SET(0, &set);
if (sched_setaffinity(getpid(), sizeof(set), &set) == -1) {
perror("sched_setaffinity");
exit(EXIT_FAILURE);
}
/* open file, unlink and write buffer */
fd = open("fadvise-test-file", O_CREAT|O_EXCL|O_RDWR, 0600);
if (fd == -1) {
perror("open");
exit(EXIT_FAILURE);
}
unlink("fadvise-test-file");
while (written < expected) {
ssize_t this_write;
this_write = write(fd, buf + written, expected - written);
if (this_write == -1) {
perror("write");
exit(EXIT_FAILURE);
}
written += this_write;
}
free(buf);
/*
* Force ourselves to another CPU. If fadvise only flushes the local
* CPU's pagevecs then the fadvise will fail to discard all file pages
*/
CPU_ZERO(&set);
CPU_SET(1, &set);
if (sched_setaffinity(getpid(), sizeof(set), &set) == -1) {
perror("sched_setaffinity");
exit(EXIT_FAILURE);
}
/* sync and fadvise to discard the page cache */
fsync(fd);
/* posix_fadvise() returns an error number on failure, not -1 */
if ((errno = posix_fadvise(fd, 0, expected, POSIX_FADV_DONTNEED)) != 0) {
perror("posix_fadvise");
exit(EXIT_FAILURE);
}
/* map the file and use mincore to see which parts of it are resident */
buf = mmap(NULL, expected, PROT_READ, MAP_SHARED, fd, 0);
if (buf == MAP_FAILED) {
perror("mmap");
exit(EXIT_FAILURE);
}
if (mincore(buf, expected, vec) == -1) {
perror("mincore");
exit(EXIT_FAILURE);
}
/* Check residency */
for (i = 0, resident = 0; i < FILESIZE_PAGES; i++) {
if (vec[i])
resident++;
}
if (resident != 0) {
printf("Nr unexpected pages resident: %d\n", resident);
exit(EXIT_FAILURE);
}
munmap(buf, expected);
close(fd);
free(vec);
exit(EXIT_SUCCESS);
}
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reported-by: Rob van der Heij <rvdheij@gmail.com>
Tested-by: Rob van der Heij <rvdheij@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cliff Wickman [Sat, 23 Feb 2013 00:35:58 +0000 (16:35 -0800)]
mm: export mmu notifier invalidates
We at SGI have a need to address some very high physical address ranges
with our GRU (global reference unit), sometimes across partitioned
machine boundaries and sometimes with larger addresses than the cpu
supports. We do this with the aid of our own 'extended vma' module
which mimics the vma. When something (either unmap or exit) frees an
'extended vma' we use the mmu notifiers to clean them up.
We had been able to mimic the functions
__mmu_notifier_invalidate_range_start() and
__mmu_notifier_invalidate_range_end() by locking the per-mm lock and
walking the per-mm notifier list. But with the change to a global srcu
lock (static in mmu_notifier.c) we can no longer do that. Our module has
no access to that lock.
So we request that these two functions be exported.
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Acked-by: Robin Holt <holt@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michel Lespinasse [Sat, 23 Feb 2013 00:35:56 +0000 (16:35 -0800)]
mm: accelerate mm_populate() treatment of THP pages
This change adds a follow_page_mask function which is equivalent to
follow_page, but with an extra page_mask argument.
follow_page_mask sets *page_mask to HPAGE_PMD_NR - 1 when it encounters
a THP page, and to 0 in other cases.
__get_user_pages() makes use of this in order to accelerate populating
THP ranges - that is, when both the pages and vmas arrays are NULL, we
don't need to iterate HPAGE_PMD_NR times to cover a single THP page (and
we also avoid taking mm->page_table_lock that many times).
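A sketch of how a page_mask can turn into a larger stride when walking a range. The constants assume 4 KiB base pages and 2 MiB huge pages, and the helper names are hypothetical; the stride expression is one way to reach the end of the current huge page in a single step and should not be read as a quote of the patch.

```c
#define PAGE_SHIFT	12	/* assumes 4 KiB base pages */
#define HPAGE_PMD_NR	512	/* assumes 2 MiB huge pages */

/*
 * Number of base pages covered in one step, given the mask reported
 * for the page at addr (HPAGE_PMD_NR - 1 for a THP page, 0 otherwise).
 */
unsigned long page_increment(unsigned long addr, unsigned long page_mask)
{
	return 1 + (~(addr >> PAGE_SHIFT) & page_mask);
}

/* Count the loop iterations needed to walk nr_pages starting at addr. */
unsigned long count_steps(unsigned long addr, unsigned long nr_pages,
			  unsigned long page_mask)
{
	unsigned long steps = 0;

	while (nr_pages) {
		unsigned long inc = page_increment(addr, page_mask);

		if (inc > nr_pages)
			inc = nr_pages;
		addr += inc << PAGE_SHIFT;
		nr_pages -= inc;
		steps++;
	}
	return steps;
}
```

With a mask of 0 the walk takes one iteration per base page; with a mask of HPAGE_PMD_NR - 1 a huge-page-aligned range of 1024 pages is covered in two iterations.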
Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michel Lespinasse [Sat, 23 Feb 2013 00:35:55 +0000 (16:35 -0800)]
mm: use long type for page counts in mm_populate() and get_user_pages()
Use long type for page counts in mm_populate() so as to avoid integer
overflow when running the following test code:
#include <stdio.h>
#include <sys/mman.h>
int main(void) {
void *p = mmap(NULL, 0x100000000000, PROT_READ,
MAP_PRIVATE | MAP_ANON, -1, 0);
printf("p: %p\n", p);
mlockall(MCL_CURRENT);
printf("done\n");
return 0;
}
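The arithmetic behind the overflow can be sketched as follows, assuming 4 KiB pages. The helper names are hypothetical, and an unsigned 32-bit type is used for the narrow case so that the truncation is well defined in C; a signed int counter loses the same high bits.

```c
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumes 4 KiB pages */

/* Page count kept in a 64-bit type: no information is lost. */
uint64_t nr_pages_wide(uint64_t len)
{
	return len >> PAGE_SHIFT;
}

/* The same count squeezed into 32 bits, as a narrow counter holds it. */
uint32_t nr_pages_narrow(uint64_t len)
{
	return (uint32_t)(len >> PAGE_SHIFT);
}
```

The 0x100000000000-byte mapping in the test program is 2^44 bytes, or 2^32 pages, which is exactly one more than a 32-bit counter can represent, so the narrow count wraps to zero.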
Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Zhang Yanfei [Sat, 23 Feb 2013 00:35:54 +0000 (16:35 -0800)]
mm: accurately document nr_free_*_pages functions with code comments
nr_free_zone_pages(), nr_free_buffer_pages() and nr_free_pagecache_pages()
are horribly badly named, so accurately document them with code comments
to prevent their misuse.
[akpm@linux-foundation.org: tweak comments]
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Naoya Horiguchi [Sat, 23 Feb 2013 00:35:53 +0000 (16:35 -0800)]
HWPOISON: change order of error_states[]'s elements
error_states[] has two separate states "unevictable LRU page" and
"mlocked LRU page", and the former one has the higher priority now. As a
result, the latter one is rarely chosen, because pages with
PageMlocked very likely have PG_unevictable set. On the other hand,
PG_unevictable without PageMlocked is common for ramfs or SHM_LOCKed
shared memory, so reversing the priority of these two states helps us
clearly distinguish them.
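Why the order matters can be seen in a small first-match table lookup. This is a sketch under assumptions: the flag bits, the struct layout and the classify() helper are hypothetical simplifications, not the definitions from mm/memory-failure.c.

```c
#include <stddef.h>

/* Hypothetical flag bits; the real values live in kernel headers. */
#define F_LRU		(1u << 0)
#define F_UNEVICTABLE	(1u << 1)
#define F_MLOCKED	(1u << 2)

struct page_state {
	unsigned int mask;
	unsigned int res;
	const char *msg;
};

/* The more specific state comes first so that it is not shadowed. */
static const struct page_state states[] = {
	{ F_MLOCKED,     F_MLOCKED,     "mlocked LRU page" },
	{ F_UNEVICTABLE, F_UNEVICTABLE, "unevictable LRU page" },
	{ F_LRU,         F_LRU,         "LRU page" },
	{ 0,             0,             "unknown page" },	/* catch-all */
};

/* The first entry whose masked flags match wins. */
const struct page_state *classify(unsigned int flags)
{
	const struct page_state *ps;

	/* Always terminates: the catch-all entry matches any flags. */
	for (ps = states; ; ps++)
		if ((flags & ps->mask) == ps->res)
			return ps;
}
```

If the first two entries were swapped, a page with both bits set would always be reported as unevictable and the mlocked entry would almost never be reached.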
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Chen Gong <gong.chen@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Naoya Horiguchi [Sat, 23 Feb 2013 00:35:51 +0000 (16:35 -0800)]
HWPOISON: fix misjudgement of page_action() for errors on mlocked pages
memory_failure() can't handle memory errors on mlocked pages correctly,
because page_action() judges such errors as ones on "unknown pages"
instead of ones on "unevictable LRU page" or "mlocked LRU page". In
order to determine page_state, page_action() checks page flags at the
time of the judgement, but those page flags are not the same as those
just after memory_failure() is called, because memory_failure() does
unmapping of the error pages before doing page_action(). This unmapping
changes the page state, especially page_remove_rmap() (called from
try_to_unmap_one()) clears PG_mlocked, so page_action() can't catch
mlocked pages after that.
With this patch, we store the page flag of the error page before doing
unmap, and (only) if the first check with page flags at the time decided
the error page is unknown, we do the second check with the stored page
flag. This implementation doesn't change error handling for the page
types for which the first check can determine the page state correctly.
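The two-step check can be sketched like this, with a single hypothetical flag bit and a two-value result standing in for the real page states. The names lookup and classify_after_unmap are illustrative only.

```c
#define F_MLOCKED (1u << 0)	/* hypothetical flag bit */

enum state { STATE_UNKNOWN, STATE_MLOCKED };

enum state lookup(unsigned int flags)
{
	return (flags & F_MLOCKED) ? STATE_MLOCKED : STATE_UNKNOWN;
}

/*
 * saved_flags: page flags recorded before unmapping.
 * current_flags: page flags after unmapping, when some bits may
 * already have been cleared.
 */
enum state classify_after_unmap(unsigned int saved_flags,
				unsigned int current_flags)
{
	enum state s = lookup(current_flags);

	/* Fall back to the saved flags only if the first check failed. */
	if (s == STATE_UNKNOWN)
		s = lookup(saved_flags);
	return s;
}
```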
[akpm@linux-foundation.org: tweak comments]
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Chen Gong <gong.chen@linux.intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Sat, 23 Feb 2013 00:35:50 +0000 (16:35 -0800)]
memcg: stop warning on memcg_propagate_kmem
Whilst I run the risk of a flogging for disloyalty to the Lord of Sealand,
I do have CONFIG_MEMCG=y CONFIG_MEMCG_KMEM not set, and grow tired of the
"mm/memcontrol.c:4972:12: warning: `memcg_propagate_kmem' defined but not
used [-Wunused-function]" seen in 3.8-rc: move the #ifdef outwards.
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Glauber Costa <glommer@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Zhang Yanfei [Sat, 23 Feb 2013 00:35:49 +0000 (16:35 -0800)]
net: change type of virtio_chan->p9_max_pages
This member of struct virtio_chan is calculated from nr_free_buffer_pages
so change its type to unsigned long in case of overflow.
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Zhang Yanfei [Sat, 23 Feb 2013 00:35:48 +0000 (16:35 -0800)]
vmscan: change type of vm_total_pages to unsigned long
This variable is calculated from nr_free_pagecache_pages so
change its type to unsigned long.
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Zhang Yanfei [Sat, 23 Feb 2013 00:35:47 +0000 (16:35 -0800)]
fs/nfsd: change type of max_delegations, nfsd_drc_max_mem and nfsd_drc_mem_used
The three variables are calculated from nr_free_buffer_pages so change
their types to unsigned long in case of overflow.
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Zhang Yanfei [Sat, 23 Feb 2013 00:35:46 +0000 (16:35 -0800)]
fs/buffer.c: change type of max_buffer_heads to unsigned long
max_buffer_heads is calculated from nr_free_buffer_pages(), so change
its type to unsigned long in case of overflow.
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Zhang Yanfei [Sat, 23 Feb 2013 00:35:45 +0000 (16:35 -0800)]
ia64: use %ld to print pages calculated in nr_free_buffer_pages
Now the function nr_free_buffer_pages returns unsigned long, so use %ld
to print its return value.
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Zhang Yanfei [Sat, 23 Feb 2013 00:35:43 +0000 (16:35 -0800)]
mm: fix return type for functions nr_free_*_pages
Currently, the amount of RAM that functions nr_free_*_pages return is
held in unsigned int. But in machines with big memory (exceeding 16TB),
the amount may be incorrect because of overflow, so fix it.
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Simon Horman <horms@verge.net.au>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Sat, 23 Feb 2013 00:35:41 +0000 (16:35 -0800)]
memcg: cleanup mem_cgroup_init comment
We should encourage all memcg controller initialization independent of a
specific mem_cgroup to be done here rather than exploit css_alloc
callback and assume that nothing happens before root cgroup is created.
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <htejun@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Sat, 23 Feb 2013 00:35:40 +0000 (16:35 -0800)]
memcg: move memcg_stock initialization to mem_cgroup_init
memcg_stock are currently initialized during the root cgroup allocation
which is OK but it pointlessly pollutes memcg allocation code with
something that can be called when the memcg subsystem is initialized by
mem_cgroup_init along with other controller specific parts.
This patch wraps the current memcg_stock initialization code into a
helper and calls it from the controller subsystem initialization code.
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <htejun@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Sat, 23 Feb 2013 00:35:39 +0000 (16:35 -0800)]
memcg: move mem_cgroup_soft_limit_tree_init to mem_cgroup_init
Per-node-zone soft limit tree is currently initialized when the root
cgroup is created which is OK but it pointlessly pollutes memcg
allocation code with something that can be called when the memcg
subsystem is initialized by mem_cgroup_init along with other controller
specific parts.
While we are at it, let's make mem_cgroup_soft_limit_tree_init void:
it doesn't make much sense to report an allocation failure, because if
we fail to allocate memory that early during the boot then we are
screwed anyway (this saves some code).
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <htejun@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Minchan Kim [Sat, 23 Feb 2013 00:35:37 +0000 (16:35 -0800)]
mm: use up free swap space before reaching OOM kill
Recently, Luigi reported that there is lots of free swap space left when
OOM happens. It is easily reproduced on zram-over-swap, where many
instances of memory hogs are running and laptop_mode is enabled. He said
there was no problem when he disabled laptop_mode. My investigation
found the following.
Assumption for ease of explanation: there are no page cache pages in the
system because they have all been reclaimed already.
1. try_to_free_pages disables may_writepage when laptop_mode is enabled.
2. shrink_inactive_list isolates victim pages from the inactive anon lru list.
3. shrink_page_list adds them to swapcache via add_to_swap but doesn't
page them out because sc->may_writepage is 0, so the pages are rotated back
into the inactive anon lru list. add_to_swap made the pages dirty via
SetPageDirty.
4. Step 3 couldn't reclaim any pages, so do_try_to_free_pages increases the
priority and retries reclaim with higher priority.
5. shrink_inactive_list tries to isolate victim pages from the inactive anon
lru list but fails, because it isolates pages in ISOLATE_CLEAN mode and the
inactive anon lru list is full of dirty pages from step 3, so it just returns
without any reclaim progress.
6. do_try_to_free_pages doesn't set may_writepage because total_scanned is
zero: sc->nr_scanned is increased by shrink_page_list, but shrink_page_list
is not called in step 5 because no pages were isolated.
The above loop continues until OOM happens.
The problem didn't happen before [1] was merged because the old logic's
isolation in shrink_inactive_list succeeded and shrink_page_list was
called to page them out, although it still failed to page out because of
may_writepage. But the important point is that sc->nr_scanned was
increased even though we couldn't swap them out, so do_try_to_free_pages
could set may_writepage.
Since commit f80c0673610e ("mm: zone_reclaim: make isolate_lru_page()
filter-aware") was introduced, it is no longer a good idea to depend
only on the number of scanned pages for setting may_writepage. So this
patch adds a new trigger point for setting may_writepage: a priority
below DEF_PRIORITY - 2, which is used to indicate significant memory
pressure in the VM. That is a good fit for our purpose, since losing
power saving or getting clickety is better than OOM killing.
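The effect of the new trigger can be sketched with a toy user-space model of the priority loop. This is not the kernel code: first_writepage_priority() and its parameters are hypothetical, the loop is heavily simplified, and only DEF_PRIORITY (12) matches the kernel's value. It assumes laptop_mode, so may_writepage starts out cleared.

```c
#include <assert.h>
#include <stdbool.h>

#define DEF_PRIORITY 12	/* same value as the kernel's default scan priority */

/*
 * Returns the priority at which may_writepage is first set, or -1 if the
 * loop runs out of priorities without ever setting it (the OOM case).
 * scanned_per_pass models how many pages each pass scans; in the failure
 * described above it is 0 because nothing could be isolated.
 */
int first_writepage_priority(unsigned long scanned_per_pass,
			     unsigned long writeback_threshold,
			     bool new_trigger)
{
	unsigned long total_scanned = 0;
	bool may_writepage = false;	/* laptop_mode: writeback starts disabled */
	int priority;

	assert(writeback_threshold > 0);

	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
		total_scanned += scanned_per_pass;

		/* New trigger: sustained pressure, regardless of scan counts. */
		if (new_trigger && priority < DEF_PRIORITY - 2)
			may_writepage = true;

		/* Old trigger: enough pages were scanned. */
		if (total_scanned > writeback_threshold)
			may_writepage = true;

		if (may_writepage)
			return priority;
	}
	return -1;
}
```

In this model, a run that scans nothing never enables writeback with the old trigger alone, while the new trigger enables it once the priority drops to DEF_PRIORITY - 3.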
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: Luigi Semenzato <semenzato@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Rientjes [Sat, 23 Feb 2013 00:35:36 +0000 (16:35 -0800)]
mm: use NUMA_NO_NODE
Make a sweep through mm/ and convert code that uses -1 directly to using
the more appropriate NUMA_NO_NODE.
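A minimal illustration of the kind of conversion involved, assuming a hypothetical helper pick_node() that is not part of the kernel; NUMA_NO_NODE is defined as (-1), as in the kernel.

```c
#include <assert.h>

#define NUMA_NO_NODE (-1)	/* same value the kernel uses */

/* Hypothetical helper: choose the node an allocation should come from. */
static inline int pick_node(int requested_node, int local_node)
{
	assert(local_node >= 0);

	/* Before such a sweep this would read "requested_node == -1". */
	if (requested_node == NUMA_NO_NODE)
		return local_node;
	return requested_node;
}
```

The behaviour is unchanged; only the intent of the comparison becomes visible to the reader.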
Signed-off-by: David Rientjes <rientjes@google.com>
Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Robin Holt [Sat, 23 Feb 2013 00:35:34 +0000 (16:35 -0800)]
mmu_notifier_unregister NULL Pointer deref and multiple ->release() callouts
There is a race condition between mmu_notifier_unregister() and
__mmu_notifier_release().
Assume two tasks, one calling mmu_notifier_unregister() as a result of a
filp_close() ->flush() callout (task A), and the other calling
mmu_notifier_release() from an mmput() (task B).
                A                               B
t1                                              srcu_read_lock()
t2              if (!hlist_unhashed())
t3                                              srcu_read_unlock()
t4              srcu_read_lock()
t5                                              hlist_del_init_rcu()
t6                                              synchronize_srcu()
t7              srcu_read_unlock()
t8              hlist_del_rcu()  <--- NULL pointer deref.
Additionally, the list traversal in __mmu_notifier_release() is not
protected by the mmu_notifier_mm->hlist_lock, which can result in
callouts to the ->release() notifier from both mmu_notifier_unregister()
and __mmu_notifier_release().
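Why the final unlink faults can be seen in a single-threaded user-space replay of the interleaving above. The hlist helpers below are simplified re-implementations written for this sketch (no RCU, no poisoning), and replay_race() is a hypothetical name; the assert in hlist_del() stands in for the oops.

```c
#include <assert.h>
#include <stddef.h>

struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

/* A node is "unhashed" when it is on no list, i.e. pprev is NULL. */
static inline int hlist_unhashed(const struct hlist_node *n)
{
	return !n->pprev;
}

static inline void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	struct hlist_node *first = h->first;

	n->next = first;
	if (first)
		first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Unlink and re-initialise: afterwards the node is unhashed. */
static inline void hlist_del_init(struct hlist_node *n)
{
	if (!hlist_unhashed(n)) {
		*n->pprev = n->next;
		if (n->next)
			n->next->pprev = n->pprev;
		n->next = NULL;
		n->pprev = NULL;
	}
}

/* Plain unlink: writes through n->pprev, so pprev must not be NULL. */
static inline void hlist_del(struct hlist_node *n)
{
	assert(!hlist_unhashed(n));	/* the kernel would oops here instead */
	*n->pprev = n->next;
	if (n->next)
		n->next->pprev = n->pprev;
}

/*
 * Replays the race in one thread. Returns 1 if task A's check passed
 * but the node is unhashed by the time A reaches its unlink.
 */
int replay_race(void)
{
	struct hlist_head head = { NULL };
	struct hlist_node mn = { NULL, NULL };
	int a_saw_hashed;

	hlist_add_head(&mn, &head);
	a_saw_hashed = !hlist_unhashed(&mn);	/* t2: task A's check passes */
	hlist_del_init(&mn);			/* t5: task B unlinks the node */
	/* t8: task A would now unlink a node whose pprev is NULL */
	return a_saw_hashed && hlist_unhashed(&mn);
}
```

The check at t2 and the unlink at t8 are not atomic with respect to task B, which is what leaves task A holding a stale answer.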
-stable suggestions:
The stable trees prior to 3.7.y need commits 21a92735f660 and
70400303ce0c cherry-picked in that order prior to cherry-picking this
commit. The 3.7.y tree already has those two commits.
Signed-off-by: Robin Holt <holt@sgi.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Sagi Grimberg <sagig@mellanox.co.il>
Cc: Haggai Eran <haggaie@mellanox.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cody P Schafer [Sat, 23 Feb 2013 00:35:32 +0000 (16:35 -0800)]
mm/memory_hotplug: use pgdat_end_pfn() instead of open coding the same.
Replace open coded pgdat_end_pfn() with helper function.
Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Cc: David Hansen <dave@linux.vnet.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cody P Schafer [Sat, 23 Feb 2013 00:35:31 +0000 (16:35 -0800)]
mm/memory_hotplug: use ensure_zone_is_initialized()
Remove open coding of ensure_zone_is_initialized().
Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Cc: David Hansen <dave@linux.vnet.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cody P Schafer [Sat, 23 Feb 2013 00:35:30 +0000 (16:35 -0800)]
mm: add helper ensure_zone_is_initialized()
ensure_zone_is_initialized() checks if a zone is in an empty & not
initialized state (typically occurring after it is created in memory
hotplugging), and, if so, calls init_currently_empty_zone() to
initialize the zone.
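The shape of such a helper can be sketched in user space as follows. struct zone_sketch and both functions are hypothetical stand-ins: the real struct zone and init_currently_empty_zone() carry far more state, and the "initialized" flag here is an assumption made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for struct zone. */
struct zone_sketch {
	unsigned long zone_start_pfn;
	unsigned long spanned_pages;
	bool initialized;
};

/* Stand-in for init_currently_empty_zone(); always succeeds here. */
static inline int init_empty_zone_sketch(struct zone_sketch *zone,
					 unsigned long start_pfn,
					 unsigned long nr_pages)
{
	(void)nr_pages;		/* not needed by this reduced model */
	zone->zone_start_pfn = start_pfn;
	zone->initialized = true;
	return 0;
}

/* Initialise the zone only if it is both empty and not yet initialised. */
int ensure_zone_is_initialized_sketch(struct zone_sketch *zone,
				      unsigned long start_pfn,
				      unsigned long nr_pages)
{
	assert(zone != NULL);

	if (zone->spanned_pages == 0 && !zone->initialized)
		return init_empty_zone_sketch(zone, start_pfn, nr_pages);
	return 0;
}
```

Calling the helper a second time is a no-op, which is what lets callers use it unconditionally.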
Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Cc: David Hansen <dave@linux.vnet.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cody P Schafer [Sat, 23 Feb 2013 00:35:28 +0000 (16:35 -0800)]
mm/page_alloc: add informative debugging message in page_outside_zone_boundaries()
Add a debug message which prints when a page is found outside of the
boundaries of the zone it should belong to. Format is:
"page $pfn outside zone [ $start_pfn - $end_pfn ]"
[akpm@linux-foundation.org: s/pr_debug/pr_err/]
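For reference, the message format rendered with plain snprintf() in user space. format_outside_zone() is a hypothetical helper for this sketch; the kernel prints the message through its own logging macros, not through this function.

```c
#include <assert.h>
#include <stdio.h>

/* Formats the message into buf and returns snprintf()'s result. */
int format_outside_zone(char *buf, size_t len, unsigned long pfn,
			unsigned long start_pfn, unsigned long end_pfn)
{
	assert(buf != NULL && len > 0);

	return snprintf(buf, len, "page %lu outside zone [ %lu - %lu ]",
			pfn, start_pfn, end_pfn);
}
```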
Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Cc: David Hansen <dave@linux.vnet.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cody P Schafer [Sat, 23 Feb 2013 00:35:27 +0000 (16:35 -0800)]
mmzone: add pgdat_{end_pfn,is_empty}() helpers & consolidate.
Add pgdat_end_pfn() and pgdat_is_empty() helpers which match the similar
zone_*() functions.
Change node_end_pfn() to be a wrapper of pgdat_end_pfn().
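A user-space sketch of how the helpers relate to each other. The struct, the node table and the _sketch function names are hypothetical; in particular, treating "empty" as "spans no pages" is an assumption of this sketch rather than something the text above states.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_NODES 2	/* illustrative node count */

/* Hypothetical, reduced stand-in for pg_data_t. */
struct pgdat_sketch {
	unsigned long node_start_pfn;
	unsigned long node_spanned_pages;
};

/* Stand-in for the kernel's per-node data lookup. */
static struct pgdat_sketch sketch_nodes[SKETCH_MAX_NODES] = {
	{ .node_start_pfn = 0,    .node_spanned_pages = 4096 },
	{ .node_start_pfn = 4096, .node_spanned_pages = 0 },
};

/* First PFN past the end of the node. */
static inline unsigned long pgdat_end_pfn_sketch(const struct pgdat_sketch *pgdat)
{
	return pgdat->node_start_pfn + pgdat->node_spanned_pages;
}

/* Assumption: a node is empty when it spans no pages. */
static inline bool pgdat_is_empty_sketch(const struct pgdat_sketch *pgdat)
{
	return pgdat->node_spanned_pages == 0;
}

/* node_end_pfn() reduced to a thin wrapper around the pgdat helper. */
static inline unsigned long node_end_pfn_sketch(int nid)
{
	assert(nid >= 0 && nid < SKETCH_MAX_NODES);
	return pgdat_end_pfn_sketch(&sketch_nodes[nid]);
}
```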
Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Cc: David Hansen <dave@linux.vnet.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cody P Schafer [Sat, 23 Feb 2013 00:35:25 +0000 (16:35 -0800)]
mm/page_alloc: add a VM_BUG in __free_one_page() if the zone is uninitialized.
Freeing pages to uninitialized zones is not handled by
__free_one_page(), and should never happen when the code is correct.
Ran into this while writing some code that dynamically onlines extra
zones.
Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Cc: David Hansen <dave@linux.vnet.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cody P Schafer [Sat, 23 Feb 2013 00:35:24 +0000 (16:35 -0800)]
mm: add zone_is_empty() and zone_is_initialized()
Factoring out these 2 checks makes it more clear what we are actually
checking for.
Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Cc: David Hansen <dave@linux.vnet.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cody P Schafer [Sat, 23 Feb 2013 00:35:23 +0000 (16:35 -0800)]
mm: add & use zone_end_pfn() and zone_spans_pfn()
Add 2 helpers (zone_end_pfn() and zone_spans_pfn()) to reduce code
duplication.
This also switches to using them in compaction (where an additional
variable needed to be renamed), page_alloc, vmstat, memory_hotplug, and
kmemleak.
Note that in compaction.c I avoid calling zone_end_pfn() repeatedly
because I expect at some point the synchronization issues with start_pfn &
spanned_pages will need fixing, either by actually using the seqlock or
clever memory barrier usage.
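The two helpers and the "read the end once" pattern can be sketched in user space like this. struct zone_span and the function names are hypothetical reductions; the half-open range [start, end) is the assumption this sketch makes about what "spans" means.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced stand-in for the span fields of struct zone. */
struct zone_span {
	unsigned long zone_start_pfn;
	unsigned long spanned_pages;
};

/* First PFN past the end of the zone. */
static inline unsigned long zone_end_pfn_sketch(const struct zone_span *zone)
{
	return zone->zone_start_pfn + zone->spanned_pages;
}

/* True if pfn lies in [zone_start_pfn, zone_end_pfn). */
static inline bool zone_spans_pfn_sketch(const struct zone_span *zone,
					 unsigned long pfn)
{
	return zone->zone_start_pfn <= pfn && pfn < zone_end_pfn_sketch(zone);
}

/*
 * Counts the PFNs in [lo, hi) that fall inside the zone. The end PFN is
 * read once up front instead of being recomputed on every iteration.
 */
unsigned long count_spanned_pfns(const struct zone_span *zone,
				 unsigned long lo, unsigned long hi)
{
	const unsigned long end_pfn = zone_end_pfn_sketch(zone);
	unsigned long pfn, count = 0;

	assert(lo <= hi);

	for (pfn = lo; pfn < hi; pfn++)
		if (pfn >= zone->zone_start_pfn && pfn < end_pfn)
			count++;
	return count;
}
```

Reading the end once keeps a loop internally consistent even if the span fields were to change underneath it, which is the concern the note about compaction.c raises.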
Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Cc: David Hansen <dave@linux.vnet.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cody P Schafer [Sat, 23 Feb 2013 00:35:21 +0000 (16:35 -0800)]
mm: add SECTION_IN_PAGE_FLAGS
Instead of directly utilizing a combination of config options to determine
this, add a macro to specifically address it.
Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Cc: David Hansen <dave@linux.vnet.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Sat, 23 Feb 2013 00:35:20 +0000 (16:35 -0800)]
mm/mlock.c: document scary-looking stack expansion mlock chain
The fact that mlock calls get_user_pages, and get_user_pages might call
mlock when expanding a stack looks like a potential recursion.
However, mlock makes sure the requested range is already contained
within a vma, so no stack expansion will actually happen from mlock.
Should this ever change: the stack expansion mlocks only the newly
expanded range and so will not result in recursive expansion.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Hugh Dickins <hughd@google.com>
Acked-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Sat, 23 Feb 2013 00:35:19 +0000 (16:35 -0800)]
mm: refactor inactive_file_is_low() to use get_lru_size()
An inactive file list is considered low when its active counterpart is
bigger, regardless of whether it is a global zone LRU list or a memcg
zone LRU list. The only difference is in how the LRU size is assessed.
get_lru_size() does the right thing for both global and memcg reclaim
situations.
Get rid of inactive_file_is_low_global() and
mem_cgroup_inactive_file_is_low() by using get_lru_size() and compare
the numbers in common code.
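The resulting common-code comparison can be sketched as below. The types and the _sketch names are hypothetical; in the kernel, get_lru_size() is what hides the difference between global and memcg reclaim, which this toy version does not model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum lru_list_sketch { SK_INACTIVE_FILE, SK_ACTIVE_FILE, SK_NR_LISTS };

/* Hypothetical, reduced stand-in for struct lruvec. */
struct lruvec_sketch {
	unsigned long nr_pages[SK_NR_LISTS];
};

/* Stand-in for get_lru_size(): one accessor for every kind of reclaim. */
static inline unsigned long get_lru_size_sketch(const struct lruvec_sketch *lruvec,
						enum lru_list_sketch lru)
{
	assert(lruvec != NULL && lru < SK_NR_LISTS);
	return lruvec->nr_pages[lru];
}

/* The inactive file list is low when the active list is bigger. */
static inline bool inactive_file_is_low_sketch(const struct lruvec_sketch *lruvec)
{
	unsigned long inactive = get_lru_size_sketch(lruvec, SK_INACTIVE_FILE);
	unsigned long active = get_lru_size_sketch(lruvec, SK_ACTIVE_FILE);

	return active > inactive;
}
```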
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Sat, 23 Feb 2013 00:35:17 +0000 (16:35 -0800)]
mm: shmem: use new radix tree iterator
In shmem_find_get_pages_and_swap(), use the faster radix tree iterator
construct from commit 78c1d78488a3 ("radix-tree: introduce bit-optimized
iterator").
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Sat, 23 Feb 2013 00:35:16 +0000 (16:35 -0800)]
ksm: stop hotremove lockdep warning
Complaints are rare, but lockdep still does not understand the way
ksm_memory_callback(MEM_GOING_OFFLINE) takes ksm_thread_mutex, and holds
it until the ksm_memory_callback(MEM_OFFLINE): that appears to be a
problem because notifier callbacks are made under down_read of
blocking_notifier_head->rwsem (so first the mutex is taken while holding
the rwsem, then later the rwsem is taken while still holding the mutex);
but is not in fact a problem because mem_hotplug_mutex is held
throughout the dance.
There was an attempt to fix this with mutex_lock_nested(); but if that
happened to fool lockdep two years ago, apparently it does so no longer.
I had hoped to eradicate this issue in extending KSM page migration not
to need the ksm_thread_mutex. But then realized that although the page
migration itself is safe, we do still need to lock out ksmd and other
users of get_ksm_page() while offlining memory - at some point between
MEM_GOING_OFFLINE and MEM_OFFLINE, the struct pages themselves may
vanish, and get_ksm_page()'s accesses to them become a violation.
So, give up on holding ksm_thread_mutex itself from MEM_GOING_OFFLINE to
MEM_OFFLINE, and add a KSM_RUN_OFFLINE flag, and wait_while_offlining()
checks, to achieve the same lockout without being caught by lockdep.
This is less elegant for KSM, but it's more important to keep lockdep
useful to other users - and I apologize for how long it took to fix.
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Tested-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Petr Holasek <pholasek@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Izik Eidus <izik.eidus@ravellosystems.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Sat, 23 Feb 2013 00:35:14 +0000 (16:35 -0800)]
mm: remove offlining arg to migrate_pages
No functional change, but the only purpose of the offlining argument to
migrate_pages() etc, was to ensure that __unmap_and_move() could migrate a
KSM page for memory hotremove (which took ksm_thread_mutex) but not for
other callers. Now all cases are safe, remove the arg.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Petr Holasek <pholasek@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Izik Eidus <izik.eidus@ravellosystems.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Sat, 23 Feb 2013 00:35:13 +0000 (16:35 -0800)]
ksm: enable KSM page migration
Migration of KSM pages is now safe: remove the PageKsm restrictions from
mempolicy.c and migrate.c.
But keep PageKsm out of __unmap_and_move()'s anon_vma contortions, which
are irrelevant to KSM: it looks as if that code was preventing hotremove
migration of KSM pages, unless they happened to be in swapcache.
There is some question as to whether enforcing a NUMA mempolicy migration
ought to migrate KSM pages, mapped into entirely unrelated processes; but
moving page_mapcount > 1 is only permitted with MPOL_MF_MOVE_ALL anyway,
and it seems reasonable to assume that you wouldn't set MADV_MERGEABLE on
any area where this is a worry.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Petr Holasek <pholasek@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Izik Eidus <izik.eidus@ravellosystems.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>