Dan Streetman [Wed, 22 Jul 2015 18:26:36 +0000 (14:26 -0400)]
crypto: nx - merge nx-compress and nx-compress-crypto
Merge the nx-842.c code into nx-842-crypto.c.
This allows later patches to remove the 'platform' driver, and instead
allow each platform driver to directly register with the crypto
compression api.
Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Dan Streetman [Wed, 22 Jul 2015 18:26:35 +0000 (14:26 -0400)]
crypto: nx - use common code for both NX decompress success cases
Replace the duplicated finishing code (set destination buffer length and
set return code to 0) in the case of decompressing a buffer with no header
with a goto to the success case of decompressing a buffer with a header.
This is a trivial change that allows both success cases to use common code,
and includes the pr_debug() msg in both cases as well.
Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Dan Streetman [Wed, 22 Jul 2015 18:26:34 +0000 (14:26 -0400)]
crypto: nx - don't register pSeries driver if ENODEV
Don't register the pSeries driver when parsing the device tree returns
ENODEV.
The nx842_probe() function in the pSeries driver returns error instead
of registering as a crypto compression driver, when it receives an
error return value from the nx842_OF_upd() function that probes the
device tree nodes, except when ENODEV is returned. However ENODEV
should not be a special case and the driver should not register when
there is no hw device, or the hw device is disabled.
Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Dan Streetman [Wed, 22 Jul 2015 18:26:33 +0000 (14:26 -0400)]
crypto: nx - move kzalloc() out of spinlock
Move the kzalloc() calls in nx842_probe() and nx842_OF_upd() to the top
of the functions, before taking the devdata spinlock.
Since kzalloc() without GFP_ATOMIC can sleep, it can't be called while
holding a spinlock. Move the calls to before taking the lock.
Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Dan Streetman [Wed, 22 Jul 2015 18:26:32 +0000 (14:26 -0400)]
crypto: nx - remove pSeries NX 'status' field
Remove the 'status' field from the pSeries NX driver data.
The 'status' field isn't used by the driver at all; it simply checks the
devicetree status node at initialization, and returns success if 'okay'
and failure otherwise.
Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Dan Streetman [Wed, 22 Jul 2015 18:26:31 +0000 (14:26 -0400)]
crypto: nx - remove __init/__exit from VIO functions
Remove the __init and __exit modifiers from the VIO driver probe and
remove functions.
The driver functions should not be marked __init/__exit because they
can/will be called during runtime, not only at module init and exit.
Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tadeusz Struk [Wed, 22 Jul 2015 05:07:47 +0000 (22:07 -0700)]
crypto: qat - Don't attempt to register algorithm multiple times
When multiple devices are present in the system the driver attempts
to register the same algorithm many times.
Changes in v2:
- use proper synchronization mechanizm between register and unregister
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tadeusz Struk [Tue, 21 Jul 2015 00:18:27 +0000 (17:18 -0700)]
crypto: qat - fix invalid check for RSA keylen in fips mode
The condition checking allowed key length was invalid.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tadeusz Struk [Tue, 21 Jul 2015 00:18:26 +0000 (17:18 -0700)]
crypto: rsa - fix invalid check for keylen in fips mode
The condition checking allowed key length was invalid.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
LABBE Corentin [Fri, 17 Jul 2015 14:39:42 +0000 (16:39 +0200)]
MAINTAINERS: Add myself as maintainer of Allwinner Security System
Signed-off-by: LABBE Corentin <clabbe.montjoie@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
LABBE Corentin [Fri, 17 Jul 2015 14:39:41 +0000 (16:39 +0200)]
crypto: sunxi-ss - Add Allwinner Security System crypto accelerator
Add support for the Security System included in Allwinner SoC A20.
The Security System is a hardware cryptographic accelerator that support:
- MD5 and SHA1 hash algorithms
- AES block cipher in CBC/ECB mode with 128/196/256bits keys.
- DES and 3DES block cipher in CBC/ECB mode
Signed-off-by: LABBE Corentin <clabbe.montjoie@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
LABBE Corentin [Fri, 17 Jul 2015 14:39:40 +0000 (16:39 +0200)]
ARM: sun4i: dt: Add DT bindings documentation for SUN4I Security System
This patch adds documentation for Device-Tree bindings for the
Security System cryptographic accelerator driver.
Signed-off-by: LABBE Corentin <clabbe.montjoie@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
LABBE Corentin [Fri, 17 Jul 2015 14:39:39 +0000 (16:39 +0200)]
ARM: sun7i: dt: Add Security System to A20 SoC DTS
The Security System is a hardware cryptographic accelerator that support
AES/MD5/SHA1/DES/3DES/PRNG algorithms.
It could be found on many Allwinner SoC.
This patch enable the Security System on the Allwinner A20 SoC Device-tree.
Signed-off-by: LABBE Corentin <clabbe.montjoie@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
LABBE Corentin [Fri, 17 Jul 2015 14:39:38 +0000 (16:39 +0200)]
ARM: sun4i: dt: Add Security System to A10 SoC DTS
The Security System is a hardware cryptographic accelerator that support
AES/MD5/SHA1/DES/3DES/PRNG algorithms.
It could be found on many Allwinner SoC.
This patch enable the Security System on the Allwinner A10 SoC Device-tree.
Signed-off-by: LABBE Corentin <clabbe.montjoie@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tudor Ambarus [Fri, 17 Jul 2015 13:54:54 +0000 (16:54 +0300)]
crypto: caam - fix warning in APPEND_MATH_IMM_u64
An implicit truncation is done when using a variable of 64 bits
in MATH command:
warning: large integer implicitly truncated to unsigned type [-Woverflow]
Silence the compiler by feeding it with an explicit truncated value.
Signed-off-by: Tudor Ambarus <tudor.ambarus@freescale.com>
Signed-off-by: Horia Geant? <horia.geanta@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Horia Geant? [Fri, 17 Jul 2015 13:54:53 +0000 (16:54 +0300)]
crypto: caam - fix RNG init descriptor ret. code checking
When successful, the descriptor that performs RNG initialization
is allowed to return a status code of 7000_0000h, since last command
in the descriptor is a JUMP HALT.
Signed-off-by: Horia Geant? <horia.geanta@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Horia Geant? [Fri, 17 Jul 2015 13:54:52 +0000 (16:54 +0300)]
crypto: caam - fix snooping for write transactions
HW coherency won't work properly for CAAM write transactions
if AWCACHE is left to default (POR) value - 4'b0001.
It has to be programmed to 4'b0010, i.e. AXI3 Cacheable bit set.
For platforms that have HW coherency support:
-PPC-based: the update has no effect; CAAM coherency already works
due to the IOMMU (PAMU) driver setting the correct memory coherency
attributes
-ARM-based: the update fixes cache coherency issues,
since IOMMU (SMMU) driver is not programmed to behave similar to PAMU
Signed-off-by: Horia Geant? <horia.geanta@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Alex Porosanu [Fri, 17 Jul 2015 13:54:51 +0000 (16:54 +0300)]
crypto: caam - fix ERA property reading
In order to ensure that the ERA property is properly read from DT
on all platforms, of_property_read* function needs to be used.
Signed-off-by: Alex Porosanu <alexandru.porosanu@freescale.com>
Signed-off-by: Horia Geant? <horia.geanta@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Martin Willi [Thu, 16 Jul 2015 17:14:08 +0000 (19:14 +0200)]
crypto: poly1305 - Add a four block AVX2 variant for x86_64
Extends the x86_64 Poly1305 authenticator by a function processing four
consecutive Poly1305 blocks in parallel using AVX2 instructions.
For large messages, throughput increases by ~15-45% compared to two
block SSE2:
testing speed of poly1305 (poly1305-simd)
test 0 ( 96 byte blocks, 16 bytes per update, 6 updates):
3809514 opers/sec,
365713411 bytes/sec
test 1 ( 96 byte blocks, 32 bytes per update, 3 updates):
5973423 opers/sec,
573448627 bytes/sec
test 2 ( 96 byte blocks, 96 bytes per update, 1 updates):
9446779 opers/sec,
906890803 bytes/sec
test 3 ( 288 byte blocks, 16 bytes per update, 18 updates):
1364814 opers/sec,
393066691 bytes/sec
test 4 ( 288 byte blocks, 32 bytes per update, 9 updates):
2045780 opers/sec,
589184697 bytes/sec
test 5 ( 288 byte blocks, 288 bytes per update, 1 updates):
3711946 opers/sec,
1069040592 bytes/sec
test 6 ( 1056 byte blocks, 32 bytes per update, 33 updates): 573686 opers/sec,
605812732 bytes/sec
test 7 ( 1056 byte blocks, 1056 bytes per update, 1 updates):
1647802 opers/sec,
1740079440 bytes/sec
test 8 ( 2080 byte blocks, 32 bytes per update, 65 updates): 292970 opers/sec,
609378224 bytes/sec
test 9 ( 2080 byte blocks, 2080 bytes per update, 1 updates): 943229 opers/sec,
1961916528 bytes/sec
test 10 ( 4128 byte blocks, 4128 bytes per update, 1 updates): 494623 opers/sec,
2041804569 bytes/sec
test 11 ( 8224 byte blocks, 8224 bytes per update, 1 updates): 254045 opers/sec,
2089271014 bytes/sec
testing speed of poly1305 (poly1305-simd)
test 0 ( 96 byte blocks, 16 bytes per update, 6 updates):
3826224 opers/sec,
367317552 bytes/sec
test 1 ( 96 byte blocks, 32 bytes per update, 3 updates):
5948638 opers/sec,
571069267 bytes/sec
test 2 ( 96 byte blocks, 96 bytes per update, 1 updates):
9439110 opers/sec,
906154627 bytes/sec
test 3 ( 288 byte blocks, 16 bytes per update, 18 updates):
1367756 opers/sec,
393913872 bytes/sec
test 4 ( 288 byte blocks, 32 bytes per update, 9 updates):
2056881 opers/sec,
592381958 bytes/sec
test 5 ( 288 byte blocks, 288 bytes per update, 1 updates):
3711153 opers/sec,
1068812179 bytes/sec
test 6 ( 1056 byte blocks, 32 bytes per update, 33 updates): 574940 opers/sec,
607136745 bytes/sec
test 7 ( 1056 byte blocks, 1056 bytes per update, 1 updates):
1948830 opers/sec,
2057964585 bytes/sec
test 8 ( 2080 byte blocks, 32 bytes per update, 65 updates): 293308 opers/sec,
610082096 bytes/sec
test 9 ( 2080 byte blocks, 2080 bytes per update, 1 updates):
1235224 opers/sec,
2569267792 bytes/sec
test 10 ( 4128 byte blocks, 4128 bytes per update, 1 updates): 684405 opers/sec,
2825226316 bytes/sec
test 11 ( 8224 byte blocks, 8224 bytes per update, 1 updates): 367101 opers/sec,
3019039446 bytes/sec
Benchmark results from a Core i5-4670T.
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Martin Willi [Thu, 16 Jul 2015 17:14:07 +0000 (19:14 +0200)]
crypto: poly1305 - Add a two block SSE2 variant for x86_64
Extends the x86_64 SSE2 Poly1305 authenticator by a function processing two
consecutive Poly1305 blocks in parallel using a derived key r^2. Loop
unrolling can be more effectively mapped to SSE instructions, further
increasing throughput.
For large messages, throughput increases by ~45-65% compared to single
block SSE2:
testing speed of poly1305 (poly1305-simd)
test 0 ( 96 byte blocks, 16 bytes per update, 6 updates):
3790063 opers/sec,
363846076 bytes/sec
test 1 ( 96 byte blocks, 32 bytes per update, 3 updates):
5913378 opers/sec,
567684355 bytes/sec
test 2 ( 96 byte blocks, 96 bytes per update, 1 updates):
9352574 opers/sec,
897847104 bytes/sec
test 3 ( 288 byte blocks, 16 bytes per update, 18 updates):
1362145 opers/sec,
392297990 bytes/sec
test 4 ( 288 byte blocks, 32 bytes per update, 9 updates):
2007075 opers/sec,
578037628 bytes/sec
test 5 ( 288 byte blocks, 288 bytes per update, 1 updates):
3709811 opers/sec,
1068425798 bytes/sec
test 6 ( 1056 byte blocks, 32 bytes per update, 33 updates): 566272 opers/sec,
597984182 bytes/sec
test 7 ( 1056 byte blocks, 1056 bytes per update, 1 updates):
1111657 opers/sec,
1173910108 bytes/sec
test 8 ( 2080 byte blocks, 32 bytes per update, 65 updates): 288857 opers/sec,
600823808 bytes/sec
test 9 ( 2080 byte blocks, 2080 bytes per update, 1 updates): 590746 opers/sec,
1228751888 bytes/sec
test 10 ( 4128 byte blocks, 4128 bytes per update, 1 updates): 301825 opers/sec,
1245936902 bytes/sec
test 11 ( 8224 byte blocks, 8224 bytes per update, 1 updates): 153075 opers/sec,
1258896201 bytes/sec
testing speed of poly1305 (poly1305-simd)
test 0 ( 96 byte blocks, 16 bytes per update, 6 updates):
3809514 opers/sec,
365713411 bytes/sec
test 1 ( 96 byte blocks, 32 bytes per update, 3 updates):
5973423 opers/sec,
573448627 bytes/sec
test 2 ( 96 byte blocks, 96 bytes per update, 1 updates):
9446779 opers/sec,
906890803 bytes/sec
test 3 ( 288 byte blocks, 16 bytes per update, 18 updates):
1364814 opers/sec,
393066691 bytes/sec
test 4 ( 288 byte blocks, 32 bytes per update, 9 updates):
2045780 opers/sec,
589184697 bytes/sec
test 5 ( 288 byte blocks, 288 bytes per update, 1 updates):
3711946 opers/sec,
1069040592 bytes/sec
test 6 ( 1056 byte blocks, 32 bytes per update, 33 updates): 573686 opers/sec,
605812732 bytes/sec
test 7 ( 1056 byte blocks, 1056 bytes per update, 1 updates):
1647802 opers/sec,
1740079440 bytes/sec
test 8 ( 2080 byte blocks, 32 bytes per update, 65 updates): 292970 opers/sec,
609378224 bytes/sec
test 9 ( 2080 byte blocks, 2080 bytes per update, 1 updates): 943229 opers/sec,
1961916528 bytes/sec
test 10 ( 4128 byte blocks, 4128 bytes per update, 1 updates): 494623 opers/sec,
2041804569 bytes/sec
test 11 ( 8224 byte blocks, 8224 bytes per update, 1 updates): 254045 opers/sec,
2089271014 bytes/sec
Benchmark results from a Core i5-4670T.
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Martin Willi [Thu, 16 Jul 2015 17:14:06 +0000 (19:14 +0200)]
crypto: poly1305 - Add a SSE2 SIMD variant for x86_64
Implements an x86_64 assembler driver for the Poly1305 authenticator. This
single block variant holds the 130-bit integer in 5 32-bit words, but uses
SSE to do two multiplications/additions in parallel.
When calling updates with small blocks, the overhead for kernel_fpu_begin/
kernel_fpu_end() negates the perfmance gain. We therefore use the
poly1305-generic fallback for small updates.
For large messages, throughput increases by ~5-10% compared to
poly1305-generic:
testing speed of poly1305 (poly1305-generic)
test 0 ( 96 byte blocks, 16 bytes per update, 6 updates):
4080026 opers/sec,
391682496 bytes/sec
test 1 ( 96 byte blocks, 32 bytes per update, 3 updates):
6221094 opers/sec,
597225024 bytes/sec
test 2 ( 96 byte blocks, 96 bytes per update, 1 updates):
9609750 opers/sec,
922536057 bytes/sec
test 3 ( 288 byte blocks, 16 bytes per update, 18 updates):
1459379 opers/sec,
420301267 bytes/sec
test 4 ( 288 byte blocks, 32 bytes per update, 9 updates):
2115179 opers/sec,
609171609 bytes/sec
test 5 ( 288 byte blocks, 288 bytes per update, 1 updates):
3729874 opers/sec,
1074203856 bytes/sec
test 6 ( 1056 byte blocks, 32 bytes per update, 33 updates): 593000 opers/sec,
626208000 bytes/sec
test 7 ( 1056 byte blocks, 1056 bytes per update, 1 updates):
1081536 opers/sec,
1142102332 bytes/sec
test 8 ( 2080 byte blocks, 32 bytes per update, 65 updates): 302077 opers/sec,
628320576 bytes/sec
test 9 ( 2080 byte blocks, 2080 bytes per update, 1 updates): 554384 opers/sec,
1153120176 bytes/sec
test 10 ( 4128 byte blocks, 4128 bytes per update, 1 updates): 278715 opers/sec,
1150536345 bytes/sec
test 11 ( 8224 byte blocks, 8224 bytes per update, 1 updates): 140202 opers/sec,
1153022070 bytes/sec
testing speed of poly1305 (poly1305-simd)
test 0 ( 96 byte blocks, 16 bytes per update, 6 updates):
3790063 opers/sec,
363846076 bytes/sec
test 1 ( 96 byte blocks, 32 bytes per update, 3 updates):
5913378 opers/sec,
567684355 bytes/sec
test 2 ( 96 byte blocks, 96 bytes per update, 1 updates):
9352574 opers/sec,
897847104 bytes/sec
test 3 ( 288 byte blocks, 16 bytes per update, 18 updates):
1362145 opers/sec,
392297990 bytes/sec
test 4 ( 288 byte blocks, 32 bytes per update, 9 updates):
2007075 opers/sec,
578037628 bytes/sec
test 5 ( 288 byte blocks, 288 bytes per update, 1 updates):
3709811 opers/sec,
1068425798 bytes/sec
test 6 ( 1056 byte blocks, 32 bytes per update, 33 updates): 566272 opers/sec,
597984182 bytes/sec
test 7 ( 1056 byte blocks, 1056 bytes per update, 1 updates):
1111657 opers/sec,
1173910108 bytes/sec
test 8 ( 2080 byte blocks, 32 bytes per update, 65 updates): 288857 opers/sec,
600823808 bytes/sec
test 9 ( 2080 byte blocks, 2080 bytes per update, 1 updates): 590746 opers/sec,
1228751888 bytes/sec
test 10 ( 4128 byte blocks, 4128 bytes per update, 1 updates): 301825 opers/sec,
1245936902 bytes/sec
test 11 ( 8224 byte blocks, 8224 bytes per update, 1 updates): 153075 opers/sec,
1258896201 bytes/sec
Benchmark results from a Core i5-4670T.
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Martin Willi [Thu, 16 Jul 2015 17:14:05 +0000 (19:14 +0200)]
crypto: poly1305 - Export common Poly1305 helpers
As architecture specific drivers need a software fallback, export Poly1305
init/update/final functions together with some helpers in a header file.
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Martin Willi [Thu, 16 Jul 2015 17:14:04 +0000 (19:14 +0200)]
crypto: testmgr - Add a longer ChaCha20 test vector
The AVX2 variant of ChaCha20 is used only for messages with >= 512 bytes
length. With the existing test vectors, the implementation could not be
tested. Due that lack of such a long official test vector, this one is
self-generated using chacha20-generic.
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Martin Willi [Thu, 16 Jul 2015 17:14:03 +0000 (19:14 +0200)]
crypto: chacha20 - Add an eight block AVX2 variant for x86_64
Extends the x86_64 ChaCha20 implementation by a function processing eight
ChaCha20 blocks in parallel using AVX2.
For large messages, throughput increases by ~55-70% compared to four block
SSSE3:
testing speed of chacha20 (chacha20-simd) encryption
test 0 (256 bit key, 16 byte blocks):
42249230 operations in 10 seconds (
675987680 bytes)
test 1 (256 bit key, 64 byte blocks):
46441641 operations in 10 seconds (
2972265024 bytes)
test 2 (256 bit key, 256 byte blocks):
33028112 operations in 10 seconds (
8455196672 bytes)
test 3 (256 bit key, 1024 byte blocks):
11568759 operations in 10 seconds (
11846409216 bytes)
test 4 (256 bit key, 8192 byte blocks):
1448761 operations in 10 seconds (
11868250112 bytes)
testing speed of chacha20 (chacha20-simd) encryption
test 0 (256 bit key, 16 byte blocks):
41999675 operations in 10 seconds (
671994800 bytes)
test 1 (256 bit key, 64 byte blocks):
45805908 operations in 10 seconds (
2931578112 bytes)
test 2 (256 bit key, 256 byte blocks):
32814947 operations in 10 seconds (
8400626432 bytes)
test 3 (256 bit key, 1024 byte blocks):
19777167 operations in 10 seconds (
20251819008 bytes)
test 4 (256 bit key, 8192 byte blocks):
2279321 operations in 10 seconds (
18672197632 bytes)
Benchmark results from a Core i5-4670T.
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Martin Willi [Thu, 16 Jul 2015 17:14:02 +0000 (19:14 +0200)]
crypto: chacha20 - Add a four block SSSE3 variant for x86_64
Extends the x86_64 SSSE3 ChaCha20 implementation by a function processing
four ChaCha20 blocks in parallel. This avoids the word shuffling needed
in the single block variant, further increasing throughput.
For large messages, throughput increases by ~110% compared to single block
SSSE3:
testing speed of chacha20 (chacha20-simd) encryption
test 0 (256 bit key, 16 byte blocks):
43141886 operations in 10 seconds (
690270176 bytes)
test 1 (256 bit key, 64 byte blocks):
46845874 operations in 10 seconds (
2998135936 bytes)
test 2 (256 bit key, 256 byte blocks):
18458512 operations in 10 seconds (
4725379072 bytes)
test 3 (256 bit key, 1024 byte blocks):
5360533 operations in 10 seconds (
5489185792 bytes)
test 4 (256 bit key, 8192 byte blocks): 692846 operations in 10 seconds (
5675794432 bytes)
testing speed of chacha20 (chacha20-simd) encryption
test 0 (256 bit key, 16 byte blocks):
42249230 operations in 10 seconds (
675987680 bytes)
test 1 (256 bit key, 64 byte blocks):
46441641 operations in 10 seconds (
2972265024 bytes)
test 2 (256 bit key, 256 byte blocks):
33028112 operations in 10 seconds (
8455196672 bytes)
test 3 (256 bit key, 1024 byte blocks):
11568759 operations in 10 seconds (
11846409216 bytes)
test 4 (256 bit key, 8192 byte blocks):
1448761 operations in 10 seconds (
11868250112 bytes)
Benchmark results from a Core i5-4670T.
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Martin Willi [Thu, 16 Jul 2015 17:14:01 +0000 (19:14 +0200)]
crypto: chacha20 - Add a SSSE3 SIMD variant for x86_64
Implements an x86_64 assembler driver for the ChaCha20 stream cipher. This
single block variant works on a single state matrix using SSE instructions.
It requires SSSE3 due the use of pshufb for efficient 8/16-bit rotate
operations.
For large messages, throughput increases by ~65% compared to
chacha20-generic:
testing speed of chacha20 (chacha20-generic) encryption
test 0 (256 bit key, 16 byte blocks):
45089207 operations in 10 seconds (
721427312 bytes)
test 1 (256 bit key, 64 byte blocks):
43839521 operations in 10 seconds (
2805729344 bytes)
test 2 (256 bit key, 256 byte blocks):
12702056 operations in 10 seconds (
3251726336 bytes)
test 3 (256 bit key, 1024 byte blocks):
3371173 operations in 10 seconds (
3452081152 bytes)
test 4 (256 bit key, 8192 byte blocks): 422468 operations in 10 seconds (
3460857856 bytes)
testing speed of chacha20 (chacha20-simd) encryption
test 0 (256 bit key, 16 byte blocks):
43141886 operations in 10 seconds (
690270176 bytes)
test 1 (256 bit key, 64 byte blocks):
46845874 operations in 10 seconds (
2998135936 bytes)
test 2 (256 bit key, 256 byte blocks):
18458512 operations in 10 seconds (
4725379072 bytes)
test 3 (256 bit key, 1024 byte blocks):
5360533 operations in 10 seconds (
5489185792 bytes)
test 4 (256 bit key, 8192 byte blocks): 692846 operations in 10 seconds (
5675794432 bytes)
Benchmark results from a Core i5-4670T.
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Martin Willi [Thu, 16 Jul 2015 17:14:00 +0000 (19:14 +0200)]
crypto: chacha20 - Export common ChaCha20 helpers
As architecture specific drivers need a software fallback, export a
ChaCha20 en-/decryption function together with some helpers in a header
file.
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Martin Willi [Thu, 16 Jul 2015 17:13:59 +0000 (19:13 +0200)]
crypto: tcrypt - Add ChaCha20/Poly1305 speed tests
Adds individual ChaCha20 and Poly1305 and a combined rfc7539esp AEAD speed
test using mode numbers 214, 321 and 213. For Poly1305 we add a specific
speed template, as it expects the key prepended to the input data.
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Thu, 16 Jul 2015 04:35:08 +0000 (12:35 +0800)]
crypto: chacha20poly1305 - Convert to new AEAD interface
This patch converts rfc7539 and rfc7539esp to the new AEAD interface.
The test vectors for rfc7539esp have also been updated to include
the IV.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Martin Willi <martin@strongswan.org>
Tadeusz Struk [Wed, 15 Jul 2015 22:28:43 +0000 (15:28 -0700)]
crypto: rsa - limit supported key lengths
Introduce constrains for RSA keys lengths.
Only key lengths of 512, 1024, 1536, 2048, 3072, and 4096 bits
will be supported.
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tadeusz Struk [Wed, 15 Jul 2015 22:28:38 +0000 (15:28 -0700)]
crypto: qat - Add support for RSA algorithm
Add RSA support to QAT driver.
Removed unused RNG rings.
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tadeusz Struk [Wed, 15 Jul 2015 22:28:32 +0000 (15:28 -0700)]
crypto: qat - add MMP FW support to accel engine
Add code that loads the MMP firmware
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Pingchao Yang [Wed, 15 Jul 2015 22:28:26 +0000 (15:28 -0700)]
crypto: qat - add support for MMP FW
Load Modular Math Processor(MMP) firmware into QAT devices to support
public key algorithm acceleration.
Signed-off-by: Pingchao Yang <pingchao.yang@intel.com>
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Tue, 14 Jul 2015 08:53:22 +0000 (16:53 +0800)]
crypto: testmgr - Reenable rfc4309 test
Now that all implementations of rfc4309 have been converted we can
reenable the test.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Tue, 14 Jul 2015 08:53:21 +0000 (16:53 +0800)]
crypto: nx - Convert ccm to new AEAD interface
This patch converts the nx ccm and 4309 implementations to the
new AEAD interface.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Tue, 14 Jul 2015 08:53:19 +0000 (16:53 +0800)]
crypto: aes-ce-ccm - Convert to new AEAD interface
This patch converts the ARM64 aes-ce-ccm implementation to the
new AEAD interface.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Herbert Xu [Tue, 14 Jul 2015 08:53:18 +0000 (16:53 +0800)]
crypto: ccm - Convert to new AEAD interface
This patch converts generic ccm and its associated transforms to
the new AEAD interface.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Tue, 14 Jul 2015 08:53:17 +0000 (16:53 +0800)]
crypto: testmgr - Disable rfc4309 test and convert test vectors
This patch disables the rfc4309 test while the conversion to the
new seqiv calling convention takes place. It also replaces the
rfc4309 test vectors with ones that will work with the new IV
convention.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Leonidas Da Silva Barbosa [Mon, 13 Jul 2015 16:51:39 +0000 (13:51 -0300)]
crypto: vmx - Adding enable_kernel_vsx() to access VSX instructions
vmx-crypto driver make use of some VSX instructions which are
only available if VSX is enabled. Running in cases where VSX
are not enabled vmx-crypto fails in a VSX exception.
In order to fix this enable_kernel_vsx() was added to turn on
VSX instructions for vmx-crypto.
Signed-off-by: Leonidas S. Barbosa <leosilva@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Leonidas Da Silva Barbosa [Mon, 13 Jul 2015 16:51:01 +0000 (13:51 -0300)]
powerpc: Uncomment and make enable_kernel_vsx() routine available
enable_kernel_vsx() function was commented since anything was using
it. However, vmx-crypto driver uses VSX instructions which are
only available if VSX is enable. Otherwise it rises an exception oops.
This patch uncomment enable_kernel_vsx() routine and makes it available.
Signed-off-by: Leonidas S. Barbosa <leosilva@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Krzysztof Kozlowski [Fri, 10 Jul 2015 05:46:16 +0000 (14:46 +0900)]
crypto: marvell/cesa - Drop owner assignment from platform_driver
platform_driver does not need to set an owner because
platform_driver_register() will set it.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Wed, 8 Jul 2015 23:17:34 +0000 (07:17 +0800)]
crypto: testmgr - Reenable rfc4106 test
Now that all implementations of rfc4106 have been converted we can
reenable the test.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Wed, 8 Jul 2015 23:17:33 +0000 (07:17 +0800)]
crypto: caam - Use new IV convention
This patch converts rfc4106 to the new calling convention where
the IV is now part of the AD and needs to be skipped.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Wed, 8 Jul 2015 23:17:31 +0000 (07:17 +0800)]
crypto: nx - Use new IV convention
This patch converts rfc4106 to the new calling convention where
the IV is now part of the AD and needs to be skipped. This patch
also makes use of type-safe AEAD functions where possible.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Wed, 8 Jul 2015 23:17:30 +0000 (07:17 +0800)]
crypto: gcm - Use new IV convention
This patch converts rfc4106 to the new calling convention where
the IV is now part of the AD and needs to be skipped. This patch
also makes use of the new type-safe way of freeing instances.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Wed, 8 Jul 2015 23:17:28 +0000 (07:17 +0800)]
crypto: aesni - Use new IV convention
This patch converts rfc4106 to the new calling convention where
the IV is now in the AD and needs to be skipped.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Wed, 8 Jul 2015 23:17:26 +0000 (07:17 +0800)]
crypto: tcrypt - Add support for new IV convention
This patch allows the AEAD speed tests to cope with the new seqiv
calling convention as well as the old one.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Wed, 8 Jul 2015 23:17:25 +0000 (07:17 +0800)]
crypto: testmgr - Disable rfc4106 test and convert test vectors
This patch disables the rfc4106 test while the conversion to the
new seqiv calling convention takes place. It also converts the
rfc4106 test vectors to the new format.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Wed, 8 Jul 2015 23:17:23 +0000 (07:17 +0800)]
crypto: aead - Propagate new AEAD implementation flag for IV generators
This patch allows the CRYPTO_ALG_AEAD_NEW flag to be propagated.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Wed, 8 Jul 2015 23:17:22 +0000 (07:17 +0800)]
crypto: seqiv - Replace seqniv with seqiv
This patch replaces the seqniv generator with seqiv when the
underlying algorithm understands the new calling convention.
This not only makes more sense as now seqiv is solely responsible
for IV generation rather than also determining how the IV is going
to be used, it also allows for optimisations in the underlying
implementation. For example, the space for the IV could be used
to add padding for authentication.
This patch also removes the unnecessary copying of IV to dst
during seqiv decryption as the IV is part of the AD and not cipher
text.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Wed, 8 Jul 2015 23:17:20 +0000 (07:17 +0800)]
crypto: echainiv - Fix encryption convention
This patch fixes a bug where we were incorrectly including the
IV in the AD during encryption. The IV must remain in the plain
text for it to be encrypted.
During decryption there is no need to copy the IV to dst because
it's now part of the AD.
This patch removes an unncessary check on authsize which would be
performed by the underlying decrypt call.
Finally this patch makes use of the type-safe init/exit functions.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Wed, 8 Jul 2015 23:17:19 +0000 (07:17 +0800)]
crypto: cryptd - Propagate new AEAD implementation flag
This patch allows the CRYPTO_ALG_AEAD_NEW flag to be propagated.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Wed, 8 Jul 2015 23:17:18 +0000 (07:17 +0800)]
crypto: pcrypt - Propagate new AEAD implementation flag
This patch allows the CRYPTO_ALG_AEAD_NEW flag to be propagated.
It also restores the ASYNC bit that went missing during the AEAD
conversion.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Wed, 8 Jul 2015 23:17:17 +0000 (07:17 +0800)]
crypto: aead - Add type-safe function for freeing instances
This patch adds a type-safe function for freeing AEAD instances
to struct aead_instance. This replaces the existing free function
in struct crypto_template which does not know the type of the
instance that it's freeing.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Wed, 8 Jul 2015 23:17:15 +0000 (07:17 +0800)]
crypto: api - Add instance free function to crypto_type
Currently the task of freeing an instance is given to the crypto
template. However, it has no type information on the instance so
we have to resort to checking type information at runtime.
This patch introduces a free function to crypto_type that will be
used to free an instance. This can then be used to free an instance
in a type-safe manner.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Wed, 8 Jul 2015 13:40:39 +0000 (21:40 +0800)]
crypto: nx/842 - Fix context corruption
The transform context is shared memory and must not be written
to without locking. This patch adds locking to nx-842 to prevent
context corruption.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Wed, 8 Jul 2015 04:15:14 +0000 (12:15 +0800)]
crypto: aead - Add aead_queue interface
This patch adds a type-safe queueing interface for AEAD.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Wed, 8 Jul 2015 03:55:30 +0000 (11:55 +0800)]
crypto: api - Remove unused __crypto_dequeue_request
The function __crypto_dequeue_request is completely unused.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Tue, 14 Jul 2015 06:55:32 +0000 (14:55 +0800)]
Merge git://git./linux/kernel/git/herbert/crypto-2.6
Merge the crypto tree to pull in the nx reentrancy patch.
Vutla, Lokesh [Tue, 7 Jul 2015 15:31:49 +0000 (21:01 +0530)]
crypto: tcrypt - Fix AEAD speed tests
The AEAD speed tests doesn't do a wait_for_completition,
if the return value is EINPROGRESS or EBUSY.
Fixing it here.
Also add a test case for gcm(aes).
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Vutla, Lokesh [Tue, 7 Jul 2015 15:31:46 +0000 (21:01 +0530)]
crypto: omap-aes - Use BIT() macro
Use BIT()/GENMASK() macros for all register definitions instead of
hand-writing bit masks.
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Vutla, Lokesh [Tue, 7 Jul 2015 15:31:45 +0000 (21:01 +0530)]
crypto: omap-aes - Fix configuring of AES mode
AES_CTRL_REG is used to configure AES mode. Before configuring
any mode we need to make sure all other modes are reset or else
driver will misbehave. So mask all modes before configuring
any AES mode.
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Vutla, Lokesh [Tue, 7 Jul 2015 15:31:44 +0000 (21:01 +0530)]
crypto: omap-aes - Increase priority of hw accelerator
Increasing the priority of omap-aes hw algos, in order to take
precedence over sw algos.
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Vutla, Lokesh [Tue, 7 Jul 2015 15:31:43 +0000 (21:01 +0530)]
crypto: omap-aes - Fix CTR mode
Algo self tests are failing for CTR mode with omap-aes driver,
giving the following error:
[ 150.053644] omap_aes_crypt: request size is not exact amount of AES blocks
[ 150.061262] alg: skcipher: encryption failed on test 5 for ctr-aes-omap: ret=22
This is because the input length is not aligned with AES_BLOCK_SIZE.
Adding support for omap-aes driver for inputs with length not aligned
with AES_BLOCK_SIZE.
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Tue, 7 Jul 2015 09:30:25 +0000 (17:30 +0800)]
crypto: nx - Fix reentrancy bugs
This patch fixes a host of reentrancy bugs in the nx driver. The
following algorithms are affected:
* CCM
* GCM
* CTR
* XCBC
* SHA256
* SHA512
The crypto API allows a single transform to be used by multiple
threads simultaneously. For example, IPsec will use a single tfm
to process packets for a given SA. As packets may arrive on
multiple CPUs that tfm must be reentrant.
The nx driver does try to deal with this by using a spin lock.
Unfortunately only the basic AES/CBC/ECB algorithms do this in
the correct way.
The symptom of these bugs may range from the generation of incorrect
output to memory corruption.
Cc: stable@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Nishanth Aravamudan [Mon, 6 Jul 2015 17:06:21 +0000 (10:06 -0700)]
crypto: nx - reduce chattiness of platform drivers
While we never would successfully load on the wrong machine type, there
is extra output by default regardless of machine type.
For instance, on a PowerVM LPAR, we see the following:
nx_compress_powernv: loading
nx_compress_powernv: no coprocessors found
even though those coprocessors could never be found.
Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Cc: Dan Streetman <ddstreet@us.ibm.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-crypto@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
LABBE Corentin [Mon, 6 Jul 2015 11:37:33 +0000 (13:37 +0200)]
crypto: testmgr - add a chunking test for cbc(aes)
All tests for cbc(aes) use only blocks of data with a multiple of 4.
This test adds a test with some odd SG size.
Signed-off-by: LABBE Corentin <clabbe.montjoie@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Mon, 6 Jul 2015 11:11:03 +0000 (19:11 +0800)]
crypto: cryptd - Fix AEAD request context corruption
The AEAD version of cryptd uses the same context for its own state
as well as that of the child. In doing so it did not maintain the
proper ordering, thus resulting in potential state corruption where
the child will overwrite the state stored by cryptd.
This patch fixes and also sets the request size properly.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Baruch Siach [Mon, 6 Jul 2015 04:03:37 +0000 (07:03 +0300)]
crypto: arm - ignore generated SHA2 assembly files
These files are generated since commits
f2f770d74a8d (crypto: arm/sha256 - Add
optimized SHA-256/224, 2015-04-03) and
c80ae7ca3726 (crypto: arm/sha512 -
accelerated SHA-512 using ARM generic ASM and NEON, 2015-05-08).
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Nishanth Aravamudan [Thu, 2 Jul 2015 22:40:09 +0000 (15:40 -0700)]
crypto: nx - do not emit extra output if status is disabled
If the device-tree indicates the nx-842 device's status is 'disabled',
we emit two messages:
nx_compress_pseries ibm,compression-v1: nx842_OF_upd_status: status 'disabled' is not 'okay'.
nx_compress_pseries ibm,compression-v1: nx842_OF_upd: device disabled
Given that 'disabled' is a valid state, and we are going to emit that
the device is disabled, only print out a non-'okay' status if it is not
'disabled'.
Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Nishanth Aravamudan [Thu, 2 Jul 2015 22:39:21 +0000 (15:39 -0700)]
crypto: nx - rename nx842_{init, exit} to nx842_pseries_{init, exit}
While there is no technical reason that both nx-842.c and
nx-842-pseries.c can have the same name for the init/exit functions, it
is a bit confusing with initcall_debug. Rename the pseries specific
functions appropriately
Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Nishanth Aravamudan [Thu, 2 Jul 2015 22:38:48 +0000 (15:38 -0700)]
crypto: nx - nx842_OF_upd_status should return ENODEV if device is not 'okay'
The current documention mentions explicitly that EINVAL should be
returned if the device is not available, but nx842_OF_upd_status()
always returns 0. However, nx842_probe() specifically checks for
non-ENODEV returns from nx842_of_upd() (which in turn calls
nx842_OF_upd_status()) and emits an extra error in that case. It seems
like the proper return code of a disabled device is ENODEV.
Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tom Lendacky [Tue, 30 Jun 2015 17:57:14 +0000 (12:57 -0500)]
crypto: ccp - Provide support to autoload CCP driver
Add the necessary module device tables to the platform support to allow
for autoloading of the CCP driver. This will allow for the CCP's hwrng
support to be available without having to manually load the driver. The
module device table entry for the pci support is already present.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Vutla, Lokesh [Thu, 2 Jul 2015 13:03:28 +0000 (18:33 +0530)]
crypto: omap-des - Fix unmapping of dma channels
dma_unmap_sg() is being called twice after completing the
task. Looks like this is a copy paste error when creating
des driver.
With this the following warn appears during boot:
[ 4.210457] ------------[ cut here ]------------
[ 4.215114] WARNING: CPU: 0 PID: 0 at lib/dma-debug.c:1080 check_unmap+0x710/0x9a0()
[ 4.222899] omap-des
480a5000.des: DMA-API: device driver tries to free DMA memory it has not allocated [device address=0x00000000ab2ce000] [size=8 bytes]
[ 4.236785] Modules linked in:
[ 4.239860] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
3.14.39-02999-g1bc045a-dirty #182
[ 4.247918] [<
c001678c>] (unwind_backtrace) from [<
c0012574>] (show_stack+0x10/0x14)
[ 4.255710] [<
c0012574>] (show_stack) from [<
c05a37e8>] (dump_stack+0x84/0xb8)
[ 4.262977] [<
c05a37e8>] (dump_stack) from [<
c0046464>] (warn_slowpath_common+0x68/0x8c)
[ 4.271107] [<
c0046464>] (warn_slowpath_common) from [<
c004651c>] (warn_slowpath_fmt+0x30/0x40)
[ 4.279854] [<
c004651c>] (warn_slowpath_fmt) from [<
c02d50a4>] (check_unmap+0x710/0x9a0)
[ 4.287991] [<
c02d50a4>] (check_unmap) from [<
c02d5478>] (debug_dma_unmap_sg+0x90/0x19c)
[ 4.296128] [<
c02d5478>] (debug_dma_unmap_sg) from [<
c04a77d8>] (omap_des_done_task+0x1cc/0x3e4)
[ 4.304963] [<
c04a77d8>] (omap_des_done_task) from [<
c004a090>] (tasklet_action+0x84/0x124)
[ 4.313370] [<
c004a090>] (tasklet_action) from [<
c004a4ac>] (__do_softirq+0xf0/0x20c)
[ 4.321235] [<
c004a4ac>] (__do_softirq) from [<
c004a840>] (irq_exit+0x98/0xec)
[ 4.328500] [<
c004a840>] (irq_exit) from [<
c000f9ac>] (handle_IRQ+0x50/0xb0)
[ 4.335589] [<
c000f9ac>] (handle_IRQ) from [<
c0008688>] (gic_handle_irq+0x28/0x5c)
Removing the duplicate call to dma_unmap_sg().
Cc: stable@vger.kernel.org
Reported-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Linus Torvalds [Sun, 5 Jul 2015 18:01:52 +0000 (11:01 -0700)]
Linux 4.2-rc1
Linus Torvalds [Sun, 5 Jul 2015 17:54:09 +0000 (10:54 -0700)]
Merge tag 'platform-drivers-x86-v4.2-2' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86
Pull late x86 platform driver updates from Darren Hart:
"The following came in a bit later and I wanted them to bake in next a
few more days before submitting, thus the second pull.
A new intel_pmc_ipc driver, a symmetrical allocation and free fix in
dell-laptop, a couple minor fixes, and some updated documentation in
the dell-laptop comments.
intel_pmc_ipc:
- Add Intel Apollo Lake PMC IPC driver
tc1100-wmi:
- Delete an unnecessary check before the function call "kfree"
dell-laptop:
- Fix allocating & freeing SMI buffer page
- Show info about WiGig and UWB in debugfs
- Update information about wireless control"
* tag 'platform-drivers-x86-v4.2-2' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86:
intel_pmc_ipc: Add Intel Apollo Lake PMC IPC driver
tc1100-wmi: Delete an unnecessary check before the function call "kfree"
dell-laptop: Fix allocating & freeing SMI buffer page
dell-laptop: Show info about WiGig and UWB in debugfs
dell-laptop: Update information about wireless control
Linus Torvalds [Sun, 5 Jul 2015 02:36:06 +0000 (19:36 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs
Pull more vfs updates from Al Viro:
"Assorted VFS fixes and related cleanups (IMO the most interesting in
that part are f_path-related things and Eric's descriptor-related
stuff). UFS regression fixes (it got broken last cycle). 9P fixes.
fs-cache series, DAX patches, Jan's file_remove_suid() work"
[ I'd say this is much more than "fixes and related cleanups". The
file_table locking rule change by Eric Dumazet is a rather big and
fundamental update even if the patch isn't huge. - Linus ]
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits)
9p: cope with bogus responses from server in p9_client_{read,write}
p9_client_write(): avoid double p9_free_req()
9p: forgetting to cancel request on interrupted zero-copy RPC
dax: bdev_direct_access() may sleep
block: Add support for DAX reads/writes to block devices
dax: Use copy_from_iter_nocache
dax: Add block size note to documentation
fs/file.c: __fget() and dup2() atomicity rules
fs/file.c: don't acquire files->file_lock in fd_install()
fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation
vfs: avoid creation of inode number 0 in get_next_ino
namei: make set_root_rcu() return void
make simple_positive() public
ufs: use dir_pages instead of ufs_dir_pages()
pagemap.h: move dir_pages() over there
remove the pointless include of lglock.h
fs: cleanup slight list_entry abuse
xfs: Correctly lock inode when removing suid and file capabilities
fs: Call security_ops->inode_killpriv on truncate
fs: Provide function telling whether file_remove_privs() will do anything
...
Linus Torvalds [Sun, 5 Jul 2015 02:11:33 +0000 (19:11 -0700)]
bluetooth: fix list handling
Commit
835a6a2f8603 ("Bluetooth: Stop sabotaging list poisoning")
thought that the code was sabotaging the list poisoning when NULL'ing
out the list pointers and removed it.
But what was going on was that the bluetooth code was using NULL
pointers for the list as a way to mark it empty, and that commit just
broke it (and replaced the test with NULL with a "list_empty()" test on
a uninitialized list instead, breaking things even further).
So fix it all up to use the regular and real list_empty() handling
(which does not use NULL, but a pointer to itself), also making sure to
initialize the list properly (the previous NULL case was initialized
implicitly by the session being allocated with kzalloc())
This is a combination of patches by Marcel Holtmann and Tedd Ho-Jeong
An.
[ I would normally expect to get this through the bt tree, but I'm going
to release -rc1, so I'm just committing this directly - Linus ]
Reported-and-tested-by: Jörg Otte <jrg.otte@gmail.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Original-by: Tedd Ho-Jeong An <tedd.an@intel.com>
Original-by: Marcel Holtmann <marcel@holtmann.org>:
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 4 Jul 2015 21:13:43 +0000 (14:13 -0700)]
Merge branch 'for-next' of git://git./linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
"It's been a busy development cycle for target-core in a number of
different areas.
The fabric API usage for se_node_acl allocation is now within
target-core code, dropping the external API callers for all fabric
drivers tree-wide.
There is a new conversion to RCU hlists for se_node_acl and
se_portal_group LUN mappings, that turns fast-past LUN lookup into a
completely lockless code-path. It also removes the original
hard-coded limitation of 256 LUNs per fabric endpoint.
The configfs attributes for backends can now be shared between core
and driver code, allowing existing drivers to use common code while
still allowing flexibility for new backend provided attributes.
The highlights include:
- Merge sbc_verify_dif_* into common code (sagi)
- Remove iscsi-target support for obsolete IFMarker/OFMarker
(Christophe Vu-Brugier)
- Add bidi support in target/user backend (ilias + vangelis + agover)
- Move se_node_acl allocation into target-core code (hch)
- Add crc_t10dif_update common helper (akinobu + mkp)
- Handle target-core odd SGL mapping for data transfer memory
(akinobu)
- Move transport ID handling into target-core (hch)
- Move task tag into struct se_cmd + support 64-bit tags (bart)
- Convert se_node_acl->device_list[] to RCU hlist (nab + hch +
paulmck)
- Convert se_portal_group->tpg_lun_list[] to RCU hlist (nab + hch +
paulmck)
- Simplify target backend driver registration (hch)
- Consolidate + simplify target backend attribute implementations
(hch + nab)
- Subsume se_port + t10_alua_tg_pt_gp_member into se_lun (hch)
- Drop lun_sep_lock for se_lun->lun_se_dev RCU usage (hch + nab)
- Drop unnecessary core_tpg_register TFO parameter (nab)
- Use 64-bit LUNs tree-wide (hannes)
- Drop left-over TARGET_MAX_LUNS_PER_TRANSPORT limit (hannes)"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (76 commits)
target: Bump core version to v5.0
target: remove target_core_configfs.h
target: remove unused TARGET_CORE_CONFIG_ROOT define
target: consolidate version defines
target: implement WRITE_SAME with UNMAP bit using ->execute_unmap
target: simplify UNMAP handling
target: replace se_cmd->execute_rw with a protocol_data field
target/user: Fix inconsistent kmap_atomic/kunmap_atomic
target: Send UA when changing LUN inventory
target: Send UA upon LUN RESET tmr completion
target: Send UA on ALUA target port group change
target: Convert se_lun->lun_deve_lock to normal spinlock
target: use 'se_dev_entry' when allocating UAs
target: Remove 'ua_nacl' pointer from se_ua structure
target_core_alua: Correct UA handling when switching states
xen-scsiback: Fix compile warning for 64-bit LUN
target: Remove TARGET_MAX_LUNS_PER_TRANSPORT
target: use 64-bit LUNs
target: Drop duplicate + unused se_dev_check_wce
target: Drop unnecessary core_tpg_register TFO parameter
...
Linus Torvalds [Sat, 4 Jul 2015 21:07:47 +0000 (14:07 -0700)]
Merge tag 'ntb-4.2' of git://github.com/jonmason/ntb
Pull NTB updates from Jon Mason:
"This includes a pretty significant reworking of the NTB core code, but
has already produced some significant performance improvements.
An abstraction layer was added to allow the hardware and clients to be
easily added. This required rewriting the NTB transport layer for
this abstraction layer. This modification will allow future "high
performance" NTB clients.
In addition to this change, a number of performance modifications were
added. These changes include NUMA enablement, using CPU memcpy
instead of asyncdma, and modification of NTB layer MTU size"
* tag 'ntb-4.2' of git://github.com/jonmason/ntb: (22 commits)
NTB: Add split BAR output for debugfs stats
NTB: Change WARN_ON_ONCE to pr_warn_once on unsafe
NTB: Print driver name and version in module init
NTB: Increase transport MTU to 64k from 16k
NTB: Rename Intel code names to platform names
NTB: Default to CPU memcpy for performance
NTB: Improve performance with write combining
NTB: Use NUMA memory in Intel driver
NTB: Use NUMA memory and DMA chan in transport
NTB: Rate limit ntb_qp_link_work
NTB: Add tool test client
NTB: Add ping pong test client
NTB: Add parameters for Intel SNB B2B addresses
NTB: Reset transport QP link stats on down
NTB: Do not advance transport RX on link down
NTB: Differentiate transport link down messages
NTB: Check the device ID to set errata flags
NTB: Enable link for Intel root port mode in probe
NTB: Read peer info from local SPAD in transport
NTB: Split ntb_hw_intel and ntb_transport drivers
...
Al Viro [Sat, 4 Jul 2015 20:17:39 +0000 (16:17 -0400)]
9p: cope with bogus responses from server in p9_client_{read,write}
if server claims to have written/read more than we'd told it to,
warn and cap the claimed byte count to avoid advancing more than
we are ready to.
Al Viro [Sat, 4 Jul 2015 20:11:05 +0000 (16:11 -0400)]
p9_client_write(): avoid double p9_free_req()
Braino in "9p: switch p9_client_write() to passing it struct iov_iter *";
if response is impossible to parse and we discard the request, get the
out of the loop right there.
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 4 Jul 2015 20:04:19 +0000 (16:04 -0400)]
9p: forgetting to cancel request on interrupted zero-copy RPC
If we'd already sent a request and decide to abort it, we *must*
issue TFLUSH properly and not just blindly reuse the tag, or
we'll get seriously screwed when response eventually arrives
and we confuse it for response to later request that had reused
the same tag.
Cc: stable@vger.kernel.org # v3.2 and later
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Matthew Wilcox [Fri, 3 Jul 2015 14:40:43 +0000 (10:40 -0400)]
dax: bdev_direct_access() may sleep
The brd driver is the only in-tree driver that may sleep currently.
After some discussion on linux-fsdevel, we decided that any driver
may choose to sleep in its ->direct_access method. To ensure that all
callers of bdev_direct_access() are prepared for this, add a call
to might_sleep().
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Matthew Wilcox [Fri, 3 Jul 2015 14:40:42 +0000 (10:40 -0400)]
block: Add support for DAX reads/writes to block devices
If a block device supports the ->direct_access methods, bypass the normal
DIO path and use DAX to go straight to memcpy() instead of allocating
a DIO and a BIO.
Includes support for the DIO_SKIP_DIO_COUNT flag in DAX, as is done in
do_blockdev_direct_IO().
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Matthew Wilcox [Fri, 3 Jul 2015 14:40:39 +0000 (10:40 -0400)]
dax: Use copy_from_iter_nocache
When userspace does a write, there's no need for the written data to
pollute the CPU cache. This matches the original XIP code.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Matthew Wilcox [Fri, 3 Jul 2015 14:40:38 +0000 (10:40 -0400)]
dax: Add block size note to documentation
For block devices which are small enough, mkfs will default to creating
a filesystem with block sizes smaller than page size.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Linus Torvalds [Sat, 4 Jul 2015 18:29:59 +0000 (11:29 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"Except for the preempt notifiers fix, these are all small bugfixes
that could have been waited for -rc2. Sending them now since I was
taking care of Peter's patch anyway"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm: add hyper-v crash msrs values
KVM: x86: remove data variable from kvm_get_msr_common
KVM: s390: virtio-ccw: don't overwrite config space values
KVM: x86: keep track of LVT0 changes under APICv
KVM: x86: properly restore LVT0
KVM: x86: make vapics_in_nmi_mode atomic
sched, preempt_notifier: separate notifier registration from static_key inc/dec
Dave Jiang [Thu, 18 Jun 2015 09:17:30 +0000 (05:17 -0400)]
NTB: Add split BAR output for debugfs stats
When split BAR is enabled, the driver needs to dump out the split BAR
registers rather than the original 64bit BAR registers.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Dave Jiang [Mon, 15 Jun 2015 12:22:30 +0000 (08:22 -0400)]
NTB: Change WARN_ON_ONCE to pr_warn_once on unsafe
The unsafe doorbell and scratchpad access should display reason when
WARN is called. Otherwise we get a stack dump without any explanation.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Dave Jiang [Mon, 15 Jun 2015 12:21:33 +0000 (08:21 -0400)]
NTB: Print driver name and version in module init
Printouts driver name and version to indicate what is being loaded.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Dave Jiang [Wed, 3 Jun 2015 15:29:38 +0000 (11:29 -0400)]
NTB: Increase transport MTU to 64k from 16k
Benchmarking showed a significant performance increase with the MTU size
to 64k instead of 16k. Change the driver default to 64k.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Dave Jiang [Wed, 20 May 2015 16:55:47 +0000 (12:55 -0400)]
NTB: Rename Intel code names to platform names
Instead of using the platform code names, use the correct platform names
to identify the respective Intel NTB hardware.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Dave Jiang [Tue, 19 May 2015 20:52:04 +0000 (16:52 -0400)]
NTB: Default to CPU memcpy for performance
Disable DMA usage by default, since the CPU provides much better
performance with write combining. Provide a module parameter to enable
DMA usage when offloading the memcpy is preferred.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Dave Jiang [Tue, 19 May 2015 20:45:46 +0000 (16:45 -0400)]
NTB: Improve performance with write combining
Changing the memory window BAR mappings to write combining significantly
boosts the performance. We will also use memcpy that uses non-temporal
store, which showed performance improvement when doing non-cached
memcpys.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Allen Hubbe [Tue, 19 May 2015 16:04:52 +0000 (12:04 -0400)]
NTB: Use NUMA memory in Intel driver
Allocate memory for the NUMA node of the NTB device.
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Allen Hubbe [Mon, 18 May 2015 10:20:47 +0000 (06:20 -0400)]
NTB: Use NUMA memory and DMA chan in transport
Allocate memory and request the DMA channel for the same NUMA node as
the NTB device.
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Allen Hubbe [Mon, 11 May 2015 14:08:26 +0000 (10:08 -0400)]
NTB: Rate limit ntb_qp_link_work
When the ntb transport is connecting and waiting for the peer, the debug
console receives lots of debug level messages about the remote qp link
status being down. Rate limit those messages.
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Allen Hubbe [Thu, 21 May 2015 06:51:39 +0000 (02:51 -0400)]
NTB: Add tool test client
This is a simple debugging driver that enables the doorbell and
scratch pad registers to be read and written from the debugfs. This
tool enables more complicated debugging to be scripted from user space.
This driver may be used to test that your ntb hardware and drivers are
functioning at a basic level.
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Allen Hubbe [Wed, 15 Apr 2015 15:12:41 +0000 (11:12 -0400)]
NTB: Add ping pong test client
This is a simple ping pong driver that exercises the scratch pads and
doorbells of the ntb hardware. This driver may be used to test that
your ntb hardware and drivers are functioning at a basic level.
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>