crypto: chacha20 - Add a SSSE3 SIMD variant for x86_64

Implements an x86_64 assembler driver for the ChaCha20 stream cipher. This
single-block variant works on a single state matrix using SSE instructions.
It requires SSSE3 due to the use of pshufb for efficient 8/16-bit rotate
operations.
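
For illustration only, and not taken from the assembler in this patch: a
rotate of a 32-bit word by 8 or 16 bits is a pure byte permutation, which is
why a single byte shuffle can replace the usual shift/shift/or sequence. The
scalar C sketch below models that on one 32-bit lane. The helper names
(shuffle_bytes, rotl8_by_shuffle, rotl16_by_shuffle) and the index tables are
hypothetical and assume little-endian byte numbering within each word.

```c
#include <stdint.h>

/* Reference: ordinary rotate-left of a 32-bit word (0 < n < 32). */
static uint32_t rotl32(uint32_t v, unsigned int n)
{
	return (v << n) | (v >> (32 - n));
}

/*
 * Scalar model of a pshufb-style byte shuffle on one 32-bit lane:
 * output byte i is copied from input byte idx[i], with bytes numbered
 * from the least significant end. Hypothetical helper, not kernel code.
 */
static uint32_t shuffle_bytes(uint32_t v, const uint8_t idx[4])
{
	uint32_t out = 0;
	unsigned int i;

	for (i = 0; i < 4; i++) {
		uint32_t byte = (v >> (8 * idx[i])) & 0xff;

		out |= byte << (8 * i);
	}
	return out;
}

/* Rotate left by 8: every byte moves up one position, the top byte wraps. */
static uint32_t rotl8_by_shuffle(uint32_t v)
{
	static const uint8_t idx[4] = { 3, 0, 1, 2 };

	return shuffle_bytes(v, idx);
}

/* Rotate left by 16: the two 16-bit halves swap places. */
static uint32_t rotl16_by_shuffle(uint32_t v)
{
	static const uint8_t idx[4] = { 2, 3, 0, 1 };

	return shuffle_bytes(v, idx);
}
```

The other two rotate amounts in the ChaCha quarter round, 12 and 7 bits, are
not byte-aligned and cannot be expressed as a byte permutation this way.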

For large messages, throughput increases by ~65% compared to
chacha20-generic:

testing speed of chacha20 (chacha20-generic) encryption
test 0 (256 bit key, 16 byte blocks): 45089207 operations in 10 seconds (721427312 bytes)
test 1 (256 bit key, 64 byte blocks): 43839521 operations in 10 seconds (2805729344 bytes)
test 2 (256 bit key, 256 byte blocks): 12702056 operations in 10 seconds (3251726336 bytes)
test 3 (256 bit key, 1024 byte blocks): 3371173 operations in 10 seconds (3452081152 bytes)
test 4 (256 bit key, 8192 byte blocks): 422468 operations in 10 seconds (3460857856 bytes)

testing speed of chacha20 (chacha20-simd) encryption
test 0 (256 bit key, 16 byte blocks): 43141886 operations in 10 seconds (690270176 bytes)
test 1 (256 bit key, 64 byte blocks): 46845874 operations in 10 seconds (2998135936 bytes)
test 2 (256 bit key, 256 byte blocks): 18458512 operations in 10 seconds (4725379072 bytes)
test 3 (256 bit key, 1024 byte blocks): 5360533 operations in 10 seconds (5489185792 bytes)
test 4 (256 bit key, 8192 byte blocks): 692846 operations in 10 seconds (5675794432 bytes)

Benchmark results from a Core i5-4670T.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>