arm64/crypto: issue aese/aesmc instructions in pairs
This changes the AES core transform implementations to issue aese/aesmc
(and aesd/aesimc) in pairs. This enables a micro-architectural optimization
in recent Cortex-A5x cores that improves performance by 50-90%.
Measured performance in cycles per byte (Cortex-A57):
CBC enc CBC dec CTR
before 3.64 1.34 1.32
after 1.95 0.85 0.93
Note that this results in a ~5% performance decrease for older cores.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>