To prevent unnecessary branching, mark the exit condition of the
primary loop as likely(), given that a carry in a 32-bit counter
occurs very rarely.
On arm64, the resulting code is emitted by GCC as
9a8: cmp w1, #0x3
9ac: add x3, x0, w1, uxtw
9b0: b.ls 9e0 <crypto_inc+0x38>
9b4: ldr w2, [x3,#-4]!
9b8: rev w2, w2
9bc: add w2, w2, #0x1
9c0: rev w4, w2
9c4: str w4, [x3]
9c8: cbz w2, 9d0 <crypto_inc+0x28>
9cc: ret
where the two remaining branch conditions (one for size < 4 and one for
the carry) are statically predicted as non-taken, resulting in optimal
execution in the vast majority of cases.
Also, replace the open coded alignment test with IS_ALIGNED().
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
u32 c;
if (IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) ||
- !((unsigned long)b & (__alignof__(*b) - 1)))
+ IS_ALIGNED((unsigned long)b, __alignof__(*b)))
for (; size >= 4; size -= 4) {
c = be32_to_cpu(*--b) + 1;
*b = cpu_to_be32(c);
- if (c)
+ if (likely(c))
return;
}