x86/i386: Use less assembly in strlen(), speed things up a bit
author Alexey Dobriyan <adobriyan@gmail.com>
Sun, 11 Dec 2011 18:13:19 +0000 (21:13 +0300)
committer Ingo Molnar <mingo@elte.hu>
Mon, 12 Dec 2011 17:33:42 +0000 (18:33 +0100)
commit 890890cb8e415e1e7a61bfe3c8e246f710196824
tree a4694f6470484ebac1d82ed317fa5ce5c198006e
parent 79f1ddd06471b094ae30eb17b33beb9f1234ca93

The current i386 strlen() hardcodes a NOT/DEC sequence in its
inline assembly. DEC is reported to be suboptimal on Core2, so
keep only the REPNE SCASB sequence in assembly and let the
compiler generate the rest.

The generated code changes as follows (MCORE2=y):

<strlen>:
push   %edi
mov    $0xffffffff,%ecx
mov    %eax,%edi
xor    %eax,%eax
repnz scas %es:(%edi),%al
not    %ecx

- dec    %ecx
- mov    %ecx,%eax
+ lea    -0x1(%ecx),%eax

pop    %edi
ret
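
For illustration, the approach described above can be sketched in C
roughly as follows. This is a minimal sketch and not the patch itself:
the name sketch_strlen(), the exact asm constraints, and the portable
fallback loop are assumptions made for this example. Only the scan
stays in assembly; the final "~count - 1" is left to the compiler,
which is then free to pick NOT plus LEA instead of NOT/DEC/MOV.

```c
#include <stddef.h>

/*
 * Hypothetical helper for illustration only; not the kernel's strlen().
 * The counter starts at all-ones (like "mov $0xffffffff,%ecx") and is
 * decremented once per byte scanned, including the terminating NUL.
 */
static size_t sketch_strlen(const char *s)
{
	size_t count = (size_t)-1;

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
	const char *p = s;

	/* Only the scan itself is written in assembly. */
	__asm__ volatile("repne\n\t"
			 "scasb"
			 : "+c" (count), "+D" (p)
			 : "a" (0)
			 : "memory", "cc");
#else
	/* Portable emulation of REPNE SCASB for other targets. */
	do {
		count--;
	} while (*s++ != '\0');
#endif

	/*
	 * After scanning n + 1 bytes, count == -(n + 2), so ~count == n + 1.
	 * The subtraction is plain C, so the compiler chooses the instructions.
	 */
	return ~count - 1;
}
```

Under these assumptions, sketch_strlen("abc") scans four bytes (three
characters plus the NUL) and returns 3.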

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jan Beulich <JBeulich@suse.com>
Link: http://lkml.kernel.org/r/20111211181319.GA17097@p183.telecom.by
Signed-off-by: Ingo Molnar <mingo@elte.hu>
arch/x86/lib/string_32.c