perf bench: Also allow measuring alternative memcpy implementations
author Jan Beulich <JBeulich@suse.com>
Wed, 18 Jan 2012 13:28:56 +0000 (13:28 +0000)
committer Arnaldo Carvalho de Melo <acme@redhat.com>
Tue, 24 Jan 2012 21:51:01 +0000 (19:51 -0200)
To help back up the current choice of the preferred memcpy()
implementation, this patch adds the ability to also measure the two
alternative implementations, again by means of some pre-processor
replacement.
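The pre-processor replacement rests on one idea: a `#define` can rename a
symbol in the included source so that it coexists with the C library's
function of the same name, as the existing `#define memcpy MEMCPY` line in
the patch does. A minimal plain-C sketch of that idea follows; the
byte-loop body and the `copies_agree()` helper are hypothetical stand-ins,
not the kernel's assembler code.

```c
#include <stddef.h>
#include <string.h>   /* declares the C library's memcpy() before the rename */

/* Hypothetical stand-in for a source file that defines its own memcpy(),
 * as memcpy_64.S does. The macro renames this definition to MEMCPY so
 * that it neither clashes with nor hides the C library's memcpy(). */
#define memcpy MEMCPY
void *memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}
#undef memcpy   /* from here on, memcpy means the C library's again */

/* Copy the same bytes with both implementations and compare the results.
 * Returns 0 if n exceeds the local buffers. */
int copies_agree(const void *src, size_t n)
{
    unsigned char a[256], b[256];

    if (n > sizeof a)
        return 0;
    MEMCPY(a, src, n);   /* the renamed implementation */
    memcpy(b, src, n);   /* the C library's implementation */
    return memcmp(a, b, n) == 0;
}
```

With the rename in place, one program can call `MEMCPY()` and the C
library's `memcpy()` side by side, which is what a benchmark comparing
them needs.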

On my Westmere system this confirms that the movsb-based variant is
slower than the movsq-based one (the CPU lacks the ERMS, Enhanced REP
MOVSB/STOSB, feature), but it also shows that, for the default size as
well as for small sizes, the unrolled variant outperforms the movsq-based
one.

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/4F16D728020000780006D732@nat28.tlf.novell.com
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
tools/perf/bench/mem-memcpy-x86-64-asm-def.h
tools/perf/bench/mem-memcpy-x86-64-asm.S

index d588b87696fcb2f3967759d1e31ef7278aee06a7..d66ab799b35fd5cab83e5486368e40c39c2927ee 100644
@@ -2,3 +2,11 @@
 MEMCPY_FN(__memcpy,
        "x86-64-unrolled",
        "unrolled memcpy() in arch/x86/lib/memcpy_64.S")
+
+MEMCPY_FN(memcpy_c,
+       "x86-64-movsq",
+       "movsq-based memcpy() in arch/x86/lib/memcpy_64.S")
+
+MEMCPY_FN(memcpy_c_e,
+       "x86-64-movsb",
+       "movsb-based memcpy() in arch/x86/lib/memcpy_64.S")
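The `MEMCPY_FN(fn, name, desc)` entries above form an "X-macro" list: the
benchmark can include the header more than once, redefining `MEMCPY_FN`
each time to generate declarations and then table rows. The self-contained
sketch below shows that pattern under stated assumptions: the list is
inlined as a hypothetical `MEMCPY_ENTRIES` macro, two hypothetical C
routines stand in for the assembler ones, and the `struct routine` layout
and `find_routine()` helper are illustrative, not perf's actual code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for mem-memcpy-x86-64-asm-def.h:
 * one MEMCPY_FN(fn, name, desc) entry per implementation. */
#define MEMCPY_ENTRIES                                        \
    MEMCPY_FN(copy_bytes, "bytewise", "byte-at-a-time loop")  \
    MEMCPY_FN(copy_libc,  "libc",     "C library memcpy()")

/* First expansion: declare every listed function. */
#define MEMCPY_FN(fn, name, desc) \
    static void *fn(void *, const void *, size_t);
MEMCPY_ENTRIES
#undef MEMCPY_FN

struct routine {
    const char *name;
    const char *desc;
    void *(*fn)(void *, const void *, size_t);
};

/* Second expansion: one table row per entry, plus a NULL sentinel. */
#define MEMCPY_FN(fn, name, desc) { name, desc, fn },
static const struct routine routines[] = {
    MEMCPY_ENTRIES
    { NULL, NULL, NULL }
};
#undef MEMCPY_FN

static void *copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}

static void *copy_libc(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Look up a routine by its benchmark name; NULL if it is not listed. */
static const struct routine *find_routine(const char *name)
{
    for (const struct routine *r = routines; r->name; r++)
        if (strcmp(r->name, name) == 0)
            return r;
    return NULL;
}
```

Adding an implementation then only takes one more `MEMCPY_FN` line in the
list, which is why the patch touches the definitions header rather than
the benchmark code.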
index 384b60788ab960568e9832325b95e4aea6518eb3..a20780bd0771704c3099f3517bdd90fc890f4ecf 100644
@@ -1,2 +1,6 @@
 #define memcpy MEMCPY /* don't hide glibc's memcpy() */
+#define altinstr_replacement text
+#define globl p2align 4; .globl
+#define Lmemcpy_c globl memcpy_c; memcpy_c
+#define Lmemcpy_c_e globl memcpy_c_e; memcpy_c_e
 #include "../../../arch/x86/lib/memcpy_64.S"
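Because `.S` files pass through the C preprocessor before assembly, the
effect of the four new definitions can be checked in plain C by
stringifying their expansion. The sketch assumes, as the definitions
imply, that memcpy_64.S contains a `.section .altinstr_replacement`
directive and local labels `.Lmemcpy_c:` and `.Lmemcpy_c_e:`; `STR`,
`XSTR` and `expands_to()` are hypothetical helpers.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* The four definitions added by the patch, verbatim. */
#define altinstr_replacement text
#define globl p2align 4; .globl
#define Lmemcpy_c globl memcpy_c; memcpy_c
#define Lmemcpy_c_e globl memcpy_c_e; memcpy_c_e

/* XSTR() macro-expands its argument first; STR() then turns the
 * expanded tokens into a string literal. */
#define STR(x) #x
#define XSTR(x) STR(x)

/* Preprocessed forms of the lines assumed to appear in memcpy_64.S. */
static const char section_line[] = XSTR(.section .altinstr_replacement);
static const char label_c[]      = XSTR(.Lmemcpy_c:);
static const char label_c_e[]    = XSTR(.Lmemcpy_c_e:);

/* Compare an expansion with the expected text, ignoring whitespace,
 * since the preprocessor's spacing between tokens is not significant. */
static int expands_to(const char *expanded, const char *want)
{
    char buf[128];
    size_t n = 0;

    for (; *expanded && n + 1 < sizeof buf; expanded++)
        if (!isspace((unsigned char)*expanded))
            buf[n++] = *expanded;
    buf[n] = '\0';
    return strcmp(buf, want) == 0;
}
```

Expanded this way, `.section .altinstr_replacement` becomes `.section
.text`, and `.Lmemcpy_c:` becomes `.p2align 4; .globl memcpy_c;
memcpy_c:`, that is, a 16-byte-aligned global entry point that the
benchmark can call. The nested `globl` inside its own expansion is not
expanded again, since the preprocessor never re-expands a macro within
its own replacement.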