perf bench futex: Cache align the worker struct
author Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Sun, 16 Oct 2016 19:08:02 +0000 (21:08 +0200)
committer Arnaldo Carvalho de Melo <acme@redhat.com>
Mon, 24 Oct 2016 14:07:45 +0000 (11:07 -0300)
It popped up in perf testing that the worker consumes some amount of
CPU. This boils down to the increment of `ops`, which causes cache line
bouncing between the individual threads.

This patch aligns the struct to 256 bytes to ensure that no cache
line is shared among CPUs. 128 bytes is the x86 worst case, and grep says
that L1_CACHE_SHIFT is set to 8 (1 << 8 = 256 bytes) on s390.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20161016190803.3392-1-bigeasy@linutronix.de
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
tools/perf/bench/futex-hash.c

index 8024cd5febd226da1e3bfe99003f8eb217bdfc69..d9e5e80bb4d0b7b2b49972057c28a377a40aec89 100644
@@ -39,12 +39,15 @@ static unsigned int threads_starting;
 static struct stats throughput_stats;
 static pthread_cond_t thread_parent, thread_worker;
 
+#define SMP_CACHE_BYTES 256
+#define __cacheline_aligned __attribute__ ((aligned (SMP_CACHE_BYTES)))
+
 struct worker {
        int tid;
        u_int32_t *futex;
        pthread_t thread;
        unsigned long ops;
-};
+} __cacheline_aligned;
 
 static const struct option options[] = {
        OPT_UINTEGER('t', "threads", &nthreads, "Specify amount of threads"),