powerpc: Use lwsync for acquire barrier if CPU supports it
author Anton Blanchard <anton@samba.org>
Wed, 10 Feb 2010 01:10:25 +0000 (01:10 +0000)
committer Benjamin Herrenschmidt <benh@kernel.crashing.org>
Wed, 17 Feb 2010 03:03:16 +0000 (14:03 +1100)
Nick Piggin discovered that lwsync barriers around locks were faster than isync
on 970. That was a long time ago and I completely dropped the ball in testing
his patches across other ppc64 processors.

Turns out the idea helps on other chips. Using a microbenchmark that
uses a lot of threads to contend on a global pthread mutex (and therefore a
global futex), POWER6 improves 8% and POWER7 improves 2%. I checked POWER5
and while I couldn't measure an improvement, there was no regression.

This patch uses the lwsync patching code to replace the isyncs with lwsyncs
on CPUs that support the instruction. We were marking POWER3 and RS64 as
lwsync-capable, but in reality they treat it as a full sync (i.e. slowly).
Remove the CPU_FTR_LWSYNC bit from these CPUs so they continue to use the
faster isync method.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
arch/powerpc/include/asm/cputable.h
arch/powerpc/include/asm/synch.h

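diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h
index 80f315e8a421b2f0bde0bb6cc9d931c7fb5d5372..abb833b0e58f71951acd570397990e72b5bc2722 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h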
@@ -381,9 +381,9 @@ extern const char *powerpc_base_platform;
 #define CPU_FTRS_GENERIC_32    (CPU_FTR_COMMON | CPU_FTR_NODSISRALIGN)
 
 /* 64-bit CPUs */
-#define CPU_FTRS_POWER3        (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
+#define CPU_FTRS_POWER3        (CPU_FTR_USE_TB | \
            CPU_FTR_IABR | CPU_FTR_PPC_LE)
-#define CPU_FTRS_RS64  (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
+#define CPU_FTRS_RS64  (CPU_FTR_USE_TB | \
            CPU_FTR_IABR | \
            CPU_FTR_MMCRA | CPU_FTR_CTRL)
 #define CPU_FTRS_POWER4        (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
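diff --git a/arch/powerpc/include/asm/synch.h b/arch/powerpc/include/asm/synch.h
index 5db1f0d5ea82944554a747d2be6371483977f4a8..d7cab44643c51d90f1f79509939e3c734b735eba 100644
--- a/arch/powerpc/include/asm/synch.h
+++ b/arch/powerpc/include/asm/synch.h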
@@ -37,7 +37,11 @@ static inline void isync(void)
 #endif
 
 #ifdef CONFIG_SMP
-#define PPC_ACQUIRE_BARRIER    "\n\tisync\n"
+#define __PPC_ACQUIRE_BARRIER                          \
+       START_LWSYNC_SECTION(97);                       \
+       isync;                                          \
+       MAKE_LWSYNC_SECTION_ENTRY(97, __lwsync_fixup);
+#define PPC_ACQUIRE_BARRIER    "\n" stringify_in_c(__PPC_ACQUIRE_BARRIER)
 #define PPC_RELEASE_BARRIER    stringify_in_c(LWSYNC) "\n"
 #else
 #define PPC_ACQUIRE_BARRIER