percpu: improve generic percpu modify-return implementation
author Nicholas Piggin <npiggin@gmail.com>
Thu, 22 Sep 2016 15:55:54 +0000 (11:55 -0400)
committer Tejun Heo <tj@kernel.org>
Thu, 22 Sep 2016 16:06:53 +0000 (12:06 -0400)
commit 1b5ca12127427c51be605a75ecd0141eb3357249
tree 5181f24d2ff40d73916a9d7aba9b2315bd01ad21
parent a67823c1ed1092160da94c31e6da5aeb35dca81c

Some architectures require an additional load to find the address of
percpu pointers. In some implementations, the C aliasing rules do not
allow the result of that load to be kept across the store that modifies
the percpu variable, so the compiler emits the load a second time.

Work around this by finding the pointer first, then operating on that.
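
As a rough userspace illustration of the idea (not the kernel code itself),
the two shapes can be sketched as below. Everything here is hypothetical:
counters[], this_cpu, this_cpu_ptr_sketch() and both add_return_* helpers
are stand-ins invented for this sketch. The volatile qualifier is only a way
to force the repeated lookup to be visible in userspace; in the kernel the
reload comes from the aliasing rules described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the per-CPU machinery (not kernel APIs). */
#define NR_CPUS 4
static long counters[NR_CPUS];      /* one slot per CPU */
static volatile size_t this_cpu;    /* models the per-CPU offset that must be loaded */

/* Every call re-reads this_cpu, modelling the extra load of the offset. */
static long *this_cpu_ptr_sketch(void)
{
    size_t cpu = this_cpu;

    assert(cpu < NR_CPUS);
    return &counters[cpu];
}

/* Old shape: modify, then look the variable up again to read it back. */
static long add_return_two_lookups(long val)
{
    *this_cpu_ptr_sketch() += val;
    return *this_cpu_ptr_sketch();
}

/* New shape: find the pointer once, then operate on that pointer. */
static long add_return_one_lookup(long val)
{
    long *p = this_cpu_ptr_sketch();

    *p += val;
    return *p;
}
```

Both helpers return the same value; the second one simply performs the
address lookup a single time.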

It's also possible to mark things as restrict and play those kinds of
games, but that can require larger and arch-specific changes.

On powerpc, __this_cpu_inc_return compiles to:

        ld 10,48(13)
        ldx 9,3,10
        addi 9,9,1
        stdx 9,3,10
        ld 9,48(13)
        ldx 3,9,3

With this patch it compiles to:

        ld 10,48(13)
        ldx 9,3,10
        addi 9,9,1
        stdx 9,3,10

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
To: Tejun Heo <tj@kernel.org>
To: Christoph Lameter <cl@linux.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Signed-off-by: Tejun Heo <tj@kernel.org>
include/asm-generic/percpu.h