From dc322a99d31fff5d3f8acfa061ad033953efdebe Mon Sep 17 00:00:00 2001 From: Christoph Lameter Date: Mon, 7 Apr 2014 15:39:38 -0700 Subject: [PATCH] mm: use raw_cpu ops for determining current NUMA node With the preempt checking logic for __this_cpu_ops we will get false positives from locations in the code that use numa_node_id. Before the __this_cpu ops where introduced there were no checks for preemption present either. smp_raw_processor_id() was used. See http://www.spinics.net/lists/linux-numa/msg00641.html Therefore we need to use raw_cpu_read here to avoid false postives. Note that this issue has been discussed in prior years. If the process changes nodes after retrieving the current numa node then that is acceptable since most uses of numa_node etc are for optimization and not for correctness. There were suggestions to implement a raw_numa_node_id in order to do preempt checks for numa_node_id as well. But I think we better defer that to another patch since that would mean investigating how numa_node_id() is used throughout the kernel which would increase the scope of this patchset significantly. After all preemption was never checked before when numa_node_id() was used. Some sample traces: __this_cpu_read operation in preemptible [00000000] code: login/1456 caller is __this_cpu_preempt_check+0x2b/0x2d CPU: 0 PID: 1456 Comm: login Not tainted 3.12.0-rc4-cl-00062-g2fe80d3-dirty #185 Call Trace: dump_stack+0x4e/0x82 check_preemption_disabled+0xc5/0xe0 __this_cpu_preempt_check+0x2b/0x2d get_task_policy+0x1d/0x49 get_vma_policy+0x14/0x76 alloc_pages_vma+0x35/0xff handle_mm_fault+0x290/0x73b __do_page_fault+0x3fe/0x44d do_page_fault+0x9/0xc page_fault+0x22/0x30 generic_file_aio_read+0x38e/0x624 do_sync_read+0x54/0x73 vfs_read+0x9d/0x12a SyS_read+0x47/0x7e cstar_dispatch+0x7/0x23 caller is __this_cpu_preempt_check+0x2b/0x2d CPU: 0 PID: 1456 Comm: login Not tainted 3.12.0-rc4-cl-00062-g2fe80d3-dirty #185 Call Trace: dump_stack+0x4e/0x82 check_preemption_disabled+0xc5/0xe0 __this_cpu_preempt_check+0x2b/0x2d alloc_pages_current+0x8f/0xbc __page_cache_alloc+0xb/0xd __do_page_cache_readahead+0xf4/0x219 ra_submit+0x1c/0x20 ondemand_readahead+0x28c/0x2b4 page_cache_sync_readahead+0x38/0x3a generic_file_aio_read+0x261/0x624 do_sync_read+0x54/0x73 vfs_read+0x9d/0x12a SyS_read+0x47/0x7e cstar_dispatch+0x7/0x23 Signed-off-by: Christoph Lameter Acked-by: Ingo Molnar Cc: Alex Shi Cc: Tejun Heo Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/topology.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/linux/topology.h b/include/linux/topology.h index 12ae6ce997d6..7062330a1329 100644 --- a/include/linux/topology.h +++ b/include/linux/topology.h @@ -188,7 +188,7 @@ DECLARE_PER_CPU(int, numa_node); /* Returns the number of the current Node. */ static inline int numa_node_id(void) { - return __this_cpu_read(numa_node); + return raw_cpu_read(numa_node); } #endif @@ -245,7 +245,7 @@ static inline void set_numa_mem(int node) /* Returns the number of the nearest Node with memory */ static inline int numa_mem_id(void) { - return __this_cpu_read(_numa_mem_); + return raw_cpu_read(_numa_mem_); } #endif -- 2.20.1