thp: fix MADV_DONTNEED vs. numa balancing race
authorKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Thu, 13 Apr 2017 21:56:20 +0000 (14:56 -0700)
committerLinus Torvalds <torvalds@linux-foundation.org>
Fri, 14 Apr 2017 01:24:20 +0000 (18:24 -0700)
commitced108037c2aa542b3ed8b7afd1576064ad1362a
tree9bbfcea14b7f7098f667e213a6cc5851e7adc105
parent0a85e51d37645e9ce57e5e1a30859e07810ed07c
thp: fix MADV_DONTNEED vs. numa balancing race

In case prot_numa, we are under down_read(mmap_sem).  It's critical to
not clear pmd intermittently to avoid race with MADV_DONTNEED which is
also under down_read(mmap_sem):

CPU0: CPU1:
change_huge_pmd(prot_numa=1)
 pmdp_huge_get_and_clear_notify()
madvise_dontneed()
 zap_pmd_range()
  pmd_trans_huge(*pmd) == 0 (without ptl)
  // skip the pmd
 set_pmd_at();
 // pmd is re-established

The race makes MADV_DONTNEED miss the huge pmd and don't clear it
which may break userspace.

Found by code analysis, never saw triggered.

Link: http://lkml.kernel.org/r/20170302151034.27829-3-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mm/huge_memory.c