x86: Flush TLB if PGD entry is changed in i386 PAE mode
author     Shaohua Li <shaohua.li@intel.com>
           Wed, 16 Mar 2011 03:37:29 +0000 (11:37 +0800)
committer  Ingo Molnar <mingo@elte.hu>
           Fri, 18 Mar 2011 10:44:01 +0000 (11:44 +0100)

According to the Intel CPU manual, every time a PGD entry is changed in
i386 PAE mode, we need to do a full TLB flush. The current code follows
this rule, and there is a comment about it in the code.
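
As a rough illustration of how the pre-patch pud_clear() applies this rule,
the range test it performs before reloading cr3 can be modelled in user
space. The helper name, the constants, and the assumption that cr3 holds a
plain 32-byte-aligned table address with no flag bits are all hypothetical
simplifications, not kernel API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PTRS_PER_PGD 4u  /* PAE: four top-level entries */
#define PGD_ENTRY_SZ 8u  /* assumed sizeof(pgd_t) with 64-bit PAE entries */

/*
 * Hypothetical model of the removed check: is the entry being cleared
 * located inside the top-level table that cr3 currently points at?
 */
bool entry_in_current_pgd(uint64_t entry_pa, uint64_t cr3)
{
    uint64_t span = (uint64_t)PGD_ENTRY_SZ * PTRS_PER_PGD; /* 32 bytes */

    /* Model assumption: cr3 is the bare table address, no flag bits. */
    assert((cr3 & (span - 1)) == 0);

    return entry_pa >= cr3 && entry_pa < cr3 + span;
}
```

The test only looks at the cr3 of the CPU executing the clear, so it says
nothing about other CPUs running the same mm.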

But the current code misses the multi-threaded case. A changed page table
might be in use by several CPUs, and every such CPU should flush its TLB.
Usually this isn't a problem, because we prepopulate all PGD entries at
process fork. But when the process does an munmap followed by a new mmap,
this issue is triggered.

When it happens, some CPUs keep doing page faults:

  http://marc.info/?l=linux-kernel&m=129915020508238&w=2
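
The failure mode described above can be sketched with a toy user-space
model. It assumes each CPU keeps its own copy of the four top-level entries,
refreshed only when that CPU reloads cr3. All names here (struct cpu,
reload_cr3, set_entry, run_model) are hypothetical and exist only for this
illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS      2
#define PTRS_PER_PGD 4  /* PAE: four top-level entries */

struct mm { uint64_t pgd[PTRS_PER_PGD]; };   /* in-memory top-level table */

struct cpu {
    const struct mm *active_mm;              /* mm this CPU is running */
    uint64_t latched[PTRS_PER_PGD];          /* copy taken at last cr3 load */
};

static struct cpu cpus[NR_CPUS];

/* Models write_cr3(): this CPU re-reads all top-level entries. */
static void reload_cr3(struct cpu *c)
{
    for (int i = 0; i < PTRS_PER_PGD; i++)
        c->latched[i] = c->active_mm->pgd[i];
}

/* Models flush_tlb_mm(): every CPU running this mm reloads. */
static void flush_all_users(const struct mm *mm)
{
    for (int i = 0; i < NR_CPUS; i++)
        if (cpus[i].active_mm == mm)
            reload_cr3(&cpus[i]);
}

/* Update one entry from CPU `self`, flushing locally or on all users. */
static void set_entry(struct mm *mm, int self, int idx, uint64_t val,
                      bool flush_mm)
{
    assert(idx >= 0 && idx < PTRS_PER_PGD);
    mm->pgd[idx] = val;
    if (flush_mm)
        flush_all_users(mm);            /* post-patch behaviour */
    else if (cpus[self].active_mm == mm)
        reload_cr3(&cpus[self]);        /* pre-patch: local CPU only */
}

/* Count CPUs whose latched copy of entry idx disagrees with memory. */
static int stale_cpus(const struct mm *mm, int idx)
{
    int n = 0;
    for (int i = 0; i < NR_CPUS; i++)
        if (cpus[i].active_mm == mm && cpus[i].latched[idx] != mm->pgd[idx])
            n++;
    return n;
}

/* munmap clears entry 1, then a fault on CPU 0 repopulates it. */
int run_model(bool flush_mm)
{
    static struct mm mm;

    for (int i = 0; i < PTRS_PER_PGD; i++)
        mm.pgd[i] = (0x1000u * (uint64_t)(i + 1)) | 1;  /* "present" */
    for (int i = 0; i < NR_CPUS; i++) {
        cpus[i].active_mm = &mm;
        reload_cr3(&cpus[i]);
    }

    mm.pgd[1] = 0;                /* munmap path clears the entry ... */
    flush_all_users(&mm);         /* ... and is followed by a full flush */

    set_entry(&mm, 0, 1, 0x9000u | 1, flush_mm);  /* new mmap + fault */
    return stale_cpus(&mm, 1);
}
```

In this model run_model(false) returns 1: CPU 1 still holds the cleared
entry and would keep faulting on that range, which is consistent with the
report above. run_model(true) returns 0 because every CPU using the mm
reloads.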

Reported-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Tested-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Cc: Mallick Asit K <asit.k.mallick@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm <linux-mm@kvack.org>
Cc: stable <stable@kernel.org>
LKML-Reference: <1300246649.2337.95.camel@sli10-conroe>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
arch/x86/include/asm/pgtable-3level.h
arch/x86/mm/pgtable.c

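diff --git a/arch/x86/include/asm/pgtable-3level.h b/arch/x86/include/asm/pgtable-3level.h
index 94b979d1b58dbcef07dc7b02c3e9b1e94e71926d..effff47a3c8280fe4d0b5979c433a8400d570129 100644
--- a/arch/x86/include/asm/pgtable-3level.h
+++ b/arch/x86/include/asm/pgtable-3level.h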
@@ -69,8 +69,6 @@ static inline void native_pmd_clear(pmd_t *pmd)
 
 static inline void pud_clear(pud_t *pudp)
 {
-       unsigned long pgd;
-
        set_pud(pudp, __pud(0));
 
        /*
@@ -79,13 +77,10 @@ static inline void pud_clear(pud_t *pudp)
         * section 8.1: in PAE mode we explicitly have to flush the
         * TLB via cr3 if the top-level pgd is changed...
         *
-        * Make sure the pud entry we're updating is within the
-        * current pgd to avoid unnecessary TLB flushes.
+        * Currently all places where pud_clear() is called either have
+        * flush_tlb_mm() followed or don't need TLB flush (x86_64 code or
+        * pud_clear_bad()), so we don't need TLB flush here.
         */
-       pgd = read_cr3();
-       if (__pa(pudp) >= pgd && __pa(pudp) <
-           (pgd + sizeof(pgd_t)*PTRS_PER_PGD))
-               write_cr3(pgd);
 }
 
 #ifdef CONFIG_SMP
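diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 0113d19c8aa60985764dfce5bcc4488c8ee57c5c..8573b83a63d037bf2f7eb09c8922b94c2b535c45 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c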
@@ -168,8 +168,7 @@ void pud_populate(struct mm_struct *mm, pud_t *pudp, pmd_t *pmd)
         * section 8.1: in PAE mode we explicitly have to flush the
         * TLB via cr3 if the top-level pgd is changed...
         */
-       if (mm == current->active_mm)
-               write_cr3(read_cr3());
+       flush_tlb_mm(mm);
 }
 #else  /* !CONFIG_X86_PAE */