stop_machine: Mark per cpu stopper enabled early
author Thomas Gleixner <tglx@linutronix.de>
Tue, 26 Feb 2013 17:44:33 +0000 (18:44 +0100)
committer Thomas Gleixner <tglx@linutronix.de>
Tue, 26 Feb 2013 21:25:17 +0000 (22:25 +0100)
commit 14e568e78 (stop_machine: Use smpboot threads) introduced the
following regression:

Before this commit the stopper enabled bit was set in the online
notifier.

CPU0                            CPU1
cpu_up
                                cpu online
hotplug_notifier(ONLINE)
  stopper(CPU1)->enabled = true;
...
stop_machine()

The conversion to smpboot threads moved the enablement to the wakeup
path of the parked thread. The majority of users seem to have the
following working order:

CPU0                            CPU1
cpu_up
                                cpu online
unpark_threads()
  wakeup(stopper[CPU1])
....
                                stopper thread runs
                                  stopper(CPU1)->enabled = true;
stop_machine()

But Konrad and Sander have observed:

CPU0                            CPU1
cpu_up
                                cpu online
unpark_threads()
  wakeup(stopper[CPU1])
....
stop_machine()
                                stopper thread runs
                                  stopper(CPU1)->enabled = true;

Now the stop machinery kicks CPU0 into the stop loop, where it gets
stuck forever: the queueing code saw stopper(CPU1)->enabled == false
and discarded the CPU1 stopper work, so CPU1 never enters
stomp_machine while CPU0 keeps waiting for it.

Add a pre_unpark function to the smpboot thread descriptor, call it
before waking the thread, and use it to mark the stopper enabled.

This fixes the problem at hand, but the stop_machine code should be
more robust. The stopper->enabled flag smells fishy at best.

Thanks to Konrad for going through a loop of debug patches and
providing the information to decode this issue.

Reported-and-tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reported-and-tested-by: Sander Eikelenboom <linux@eikelenboom.it>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1302261843240.22263@ionos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
include/linux/smpboot.h
kernel/smpboot.c
kernel/stop_machine.c

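diff --git a/include/linux/smpboot.h b/include/linux/smpboot.h
index c65dee059913c8d429a614bf3c9b16e088369216..13e929679550ab4967b72a73e544dbf531f74004 100644
--- a/include/linux/smpboot.h
+++ b/include/linux/smpboot.h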
@@ -24,6 +24,9 @@ struct smpboot_thread_data;
  *                     parked (cpu offline)
  * @unpark:            Optional unpark function, called when the thread is
  *                     unparked (cpu online)
+ * @pre_unpark:        Optional unpark function, called before the thread is
+ *                     unparked (cpu online). This is not guaranteed to be
+ *                     called on the target cpu of the thread. Careful!
  * @selfparking:       Thread is not parked by the park function.
  * @thread_comm:       The base name of the thread
  */
@@ -37,6 +40,7 @@ struct smp_hotplug_thread {
        void                            (*cleanup)(unsigned int cpu, bool online);
        void                            (*park)(unsigned int cpu);
        void                            (*unpark)(unsigned int cpu);
+       void                            (*pre_unpark)(unsigned int cpu);
        bool                            selfparking;
        const char                      *thread_comm;
 };
diff --git a/kernel/smpboot.c b/kernel/smpboot.c
index d4abac261779e037b2396b80f5b4fe3ec6efad71..8eaed9aa9cf0c1995520605af1de8ef3b9e95485 100644
--- a/kernel/smpboot.c
+++ b/kernel/smpboot.c
@@ -209,6 +209,8 @@ static void smpboot_unpark_thread(struct smp_hotplug_thread *ht, unsigned int cp
 {
        struct task_struct *tsk = *per_cpu_ptr(ht->store, cpu);
 
+       if (ht->pre_unpark)
+               ht->pre_unpark(cpu);
        kthread_unpark(tsk);
 }
 
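diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 95d178c62d5a8537c18fa2d6f1d947aed1a93448..c09f2955ae3055b42f1edde601ee1eb431bfc18a 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c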
@@ -336,7 +336,7 @@ static struct smp_hotplug_thread cpu_stop_threads = {
        .create                 = cpu_stop_create,
        .setup                  = cpu_stop_unpark,
        .park                   = cpu_stop_park,
-       .unpark                 = cpu_stop_unpark,
+       .pre_unpark             = cpu_stop_unpark,
        .selfparking            = true,
 };