stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlock
author Peter Zijlstra <peterz@infradead.org>
Fri, 20 Apr 2018 09:50:05 +0000 (11:50 +0200)
committer Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Wed, 20 Jun 2018 19:02:53 +0000 (04:02 +0900)
commit e7a65e899d521eba676a1e652f9bf48d67722a18
tree 6645defaf55b76d2cbf17296284638adea2847f6
parent a814d1101042d117069de12ba6a0ba152ac3f79a

[ Upstream commit 0b26351b910fb8fe6a056f8a1bbccabe50c0e19f ]

Matt reported the following deadlock:

  CPU0                                  CPU1

  schedule(.prev=migrate/0)             <fault>
    pick_next_task()                      ...
      idle_balance()                        migrate_swap()
        active_balance()                      stop_two_cpus()
                                                spin_lock(stopper0->lock)
                                                spin_lock(stopper1->lock)
                                                ttwu(migrate/0)
                                                  smp_cond_load_acquire() -- waits for schedule()
          stop_one_cpu(1)
            spin_lock(stopper1->lock) -- waits for stopper lock

Fix this deadlock by taking the wakeups out from under stopper->lock.
This allows the active_balance() to queue the stop work and finish the
context switch, which in turn allows the wakeup from migrate_swap() to
observe the context and complete the wakeup.
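
The pattern described above (queue under the lock, wake only after the
lock is dropped) can be illustrated with a small user-space toy model.
This is a sketch under stated assumptions, not the code in
kernel/stop_machine.c: every toy_* name is hypothetical, the lock is
modeled by a plain boolean, and a "wakeup" is just a counter increment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_WAKE_MAX 2

/* Hypothetical stand-in for a per-CPU stopper; not the kernel's struct. */
struct toy_stopper {
	bool lock_held;   /* models stopper->lock */
	int  nr_queued;   /* work items queued for this stopper */
	int  nr_wakeups;  /* wakeups delivered to its thread */
};

/* Wakeups collected while locked, issued only after unlocking. */
struct toy_wake_list {
	struct toy_stopper *pending[TOY_WAKE_MAX];
	size_t count;
};

static void toy_lock(struct toy_stopper *s)
{
	assert(!s->lock_held);
	s->lock_held = true;
}

static void toy_unlock(struct toy_stopper *s)
{
	assert(s->lock_held);
	s->lock_held = false;
}

/* A wakeup may have to wait for another CPU, so it must not run locked. */
static void toy_wake(struct toy_stopper *s)
{
	assert(!s->lock_held);
	s->nr_wakeups++;
}

static void toy_wake_list_add(struct toy_wake_list *wl, struct toy_stopper *s)
{
	assert(wl->count < TOY_WAKE_MAX);
	wl->pending[wl->count++] = s;
}

static void toy_wake_list_flush(struct toy_wake_list *wl)
{
	for (size_t i = 0; i < wl->count; i++)
		toy_wake(wl->pending[i]);
	wl->count = 0;
}

/* Single-stopper case: only the queueing happens under the lock. */
static void toy_queue_work(struct toy_stopper *s)
{
	struct toy_wake_list wl = { .count = 0 };

	toy_lock(s);
	s->nr_queued++;
	toy_wake_list_add(&wl, s);  /* remember the wakeup, do not issue it */
	toy_unlock(s);

	toy_wake_list_flush(&wl);   /* wakeup runs with the lock dropped */
}

/* Two-stopper case: both locks are released before any wakeup. */
static void toy_queue_work_two(struct toy_stopper *a, struct toy_stopper *b)
{
	struct toy_wake_list wl = { .count = 0 };

	toy_lock(a);
	toy_lock(b);
	a->nr_queued++;
	b->nr_queued++;
	toy_wake_list_add(&wl, a);
	toy_wake_list_add(&wl, b);
	toy_unlock(b);
	toy_unlock(a);

	toy_wake_list_flush(&wl);
}
```

In this model the assertion in toy_wake() fails if a wakeup is ever
issued while the stopper's lock is held, which is the situation the
deadlock trace above depends on.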

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reported-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180420095005.GH4064@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
kernel/stop_machine.c