sched/core: Rework TASK_DEAD preemption exception
authorPeter Zijlstra <peterz@infradead.org>
Mon, 28 Sep 2015 16:02:03 +0000 (18:02 +0200)
committerIngo Molnar <mingo@kernel.org>
Tue, 6 Oct 2015 15:08:13 +0000 (17:08 +0200)
TASK_DEAD is special in that the final schedule call from do_exit()
must be done with preemption disabled.

This means we end up scheduling with a preempt_count() higher than
usual (3 instead of the 'expected' 2).

Since future patches will want to rely on an invariant
preempt_count() value during schedule, fix this up.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
kernel/sched/core.c

index 88a425443ff4afeb678610b4870fd5ccb7b181ad..530fe8baa6450b7f69ae538e231502c1facd44cd 100644 (file)
@@ -2949,12 +2949,8 @@ static inline void schedule_debug(struct task_struct *prev)
 #ifdef CONFIG_SCHED_STACK_END_CHECK
        BUG_ON(unlikely(task_stack_end_corrupted(prev)));
 #endif
-       /*
-        * Test if we are atomic. Since do_exit() needs to call into
-        * schedule() atomically, we ignore that path. Otherwise whine
-        * if we are scheduling when we should not.
-        */
-       if (unlikely(in_atomic_preempt_off() && prev->state != TASK_DEAD))
+
+       if (unlikely(in_atomic_preempt_off()))
                __schedule_bug(prev);
        rcu_sleep_check();
 
@@ -3053,6 +3049,17 @@ static void __sched __schedule(void)
        rcu_note_context_switch();
        prev = rq->curr;
 
+       /*
+        * do_exit() calls schedule() with preemption disabled as an exception;
+        * however we must fix that up, otherwise the next task will see an
+        * inconsistent (higher) preempt count.
+        *
+        * It also avoids the below schedule_debug() test from complaining
+        * about this.
+        */
+       if (unlikely(prev->state == TASK_DEAD))
+               preempt_enable_no_resched_notrace();
+
        schedule_debug(prev);
 
        if (sched_feat(HRTICK))