sched: Skip double execution of pick_next_task_fair()
author Peter Zijlstra <peterz@infradead.org>
Thu, 24 Apr 2014 10:00:47 +0000 (12:00 +0200)
committer Ingo Molnar <mingo@kernel.org>
Wed, 7 May 2014 09:51:35 +0000 (11:51 +0200)
Tim wrote:

 "The current code will call pick_next_task_fair a second time in the
  slow path if we did not pull any task in our first try.  This is
  really unnecessary as we already know no task can be pulled and it
  doubles the delay for the cpu to enter idle.

  We instrumented some network workloads and saw that
  pick_next_task_fair is frequently called twice before a cpu enters
  idle.  The call to pick_next_task_fair can add non-trivial latency as
  it calls load_balance, which runs find_busiest_group on a hierarchy of
  sched domains spanning the cpus for a large system.  For some 4 socket
  systems, we saw almost 0.25 msec spent per call of pick_next_task_fair
  before a cpu can be idled."

Optimize the second call away for the common case and document the
dependency.

Reported-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Len Brown <len.brown@intel.com>
Link: http://lkml.kernel.org/r/20140424100047.GP11096@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e62c65a12d5bc641b21bb62c1a84a10591de8976..28921ec91b3d4271eb3627a12f5bc420ab0de227 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2592,8 +2592,14 @@ pick_next_task(struct rq *rq, struct task_struct *prev)
        if (likely(prev->sched_class == class &&
                   rq->nr_running == rq->cfs.h_nr_running)) {
                p = fair_sched_class.pick_next_task(rq, prev);
-               if (likely(p && p != RETRY_TASK))
-                       return p;
+               if (unlikely(p == RETRY_TASK))
+                       goto again;
+
+               /* assumes fair_sched_class->next == idle_sched_class */
+               if (unlikely(!p))
+                       p = idle_sched_class.pick_next_task(rq, prev);
+
+               return p;
        }
 
 again:
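
The effect of the hunk above can be illustrated with a small userspace model. This is only a sketch, not kernel code: struct task, fair_pick(), idle_pick(), slow_path() and count_fair_calls() are hypothetical stand-ins invented for the illustration, and the fair class is assumed to find nothing to run (the case Tim describes). The slow path is modelled as walking just two classes, fair then idle, and ignores RETRY_TASK handling inside the loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct task_struct. */
struct task { const char *name; };

/* Sentinel meaning "restart the pick"; modelled after RETRY_TASK. */
#define RETRY_TASK ((struct task *)-1)

static struct task idle_task = { "idle" };
static int fair_calls;  /* how often the (expensive) fair pick ran */

/* Models fair_sched_class.pick_next_task() when nothing can be pulled. */
static struct task *fair_pick(void)
{
	fair_calls++;
	return NULL;
}

/* Models idle_sched_class.pick_next_task(): always has a task. */
static struct task *idle_pick(void)
{
	return &idle_task;
}

/* Models the "again:" loop, reduced to two classes: fair, then idle. */
static struct task *slow_path(void)
{
	struct task *p = fair_pick();

	return p ? p : idle_pick();
}

/* Fast path before the patch: NULL falls through to the slow path. */
static struct task *pick_old(void)
{
	struct task *p = fair_pick();

	if (p && p != RETRY_TASK)
		return p;
	return slow_path();
}

/* Fast path after the patch: NULL goes straight to the idle class. */
static struct task *pick_new(void)
{
	struct task *p = fair_pick();

	if (p == RETRY_TASK)
		return slow_path();
	/* assumes the idle class directly follows the fair class */
	if (!p)
		p = idle_pick();
	return p;
}

/* Returns the number of fair_pick() calls needed to reach idle. */
int count_fair_calls(int patched)
{
	struct task *p;

	fair_calls = 0;
	p = patched ? pick_new() : pick_old();
	assert(p == &idle_task);  /* both variants end up idle */
	return fair_calls;
}
```

In this model count_fair_calls(0) reports two calls to the fair pick and count_fair_calls(1) reports one, which is the saving the commit is after. The shortcut is only valid while the idle class is the immediate successor of the fair class, hence the comment added by the patch.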