tracing: Skip more functions when doing stack tracing of events
authorSteven Rostedt (Red Hat) <rostedt@goodmis.org>
Thu, 23 Jun 2016 18:03:47 +0000 (14:03 -0400)
committerSteven Rostedt <rostedt@goodmis.org>
Thu, 23 Jun 2016 22:48:56 +0000 (18:48 -0400)
 # echo 1 > options/stacktrace
 # echo 1 > events/sched/sched_switch/enable
 # cat trace
          <idle>-0     [002] d..2  1982.525169: <stack trace>
 => save_stack_trace
 => __ftrace_trace_stack
 => trace_buffer_unlock_commit_regs
 => event_trigger_unlock_commit
 => trace_event_buffer_commit
 => trace_event_raw_event_sched_switch
 => __schedule
 => schedule
 => schedule_preempt_disabled
 => cpu_startup_entry
 => start_secondary

The above shows that we are seeing 6 functions before ever making it to the
caller of the sched_switch event.

 # echo stacktrace > events/sched/sched_switch/trigger
 # cat trace
          <idle>-0     [002] d..3  2146.335208: <stack trace>
 => trace_event_buffer_commit
 => trace_event_raw_event_sched_switch
 => __schedule
 => schedule
 => schedule_preempt_disabled
 => cpu_startup_entry
 => start_secondary

The stacktrace trigger isn't as bad, because it adds its own skip to the
stacktracing, but still has two events extra.

One issue is that if the stacktrace passes its own "regs" then there should
be no addition to the skip, as the regs will not include the functions being
called. This was an issue that was fixed by commit 7717c6be6999 ("tracing:
Fix stacktrace skip depth in trace_buffer_unlock_commit_regs()" as adding
the skip number for kprobes made the probes not have any stack at all.

But since this is only an issue when regs is being used, a skip should be
added if regs is NULL. Now we have:

 # echo 1 > options/stacktrace
 # echo 1 > events/sched/sched_switch/enable
 # cat trace
          <idle>-0     [000] d..2  1297.676333: <stack trace>
 => __schedule
 => schedule
 => schedule_preempt_disabled
 => cpu_startup_entry
 => rest_init
 => start_kernel
 => x86_64_start_reservations
 => x86_64_start_kernel

 # echo stacktrace > events/sched/sched_switch/trigger
 # cat trace
          <idle>-0     [002] d..3  1370.759745: <stack trace>
 => __schedule
 => schedule
 => schedule_preempt_disabled
 => cpu_startup_entry
 => start_secondary

And kprobes are not touched.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
kernel/trace/trace.c

index 45e6747589c66d3409f9955b22bc9d769426336b..3d9f31b576f33c5fa2adf9a487d877dd5bf00a45 100644 (file)
@@ -2118,7 +2118,17 @@ void trace_buffer_unlock_commit_regs(struct trace_array *tr,
 {
        __buffer_unlock_commit(buffer, event);
 
-       ftrace_trace_stack(tr, buffer, flags, 0, pc, regs);
+       /*
+        * If regs is not set, then skip the following callers:
+        *   trace_buffer_unlock_commit_regs
+        *   event_trigger_unlock_commit
+        *   trace_event_buffer_commit
+        *   trace_event_raw_event_sched_switch
+        * Note, we can still get here via blktrace, wakeup tracer
+        * and mmiotrace, but that's ok if they lose a function or
+        * two. They are that meaningful.
+        */
+       ftrace_trace_stack(tr, buffer, flags, regs ? 0 : 4, pc, regs);
        ftrace_trace_userstack(buffer, flags, pc);
 }
 
@@ -2168,6 +2178,13 @@ static void __ftrace_trace_stack(struct ring_buffer *buffer,
        trace.nr_entries        = 0;
        trace.skip              = skip;
 
+       /*
+        * Add two, for this function and the call to save_stack_trace()
+        * If regs is set, then these functions will not be in the way.
+        */
+       if (!regs)
+               trace.skip += 2;
+
        /*
         * Since events can happen in NMIs there's no safe way to
         * use the per cpu ftrace_stacks. We reserve it and if an interrupt