Bart Van Assche [Sun, 5 Apr 2009 14:20:02 +0000 (16:20 +0200)]
branch tracer: Fix for enabling branch profiling makes sparse unusable
One of the changes between kernels 2.6.28 and 2.6.29 is that a branch profiler
has been added for if() statements. Unfortunately this patch makes the sparse
output unusable with CONFIG_TRACE_BRANCH_PROFILING=y: when branch profiling is
enabled, sparse prints so much false positives that the real issues are no
longer visible. This behavior can be reproduced as follows:
* enable CONFIG_TRACE_BRANCH_PROFILING, e.g. by running make allyesconfig or
make allmodconfig.
* run make C=2
Result: a huge number of the following sparse warnings.
...
include/linux/cpumask.h:547:2: warning: symbol '______r' shadows an earlier one
include/linux/cpumask.h:547:2: originally declared here
...
The patch below fixes this by disabling branch profiling while analyzing the
kernel code with sparse.
See also:
* http://lkml.org/lkml/2008/11/21/18
* http://bugzilla.kernel.org/show_bug.cgi?id=12925
Signed-off-by: Bart Van Assche <bart.vanassche@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Steven Rostedt <srostedt@redhat.com>
LKML-Reference: <
200904051620.02311.bart.vanassche@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Zhaolei [Fri, 3 Apr 2009 10:24:46 +0000 (18:24 +0800)]
ftrace: Correct a text align for event format output
If we cat debugfs/tracing/events/ftrace/bprint/format, we'll see:
name: bprint
ID: 6
format:
field:unsigned char common_type; offset:0; size:1;
field:unsigned char common_flags; offset:1; size:1;
field:unsigned char common_preempt_count; offset:2; size:1;
field:int common_pid; offset:4; size:4;
field:int common_tgid; offset:8; size:4;
field:unsigned long ip; offset:12; size:4;
field:char * fmt; offset:16; size:4;
field: char buf; offset:20; size:0;
print fmt: "%08lx (%d) fmt:%p %s"
There is an inconsistent blank before char buf.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
LKML-Reference: <
49D5E3EE.70201@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Nikanth Karthikesan [Mon, 23 Mar 2009 06:28:31 +0000 (11:58 +0530)]
Update /debug/tracing/README
Some of the tracers have been renamed, which was not updated in the in-kernel
run-time README file. Update it.
Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
LKML-Reference: <
200903231158.32151.knikanth@suse.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Frederic Weisbecker [Wed, 1 Apr 2009 20:53:08 +0000 (22:53 +0200)]
tracing/ftrace: alloc the started cpumask for the trace file
Impact: fix a crash while cat trace file
Currently we are using a cpumask to remind each cpu where a
trace occured. It lets us notice the user that a cpu just had
its first trace.
But on latest -tip we have the following crash once we cat the trace
file:
IP: [<
c0270c4a>] print_trace_fmt+0x45/0xe7
*pde =
00000000
Oops: 0000 [#1] PREEMPT SMP
last sysfs file: /sys/class/net/eth0/carrier
Pid: 3897, comm: cat Not tainted (
2.6.29-tip-02825-g0f22972-dirty #81)
EIP: 0060:[<
c0270c4a>] EFLAGS:
00010297 CPU: 0
EIP is at print_trace_fmt+0x45/0xe7
EAX:
00000000 EBX:
00000000 ECX:
c12d9e98 EDX:
ccdb7010
ESI:
d31f4000 EDI:
00322401 EBP:
d31f3f10 ESP:
d31f3efc
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process cat (pid: 3897, ti=
d31f2000 task=
d3b3cf20 task.ti=
d31f2000)
Stack:
d31f4080 ccdb7010 d31f4000 d691fe70 ccdb7010 d31f3f24 c0270e5c d31f4000
d691fe70 d31f4000 d31f3f34 c02718e8 c12d9e98 d691fe70 d31f3f70 c02bfc33
00001000 09130000 d3b46e00 d691fe98 00000000 00000079 00000001 00000000
Call Trace:
[<
c0270e5c>] ? print_trace_line+0x170/0x17c
[<
c02718e8>] ? s_show+0xa7/0xbd
[<
c02bfc33>] ? seq_read+0x24a/0x327
[<
c02bf9e9>] ? seq_read+0x0/0x327
[<
c02ab18b>] ? vfs_read+0x86/0xe1
[<
c02ab289>] ? sys_read+0x40/0x65
[<
c0202d8f>] ? sysenter_do_call+0x12/0x3c
Code: 00 00 00 89 45 ec f7 c7 00 20 00 00 89 55 f0 74 4e f6 86 98 10 00 00 02 74 45 8b 86 8c 10 00 00 8b 9e a8 10 00 00 e8 52 f3 ff ff <0f> a3 03 19 c0 85 c0 75 2b 8b 86 8c 10 00 00 8b 9e a8 10 00 00
EIP: [<
c0270c4a>] print_trace_fmt+0x45/0xe7 SS:ESP 0068:
d31f3efc
CR2:
0000000000000000
---[ end trace
aa9cf38e5ebed9dd ]---
This is because we alloc the iter->started cpumask on tracing_pipe_open but
not on tracing_open.
It hadn't been noticed until now because we need to have ring buffer overruns
to activate the starting of cpu buffer detection.
Also, we need a check to not print the messagge for the first trace on the file.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <
1238619188-6109-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Huang Weiyi [Tue, 31 Mar 2009 12:41:31 +0000 (20:41 +0800)]
tracing, x86: remove duplicated #include
Remove duplicated #include in arch/x86/kernel/ftrace.c.
Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
LKML-Reference: <
1238503291-2532-1-git-send-email-weiyi.huang@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Zhaolei [Tue, 31 Mar 2009 07:24:51 +0000 (15:24 +0800)]
ftrace: Add check of sched_stopped for probe_sched_wakeup
The wakeup tracing in sched_switch does not stop when a user
disables tracing. This is because the probe_sched_wakeup() is missing
the check to prevent the wakeup from being traced.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
LKML-Reference: <
49D1C543.
3010307@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Thu, 26 Mar 2009 00:55:00 +0000 (20:55 -0400)]
function-graph: add proper initialization for init task
Impact: fix to crash going to kexec
The init task did not properly initialize the function graph pointers.
Altough these pointers are NULL, they can not be assumed to be NULL
for the init task, and must still be properly initialize.
This usually is not an issue since a problem only arises when a task
exits, and the init tasks do not usually exit. But when doing tests
with kexec, the init tasks do exit, and the bug appears.
This patch properly initializes the init tasks function graph data
structures.
Reported-and-Tested-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <alpine.DEB.2.00.
0903252053080.5675@gandalf.stny.rr.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Frederic Weisbecker [Fri, 27 Mar 2009 13:22:10 +0000 (14:22 +0100)]
tracing/ftrace: fix missing include string.h
Building a kernel with tracing can raise the following warning on
tip/master:
kernel/trace/trace.c:1249: error: implicit declaration of function 'vbin_printf'
We are missing an include to string.h
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <
1238160130-7437-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Lai Jiangshan [Mon, 30 Mar 2009 05:48:00 +0000 (13:48 +0800)]
tracing: fix incorrect return type of ns2usecs()
Impact: fix time output bug in 32bits system
ns2usecs() returns 'long', it's incorrect.
(In i386)
...
<idle>-0 [000] 521.442100: _spin_lock <-tick_do_update_jiffies64
<idle>-0 [000] 521.442101: do_timer <-tick_do_update_jiffies64
<idle>-0 [000] 521.442102: update_wall_time <-do_timer
<idle>-0 [000] 521.442102: update_xtime_cache <-update_wall_time
....
(It always print the time less than 2200 seconds besides ...)
Because 'long' is 32bits in i386. ( (1<<31) useconds is about 2200 seconds)
...
<idle>-0 [001]
4154502640.134759: rcu_bh_qsctr_inc <-__do_softirq
<idle>-0 [001]
4154502640.134760: _local_bh_enable <-__do_softirq
<idle>-0 [001]
4154502640.134761: idle_cpu <-irq_exit
...
(very large value)
Because 'long' is a signed type and it is 32bits in i386.
Changes in v2:
return 'unsigned long long' instead of 'cycle_t'
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
LKML-Reference: <
49D05D10.
4030009@cn.fujitsu.com>
Reported-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Fri, 3 Apr 2009 15:12:23 +0000 (11:12 -0400)]
tracing: remove CALLER_ADDR2 from wakeup tracer
Maneesh Soni was getting a crash when running the wakeup tracer.
We debugged it down to the recording of the function with the
CALLER_ADDR2 macro. This is used to get the location of the caller
to schedule.
But the problem comes when schedule is called by assmebly. In the case
that Maneesh had, retint_careful would call schedule. But retint_careful
does not set up a proper frame pointer. CALLER_ADDR2 is defined as
__builtin_return_address(2). This produces the following assembly in
the wakeup tracer code.
mov 0x0(%rbp),%rcx <--- get the frame pointer of the caller
mov %r14d,%r8d
mov 0xf2de8e(%rip),%rdi
mov 0x8(%rcx),%rsi <-- this is __builtin_return_address(1)
mov 0x28(%rdi,%rax,8),%rbx
mov (%rcx),%rax <-- get the frame pointer of the caller's caller
mov %r12,%rcx
mov 0x8(%rax),%rdx <-- this is __builtin_return_address(2)
At the reading of 0x8(%rax) Maneesh's machine would take a fault.
The reason is that retint_careful did not set up the return address
and the content of %rax here was zero.
To verify this, I sent Maneesh a patch to create a frame pointer
in retint_careful. He ran the test again but this time he would take
the same type of fault from sysret_careful. The retint_careful was no
longer an issue, but there are other callers that still have issues.
Instead of adding frame pointers for all callers to schedule (in possibly
all archs), it is much safer to simply not use CALLER_ADDR2. This
loses out on knowing what called schedule, but the function tracer
will help there if needed.
Reported-by: Maneesh Soni <maneesh@in.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Tue, 7 Apr 2009 11:40:49 +0000 (13:40 +0200)]
Merge branch 'tracing/blktrace-fixes' into tracing/urgent
Merge reason: this used to be a tracing/blktrace-v2 devel topic still
cooking during the merge window - has propagated to fixes
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Li Zefan [Thu, 2 Apr 2009 05:43:26 +0000 (13:43 +0800)]
blktrace: fix pdu_len when tracing packet command requests
Impact: output all of packet commands - not just the first 4 / 8 bytes
Since commit
d7e3c3249ef23b4617393c69fe464765b4ff1645 ("block: add
large command support"), struct request->cmd has been changed from
unsinged char cmd[BLK_MAX_CDB] to unsigned char *cmd.
v1 -> v2: by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
- make sure rq->cmd_len is always intialized, and then we can use
rq->cmd_len instead of BLK_MAX_CDB.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
LKML-Reference: <
49D4507E.
2060602@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Li Zefan [Fri, 3 Apr 2009 07:31:34 +0000 (15:31 +0800)]
blktrace: small cleanup in blk_msg_write()
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: "Alan D. Brunelle" <alan.brunelle@hp.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
LKML-Reference: <
49D5BB56.
7000807@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Carl Henrik Lunde [Fri, 3 Apr 2009 12:27:15 +0000 (14:27 +0200)]
blktrace: NUL-terminate user space messages
Impact: fix corrupted blkparse output
Make sure messages from user space are NUL-terminated strings,
otherwise we could dump random memory to the block trace file.
Additionally, I've limited the message to BLK_TN_MAX_MSG-1
characters, because the last character would be stripped by
vscnprintf anyway.
Signed-off-by: Carl Henrik Lunde <chlunde@ping.uio.no>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: "Alan D. Brunelle" <alan.brunelle@hp.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <
20090403122714.GT5178@kernel.dk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Lai Jiangshan [Mon, 30 Mar 2009 06:50:04 +0000 (14:50 +0800)]
tracing: move scripts/trace/power.pl to scripts/tracing/power.pl
Impact: Cleanup
We use scripts/tracing/ to contain tracing scripts.
Use one directory only instead of two.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Steven Rostedt <srostedt@redhat.com>
LKML-Reference: <
49D06B9C.
3070209@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Tue, 31 Mar 2009 15:46:40 +0000 (17:46 +0200)]
Merge branches 'tracing/docs', 'tracing/filters', 'tracing/ftrace', 'tracing/kprobes', 'tracing/blktrace-v2' and 'tracing/textedit' into tracing/core-v2
Li Zefan [Fri, 27 Mar 2009 02:21:00 +0000 (10:21 +0800)]
trace: make argument 'mem' of trace_seq_putmem() const
Impact: fix build warning
I passed a const value to trace_seq_putmem(), and I got compile warning.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Eduard - Gabriel Munteanu [Mon, 23 Mar 2009 13:12:23 +0000 (15:12 +0200)]
tracing: add missing 'extern' keywords to trace_output.h
Impact: cleanup
Many declarations within trace_output.h are missing the 'extern' keyword
in an inconsistent manner. This adds 'extern' where it should be.
Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Eduard - Gabriel Munteanu [Mon, 23 Mar 2009 13:12:22 +0000 (15:12 +0200)]
tracing: provide trace_seq_reserve()
trace_seq_reserve() allows a caller to reserve space in a trace_seq and
write directly into it. This makes it easier to export binary data to
userspace via the tracing interface, by simply filling in a struct.
Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Li Zefan [Fri, 27 Mar 2009 02:21:54 +0000 (10:21 +0800)]
blktrace: print out BLK_TN_MESSAGE properly
Impact: improve ftrace plugin output
Before this patch:
# cat trace
make-5383 [001] 741.240059: 8,7 P N [make]
__trace_note_message: cfq1074
# echo 1 > options/blk_classic
# cat trace
8,7 1 0.
692221252 0 C W
130411392 + 1024 [0]
Bad pc action 6361
Bad pc action 283d
# echo 0 > options/blk_classic
# echo bin > trace_options
# cat trace_pipe | blkparse -i -
(can't parse messages generated by blk_add_trace_msg())
After this patch:
# cat trace
<idle>-0 [001] 187.600933: 8,7 C W
145220224 + 8 [0]
<idle>-0 [001] 187.600946: 8,7 m N cfq1076 complete
# echo 1 > options/blk_classic
# cat trace
8,7 1 0.
256378996 238 I W
113190728 + 8 [pdflush]
8,7 1 0.
256378998 238 m N cfq1076 insert_request
# echo 0 > options/blk_classic
# echo bin > trace_options
# cat trace_pipe | blkparse -i -
8,7 1 0 22.
973250293 0 C W
102770576 + 8 [0]
8,7 1 0 22.
973259213 0 m N cfq1076 complete
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Li Zefan [Fri, 27 Mar 2009 02:21:23 +0000 (10:21 +0800)]
blktrace: extract duplidate code
Impact: cleanup
blk_trace_event_print() and blk_tracer_print_line() share most of the code.
text data bss dec hex filename
8605 393 12 9010 2332 kernel/trace/blktrace.o.orig
text data bss dec hex filename
8555 393 12 8960 2300 kernel/trace/blktrace.o
This patch also prepares for the next patch, that prints out BLK_TN_MESSAGE.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Li Zefan [Fri, 27 Mar 2009 02:20:24 +0000 (10:20 +0800)]
blktrace: fix memory leak when freeing struct blk_io_trace
Impact: fix mixed ioctl and ftrace-plugin blktrace use memory leak
When mixing the use of ioctl-based blktrace and ftrace-based blktrace,
we can leak memory in this way:
# btrace /dev/sda > /dev/null &
# echo 0 > /sys/block/sda/sda1/trace/enable
now we leak bt->dropped_file, bt->msg_file, bt->rchan...
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Li Zefan [Fri, 27 Mar 2009 02:20:09 +0000 (10:20 +0800)]
blktrace: fix blk_probes_ref chaos
Impact: fix mixed ioctl and ftrace-plugin blktrace use refcount bugs
ioctl-based blktrace allocates bt and registers tracepoints when
ioctl(BLKTRACESETUP), and do all cleanups when ioctl(BLKTRACETEARDOWN).
while ftrace-based blktrace allocates/frees bt when:
# echo 1/0 > /sys/block/sda/sda1/trace/enable
and registers/unregisters tracepoints when:
# echo blk/nop > /debugfs/tracing/current_tracer
or
# echo 1/0 > /debugfs/tracing/tracing_enable
The separatation of allocation and registeration causes 2 problems:
1. current user-space blktrace still calls ioctl(TEARDOWN) when
ioctl(SETUP) failed:
# echo 1 > /sys/block/sda/sda1/trace/enable
# blktrace /dev/sda
BLKTRACESETUP: Device or resource busy
^C
and now blk_probes_ref == -1
2. Another way to make blk_probes_ref == -1:
# plugin sdb && mount sdb1
# echo 1 > /sys/block/sdb/sdb1/trace/enable
# remove sdb
This patch does the allocation and registeration when writing
sdaX/trace/enable.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Li Zefan [Fri, 27 Mar 2009 02:19:46 +0000 (10:19 +0800)]
blktrace: make classic output more classic
Impact: fix ftrace plugin timestamp output
In the classic user-space blktrace, the output timestamp is sec.nsec
not sec.usec.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Li Zefan [Tue, 24 Mar 2009 08:05:27 +0000 (16:05 +0800)]
blktrace: fix off-by-one bug
'what' is used as the index of array what2act, so it can't >= the array size.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Li Zefan [Wed, 25 Mar 2009 09:21:26 +0000 (17:21 +0800)]
blktrace: fix the original blktrace
Currently the original blktrace, which is using relay and is used via
ioctl, is broken. You can use ftrace to see the output of blktrace,
but user-space blktrace is unusable.
It's broken by "blktrace: add ftrace plugin"
(
c71a896154119f4ca9e89d6078f5f63ad60ef199)
- if (unlikely(bt->trace_state != Blktrace_running))
+ if (unlikely(bt->trace_state != Blktrace_running || !blk_tracer_enabled))
return;
With this patch, both ioctl and ftrace can be used, but of course you
can't use both of them at the same time.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Li Zefan [Wed, 25 Mar 2009 09:19:33 +0000 (17:19 +0800)]
blktrace: fix a race when creating blk_tree_root in debugfs
t1 t2
------ ------
do_blk_trace_setup() do_blk_trace_setup()
if (!blk_tree_root) {
if (!blk_tree_root)
blk_tree_root = create_dir()
blk_tree_root = create_dir();
(now blk_tree_root == NULL)
...
dir = create_dir(name, blk_tree_root);
Due to this race, t1 will create 'dir' in /debugfs but not /debugfs/block.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Li Zefan [Wed, 25 Mar 2009 09:18:56 +0000 (17:18 +0800)]
blktrace: fix timestamp in binary output
I found the timestamp is wrong:
# echo bin > trace_option
# echo blk > current_tracer
# cat trace_pipe | blkparse -i -
8,0 0 0 0.
000000000 504 A W ...
...
8,7 1 0 0.
008534097 0 C R ...
(should be 8.534097xxx)
user-space blkparse expects the timestamp to be nanosecond.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Mon, 30 Mar 2009 22:25:23 +0000 (00:25 +0200)]
tracing, Text Edit Lock: cleanup
Remove incorrectly introduced headers.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tom Zanussi [Thu, 26 Mar 2009 06:24:34 +0000 (01:24 -0500)]
tracing: filter fix for TRACE_EVENT_FORMAT events
Impact: fix crash (hang) when using TRACE_EVENT_FORMAT filter files
filters are only hooked up to the tracepoint events defined using
TRACE_EVENT but not the tracers that use TRACE_EVENT_FORMAT, such
as ftrace.
Do not display the filter files at all for TRACE_EVENT_FORMAT events
for the time being.
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: =?ISO-8859-1?Q?Fr=E9d=E9ric?= Weisbecker <fweisbec@gmail.com>
LKML-Reference: <
1237878882.8339.61.camel@charm-linux>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Zhaolei [Wed, 25 Mar 2009 04:06:05 +0000 (12:06 +0800)]
ftrace: Using FTRACE_WARN_ON() to check "freed record" in ftrace_release()
"Because when we call ftrace_free_rec we change the rec->ip to point to the
next record in the chain. Something is very wrong if rec->ip >= s &&
rec->ip < e and the record is already free."
"Note, use FTRACE_WARN_ON() macro. This way it shuts down ftrace if it is
hit and helps to avoid further damage later."
-- Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Masami Hiramatsu [Mon, 23 Mar 2009 14:14:52 +0000 (10:14 -0400)]
x86: kretprobe-booster interrupt emulation code fix
Fix interrupt emulation code in kretprobe-booster according to
pt_regs update (es/ds change and gs adding).
This issue has been reported on systemtap-bugzilla:
http://sources.redhat.com/bugzilla/show_bug.cgi?id=9965
| On a -tip kernel on x86_32, kretprobe_example (from samples) triggers the
| following backtrace when its retprobing a class of functions that cause a
| copy_from/to_user().
|
| BUG: sleeping function called from invalid context at mm/memory.c:3196
| in_atomic(): 0, irqs_disabled(): 1, pid: 2286, name: cat
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Tested-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: systemtap-ml <systemtap@sources.redhat.com>
LKML-Reference: <
49C7995C.
2010601@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Lai Jiangshan [Wed, 25 Mar 2009 09:06:30 +0000 (17:06 +0800)]
init,cpuset: fix initialize order
Impact: cpuset_wq should be initialized after init_workqueues()
When I read /debugfs/tracing/trace_stat/workqueues,
I got this:
# CPU INSERTED EXECUTED NAME
# | | | |
0 0 0 cpuset
0 285 285 events/0
0 2 2 work_on_cpu/0
0 1115 1115 khelper
0 325 325 kblockd/0
0 0 0 kacpid
0 0 0 kacpi_notify
0 0 0 ata/0
0 0 0 ata_aux
0 0 0 ksuspend_usbd
0 0 0 aio/0
0 0 0 nfsiod
0 0 0 kpsmoused
0 0 0 kstriped
0 0 0 kondemand/0
0 1 1 hid_compat
0 0 0 rpciod/0
1 64 64 events/1
1 2 2 work_on_cpu/1
1 5 5 kblockd/1
1 0 0 ata/1
1 0 0 aio/1
1 0 0 kondemand/1
1 0 0 rpciod/1
I found "cpuset" is at the earliest.
I found a create_singlethread_workqueue() is earlier than
init_workqueues():
kernel_init()
->cpuset_init_smp()
->create_singlethread_workqueue()
->do_basic_setup()
->init_workqueues()
I think it's better that create_singlethread_workqueue() is called
after workqueue subsystem has been initialized.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Steven Rostedt <srostedt@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Menage <menage@google.com>
Cc: miaoxie <miaox@cn.fujitsu.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <
49C9F416.
1050707@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Lai Jiangshan [Wed, 25 Mar 2009 08:59:18 +0000 (16:59 +0800)]
trace_workqueues: fix empty line's output
Empty lines separate cpus stat. After previous
fix(trace_stat: keep original order) applied, the empty lines
are displayed at incorrect position.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Steven Rostedt <srostedt@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <
49C9F266.
2060706@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Lai Jiangshan [Wed, 25 Mar 2009 08:58:39 +0000 (16:58 +0800)]
trace_stat: keep original order
Impact: make trace_stat files show items with the original order
trace_stat tracer reverse the items, it makes the output
looks a little ugly.
Example, when we read trace_stat/workqueues, we get cpu#7's stat.
at first, and then cpu#6... cpu#0.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Steven Rostedt <srostedt@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <
49C9F23F.
5040307@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Lai Jiangshan [Wed, 25 Mar 2009 08:27:17 +0000 (16:27 +0800)]
trace_stat: don't call seq_printf() in seq_operation->start()
Impact: Fix incorrect way using seq_file's API
Use SEQ_START_TOKEN instead of calling ->stat_headers()
int seq_operation->start().
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Steven Rostedt <srostedt@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
LKML-Reference: <
49C9EAE5.
5070202@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Jody McIntyre [Tue, 24 Mar 2009 20:00:28 +0000 (16:00 -0400)]
tracing: Documentation / sample code fixes for tracepoints
Fix the tracepoint documentation to refer to "tracepoint-sample"
instead of "tracepoint-example" to match what actually exists;
fix the directory, and clarify how to compile.
Change every instance of "example" in the sample tracepoint code
to "sample" for consistency.
Signed-off-by: Jody McIntyre <scjody@sun.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: torvalds@linux-foundation.org
LKML-Reference: <
20090324200027.GH8294@clouds>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Lai Jiangshan [Tue, 24 Mar 2009 05:38:06 +0000 (13:38 +0800)]
tracing: use union for multi-usages field
Impact: cleanup
struct dyn_ftrace::ip has different usages in his lifecycle,
we use union for it. And also for struct dyn_ftrace::flags.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <
49C871BE.
3080405@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Lai Jiangshan [Tue, 24 Mar 2009 03:03:01 +0000 (11:03 +0800)]
ftrace: show virtual PID
Impact: fix PID output under namespaces
When current namespace is not the global namespace,
pid read from set_ftrace_pid is no correct.
# ~/newpid_namespace_run bash
# echo $$
1
# echo 1 > set_ftrace_pid
# cat set_ftrace_pid
3756
Since we write virtual PID to set_ftrace_pid, we need get
virtual PID when we read it.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <
49C84D65.
9050606@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Tue, 24 Mar 2009 15:06:24 +0000 (11:06 -0400)]
function-graph: add option for include sleep times
Impact: give user a choice to show times spent while sleeping
The user may want to see the time a function spent sleeping.
This patch adds the trace option "sleep-time" to allow that.
The "sleep-time" option is default on.
echo sleep-time > /debug/tracing/trace_options
produces:
------------------------------------------
2) avahi-d-3428 => <idle>-0
------------------------------------------
2) | finish_task_switch() {
2) 0.621 us | _spin_unlock_irq();
2) 2.202 us | }
2) ! 1002.197 us | }
2) ! 1003.521 us | }
where as,
echo nosleep-time > /debug/tracing/trace_options
produces:
0) <idle>-0 => yum-upd-3416
------------------------------------------
0) | finish_task_switch() {
0) 0.643 us | _spin_unlock_irq();
0) 2.342 us | }
0) + 41.302 us | }
0) + 42.453 us | }
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Tue, 24 Mar 2009 05:10:15 +0000 (01:10 -0400)]
function-graph: ignore times across schedule
Impact: more accurate timings
The current method of function graph tracing does not take into
account the time spent when a task is not running. This shows functions
that call schedule have increased costs:
3) + 18.664 us | }
------------------------------------------
3) <idle>-0 => kblockd-123
------------------------------------------
3) | finish_task_switch() {
3) 1.441 us | _spin_unlock_irq();
3) 3.966 us | }
3) ! 2959.433 us | }
3) ! 2961.465 us | }
This patch uses the tracepoint in the scheduling context switch to
account for time that has elapsed while a task is scheduled out.
Now we see:
------------------------------------------
3) <idle>-0 => edac-po-1067
------------------------------------------
3) | finish_task_switch() {
3) 0.685 us | _spin_unlock_irq();
3) 2.331 us | }
3) + 41.439 us | }
3) + 42.663 us | }
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Tue, 24 Mar 2009 04:18:31 +0000 (00:18 -0400)]
function-graph: prevent more than one tracer registering
Impact: prevent crash due to multiple function graph tracers
The function graph tracer can currently only handle a single tracer
being registered. If another tracer registers with the function
graph tracer it can crash the system.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Tue, 24 Mar 2009 03:38:49 +0000 (23:38 -0400)]
function-graph: moved the timestamp from arch to generic code
This patch move the timestamp from happening in the arch specific
code into the general code. This allows for better control by the tracer
to time manipulation.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Sat, 21 Mar 2009 06:44:50 +0000 (02:44 -0400)]
tracing: fix memory leak in trace_stat
If the function profiler does not have any items recorded and one were
to cat the function stat file, the kernel would take a BUG with a NULL
pointer dereference.
Looking further into this, I found that returning NULL from stat_start
did not stop the stat logic, and would later call stat_next. This breaks
from the way seq_file works, so I looked into fixing the stat code.
This is where I noticed that the last next_entry is never freed.
It is allocated, and if the stat_next returns NULL, the code breaks out
of the loop, unlocks the mutex and exits. We never link the next_entry
nor do we free it. Thus it is a real memory leak.
This patch rearranges the code a bit to not only fix the memory leak,
but also to act more like seq_file where nothing is printed if there
is nothing to print. That is, stat_start returns NULL.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Li Zefan [Tue, 24 Mar 2009 09:43:30 +0000 (17:43 +0800)]
blktrace: print human-readable act_mask
Impact: new feature, allow symbolic values in /debug/tracing/act_mask
Print stringified act_mask instead of hex value:
# cat act_mask
read,write,barrier,sync,queue,requeue,issue,complete,fs,pc,ahead,meta,
discard,drv_data
# echo "meta,write" > act_mask
# cat act_mask
write,meta
Also:
- make act_mask accept "ahead", "meta", "discard" and "drv_data"
- use strsep() instead of strchr() to parse user input
- return -EINVAL if a token is not found in the mask map
- fix a bug that 'value' is unsigned, so it can < 0
- propagate error value of blk_trace_mask2str() to userspace, but not
always return -ENXIO.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <
49C8AB42.
1000802@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Li Zefan [Tue, 24 Mar 2009 08:05:51 +0000 (16:05 +0800)]
blktrace: fix t_error()
Impact: fix error flag output
t_error() should return t->error but not t->sector.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <
49C8945F.
5020802@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Li Zefan [Tue, 24 Mar 2009 08:05:06 +0000 (16:05 +0800)]
blktrace: fix wrong calculation of RWBS
Impact: fix the output of IO type category characters
Trace categories are the upper 16 bits, not the lower 16 bits.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <
49C89432.
8010805@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Li Zefan [Tue, 24 Mar 2009 08:04:37 +0000 (16:04 +0800)]
blktrace: mark ddir_act[] const
Impact: cleanup
ddir_act and what2act always stay immutable.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <
49C89415.
5080503@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tom Zanussi [Tue, 24 Mar 2009 07:14:42 +0000 (02:14 -0500)]
tracing/filters: disallow integer values for string filters and vice versa
Impact: fix filter use boundary condition / crash
Make sure filters for string fields don't use integer values and vice
versa. Getting it wrong can crash the system or produce bogus
results.
Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: =?ISO-8859-1?Q?Fr=E9d=E9ric?= Weisbecker <fweisbec@gmail.com>
LKML-Reference: <
1237878882.8339.61.camel@charm-linux>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tom Zanussi [Tue, 24 Mar 2009 07:14:31 +0000 (02:14 -0500)]
tracing/filters: use trace_seq_printf() to print filters
Impact: cleanup
Instead of just using the trace_seq buffer to print the filters, use
trace_seq_printf() as it was intended to be used.
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Cc: =?ISO-8859-1?Q?Fr=E9d=E9ric?= Weisbecker <fweisbec@gmail.com>
LKML-Reference: <
1237878871.8339.59.camel@charm-linux>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tom Zanussi [Tue, 24 Mar 2009 07:14:11 +0000 (02:14 -0500)]
tracing/filters: free pred when clearing filters
Impact: fix (small) per trace filter modification memory leak
Free the current pred when clearing the filters via the filter files.
Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: =?ISO-8859-1?Q?Fr=E9d=E9ric?= Weisbecker <fweisbec@gmail.com>
LKML-Reference: <
1237878851.8339.58.camel@charm-linux>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tom Zanussi [Tue, 24 Mar 2009 07:14:01 +0000 (02:14 -0500)]
tracing/filters: use list_for_each_entry
Impact: cleanup
No need to use the safe version here, so use list_for_each_entry instead
of list_for_each_entry_safe in find_event_field().
Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: =?ISO-8859-1?Q?Fr=E9d=E9ric?= Weisbecker <fweisbec@gmail.com>
LKML-Reference: <
1237878841.8339.57.camel@charm-linux>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Frederic Weisbecker [Mon, 23 Mar 2009 21:17:01 +0000 (22:17 +0100)]
tracing/function-graph-tracer: fix functions call traces imbalance
Impact: fix traces output
Sometimes one can observe an imbalance in the traces between function
calls and function return traces:
func1() {
}
}
The curly brace inside func1() is the return of another function nested
inside func1. The return trace have been inserted in the buffer but not
the entry.
We are storing a return address on the function traces stack while we
haven't inserted its entry on the buffer, hence the imbalance on the
traces.
This is because the tracers doesn't check all failures that can happen
on buffer insertion.
This patch reports the tracing recursion failures and the ring buffer
failures. In such cases, we now restore the original return address for
the function, giving up its return trace.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <
1237843021-11695-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Anton Vorontsov [Mon, 23 Mar 2009 22:07:24 +0000 (01:07 +0300)]
tracing: Fix TRACING_SUPPORT dependency for PPC32
commit
40ada30f9621fbd831ac2437b9a2a399aa ("tracing: clean up menu"),
despite the "clean up" in its purpose, introduced a behavioural
change for Kconfig symbols: we no longer able to select tracing
support on PPC32 (because IRQFLAGS_SUPPORT isn't yet implemented).
The IRQFLAGS_SUPPORT is not mandatory for most tracers, tracing core
has a special case for platforms w/o irqflags (which, by the way, has
become useless as of the commit above).
Though according to Ingo Molnar, there was periodic build failures on
weird, unmaintained architectures that had no irqflags-tracing support
and hence didn't know the raw_irqs_save/restore primitives. Thus we'd
better not enable irqflags-less tracing for all architectures.
This patch restores the old behaviour for PPC32, and thus brings the
tracing back. Other architectures can either add themselves to the
exception list or (better) implement TRACE_IRQFLAGS_SUPPORT.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Acked-b: Steven Rostedt <rostedt@goodmis.org>
Cc: linuxppc-dev@ozlabs.org
LKML-Reference: <
20090323220724.GA9851@oksana.dev.rtsoft.ru>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Frederic Weisbecker [Sun, 22 Mar 2009 22:10:45 +0000 (23:10 +0100)]
tracing/ftrace: check if debugfs is registered before creating files
Impact: fix a crash with ftrace={nop,boot} parameter
If the nop or initcall tracers are launched as boot tracers,
they will attempt to create their option directory and files.
But these tracers are registered very early and then assigned
as "boot tracers" very early if asked to.
Since they do this before debugfs has been registered (core initcall),
a crash is triggered.
Another early tracers could also come later. So we fix it by
checking if debugfs is initialized before creating the root
tracing directory.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <
1237759847-21025-3-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Frederic Weisbecker [Sun, 22 Mar 2009 22:10:44 +0000 (23:10 +0100)]
debugfs: function to know if debugfs is initialized
Impact: add new debugfs API
With ftrace, some tracers are registered in early initcalls
and attempt to create files on the debugfs filesystem.
Depending on when they are activated, they can try to create their
file at any time. Some checks can be done on the tracing area
but providing a helper to know if debugfs is registered make it
really more easy.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <
1237759847-21025-2-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tom Zanussi [Mon, 23 Mar 2009 08:26:48 +0000 (03:26 -0500)]
tracing/filters: clean up filter_add_subsystem_pred()
Impact: cleanup, memory leak fix
This patch cleans up filter_add_subsystem_pred():
- searches for the field before creating a copy of the pred
- fixes memory leak in the case a predicate isn't applied
- if -ENOMEM, makes sure there's no longer a reference to the
pred so the caller can free the half-finished filter
- changes the confusing i == MAX_FILTER_PRED - 1 comparison
previously remarked upon
This affects only per-subsystem event filtering.
Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: =?ISO-8859-1?Q?Fr=E9d=E9ric?= Weisbecker <fweisbec@gmail.com>
LKML-Reference: <
1237796808.7527.40.camel@charm-linux>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tom Zanussi [Mon, 23 Mar 2009 08:26:42 +0000 (03:26 -0500)]
tracing/filters: fix bug in copy_pred()
Impact: fix potential crash on subsystem filter expression freeing
When making a copy of the predicate, pred->field_name needs to be
duplicated in the copy as well, otherwise bad things can happen due to
later multiple frees of the same string.
This affects only per-subsystem event filtering.
Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: =?ISO-8859-1?Q?Fr=E9d=E9ric?= Weisbecker <fweisbec@gmail.com>
LKML-Reference: <
1237796802.7527.39.camel@charm-linux>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tom Zanussi [Mon, 23 Mar 2009 08:26:28 +0000 (03:26 -0500)]
tracing/filters: use list_for_each_entry_safe
Impact: cleanup
Use list_for_each_entry_safe instead of list_for_each_entry in
find_event_field().
Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <
1237796788.7527.35.camel@charm-linux>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Frederic Weisbecker [Sun, 22 Mar 2009 23:18:39 +0000 (00:18 +0100)]
tracing/events: don't discard an event after commit
When we want to filter an event, the filter test is done after
the event is commited to the ring-buffer to be discarded later if
needed.
But a reader could be reading this event while we are trying to discard
it. Other kind of racy events can even happen because the event is
commited and can be read and/or consumed.
What we want is to discard the event before committing it.
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <
1237763919-21505-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Frederic Weisbecker [Sun, 22 Mar 2009 22:10:47 +0000 (23:10 +0100)]
tracing/ftrace: make nop-tracer use polling wait for events on pipe
Impact: display events when they arrive
Now that the events don't use wake_up() anymore, we need the nop
tracer to poll waiting for events on the pipe. Especially because
nop is useful to look at orphan traces types (traces types that
don't rely on specific tracers) because it doesn't produce traces
itself.
And unlike other tracers that trigger specific traces periodically,
nop triggers no traces by itself that can wake him.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <
1237759847-21025-5-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Frederic Weisbecker [Sun, 22 Mar 2009 22:10:46 +0000 (23:10 +0100)]
tracing/events: don't use wake up for events
Impact: fix hard-lockup with sched switch events
Some ftrace events, such as sched wakeup, can be traced
while the runqueue lock is hold. Since they are using
trace_current_buffer_unlock_commit(), they call wake_up()
which can try to grab the runqueue lock too, resulting in
a deadlock.
Now for all event, we call a new helper:
trace_nowake_buffer_unlock_commit() which do pretty the same than
trace_current_buffer_unlock_commit() except than it doesn't call
trace_wake_up().
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <
1237759847-21025-4-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Frederic Weisbecker [Sun, 22 Mar 2009 22:10:43 +0000 (23:10 +0100)]
tracing/events: make the filter files writable
We need the filter files to be writable, the current
filter file permissions are only set readable.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <
1237759847-21025-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Sun, 22 Mar 2009 17:41:59 +0000 (18:41 +0100)]
tracing: add run-time field descriptions for event filtering, kfree fix
Impact: fix potential kfree of random data in (rare) failure path
Zero-initialize the field structure.
Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <
1237710639.7703.46.camel@charm-linux>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tom Zanussi [Sun, 22 Mar 2009 08:31:17 +0000 (03:31 -0500)]
tracing: add per-subsystem filtering
This patch adds per-subsystem filtering to the event tracing subsystem.
It adds a 'filter' debugfs file to each subsystem directory. This file
can be written to to set filters; reading from it will display the
current set of filters set for that subsystem.
Basically what it does is propagate the filter down to each event
contained in the subsystem. If a particular event doesn't have a field
with the name specified in the filter, it simply doesn't get set for
that event. You can verify whether or not the filter was set for a
particular event by looking at the filter file for that event.
As with per-event filters, compound expressions are supported, echoing
'0' to the subsystem's filter file clears all filters in the subsystem,
etc.
Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <
1237710677.7703.49.camel@charm-linux>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tom Zanussi [Sun, 22 Mar 2009 08:31:04 +0000 (03:31 -0500)]
tracing: add per-event filtering
This patch adds per-event filtering to the event tracing subsystem.
It adds a 'filter' debugfs file to each event directory. This file can
be written to to set filters; reading from it will display the current
set of filters set for that event.
Basically, any field listed in the 'format' file for an event can be
filtered on (including strings, but not yet other array types) using
either matching ('==') or non-matching ('!=') 'predicates'. A
'predicate' can be either a single expression:
# echo pid != 0 > filter
# cat filter
pid != 0
or a compound expression of up to 8 sub-expressions combined using '&&'
or '||':
# echo comm == Xorg > filter
# echo "&& sig != 29" > filter
# cat filter
comm == Xorg
&& sig != 29
Only events having field values matching an expression will be available
in the trace output; non-matching events are discarded.
Note that a compound expression is built up by echoing each
sub-expression separately - it's not the most efficient way to do
things, but it keeps the parser simple and assumes that compound
expressions will be relatively uncommon. In any case, a subsequent
patch introducing a way to set filters for entire subsystems should
mitigate any need to do this for lots of events.
Setting a filter without an '&&' or '||' clears the previous filter
completely and sets the filter to the new expression:
# cat filter
comm == Xorg
&& sig != 29
# echo comm != Xorg
# cat filter
comm != Xorg
To clear a filter, echo 0 to the filter file:
# echo 0 > filter
# cat filter
none
The limit of 8 predicates for a compound expression is arbitrary - for
efficiency, it's implemented as an array of pointers to predicates, and
8 seemed more than enough for any filter...
Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <
1237710665.7703.48.camel@charm-linux>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tom Zanussi [Sun, 22 Mar 2009 08:30:49 +0000 (03:30 -0500)]
tracing: add ring_buffer_event_discard() to ring buffer
This patch overloads RINGBUF_TYPE_PADDING to provide a way to discard
events from the ring buffer, for the event-filtering mechanism
introduced in a subsequent patch.
I did the initial version but thanks to Steven Rostedt for adding
the parts that actually made it work. ;-)
Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Dmitri Vorobiev [Sun, 22 Mar 2009 17:11:11 +0000 (19:11 +0200)]
tracing: fix four sparse warnings
Impact: cleanup.
This patch fixes the following sparse warnings:
kernel/trace/trace.c:385:9: warning: symbol 'trace_seq_to_buffer' was
not declared. Should it be static?
kernel/trace/trace_clock.c:29:13: warning: symbol 'trace_clock_local'
was not declared. Should it be static?
kernel/trace/trace_clock.c:54:13: warning: symbol 'trace_clock' was not
declared. Should it be static?
kernel/trace/trace_clock.c:74:13: warning: symbol 'trace_clock_global'
was not declared. Should it be static?
Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@movial.com>
LKML-Reference: <
1237741871-5827-4-git-send-email-dmitri.vorobiev@movial.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Dmitri Vorobiev [Sun, 22 Mar 2009 17:11:10 +0000 (19:11 +0200)]
tracing, Text Edit Lock: Fix one sparse warning in kernel/extable.c
Impact: cleanup.
The global mutex text_mutex if declared in linux/memory.h, so
this file needs to be included into kernel/extable.c, where the
same mutex is defined. This fixes the following sparse warning:
kernel/extable.c:32:1: warning: symbol 'text_mutex' was not declared.
Should it be static?
Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@movial.com>
LKML-Reference: <
1237741871-5827-3-git-send-email-dmitri.vorobiev@movial.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tom Zanussi [Sun, 22 Mar 2009 08:30:39 +0000 (03:30 -0500)]
tracing: add run-time field descriptions for event filtering
This patch makes the field descriptions defined for event tracing
available at run-time, for the event-filtering mechanism introduced
in a subsequent patch.
The common event fields are prepended with 'common_' in the format
display, allowing them to be distinguished from the other fields
that might internally have same name and can therefore be
unambiguously used in filters.
Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <
1237710639.7703.46.camel@charm-linux>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Sun, 22 Mar 2009 17:10:02 +0000 (18:10 +0100)]
Merge branches 'tracing/ftrace', 'tracing/hw-breakpoints', 'tracing/ring-buffer', 'tracing/textedit' and 'linus' into tracing/core
Frederic Weisbecker [Sun, 22 Mar 2009 14:13:07 +0000 (15:13 +0100)]
tracing: keep the tracing buffer after self-test failure
Instead of using ftrace_dump_on_oops, it's far more convenient
to have the trace leading up to a self-test failure available
in /debug/tracing/trace.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <
1237694675-23509-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Frederic Weisbecker [Sun, 22 Mar 2009 04:04:35 +0000 (05:04 +0100)]
tracing/function-graph-tracer: prevent hangs during self-tests
Impact: detect tracing related hangs
Sometimes, with some configs, the function graph tracer can make
the timer interrupt too much slow, hanging the kernel in an endless
loop of timer interrupts servicing.
As suggested by Ingo, this patch brings a watchdog which stops the
selftest after a defined number of functions traced, definitely
disabling this tracer.
For those who want to debug the cause of the function graph trace
hang, you can pass the ftrace_dump_on_oops kernel parameter to dump
the traces after this hang detection.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <
1237694675-23509-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Li Zefan [Fri, 20 Mar 2009 02:34:00 +0000 (10:34 +0800)]
blktrace: avoid accessing NULL bdev->bd_disk
bdev->bd_disk can be NULL, if the block device is not opened.
Try this against an unmounted partition, and you'll see NULL dereference:
# echo 1 > /sys/block/sda/sda5/enable
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <
49C30098.
6080107@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Li Zefan [Fri, 20 Mar 2009 03:33:55 +0000 (11:33 +0800)]
blktrace: remove sysfs_blk_trace_enable_show/store()
sysfs_blk_trace_enable_show()/store() share most of code with
sysfs_blk_trace_attr_show()/store().
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <
49C30EA3.
1060004@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Li Zefan [Fri, 20 Mar 2009 01:49:08 +0000 (09:49 +0800)]
blktrace: report EBUSY correctly
blk_trace_remove_queue() returns EINVAL if q->blk_trace == NULL,
but blk_trace_setup_queue() doesn't return EBUSY if
q->blk_trace != NULL.
# echo 0 > sdaX/trace/enable
# echo 0 > sdaX/trace/enable
bash: echo: write error: Invalid argument
# echo 1 > sdaX/trace/enable
# echo 1 > sdaX/trace/enable
(should return EBUSY)
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <
49C2F614.
2010101@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Li Zefan [Fri, 20 Mar 2009 01:48:47 +0000 (09:48 +0800)]
blktrace: don't increase blk_probes_ref if failed to setup blk trace
do_blk_trace_setup() may return EBUSY, but the current code
doesn't decrease blk_probes_ref in this case.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <
49C2F5FF.80002@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Li Zefan [Fri, 20 Mar 2009 01:48:26 +0000 (09:48 +0800)]
blktrace: remove blk_probe_mutex
blk_register_tracepoints() always returns 0, so make it return void,
thus we don't need to use blk_probe_mutex to protect blk_probes_ref.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <
49C2F5EA.
8060606@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Li Zefan [Fri, 20 Mar 2009 01:48:03 +0000 (09:48 +0800)]
blktrace: make blk_tracer_enabled a bool flag
It doesn't have to be a counter, and it can be a bool flag instead.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <
49C2F5D3.
8090104@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Li Zefan [Fri, 20 Mar 2009 01:47:30 +0000 (09:47 +0800)]
blktrace: fix possible memory leak
When we failed to create "block" debugfs dir, we should do some
cleanups.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <
49C2F5B2.
8000800@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Frederic Weisbecker [Sat, 21 Mar 2009 03:33:36 +0000 (04:33 +0100)]
tracing/ring-buffer: don't annotate rb_cpu_notify with __cpuinit
Impact: remove a section warning
CONFIG_DEBUG_SECTION_MISMATCH raises the following warning on -tip:
WARNING: kernel/trace/built-in.o(.text+0x5bc5): Section mismatch in
reference from the function ring_buffer_alloc() to the function
.cpuinit.text:rb_cpu_notify()
The function ring_buffer_alloc() references
the function __cpuinit rb_cpu_notify().
This is actually harmless. The code in the ring buffer don't build
rb_cpu_notify and other cpu hotplug stuffs when !CONFIG_HOTPLUG_CPU
so we have no risk to reference freed memory here (it would even
be harmless if we unconditionally build it because register_cpu_notifier
would do nothing when !CONFIG_HOTPLUG_CPU.
But since ring_buffer_alloc() can be called everytime, we don't want it
to be annotated with __cpuinit so we drop the __cpuinit from
rb_cpu_notify.
This is not a waste of memory because it is only defined and used on
CONFIG_HOTPLUG_CPU.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <
1237606416-22268-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Fri, 20 Mar 2009 10:05:04 +0000 (11:05 +0100)]
tracing, Text Edit Lock - kprobes architecture independent support, nommu fix
Impact: build fix on SH !CONFIG_MMU
Stephen Rothwell reported this linux-next build failure on the SH
architecture:
kernel/built-in.o: In function `disable_all_kprobes':
kernel/kprobes.c:1382: undefined reference to `text_mutex'
[...]
And observed:
| Introduced by commit
4460fdad85becd569f11501ad5b91814814335ff ("tracing,
| Text Edit Lock - kprobes architecture independent support") from the
| tracing tree. text_mutex is defined in mm/memory.c which is only built
| if CONFIG_MMU is defined, which is not true for sh allmodconfig.
Move this lock to kernel/extable.c (which is already home to various
kernel text related routines), which file is always built-in.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
LKML-Reference: <
20090320110602.
86351a91.sfr@canb.auug.org.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Thu, 19 Mar 2009 19:26:15 +0000 (20:26 +0100)]
ftrace: event profile hooks
Impact: new tracing infrastructure feature
Provide infrastructure to generate software perf counter events
from tracepoints.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <
20090319194233.
557364871@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Thu, 19 Mar 2009 19:26:14 +0000 (20:26 +0100)]
ftrace: ensure every event gets an id
Impact: widen user-space visibe event IDs to all events
Previously only TRACE_EVENT events got ids, because only they
generated raw output which needs to be demuxed from the trace.
In order to provide a unique ID for each event, register everybody,
regardless.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <
20090319194233.
464914218@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Thu, 19 Mar 2009 19:26:13 +0000 (20:26 +0100)]
ftrace: provide an id file for each event
Since not every event has a format file to read the id from,
expose it explicitly in a separate file.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <
20090319194233.
372534033@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Fri, 20 Mar 2009 09:15:13 +0000 (10:15 +0100)]
Merge branch 'tip/tracing/ftrace' of git://git./linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace
Ingo Molnar [Fri, 20 Mar 2009 09:14:53 +0000 (10:14 +0100)]
Merge branches 'tracing/ftrace', 'tracing/kprobes', 'tracing/tasks' and 'linus' into tracing/core
Jeff Moyer [Thu, 19 Mar 2009 00:04:21 +0000 (17:04 -0700)]
aio: lookup_ioctx can return the wrong value when looking up a bogus context
The libaio test harness turned up a problem whereby lookup_ioctx on a
bogus io context was returning the 1 valid io context from the list
(harness/cases/3.p).
Because of that, an extra put_iocontext was done, and when the process
exited, it hit a BUG_ON in the put_iocontext macro called from exit_aio
(since we expect a users count of 1 and instead get 0).
The problem was introduced by "aio: make the lookup_ioctx() lockless"
(commit
abf137dd7712132ee56d5b3143c2ff61a72a5faa).
Thanks to Zach for pointing out that hlist_for_each_entry_rcu will not
return with a NULL tpos at the end of the loop, even if the entry was
not found.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Acked-by: Zach Brown <zach.brown@oracle.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Davide Libenzi [Thu, 19 Mar 2009 00:04:19 +0000 (17:04 -0700)]
eventfd: remove fput() call from possible IRQ context
Remove a source of fput() call from inside IRQ context. Myself, like Eric,
wasn't able to reproduce an fput() call from IRQ context, but Jeff said he was
able to, with the attached test program. Independently from this, the bug is
conceptually there, so we might be better off fixing it. This patch adds an
optimization similar to the one we already do on ->ki_filp, on ->ki_eventfd.
Playing with ->f_count directly is not pretty in general, but the alternative
here would be to add a brand new delayed fput() infrastructure, that I'm not
sure is worth it.
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 19 Mar 2009 22:53:19 +0000 (15:53 -0700)]
Move cc-option to below arch-specific setup
Sam Ravnborg says:
"We have several architectures that plays strange games with $(CC) and
$(CROSS_COMPILE).
So we need to postpone any use of $(call cc-option..) until we have
included the arch specific Makefile so we try with the correct $(CC)
version."
Requested-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 19 Mar 2009 21:56:35 +0000 (14:56 -0700)]
Merge branch 'for-linus' of git://git390.marist.edu/linux-2.6
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
[S390] make page table upgrade work again
[S390] make page table walking more robust
[S390] Dont check for pfn_valid() in uaccess_pt.c
[S390] ftrace/mcount: fix kernel stack backchain
[S390] topology: define SD_MC_INIT to fix performance regression
[S390] __div64_31 broken for CONFIG_MARCH_G5
Linus Torvalds [Thu, 19 Mar 2009 21:50:15 +0000 (14:50 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jikos/hid
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
HID: fix waitqueue usage in hiddev
HID: fix incorrect free in hiddev
Linus Torvalds [Thu, 19 Mar 2009 21:49:55 +0000 (14:49 -0700)]
Merge git://git./linux/kernel/git/mason/btrfs-unstable
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: Clear space_info full when adding new devices
Btrfs: Fix locking around adding new space_info
Steven Rostedt [Thu, 19 Mar 2009 19:14:46 +0000 (15:14 -0400)]
function-graph: show binary events as comments
With the added TRACE_EVENT macro, the events no longer appear in
the function graph tracer. This was because the function graph
did not know how to display the entries. The graph tracer was
only aware of its own entries and the printk entries.
By using the event call back feature, the graph tracer can now display
the events.
# echo irq > /debug/tracing/set_event
Which can show:
0) | handle_IRQ_event() {
0) | /* irq_handler_entry: irq=48 handler=eth0 */
0) | e1000_intr() {
0) 0.926 us | __napi_schedule();
0) 3.888 us | }
0) | /* irq_handler_exit: irq=48 return=handled */
0) 0.655 us | runqueue_is_locked();
0) | __wake_up() {
0) 0.831 us | _spin_lock_irqsave();
The irq entry and exit events show up as comments.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Thu, 19 Mar 2009 18:03:53 +0000 (14:03 -0400)]
tracing: remove recording function depth from trace_printk
The function depth in trace_printk was to facilitate the function
graph output. Now that the function graph calculates the depth within
the trace output, we no longer need to record the depth when the
trace_printk is called.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Thu, 19 Mar 2009 17:24:42 +0000 (13:24 -0400)]
function-graph: calculate function depth within function graph tracer
Currently, the function graph tracer depends on the trace_printk
to record the depth. All the information is already there in the trace
to calculate function depth, with the exception of having the printk
be the first item. But as soon as a entry or exit is reached, then
we know the depth.
This patch changes the iter->private data from recording a per cpu
last_pid, to a structure that holds both the last_pid and the current
depth. This data is used to determine the function depth for the
printks.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Thu, 19 Mar 2009 16:20:38 +0000 (12:20 -0400)]
tracing: make print_(b)printk_msg_only global
This patch makes print_printk_msg_only and print_bprintk_msg_only
global for other functions to use. It also renames them by adding
a "trace_" to the beginning to avoid namespace collisions.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Linus Torvalds [Thu, 19 Mar 2009 18:32:05 +0000 (11:32 -0700)]
Fix race in create_empty_buffers() vs __set_page_dirty_buffers()
Nick Piggin noticed this (very unlikely) race between setting a page
dirty and creating the buffers for it - we need to hold the mapping
private_lock until we've set the page dirty bit in order to make sure
that create_empty_buffers() might not build up a set of buffers without
the dirty bits set when the page is dirty.
I doubt anybody has ever hit this race (and it didn't solve the issue
Nick was looking at), but as Nick says: "Still, it does appear to solve
a real race, which we should close."
Acked-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 19 Mar 2009 18:10:17 +0000 (11:10 -0700)]
Add '-fwrapv' to gcc CFLAGS
This makes sure that gcc doesn't try to optimize away wrapping
arithmetic, which the kernel occasionally uses for overflow testing, ie
things like
if (ptr + offset < ptr)
which technically is undefined for non-unsigned types. See
http://bugzilla.kernel.org/show_bug.cgi?id=12597
for details.
Not all versions of gcc support it, so we need to make it conditional
(it looks like it was introduced in gcc-3.4).
Reminded-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Frederic Weisbecker [Thu, 19 Mar 2009 13:47:33 +0000 (14:47 +0100)]
tracing/ring-buffer: fix non cpu hotplug case
Impact: fix warning with irqsoff tracer
The ring buffer allocates its buffers on pre-smp time (early_initcall).
It means that, at first, only the boot cpu buffer is allocated and
the ring-buffer cpumask only has the boot cpu set (cpu_online_mask).
Later, the secondary cpu will show up and the ring-buffer will be notified
about this event: the appropriate buffer will be allocated and the cpumask
will be updated.
Unfortunately, if !CONFIG_CPU_HOTPLUG, the ring-buffer will not be
notified about the secondary cpus, meaning that the cpumask will have
only the cpu boot set, and only one cpu buffer allocated.
We fix that by using cpu_possible_mask if !CONFIG_CPU_HOTPLUG.
This patch fixes the following warning with irqsoff tracer running:
[ 169.317794] WARNING: at kernel/trace/trace.c:466 update_max_tr_single+0xcc/0xf3()
[ 169.318002] Hardware name: AMILO Li 2727
[ 169.318002] Modules linked in:
[ 169.318002] Pid: 5624, comm: bash Not tainted
2.6.29-rc8-tip-02636-g6aafa6c #11
[ 169.318002] Call Trace:
[ 169.318002] [<
ffffffff81036182>] warn_slowpath+0xea/0x13d
[ 169.318002] [<
ffffffff8100b9d6>] ? ftrace_call+0x5/0x2b
[ 169.318002] [<
ffffffff8100b9d6>] ? ftrace_call+0x5/0x2b
[ 169.318002] [<
ffffffff8100b9d1>] ? ftrace_call+0x0/0x2b
[ 169.318002] [<
ffffffff8101ef10>] ? ftrace_modify_code+0xa9/0x108
[ 169.318002] [<
ffffffff8106e27f>] ? trace_hardirqs_off+0x25/0x27
[ 169.318002] [<
ffffffff8149afe7>] ? _spin_unlock_irqrestore+0x1f/0x2d
[ 169.318002] [<
ffffffff81064f52>] ? ring_buffer_reset_cpu+0xf6/0xfb
[ 169.318002] [<
ffffffff8106637c>] ? ring_buffer_reset+0x36/0x48
[ 169.318002] [<
ffffffff8106aeda>] update_max_tr_single+0xcc/0xf3
[ 169.318002] [<
ffffffff8100bc17>] ? sysret_check+0x22/0x5d
[ 169.318002] [<
ffffffff8106e3ea>] stop_critical_timing+0x142/0x204
[ 169.318002] [<
ffffffff8106e4cf>] trace_hardirqs_on_caller+0x23/0x25
[ 169.318002] [<
ffffffff8149ac28>] trace_hardirqs_on_thunk+0x3a/0x3c
[ 169.318002] [<
ffffffff8100bc17>] ? sysret_check+0x22/0x5d
[ 169.318002] ---[ end trace
db76cbf775a750cf ]---
Because this tracer may try to swap two cpu ring buffers for an
unregistered cpu on the ring buffer.
This patch might also fix a fair loss of traces due to unallocated buffers
for secondary cpus.
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-b: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <
1237470453-5427-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>