Lai Jiangshan [Fri, 24 Apr 2009 03:27:05 +0000 (11:27 +0800)]
ring_buffer: compressed event header
RB_MAX_SMALL_DATA = 28bytes is too small for most tracers, it wastes
an 'u32' to save the actually length for events which data size > 28.
This fix uses compressed event header and enlarges RB_MAX_SMALL_DATA.
[ Impact: saves about 0%-12.5%(depends on tracer) memory in ring_buffer ]
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
LKML-Reference: <
49F13189.
3090000@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Fri, 24 Apr 2009 03:26:18 +0000 (23:26 -0400)]
tracing: fix cut and paste macro error
In case a module uses the TRACE_EVENT macro for creating automated
events in ftrace, it may choose to use a different file name
than the defined system name, or choose to use a different path than
the default "include/trace/events" include path.
If this is done, then before including trace/define_trace.h the
header would define either "TRACE_INCLUDE_FILE" for the file
name or "TRACE_INCLUDE_PATH" for the include path.
If it does not define these, then the define_trace.h defines them
instead. If define trace defines them, then define_trace.h should
also undefine them before exiting. To do this a macro is used
to note this:
#ifndef TRACE_INCLUDE_FILE
# define TRACE_INCLUDE_FILE TRACE_SYSTEM
# define UNDEF_TRACE_INCLUDE_FILE
#endif
[...]
#ifdef UNDEF_TRACE_INCLUDE_FILE
# undef TRACE_INCLUDE_FILE
# undef UNDEF_TRACE_INCLUDE_FILE
#endif
The UNDEF_TRACE_INCLUDE_FILE acts as a CPP variable to know to undef
the TRACE_INCLUDE_FILE before leaving define_trace.h.
Unfortunately, due to cut and paste errors, the macros between
FILE and PATH got mixed up.
[ Impact: undef TRACE_INCLUDE_FILE and/or TRACE_INCLUDE_PATH when needed ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Chris Wright [Thu, 23 Apr 2009 17:21:38 +0000 (10:21 -0700)]
x86: use native register access for native tlb flushing
currently these are paravirtulaized, doesn't appear any callers rely on
this (no pv_ops backends are using native_tlb and overriding cr3/4
access).
[ Impact: fix lockdep warning with paravirt and function tracer ]
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
LKML-Reference: <
20090423172138.GR3036@sequoia.sous-sol.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Thu, 26 Mar 2009 15:43:36 +0000 (11:43 -0400)]
tracing: add size checks for exported ftrace internal structures
The events exported by TRACE_EVENT are automated and are guaranteed
to be correct when used.
The internal ftrace structures on the other hand are more manually
exported. These require the ftrace maintainer to make sure they
are up to date.
This patch adds a size check to help flag when a type changes in
an internal ftrace data structure, and the update needs to be reflected
in the export.
If a export is incorrect, then the only harm is that the user space
tools will not know how to correctly read the internal structures of
ftrace.
[ Impact: help prevent inconsistent ftrace format print outs ]
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Thu, 26 Mar 2009 15:03:29 +0000 (11:03 -0400)]
tracing: increase size of number of possible events
With the new event tracing registration, we must increase the number
of events that can be registered. Currently the type field is only
one byte, which leaves us only 256 possible events.
Since we do not save the CPU number in the tracer anymore (it is determined
by the per cpu ring buffer that is used) we have an extra byte to use.
This patch increases the size of type from 1 byte (256 events) to
2 bytes (65,536 events).
It also adds a WARN_ON_ONCE if we exceed that limit.
[ Impact: allow more than 255 events ]
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Thu, 26 Mar 2009 14:25:24 +0000 (10:25 -0400)]
tracing/wakeup: move access to wakeup_cpu into spinlock
The code had the following outside the lock:
if (next != wakeup_task)
return;
pc = preempt_count();
/* The task we are waiting for is waking up */
data = wakeup_trace->data[wakeup_cpu];
On initialization, wakeup_task is NULL and wakeup_cpu -1. This code
is not under a lock. If wakeup_task is set on another CPU as that
task is waking up, we can see the wakeup_task before wakeup_cpu is
set. If we read wakeup_cpu while it is still -1 then we will have
a bad data pointer.
This patch moves the reading of wakeup_cpu within the protection of
the spinlock used to protect the writing of wakeup_cpu and wakeup_task.
[ Impact: remove possible race causing invalid pointer dereference ]
Reported-by: Maneesh Soni <maneesh@in.ibm.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Frederic Weisbecker [Tue, 21 Apr 2009 22:41:09 +0000 (00:41 +0200)]
tracing/events: protect __get_str()
The __get_str() macro is used in a code part then its content should be
protected with parenthesis.
[ Impact: make macro definition more robust ]
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Frederic Weisbecker [Sun, 19 Apr 2009 02:54:49 +0000 (04:54 +0200)]
tracing/lock: provide lock_acquired event support for dynamic size string
Now that we can support the dynamic sized string, make the lock tracing
able to use it, making it safe against modules removal and consuming
the right amount of memory needed for each lock name
Changes in v2:
adapt to the __ending_string() updates and the opening_string() removal.
[ Impact: protect lock tracer against module removal ]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Frederic Weisbecker [Sun, 19 Apr 2009 02:51:29 +0000 (04:51 +0200)]
tracing/events: provide string with undefined size support
This patch provides the support for dynamic size strings on
event tracing.
The key concept is to use a structure with an ending char array field of
undefined size and use such ability to allocate the minimal size on the
ring buffer to make one or more string entries fit inside, as opposite
to a fixed length strings with upper bound.
The strings themselves are represented using fields which have an offset
value from the beginning of the entry.
This patch provides three new macros:
__string(item, src)
This one declares a string to the structure inside TP_STRUCT__entry.
You need to provide the name of the string field and the source that will
be copied inside.
This will also add the dynamic size of the string needed for the ring
buffer entry allocation.
A stack allocated structure is used to temporarily store the offset
of each strings, avoiding double calls to strlen() on each event
insertion.
__get_str(field)
This one will give you a pointer to the string you have created. This
is an abstract helper to resolve the absolute address given the field
name which is a relative address from the beginning of the trace_structure.
__assign_str(dst, src)
Use this macro to automatically perform the string copy from src to
dst. src must be a variable to assign and dst is the name of a __string
field.
Example on how to use it:
TRACE_EVENT(my_event,
TP_PROTO(char *src1, char *src2),
TP_ARGS(src1, src2),
TP_STRUCT__entry(
__string(str1, src1)
__string(str2, src2)
),
TP_fast_assign(
__assign_str(str1, src1);
__assign_str(str2, src2);
),
TP_printk("%s %s", __get_str(src1), __get_str(src2))
)
Of course you can mix-up any __field or __array inside this
TRACE_EVENT. The position of the __string or __assign_str
doesn't matter.
Changes in v2:
Address the suggestion of Steven Rostedt: drop the opening_string() macro
and redefine __ending_string() to get the size of the string to be copied
instead of overwritting the whole ring buffer allocation.
Changes in v3:
Address other suggestions of Steven Rostedt and Peter Zijlstra with
some changes: drop the __ending_string and the need to have only one
string field.
Use offsets instead of absolute addresses.
[ Impact: allow more compact memory usage for string tracing ]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Li Zefan [Wed, 22 Apr 2009 08:53:34 +0000 (16:53 +0800)]
tracing/events: make struct trace_entry->type to be int type
struct trace_entry->type is unsigned char, while trace event's id is
int type, thus for a event with id >= 256, it's entry->type is cast
to (id % 256), and then we can't see the trace output of this event.
# insmod trace-events-sample.ko
# echo foo_bar > /mnt/tracing/set_event
# cat /debug/tracing/events/trace-events-sample/foo_bar/id
256
# cat /mnt/tracing/trace_pipe
<...>-3548 [001] 215.091142: Unknown type 0
<...>-3548 [001] 216.089207: Unknown type 0
<...>-3548 [001] 217.087271: Unknown type 0
<...>-3548 [001] 218.085332: Unknown type 0
[ Impact: fix output for trace events with id >= 256 ]
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <
49EEDB0E.
5070207@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Tue, 21 Apr 2009 13:41:26 +0000 (09:41 -0400)]
ring-buffer: only warn on wrap if buffer is bigger than two pages
On boot up, to save memory, ftrace allocates the minimum buffer
which is two pages. Ftrace also goes through a series of tests
(when configured) on boot up. These tests can fill up a page within
a single interrupt.
The ring buffer also has a WARN_ON when it detects that the buffer was
completely filled within a single commit (other commits are allowed to
be nested).
Combine the small buffer on start up, with the tests that can fill more
than a single page within an interrupt, this can trigger the WARN_ON.
This patch makes the WARN_ON only happen when the ring buffer consists
of more than two pages.
[ Impact: prevent false WARN_ON in ftrace startup tests ]
Reported-by: Ingo Molnar <mingo@elte.hu>
LKML-Reference: <
20090421094616.GA14561@elte.hu>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Li Zefan [Tue, 21 Apr 2009 09:12:11 +0000 (17:12 +0800)]
tracing/filters: allow user-input to be integer-like string
Suppose we would like to trace all tasks named '123', but this
will fail:
# echo 'parent_comm == 123' > events/sched/sched_process_fork/filter
bash: echo: write error: Invalid argument
Don't guess the type of the filter pred in filter_parse(), but instead
we check it in __filter_add_pred().
[ Impact: extend allowed filter field string values ]
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <
49ED8DEB.
6000700@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Li Zefan [Tue, 21 Apr 2009 09:11:46 +0000 (17:11 +0800)]
tracing/filters: don't remove old filters when failed to write subsys->filter
If writing subsys->filter returns EINVAL or ENOSPC, the original
filters in subsys/ and subsys/events/ will be removed. This is
definitely wrong.
[ Impact: fix filter setting semantics on error condition ]
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <
49ED8DD2.
2070700@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Mon, 20 Apr 2009 22:16:44 +0000 (18:16 -0400)]
tracing: use nowakeup version of commit for function event trace tests
The startup tests for the event tracer also runs with the function
tracer enabled. The "wakeup" version of the trace commit was used
which can grab spinlocks. If a task was preempted by an NMI
that called a function being traced, it could deadlock due to the
function tracer trying to grab the same lock.
Thanks to Frederic Weisbecker for pointing out where the bug was.
Reported-by: Ingo Molnar <mingo@elte.hu>
Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Mon, 20 Apr 2009 20:16:11 +0000 (16:16 -0400)]
tracing: use recursive counter over irq level
Althought using the irq level (hardirq_count, softirq_count and in_nmi)
was nice to detect bad recursion right away, but since the counters are
not atomically updated with respect to the interrupts, the function tracer
might trigger the test from an interrupt handler before the hardirq_count
is updated. This will trigger a false warning.
This patch converts the recursive detection to a simple counter.
If the depth is greater than 16 then the recursive detection will trigger.
16 is more than enough for any nested interrupts.
[ Impact: fix false positive trace recursion detection ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Mon, 20 Apr 2009 17:32:44 +0000 (13:32 -0400)]
tracing: remove recursive test from ring_buffer_event_discard
The ring_buffer_event_discard is not tied to ring_buffer_lock_reserve.
It can be called inside or outside the reserve/commit. Even if it
is called inside the reserve/commit the commit part must also be called.
Only ring_buffer_discard_commit can be used as a replacement for
ring_buffer_unlock_commit.
This patch removes the trace_recursive_unlock from ring_buffer_event_discard
since it would be the wrong place to do so.
[Impact: prevent breakage in trace recursive testing ]
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Mon, 20 Apr 2009 17:24:21 +0000 (13:24 -0400)]
tracing: fix recursive test level calculation
The recursive tests to detect same level recursion in the ring buffers
did not account for the hard/softirq_counts to be shifted. Thus the
numbers could be larger than then mask to be tested.
This patch includes the shift for the calculation of the irq depth.
[ Impact: stop false positives in trace recursion detection ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Mon, 20 Apr 2009 16:59:29 +0000 (12:59 -0400)]
tracing: remove dangling semicolon
Due to a cut and paste error, the trace_seq_putc had a semicolon
after the prototype but before the stub function when tracing is
disabled.
[Impact: fix compile error ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Mon, 20 Apr 2009 16:12:44 +0000 (12:12 -0400)]
tracing/events: call the correct event trace selftest init function
The late_initcall calls a helper function instead of the proper
init event selftest function.
This update may have been lost due to conflicting merges.
[ Impact: fix compiler warning and call extended event trace self tests ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Mon, 20 Apr 2009 14:59:34 +0000 (10:59 -0400)]
tracing: rename EVENT_TRACER config to ENABLE_EVENT_TRACING
Currently we have two configs: EVENT_TRACING and EVENT_TRACER.
All tracers enable EVENT_TRACING. The EVENT_TRACER is only a
convenience to enable the EVENT_TRACING when no other tracers
are enabled.
The names EVENT_TRACER and EVENT_TRACING are too similar and confusing.
This patch renames EVENT_TRACER to ENABLE_EVENT_TRACING to be more
appropriate to what it actually does, as well as add a comment in
the help menu to explain the option's purpose.
[ Impact: rename config option to reduce confusion ]
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Mon, 20 Apr 2009 14:47:36 +0000 (10:47 -0400)]
tracing: create menuconfig for tracing infrastructure
During testing we often use randconfig to test various kernels.
The current configuration set up does not give an easy way to disable
all tracing with a single config. The case where randconfig would
test all tracing disabled is very unlikely.
This patch adds a config option to enable or disable all tracing.
It is hooked into the tracing menu just like other submenus are done.
[ Impact: allow randconfig to easily produce all traces disabled ]
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Mon, 20 Apr 2009 14:27:58 +0000 (10:27 -0400)]
tracing: change branch profiling to a choice selection
This patch makes the branch profiling into a choice selection:
None - no branch profiling
likely/unlikely - only profile likely/unlikely branches
all - profile all branches
The all profiler will also enable the likely/unlikely branches.
This does not change the way the profiler works or the dependencies
between the profilers.
What this patch does, is keep the branch profiling from being selected
by an allyesconfig make. The branch profiler is very intrusive and
it is known to break various architecture builds when selected as an
allyesconfig.
[ Impact: prevent branch profiler from being selected in allyesconfig ]
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Frederic Weisbecker [Sun, 19 Apr 2009 21:39:33 +0000 (23:39 +0200)]
tracing/ring-buffer: Add unlock recursion protection on discard
The pair of helpers trace_recursive_lock() and trace_recursive_unlock()
have been introduced recently to provide generic tracing recursion
protection.
They are used in a symetric way:
- trace_recursive_lock() on buffer reserve
- trace_recursive_unlock() on buffer commit
However sometimes, we don't commit but discard on entry
to the buffer, ie: in case of filter checking.
Then we must also unlock the recursion protection on discard time,
otherwise the tracing gets definitely deactivated and a warning
is raised spuriously, such as:
111.119821] ------------[ cut here ]------------
[ 111.119829] WARNING: at kernel/trace/ring_buffer.c:1498 ring_buffer_lock_reserve+0x1b7/0x1d0()
[ 111.119835] Hardware name: AMILO Li 2727
[ 111.119839] Modules linked in:
[ 111.119846] Pid: 5731, comm: Xorg Tainted: G W 2.6.30-rc1 #69
[ 111.119851] Call Trace:
[ 111.119863] [<
ffffffff8025ce68>] warn_slowpath+0xd8/0x130
[ 111.119873] [<
ffffffff8028a30f>] ? __lock_acquire+0x19f/0x1ae0
[ 111.119882] [<
ffffffff8028a30f>] ? __lock_acquire+0x19f/0x1ae0
[ 111.119891] [<
ffffffff802199b0>] ? native_sched_clock+0x20/0x70
[ 111.119899] [<
ffffffff80286dee>] ? put_lock_stats+0xe/0x30
[ 111.119906] [<
ffffffff80286eb8>] ? lock_release_holdtime+0xa8/0x150
[ 111.119913] [<
ffffffff802c8ae7>] ring_buffer_lock_reserve+0x1b7/0x1d0
[ 111.119921] [<
ffffffff802cd110>] trace_buffer_lock_reserve+0x30/0x70
[ 111.119930] [<
ffffffff802ce000>] trace_current_buffer_lock_reserve+0x20/0x30
[ 111.119939] [<
ffffffff802474e8>] ftrace_raw_event_sched_switch+0x58/0x100
[ 111.119948] [<
ffffffff808103b7>] __schedule+0x3a7/0x4cd
[ 111.119957] [<
ffffffff80211b56>] ? ftrace_call+0x5/0x2b
[ 111.119964] [<
ffffffff80211b56>] ? ftrace_call+0x5/0x2b
[ 111.119971] [<
ffffffff80810c08>] schedule+0x18/0x40
[ 111.119977] [<
ffffffff80810e09>] preempt_schedule+0x39/0x60
[ 111.119985] [<
ffffffff80813bd3>] _read_unlock+0x53/0x60
[ 111.119993] [<
ffffffff807259d2>] sock_def_readable+0x72/0x80
[ 111.120002] [<
ffffffff807ad5ed>] unix_stream_sendmsg+0x24d/0x3d0
[ 111.120011] [<
ffffffff807219a3>] sock_aio_write+0x143/0x160
[ 111.120019] [<
ffffffff80211b56>] ? ftrace_call+0x5/0x2b
[ 111.120026] [<
ffffffff80721860>] ? sock_aio_write+0x0/0x160
[ 111.120033] [<
ffffffff80721860>] ? sock_aio_write+0x0/0x160
[ 111.120042] [<
ffffffff8031c283>] do_sync_readv_writev+0xf3/0x140
[ 111.120049] [<
ffffffff80211b56>] ? ftrace_call+0x5/0x2b
[ 111.120057] [<
ffffffff80276ff0>] ? autoremove_wake_function+0x0/0x40
[ 111.120067] [<
ffffffff8045d489>] ? cap_file_permission+0x9/0x10
[ 111.120074] [<
ffffffff8045c1e6>] ? security_file_permission+0x16/0x20
[ 111.120082] [<
ffffffff8031cab4>] do_readv_writev+0xd4/0x1f0
[ 111.120089] [<
ffffffff80211b56>] ? ftrace_call+0x5/0x2b
[ 111.120097] [<
ffffffff80211b56>] ? ftrace_call+0x5/0x2b
[ 111.120105] [<
ffffffff8031cc18>] vfs_writev+0x48/0x70
[ 111.120111] [<
ffffffff8031cd65>] sys_writev+0x55/0xc0
[ 111.120119] [<
ffffffff80211e32>] system_call_fastpath+0x16/0x1b
[ 111.120125] ---[ end trace
15605f4e98d5ccb5 ]---
[ Impact: fix spurious warning triggering tracing shutdown ]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Frederic Weisbecker [Sun, 19 Apr 2009 21:38:12 +0000 (23:38 +0200)]
tracing/core: Add current context on tracing recursion warning
In case of tracing recursion detection, we only get the stacktrace.
But the current context may be very useful to debug the issue.
This patch adds the softirq/hardirq/nmi context with the warning
using lockdep context display to have a familiar output.
v2: Use printk_once()
v3: drop {hardirq,softirq}_context which depend on lockdep,
only keep what is part of current->trace_recursion,
sufficient to debug the warning source.
[ Impact: print context necessary to debug recursion ]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Steven Rostedt [Fri, 17 Apr 2009 21:17:55 +0000 (17:17 -0400)]
tracing: remove format attribute of inline function
Due to a cut and paste error, I added the gcc attribute for printf
format to the static inline stub of trace_seq_printf.
This will cause a compile failure.
[ Impact: fix compiler error when CONFIG_TRACING is off ]
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: =?ISO-8859-15?Q?Fr=E9d=E9ric_Weisbecker?= <fweisbec@gmail.com>
LKML-Reference: <alpine.DEB.2.00.
0904171717080.1016@gandalf.stny.rr.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Fri, 17 Apr 2009 20:13:55 +0000 (16:13 -0400)]
tracing: protect trace_printk from recursion
trace_printk can be called from any context, including NMIs.
If this happens, then we must test for for recursion before
grabbing any spinlocks.
This patch prevents trace_printk from being called recursively.
[ Impact: prevent hard lockup in lockdep event tracer ]
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Fri, 17 Apr 2009 01:41:52 +0000 (21:41 -0400)]
tracing: add same level recursion detection
The tracing infrastructure allows for recursion. That is, an interrupt
may interrupt the act of tracing an event, and that interrupt may very well
perform its own trace. This is a recursive trace, and is fine to do.
The problem arises when there is a bug, and the utility doing the trace
calls something that recurses back into the tracer. This recursion is not
caused by an external event like an interrupt, but by code that is not
expected to recurse. The result could be a lockup.
This patch adds a bitmask to the task structure that keeps track
of the trace recursion. To find the interrupt depth, the following
algorithm is used:
level = hardirq_count() + softirq_count() + in_nmi;
Here, level will be the depth of interrutps and softirqs, and even handles
the nmi. Then the corresponding bit is set in the recursion bitmask.
If the bit was already set, we know we had a recursion at the same level
and we warn about it and fail the writing to the buffer.
After the data has been committed to the buffer, we clear the bit.
No atomics are needed. The only races are with interrupts and they reset
the bitmask before returning anywy.
[ Impact: detect same irq level trace recursion ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Fri, 17 Apr 2009 20:01:56 +0000 (16:01 -0400)]
tracing: add EXPORT_SYMBOL_GPL for trace commits
Not all the necessary symbols were exported to allow for tracing
by modules. This patch adds them in.
[ Impact: allow modules to commit data to the ring buffer ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Fri, 17 Apr 2009 17:02:22 +0000 (13:02 -0400)]
tracing/events: enable code with EVENT_TRACING not EVENT_TRACER
The CONFIG_EVENT_TRACER is the way to turn on event tracing when no
other tracing has been configured. All code to get enabled should
depend on CONFIG_EVENT_TRACING. That is what is enabled when TRACING
(or CONFIG_EVENT_TRACER) is selected.
This patch enables the include/trace/ftrace.h file when
CONFIG_EVENT_TRACING is enabled.
[ Impact: fix warning in event tracer selftest ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Tom Zanussi [Fri, 17 Apr 2009 05:27:08 +0000 (00:27 -0500)]
tracing/filters: add filter_mutex to protect filter predicates
This patch adds a filter_mutex to prevent the filter predicates from
being accessed concurrently by various external functions.
It's based on a previous patch by Li Zefan:
"[PATCH 7/7] tracing/filters: make filter preds RCU safe"
v2 changes:
- fixed wrong value returned in a add_subsystem_pred() failure case
noticed by Li Zefan.
[ Impact: fix trace filter corruption/crashes on parallel access ]
Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Tested-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: paulmck@linux.vnet.ibm.com
LKML-Reference: <
1239946028.6639.13.camel@tropicana>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Zhaolei [Fri, 17 Apr 2009 02:53:43 +0000 (10:53 +0800)]
tracing: Remove include/trace/kmem_event_types.h
kmem_event_types.h is no longer necessary since tracepoint definitions
are put into include/trace/events/kmem.h
[ Impact: remove now-unused file. ]
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <
49E7EF37.
2080205@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Li Zefan [Fri, 17 Apr 2009 02:34:30 +0000 (10:34 +0800)]
tracing: fix file mode of trace and README
trace is read-write and README is read-only.
[ Impact: fix /debug/tracing/ file permissions. ]
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <
49E7EAB6.
4070605@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Jeremy Fitzhardinge [Fri, 17 Apr 2009 06:35:39 +0000 (23:35 -0700)]
tracing: avoid warnings from zero-arg tracepoints
Tracepoints with no arguments can issue two warnings:
"field" defined by not used
"ret" is uninitialized in this function
Mark field as being OK to leave unused, and initialize ret.
[ Impact: fix false positive compiler warnings. ]
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: mathieu.desnoyers@polymtl.ca
LKML-Reference: <
1239950139-1119-5-git-send-email-jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Thu, 16 Apr 2009 16:15:44 +0000 (12:15 -0400)]
tracing/events: perform function tracing in event selftests
We can find some bugs in the trace events if we stress the writes as well.
The function tracer is a good way to stress the events.
[ Impact: extend scope of event tracer self-tests ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <
20090416161746.
604786131@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Avadh Patel [Fri, 10 Apr 2009 20:04:48 +0000 (16:04 -0400)]
tracing: add saved_cmdlines file to show cached task comms
Export the cached task comms to userspace. This allows user apps to translate
the pids from a trace into their respective task command lines.
[ Impact: let userspace apps reading binary buffer know comm's of pids ]
Signed-off-by: Avadh Patel <avadh4all@gmail.com>
[ added error checking and use of buf pointer to index file_buf ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Wed, 15 Apr 2009 20:53:47 +0000 (16:53 -0400)]
tracing/events/ring-buffer: expose format of ring buffer headers to users
Currently, every thing needed to read the binary output from the
ring buffers is available, with the exception of the way the ring
buffers handles itself internally.
This patch creates two special files in the debugfs/tracing/events
directory:
# cat /debug/tracing/events/header_page
field: u64 timestamp; offset:0; size:8;
field: local_t commit; offset:8; size:8;
field: char data; offset:16; size:4080;
# cat /debug/tracing/events/header_event
type : 2 bits
len : 3 bits
time_delta : 27 bits
array : 32 bits
padding : type == 0
time_extend : type == 1
data : type == 3
This is to allow a userspace app to see if the ring buffer format changes
or not.
[ Impact: allow userspace apps to know of ringbuffer format changes ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Wed, 15 Apr 2009 17:36:40 +0000 (13:36 -0400)]
tracing/events: add startup tests for events
As events start to become popular, and the new way to add tracing
infrastructure into ftrace, it is important to catch any problems
that might happen with a mistake in the TRACE_EVENT macro.
This patch introduces a startup self test on the registered trace
events. Note, it can only do a generic test, any type of testing that
needs more involement is needed to be implemented by the tracepoint
creators.
The test goes down one by one enabling a trace point and running
some random tasks (random in the sense that I just made them up).
Those tasks are creating threads, grabbing mutexes and spinlocks
and using workqueues.
After testing each event individually, it does the same test after
enabling each system of trace points. Like sched, irq, lockdep.
Then finally it enables all tracepoints and performs the tasks again.
The output to the console on bootup will look like this when everything
works:
Running tests on trace events:
Testing event kfree_skb: OK
Testing event kmalloc: OK
Testing event kmem_cache_alloc: OK
Testing event kmalloc_node: OK
Testing event kmem_cache_alloc_node: OK
Testing event kfree: OK
Testing event kmem_cache_free: OK
Testing event irq_handler_exit: OK
Testing event irq_handler_entry: OK
Testing event softirq_entry: OK
Testing event softirq_exit: OK
Testing event lock_acquire: OK
Testing event lock_release: OK
Testing event sched_kthread_stop: OK
Testing event sched_kthread_stop_ret: OK
Testing event sched_wait_task: OK
Testing event sched_wakeup: OK
Testing event sched_wakeup_new: OK
Testing event sched_switch: OK
Testing event sched_migrate_task: OK
Testing event sched_process_free: OK
Testing event sched_process_exit: OK
Testing event sched_process_wait: OK
Testing event sched_process_fork: OK
Testing event sched_signal_send: OK
Running tests on trace event systems:
Testing event system skb: OK
Testing event system kmem: OK
Testing event system irq: OK
Testing event system lockdep: OK
Testing event system sched: OK
Running tests on all trace events:
Testing all events: OK
[ folded in:
tracing: add #include <linux/delay.h> to fix build failure in test_work()
This build failure occured on a few rare configs:
kernel/trace/trace_events.c: In function ‘test_work’:
kernel/trace/trace_events.c:975: error: implicit declaration of function ‘udelay’
kernel/trace/trace_events.c:980: error: implicit declaration of function ‘msleep’
delay.h is included in way too many other headers, hiding cases
where new usage is added without header inclusion.
[ Impact: build fix ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
]
[ Impact: add event tracer self-tests ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Wed, 15 Apr 2009 17:24:06 +0000 (13:24 -0400)]
ftrace: use module notifier for function tracer
The hooks in the module code for the function tracer must be called
before any of that module code runs. The function tracer hooks
modify the module (replacing calls to mcount to nops). If the code
is executed while the change occurs, then the CPU can take a GPF.
To handle the above with a bit of paranoia, I originally implemented
the hooks as calls directly from the module code.
After examining the notifier calls, it looks as though the start up
notify is called before any of the module's code is executed. This makes
the use of the notify safe with ftrace.
Only the startup notify is required to be "safe". The shutdown simply
removes the entries from the ftrace function list, and does not modify
any code.
This change has another benefit. It removes a issue with a reverse dependency
in the mutexes of ftrace_lock and module_mutex.
[ Impact: fix lock dependency bug, cleanup ]
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Li Zefan [Wed, 15 Apr 2009 03:02:56 +0000 (11:02 +0800)]
blktrace: fix context-info when mixed-using blk tracer and trace events
When current tracer is set to blk tracer, TRACE_ITER_CONTEXT_INFO is
unset, but actually context-info is printed:
pdflush-431 [000] 821.181576: 8,0 P N [pdflush]
And then if we enable TRACE_ITER_CONTEXT_INFO:
# echo context-info > trace_options
We'll see context-info printed twice. What's worse, when we use blk
tracer and trace events at the same time, we'll see no context-info
for trace events at all:
jbd2_commit_logging: dev dm-0:8 transaction 333227
jbd2_end_commit: dev dm-0:8 transaction 333227 head 332814
rm-25433 [001] 9578.307485: 8,18 m N cfq25433 slice expired t=0
rm-25433 [001] 9578.307486: 8,18 m N cfq25433 put_queue
This patch adds blk_tracer->set_flags(), and context-info flag is unset
only when we set the output to classic mode.
Note after this patch, one should unset context-info explicitly if he
wants to get binary output that can be parsed by blkparse:
# echo nocontext-info > trace_options
# echo bin > trace_options
# echo blk > current_tracer
# cat trace_pipe | blkparse -i -
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <
49E54E60.50408@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Li Zefan [Tue, 14 Apr 2009 06:00:05 +0000 (14:00 +0800)]
blktrace: add trace/ to /sys/block/sda
Impact: allow ftrace-plugin blktrace to trace device-mapper devices
To trace a single partition:
# echo 1 > /sys/block/sda/sda1/enable
To trace the whole sda instead:
# echo 1 > /sys/block/sda/enable
Thus we also fix an issue reported by Ted, that ftrace-plugin blktrace
can't be used to trace device-mapper devices.
Now:
# echo 1 > /sys/block/dm-0/trace/enable
echo: write error: No such device or address
# mount -t ext4 /dev/dm-0 /mnt
# echo 1 > /sys/block/dm-0/trace/enable
# echo blk > /debug/tracing/current_tracer
Reported-by: Theodore Tso <tytso@mit.edu>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Shawn Du <duyuyang@gmail.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
LKML-Reference: <
49E42665.
6020506@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Li Zefan [Tue, 14 Apr 2009 05:59:34 +0000 (13:59 +0800)]
blktrace: support per-partition tracing for ftrace plugin
The previous patch adds support to trace a single partition for
relay+ioctl blktrace, and this patch is for ftrace plugin blktrace:
# echo 1 > /sys/block/sda/sda7/enable
# cat start_lba
102398373
# cat end_lba
102703545
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Shawn Du <duyuyang@gmail.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
LKML-Reference: <
49E42646.
4060608@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Shawn Du [Tue, 14 Apr 2009 05:58:56 +0000 (13:58 +0800)]
blktrace: support per-partition tracing
Though one can specify '-d /dev/sda1' when using blktrace, it still
traces the whole sda.
To support per-partition tracing, when we start tracing, we initialize
bt->start_lba and bt->end_lba to the start and end sector of that
partition.
Note some actions are per device, thus we don't filter 0-sector events.
The original patch and discussion can be found here:
http://marc.info/?l=linux-btrace&m=
122949374214540&w=2
Signed-off-by: Shawn Du <duyuyang@gmail.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
LKML-Reference: <
49E42620.
4050701@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Wed, 15 Apr 2009 01:37:03 +0000 (21:37 -0400)]
tracing/events: add trace-events-sample
This patch adds a sample to the samples directory on how to create
and use TRACE_EVENT trace points.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Tue, 14 Apr 2009 23:39:12 +0000 (19:39 -0400)]
tracing/events: move trace point headers into include/trace/events
Impact: clean up
Create a sub directory in include/trace called events to keep the
trace point headers in their own separate directory. Only headers that
declare trace points should be defined in this directory.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Zhao Lei <zhaolei@cn.fujitsu.com>
Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Tue, 14 Apr 2009 22:49:38 +0000 (18:49 -0400)]
tracing/events: fix lockdep system name
Impact: fix compile error of lockdep event tracer
Ingo Molnar pointed out that the system name for the lockdep tracer was "lock"
which is used to include the event trace file name. It should be "lockdep"
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Tue, 14 Apr 2009 22:22:32 +0000 (18:22 -0400)]
tracing/events: fix compile for modules disabled
Impact: compile fix
The addition of TRACE_EVENT for modules breaks the build for when
modules are disabled. This code fixes that.
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Fri, 10 Apr 2009 18:53:50 +0000 (14:53 -0400)]
tracing/events: add support for modules to TRACE_EVENT
Impact: allow modules to add TRACE_EVENTS on load
This patch adds the final hooks to allow modules to use the TRACE_EVENT
macro. A notifier and a data structure are used to link the TRACE_EVENTs
defined in the module to connect them with the ftrace event tracing system.
It also adds the necessary automated clean ups to the trace events when a
module is removed.
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Fri, 10 Apr 2009 22:12:50 +0000 (18:12 -0400)]
tracing/events: add export symbols for trace events in modules
Impact: let modules add trace events
The trace event code requires some functions to be exported to allow
modules to use TRACE_EVENT. This patch adds EXPORT_SYMBOL_GPL to the
necessary functions.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Fri, 10 Apr 2009 17:52:20 +0000 (13:52 -0400)]
tracing/events: convert event call sites to use a link list
Impact: makes it possible to define events in modules
The events are created by reading down the section that they are linked
in by the macros. But this is not scalable to modules. This patch converts
the manipulations to use a global link list, and on boot up it adds
the items in the section to the list.
This change will allow modules to add their tracing events to the list as
well.
Note, this change alone does not permit modules to use the TRACE_EVENT macros,
but the change is needed for them to eventually do so.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Mon, 13 Apr 2009 16:25:37 +0000 (12:25 -0400)]
tracing/events: move the ftrace event tracing code to core
This patch moves the ftrace creation into include/trace/ftrace.h and
simplifies the work of developers in adding new tracepoints.
Just the act of creating the trace points in include/trace and including
define_trace.h will create the events in the debugfs/tracing/events
directory.
This patch removes the need of include/trace/trace_events.h
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Mon, 13 Apr 2009 15:20:49 +0000 (11:20 -0400)]
tracing/events: move declarations from trace directory to core include
In preparation to allowing trace events to happen in modules, we need
to move some of the local declarations in the kernel/trace directory
into include/linux.
This patch simply moves the declarations and performs no context changes.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Sat, 11 Apr 2009 16:59:57 +0000 (12:59 -0400)]
tracing: make trace_seq operations available for core kernel
In the process to make TRACE_EVENT macro work for modules, the trace_seq
operations must be available for core kernel code.
These operations are quite useful and can be used for other implementations.
The main idea is that we create a trace_seq handle that acts very much
like the seq_file handle.
struct trace_seq *s = kmalloc(sizeof(*s, GFP_KERNEL);
trace_seq_init(s);
trace_seq_printf(s, "some data %d\n", variable);
printk("%s", s->buffer);
The main use is to allow a top level function call several other functions
that may store printf like data into the buffer. Then at the end, the top
level function can process all the data with any method it would like to.
It could be passed to userspace, output via printk or even use seq_file:
trace_seq_to_user(s, ubuf, cnt);
seq_puts(m, s->buffer);
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Fri, 10 Apr 2009 13:36:00 +0000 (09:36 -0400)]
tracing: create automated trace defines
This patch lowers the number of places a developer must modify to add
new tracepoints. The current method to add a new tracepoint
into an existing system is to write the trace point macro in the
trace header with one of the macros TRACE_EVENT, TRACE_FORMAT or
DECLARE_TRACE, then they must add the same named item into the C file
with the macro DEFINE_TRACE(name) and then add the trace point.
This change cuts out the needing to add the DEFINE_TRACE(name).
Every file that uses the tracepoint must still include the trace/<type>.h
file, but the one C file must also add a define before the including
of that file.
#define CREATE_TRACE_POINTS
#include <trace/mytrace.h>
This will cause the trace/mytrace.h file to also produce the C code
necessary to implement the trace point.
Note, if more than one trace/<type>.h is used to create the C code
it is best to list them all together.
#define CREATE_TRACE_POINTS
#include <trace/foo.h>
#include <trace/bar.h>
#include <trace/fido.h>
Thanks to Mathieu Desnoyers and Christoph Hellwig for coming up with
the cleaner solution of the define above the includes over my first
design to have the C code include a "special" header.
This patch converts sched, irq and lockdep and skb to use this new
method.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Zhao Lei <zhaolei@cn.fujitsu.com>
Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Fri, 10 Apr 2009 12:54:16 +0000 (08:54 -0400)]
tracing: consolidate trace and trace_event headers
Impact: clean up
Neil Horman (et. al.) criticized the way the trace events were broken up
into two files. The reason for that was that ftrace needed to separate out
the declarations from where the #include <linux/tracepoint.h> was used.
It then dawned on me that the tracepoint.h header only needs to define the
TRACE_EVENT macro if it is not already defined.
The solution is simply to test if TRACE_EVENT is defined, and if it is not
then the linux/tracepoint.h header can define it. This change consolidates
all the <traces>.h and <traces>_event_types.h into the <traces>.h file.
Reported-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Theodore Tso <tytso@mit.edu>
Reported-by: Jiaying Zhang <jiayingz@google.com>
Cc: Zhaolei <zhaolei@cn.fujitsu.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Tom Zanussi [Mon, 13 Apr 2009 08:17:50 +0000 (03:17 -0500)]
tracing/filters: allow on-the-fly filter switching
This patch allows event filters to be safely removed or switched
on-the-fly while avoiding the use of rcu or the suspension of tracing of
previous versions.
It does it by adding a new filter_pred_none() predicate function which
does nothing and by never deallocating either the predicates or any of
the filter_pred members used in matching; the predicate lists are
allocated and initialized during ftrace_event_calls initialization.
Whenever a filter is removed or replaced, the filter_pred_* functions
currently in use by the affected ftrace_event_call are immediately
switched over to to the filter_pred_none() function, while the rest of
the filter_pred members are left intact, allowing any currently
executing filter_pred_* functions to finish up, using the values they're
currently using.
In the case of filter replacement, the new predicate values are copied
into the old predicates after the above step, and the filter_pred_none()
functions are replaced by the filter_pred_* functions for the new
filter. In this case, it is possible though very unlikely that a
previous filter_pred_* is still running even after the
filter_pred_none() switch and the switch to the new filter_pred_*. In
that case, however, because nothing has been deallocated in the
filter_pred, the worst that can happen is that the old filter_pred_*
function sees the new values and as a result produces either a false
positive or a false negative, depending on the values it finds.
So one downside to this method is that rarely, it can produce a bad
match during the filter switch, but it should be possible to live with
that, IMHO.
The other downside is that at least in this patch the predicate lists
are always pre-allocated, taking up memory from the start. They could
probably be allocated on first-use, and de-allocated when tracing is
completely stopped - if this patch makes sense, I could create another
one to do that later on.
Oh, and it also places a restriction on the size of __arrays in events,
currently set to 128, since they can't be larger than the now embedded
str_val arrays in the filter_pred struct.
Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: paulmck@linux.vnet.ibm.com
LKML-Reference: <
1239610670.6660.49.camel@tropicana>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Mon, 13 Apr 2009 22:02:16 +0000 (00:02 +0200)]
Merge branch 'linus' into tracing/core
Merge reason: merge latest tracing fixes to avoid conflicts in
kernel/trace/trace_events_filter.c with upcoming change
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tom Zanussi [Wed, 8 Apr 2009 08:15:54 +0000 (03:15 -0500)]
tracing/filters: use ring_buffer_discard_commit() in filter_check_discard()
This patch changes filter_check_discard() to make use of the new
ring_buffer_discard_commit() function and modifies the current users to
call the old commit function in the non-discard case.
It also introduces a version of filter_check_discard() that uses the
global trace buffer (filter_current_check_discard()) for those cases.
v2 changes:
- fix compile error noticed by Ingo Molnar
Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: fweisbec@gmail.com
LKML-Reference: <
1239178554.10295.36.camel@tropicana>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tom Zanussi [Wed, 8 Apr 2009 08:14:01 +0000 (03:14 -0500)]
tracing/infrastructure: separate event tracer from event support
Add a new config option, CONFIG_EVENT_TRACING that gets selected
when CONFIG_TRACING is selected and adds everything needed by the stuff
in trace_export - basically all the event tracing support needed by e.g.
bprint, minus the actual events, which are only included if
CONFIG_EVENT_TRACER is selected.
So CONFIG_EVENT_TRACER can be used to turn on or off the generated events
(what I think of as the 'event tracer'), while CONFIG_EVENT_TRACING turns
on or off the base event tracing support used by both the event tracer and
the other things such as bprint that can't be configured out.
Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: fweisbec@gmail.com
LKML-Reference: <
1239178441.10295.34.camel@tropicana>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Thu, 2 Apr 2009 05:16:59 +0000 (01:16 -0400)]
tracing/filters: use ring_buffer_discard_commit for discarded events
The ring_buffer_discard_commit makes better usage of the ring_buffer
when an event has been discarded. It tries to remove it completely if
possible.
This patch converts the trace event filtering to use
ring_buffer_discard_commit instead of the ring_buffer_event_discard.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Thu, 2 Apr 2009 04:09:41 +0000 (00:09 -0400)]
ring-buffer: add ring_buffer_discard_commit
The ring_buffer_discard_commit is similar to ring_buffer_event_discard
but it can only be done on an event that has yet to be commited.
Unpredictable results can happen otherwise.
The main difference between ring_buffer_discard_commit and
ring_buffer_event_discard is that ring_buffer_discard_commit will try
to free the data in the ring buffer if nothing has addded data
after the reserved event. If something did, then it acts almost the
same as ring_buffer_event_discard followed by a
ring_buffer_unlock_commit.
Note, either ring_buffer_commit_discard and ring_buffer_unlock_commit
can be called on an event, not both.
This commit also exports both discard functions to be usable by
GPL modules.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tom Zanussi [Tue, 31 Mar 2009 05:49:16 +0000 (00:49 -0500)]
tracing/filters: add TRACE_EVENT_FORMAT_NOFILTER event macro
Frederic Weisbecker suggested that the trace_special event shouldn't be
filterable; this patch adds a TRACE_EVENT_FORMAT_NOFILTER event macro
that allows an event format to be exported without having a filter
attached, and removes filtering from the trace_special event.
Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tom Zanussi [Tue, 31 Mar 2009 05:48:49 +0000 (00:48 -0500)]
tracing/filters: add run-time field descriptions to TRACE_EVENT_FORMAT events
This patch adds run-time field descriptions to all the event formats
exported using TRACE_EVENT_FORMAT. It also hooks up all the tracers
that use them (i.e. the tracers in the 'ftrace subsystem') so they can
also have their output filtered by the event-filtering mechanism.
When I was testing this, there were a couple of things that fooled me
into thinking the filters weren't working, when actually they were -
I'll mention them here so others don't make the same mistakes (and file
bug reports. ;-)
One is that some of the tracers trace multiple events e.g. the
sched_switch tracer uses the context_switch and wakeup events, and if
you don't set filters on all of the traced events, the unfiltered output
from the events without filters on them can make it look like the
filtering as a whole isn't working properly, when actually it is doing
what it was asked to do - it just wasn't asked to do the right thing.
The other is that for the really high-volume tracers e.g. the function
tracer, the volume of filtered events can be so high that it pushes the
unfiltered events out of the ring buffer before they can be read so e.g.
cat'ing the trace file repeatedly shows either no output, or once in
awhile some output but that isn't there the next time you read the
trace, which isn't what you normally expect when reading the trace file.
If you read from the trace_pipe file though, you can catch them before
they disappear.
Changes from v1:
As suggested by Frederic Weisbecker:
- get rid of externs in functions
- added unlikely() to filter_check_discard()
Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Linus Torvalds [Mon, 13 Apr 2009 19:20:01 +0000 (12:20 -0700)]
Merge git://git./linux/kernel/git/sam/kbuild-fixes
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-fixes:
docbook: make cleandocs
kbuild: fix spurious initramfs rebuild
Documentation: explain the difference between __bitwise and __bitwise__
kbuild: make it possible for the linker to discard local symbols from vmlinux
kbuild: remove pointless strdup() on arguments passed to new_module() in modpost
kbuild: fix a few typos in top-level Makefile
kbuild: introduce destination-y for exported headers
kbuild: use git svn instead of git-svn in setlocalversion
kconfig: fix update-po-config to accect backslash in input
kbuild: fix option processing for -I in headerdep
Linus Torvalds [Mon, 13 Apr 2009 18:46:04 +0000 (11:46 -0700)]
Merge branch 'upstream-linus' of git://git./linux/kernel/git/jgarzik/libata-dev
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
ata: fix obviously wrong comment
ahci: force CAP_NCQ for earlier NV MCPs
[libata] sata_via: kill uninit'd var warning
Linus Torvalds [Mon, 13 Apr 2009 18:37:23 +0000 (11:37 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (22 commits)
Input: i8042 - add HP DV9700 to the noloop list
Input: arrange drivers/input/misc/Makefile in alphabetical order
Input: add AD7879 Touchscreen driver
Input: add AD7877 touchscreen driver
Input: bf54x-keys - fix typo in warning
Input: add driver for S1 button of rb532
Input: generic driver for rotary encoders on GPIOs
Input: hilkbd - fix crash when removing hilkbd module
Input: atkbd - add quirk for Fujitsu Siemens Amilo PA 1510
Input: atkbd - consolidate force release quirk setup
Input: add accelerated touchscreen support for Marvell Zylonite
Input: ucb1400_ts, mainstone-wm97xx - add BTN_TOUCH events
Input: wm97xx - use disable_irq_nosync() for Mainstone
Input: wm97xx - add BTN_TOUCH event to wm97xx to use it with Android
Input: fix polling of /proc/bus/input/devices
Input: psmouse - add newline to OLPC HGPK touchpad debugging
Input: ati_remote2 - check module params
Input: ati_remote2 - add per device attrs
Input: ati_remote2 - complete suspend support
Input: stop autorepeat timer on key release
...
Rafael J. Wysocki [Sun, 12 Apr 2009 18:06:56 +0000 (20:06 +0200)]
PM/Hibernate: Wait for SCSI devices scan to complete during resume
There is a race between resume from hibernation and the asynchronous
scanning of SCSI devices and to prevent it from happening we need to
call scsi_complete_async_scans() during resume from hibernation.
In addition, if the resume from hibernation is userland-driven, it's
better to wait for all device probes in the kernel to complete before
attempting to open the resume device.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 13 Apr 2009 18:35:50 +0000 (11:35 -0700)]
Merge git://git.infradead.org/iommu-2.6
* git://git.infradead.org/iommu-2.6:
intel-iommu: Avoid panic() for DRHD at address zero.
Intel-IOMMU Alignment Issue in dma_pte_clear_range()
Linus Torvalds [Mon, 13 Apr 2009 18:32:09 +0000 (11:32 -0700)]
Merge branch 'x86-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: add linux kernel support for YMM state
x86: fix wrong section of pat_disable & make it static
x86: Fix section mismatches in mpparse
x86: fix set_fixmap to use phys_addr_t
x86: Document get_user_pages_fast()
x86, intr-remap: fix eoi for interrupt remapping without x2apic
Linus Torvalds [Mon, 13 Apr 2009 18:31:28 +0000 (11:31 -0700)]
Merge branch 'tracing-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
tracing/filters: return proper error code when writing filter file
tracing/filters: allow user input integer to be oct or hex
tracing/filters: fix NULL pointer dereference
tracing/filters: NIL-terminate user input filter
ftrace: Output REC->var instead of __entry->var for trace format
Make __stringify support variable argument macros too
tracing: fix document references
tracing: fix splice return too large
tracing: update file->f_pos when splice(2) it
tracing: allocate page when needed
tracing: disable seeking for trace_pipe_raw
Linus Torvalds [Mon, 13 Apr 2009 18:30:26 +0000 (11:30 -0700)]
Merge branch 'core-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
lockdep: continue lock debugging despite some taints
lockdep: warn about lockdep disabling after kernel taint
Andrew Morton [Mon, 13 Apr 2009 17:27:49 +0000 (10:27 -0700)]
cpufreq: use smp_call_function_[single|many]() in acpi-cpufreq.c
Atttempting to rid us of the problematic work_on_cpu(). Just use
smp_call_fuction_single() here.
This repairs a 10% sysbench(oltp)+mysql regression which Mike reported,
due to
commit
6b44003e5ca66a3fffeb5bc90f40ada2c4340896
Author: Andrew Morton <akpm@linux-foundation.org>
Date: Thu Apr 9 09:50:37 2009 -0600
work_on_cpu(): rewrite it to create a kernel thread on demand
It seems that the kernel calls these acpi-cpufreq functions at a quite
high frequency.
Valdis Kletnieks also reports that this causes 70-90 forks per second on
his hardware.
Cc: Valdis.Kletnieks@vt.edu
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Zhao Yakui <yakui.zhao@intel.com>
Acked-by: Dave Jones <davej@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Mike Galbraith <efault@gmx.de>
Cc: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
[ Made it use smp_call_function_many() instead of looping over cpu's
with smp_call_function_single() - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 13 Apr 2009 15:32:48 +0000 (08:32 -0700)]
Merge branch 'i2c-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6
* 'i2c-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6:
i2c: Let new-style drivers implement attach_adapter
i2c: Fix sparse warnings for I2C_BOARD_INFO()
i2c-voodoo3: Deprecate in favor of tdfxfb
i2c-algo-pca: Fix use of uninitialized variable in debug message
Serge E. Hallyn [Mon, 13 Apr 2009 14:56:14 +0000 (09:56 -0500)]
add some long-missing capabilities to fs_mask
When POSIX capabilities were introduced during the 2.1 Linux
cycle, the fs mask, which represents the capabilities which having
fsuid==0 is supposed to grant, did not include CAP_MKNOD and
CAP_LINUX_IMMUTABLE. However, before capabilities the privilege
to call these did in fact depend upon fsuid==0.
This patch introduces those capabilities into the fsmask,
restoring the old behavior.
See the thread starting at http://lkml.org/lkml/2009/3/11/157 for
reference.
Note that if this fix is deemed valid, then earlier kernel versions (2.4
and 2.2) ought to be fixed too.
Changelog:
[Mar 23] Actually delete old CAP_FS_SET definition...
[Mar 20] Updated against J. Bruce Fields's patch
Reported-by: Igor Zhbanov <izh1979@gmail.com>
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: stable@kernel.org
Cc: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 13 Apr 2009 15:23:46 +0000 (08:23 -0700)]
Merge branch 'gm_20090410' of git://repo.or.cz/linux-2.6/trivial-mods
* 'gm_20090410' of git://repo.or.cz/linux-2.6/trivial-mods:
MAINTAINERS - Update MN10300 patterns
MAINTAINERS - Update frv arch patterns
scripts/get_maintainer.pl - Allow multiple files on command line
MAINTAINERS - Update Freescale sound patterns
MAINTAINERS - Add additional patterns
MAINTAINERS - Add missing "/" to some pattern directories
MAINTAINERS - Update DRIVER CORE patterns
MAINTAINERS - Update M68K patterns
MAINTAINERS - Coalesce sections "DVB" and "Video for Linux"
MAINTAINERS - Remove cyblafb frame buffer no longer in tree
MAINTAINERS - Remove x86/Voyager no longer in tree
MAINTAINERS - Update FPU Emulator contact address and web page
MAINTAINERS - i2c_tiny_usb T: should be W:
MAINTAINERS - Add Linus Torvalds' git
MAINTAINERS - standardize "T: git urls"
MAINTAINERS - Remove HP Fibre Channel HBA no longer in tree
MAINTAINERS - Standardize style
MAINTAINERS - Add file patterns
Add scripts/get_maintainer.pl
Acked-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Linus Torvalds [Mon, 13 Apr 2009 15:22:43 +0000 (08:22 -0700)]
Merge branch 'core-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
percpu: unbreak alpha percpu
mutex: have non-spinning mutexes on s390 by default
Linus Torvalds [Mon, 13 Apr 2009 15:19:45 +0000 (08:19 -0700)]
Merge branch 'for-linus' of git://git.o-hand.com/linux-rpurdie-leds
* 'for-linus' of git://git.o-hand.com/linux-rpurdie-leds:
leds: just ignore invalid GPIOs in leds-gpio
Linus Torvalds [Mon, 13 Apr 2009 15:18:30 +0000 (08:18 -0700)]
Merge git://git./linux/kernel/git/wim/linux-2.6-watchdog
* git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog:
[WATCHDOG] omap_wdt.c: move probe function to .devinit.text
[WATCHDOG] ks8695_wdt.c: move probe function to .devinit.text
[WATCHDOG] at91rm9200_wdt.c: move probe function to .devinit.text
[WATCHDOG] remove ARM26 sections
[WATCHDOG] orion5x_wdt: Add shutdown callback, use watchdog ping function
[WATCHDOG] i6300esb.c: Restructure initialization of the device
[WATCHDOG] i6300esb.c: Fix the GETSTATUS and GETBOOTSTATUS ioctls.
[WATCHDOG] i6300esb.c: Cleanup
Linus Torvalds [Mon, 13 Apr 2009 15:17:52 +0000 (08:17 -0700)]
Merge branch 'for-linus' of git://git.monstr.eu/linux-2.6-microblaze
* 'for-linus' of git://git.monstr.eu/linux-2.6-microblaze: (60 commits)
microblaze_v8: Add MAINTAINERS fragment
microblaze_v8: Uartlite for Microblaze
microblaze_v8: Makefiles for Microblaze cpu
microblaze_v8: Kconfig patches
microblaze_v8: Interrupt handling and timer support
microblaze_v8: syscalls.h
microblaze_v8: pci headers
microblaze_v8: Kbuild file
microblaze_v8: string.h thread_info.h
microblaze_v8: unistd.h
microblaze_v8: fcntl.h sockios.h ucontext.h
microblaze_v8: pool.h socket.h
microblaze_v8: device.h param.h topology.h
microblaze_v8: headers files entry.h current.h mman.h registers.h sembuf.h
microblaze_v8: namei.h
microblaze_v8: gpio.h, serial.h
microblaze_v8: headers simple files - empty or redirect to asm-generic
microblaze_v8: sigcontext.h siginfo.h
microblaze_v8: termbits.h termios.h
microblaze_v8: stats headers
...
Jean Delvare [Mon, 13 Apr 2009 15:02:14 +0000 (17:02 +0200)]
i2c: Let new-style drivers implement attach_adapter
While it isn't the way the standard device binding model works, it is
OK for new-style drivers to implement attach_adapter. It may help
convert the renaming legacy drivers to new style drivers faster.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: David Brownell <dbrownell@users.sourceforge.net>
Ben Hutchings [Mon, 13 Apr 2009 15:02:14 +0000 (17:02 +0200)]
i2c: Fix sparse warnings for I2C_BOARD_INFO()
Since the first argument to I2C_BOARD_INFO() must be a string constant,
there is no need to parenthesise it, and adding parentheses results in
an invalid initialiser for char[]. gcc obviously accepts this syntax as
an extension, but sparse complains, e.g.:
drivers/net/sfc/boards.c:173:2: warning: array initialized from parenthesized string constant
Therefore, remove the parentheses.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Jean Delvare [Mon, 13 Apr 2009 15:02:13 +0000 (17:02 +0200)]
i2c-voodoo3: Deprecate in favor of tdfxfb
Support for I2C/DDC was recently added to the tdfxfb driver, which
means that the i2c-voodoo3 driver can be deprecated.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Krzysztof Helt <krzysztof.h1@wp.pl>
Jean Delvare [Mon, 13 Apr 2009 15:02:13 +0000 (17:02 +0200)]
i2c-algo-pca: Fix use of uninitialized variable in debug message
A recent change broke debugging of pca_xfer(), fix it.
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Wolfram Sang <w.sang@pengutronix.de>
Vegard Nossum [Wed, 8 Apr 2009 16:19:39 +0000 (18:19 +0200)]
ata: fix obviously wrong comment
Also remove the now-useless debug printouts which are supposed to
tell us when the scan starts and ends.
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Tejun Heo [Wed, 8 Apr 2009 21:25:31 +0000 (14:25 -0700)]
ahci: force CAP_NCQ for earlier NV MCPs
Along with MCP65, MCP67 and 73 also don't set CAP_NCQ. Force it.
Reported by zaceni@yandex.ru on bko#13014 and confirmed by Peer Chen.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: NightFox <zaceni2@yandex.ru>
Cc: Peer Chen <pchen@nvidia.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Jeff Garzik [Mon, 13 Apr 2009 08:09:34 +0000 (04:09 -0400)]
[libata] sata_via: kill uninit'd var warning
Reported and initial patch by Marin Mitov.
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Frederic Weisbecker [Sat, 11 Apr 2009 01:17:18 +0000 (03:17 +0200)]
lockdep: continue lock debugging despite some taints
Impact: broaden lockdep checks
Lockdep is disabled after any kernel taints. This might be convenient
to ignore bad locking issues which sources come from outside the kernel
tree. Nevertheless, it might be a frustrating experience for the
staging developers or those who experience a warning but are focused
on another things that require lockdep.
The v2 of this patch simply don't disable anymore lockdep in case
of TAINT_CRAP and TAINT_WARN events.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: LTP <ltp-list@lists.sourceforge.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Greg KH <gregkh@suse.de>
LKML-Reference: <
1239412638-6739-2-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Frederic Weisbecker [Sat, 11 Apr 2009 01:17:17 +0000 (03:17 +0200)]
lockdep: warn about lockdep disabling after kernel taint
Impact: provide useful missing info for developers
Kernel taint can occur in several situations such as warnings,
load of prorietary or staging modules, bad page, etc...
But when such taint happens, a developer might still be working on
the kernel, expecting that lockdep is still enabled. But a taint
disables lockdep without ever warning about it.
Such a kernel behaviour doesn't really help for kernel development.
This patch adds this missing warning.
Since the taint is done most of the time after the main message that
explain the real source issue, it seems safe to warn about it inside
add_taint() so that it appears at last, without hurting the main
information.
v2: Use a generic helper to disable lockdep instead of an
open coded xchg().
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <
1239412638-6739-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Li Zefan [Wed, 1 Apr 2009 08:19:19 +0000 (16:19 +0800)]
blktrace: fix output of BLK_TC_PC events
BLK_TC_PC events should be treated differently with BLK_TC_FS events.
Before this patch:
# echo 1 > /sys/block/sda/sda1/trace/enable
# echo pc > /sys/block/sda/sda1/trace/act_mask
# echo blk > /debugfs/tracing/current_tracer
# (generate some BLK_TC_PC events)
# cat trace
bash-2184 [000] 1774.275413: 8,7 I N [bash]
bash-2184 [000] 1774.275435: 8,7 D N [bash]
bash-2184 [000] 1774.275540: 8,7 I R [bash]
bash-2184 [000] 1774.275547: 8,7 D R [bash]
ksoftirqd/0-4 [000] 1774.275580: 8,7 C N 0 [0]
bash-2184 [000] 1774.275648: 8,7 I R [bash]
bash-2184 [000] 1774.275653: 8,7 D R [bash]
ksoftirqd/0-4 [000] 1774.275682: 8,7 C N 0 [0]
bash-2184 [000] 1774.275739: 8,7 I R [bash]
bash-2184 [000] 1774.275744: 8,7 D R [bash]
ksoftirqd/0-4 [000] 1774.275771: 8,7 C N 0 [0]
bash-2184 [000] 1774.275804: 8,7 I R [bash]
bash-2184 [000] 1774.275808: 8,7 D R [bash]
ksoftirqd/0-4 [000] 1774.275836: 8,7 C N 0 [0]
After this patch:
# cat trace
bash-2263 [000] 366.782149: 8,7 I N 0 (00 ..) [bash]
bash-2263 [000] 366.782323: 8,7 D N 0 (00 ..) [bash]
bash-2263 [000] 366.782557: 8,7 I R 8 (25 00 ..) [bash]
bash-2263 [000] 366.782560: 8,7 D R 8 (25 00 ..) [bash]
ksoftirqd/0-4 [000] 366.782582: 8,7 C N (25 00 ..) [0]
bash-2263 [000] 366.782648: 8,7 I R 8 (5a 00 3f 00) [bash]
bash-2263 [000] 366.782650: 8,7 D R 8 (5a 00 3f 00) [bash]
ksoftirqd/0-4 [000] 366.782669: 8,7 C N (5a 00 3f 00) [0]
bash-2263 [000] 366.782710: 8,7 I R 8 (5a 00 08 00) [bash]
bash-2263 [000] 366.782713: 8,7 D R 8 (5a 00 08 00) [bash]
ksoftirqd/0-4 [000] 366.782730: 8,7 C N (5a 00 08 00) [0]
bash-2263 [000] 366.783375: 8,7 I R 36 (5a 00 08 00) [bash]
bash-2263 [000] 366.783379: 8,7 D R 36 (5a 00 08 00) [bash]
ksoftirqd/0-4 [000] 366.783404: 8,7 C N (5a 00 08 00) [0]
This is what we do with PC events in user-space blktrace.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <
49D32387.
9040106@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Li Zefan [Wed, 1 Apr 2009 08:18:53 +0000 (16:18 +0800)]
blktrace: fix output of unknown events
Not all events are pc (packet command) events. An event is a pc
event only if it has BLK_TC_PC bit set.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <
49D3236D.
3090705@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Zhaolei [Fri, 10 Apr 2009 06:27:38 +0000 (14:27 +0800)]
tracing, kmemtrace: Make kmem tracepoints use TRACE_EVENT macro
TRACE_EVENT is a more generic way to define tracepoints.
Doing so adds these new capabilities to this tracepoint:
- zero-copy and per-cpu splice() tracing
- binary tracing without printf overhead
- structured logging records exposed under /debug/tracing/events
- trace events embedded in function tracer output and other plugins
- user-defined, per tracepoint filter expressions
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Acked-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <
49DEE6DA.80600@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Zhaolei [Fri, 10 Apr 2009 06:26:18 +0000 (14:26 +0800)]
tracing, kmemtrace: Separate include/trace/kmemtrace.h to kmemtrace part and tracepoint part
Impact: refactor code for future changes
Current kmemtrace.h is used both as header file of kmemtrace and kmem's
tracepoints definition.
Tracepoints' definition file may be used by other code, and should only have
definition of tracepoint.
We can separate include/trace/kmemtrace.h into 2 files:
include/linux/kmemtrace.h: header file for kmemtrace
include/trace/kmem.h: definition of kmem tracepoints
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Acked-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <
49DEE68A.
5040902@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Theodore Ts'o [Sat, 11 Apr 2009 19:51:18 +0000 (15:51 -0400)]
tracing: Document the event tracing system
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <
1239479479-2603-3-git-send-email-tytso@mit.edu>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Theodore Ts'o [Sat, 11 Apr 2009 19:51:19 +0000 (15:51 -0400)]
tracing: Add documentation for the power tracer
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <srostedt@redhat.com>
LKML-Reference: <
1239479479-2603-4-git-send-email-tytso@mit.edu>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Suresh Siddha [Fri, 10 Apr 2009 22:21:24 +0000 (15:21 -0700)]
x86: add linux kernel support for YMM state
Impact: save/restore Intel-AVX state properly between tasks
Intel Advanced Vector Extensions (AVX) introduce 256-bit vector processing
capability. More about AVX at http://software.intel.com/sites/avx
Add OS support for YMM state management using xsave/xrstor infrastructure
to support AVX.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <
1239402084.27006.8057.camel@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Marcin Slusarz [Fri, 10 Apr 2009 20:47:17 +0000 (22:47 +0200)]
x86: fix wrong section of pat_disable & make it static
pat_disable cannot be __cpuinit anymore because it's called from pat_init
and the callchain looks like this:
pat_disable [cpuinit] <- pat_init <- generic_set_all <-
ipi_handler <- set_mtrr <- (other non init/cpuinit functions)
WARNING: arch/x86/mm/built-in.o(.text+0x449e): Section mismatch in reference
from the function pat_init() to the function .cpuinit.text:pat_disable()
The function pat_init() references
the function __cpuinit pat_disable().
This is often because pat_init lacks a __cpuinit
annotation or the annotation of pat_disable is wrong.
Non CONFIG_X86_PAT version of pat_disable is static inline, so this version
can be static too (and there are no callers outside of this file).
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
LKML-Reference: <
49DFB055.
6070405@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Rakib Mullick [Sat, 11 Apr 2009 03:04:59 +0000 (09:04 +0600)]
x86: Fix section mismatches in mpparse
Impact: fix section mismatch
In arch/x86/kernel/mpparse.c, smp_reserve_bootmem() has been called
and also refers to a function which is in .init section. Thus causes
the first warning. And check_irq_src() also requires an __init,
because it refers to an .init section.
Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <b9df5fa10904102004g51265d9axc8d07278bfdb6ba0@mail.gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Li Zefan [Sat, 11 Apr 2009 07:55:28 +0000 (15:55 +0800)]
tracing/filters: return proper error code when writing filter file
- propagate return value of filter_add_pred() to the user
- return -ENOSPC but not -ENOMEM or -EINVAL when the filter array
is full
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Tom Zanussi <tzanussi@gmail.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <
49E04CF0.
3010105@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Li Zefan [Sat, 11 Apr 2009 07:52:51 +0000 (15:52 +0800)]
tracing/filters: allow user input integer to be oct or hex
Before patch:
# echo 'parent_pid == 0x10' > events/sched/sched_process_fork/filter
# cat sched/sched_process_fork/filter
parent_pid == 0
After patch:
# cat sched/sched_process_fork/filter
parent_pid == 16
Also check the input more strictly.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Tom Zanussi <tzanussi@gmail.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <
49E04C53.
4010600@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Li Zefan [Sat, 11 Apr 2009 07:52:35 +0000 (15:52 +0800)]
tracing/filters: fix NULL pointer dereference
Try this, and you'll see NULL pointer dereference bug:
# echo -n 'parent_comm ==' > sched/sched_process_fork/filter
Because we passed NULL ptr to simple_strtoull().
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Tom Zanussi <tzanussi@gmail.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <
49E04C43.
1050504@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Li Zefan [Sat, 11 Apr 2009 07:52:18 +0000 (15:52 +0800)]
tracing/filters: NIL-terminate user input filter
Make sure messages from user space are NIL-terminated strings,
otherwise we could dump random memory while reading filter file.
Try this:
# echo 'parent_comm ==' > events/sched/sched_process_fork/filter
# cat events/sched/sched_process_fork/filter
parent_comm == �
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Tom Zanussi <tzanussi@gmail.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <
49E04C32.
6060508@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>