Masami Hiramatsu [Mon, 27 Jun 2011 07:27:15 +0000 (16:27 +0900)]
perf probe: Move strtailcmp to string.c
Since strtailcmp() is enough generic, it should be defined in string.c.
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Link: http://lkml.kernel.org/r/20110627072715.6528.10677.stgit@fedora15
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Masami Hiramatsu [Mon, 27 Jun 2011 07:27:09 +0000 (16:27 +0900)]
perf probe: Rename DIE_FIND_CB_FOUND to DIE_FIND_CB_END
Since die_find/walk* callbacks use DIE_FIND_CB_FOUND for
both of failed and found cases, it should be "END"
instead "FOUND" for avoiding confusion.
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reported-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Link: http://lkml.kernel.org/r/20110627072709.6528.45706.stgit@fedora15
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Masami Hiramatsu [Mon, 27 Jun 2011 07:27:03 +0000 (16:27 +0900)]
tracing/kprobe: Update symbol reference when loading module
Since the address of a module-local variable can only be
solved after the target module is loaded, the symbol
fetch-argument should be updated when loading target
module.
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/20110627072703.6528.75042.stgit@fedora15
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Masami Hiramatsu [Mon, 27 Jun 2011 07:26:56 +0000 (16:26 +0900)]
tracing/kprobes: Support module init function probing
To support probing module init functions, kprobe-tracer allows
user to define a probe on non-existed function when it is given
with a module name. This also enables user to set a probe on
a function on a specific module, even if a same name (but different)
function is locally defined in another module.
The module name must be in the front of function name and separated
by a ':'. e.g. btrfs:btrfs_init_sysfs
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/20110627072656.6528.89970.stgit@fedora15
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Masami Hiramatsu [Mon, 27 Jun 2011 07:26:50 +0000 (16:26 +0900)]
kprobes: Return -ENOENT if probe point doesn't exist
Return -ENOENT if probe point doesn't exist, but still returns
-EINVAL if both of kprobe->addr and kprobe->symbol_name are
specified or both are not specified.
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Link: http://lkml.kernel.org/r/20110627072650.6528.67329.stgit@fedora15
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Masami Hiramatsu [Mon, 27 Jun 2011 07:26:44 +0000 (16:26 +0900)]
tracing/kprobes: Merge trace probe enable/disable functions
Merge redundant enable/disable functions into enable_trace_probe()
and disable_trace_probe().
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: yrl.pp-manager.tt@hitachi.com
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Link: http://lkml.kernel.org/r/20110627072644.6528.26910.stgit@fedora15
[ converted kprobe selftest to use enable_trace_probe ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Masami Hiramatsu [Mon, 27 Jun 2011 07:26:36 +0000 (16:26 +0900)]
tracing/kprobes: Rename probe_* to trace_probe_*
Rename probe_* to trace_probe_* for avoiding namespace
confliction. This also fixes improper names of find_probe_event()
and cleanup_all_probes() to find_trace_probe() and
release_all_trace_probes() respectively.
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20110627072636.6528.60374.stgit@fedora15
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cyrill Gorcunov [Fri, 8 Jul 2011 20:17:12 +0000 (00:17 +0400)]
perf, x86: P4 PMU - Introduce event alias feature
Instead of hw_nmi_watchdog_set_attr() weak function
and appropriate x86_pmu::hw_watchdog_set_attr() call
we introduce even alias mechanism which allow us
to drop this routines completely and isolate quirks
of Netburst architecture inside P4 PMU code only.
The main idea remains the same though -- to allow
nmi-watchdog and perf top run simultaneously.
Note the aliasing mechanism applies to generic
PERF_COUNT_HW_CPU_CYCLES event only because arbitrary
event (say passed as RAW initially) might have some
additional bits set inside ESCR register changing
the behaviour of event and we can't guarantee anymore
that alias event will give the same result.
P.S. Thanks a huge to Don and Steven for for testing
and early review.
Acked-by: Don Zickus <dzickus@redhat.com>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Ingo Molnar <mingo@elte.hu>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Stephane Eranian <eranian@google.com>
CC: Lin Ming <ming.m.lin@intel.com>
CC: Arnaldo Carvalho de Melo <acme@redhat.com>
CC: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20110708201712.GS23657@sun
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Thu, 14 Jul 2011 20:36:53 +0000 (16:36 -0400)]
tracing: Have dynamic size event stack traces
Currently the stack trace per event in ftace is only 8 frames.
This can be quite limiting and sometimes useless. Especially when
the "ignore frames" is wrong and we also use up stack frames for
the event processing itself.
Change this to be dynamic by adding a percpu buffer that we can
write a large stack frame into and then copy into the ring buffer.
For interrupts and NMIs that come in while another event is being
process, will only get to use the 8 frame stack. That should be enough
as the task that it interrupted will have the full stack frame anyway.
Requested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Sonny Rao [Thu, 14 Jul 2011 03:34:43 +0000 (13:34 +1000)]
perf: Robustify proc and debugfs file recording
While attempting to create a timechart of boot up I found perf didn't
tolerate modules being loaded/unloaded. This patch fixes this by
reading the file once and then writing the size read at the correct
point in the file. It also simplifies the code somewhat.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Sonny Rao <sonnyrao@chromium.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Link: http://lkml.kernel.org/r/10011.1310614483@neuling.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Wed, 13 Jul 2011 19:11:02 +0000 (15:11 -0400)]
ftrace: Fix dynamic selftest failure on some archs
Archs that do not implement CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST, will
fail the dynamic ftrace selftest.
The function tracer has a quick 'off' variable that will prevent
the call back functions from being called. This variable is called
function_trace_stop. In x86, this is implemented directly in the mcount
assembly, but for other archs, an intermediate function is used called
ftrace_test_stop_func().
In dynamic ftrace, the function pointer variable ftrace_trace_function is
used to update the caller code in the mcount caller. But for archs that
do not have CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST set, it only calls
ftrace_test_stop_func() instead, which in turn calls __ftrace_trace_function.
When more than one ftrace_ops is registered, the function it calls is
ftrace_ops_list_func(), which will iterate over all registered ftrace_ops
and call the callbacks that have their hash matching.
The issue happens when two ftrace_ops are registered for different functions
and one is then unregistered. The __ftrace_trace_function is then pointed
to the remaining ftrace_ops callback function directly. This mean it will
be called for all functions that were registered to trace by both ftrace_ops
that were registered.
This is not an issue for archs with CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST,
because the update of ftrace_trace_function doesn't happen until after all
functions have been updated, and then the mcount caller is updated. But
for those archs that do use the ftrace_test_stop_func(), the update is
immediate.
The dynamic selftest fails because it hits this situation, and the
ftrace_ops that it registers fails to only trace what it was suppose to
and instead traces all other functions.
The solution is to delay the setting of __ftrace_trace_function until
after all the functions have been updated according to the registered
ftrace_ops. Also, function_trace_stop is set during the update to prevent
function tracing from calling code that is caused by the function tracer
itself.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Wed, 13 Jul 2011 19:08:31 +0000 (15:08 -0400)]
ftrace: Update filter when tracing enabled in set_ftrace_filter()
Currently, if set_ftrace_filter() is called when the ftrace_ops is
active, the function filters will not be updated. They will only be updated
when tracing is disabled and re-enabled.
Update the functions immediately during set_ftrace_filter().
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Wed, 13 Jul 2011 19:03:44 +0000 (15:03 -0400)]
ftrace: Balance records when updating the hash
Whenever the hash of the ftrace_ops is updated, the record counts
must be balance. This requires disabling the records that are set
in the original hash, and then enabling the records that are set
in the updated hash.
Moving the update into ftrace_hash_move() removes the bug where the
hash was updated but the records were not, which results in ftrace
triggering a warning and disabling itself because the ftrace_ops filter
is updated while the ftrace_ops was registered, and then the failure
happens when the ftrace_ops is unregistered.
The current code will not trigger this bug, but new code will.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Sat, 25 Jun 2011 03:28:13 +0000 (23:28 -0400)]
ftrace: Do not disable interrupts for modules in mcount update
When I mounted an NFS directory, it caused several modules to be loaded. At the
time I was running the preemptirqsoff tracer, and it showed the following
output:
# tracer: preemptirqsoff
#
# preemptirqsoff latency trace v1.1.5 on 2.6.33.9-rt30-mrg-test
# --------------------------------------------------------------------
# latency: 1177 us, #4/4, CPU#3 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
# -----------------
# | task: modprobe-19370 (uid:0 nice:0 policy:0 rt_prio:0)
# -----------------
# => started at: ftrace_module_notify
# => ended at: ftrace_module_notify
#
#
# _------=> CPU#
# / _-----=> irqs-off
# | / _----=> need-resched
# || / _---=> hardirq/softirq
# ||| / _--=> preempt-depth
# |||| /_--=> lock-depth
# |||||/ delay
# cmd pid |||||| time | caller
# \ / |||||| \ | /
modprobe-19370 3d.... 0us!: ftrace_process_locs <-ftrace_module_notify
modprobe-19370 3d.... 1176us : ftrace_process_locs <-ftrace_module_notify
modprobe-19370 3d.... 1178us : trace_hardirqs_on <-ftrace_module_notify
modprobe-19370 3d.... 1178us : <stack trace>
=> ftrace_process_locs
=> ftrace_module_notify
=> notifier_call_chain
=> __blocking_notifier_call_chain
=> blocking_notifier_call_chain
=> sys_init_module
=> system_call_fastpath
That's over 1ms that interrupts are disabled on a Real-Time kernel!
Looking at the cause (being the ftrace author helped), I found that the
interrupts are disabled before the code modification of mcounts into nops. The
interrupts only need to be disabled on start up around this code, not when
modules are being loaded.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Tue, 14 Jun 2011 23:02:29 +0000 (19:02 -0400)]
tracing: Still trace filtered irq functions when irq trace is disabled
If a function is set to be traced by the set_graph_function, but the
option funcgraph-irqs is zero, and the traced function happens to be
called from a interrupt, it will not be traced.
The point of funcgraph-irqs is to not trace interrupts when we are
preempted by an irq, not to not trace functions we want to trace that
happen to be *in* a irq.
Luckily the current->trace_recursion element is perfect to add a flag
to help us be able to trace functions within an interrupt even when
we are not tracing interrupts that preempt the trace.
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Sat, 2 Jul 2011 03:04:36 +0000 (23:04 -0400)]
tracing, x86/irq: Do not trace arch_local_{*,irq_*}() functions
I triggered a triple fault with gcc 4.5.1 because it did not
honor the inline annotation to arch_local_save_flags() function
and that function was added to the pool of functions traced by
the function tracer.
When preempt_schedule() called arch_local_save_flags() (called
by irqs_disabled()), it was traced, but the first thing the
function tracer does is disable preemption. When it enables
preemption, the NEED_RESCHED flag will not have been cleared and
the preemption check will trigger the call to preempt_schedule()
again.
Although the dynamic function tracer crashed immediately, the
static version of the function tracer (CONFIG_DYNAMIC_FTRACE is
not set) actually was able to show where the problem was.
swapper-1 3.N.. 103885us : arch_local_save_flags <-preempt_schedule
swapper-1 3.N.. 103886us : arch_local_save_flags <-preempt_schedule
swapper-1 3.N.. 103886us : arch_local_save_flags <-preempt_schedule
swapper-1 3.N.. 103887us : arch_local_save_flags <-preempt_schedule
swapper-1 3.N.. 103887us : arch_local_save_flags <-preempt_schedule
swapper-1 3.N.. 103888us : arch_local_save_flags <-preempt_schedule
swapper-1 3.N.. 103888us : arch_local_save_flags <-preempt_schedule
It went on for a while before it triple faulted with a corrupted
stack.
The arch_local_save_flags and arch_local_irq_* functions should
not be traced. Even though they are marked as inline, gcc may
still make them a function and enable tracing of them.
The simple solution is to just mark them as notrace. I had to
add the <linux/types.h> for this file to include the notrace
tag.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20110702033852.733414762@goodmis.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Tue, 5 Jul 2011 09:55:43 +0000 (11:55 +0200)]
Merge branch 'tip/perf/core-2' of git://git./linux/kernel/git/rostedt/linux-2.6-trace into perf/core
Anton Blanchard [Mon, 4 Jul 2011 11:57:50 +0000 (21:57 +1000)]
perf report/annotate/script: Add option to specify a CPU range
Add an option to perf report/annotate/script to specify which
CPUs to operate on. This enables us to take a single system wide
profile and analyse each CPU (or group of CPUs) in isolation.
This was useful when profiling a multiprocess workload where the
bottleneck was on one CPU but this was hidden in the overall
profile. Per process and per thread breakdowns didn't help
because multiple processes were running on each CPU and no
single process consumed an entire CPU.
The patch converts the list of CPUs returned by cpu_map__new
into a bitmap for fast lookup. I wanted to use -C to be
consistent with perf top/record/stat, but unfortunately perf
report already uses -C <comms>.
v2: Incorporate suggestions from David Ahern:
- Added -c to perf script
- Check that SAMPLE_CPU is set when -c is used
- Update documentation
v3: Create perf_session__cpu_bitmap()
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: David Ahern <dsahern@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Link: http://lkml.kernel.org/r/20110704215750.11647eb9@kryten
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Mon, 4 Jul 2011 10:27:28 +0000 (12:27 +0200)]
Merge branch 'core' of git://git./linux/kernel/git/rric/oprofile into perf/core
Ingo Molnar [Sun, 3 Jul 2011 18:39:40 +0000 (20:39 +0200)]
Merge branch 'perf/stacktrace' of git://git./linux/kernel/git/frederic/random-tracing into perf/core
Frederic Weisbecker [Sat, 2 Jul 2011 14:52:45 +0000 (16:52 +0200)]
x86: Don't use frame pointer to save old stack on irq entry
rbp is used in SAVE_ARGS_IRQ to save the old stack pointer
in order to restore it later in ret_from_intr.
It is convenient because we save its value in the irq regs
and it's easily restored using the leave instruction.
However this is a kind of abuse of the frame pointer which
role is to help unwinding the kernel by chaining frames
together, each node following the return address to the
previous frame.
But although we are breaking the frame by changing the stack
pointer, there is no preceding return address before the new
frame. Hence using the frame pointer to link the two stacks
breaks the stack unwinders that find a random value instead of
a return address here.
There is no workaround that can work in every case. We are using
the fixup_bp_irq_link() function to dereference that abused frame
pointer in the case of non nesting interrupt (which means stack
changed).
But that doesn't fix the case of interrupts that don't change the
stack (but we still have the unconditional frame link), which is
the case of hardirq interrupting softirq. We have no way to detect
this transition so the frame irq link is considered as a real frame
pointer and the return address is dereferenced but it is still a
spurious one.
There are two possible results of this: either the spurious return
address, a random stack value, luckily belongs to the kernel text
and then the unwinding can continue and we just have a weird entry
in the stack trace. Or it doesn't belong to the kernel text and
unwinding stops there.
This is the reason why stacktraces (including perf callchains) on
irqs that interrupted softirqs don't work very well.
To solve this, we don't save the old stack pointer on rbp anymore
but we save it to a scratch register that we push on the new
stack and that we pop back later on irq return.
This preserves the whole frame chain without spurious return addresses
in the middle and drops the need for the horrid fixup_bp_irq_link()
workaround.
And finally irqs that interrupt softirq are sanely unwinded.
Before:
99.81% perf [kernel.kallsyms] [k] perf_pending_event
|
--- perf_pending_event
irq_work_run
smp_irq_work_interrupt
irq_work_interrupt
|
|--41.60%-- __read
| |
| |--99.90%-- create_worker
| | bench_sched_messaging
| | cmd_bench
| | run_builtin
| | main
| | __libc_start_main
| --0.10%-- [...]
After:
1.64% swapper [kernel.kallsyms] [k] perf_pending_event
|
--- perf_pending_event
irq_work_run
smp_irq_work_interrupt
irq_work_interrupt
|
|--95.00%-- arch_irq_work_raise
| irq_work_queue
| __perf_event_overflow
| perf_swevent_overflow
| perf_swevent_event
| perf_tp_event
| perf_trace_softirq
| __do_softirq
| call_softirq
| do_softirq
| irq_exit
| |
| |--73.68%-- smp_apic_timer_interrupt
| | apic_timer_interrupt
| | |
| | |--96.43%-- amd_e400_idle
| | | cpu_idle
| | | start_secondary
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jan Beulich <JBeulich@novell.com>
Frederic Weisbecker [Sat, 2 Jul 2011 13:03:44 +0000 (15:03 +0200)]
x86: Remove useless unwinder backlink from irq regs saving
The unwinder backlink in interrupt entry is very useless.
It's actually not part of the stack frame chain and thus is
never used.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jan Beulich <JBeulich@novell.com>
Frederic Weisbecker [Fri, 1 Jul 2011 00:25:17 +0000 (02:25 +0200)]
x86,64: Separate arg1 from rbp handling in SAVE_REGS_IRQ
Just for clarity in the code. Have a first block that handles
the frame pointer and a separate one that handles pt_regs
pointer and its use.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jan Beulich <JBeulich@novell.com>
Frederic Weisbecker [Thu, 30 Jun 2011 23:51:22 +0000 (01:51 +0200)]
x86,64: Simplify save_regs()
The save_regs function that saves the regs on low level
irq entry is complicated because of the fact it changes
its stack in the middle and also because it manipulates
data allocated in the caller frame and accesses there
are directly calculated from callee rsp value with the
return address in the middle of the way.
This complicates the static stack offsets calculation and
require more dynamic ones. It also needs a save/restore
of the function's return address.
To simplify and optimize this, turn save_regs() into a
macro.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jan Beulich <JBeulich@novell.com>
Frederic Weisbecker [Thu, 30 Jun 2011 17:04:56 +0000 (19:04 +0200)]
x86: Fetch stack from regs when possible in dump_trace()
When regs are passed to dump_stack(), we fetch the frame
pointer from the regs but the stack pointer is taken from
the current frame.
Thus the frame and stack pointers may not come from the same
context. For example this can result in the unwinder to
think the context is in irq, due to the current value of
the stack, but the frame pointer coming from the regs points
to a frame from another place. It then tries to fix up
the irq link but ends up dereferencing a random frame
pointer that doesn't belong to the irq stack:
[ 9131.706906] ------------[ cut here ]------------
[ 9131.707003] WARNING: at arch/x86/kernel/dumpstack_64.c:129 dump_trace+0x2aa/0x330()
[ 9131.707003] Hardware name: AMD690VM-FMH
[ 9131.707003] Perf: bad frame pointer =
0000000000000005 in callchain
[ 9131.707003] Modules linked in:
[ 9131.707003] Pid: 1050, comm: perf Not tainted 3.0.0-rc3+ #181
[ 9131.707003] Call Trace:
[ 9131.707003] <IRQ> [<
ffffffff8104bd4a>] warn_slowpath_common+0x7a/0xb0
[ 9131.707003] [<
ffffffff8104be21>] warn_slowpath_fmt+0x41/0x50
[ 9131.707003] [<
ffffffff8178b873>] ? bad_to_user+0x6d/0x10be
[ 9131.707003] [<
ffffffff8100c2da>] dump_trace+0x2aa/0x330
[ 9131.707003] [<
ffffffff810107d3>] ? native_sched_clock+0x13/0x50
[ 9131.707003] [<
ffffffff8101b164>] perf_callchain_kernel+0x54/0x70
[ 9131.707003] [<
ffffffff810d391f>] perf_prepare_sample+0x19f/0x2a0
[ 9131.707003] [<
ffffffff810d546c>] __perf_event_overflow+0x16c/0x290
[ 9131.707003] [<
ffffffff810d5430>] ? __perf_event_overflow+0x130/0x290
[ 9131.707003] [<
ffffffff810107d3>] ? native_sched_clock+0x13/0x50
[ 9131.707003] [<
ffffffff8100fbb9>] ? sched_clock+0x9/0x10
[ 9131.707003] [<
ffffffff810752e5>] ? T.375+0x15/0x90
[ 9131.707003] [<
ffffffff81084da4>] ? trace_hardirqs_on_caller+0x64/0x180
[ 9131.707003] [<
ffffffff810817bd>] ? trace_hardirqs_off+0xd/0x10
[ 9131.707003] [<
ffffffff810d5764>] perf_event_overflow+0x14/0x20
[ 9131.707003] [<
ffffffff810d588c>] perf_swevent_hrtimer+0x11c/0x130
[ 9131.707003] [<
ffffffff817821a1>] ? error_exit+0x51/0xb0
[ 9131.707003] [<
ffffffff81072e93>] __run_hrtimer+0x83/0x1e0
[ 9131.707003] [<
ffffffff810d5770>] ? perf_event_overflow+0x20/0x20
[ 9131.707003] [<
ffffffff81073256>] hrtimer_interrupt+0x106/0x250
[ 9131.707003] [<
ffffffff812a3bfd>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 9131.707003] [<
ffffffff81024833>] smp_apic_timer_interrupt+0x53/0x90
[ 9131.707003] [<
ffffffff81789053>] apic_timer_interrupt+0x13/0x20
[ 9131.707003] <EOI> [<
ffffffff817821a1>] ? error_exit+0x51/0xb0
[ 9131.707003] [<
ffffffff8178219c>] ? error_exit+0x4c/0xb0
[ 9131.707003] ---[ end trace
b2560d4876709347 ]---
Fix this by simply taking the stack pointer from regs->sp
when regs are provided.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Frederic Weisbecker [Sat, 2 Jul 2011 13:00:52 +0000 (15:00 +0200)]
x86: Save stack pointer in perf live regs savings
In order to prepare for fetching the stack pointer from the
regs when possible in dump_trace() instead of taking the
local one, save the current stack pointer in perf live regs saving.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Peter Zijlstra [Fri, 1 Jul 2011 13:24:53 +0000 (15:24 +0200)]
perf, powerpc: Fix build borkage
The patch
a8b0ca17b80e ("perf: Remove the nmi parameter from the swevent
and overflow interface") missed a spot in the ppc hw_breakpoint code,
fix this up so things compile again.
Reported-by: Ingo Molnar <mingo@elte.hu>
Cc: Anton Blanchard <anton@samba.org>
Cc: Eric B Munson <emunson@mgebm.net>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-09pfip95g88s70iwkxu6nnbt@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Zhengyu He [Thu, 23 Jun 2011 20:45:42 +0000 (13:45 -0700)]
perf stat: Add noise output for csv mode
Previously, when you want perf-stat to output the statistics in
csv mode, no information of the noise will be printed out.
For example right now we output this --repeat information:
./perf stat -r3 -x, sleep 1
1.164789,task-clock
8,context-switches
0,CPU-migrations
219,page-faults
3337800,cycles
With this patch, the output will be appended with an additional
entry for the noise value:
./perf stat -r3 -x, sleep 1
1.164789,task-clock,3.75%
8,context-switches,75.00%
0,CPU-migrations,100.00%
219,page-faults,0.00%
3337800,cycles,3.36%
Signed-off-by: Zhengyu He <zhengyuh@google.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Stephane Eranian <eranian@google.com>
Cc: Venkatesh Pallipadi <venki@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1308861942-4945-1-git-send-email-zhengyuh@google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Fri, 1 Jul 2011 09:51:58 +0000 (11:51 +0200)]
Merge branch 'perf/core' of git://git./linux/kernel/git/frederic/random-tracing into perf/core
Avi Kivity [Wed, 29 Jun 2011 15:42:37 +0000 (18:42 +0300)]
perf: export perf_event_refresh() to modules
KVM needs one-shot samples, since a PMC programmed to -X will fire after X
events and then again after 2^40 events (i.e. variable period).
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1309362157-6596-4-git-send-email-avi@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Avi Kivity [Wed, 29 Jun 2011 15:42:36 +0000 (18:42 +0300)]
x86, perf: Add constraints for architectural PMU
The v1 PMU does not have any fixed counters. Using the v2 constraints,
which do have fixed counters, causes an additional choice to be present
in the weight calculation, but not when actually scheduling the event,
leading to an event being not scheduled at all.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1309362157-6596-3-git-send-email-avi@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Avi Kivity [Wed, 29 Jun 2011 15:42:35 +0000 (18:42 +0300)]
perf: Add context field to perf_event
The perf_event overflow handler does not receive any caller-derived
argument, so many callers need to resort to looking up the perf_event
in their local data structure. This is ugly and doesn't scale if a
single callback services many perf_events.
Fix by adding a context parameter to perf_event_create_kernel_counter()
(and derived hardware breakpoints APIs) and storing it in the perf_event.
The field can be accessed from the callback as event->overflow_handler_context.
All callers are updated.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1309362157-6596-2-git-send-email-avi@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Fri, 22 Apr 2011 21:37:06 +0000 (23:37 +0200)]
perf, arch: Add generic NODE cache events
Add a NODE level to the generic cache events which is used to measure
local vs remote memory accesses. Like all other cache events, an
ACCESS is HIT+MISS, if there is no way to distinguish between reads
and writes do reads only etc..
The below needs filling out for !x86 (which I filled out with
unsupported events).
I'm fairly sure ARM can leave it like that since it doesn't strike me as
an architecture that even has NUMA support. SH might have something since
it does appear to have some NUMA bits.
Sparc64, PowerPC and MIPS certainly want a good look there since they
clearly are NUMA capable.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: David Miller <davem@davemloft.net>
Cc: Anton Blanchard <anton@samba.org>
Cc: David Daney <ddaney@caviumnetworks.com>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1303508226.4865.8.camel@laptop
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Mon, 23 May 2011 09:08:15 +0000 (11:08 +0200)]
perf, intel: Try alternative OFFCORE encodings
Since the OFFCORE registers are fully symmetric, try the other one
when the specified one is already in use.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1306141897.18455.8.camel@twins
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Stephane Eranian [Mon, 6 Jun 2011 14:57:12 +0000 (16:57 +0200)]
perf_events: Add Intel Sandy Bridge offcore_response low-level support
This patch adds Intel Sandy Bridge offcore_response support by
providing the low-level constraint table for those events.
On Sandy Bridge, there are two offcore_response events. Each uses
its own dedictated extra register. But those registers are NOT shared
between sibling CPUs when HT is on unlike Nehalem/Westmere. They are
always private to each CPU. But they still need to be controlled within
an event group. All events within an event group must use the same
value for the extra MSR. That's not controlled by the second patch in
this series.
Furthermore on Sandy Bridge, the offcore_response events have NO
counter constraints contrary to what the official documentation
indicates, so drop the events from the contraint table.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110606145712.GA7304@quad
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Stephane Eranian [Mon, 6 Jun 2011 14:57:08 +0000 (16:57 +0200)]
perf_events: Fix validation of events using an extra reg
The validate_group() function needs to validate events with
extra shared regs. Within an event group, only events with
the same value for the extra reg can co-exist. This was not
checked by validate_group() because it was missing the
shared_regs logic.
This patch changes the allocation of the fake cpuc used for
validation to also point to a fake shared_regs structure such
that group events be properly testing.
It modifies __intel_shared_reg_get_constraints() to use
spin_lock_irqsave() to avoid lockdep issues.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110606145708.GA7279@quad
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Stephane Eranian [Mon, 6 Jun 2011 14:57:03 +0000 (16:57 +0200)]
perf_events: Update Intel extra regs shared constraints management
This patch improves the code managing the extra shared registers
used for offcore_response events on Intel Nehalem/Westmere. The
idea is to use static allocation instead of dynamic allocation.
This simplifies greatly the get and put constraint routines for
those events.
The patch also renames per_core to shared_regs because the same
data structure gets used whether or not HT is on. When HT is
off, those events still need to coordination because they use
a extra MSR that has to be shared within an event group.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110606145703.GA7258@quad
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Mon, 27 Jun 2011 14:47:16 +0000 (16:47 +0200)]
perf: Remove the perf_output_begin(.sample) argument
Since only samples call perf_output_sample() its much saner (and more
correct) to put the sample logic in there than in the
perf_output_begin()/perf_output_end() pair.
Saves a useless argument, reduces conditionals and shrinks
struct perf_output_handle, win!
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-2crpvsx3cqu67q3zqjbnlpsc@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Mon, 27 Jun 2011 12:41:57 +0000 (14:41 +0200)]
perf: Remove the nmi parameter from the swevent and overflow interface
The nmi parameter indicated if we could do wakeups from the current
context, if not, we would set some state and self-IPI and let the
resulting interrupt do the wakeup.
For the various event classes:
- hardware: nmi=0; PMI is in fact an NMI or we run irq_work_run from
the PMI-tail (ARM etc.)
- tracepoint: nmi=0; since tracepoint could be from NMI context.
- software: nmi=[0,1]; some, like the schedule thing cannot
perform wakeups, and hence need 0.
As one can see, there is very little nmi=1 usage, and the down-side of
not using it is that on some platforms some software events can have a
jiffy delay in wakeup (when arch_irq_work_raise isn't implemented).
The up-side however is that we can remove the nmi parameter and save a
bunch of conditionals in fast paths.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Michael Cree <mcree@orcon.net.nz>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Don Zickus <dzickus@redhat.com>
Link: http://lkml.kernel.org/n/tip-agjev8eu666tvknpb3iaj0fg@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cyrill Gorcunov [Thu, 23 Jun 2011 12:49:18 +0000 (16:49 +0400)]
perf, x86: Add hw_watchdog_set_attr() in a sake of nmi-watchdog on P4
Due to restriction and specifics of Netburst PMU we need a separated
event for NMI watchdog. In particular every Netburst event
consumes not just a counter and a config register, but also an
additional ESCR register.
Since ESCR registers are grouped upon counters (i.e. if ESCR is occupied
for some event there is no room for another event to enter until its
released) we need to pick up the "least" used ESCR (or the most available
one) for nmi-watchdog purposes -- so MSR_P4_CRU_ESCR2/3 was chosen.
With this patch nmi-watchdog and perf top should be able to run simultaneously.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Lin Ming <ming.m.lin@intel.com>
CC: Arnaldo Carvalho de Melo <acme@redhat.com>
CC: Frederic Weisbecker <fweisbec@gmail.com>
Tested-and-reviewed-by: Don Zickus <dzickus@redhat.com>
Tested-and-reviewed-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110623124918.GC13050@sun
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Eric B Munson [Fri, 24 Jun 2011 16:26:26 +0000 (12:26 -0400)]
events: Ensure that timers are updated without requiring read() call
The event tracing infrastructure exposes two timers which should be updated
each time the value of the counter is updated. Currently, these counters are
only updated when userspace calls read() on the fd associated with an event.
This means that counters which are read via the mmap'd page exclusively never
have their timers updated. This patch adds ensures that the timers are updated
each time the values in the mmap'd page are updated.
Signed-off-by: Eric B Munson <emunson@mgebm.net>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1308932786-5111-1-git-send-email-emunson@mgebm.net
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Eric B Munson [Thu, 23 Jun 2011 20:34:38 +0000 (16:34 -0400)]
events: Move lockless timer calculation into helper function
Take the timer calculation from perf_output_read and move it to a helper
function for any place that needs timer values but cannot take the ctx->lock.
Signed-off-by: Eric B Munson <emunson@mgebm.net>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1308861279-15216-2-git-send-email-emunson@mgebm.net
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Eric B Munson [Thu, 23 Jun 2011 20:34:37 +0000 (16:34 -0400)]
events: Add note to update_event_times comment about holding ctx->lock
Signed-off-by: Eric B Munson <emunson@mgebm.net>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1308861279-15216-1-git-send-email-emunson@mgebm.net
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Richard Kennedy [Tue, 7 Jun 2011 15:33:38 +0000 (16:33 +0100)]
perf: Remove 64-bit alignment padding from perf_event_context
Reorder perf_event_context to remove 8 bytes of 64 bit alignment padding
shrinking its size to 192 bytes, allowing it to fit into a smaller slab
and use one fewer cache lines.
Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1307460819.1950.5.camel@castor.rsk
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Vince Weaver [Wed, 1 Jun 2011 19:15:36 +0000 (15:15 -0400)]
perf_events: Fix perf buffer watermark setting
Since 2.6.36 (specifically commit
d57e34fdd60b ("perf: Simplify the
ring-buffer logic: make perf_buffer_alloc() do everything needed"),
the perf_buffer_init_code() has been mis-setting the buffer watermark
if perf_event_attr.wakeup_events has a non-zero value.
This is because perf_event_attr.wakeup_events is a union with
perf_event_attr.wakeup_watermark.
This commit re-enables the check for perf_event_attr.watermark being
set before continuing with setting a non-default watermark.
This bug is most noticable when you are trying to use PERF_IOC_REFRESH
with a value larger than one and perf_event_attr.wakeup_events is set to
one. In this case the buffer watermark will be set to 1 and you will
get extraneous POLL_IN overflows rather than POLL_HUP as expected.
[ avoid using attr.wakeup_events when attr.watermark is set ]
Signed-off-by: Vince Weaver <vweaver1@eecs.utk.edu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1106011506390.5384@cl320.eecs.utk.edu
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Tue, 28 Jun 2011 10:15:51 +0000 (12:15 +0200)]
irq_work, alpha: Fix up arch hooks
Commit
e360adbe29 ("irq_work: Add generic hardirq context
callbacks") fouled up the Alpha bit, not properly naming the
arch specific function that raises the 'self-IPI'.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Michael Cree <mcree@orcon.net.nz>
Cc: stable@kernel.org # 37+
Link: http://lkml.kernel.org/n/tip-gukh0txmql2l4thgrekzzbfy@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Mon, 27 Jun 2011 15:22:43 +0000 (17:22 +0200)]
irq_work, ppc: Fix up arch hooks
Commit
e360adbe29 ("irq_work: Add generic hardirq context
callbacks") fouled up the ppc bit, not properly naming the
arch specific function that raises the 'self-IPI'.
Cc: Huang Ying <ying.huang@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: stable@kernel.org # 37+
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-eg0aqien8p1aqvzu9dft6dtv@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Fri, 1 Jul 2011 08:28:42 +0000 (10:28 +0200)]
Merge commit 'v3.0-rc5' into perf/core
Merge reason: Pick up the latest fixes.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Frederic Weisbecker [Wed, 29 Jun 2011 21:52:52 +0000 (23:52 +0200)]
perf tools: Only display parent field if explictly sorted
We don't need to display the parent field if the parent
sorting machinery is only used for parent filtering
(as in "-p foo").
However if parent filtering is used in combination with
explicit parent sorting ( -s parent), we want to
display it.
Result with:
perf report -p kernel_thread -s parent
Before:
# Overhead Parent symbol
# ........ .............
#
0.07%
|
--- ioread8
ata_sff_check_status
ata_sff_tf_load
ata_sff_qc_issue
ata_bmdma_qc_issue
ata_qc_issue
ata_scsi_translate
ata_scsi_queuecmd
scsi_dispatch_cmd
scsi_request_fn
__blk_run_queue
__make_request
generic_make_request
submit_bio
submit_bh
journal_submit_commit_record
jbd2_journal_commit_transaction
kjournald2
kthread
kernel_thread_helpe
After:
# Overhead Parent symbol
# ........ .............
#
0.07% kernel_thread_helper
|
--- ioread8
ata_sff_check_status
ata_sff_tf_load
ata_sff_qc_issue
ata_bmdma_qc_issue
ata_qc_issue
ata_scsi_translate
ata_scsi_queuecmd
scsi_dispatch_cmd
scsi_request_fn
__blk_run_queue
__make_request
generic_make_request
submit_bio
submit_bh
journal_submit_commit_record
jbd2_journal_commit_transaction
kjournald2
kthread
kernel_thread_helper
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Sam Liao <phyomh@gmail.com>
Frederic Weisbecker [Wed, 29 Jun 2011 21:08:14 +0000 (23:08 +0200)]
perf tools: Allow sort dimensions to be registered more than once
So that the parent sort dimension can be registered twice: once
if we add it as an explicit sort dimension (-s parent) and twice
if we request a parent filter (-p foo).
We'll have only one parent sort dimension in the end but this
allows to override the default parent filter with we gave in "-p"
option. The goal of this is to prepare to allow the use of
"-s parent" and "-p foo" at the same time, ie: sort by filtered
parent.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Sam Liao <phyomh@gmail.com>
Frederic Weisbecker [Wed, 29 Jun 2011 20:23:03 +0000 (22:23 +0200)]
perf tools: Don't display ignored entries on stdio ui
As for newt ui, don't display entries that have been marked
as ignored.
The practical current effect of this is to make parent
filtering really working. Before, entries that were ignored
were given a null parent but were still displayed. This
resulted in some weird effects:
# Overhead Command Shared Object Symbol
# ........ ........... ................. ............
#
^A
|
--- __lock_acquire
|
|--95.97%-- lock_acquire
| |
| |--30.75%-- _raw_spin_lock
Discard these from the stdio display.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Sam Liao <phyomh@gmail.com>
Frederic Weisbecker [Wed, 29 Jun 2011 01:25:14 +0000 (03:25 +0200)]
perf tools: Remove sort print helpers declarations
These are probably some old leftovers.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Sam Liao <phyomh@gmail.com>
Frederic Weisbecker [Wed, 29 Jun 2011 01:14:52 +0000 (03:14 +0200)]
perf tools: Make sort operations static
These don't need to be globally visible.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Sam Liao <phyomh@gmail.com>
Sam Liao [Tue, 7 Jun 2011 15:49:46 +0000 (23:49 +0800)]
perf tools: Add inverted call graph report support.
Add "caller/callee" option to support inverted butterfly report,
in the inverted report (with caller option), the call graph start
from the callee's ancestor. Users can use such view to catch system's
performance bottleneck from a sysprof like view. Using this option
with specified sort order like pid gives us high level view of call
graph statistics.
Also add "-G" alias for inverted call graph.
Signed-off-by: Sam Liao <phyomh@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Linus Torvalds [Tue, 28 Jun 2011 02:12:22 +0000 (19:12 -0700)]
Linux 3.0-rc5
Hugh Dickins [Mon, 27 Jun 2011 23:18:20 +0000 (16:18 -0700)]
drm/i915: more struct_mutex locking
When auditing the locking in i915_gem.c (for a prospective change which
I then abandoned), I noticed two places where struct_mutex is not held
across GEM object manipulations that would usually require it.
Since one is in initial setup and the other in driver unload, I'm
guessing the mutex is not required for either; but post a patch in case
it is.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Keith Packard <keithp@keithp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Mon, 27 Jun 2011 23:18:19 +0000 (16:18 -0700)]
drm/i915: use shmem_truncate_range
The interface to ->truncate_range is changing very slightly: once "tmpfs:
take control of its truncate_range" has been applied, this can be applied.
For now there is only a slight inefficiency while this remains unapplied,
but it will soon become essential for managing shmem's use of swap.
Change i915_gem_object_truncate() to use shmem_truncate_range() directly:
which should also spare i915 later change if we switch from
inode_operations->truncate_range to file_operations->fallocate.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Keith Packard <keithp@keithp.com>
Cc: Dave Airlie <airlied@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Mon, 27 Jun 2011 23:18:18 +0000 (16:18 -0700)]
drm/i915: use shmem_read_mapping_page
Soon tmpfs will stop supporting ->readpage and read_cache_page_gfp(): once
"tmpfs: add shmem_read_mapping_page_gfp" has been applied, this patch can
be applied to ease the transition.
Make i915_gem_object_get_pages_gtt() use shmem_read_mapping_page_gfp() in
the one place it's needed; elsewhere use shmem_read_mapping_page(), with
the mapping's gfp_mask properly initialized.
Forget about __GFP_COLD: since tmpfs initializes its pages with memset,
asking for a cold page is counter-productive.
Include linux/shmem_fs.h also in drm_gem.c: with shmem_file_setup() now
declared there too, we shall remove the prototype from linux/mm.h later.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Keith Packard <keithp@keithp.com>
Cc: Dave Airlie <airlied@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Mon, 27 Jun 2011 23:18:17 +0000 (16:18 -0700)]
drm/ttm: use shmem_read_mapping_page
Soon tmpfs will stop supporting ->readpage and read_mapping_page(): once
"tmpfs: add shmem_read_mapping_page_gfp" has been applied, this patch can
be applied to ease the transition.
ttm_tt_swapin() and ttm_tt_swapout() use shmem_read_mapping_page() in
place of read_mapping_page(), since their swap_space has been created with
shmem_file_setup().
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Dave Airlie <airlied@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Mon, 27 Jun 2011 23:18:16 +0000 (16:18 -0700)]
drivers/tty/serial/8250_pci.c: fix warning
Fis the warning
drivers/tty/serial/8250_pci.c:1457: warning: initialization from incompatible pointer type
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ralf Baechle [Mon, 27 Jun 2011 23:18:15 +0000 (16:18 -0700)]
drivers/misc/ioc4.c: fix section mismatch / race condition
Fix this section mismatch:
WARNING: drivers/misc/ioc4.o(.data+0x144): Section mismatch in reference from the variable ioc4_load_modules_work to the function .devinit.text:ioc4_load_modules()
The variable ioc4_load_modules_work references
the function __devinit ioc4_load_modules()
If the reference is valid then annotate the
variable with __init* or __refdata (see linux/init.h) or name the variable:
*driver, *_template, *_timer, *_sht, *_ops, *_probe, *_probe_one, *_console
This one is potentially fatal; by the time ioc4_load_modules is invoked
it may already have been freed. For that reason ioc4_load_modules_work
can't be turned to __devinitdata but also because it's referenced in
ioc4_exit.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Brent Casavant <bcasavan@sgi.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ralf Baechle [Mon, 27 Jun 2011 23:18:14 +0000 (16:18 -0700)]
drivers/leds/leds-lp5523.c: fix section mismatches
Fix this section mismatch:
WARNING: drivers/leds/leds-lp5523.o(.text+0x12f4): Section mismatch in reference from the function lp5523_probe() to the function .init.text:lp5523_init_led()
The function lp5523_probe() references
the function __init lp5523_init_led().
This is often because lp5523_probe lacks a __init
annotation or the annotation of lp5523_init_led is wrong.
Fixing this one triggers one more mismatch, fix that one as well.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ralf Baechle [Mon, 27 Jun 2011 23:18:13 +0000 (16:18 -0700)]
drivers/leds/leds-lp5521.c: fix section mismatches
Fix this section mismatch:
WARNING: drivers/leds/leds-lp5521.o(.text+0xf2c): Section mismatch in reference from the function lp5521_probe() to the function .init.text:lp5521_init_led()
The function lp5521_probe() references
the function __init lp5521_init_led().
This is often because lp5521_probe lacks a __init
annotation or the annotation of lp5521_init_led is wrong.
Fixing this mismatch triggers one more mismatch, fix that one as well.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KAMEZAWA Hiroyuki [Mon, 27 Jun 2011 23:18:12 +0000 (16:18 -0700)]
memcg: fix direct softlimit reclaim to be called in limit path
Commit
d149e3b25d7c ("memcg: add the soft_limit reclaim in global direct
reclaim") adds a softlimit hook to shrink_zones(). By this, soft limit
is called as
try_to_free_pages()
do_try_to_free_pages()
shrink_zones()
mem_cgroup_soft_limit_reclaim()
Then, direct reclaim is memcg softlimit hint aware, now.
But, the memory cgroup's "limit" path can call softlimit shrinker.
try_to_free_mem_cgroup_pages()
do_try_to_free_pages()
shrink_zones()
mem_cgroup_soft_limit_reclaim()
This will cause a global reclaim when a memcg hits limit.
This is bug. soft_limit_reclaim() should be called when
scanning_global_lru(sc) == true.
And the commit adds a variable "total_scanned" for counting softlimit
scanned pages....it's not "total". This patch removes the variable and
update sc->nr_scanned instead of it. This will affect shrink_slab()'s
scan condition but, global LRU is scanned by softlimit and I think this
change makes sense.
TODO: avoid too much scanning of a zone when softlimit did enough work.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Ying Han <yinghan@google.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vasiliy Kulikov [Mon, 27 Jun 2011 23:18:11 +0000 (16:18 -0700)]
taskstats: don't allow duplicate entries in listener mode
Currently a single process may register exit handlers unlimited times.
It may lead to a bloated listeners chain and very slow process
terminations.
Eg after 10KK sent TASKSTATS_CMD_ATTR_REGISTER_CPUMASKs ~300 Mb of
kernel memory is stolen for the handlers chain and "time id" shows 2-7
seconds instead of normal 0.003. It makes it possible to exhaust all
kernel memory and to eat much of CPU time by triggerring numerous exits
on a single CPU.
The patch limits the number of times a single process may register
itself on a single CPU to one.
One little issue is kept unfixed - as taskstats_exit() is called before
exit_files() in do_exit(), the orphaned listener entry (if it was not
explicitly deregistered) is kept until the next someone's exit() and
implicit deregistration in send_cpu_listeners(). So, if a process
registered itself as a listener exits and the next spawned process gets
the same pid, it would inherit taskstats attributes.
Signed-off-by: Vasiliy Kulikov <segooon@gmail.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Kara [Mon, 27 Jun 2011 23:18:10 +0000 (16:18 -0700)]
mm: fix assertion mapping->nrpages == 0 in end_writeback()
Under heavy memory and filesystem load, users observe the assertion
mapping->nrpages == 0 in end_writeback() trigger. This can be caused by
page reclaim reclaiming the last page from a mapping in the following
race:
CPU0 CPU1
...
shrink_page_list()
__remove_mapping()
__delete_from_page_cache()
radix_tree_delete()
evict_inode()
truncate_inode_pages()
truncate_inode_pages_range()
pagevec_lookup() - finds nothing
end_writeback()
mapping->nrpages != 0 -> BUG
page->mapping = NULL
mapping->nrpages--
Fix the problem by doing a reliable check of mapping->nrpages under
mapping->tree_lock in end_writeback().
Analyzed by Jay <jinshan.xiong@whamcloud.com>, lost in LKML, and dug out
by Miklos Szeredi <mszeredi@suse.de>.
Cc: Jay <jinshan.xiong@whamcloud.com>
Cc: Miklos Szeredi <mszeredi@suse.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Peter Zijlstra [Mon, 27 Jun 2011 23:18:09 +0000 (16:18 -0700)]
mm/memory-failure.c: fix spinlock vs mutex order
We cannot take a mutex while holding a spinlock, so flip the order and
fix the locking documentation.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Josh Hunt [Mon, 27 Jun 2011 23:18:08 +0000 (16:18 -0700)]
drivers/misc/lkdtm.c: fix race when crashpoint is hit multiple times before checking count
We observed the crash point count going negative in cases where the
crash point is hit multiple times before the check of "count == 0" is
done. Because of this we never call lkdtm_do_action(). This patch just
adds a spinlock to protect count.
Reported-by: Tapan Dhimant <tdhimant@akamai.com>
Signed-off-by: Josh Hunt <johunt@akamai.com>
Acked-by: Ankita Garg <ankita@in.ibm.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Chris Metcalf [Mon, 27 Jun 2011 23:18:07 +0000 (16:18 -0700)]
include/linux/compat.h: declare compat_sys_sendmmsg()
This is required for tilegx to be able to use the compat unistd.h header
where compat_sys_sendmmsg() is now mentioned.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bob Liu [Mon, 27 Jun 2011 23:18:06 +0000 (16:18 -0700)]
romfs: fix romfs_get_unmapped_area() argument check
romfs_get_unmapped_area() checks argument `len' without considering
PAGE_ALIGN which will cause do_mmap_pgoff() return -EINVAL error after
commit
f67d9b1576c ("nommu: add page_align to mmap").
Fix the check by changing it in same way ramfs_nommu_get_unmapped_area()
was changed in ramfs/file-nommu.c.
Signed-off-by: Bob Liu <lliubbo@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Acked-by: Greg Ungerer <gerg@snapgear.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Richard Weinberger [Mon, 27 Jun 2011 23:18:05 +0000 (16:18 -0700)]
um: add asm/percpu.h
To make SLUB work on UML we need this_cpu_cmpxchg from
asm-generic/percpu.h.
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Priyanka Jain [Mon, 27 Jun 2011 23:18:04 +0000 (16:18 -0700)]
drivers/rtc/rtc-ds1307.c: add support for RTC device pt7c4338
PT7C4338 chip is being manufactured by Pericom Technology Inc. It is a
serial real-time clock which provides:
1) Low-power clock/calendar.
2) Programmable square-wave output.
It has 56 bytes of nonvolatile RAM. Its register set is same as that of
rtc device: DS1307.
Signed-off-by: Priyanka Jain <Priyanka.Jain@freescale.com>
Acked-by: Timur Tabi <timur@freescale.com>
Reviewed-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Mon, 27 Jun 2011 23:18:04 +0000 (16:18 -0700)]
tmpfs: add shmem_read_mapping_page_gfp
Although it is used (by i915) on nothing but tmpfs, read_cache_page_gfp()
is unsuited to tmpfs, because it inserts a page into pagecache before
calling the filesystem's ->readpage: tmpfs may have pages in swapcache
which only it knows how to locate and switch to filecache.
At present tmpfs provides a ->readpage method, and copes with this by
copying pages; but soon we can simplify it by removing its ->readpage.
Provide shmem_read_mapping_page_gfp() now, ready for that transition,
Export shmem_read_mapping_page_gfp() and add it to list in shmem_fs.h,
with shmem_read_mapping_page() inline for the common mapping_gfp case.
(shmem_read_mapping_page_gfp or shmem_read_cache_page_gfp? Generally the
read_mapping_page functions use the mapping's ->readpage, and the
read_cache_page functions use the supplied filler, so I think
read_cache_page_gfp was slightly misnamed.)
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Mon, 27 Jun 2011 23:18:03 +0000 (16:18 -0700)]
tmpfs: take control of its truncate_range
2.6.35's new truncate convention gave tmpfs the opportunity to control
its file truncation, no longer enforced from outside by vmtruncate().
We shall want to build upon that, to handle pagecache and swap together.
Slightly redefine the ->truncate_range interface: let it now be called
between the unmap_mapping_range()s, with the filesystem responsible for
doing the truncate_inode_pages_range() from it - just as the filesystem
is nowadays responsible for doing that from its ->setattr.
Let's rename shmem_notify_change() to shmem_setattr(). Instead of
calling the generic truncate_setsize(), bring that code in so we can
call shmem_truncate_range() - which will later be updated to perform its
own variant of truncate_inode_pages_range().
Remove the punch_hole unmap_mapping_range() from shmem_truncate_range():
now that the COW's unmap_mapping_range() comes after ->truncate_range,
there is no need to call it a third time.
Export shmem_truncate_range() and add it to the list in shmem_fs.h, so
that i915_gem_object_truncate() can call it explicitly in future; get
this patch in first, then update drm/i915 once this is available (until
then, i915 will just be doing the truncate_inode_pages() twice).
Though introduced five years ago, no other filesystem is implementing
->truncate_range, and its only other user is madvise(,,MADV_REMOVE): we
expect to convert it to fallocate(,FALLOC_FL_PUNCH_HOLE,,) shortly,
whereupon ->truncate_range can be removed from inode_operations -
shmem_truncate_range() will help i915 across that transition too.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Mon, 27 Jun 2011 23:18:02 +0000 (16:18 -0700)]
mm: move shmem prototypes to shmem_fs.h
Before adding any more global entry points into shmem.c, gather such
prototypes into shmem_fs.h. Remove mm's own declarations from swap.h,
but for now leave the ones in mm.h: because shmem_file_setup() and
shmem_zero_setup() are called from various places, and we should not
force other subsystems to update immediately.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Mon, 27 Jun 2011 23:18:01 +0000 (16:18 -0700)]
mm: move vmtruncate_range to truncate.c
You would expect to find vmtruncate_range() next to vmtruncate() in
mm/truncate.c: move it there.
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vitaliy Ivanov [Mon, 27 Jun 2011 16:07:08 +0000 (19:07 +0300)]
Fix some kernel-doc warnings
Fix 'make htmldocs' warnings:
Warning(/include/linux/hrtimer.h:153): No description found for parameter 'clockid'
Warning(/include/linux/device.h:604): Excess struct/union/enum/typedef member 'of_match' description in 'device'
Warning(/include/net/sock.h:349): Excess struct/union/enum/typedef member 'sk_rmem_alloc' description in 'sock'
Signed-off-by: Vitaliy Ivanov <vitalivanov@gmail.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 27 Jun 2011 21:55:43 +0000 (14:55 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/cjb/mmc
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc:
mmc: queue: bring discard_granularity/alignment into line with SCSI
mmc: queue: append partition subname to queue thread name
mmc: core: make erase timeout calculation allow for gated clock
mmc: block: switch card to User Data Area when removing the block driver
mmc: sdio: reset card during power_restore
mmc: cb710: fix #ifdef HAVE_EFFICIENT_UNALIGNED_ACCESS
mmc: sdhi: DMA slave ID 0 is invalid
mmc: tmio: fix regression in TMIO_MMC_WRPROTECT_DISABLE handling
mmc: omap_hsmmc: use original sg_len for dma_unmap_sg
mmc: omap_hsmmc: fix ocr mask usage
mmc: sdio: fix runtime PM path during driver removal
mmc: Add PCI fixup quirks for Ricoh 1180:e823 reader
mmc: sdhi: fix module unloading
mmc: of_mmc_spi: add NO_IRQ define to of_mmc_spi.c
mmc: vub300: fix null dereferences in error handling
KAMEZAWA Hiroyuki [Thu, 16 Jun 2011 08:28:07 +0000 (17:28 +0900)]
Fix node_start/end_pfn() definition for mm/page_cgroup.c
commit
21a3c96 uses node_start/end_pfn(nid) for detection start/end
of nodes. But, it's not defined in linux/mmzone.h but defined in
/arch/???/include/mmzone.h which is included only under
CONFIG_NEED_MULTIPLE_NODES=y.
Then, we see
mm/page_cgroup.c: In function 'page_cgroup_init':
mm/page_cgroup.c:308: error: implicit declaration of function 'node_start_pfn'
mm/page_cgroup.c:309: error: implicit declaration of function 'node_end_pfn'
So, fixiing page_cgroup.c is an idea...
But node_start_pfn()/node_end_pfn() is a very generic macro and
should be implemented in the same manner for all archs.
(m32r has different implementation...)
This patch removes definitions of node_start/end_pfn() in each archs
and defines a unified one in linux/mmzone.h. It's not under
CONFIG_NEED_MULTIPLE_NODES, now.
A result of macro expansion is here (mm/page_cgroup.c)
for !NUMA
start_pfn = ((&contig_page_data)->node_start_pfn);
end_pfn = ({ pg_data_t *__pgdat = (&contig_page_data); __pgdat->node_start_pfn + __pgdat->node_spanned_pages;});
for NUMA (x86-64)
start_pfn = ((node_data[nid])->node_start_pfn);
end_pfn = ({ pg_data_t *__pgdat = (node_data[nid]); __pgdat->node_start_pfn + __pgdat->node_spanned_pages;});
Changelog:
- fixed to avoid using "nid" twice in node_end_pfn() macro.
Reported-and-acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Reported-and-tested-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 27 Jun 2011 20:32:14 +0000 (13:32 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/mason/btrfs-unstable
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
btrfs: fix inconsonant inode information
Btrfs: make sure to update total_bitmaps when freeing cache V3
Btrfs: fix type mismatch in find_free_extent()
Btrfs: make sure to record the transid in new inodes
Linus Torvalds [Mon, 27 Jun 2011 16:01:29 +0000 (09:01 -0700)]
Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: prevent bogus assert when trying to remove non-existent attribute
xfs: clear XFS_IDIRTY_RELEASE on truncate down
xfs: reset inode per-lifetime state when recycling it
Linus Torvalds [Mon, 27 Jun 2011 16:00:50 +0000 (09:00 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jikos/hid
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
HID: hid-multitouch: add support for a new Lumio dual-touch panel
HID: hid-multitouch: correct VID for Stantum panels
HID: hid-multitouch: ensure slots are initialized
Linus Torvalds [Mon, 27 Jun 2011 15:58:23 +0000 (08:58 -0700)]
Merge branch 'fixes' of /home/rmk/linux-2.6-arm
* 'fixes' of master.kernel.org:/home/rmk/linux-2.6-arm:
ARM: pm: ensure ARMv7 CPUs save and restore the TLS register
ARM: pm: proc-v7: fix missing struct processor pointers for suspend code
ARM: 6969/1: plat-iop: fix build error
ARM: 6961/1: zImage: Add build-time check for correctly-sized proc_type entries
ARM: SMP: wait for CPU to be marked active
ARM: 6963/1: Thumb-2: Relax relocation requirements for non-function symbols
ARM: 6962/1: mach-h720x: fix build error
ARM: 6959/1: SMP build fix for entry-macro-multi.S
Linus Torvalds [Mon, 27 Jun 2011 15:57:46 +0000 (08:57 -0700)]
Merge branch 'for-linus' of git://git390.marist.edu/linux-2.6
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
[S390] allow setting of upper 32 bit in smp_ctl_set_bit
[S390] hwsampler: Set a sane default sampling rate
[S390] s390: enforce HW limits for the initial sampling rate
[S390] kvm-s390: fix kconfig dependencies
Miao Xie [Thu, 23 Jun 2011 07:27:13 +0000 (07:27 +0000)]
btrfs: fix inconsonant inode information
When iputting the inode, We may leave the delayed nodes if they have some
delayed items that have not been dealt with. So when the inode is read again,
we must look up the relative delayed node, and use the information in it to
initialize the inode. Or we will get inconsonant inode information, it may
cause that the same directory index number is allocated again, and hit the
following oops:
[ 5447.554187] err add delayed dir index item(name: pglog_0.965_0) into the
insertion tree of the delayed node(root id: 262, inode id: 258, errno: -17)
[ 5447.569766] ------------[ cut here ]------------
[ 5447.575361] kernel BUG at fs/btrfs/delayed-inode.c:1301!
[SNIP]
[ 5447.790721] Call Trace:
[ 5447.793191] [<
ffffffffa0641c4e>] btrfs_insert_dir_item+0x189/0x1bb [btrfs]
[ 5447.800156] [<
ffffffffa0651a45>] btrfs_add_link+0x12b/0x191 [btrfs]
[ 5447.806517] [<
ffffffffa0651adc>] btrfs_add_nondir+0x31/0x58 [btrfs]
[ 5447.812876] [<
ffffffffa0651d6a>] btrfs_create+0xf9/0x197 [btrfs]
[ 5447.818961] [<
ffffffff8111f840>] vfs_create+0x72/0x92
[ 5447.824090] [<
ffffffff8111fa8c>] do_last+0x22c/0x40b
[ 5447.829133] [<
ffffffff8112076a>] path_openat+0xc0/0x2ef
[ 5447.834438] [<
ffffffff810c58e2>] ? __perf_event_task_sched_out+0x24/0x44
[ 5447.841216] [<
ffffffff8103ecdd>] ? perf_event_task_sched_out+0x59/0x67
[ 5447.847846] [<
ffffffff81121a79>] do_filp_open+0x3d/0x87
[ 5447.853156] [<
ffffffff811e126c>] ? strncpy_from_user+0x43/0x4d
[ 5447.859072] [<
ffffffff8111f1f5>] ? getname_flags+0x2e/0x80
[ 5447.864636] [<
ffffffff8111f179>] ? do_getname+0x14b/0x173
[ 5447.870112] [<
ffffffff8111f1b7>] ? audit_getname+0x16/0x26
[ 5447.875682] [<
ffffffff8112b1ab>] ? spin_lock+0xe/0x10
[ 5447.880882] [<
ffffffff81112d39>] do_sys_open+0x69/0xae
[ 5447.886153] [<
ffffffff81112db1>] sys_open+0x20/0x22
[ 5447.891114] [<
ffffffff813b9aab>] system_call_fastpath+0x16/0x1b
Fix it by reusing the old delayed node.
Reported-by: Jim Schutt <jaschut@sandia.gov>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Tested-by: Jim Schutt <jaschut@sandia.gov>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Linus Torvalds [Mon, 27 Jun 2011 02:40:31 +0000 (19:40 -0700)]
Merge git://git./linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
cifs: mark CONFIG_CIFS_NFSD_EXPORT as BROKEN
cifs: free blkcipher in smbhash
Linus Torvalds [Mon, 27 Jun 2011 02:39:22 +0000 (19:39 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
cifs: propagate errors from cifs_get_root() to mount(2)
cifs: tidy cifs_do_mount() up a bit
cifs: more breakage on mount failures
cifs: close sget() races
cifs: pull freeing mountdata/dropping nls/freeing cifs_sb into cifs_umount()
cifs: move cifs_umount() call into ->kill_sb()
cifs: pull cifs_mount() call up
sanitize cifs_umount() prototype
cifs: initialize ->tlink_tree in cifs_setup_cifs_sb()
cifs: allocate mountdata earlier
cifs: leak on mount if we share superblock
cifs: don't pass superblock to cifs_mount()
cifs: don't leak nls on mount failure
cifs: double free on mount failure
take bdi setup/destruction into cifs_mount/cifs_umount
Acked-by: Steve French <smfrench@gmail.com>
Adrian Hunter [Thu, 23 Jun 2011 10:40:29 +0000 (13:40 +0300)]
mmc: queue: bring discard_granularity/alignment into line with SCSI
SCSI defines discard alignment as the offset to the first
optimal discard. In the case of SD/MMC, that is always zero
which is the default.
SCSI defines discard granularity as a hint of a optimal
discard size. That is much better expressed by the MMC
"preferred erase size" (pref_erase) field.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
Adrian Hunter [Thu, 23 Jun 2011 10:40:28 +0000 (13:40 +0300)]
mmc: queue: append partition subname to queue thread name
For example, an eMMC with 2 boot partitions will have 3 threads.
The names change from:
40 ? 00:00:00 mmcqd/0
41 ? 00:00:00 mmcqd/0
42 ? 00:00:00 mmcqd/0
to:
40 ? 00:00:00 mmcqd/0
41 ? 00:00:00 mmcqd/0boot0
42 ? 00:00:00 mmcqd/0boot1
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andrei Warkentin <andreiw@motorola.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
Adrian Hunter [Thu, 23 Jun 2011 10:40:27 +0000 (13:40 +0300)]
mmc: core: make erase timeout calculation allow for gated clock
The erase timeout calculation may depend on clock rate
which is zero if the clock is gated, so use
mmc_host_clk_rate() which allows for that case.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
Adrian Hunter [Thu, 23 Jun 2011 10:40:26 +0000 (13:40 +0300)]
mmc: block: switch card to User Data Area when removing the block driver
The MMC block driver and other drivers (e.g. mmc-test) will expect
the card to be switched to the User Data Area eMMC partition when
they start. Hence the MMC block driver should ensure it is that
way when it is removed.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andrei Warkentin <andreiw@motorola.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
Daniel Drake [Sat, 25 Jun 2011 18:20:11 +0000 (19:20 +0100)]
mmc: sdio: reset card during power_restore
mmc_sdio_power_restore() skips some steps that are performed in other
power-related codepaths which are necessary to fully reset the card.
Without this, runtime PM fails for SD8686 SDIO wifi on OLPC XO-1.5.
Signed-off-by: Daniel Drake <dsd@laptop.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
James Hogan [Tue, 21 Jun 2011 09:55:34 +0000 (10:55 +0100)]
mmc: cb710: fix #ifdef HAVE_EFFICIENT_UNALIGNED_ACCESS
HAVE_EFFICIENT_UNALIGNED_ACCESS is a config option, therefore it needs
the CONFIG_ before it when used by the preprocessor.
Signed-off-by: James Hogan <james@albanarts.com>
Acked-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Chris Ball <cjb@laptop.org>
Guennadi Liakhovetski [Tue, 24 May 2011 10:24:07 +0000 (12:24 +0200)]
mmc: sdhi: DMA slave ID 0 is invalid
Don't try to allocate DMA resources if the platform didn't specify
positive DMA slave IDs.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Chris Ball <cjb@laptop.org>
Guennadi Liakhovetski [Mon, 20 Jun 2011 14:51:10 +0000 (16:51 +0200)]
mmc: tmio: fix regression in TMIO_MMC_WRPROTECT_DISABLE handling
Commit
b6147490e6aac82 ("mmc: tmio: split core functionality, DMA and
MFD glue") broke handling of the TMIO_MMC_WRPROTECT_DISABLE flag by
the tmio-mmc driver. This patch restores the original behaviour.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Cc: <stable@kernel.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
Per Forlin [Fri, 17 Jun 2011 18:14:21 +0000 (20:14 +0200)]
mmc: omap_hsmmc: use original sg_len for dma_unmap_sg
Don't use the returned sg_len from dma_map_sg() as inparameter
to dma_unmap_sg(). Use the original sg_len for both dma_map_sg
and dma_unmap_sg according to the documentation in DMA-API.txt.
Signed-off-by: Per Forlin <per.forlin@linaro.org>
Reviewed-by: Venkatraman S <svenkatr@ti.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
Anand Gadiyar [Tue, 14 Jun 2011 10:29:59 +0000 (15:59 +0530)]
mmc: omap_hsmmc: fix ocr mask usage
The OMAP HSMMC driver uses an ocr_mask to determine the list of voltages
supported by the card. It populates this mask based on the list of
voltages supported by the regulator that supplies the voltage.
Commit
64be97822b (omap4 hsmmc: Update ocr mask for MMC2 for regulator
to use) passed a fixed ocr_mask from the OMAP4 SDP board file to limit
the voltage to 2.9-3.0 Volts, and updated the driver to use this mask
if provided, instead of using the regulator's supported voltages.
However the commit is buggy - the ocr_mask is overridden by the
regulator's capabilities anyway. Fix this.
(The bug shows up when a system-wide suspend is attempted on the OMAP4
SDP/Blaze platforms. The eMMC card comes up at 3V, but drops to 1.65V
after the system resumes).
Signed-off-by: Anand Gadiyar <gadiyar@ti.com>
Acked-by: Balaji T K <balajitk@ti.com>
Acked-by: Venkatraman S <svenkatr@ti.com>
Tested-by: Kishore Kadiyala <kishore.kadiyala@ti.com>
Signed-off-by: Sourav Poddar <sourav.poddar@ti.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
Ohad Ben-Cohen [Thu, 9 Jun 2011 23:40:27 +0000 (23:40 +0000)]
mmc: sdio: fix runtime PM path during driver removal
After commit
e1866b3 "PM / Runtime: Rework runtime PM handling
during driver removal" was introduced, the driver core stopped
incrementing the runtime PM usage counter of the device during
the invocation of the ->remove() callback.
This indirectly broke SDIO's runtime PM path during driver removal,
because no one calls _put_sync() anymore after ->remove() completes.
This means that the power of runtime-PM-managed SDIO cards is kept
high after their driver is removed (even if it was powered down
beforehand).
Fix that by directly calling _put_sync() when the last usage
counter is downref'ed by the SDIO bus.
Reported-and-tested-by: Daniel Drake <dsd@laptop.org>
Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
Steve French [Sat, 25 Jun 2011 19:22:28 +0000 (19:22 +0000)]
Merge branch 'master' of /linux/kernel/git/torvalds/linux-2.6
Linus Torvalds [Sat, 25 Jun 2011 14:23:59 +0000 (07:23 -0700)]
Merge branch 'timer-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'timer-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
rtc: vt8500: Fix build error & cleanup rtc_class_ops->update_irq_enable()
alarmtimers: Return -ENOTSUPP if no RTC device is present
alarmtimers: Handle late rtc module loading