Ingo Molnar [Tue, 21 Aug 2012 09:23:40 +0000 (11:23 +0200)]
Merge branch 'tip/perf/core' of git://git./linux/kernel/git/rostedt/linux-trace into perf/core
Pull ftrace updates from Steve Rostedt:
" This patch series extends ftrace function tracing utility to be
more dynamic for its users. It allows for data passing to the callback
functions, as well as reading regs as if a breakpoint were to trigger
at function entry.
The main goal of this patch series was to allow kprobes to use ftrace
as an optimized probe point when a probe is placed on an ftrace nop.
With lots of help from Masami Hiramatsu, and going through lots of
iterations, we finally came up with a good solution. "
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Masami Hiramatsu [Tue, 5 Jun 2012 10:28:38 +0000 (19:28 +0900)]
kprobes/x86: ftrace based optimization for x86
Add function tracer based kprobe optimization support
handlers on x86. This allows kprobes to use function
tracer for probing on mcount call.
Link: http://lkml.kernel.org/r/20120605102838.27845.26317.stgit@localhost.localdomain
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
[ Updated to new port of ftrace save regs functions ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Masami Hiramatsu [Tue, 5 Jun 2012 10:28:32 +0000 (19:28 +0900)]
kprobes: introduce ftrace based optimization
Introduce function trace based kprobes optimization.
With using ftrace optimization, kprobes on the mcount calling
address, use ftrace's mcount call instead of breakpoint.
Furthermore, this optimization works with preemptive kernel
not like as current jump-based optimization. Of cource,
this feature works only if the probe is on mcount call.
Only if kprobe.break_handler is set, that probe is not
optimized with ftrace (nor put on ftrace). The reason why this
limitation comes is that this break_handler may be used only
from jprobes which changes ip address (for fetching the function
arguments), but function tracer ignores modified ip address.
Changes in v2:
- Fix ftrace_ops registering right after setting its filter.
- Unregister ftrace_ops if there is no kprobe using.
- Remove notrace dependency from __kprobes macro.
Link: http://lkml.kernel.org/r/20120605102832.27845.63461.stgit@localhost.localdomain
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Wed, 6 Jun 2012 17:45:31 +0000 (13:45 -0400)]
ftrace: Make ftrace_location() a nop on !DYNAMIC_FTRACE
When CONFIG_DYNAMIC_FTRACE is not set, ftrace_location() is not defined.
If a user (like kprobes) references this function, it will break
the compile when CONFIG_DYNAMIC_FTRACE is not set.
Add ftrace_location() as a nop (return 0) when DYNAMIC_FTRACE
is not defined.
Link: http://lkml.kernel.org/r/20120612225426.961092717@goodmis.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Masami Hiramatsu [Tue, 5 Jun 2012 10:28:26 +0000 (19:28 +0900)]
kprobes: Move locks into appropriate functions
Break a big critical region into fine-grained pieces at
registering kprobe path. This helps us to solve circular
locking dependency when introducing ftrace-based kprobes.
Link: http://lkml.kernel.org/r/20120605102826.27845.81689.stgit@localhost.localdomain
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Masami Hiramatsu [Tue, 5 Jun 2012 10:28:20 +0000 (19:28 +0900)]
kprobes: cleanup to separate probe-able check
Separate probe-able address checking code from
register_kprobe().
Link: http://lkml.kernel.org/r/20120605102820.27845.90133.stgit@localhost.localdomain
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Tue, 5 Jun 2012 10:28:14 +0000 (19:28 +0900)]
kprobes: Inverse taking of module_mutex with kprobe_mutex
Currently module_mutex is taken before kprobe_mutex, but this
can cause issues when we have kprobes register ftrace, as the ftrace
mutex is taken before enabling a tracepoint, which currently takes
the module mutex.
If module_mutex is taken before kprobe_mutex, then we can not
have kprobes use the ftrace infrastructure.
There seems to be no reason that the kprobe_mutex can't be taken
before the module_mutex. Running lockdep shows that it is safe
among the kernels I've run.
Link: http://lkml.kernel.org/r/20120605102814.27845.21047.stgit@localhost.localdomain
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Masami Hiramatsu [Tue, 5 Jun 2012 10:28:08 +0000 (19:28 +0900)]
ftrace: add ftrace_set_filter_ip() for address based filter
Add a new filter update interface ftrace_set_filter_ip()
to set ftrace filter by ip address, not only glob pattern.
Link: http://lkml.kernel.org/r/20120605102808.27845.67952.stgit@localhost.localdomain
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Fri, 20 Jul 2012 17:45:59 +0000 (13:45 -0400)]
ftrace: Add selftest to test function save-regs support
Add selftests to test the save-regs functionality of ftrace.
If the arch supports saving regs, then it will make sure that regs is
at least not NULL in the callback.
If the arch does not support saving regs, it makes sure that the
registering of the ftrace_ops that requests saving regs fails.
It then tests the registering of the ftrace_ops succeeds if the
'IF_SUPPORTED' flag is set. Then it makes sure that the regs passed to
the function is NULL.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Fri, 20 Jul 2012 17:08:05 +0000 (13:08 -0400)]
ftrace: Add selftest to test function trace recursion protection
Add selftests to test the function tracing recursion protection actually
does work. It also tests if a ftrace_ops states it will perform its own
protection. Although, even if the ftrace_ops states it will protect itself,
the ftrace infrastructure may still provide protection if the arch does
not support all features or another ftrace_ops is registered.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Fri, 20 Jul 2012 15:13:07 +0000 (11:13 -0400)]
ftrace: Only compile ftrace selftest if selftests are enabled
No need to compile in the ftrace selftest helper file if selftests are
not being executed.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Fri, 20 Jul 2012 15:04:44 +0000 (11:04 -0400)]
ftrace: Add default recursion protection for function tracing
As more users of the function tracer utility are being added, they do
not always add the necessary recursion protection. To protect from
function recursion due to tracing, if the callback ftrace_ops does not
specifically specify that it protects against recursion (by setting
the FTRACE_OPS_FL_RECURSION_SAFE flag), the list operation will be
called by the mcount trampoline which adds recursion protection.
If the flag is set, then the function will be called directly with no
extra protection.
Note, the list operation is called if more than one function callback
is registered, or if the arch does not support all of the function
tracer features.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Tue, 3 Jul 2012 20:16:09 +0000 (16:16 -0400)]
ftrace/x86: Remove function_trace_stop check from graph caller
The graph caller is called by the mcount callers, which already does
the check against the function_trace_stop variable. No reason to
check it again.
Link: http://lkml.kernel.org/r/20120711195745.588538769@goodmis.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Uros Bizjak [Thu, 19 Jul 2012 17:04:47 +0000 (13:04 -0400)]
ftrace/x86_32: Simplify parameter setup for ftrace_regs_caller
The final position of the stack after saving regs and setting up
the parameters for ftrace_regs_call, is the position of the pt_regs
needed for the 4th parameter. Instead of saving it into a temporary
reg and pushing the reg, simply push the stack pointer.
Link: http://lkml.kernel.org/r/1342702344.12353.16.camel@gandalf.stny.rr.com
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Oleg Nesterov [Sun, 29 Jul 2012 18:22:49 +0000 (20:22 +0200)]
uprobes: __replace_page() needs munlock_vma_page()
Like do_wp_page(), __replace_page() should do munlock_vma_page()
for the case when the old page still has other !VM_LOCKED
mappings. Unfortunately this needs mm/internal.h.
Also, move put_page() outside of ptl lock. This doesn't really
matter but looks a bit better.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar.vnet.ibm.com>
Cc: Anton Arapov <anton@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20120729182249.GA20372@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Oleg Nesterov [Sun, 29 Jul 2012 18:22:47 +0000 (20:22 +0200)]
uprobes: Rename vma_address() and make it return "unsigned long"
1. vma_address() returns loff_t, this looks confusing and this
is unnecessary after the previous change. Make it return "ulong",
all callers truncate the result anyway.
2. Its name conflicts with mm/rmap.c:vma_address(), rename it to
offset_to_vaddr(), this matches vaddr_to_offset().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar.vnet.ibm.com>
Cc: Anton Arapov <anton@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20120729182247.GA20365@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Oleg Nesterov [Sun, 29 Jul 2012 18:22:44 +0000 (20:22 +0200)]
uprobes: Fix register_for_each_vma()->vma_address() check
1. register_for_each_vma() checks that vma_address() == vaddr,
but this is not enough. We should also ensure that
vaddr >= vm_start, find_vma() guarantees "vaddr < vm_end" only.
2. After the prevous changes, register_for_each_vma() is the
only reason why vma_address() has to return loff_t, all other
users know that we have the valid mapping at this offset and
thus the overflow is not possible.
Change the code to use vaddr_to_offset() instead, imho this looks
more clean/understandable and now we can change vma_address().
3. While at it, remove the unnecessary type-cast.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar.vnet.ibm.com>
Cc: Anton Arapov <anton@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20120729182244.GA20362@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Oleg Nesterov [Sun, 29 Jul 2012 18:22:42 +0000 (20:22 +0200)]
uprobes: Introduce vaddr_to_offset(vma, vaddr)
Add the new helper, vaddr_to_offset(vma, vaddr) which returns
the offset in vma->vm_file this vaddr is mapped at.
Change build_probe_list() and find_active_uprobe() to use the
new helper, the next patch adds another user.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar.vnet.ibm.com>
Cc: Anton Arapov <anton@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20120729182242.GA20355@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Oleg Nesterov [Sun, 29 Jul 2012 18:22:40 +0000 (20:22 +0200)]
uprobes: Teach build_probe_list() to consider the range
Currently build_probe_list() builds the list of all uprobes
attached to the given inode, and the caller should filter out
those who don't fall into the [start,end) range, this is
sub-optimal.
This patch turns find_least_offset_node() into
find_node_in_range() which returns the first node inside the
[min,max] range, and changes build_probe_list() to use this node
as a starting point for rb_prev() and rb_next() to find all
other nodes the caller needs. The resulting list is no longer
sorted but we do not care.
This can speed up both build_probe_list() and the callers, but
there is another reason to introduce find_node_in_range(). It
can be used to figure out whether the given vma has uprobes or
not, this will be needed soon.
While at it, shift INIT_LIST_HEAD(tmp_list) into
build_probe_list().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar.vnet.ibm.com>
Cc: Anton Arapov <anton@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20120729182240.GA20352@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Oleg Nesterov [Sun, 29 Jul 2012 18:22:38 +0000 (20:22 +0200)]
uprobes: Remove insert_vm_struct()->uprobe_mmap()
Remove insert_vm_struct()->uprobe_mmap(). It is not needed, nobody
except arch/ia64/kernel/perfmon.c uses insert_vm_struct(vma)
with vma->vm_file != NULL.
And it is wrong. Again, get_user_pages() can not succeed before
vma_link(vma) makes is visible to find_vma(). And even if this
worked, we must not insert the new bp before this mapping is
visible to vma_prio_tree_foreach() for uprobe_unregister().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar.vnet.ibm.com>
Cc: Anton Arapov <anton@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20120729182238.GA20349@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Oleg Nesterov [Sun, 29 Jul 2012 18:22:36 +0000 (20:22 +0200)]
uprobes: Remove copy_vma()->uprobe_mmap()
Remove copy_vma()->uprobe_mmap(new_vma), it is absolutely wrong.
This new_vma was just initialized to represent the new unmapped
area, [vm_start, vm_end) was returned by get_unmapped_area() in
the caller.
This means that uprobe_mmap()->get_user_pages() will fail for
sure, simply because find_vma() can never succeed. And I
verified that sys_mremap()->mremap_to() indeed always fails with
the wrong ENOMEM code if [addr, addr+old_len] is probed.
And why this uprobe_mmap() was added? I believe the intent was
wrong. Note that the caller is going to do move_page_tables(),
all registered uprobes are already faulted in, we only change
the virtual addresses.
NOTE: However, somehow we need to close the race with
uprobe_register() which relies on map_info->vaddr. This needs
another fix I'll try to do later. Probably we need uprobe_mmap()
in move_vma() but we can not do this right now, this can confuse
uprobes_state.counter (which I still hope we are going to kill).
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar.vnet.ibm.com>
Cc: Anton Arapov <anton@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20120729182236.GA20342@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Oleg Nesterov [Sun, 29 Jul 2012 18:22:33 +0000 (20:22 +0200)]
uprobes: Fix overflow in vma_address()/find_active_uprobe()
vma->vm_pgoff is "unsigned long", it should be promoted to
loff_t before the multiplication to avoid the overflow.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar.vnet.ibm.com>
Cc: Anton Arapov <anton@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20120729182233.GA20339@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Oleg Nesterov [Sun, 29 Jul 2012 18:22:31 +0000 (20:22 +0200)]
uprobes: Suppress uprobe_munmap() from mmput()
uprobe_munmap() does get_user_pages() and it is also called from
the final mmput()->exit_mmap() path. This slows down
exit/mmput() for no reason, and I think it is simply
dangerous/wrong to try to fault-in a page into the dying mm. If
nothing else, this happens after the last sync_mm_rss(), afaics
handle_mm_fault() can change the task->rss_stat and make the
subsequent check_mm() unhappy.
Change uprobe_munmap() to check mm->mm_users != 0.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar.vnet.ibm.com>
Cc: Anton Arapov <anton@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20120729182231.GA20336@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Oleg Nesterov [Sun, 29 Jul 2012 18:22:29 +0000 (20:22 +0200)]
uprobes: Uprobe_mmap/munmap needs list_for_each_entry_safe()
The bug was introduced by me in
449d0d7c ("uprobes: Simplify the
usage of uprobe->pending_list").
Yes, we do not care about uprobe->pending_list after return and
nobody can remove the current list entry, but put_uprobe(uprobe)
can actually free it and thus we need list_for_each_safe().
Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar.vnet.ibm.com>
Cc: Anton Arapov <anton@redhat.com>
Link: http://lkml.kernel.org/r/20120729182229.GA20329@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Oleg Nesterov [Sun, 29 Jul 2012 18:22:20 +0000 (20:22 +0200)]
uprobes: Clean up and document write_opcode()->lock_page(old_page)
The comment above write_opcode()->lock_page(old_page) tells
about the race with do_wp_page(). I don't really understand
which exactly race it means, but afaics this lock_page() was not
enough to close all races with do_wp_page().
Anyway, since:
77fc4af1b59d uprobes: Change register_for_each_vma() to take mm->mmap_sem for writing
this code is always called with ->mmap_sem held for writing,
so we can forget about do_wp_page().
However, we can't simply remove this lock_page(), and the only
(afaics) reason is __replace_page()->try_to_free_swap().
Nothing in write_opcode() needs it, move it into
__replace_page() and fix the comment.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar.vnet.ibm.com>
Cc: Anton Arapov <anton@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20120729182220.GA20322@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Oleg Nesterov [Sun, 29 Jul 2012 18:22:18 +0000 (20:22 +0200)]
uprobes: Kill write_opcode()->lock_page(new_page)
write_opcode() does lock_page(new_page) for no reason. Nobody
can see this page until __replace_page() exposes it under ptl
lock, and we do nothing with this page after pte_unmap_unlock().
If nothing else, the similar code in do_wp_page() doesn't lock
the new page for page_add_new_anon_rmap/set_pte_at_notify.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar.vnet.ibm.com>
Cc: Anton Arapov <anton@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20120729182218.GA20315@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Oleg Nesterov [Sun, 29 Jul 2012 18:22:16 +0000 (20:22 +0200)]
uprobes: __replace_page() should not use page_address_in_vma()
page_address_in_vma(old_page) in __replace_page() is ugly and
wrong. The caller already knows the correct virtual address,
this page was found by get_user_pages(vaddr).
However, page_address_in_vma() can actually fail if
page->mapping was cleared by __delete_from_page_cache() after
get_user_pages() returns. But this means the race with page
reclaim, write_opcode() should not fail, it should retry and
read this page again. Probably the race with remove_mapping() is
not possible due to page_freeze_refs() logic, but afaics at
least shmem_writepage()->shmem_delete_from_page_cache() can
clear ->mapping.
We could change __replace_page() to return -EAGAIN in this case,
but it would be better to simply use the caller's vaddr and rely
on page_check_address().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar.vnet.ibm.com>
Cc: Anton Arapov <anton@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20120729182216.GA20311@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Oleg Nesterov [Sun, 29 Jul 2012 18:22:12 +0000 (20:22 +0200)]
uprobes: Don't recheck vma/f_mapping in write_opcode()
write_opcode() rechecks valid_vma() and ->f_mapping, this is
pointless. The caller, register_for_each_vma() or uprobe_mmap(),
has already done these checks under mmap_sem.
To clarify, uprobe_mmap() checks valid_vma() only, but we can
rely on build_probe_list(vm_file->f_mapping->host).
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar.vnet.ibm.com>
Cc: Anton Arapov <anton@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20120729182212.GA20304@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Jovi Zhang [Tue, 17 Jul 2012 02:14:41 +0000 (10:14 +0800)]
perf/x86: Fix missing struct before structure name
When CONFIG_PERF_EVENTS disabled, there will have a compiliation
error, because missing struct before structure name.
Signed-off-by: Jovi Zhang <bookjovi@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Jiri Kosina <trivial@kernel.org>
Link: http://lkml.kernel.org/r/CACV3sbKF%3DCX%2B2jWEWesfCA6rBoQ3wDM4-5ac9MuBtVbCtMRHdQ@mail.gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Yan, Zheng [Tue, 24 Jul 2012 02:44:10 +0000 (10:44 +0800)]
perf/x86: Fix format definition of SNB-EP uncore QPI box
The event control register of SNB-EP uncore QPI box has a one bit
extension at bit position 21.
Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1343097850-4348-1-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Mon, 9 Jul 2012 11:50:23 +0000 (13:50 +0200)]
perf/x86: Make bitfield unsigned
Fix:
arch/x86/kernel/cpu/perf_event.h:377:43: sparse: dubious one-bit signed bitfield
Cc: Borislav Petkov <bp@amd64.org>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-2jxkmktkppkclj1qe6qxd7ah@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Yan, Zheng [Tue, 17 Jul 2012 09:27:55 +0000 (17:27 +0800)]
perf/x86: Fix LLC-* and node-* events on Intel SandyBridge
LLC-* and node-* events require using the OFFCORE_RESPONSE events
on SandyBridge, but the hw_cache_extra_regs is left uninitialized.
This patch adds the missing extra register configure table for
SandyBridge.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1342517275-2875-1-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Yan, Zheng [Thu, 5 Jul 2012 06:32:17 +0000 (14:32 +0800)]
perf/x86: Add Intel Nehalem-EX uncore support
The uncore subsystem in Nehalem-EX consists of 7 components
(U-Box, C-Box, B-Box, S-Box, R-Box, M-Box and W-Box). This
patch is large because the way to program these boxes is
diverse.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/4FF534F1.3030307@intel.com
[ Improved the code. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Yan, Zheng [Mon, 23 Jul 2012 06:23:30 +0000 (14:23 +0800)]
perf/x86: Fix typo in format definition of uncore PCU filter
The format definition of uncore PCU filter should be filter_band*
instead of filter_brand*.
Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1343024611-4692-1-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andrew Vagin [Thu, 12 Jul 2012 10:14:29 +0000 (14:14 +0400)]
sched: Deliver sched_switch events to the current task
Otherwise they can't be filtered for a defined task:
perf record -e sched:sched_switch ./foo
This command doesn't report any events without this patch.
I think it isn't a security concern if someone knows who will
be executed next - this can already be observed by polling /proc
state. By default perf is disabled for non-root users in any case.
I need these events for profiling sleep times. sched_switch is used for
getting callchains and sched_stat_* is used for getting time periods.
These events are combined in user space, then it can be analyzed by
perf tools.
Signed-off-by: Andrew Vagin <avagin@openvz.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arun Sharma <asharma@fb.com>
Link: http://lkml.kernel.org/r/1342088069-1005148-1-git-send-email-avagin@openvz.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Wed, 25 Jul 2012 18:52:59 +0000 (20:52 +0200)]
Merge tag 'perf-core-for-mingo' of git://git./linux/kernel/git/acme/linux into perf/core
Pull perf/core fixes and improvements from Arnaldo Carvalho de Melo:
* libtraceevent build fixes from Namhyung Kim
* Prevent overflow when calculating annotation buckets size, from Cody Schafer
* Set breakpoint events sample period to 1, just like trace events, from Jovi Zhang
* Fix strerror_r usage, from Kirill Shutemov
* Fix duplicate function declaration problems with bison 2.6, from Kirill Shutemov
* Add DSO caching facility (and perf test entry) to speed up binary file
reading, will be used by the DWARF unwind code, from Jiri Olsa
* Fix trace event storms by setting PERF_SAMPLE_PERIOD, from Frederic Weisbecker
* perf kvm fixes from David Ahern
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cody Schafer [Fri, 20 Jul 2012 03:05:25 +0000 (20:05 -0700)]
perf annotate: Prevent overflow in size calculation
A large enough symbol size causes an overflow in the size parameter to
the histogram allocation, leading to a segfault in
symbol__inc_addr_samples later on when this histogram is accessed.
In the case of being called via perf-report, this returns back and
gracefully ignores the sample, eventually ignoring the chained return
value of perf_session_deliver_event in flush_sample_queue.
Signed-off-by: Cody Schafer <cody@linux.vnet.ibm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1342753525-4521-1-git-send-email-cody@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Fri, 6 Jul 2012 07:21:33 +0000 (16:21 +0900)]
tools lib traceevent: Ignore TRACEEVENT-CFLAGS file
The TRACEEVENT-CFLAGS file is used to detect any change on compiler
flags. Just ignore it.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1341559297-25725-3-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Fri, 6 Jul 2012 07:21:32 +0000 (16:21 +0900)]
tools lib traceevent: Detect build environment changes
Cross compiling perf requires setting ARCH and CROSS_COMPILE variables,
but libtraceevent couldn't detect the changes so it ends up believing no
recompiling is required. Thus the linker failed like:
LINK perf
../lib/traceevent//libtraceevent.a: member ../lib/traceevent//libtraceevent.a(event-parse.o) in archive is not an object
collect2: ld returned 1 exit status
make: *** [perf] Error 1
This patch fixes this by adding TRACEEVENT-CFLAGS file like
PERF-CFLAGS to track those changes.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1341559297-25725-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Kirill A. Shutemov [Mon, 23 Jul 2012 21:04:07 +0000 (00:04 +0300)]
perf tools: Fix build error with bison 2.6
Bison 2.6 started to generate parse_events_parse() declaration in header. In
this case we have redundant redeclaration:
util/parse-events.c:29:5: error: redundant redeclaration of ‘parse_events_parse’ [-Werror=redundant-decls]
In file included from util/parse-events.c:14:0:
util/parse-events-bison.h:99:5: note: previous declaration of ‘parse_events_parse’ was here
cc1: all warnings being treated as errors
Let's disable -Wredundant-decls for util/parse-events.c since it includes
header we can't control.
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120723210407.GA25186@shutemov.name
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Kirill A. Shutemov [Mon, 23 Jul 2012 21:06:54 +0000 (00:06 +0300)]
perf tools: use XSI-complaint version of strerror_r() instead of GNU-specific
Perf uses GNU-specific version of strerror_r(). The GNU-specific strerror_r()
returns a pointer to a string containing the error message. This may be either
a pointer to a string that the function stores in buf, or a pointer to some
(immutable) static string (in which case buf is unused).
In glibc-2.16 GNU version was marked with attribute warn_unused_result. It
triggers few warnings in perf:
util/target.c: In function ‘perf_target__strerror’:
util/target.c:114:13: error: ignoring return value of ‘strerror_r’, declared with attribute warn_unused_result [-Werror=unused-result]
ui/browsers/hists.c: In function ‘hist_browser__dump’:
ui/browsers/hists.c:981:13: error: ignoring return value of ‘strerror_r’, declared with attribute warn_unused_result [-Werror=unused-result]
They are bugs.
Let's fix strerror_r() usage.
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Acked-by: Ulrich Drepper <drepper@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ulrich Drepper <drepper@gmail.com>
Link: http://lkml.kernel.org/r/20120723210654.GA25248@shutemov.name
[ committer note: s/assert/BUG_ON/g ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jovi Zhang [Sat, 14 Jul 2012 19:03:10 +0000 (03:03 +0800)]
perf tools: Make the breakpoint events sample period default to 1
There have one problem about hw_breakpoint perf event, as watched, the
events reported to userspace is not correctly, sometime one trigger
bp_event report several events, sometime bp_event cannot go through to
user.
The root cause is attr->freq is 1 passed to kernel defaultly in bp
events, this make kernel calculate event period not as expect, make
sample period to 1 will change attr->freq to 0, to fix this problem.
This patch is similar with commit f92128 about tracepoint events:
perf: Make the trace events sample period default to 1
Signed-off-by: Jovi Zhang <bookjovi@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/CACV3sbLF8taiCq_VYW-sgRJyupeMzg58C7ZXfMe3xZUiH_Mx6w@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Sun, 22 Jul 2012 12:14:40 +0000 (14:14 +0200)]
perf test: Add dso data caching tests
Adding automated test for DSO data reading. Testing raw/cached reads
from different file/cache locations.
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Arun Sharma <asharma@fb.com>
Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Ulrich Drepper <drepper@gmail.com>
Link: http://lkml.kernel.org/r/1342959280-5361-18-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Sun, 22 Jul 2012 12:14:39 +0000 (14:14 +0200)]
perf symbols: Add dso data caching
Adding dso data caching so we don't need to open/read/close, each time
we want dso data.
The DSO data caching affects following functions:
dso__data_read_offset
dso__data_read_addr
Each DSO read tries to find the data (based on offset) inside the cache.
If it's not present it fills the cache from file, and returns the data.
If it is present, data are returned with no file read.
Each data read is cached by reading cache page sized/aligned amount of
DSO data. The cache page size is hardcoded to 4096. The cache is using
RB tree with file offset as a sort key.
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Arun Sharma <asharma@fb.com>
Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Ulrich Drepper <drepper@gmail.com>
Link: http://lkml.kernel.org/r/1342959280-5361-17-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Sun, 22 Jul 2012 12:14:33 +0000 (14:14 +0200)]
perf symbols: Add interface to read DSO image data
Adding following interface for DSO object to allow
reading of DSO image data:
dso__data_fd
- opens DSO and returns file descriptor
Binary types are used to locate/open DSO in following order:
DSO_BINARY_TYPE__BUILD_ID_CACHE
DSO_BINARY_TYPE__SYSTEM_PATH_DSO
In other word we first try to open DSO build-id path,
and if that fails we try to open DSO system path.
dso__data_read_offset
- reads DSO data from specified offset
dso__data_read_addr
- reads DSO data from specified address/map.
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Arun Sharma <asharma@fb.com>
Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Ulrich Drepper <drepper@gmail.com>
Link: http://lkml.kernel.org/r/1342959280-5361-11-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Sun, 22 Jul 2012 12:14:32 +0000 (14:14 +0200)]
perf symbols: Factor DSO symtab types to generic binary types
Adding interface to access DSOs so it could be used
from another place.
New DSO binary type is added - making current SYMTAB__*
types more general:
DSO_BINARY_TYPE__* = SYMTAB__*
Following function is added to return path based on the specified
binary type:
dso__binary_type_file
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Arun Sharma <asharma@fb.com>
Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Ulrich Drepper <drepper@gmail.com>
Link: http://lkml.kernel.org/r/1342959280-5361-10-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Frederic Weisbecker [Wed, 18 Jul 2012 17:10:56 +0000 (19:10 +0200)]
perf hists: Print newline between hists callchains
Tiny cosmetic fix. The lack of a newline between hists callchains was
looking slightly messy.
Before:
0.24% swapper [kernel.kallsyms] [k] _raw_spin_lock_irq
|
--- _raw_spin_lock_irq
run_timer_softirq
__do_softirq
call_softirq
do_softirq
irq_exit
smp_apic_timer_interrupt
apic_timer_interrupt
default_idle
amd_e400_idle
cpu_idle
start_secondary
0.10% perf [kernel.kallsyms] [k] lock_is_held
|
--- lock_is_held
__might_sleep
mutex_lock_nested
perf_event_for_each_child
perf_ioctl
do_vfs_ioctl
sys_ioctl
system_call_fastpath
ioctl
cmd_record
run_builtin
main
__libc_start_main
After:
0.24% swapper [kernel.kallsyms] [k] _raw_spin_lock_irq
|
--- _raw_spin_lock_irq
run_timer_softirq
__do_softirq
call_softirq
do_softirq
irq_exit
smp_apic_timer_interrupt
apic_timer_interrupt
default_idle
amd_e400_idle
cpu_idle
start_secondary
0.10% perf [kernel.kallsyms] [k] lock_is_held
|
--- lock_is_held
__might_sleep
mutex_lock_nested
perf_event_for_each_child
perf_ioctl
do_vfs_ioctl
sys_ioctl
system_call_fastpath
ioctl
cmd_record
run_builtin
main
__libc_start_main
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1342631456-7233-3-git-send-email-fweisbec@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Frederic Weisbecker [Wed, 18 Jul 2012 17:10:54 +0000 (19:10 +0200)]
perf tools: Fix trace events storms due to weight demux
Trace events have a period (weight) of 1 by default. This can be
overriden on events definition by using the __perf_count() macro.
For example, the sched_stat_runtime() is weighted with the runtime of
the task that fired the event.
By default, perf handles such weighted event by dividing it into
individual events carrying a weight of 1. For example if
sched_stat_runtime is fired and the task has run
5000000 nsecs, perf
divides it into
5000000 events in the buffer.
This behaviour makes weighted events unusable because they quickly
fullfill the buffers and we lose most events.
The commit
5d81e5cfb37a174e8ddc0413e2e70cdf05807ace ("events: Don't
divide events if it has field period") solves this problem by sending
only one event when PERF_SAMPLE_PERIOD flag is set. The weight is
carried in the sample itself such that we don't need to demultiplex it
anymore.
This patch provides the last missing piece to use this feature by
setting PERF_SAMPLE_PERIOD from perf tools when we deal with trace
events.
Before:
$ ./perf record -e sched:* -a sleep 1
[ perf record: Woken up 3 times to write data ]
[ perf record: Captured and wrote 1.619 MB perf.data (~70749 samples) ]
Warning:
Processed 16909 events and lost 1 chunks!
Check IO/CPU overload!
$ ./perf script
perf 1894 [003] 824.898327: sched_migrate_task: comm=perf pid=1898 prio=120 orig_cpu=2 dest_cpu=0
perf 1894 [003] 824.898335: sched_stat_sleep: comm=perf pid=1898 delay=
113179500 [ns]
perf 1894 [003] 824.898336: sched_stat_sleep: comm=perf pid=1898 delay=
113179500 [ns]
perf 1894 [003] 824.898337: sched_stat_sleep: comm=perf pid=1898 delay=
113179500 [ns]
perf 1894 [003] 824.898338: sched_stat_sleep: comm=perf pid=1898 delay=
113179500 [ns]
perf 1894 [003] 824.898339: sched_stat_sleep: comm=perf pid=1898 delay=
113179500 [ns]
perf 1894 [003] 824.898340: sched_stat_sleep: comm=perf pid=1898 delay=
113179500 [ns]
perf 1894 [003] 824.898341: sched_stat_sleep: comm=perf pid=1898 delay=
113179500 [ns]
[...]
After:
$ ./perf record -e sched:* -a sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.074 MB perf.data (~3228 samples) ]
$ ./perf script
perf 1461 [000] 554.286957: sched_migrate_task: comm=perf pid=1465 prio=120 orig_cpu=3 dest_cpu=1
perf 1461 [000] 554.286964: sched_stat_sleep: comm=perf pid=1465 delay=
133047190 [ns]
perf 1461 [000] 554.286967: sched_wakeup: comm=perf pid=1465 prio=120 success=1 target_cpu=001
swapper 0 [001] 554.286976: sched_stat_wait: comm=perf pid=1465 delay=0 [ns]
swapper 0 [001] 554.286983: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=perf
[...]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1342631456-7233-1-git-send-email-fweisbec@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Frederic Weisbecker [Wed, 18 Jul 2012 17:10:55 +0000 (19:10 +0200)]
perf hists: Return correct number of characters printed in callchain
Include the omitted number of characters printed for the first entry.
Not that it really matters because nobody seem to care about the number
of printed characters for now. But just in case.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1342631456-7233-2-git-send-email-fweisbec@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
David Ahern [Fri, 20 Jul 2012 23:25:52 +0000 (17:25 -0600)]
perf tools: Dump exclude_{guest,host}, precise_ip header info too
Adds the attributes to the event line in the header dump.
From:
event : name = cycles, type = 0, config = 0x0, config1 = 0x0,
config2 = 0x0, excl_usr = 0, excl_kern = 0, ...
to
event : name = cycles, type = 0, config = 0x0, config1 = 0x0,
config2 = 0x0, excl_usr = 0, excl_kern = 0, excl_host = 0,
excl_guest = 0, precise_ip = 0, ...
Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1342826756-64663-8-git-send-email-dsahern@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
David Ahern [Fri, 20 Jul 2012 23:25:51 +0000 (17:25 -0600)]
perf kvm: Limit repetitive guestmount message to once per directory
After
7ed97ad use of the guestmount option without a subdir for *each*
VM generates an error message for each sample related to that VM. Once
per VM is enough.
Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1342826756-64663-7-git-send-email-dsahern@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
David Ahern [Fri, 20 Jul 2012 23:25:49 +0000 (17:25 -0600)]
perf kvm: Fix bug resolving guest kernel syms
Guest kernel symbols are not resolved despite passing the information
needed to resolve them. e.g.,
perf kvm --guest --guestmount=/tmp/guest-mount record -a -- sleep 1
perf kvm --guest --guestmount=/tmp/guest-mount report --stdio
36.55% [guest/11399] [unknown] [g] 0xffffffff81600bc8
33.19% [guest/10474] [unknown] [g] 0x00000000c0116e00
30.26% [guest/11094] [unknown] [g] 0xffffffff8100a288
43.69% [guest/10474] [unknown] [g] 0x00000000c0103d90
37.38% [guest/11399] [unknown] [g] 0xffffffff81600bc8
12.24% [guest/11094] [unknown] [g] 0xffffffff810aa91d
6.69% [guest/11094] [unknown] [u] 0x00007fa784d721c3
which is just pathetic.
After a maddening 2 days sifting through perf minutia I found it --
id_hdr_size is not initialized for guest machines. This shows up on the
report side as random garbage for the cpu and timestamp, e.g.,
29816
7310572949125804849 0x1ac0 [0x50]: PERF_RECORD_MMAP ...
That messes up the sample sorting such that synthesized guest maps are
processed last.
With this patch you get a much more helpful report:
12.11% [guest/11399] [guest.kernel.kallsyms.11399] [g] irqtime_account_process_tick
10.58% [guest/11399] [guest.kernel.kallsyms.11399] [g] run_timer_softirq
6.95% [guest/11094] [guest.kernel.kallsyms.11094] [g] printk_needs_cpu
6.50% [guest/11094] [guest.kernel.kallsyms.11094] [g] do_timer
6.45% [guest/11399] [guest.kernel.kallsyms.11399] [g] idle_balance
4.90% [guest/11094] [guest.kernel.kallsyms.11094] [g] native_read_tsc
...
v2:
- changed rbtree walk to use rb_first per Namhyung's suggestion
Tested-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1342826756-64663-5-git-send-email-dsahern@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
David Ahern [Fri, 20 Jul 2012 23:25:48 +0000 (17:25 -0600)]
perf kvm: Guest userspace samples should not be lumped with host uspace
e.g., perf kvm --host --guest report -i perf.data --stdio -D
shows:
1
599127912065356 0x143b8 [0x48]: PERF_RECORD_SAMPLE(IP, 5): 5671/5676: 0x7fdf95a061c0 period: 1 addr: 0
... chain: nr:2
..... 0:
ffffffffffffff80
..... 1:
fffffffffffffe00
... thread: qemu-kvm:5671
...... dso: <not found>
(IP, 5) means sample in guest userspace. Those samples should not be lumped
into the VMM's host thread. i.e, the report output:
56.86% qemu-kvm [unknown] [u] 0x00007fdf95a061c0
With this patch the output emphasizes it is a guest userspace hit:
56.86% [guest/5671] [unknown] [u] 0x00007fdf95a061c0
Looking at 3 VMs (2 64-bit, 1 32-bit) with each running a CPU bound
process (openssl speed), perf report currently shows:
93.84% 117726 qemu-kvm [unknown] [u] 0x00007fd7dcaea8e5
which is wrong. With this patch you get:
31.50% 39258 [guest/18772] [unknown] [u] 0x00007fd7dcaea8e5
31.50% 39236 [guest/11230] [unknown] [u] 0x0000000000a57340
30.84% 39232 [guest/18395] [unknown] [u] 0x00007f66f641e107
Tested-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1342826756-64663-4-git-send-email-dsahern@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
David Ahern [Fri, 20 Jul 2012 23:25:47 +0000 (17:25 -0600)]
perf kvm: Set name for VM process in guest machine
COMM events are not generated in the context of a guest machine, so the
thread name is never set for the VMM process. For example, the qemu-kvm
name applies to the process in the host machine, not the guest machine.
So, samples for guest machines are currently displayed as:
99.67% :5671 [unknown] [g] 0xffffffff81366b41
where 5671 is the pid of the VMM. With this patch the samples in the guest
machine are shown as:
18.43% [guest/5671] [unknown] [g] 0xffffffff810d68b7
Tested-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1342826756-64663-3-git-send-email-dsahern@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
David Ahern [Fri, 20 Jul 2012 23:25:46 +0000 (17:25 -0600)]
perf symbols: Add machine id to modules debug message
Current debug message is:
Problems creating module maps, continuing anyway...
When running multiple VMs it would be nice to know which machine the
message is referring to:
$ perf kvm --guest --guestmount=/tmp/guest-mount record -av -- sleep 10
Problems creating module maps for guest 6613, continuing anyway...
Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1342826756-64663-2-git-send-email-dsahern@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Linus Torvalds [Sun, 22 Jul 2012 19:46:27 +0000 (12:46 -0700)]
Merge branch 'x86-build-for-linus' of git://git./linux/kernel/git/tip/tip
Pull a x86/build change from Ingo Molnar.
This makes the default stack alignment on x86-64 be just 8, allowing for
improved code generation (it can avoid some unnecessary extra alignment
logic and use just pure push/pop sequences) and smaller stack frames.
We can't generally do SSE with 16-byte alignment issues in the kernel anyway.
* 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86-64, gcc: Use -mpreferred-stack-boundary=3 if supported
Linus Torvalds [Sun, 22 Jul 2012 19:37:15 +0000 (12:37 -0700)]
Merge branch 'x86-uv-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86/uv changes from Ingo Molnar:
"UV2 BAU productization fixes.
The BAU (Broadcast Assist Unit) is SGI's fancy out of line way on UV
hardware to do TLB flushes, instead of the normal APIC IPI methods.
The commits here fix / work around hangs in their latest hardware
iteration (UV2).
My understanding is that the main purpose of the out of line
signalling channel is to improve scalability: the UV APIC hardware
glue does not handle broadcasting to many CPUs very well, and this
matters most for TLB shootdowns.
[ I don't agree with all aspects of the current approach: in hindsight
it would have been better to link the BAU at the IPI/APIC driver
level instead of the TLB shootdown level, where TLB flushes are
really just one of the uses of broadcast SMP messages. Doing that
would improve scalability in some other ways and it would also
remove a few uglies from the TLB path. It would also be nice to
push more is_uv_system() tests into proper x86_init or x86_platform
callbacks. Cliff? ]"
* 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/uv: Work around UV2 BAU hangs
x86/uv: Implement UV BAU runtime enable and disable control via /proc/sgi_uv/
x86/uv: Fix the UV BAU destination timeout period
Linus Torvalds [Sun, 22 Jul 2012 19:25:47 +0000 (12:25 -0700)]
Merge branch 'x86-reboot-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86/reboot changes from Ingo Molnar:
"Now that the revampted x86 real-mode trampoline code is upstream and
seems to be working well, we can extend the 64-bit reboot code to be
as capable as the 32-bit one."
* 'x86-reboot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86-64, reboot: Be more paranoid in 64-bit reboot=bios
x86, reboot: Drop redundant write of reboot_mode
x86-64, reboot: Allow reboot=bios and reboot-cpu override on x86-64
Linus Torvalds [Sun, 22 Jul 2012 19:19:36 +0000 (12:19 -0700)]
Merge branch 'x86-platform-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 platform changes from Ingo Molnar:
"This tree mostly involves various APIC driver cleanups/robustization,
and vSMP motivated platform callback improvements/cleanups"
Fix up trivial conflict due to printk cleanup right next to return value
change.
* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (29 commits)
Revert "x86/early_printk: Replace obsolete simple_strtoul() usage with kstrtoint()"
x86/apic/x2apic: Use multiple cluster members for the irq destination only with the explicit affinity
x86/apic/x2apic: Limit the vector reservation to the user specified mask
x86/apic: Optimize cpu traversal in __assign_irq_vector() using domain membership
x86/vsmp: Fix vector_allocation_domain's return value
irq/apic: Use config_enabled(CONFIG_SMP) checks to clean up irq_set_affinity() for UP
x86/vsmp: Fix linker error when CONFIG_PROC_FS is not set
x86/apic/es7000: Make apicid of a cluster (not CPU) from a cpumask
x86/apic/es7000+summit: Always make valid apicid from a cpumask
x86/apic/es7000+summit: Fix compile warning in cpu_mask_to_apicid()
x86/apic: Fix ugly casting and branching in cpu_mask_to_apicid_and()
x86/apic: Eliminate cpu_mask_to_apicid() operation
x86/x2apic/cluster: Vector_allocation_domain() should return a value
x86/apic/irq_remap: Silence a bogus pr_err()
x86/vsmp: Ignore IOAPIC IRQ affinity if possible
x86/apic: Make cpu_mask_to_apicid() operations check cpu_online_mask
x86/apic: Make cpu_mask_to_apicid() operations return error code
x86/apic: Avoid useless scanning thru a cpumask in assign_irq_vector()
x86/apic: Try to spread IRQ vectors to different priority levels
x86/apic: Factor out default vector_allocation_domain() operation
...
Linus Torvalds [Sun, 22 Jul 2012 19:04:44 +0000 (12:04 -0700)]
Merge branch 'x86-debug-for-linus' of git://git./linux/kernel/git/tip/tip
Pull debug-for-linus git tree from Ingo Molnar.
Fix up trivial conflict in arch/x86/kernel/cpu/perf_event_intel.c due to
a printk() having changed to a pr_info() differently in the two branches.
* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Move call to print_modules() out of show_regs()
x86/mm: Mark free_initrd_mem() as __init
x86/microcode: Mark microcode_id[] as __initconst
x86/nmi: Clean up register_nmi_handler() usage
x86: Save cr2 in NMI in case NMIs take a page fault (for i386)
x86: Remove cmpxchg from i386 NMI nesting code
x86: Save cr2 in NMI in case NMIs take a page fault
x86/debug: Add KERN_<LEVEL> to bare printks, convert printks to pr_<level>
Linus Torvalds [Sun, 22 Jul 2012 18:42:28 +0000 (11:42 -0700)]
Merge branch 'x86-asm-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86/asm changes from Ingo Molnar:
"Assorted single-commit improvements, as usual"
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm/mtrr: Slightly simplify print_mtrr_state()
x86/mm/mtrr: Fix alignment determination in range_to_mtrr()
x86/copy_user_generic: Optimize copy_user_generic with CPU erms feature
x86/alternatives: Use atomic_xchg() instead atomic_dec_and_test() for stop_machine_text_poke()
Linus Torvalds [Sun, 22 Jul 2012 18:35:46 +0000 (11:35 -0700)]
Merge branch 'timers-core-for-linus' of git://git./linux/kernel/git/tip/tip
Pull timer core changes from Ingo Molnar:
"Continued cleanups of the core time and NTP code, plus more nohz work
preparing for tick-less userspace execution."
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
time: Rework timekeeping functions to take timekeeper ptr as argument
time: Move xtime_nsec adjustment underflow handling timekeeping_adjust
time: Move arch_gettimeoffset() usage into timekeeping_get_ns()
time: Refactor accumulation of nsecs to secs
time: Condense timekeeper.xtime into xtime_sec
time: Explicitly use u32 instead of int for shift values
time: Whitespace cleanups per Ingo%27s requests
nohz: Move next idle expiry time record into idle logic area
nohz: Move ts->idle_calls incrementation into strict idle logic
nohz: Rename ts->idle_tick to ts->last_tick
nohz: Make nohz API agnostic against idle ticks cputime accounting
nohz: Separate idle sleeping time accounting from nohz logic
timers: Improve get_next_timer_interrupt()
timers: Add accounting of non deferrable timers
timers: Consolidate base->next_timer update
timers: Create detach_if_pending() and use it
Linus Torvalds [Sun, 22 Jul 2012 18:22:15 +0000 (11:22 -0700)]
Merge branch 'smp-hotplug-for-linus' of git://git./linux/kernel/git/tip/tip
Pull smp/hotplug changes from Ingo Molnar:
"Various cleanups to the SMP hotplug code - a continuing effort of
Thomas et al"
* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
smpboot: Remove leftover declaration
smp: Remove num_booting_cpus()
smp: Remove ipi_call_lock[_irq]()/ipi_call_unlock[_irq]()
POWERPC: Smp: remove call to ipi_call_lock()/ipi_call_unlock()
SPARC: SMP: Remove call to ipi_call_lock_irq()/ipi_call_unlock_irq()
ia64: SMP: Remove call to ipi_call_lock_irq()/ipi_call_unlock_irq()
x86-smp-remove-call-to-ipi_call_lock-ipi_call_unlock
tile: SMP: Remove call to ipi_call_lock()/ipi_call_unlock()
S390: Smp: remove call to ipi_call_lock()/ipi_call_unlock()
parisc: Smp: remove call to ipi_call_lock()/ipi_call_unlock()
mn10300: SMP: Remove call to ipi_call_lock()/ipi_call_unlock()
hexagon: SMP: Remove call to ipi_call_lock()/ipi_call_unlock()
Linus Torvalds [Sun, 22 Jul 2012 18:10:36 +0000 (11:10 -0700)]
Merge branch 'perf-core-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf events changes from Ingo Molnar:
"- kernel side:
- Intel uncore PMU support for Nehalem and Sandy Bridge CPUs, we
support both the events available via the MSR and via the PCI
access space.
- various uprobes cleanups and restructurings
- PMU driver quirks by microcode version and required x86 microcode
loader cleanups/robustization
- various tracing robustness updates
- static keys: remove obsolete static_branch()
- tooling side:
- GTK browser improvements
- perf report browser: support screenshots to file
- more automated tests
- perf kvm improvements
- perf bench refinements
- build environment improvements
- pipe mode improvements
- libtraceevent updates, we have now hopefully merged most bits with
the out of tree forked code base
... and many other goodies."
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (138 commits)
tracing: Check for allocation failure in __tracing_open()
perf/x86: Fix intel_perfmon_event_mapformatting
jump label: Remove static_branch()
tracepoint: Use static_key_false(), since static_branch() is deprecated
perf/x86: Uncore filter support for SandyBridge-EP
perf/x86: Detect number of instances of uncore CBox
perf/x86: Fix event constraint for SandyBridge-EP C-Box
perf/x86: Use 0xff as pseudo code for fixed uncore event
perf/x86: Save a few bytes in 'struct x86_pmu'
perf/x86: Add a microcode revision check for SNB-PEBS
perf/x86: Improve debug output in check_hw_exists()
perf/x86/amd: Unify AMD's generic and family 15h pmus
perf/x86: Move Intel specific code to intel_pmu_init()
perf/x86: Rename Intel specific macros
perf/x86: Fix USER/KERNEL tagging of samples
perf tools: Split event symbols arrays to hw and sw parts
perf tools: Split out PE_VALUE_SYM parsing token to SW and HW tokens
perf tools: Add empty rule for new line in event syntax parsing
perf test: Use ARRAY_SIZE in parse events tests
tools lib traceevent: Cleanup realloc use
...
Linus Torvalds [Sun, 22 Jul 2012 17:45:05 +0000 (10:45 -0700)]
Merge branch 'core-rcu-for-linus' of git://git./linux/kernel/git/tip/tip
Pull RCU changes from Ingo Molnar:
"Quoting from Paul, the major features of this series are:
1. Preventing latency spikes of more than 200 microseconds for
kernels built with NR_CPUS=4096, which is reportedly becoming the
default for some distros. This is a first step, as it does not
help with systems that actually -have- 4096 CPUs (work on this case
is in progress, but is not yet ready for mainline).
This category also includes improving concurrency of rcu_barrier(),
placed here due to conflicts. Posted to LKML at:
https://lkml.org/lkml/2012/6/22/381
Note that patches 18-22 of that series have been defered to 3.7, as
they have not yet proven themselves to be mainline-ready (and yes,
these are the ones intended to get rid of RCU's latency spikes for
systems that actually have 4096 CPUs).
2. Updates to documentation and rcutorture fixes, the latter category
including improvements to rcu_barrier() testing. Posted to LKML at
http://lkml.indiana.edu/hypermail/linux/kernel/1206.1/04094.html.
3. Miscellaneous fixes posted to LKML at:
https://lkml.org/lkml/2012/6/22/500
with the exception of the last commit, which was posted here:
http://www.gossamer-threads.com/lists/linux/kernel/
1561830
4. RCU_FAST_NO_HZ fixes and improvements. Posted to LKML at:
http://lkml.indiana.edu/hypermail/linux/kernel/1206.1/00006.html
http://www.gossamer-threads.com/lists/linux/kernel/
1561833
The first four patches of the first series went into 3.5 to fix a
regression.
5. Code-style fixes. These were posted to LKML at
http://lkml.indiana.edu/hypermail/linux/kernel/1205.2/01180.html
http://lkml.indiana.edu/hypermail/linux/kernel/1205.2/01181.html"
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (48 commits)
rcu: Fix broken strings in RCU's source code.
rcu: Fix code-style issues involving "else"
rcu: Introduce check for callback list/count mismatch
rcu: Make RCU_FAST_NO_HZ respect nohz= boot parameter
rcu: Fix qlen_lazy breakage
rcu: Round FAST_NO_HZ lazy timeout to nearest second
rcu: The rcu_needs_cpu() function is not a quiescent state
rcu: Dump only the current CPU's buffers for idle-entry/exit warnings
rcu: Add check for CPUs going offline with callbacks queued
rcu: Disable preemption in rcu_blocking_is_gp()
rcu: Prevent uninitialized string in RCU CPU stall info
rcu: Fix rcu_is_cpu_idle() #ifdef in TINY_RCU
rcu: Split RCU core processing out of __call_rcu()
rcu: Prevent __call_rcu() from invoking RCU core on offline CPUs
rcu: Make __call_rcu() handle invocation from idle
rcu: Remove function versions of __kfree_rcu and __is_kfree_rcu_offset
rcu: Consolidate tree/tiny __rcu_read_{,un}lock() implementations
rcu: Remove return value from rcu_assign_pointer()
key: Remove extraneous parentheses from rcu_assign_keypointer()
rcu: Remove return value from RCU_INIT_POINTER()
...
Linus Torvalds [Sun, 22 Jul 2012 17:39:32 +0000 (10:39 -0700)]
Merge branch 'core-iommu-for-linus' of git://git./linux/kernel/git/tip/tip
Pull core/iommu changes from Ingo Molnar.
* 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
iommu/dmar: Use pr_format() instead of PREFIX to tidy up pr_*() calls
iommu/dmar: Reserve mmio space used by the IOMMU, if the BIOS forgets to
iommu/dmar: Replace printks with appropriate pr_*()
Ingo Molnar [Fri, 22 Jun 2012 14:25:19 +0000 (16:25 +0200)]
Revert "x86/early_printk: Replace obsolete simple_strtoul() usage with kstrtoint()"
This reverts commit
fbd24153c48b8425b09c161a020483cd77da870e.
This commit is subtly buggy: kstrto*int() can return an error but
it's not checked in every path. simple_strtoul() on the other hand
could not fail, so this patch subtly intruduces new failure modes.
Signed-off-by: Shuah Khan <shuahkhan@gmail.com>
Link: http://lkml.kernel.org/r/1338424803.3569.5.camel@lorien2
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Linus Torvalds [Sat, 21 Jul 2012 20:58:29 +0000 (13:58 -0700)]
Linux 3.5
Rafael J. Wysocki [Sat, 21 Jul 2012 18:24:52 +0000 (20:24 +0200)]
Remove SYSTEM_SUSPEND_DISK system state
The SYSTEM_SUSPEND_DISK system state is never used, so drop it.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 21 Jul 2012 17:34:13 +0000 (10:34 -0700)]
Merge branch 'anton-kgdb' (kgdb dmesg fixups)
Merge emailed kgdb dmesg fixups patches from Anton Vorontsov:
"The dmesg command appears to be broken after the printk rework. The
old logic in the kdb code makes no sense in terms of current
printk/logging storage format, and KDB simply hangs forever upon
entering 'dmesg' command.
The first patch revives the command by switching to kmsg_dumper
iterator. As a side-effect, the code is now much more simpler.
A few changes were needed in the printk.c: we needed unlocked variant
of the kmsg_dumper iterator, but these can surely wait for 3.6.
It's probably too late even for the first patch to go to 3.5, but I'll
try to convince otherwise. :-) Here we go:
- The current code is broken for sure, and has no hope to work at
all. It is a regression
- The new code works for me, and probably works for everyone else;
- If it compiles (and I urge everyone to compile-test it on your
setup), it hardly can make things worse."
* Merge emailed patches from Anton Vorontsov: (4 commits)
kdb: Switch to nolock variants of kmsg_dump functions
printk: Implement some unlocked kmsg_dump functions
printk: Remove kdb_syslog_data
kdb: Revive dmesg command
Anton Vorontsov [Sat, 21 Jul 2012 00:28:25 +0000 (17:28 -0700)]
kdb: Switch to nolock variants of kmsg_dump functions
The locked variants are prone to deadlocks (suppose we got to the
debugger w/ the logbuf lock held), so let's switch to nolock variants.
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Anton Vorontsov [Sat, 21 Jul 2012 00:28:07 +0000 (17:28 -0700)]
printk: Implement some unlocked kmsg_dump functions
If used from KDB, the locked variants are prone to deadlocks (suppose we
got to the debugger w/ the logbuf lock held).
So, we have to implement a few routines that grab no logbuf lock.
Yet we don't need these functions in modules, so we don't export them.
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Anton Vorontsov [Sat, 21 Jul 2012 00:27:54 +0000 (17:27 -0700)]
printk: Remove kdb_syslog_data
The function is no longer needed, so remove it.
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Anton Vorontsov [Sat, 21 Jul 2012 00:27:37 +0000 (17:27 -0700)]
kdb: Revive dmesg command
The kgdb dmesg command is broken after the printk rework. The old logic
in kdb code makes no sense in terms of current printk/logging storage
format, and KDB simply hangs forever.
This patch revives the command by switching to kmsg_dumper iterator.
The code is now much more simpler and shorter.
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 20 Jul 2012 19:02:02 +0000 (12:02 -0700)]
Merge branch 'upstream' of git://git.linux-mips.org/ralf/upstream-linus
Pull late MIPS fixes from Ralf Baechle:
"This fixes a number of lose ends in the MIPS code and various bug
fixes.
Aside of dropping some patch that should not be in this pull request
everything has sat in -next for quite a while and there are no known
issues.
The biggest patch in this patch set moves the allocation of an array
that is aliased to a function (for runtime generated code) to
assembler code. This avoids an issue with certain toolchains when
building for microMIPS."
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (35 commits)
MIPS: PCI: Move fixups from __init to __devinit.
MIPS: Fix bug.h MIPS build regression
MIPS: sync-r4k: remove redundant irq operation
MIPS: smp: Warn on too early irq enable
MIPS: call set_cpu_online() on cpu being brought up with irq disabled
MIPS: call ->smp_finish() a little late
MIPS: Yosemite: delay irq enable to ->smp_finish()
MIPS: SMTC: delay irq enable to ->smp_finish()
MIPS: BMIPS: delay irq enable to ->smp_finish()
MIPS: Octeon: delay enable irq to ->smp_finish()
MIPS: Oprofile: Fix build as a module.
MIPS: BCM63XX: Fix BCM6368 IPSec clock bit
MIPS: perf: Fix build error caused by unused counters_per_cpu_to_total()
MIPS: Fix Magic SysRq L kernel crash.
MIPS: BMIPS: Fix duplicate header inclusion.
mips: mark const init data with __initconst instead of __initdata
MIPS: cmpxchg.h: Add missing include
MIPS: Malta may also be equipped with MIPS64 R2 processors.
MIPS: Fix typo multipy -> multiply
MIPS: Cavium: Fix duplicate ARCH_SPARSEMEM_ENABLE in kconfig.
...
Linus Torvalds [Fri, 20 Jul 2012 18:51:22 +0000 (11:51 -0700)]
Merge tag 'dm-3.5-fixes-2' of git://git./linux/kernel/git/agk/linux-dm
Pull device-mapper discard fixes from Alasdair G Kergon:
- avoid a crash in dm-raid1 when discards coincide with mirror
recovery;
- avoid discarding shared data that's still needed in dm-thin;
- don't guarantee that discarded blocks will be wiped in dm-raid1.
* tag 'dm-3.5-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm:
dm raid1: set discard_zeroes_data_unsupported
dm thin: do not send discards to shared blocks
dm raid1: fix crash with mirror recovery and discard
Linus Torvalds [Fri, 20 Jul 2012 18:43:53 +0000 (11:43 -0700)]
Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd
Pull pnfs/ore fixes from Boaz Harrosh:
"These are catastrophic fixes to the pnfs objects-layout that were just
discovered. They are also destined for @stable.
I have found these and worked on them at around RC1 time but
unfortunately went to the hospital for kidney stones and had a very
slow recovery. I refrained from sending them as is, before proper
testing, and surly I have found a bug just yesterday.
So now they are all well tested, and have my sign-off. Other then
fixing the problem at hand, and assuming there are no bugs at the new
code, there is low risk to any surrounding code. And in anyway they
affect only these paths that are now broken. That is RAID5 in pnfs
objects-layout code. It does also affect exofs (which was not broken)
but I have tested exofs and it is lower priority then objects-layout
because no one is using exofs, but objects-layout has lots of users."
* 'for-linus' of git://git.open-osd.org/linux-open-osd:
pnfs-obj: Fix __r4w_get_page when offset is beyond i_size
pnfs-obj: don't leak objio_state if ore_write/read fails
ore: Unlock r4w pages in exact reverse order of locking
ore: Remove support of partial IO request (NFS crash)
ore: Fix NFS crash by supporting any unaligned RAID IO
Linus Torvalds [Fri, 20 Jul 2012 18:42:30 +0000 (11:42 -0700)]
Merge tag 'upstream-3.5-rc8' of git://git.infradead.org/linux-ubifs
Pull UBIFS free space fix-up bugfix from Artem Bityutskiy:
"It's been reported already twice recently:
http://lists.infradead.org/pipermail/linux-mtd/2012-May/041408.html
http://lists.infradead.org/pipermail/linux-mtd/2012-June/042422.html
and we finally have the fix. I am quite confident the fix is correct
because I could reproduce the problem with nandsim and verify the fix.
It was also verified by Iwo (the reporter).
I am also confident that this is OK to merge the fix so late because
this patch affects only the fixup functionality, which is not used by
most users."
* tag 'upstream-3.5-rc8' of git://git.infradead.org/linux-ubifs:
UBIFS: fix a bug in empty space fix-up
Mikulas Patocka [Fri, 20 Jul 2012 13:25:07 +0000 (14:25 +0100)]
dm raid1: set discard_zeroes_data_unsupported
We can't guarantee that REQ_DISCARD on dm-mirror zeroes the data even if
the underlying disks support zero on discard. So this patch sets
ti->discard_zeroes_data_unsupported.
For example, if the mirror is in the process of resynchronizing, it may
happen that kcopyd reads a piece of data, then discard is sent on the
same area and then kcopyd writes the piece of data to another leg.
Consequently, the data is not zeroed.
The flag was made available by commit
983c7db347db8ce2d8453fd1d89b7a4bb6920d56
(dm crypt: always disable discard_zeroes_data).
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Fri, 20 Jul 2012 13:25:05 +0000 (14:25 +0100)]
dm thin: do not send discards to shared blocks
When process_discard receives a partial discard that doesn't cover a
full block, it sends this discard down to that block. Unfortunately, the
block can be shared and the discard would corrupt the other snapshots
sharing this block.
This patch detects block sharing and ends the discard with success when
sending it to the shared block.
The above change means that if the device supports discard it can't be
guaranteed that a discard request zeroes data. Therefore, we set
ti->discard_zeroes_data_unsupported.
Thin target discard support with this bug arrived in commit
104655fd4dcebd50068ef30253a001da72e3a081 (dm thin: support discards).
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Fri, 20 Jul 2012 13:25:03 +0000 (14:25 +0100)]
dm raid1: fix crash with mirror recovery and discard
This patch fixes a crash when a discard request is sent during mirror
recovery.
Firstly, some background. Generally, the following sequence happens during
mirror synchronization:
- function do_recovery is called
- do_recovery calls dm_rh_recovery_prepare
- dm_rh_recovery_prepare uses a semaphore to limit the number
simultaneously recovered regions (by default the semaphore value is 1,
so only one region at a time is recovered)
- dm_rh_recovery_prepare calls __rh_recovery_prepare,
__rh_recovery_prepare asks the log driver for the next region to
recover. Then, it sets the region state to DM_RH_RECOVERING. If there
are no pending I/Os on this region, the region is added to
quiesced_regions list. If there are pending I/Os, the region is not
added to any list. It is added to the quiesced_regions list later (by
dm_rh_dec function) when all I/Os finish.
- when the region is on quiesced_regions list, there are no I/Os in
flight on this region. The region is popped from the list in
dm_rh_recovery_start function. Then, a kcopyd job is started in the
recover function.
- when the kcopyd job finishes, recovery_complete is called. It calls
dm_rh_recovery_end. dm_rh_recovery_end adds the region to
recovered_regions or failed_recovered_regions list (depending on
whether the copy operation was successful or not).
The above mechanism assumes that if the region is in DM_RH_RECOVERING
state, no new I/Os are started on this region. When I/O is started,
dm_rh_inc_pending is called, which increases reg->pending count. When
I/O is finished, dm_rh_dec is called. It decreases reg->pending count.
If the count is zero and the region was in DM_RH_RECOVERING state,
dm_rh_dec adds it to the quiesced_regions list.
Consequently, if we call dm_rh_inc_pending/dm_rh_dec while the region is
in DM_RH_RECOVERING state, it could be added to quiesced_regions list
multiple times or it could be added to this list when kcopyd is copying
data (it is assumed that the region is not on any list while kcopyd does
its jobs). This results in memory corruption and crash.
There already exist bypasses for REQ_FLUSH requests: REQ_FLUSH requests
do not belong to any region, so they are always added to the sync list
in do_writes. dm_rh_inc_pending does not increase count for REQ_FLUSH
requests. In mirror_end_io, dm_rh_dec is never called for REQ_FLUSH
requests. These bypasses avoid the crash possibility described above.
These bypasses were improperly implemented for REQ_DISCARD when
the mirror target gained discard support in commit
5fc2ffeabb9ee0fc0e71ff16b49f34f0ed3d05b4 (dm raid1: support discard).
In do_writes, REQ_DISCARD requests is always added to the sync queue and
immediately dispatched (even if the region is in DM_RH_RECOVERING). However,
dm_rh_inc and dm_rh_dec is called for REQ_DISCARD resusts. So it violates the
rule that no I/Os are started on DM_RH_RECOVERING regions, and causes the list
corruption described above.
This patch changes it so that REQ_DISCARD requests follow the same path
as REQ_FLUSH. This avoids the crash.
Reference: https://bugzilla.redhat.com/837607
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Boaz Harrosh [Thu, 7 Jun 2012 23:02:30 +0000 (02:02 +0300)]
pnfs-obj: Fix __r4w_get_page when offset is beyond i_size
It is very common for the end of the file to be unaligned on
stripe size. But since we know it's beyond file's end then
the XOR should be preformed with all zeros.
Old code used to just read zeros out of the OSD devices, which is a great
waist. But what scares me more about this situation is that, we now have
pages attached to the file's mapping that are beyond i_size. I don't
like the kind of bugs this calls for.
Fix both birds, by returning a global zero_page, if offset is beyond
i_size.
TODO:
Change the API to ->__r4w_get_page() so a NULL can be
returned without being considered as error, since XOR API
treats NULL entries as zero_pages.
[Bug since 3.2. Should apply the same way to all Kernels since]
CC: Stable Tree <stable@kernel.org>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Boaz Harrosh [Fri, 8 Jun 2012 02:29:40 +0000 (05:29 +0300)]
pnfs-obj: don't leak objio_state if ore_write/read fails
[Bug since 3.2 Kernel]
CC: Stable Tree <stable@kernel.org>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Boaz Harrosh [Wed, 11 Jul 2012 12:27:13 +0000 (15:27 +0300)]
ore: Unlock r4w pages in exact reverse order of locking
The read-4-write pages are locked in address ascending order.
But where unlocked in a way easiest for coding. Fix that,
locks should be released in opposite order of locking, .i.e
descending address order.
I have not hit this dead-lock. It was found by inspecting the
dbug print-outs. I suspect there is an higher lock at caller that
protects us, but fix it regardless.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Boaz Harrosh [Fri, 8 Jun 2012 01:30:40 +0000 (04:30 +0300)]
ore: Remove support of partial IO request (NFS crash)
Do to OOM situations the ore might fail to allocate all resources
needed for IO of the full request. If some progress was possible
it would proceed with a partial/short request, for the sake of
forward progress.
Since this crashes NFS-core and exofs is just fine without it just
remove this contraption, and fail.
TODO:
Support real forward progress with some reserved allocations
of resources, such as mem pools and/or bio_sets
[Bug since 3.2 Kernel]
CC: Stable Tree <stable@kernel.org>
CC: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Boaz Harrosh [Thu, 7 Jun 2012 22:19:07 +0000 (01:19 +0300)]
ore: Fix NFS crash by supporting any unaligned RAID IO
In RAID_5/6 We used to not permit an IO that it's end
byte is not stripe_size aligned and spans more than one stripe.
.i.e the caller must check if after submission the actual
transferred bytes is shorter, and would need to resubmit
a new IO with the remainder.
Exofs supports this, and NFS was supposed to support this
as well with it's short write mechanism. But late testing has
exposed a CRASH when this is used with none-RPC layout-drivers.
The change at NFS is deep and risky, in it's place the fix
at ORE to lift the limitation is actually clean and simple.
So here it is below.
The principal here is that in the case of unaligned IO on
both ends, beginning and end, we will send two read requests
one like old code, before the calculation of the first stripe,
and also a new site, before the calculation of the last stripe.
If any "boundary" is aligned or the complete IO is within a single
stripe. we do a single read like before.
The code is clean and simple by splitting the old _read_4_write
into 3 even parts:
1._read_4_write_first_stripe
2. _read_4_write_last_stripe
3. _read_4_write_execute
And calling 1+3 at the same place as before. 2+3 before last
stripe, and in the case of all in a single stripe then 1+2+3
is preformed additively.
Why did I not think of it before. Well I had a strike of
genius because I have stared at this code for 2 years, and did
not find this simple solution, til today. Not that I did not try.
This solution is much better for NFS than the previous supposedly
solution because the short write was dealt with out-of-band after
IO_done, which would cause for a seeky IO pattern where as in here
we execute in order. At both solutions we do 2 separate reads, only
here we do it within a single IO request. (And actually combine two
writes into a single submission)
NFS/exofs code need not change since the ORE API communicates the new
shorter length on return, what will happen is that this case would not
occur anymore.
hurray!!
[Stable this is an NFS bug since 3.2 Kernel should apply cleanly]
CC: Stable Tree <stable@kernel.org>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Artem Bityutskiy [Sat, 14 Jul 2012 11:33:09 +0000 (14:33 +0300)]
UBIFS: fix a bug in empty space fix-up
UBIFS has a feature called "empty space fix-up" which is a quirk to work-around
limitations of dumb flasher programs. Namely, of those flashers that are unable
to skip NAND pages full of 0xFFs while flashing, resulting in empty space at
the end of half-filled eraseblocks to be unusable for UBIFS. This feature is
relatively new (introduced in v3.0).
The fix-up routine (fixup_free_space()) is executed only once at the very first
mount if the superblock has the 'space_fixup' flag set (can be done with -F
option of mkfs.ubifs). It basically reads all the UBIFS data and metadata and
writes it back to the same LEB. The routine assumes the image is pristine and
does not have anything in the journal.
There was a bug in 'fixup_free_space()' where it fixed up the log incorrectly.
All but one LEB of the log of a pristine file-system are empty. And one
contains just a commit start node. And 'fixup_free_space()' just unmapped this
LEB, which resulted in wiping the commit start node. As a result, some users
were unable to mount the file-system next time with the following symptom:
UBIFS error (pid 1): replay_log_leb: first log node at LEB 3:0 is not CS node
UBIFS error (pid 1): replay_log_leb: log error detected while replaying the log at LEB 3:0
The root-cause of this bug was that 'fixup_free_space()' wrongly assumed
that the beginning of empty space in the log head (c->lhead_offs) was known
on mount. However, it is not the case - it was always 0. UBIFS does not store
in it the master node and finds out by scanning the log on every mount.
The fix is simple - just pass commit start node size instead of 0 to
'fixup_leb()'.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@linux.intel.com>
Cc: stable@vger.kernel.org [v3.0+]
Reported-by: Iwo Mergler <Iwo.Mergler@netcommwireless.com>
Tested-by: Iwo Mergler <Iwo.Mergler@netcommwireless.com>
Reported-by: James Nute <newten82@gmail.com>
Linus Torvalds [Thu, 19 Jul 2012 23:11:28 +0000 (16:11 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/sage/ceph-client
Pull last minute Ceph fixes from Sage Weil:
"The important one fixes a bug in the socket failure handling behavior
that was turned up in some recent failure injection testing. The
other two are minor bug fixes."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
rbd: endian bug in rbd_req_cb()
rbd: Fix ceph_snap_context size calculation
libceph: fix messenger retry
Steven Rostedt [Wed, 6 Jun 2012 00:00:11 +0000 (20:00 -0400)]
ftrace/x86: Add save_regs for i386 function calls
Add saving full regs for function tracing on i386.
The saving of regs was influenced by patches sent out by
Masami Hiramatsu.
Link: http://lkml.kernel.org/r/20120711195745.379060003@goodmis.org
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Mon, 30 Apr 2012 20:20:23 +0000 (16:20 -0400)]
ftrace/x86: Add separate function to save regs
Add a way to have different functions calling different trampolines.
If a ftrace_ops wants regs saved on the return, then have only the
functions with ops registered to save regs. Functions registered by
other ops would not be affected, unless the functions overlap.
If one ftrace_ops registered functions A, B and C and another ops
registered fucntions to save regs on A, and D, then only functions
A and D would be saving regs. Function B and C would work as normal.
Although A is registered by both ops: normal and saves regs; this is fine
as saving the regs is needed to satisfy one of the ops that calls it
but the regs are ignored by the other ops function.
x86_64 implements the full regs saving, and i386 just passes a NULL
for regs to satisfy the ftrace_ops passing. Where an arch must supply
both regs and ftrace_ops parameters, even if regs is just NULL.
It is OK for an arch to pass NULL regs. All function trace users that
require regs passing must add the flag FTRACE_OPS_FL_SAVE_REGS when
registering the ftrace_ops. If the arch does not support saving regs
then the ftrace_ops will fail to register. The flag
FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED may be set that will prevent the
ftrace_ops from failing to register. In this case, the handler may
either check if regs is not NULL or check if ARCH_SUPPORTS_FTRACE_SAVE_REGS.
If the arch supports passing regs it will set this macro and pass regs
for ops that request them. All other archs will just pass NULL.
Link: http://lkml.kernel.org/r/20120711195745.107705970@goodmis.org
Cc: Alexander van Heukelum <heukelum@fastmail.fm>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Thu, 11 Aug 2011 02:00:55 +0000 (22:00 -0400)]
ftrace/x86_32: Push ftrace_ops in as 3rd parameter to function tracer
Add support of passing the current ftrace_ops into the 3rd parameter
of the callback to the function tracer.
Link: http://lkml.kernel.org/r/20120612225424.942411318@goodmis.org
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Tue, 9 Aug 2011 16:50:46 +0000 (12:50 -0400)]
ftrace: Return pt_regs to function trace callback
Return as the 4th paramater to the function tracer callback the pt_regs.
Later patches that implement regs passing for the architectures will require
having the ftrace_ops set the SAVE_REGS flag, which will tell the arch
to take the time to pass a full set of pt_regs to the ftrace_ops callback
function. If the arch does not support it then it should pass NULL.
If an arch can pass full regs, then it should define:
ARCH_SUPPORTS_FTRACE_SAVE_REGS to 1
Link: http://lkml.kernel.org/r/20120702201821.019966811@goodmis.org
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Tue, 5 Jun 2012 13:44:25 +0000 (09:44 -0400)]
ftrace: Consolidate arch dependent functions with 'list' function
As the function tracer starts to get more features, the support for
theses features will spread out throughout the different architectures
over time. These features boil down to what each arch does in the
mcount trampoline (the ftrace_caller).
Currently there's two features that are not the same throughout the
archs.
1) Support to stop function tracing before the callback
2) passing of the ftrace ops
Both of these require placing an indirect function to support the
features if the mcount trampoline does not.
On a side note, for all architectures, when more than one callback
is registered to the function tracer, an intermediate 'list' function
is called by the mcount trampoline to iterate through the callbacks
that are registered.
Instead of making a separate function for each of these features,
and requiring several indirect calls, just use the single 'list' function
as the intermediate, to handle all cases. If an arch does not support
the 'stop function tracing' or the passing of ftrace ops, just force
it to use the list function that will handle the features required.
This makes the code cleaner and simpler and removes a lot of
#ifdefs in the code.
Link: http://lkml.kernel.org/r/20120612225424.495625483@goodmis.org
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Mon, 8 Aug 2011 20:57:47 +0000 (16:57 -0400)]
ftrace: Pass ftrace_ops as third parameter to function trace callback
Currently the function trace callback receives only the ip and parent_ip
of the function that it traced. It would be more powerful to also return
the ops that registered the function as well. This allows the same function
to act differently depending on what ftrace_ops registered it.
Link: http://lkml.kernel.org/r/20120612225424.267254552@goodmis.org
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Linus Torvalds [Thu, 19 Jul 2012 15:27:13 +0000 (08:27 -0700)]
Merge tag 'md-3.5-fixes' of git://neil.brown.name/md
Pull three md bugfixes from NeilBrown:
"One of the bugs was introduced in 3.5-rc1. Others have been there for
longer."
* tag 'md-3.5-fixes' of git://neil.brown.name/md:
md/raid1: close some possible races on write errors during resync
md: avoid crash when stopping md array races with closing other open fds.
md: fix bug in handling of new_data_offset
Linus Torvalds [Thu, 19 Jul 2012 15:21:13 +0000 (08:21 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking changes from David Miller:
"Ok, we should be good to go now"
1) We have to statically initialize the init_net device list head rather
than do so in an initcall, otherwise netprio_cgroup crashes if it's
built statically rather than modular (Mark D. Rustad)
2) Fix SKB null oopser in CIPSO ipv4 option processing (Paul Moore)
3) Qlogic maintainers update (Anirban Chakraborty)
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
net: Statically initialize init_net.dev_base_head
MAINTAINERS: Changes in qlcnic and qlge maintainers list
cipso: don't follow a NULL pointer when setsockopt() is called
Linus Torvalds [Thu, 19 Jul 2012 15:15:55 +0000 (08:15 -0700)]
Merge branch 'upstream-fixes' of git://git./linux/kernel/git/jikos/hid
Pull HID update from Jiri Kosina:
"A final round of changes for HID for 3.5: just device ID additions."
* 'upstream-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
HID: hid-multitouch: add support for Zytronic panels
HID: add Sennheiser BTD500USB device support
HID: add battery quirk for Apple Wireless ANSI
Ezequiel Garcia [Wed, 18 Jul 2012 13:05:26 +0000 (10:05 -0300)]
cx25821: Remove bad strcpy to read-only char*
The strcpy was being used to set the name of the board. Since the
destination char* was read-only and the name is set statically at
compile time; this was both wrong and redundant.
The type of char* is changed to const char* to prevent future errors.
Reported-by: Radek Masin <radek@masin.eu>
Signed-off-by: Ezequiel Garcia <elezegarcia@gmail.com>
[ Taking directly due to vacations - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Benjamin Tissoires [Tue, 19 Jun 2012 12:39:54 +0000 (14:39 +0200)]
HID: hid-multitouch: add support for Zytronic panels
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@enac.fr>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Sebastian Andrzej Siewior [Thu, 19 Jul 2012 05:13:54 +0000 (07:13 +0200)]
MIPS: PCI: Move fixups from __init to __devinit.
Fixups are executed once the pci-device is found which is during boot
process so __init seems fine as long as the platform does not support
hotplug.
However it is possible to remove the PCI bus at run time and have it
rediscovered again via "echo 1 > /sys/bus/pci/rescan" and this will call
the fixups again.
[ralf@linux-mips.org: Made piixirqmap[] in malta_piix_func0_fixup()
__initdata.]
Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: linux-mips@linux-mips.org
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>