Ingo Molnar [Fri, 3 Oct 2014 03:29:14 +0000 (05:29 +0200)]
Merge tag 'perf-core-for-mingo' of git://git./linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
User visible changes:
* Fix mmap return address truncation to 32-bit in 'perf trace'. (Chang Hyun Park)
* Support operations for shared futexes. (Davidlohr Bueso)
* Fix error message for --filter option not coming after tracepoint. (Arnaldo Carvalho de Melo)
Infrastructure changes:
* Refactor unit and scale function parameters for PMU parsing routines. (Matt Fleming)
* Improve DSO long names lookup with rbtree, resulting in great speedup for
workloads with lots of DSOs. (Waiman Long)
* Fix build breakage on arm64 targets. (Will Deacon)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Arnaldo Carvalho de Melo [Wed, 1 Oct 2014 18:05:32 +0000 (15:05 -0300)]
perf record: Fix error message for --filter option not coming after tracepoint
[root@zoo ~]# perf record --filter "common_pid != PERF_PID" -a
-F option should follow a -e tracepoint option.
The -F option is for --freq, not --filter. Fix it up to show:
[root@zoo ~]# perf record --filter "common_pid != PERF_PID" -a
--filter option should follow a -e tracepoint option
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-z0yrm8stn9w3423nkov3eksg@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Will Deacon [Tue, 30 Sep 2014 11:27:12 +0000 (12:27 +0100)]
perf tools: Fix build breakage on arm64 targets
Attempting to build the perf tool for an arm64 target results in the
following failure:
arch/arm64/util/unwind-libunwind.c: In function 'libunwind__arch_reg_id':
arch/arm64/util/unwind-libunwind.c:77:3: error: implicit declaration of function 'pr_err'
pr_err("unwind: invalid reg id %d\n", regnum);
^
arch/arm64/util/unwind-libunwind.c:77:3: error: nested extern declaration of 'pr_err'
This is due to commit
84f5d36f4866 ("perf tools: Move pr_* debug macros
into debug object") moving the pr_* macros into a new header file, but
failing to update architectures other than x86.
This patch adds the missing include, and fixes the build again.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Jean Pihet <jean.pihet@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1412076432-22045-1-git-send-email-will.deacon@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Waiman Long [Tue, 30 Sep 2014 17:36:15 +0000 (13:36 -0400)]
perf symbols: Improve DSO long names lookup speed with rbtree
With workload that spawns and destroys many threads and processes, it
was found that perf-mem could took a long time to post-process the perf
data after the target workload had completed its operation.
The performance bottleneck was found to be the lookup and insertion of
the new DSO structures (thousands of them in this case).
In a dual-socket Ivy-Bridge E7-4890 v2 machine (30-core, 60-thread), the
perf profile below shows what perf was doing after the profiled AIM7
shared workload completed:
- 83.94% perf libc-2.11.3.so [.] __strcmp_sse42
- __strcmp_sse42
- 99.82% map__new
machine__process_mmap_event
perf_session_deliver_event
perf_session__process_event
__perf_session__process_events
cmd_record
cmd_mem
run_builtin
main
__libc_start_main
- 13.17% perf perf [.] __dsos__findnew
__dsos__findnew
map__new
machine__process_mmap_event
perf_session_deliver_event
perf_session__process_event
__perf_session__process_events
cmd_record
cmd_mem
run_builtin
main
__libc_start_main
So about 97% of CPU times were spent in the map__new() function trying
to insert new DSO entry into the DSO linked list. The whole
post-processing step took about 9 minutes.
The DSO structures are currently searched linearly. So the total
processing time will be proportional to n^2.
To overcome this performance problem, the DSO code is modified to also
put the DSO structures in a RB tree sorted by its long name in
additional to being in a simple linked list. With this change, the
processing time will become proportional to n*log(n) which will be much
quicker for large n. However, the short name will still be searched
using the old linear searching method. With that patch in place, the
same perf-mem post-processing step took less than 30 seconds to
complete.
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Douglas Hatch <doug.hatch@hp.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott J Norton <scott.norton@hp.com>
Link: http://lkml.kernel.org/r/1412098575-27863-3-git-send-email-Waiman.Long@hp.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Waiman Long [Mon, 29 Sep 2014 20:07:28 +0000 (16:07 -0400)]
perf symbols: Encapsulate dsos list head into struct dsos
This is a precursor patch to enable long name searching of DSOs using
a rbtree.
In this patch, a new dsos structure is created which contains only a
list head structure for the moment.
The new dsos structure is used, in turn, in the machine structure for
the user_dsos and kernel_dsos fields.
Only the following 3 dsos functions are modified to accept the new dsos
structure parameter instead of list_head:
- dsos__add()
- dsos__find()
- __dsos__findnew()
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Douglas Hatch <doug.hatch@hp.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott J Norton <scott.norton@hp.com>
Link: http://lkml.kernel.org/r/1412021249-19201-2-git-send-email-Waiman.Long@hp.com
[ Move struct dsos to dso.h to reduce the dso methods depends on machine.h ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Davidlohr Bueso [Mon, 29 Sep 2014 16:41:08 +0000 (09:41 -0700)]
perf bench futex: Sanitize -q option in requeue
When given the number of threads to requeue at once by user input,
there's always the risk of this value being larger than the total number
of threads. This doesn't make any sense, and the kernel can easily deal
with such sort of situations, hence no big deal. We should however
prevent bogus output such as:
./perf bench --repeat 2 futex requeue -q 10
Run summary [PID 22210]: Requeuing 4 threads (from [private] 0x99ef3c to 0x99ef38), 10 at a time.
[Run 1]: Requeued 10 of 4 threads in 0.0040 ms
[Run 2]: Requeued 10 of 4 threads in 0.0030 ms
Requeued 10 of 4 threads in 0.0035 ms (+-14.29%)
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Davidlohr Bueso <dbueso@suse.de>
Link: http://lkml.kernel.org/r/1412008868-22328-2-git-send-email-dave@stgolabs.net
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Davidlohr Bueso [Mon, 29 Sep 2014 16:41:07 +0000 (09:41 -0700)]
perf bench futex: Support operations for shared futexes
Unlike futex-hash, requeuing and wakeup benchmarks do not support shared
futexes, limiting the usefulness of the programs. Correct this, and
allow using the local -S parameter. The default remains using private
futexes.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Davidlohr Bueso <dbueso@suse.de>
Link: http://lkml.kernel.org/r/1412008868-22328-1-git-send-email-dave@stgolabs.net
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Chang Hyun Park [Fri, 26 Sep 2014 12:54:01 +0000 (21:54 +0900)]
perf trace: Fix mmap return address truncation to 32-bit
Using 'perf trace' for mmap is truncating return values by stripping the
top 32 bits, actually printing only the lower 32 bits.
This was because the ret value was of an 'int' type and not a 'long'
type.
The Problem:
991258501.244 ( 0.004 ms): mmap(len:
40001536, prot: READ|WRITE, flags: PRIVATE|ANONYMOUS, fd: -1) = 0x56691000
991258501.257 ( 0.000 ms): minfault [_int_malloc+0x1038] => //anon@0x7fa056691008 //(d.)
The first line shows an mmap, which succeeds and returns 0x56691000.
However the next line shows a memory access to that virtual memory area,
specifically to 0x7fa056691008. The upper 32 bit is lost due to the
problem mentioned above, and thus mmap's return value didn't have the
upper 0x7fa0.
Tested on 3.17-rc5 from the linus's tree, and the HEAD of tip/master
Signed-off-by: Chang Hyun Park <heartinpiece@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1411736041-8017-1-git-send-email-heartinpiece@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Matt Fleming [Wed, 24 Sep 2014 14:04:06 +0000 (15:04 +0100)]
perf tools: Refactor unit and scale function parameters
Passing pointers to alias modifiers 'unit' and 'scale' isn't very
future-proof since if we add more modifiers to the list we'll end up
passing more arguments.
Instead wrap everything up in a struct perf_pmu_info, which can easily
be expanded when additional alias modifiers are necessary in the future.
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1411567455-31264-3-git-send-email-matt@console-pimps.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ingo Molnar [Sat, 27 Sep 2014 07:15:48 +0000 (09:15 +0200)]
Merge tag 'perf-core-for-mingo' of git://git./linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
User visible changes:
o Restore "--callchain graph" output, broken in recent cset to end
up being the same as "fractal" (Namhyung Kim)
o Allow profiling when kptr_restrict == 1 for non root users,
kernel samples will just remain unresolved (Andi Kleen)
o Allow configuring default options for callchains in config file (Namhyung Kim)
o Fix line number in the config file error message (Jiri Olsa)
o Fix --per-core on multi socket systems (Andi Kleen)
Cleanups:
o Use ACCESS_ONCE() instead of volatile cast. (Pranith Kumar)
o Modify error code for when perf_session__new() fails (Taeung Song)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Jiri Olsa [Tue, 23 Sep 2014 11:56:56 +0000 (13:56 +0200)]
perf tools: Fix line number in the config file error message
If we fail to parse the config file within the callback function,
the line number counter 'could be' already on the next line.
This results in wrong line number report like:
$ cat ~/.perfconfig
[call-graph]
sort-key = krava
$ perf record ls
Fatal: bad config file line 3 in /home/jolsa/.perfconfig
Fixing this by saving the current line number for this case.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Milian Wolff <mail@milianw.de>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20140923115656.GC2979@krava.brq.redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Tue, 23 Sep 2014 01:01:44 +0000 (10:01 +0900)]
perf tools: Convert {record,top}.call-graph option to call-graph.record-mode
So that it'll be passed to perf_callchain_config().
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Milian Wolff <mail@milianw.de>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1411434104-5307-6-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Tue, 23 Sep 2014 01:01:43 +0000 (10:01 +0900)]
perf tools: Introduce perf_callchain_config()
This patch adds support for following config options to ~/.perfconfig file.
[call-graph]
record-mode = dwarf
dump-size = 8192
print-type = fractal
order = callee
threshold = 0.5
print-limit = 128
sort-key = function
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Milian Wolff <mail@milianw.de>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1411434104-5307-5-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Tue, 23 Sep 2014 01:01:42 +0000 (10:01 +0900)]
perf callchain: Move some parser functions to callchain.c
And rename record_callchain_parse() to parse_callchain_record_opt() in
accordance to parse_callchain_report_opt().
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Milian Wolff <mail@milianw.de>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1411434104-5307-4-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Tue, 23 Sep 2014 01:01:41 +0000 (10:01 +0900)]
perf tools: Move callchain config from record_opts to callchain_param
So that all callchain config parameters can be read/written to a single
place. It's a preparation to consolidate handling of all callchain
options.
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Milian Wolff <mail@milianw.de>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1411434104-5307-3-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Tue, 23 Sep 2014 01:01:40 +0000 (10:01 +0900)]
perf hists browser: Fix callchain print bug on TUI
Currently perf report -g graph option doesn't work as expected and
always work as same as -g fractal. This was a bug during recent
callchain print code cleanup.
Before:
$ perf report -g graph
Children Self Command Shared Object Symbol
================================================================
- 56.19% 35.41% sleep [kernel.kallsyms] [k] page_fault
- page_fault
+ 63.02% _dl_relocate_object
+ 36.98% clear_user
After:
Children Self Command Shared Object Symbol
================================================================
- 56.19% 35.41% sleep [kernel.kallsyms] [k] page_fault
- page_fault
+ 35.41% _dl_relocate_object
+ 20.78% clear_user
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Milian Wolff <mail@milianw.de>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1411434104-5307-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pranith Kumar [Tue, 23 Sep 2014 14:55:08 +0000 (10:55 -0400)]
perf tools: Use ACCESS_ONCE() instead of volatile cast
Use ACCESS_ONCE() instead of the cast to volatile and read. This is just
a style change which is reader friendly.
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1411484109-10442-1-git-send-email-bobby.prani@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Taeung Song [Wed, 24 Sep 2014 01:33:37 +0000 (10:33 +0900)]
perf tools: Modify error code for when perf_session__new() fails
Because perf_session__new() can fail for more reasons than just ENOMEM,
modify error code(ENOMEM or EINVAL) to -1.
Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1411522417-9917-1-git-send-email-treeze.taeung@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Andi Kleen [Wed, 24 Sep 2014 21:39:54 +0000 (14:39 -0700)]
perf tools: Fix perf record as non root with kptr_restrict == 1
Currently perf record always errors out when you run it as non-root with
kptr_restrict == 1, which is often the default.
Make it only warn instead and fix the kernel resolve code to not
segfault later. Profiling works still fine, except kernel symbols are
not resolved.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1411594794-7229-1-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Andi Kleen [Wed, 24 Sep 2014 20:50:46 +0000 (13:50 -0700)]
perf stat: Fix --per-core on multi socket systems
On systems with more than one socket perf stat --per-core would either
segfault or stop before outputting all cores.
The problem was that the output code referenced the id including the
socket number in the higher bits, which is far beyond any per cpu array.
Mask out the socket number before referencing cpus in abs_printout.
I also renamed the variable in nsec_printout to be clear what it is,
even though it doesn't reference cpus.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Stephane Eranian <eranian@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1411591846-32736-1-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ingo Molnar [Fri, 26 Sep 2014 09:12:46 +0000 (11:12 +0200)]
Merge tag 'perf-fdarray-for-mingo' of git://git./linux/kernel/git/acme/linux into perf/core
Pull perf tooling updates from Arnaldo Carvalho de Melo.
Infrastructure changes:
* We were not handling POLLHUP notifications for event file descriptors.
Fix it by filtering entries in the events file descriptor array after
poll() returns, refcounting mmaps so that when the last fd pointing to
a perf mmap goes away we do the unmap. (Arnaldo Carvalho de Melo)
User visible changes:
* Now 'record' and 'trace' properly exit when a target thread exits.
(Arnaldo Carvalho de Melo)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Arnaldo Carvalho de Melo [Mon, 22 Sep 2014 17:39:48 +0000 (14:39 -0300)]
perf trace: Filter out POLLHUP'ed file descriptors
So that we don't continue polling on vanished file descriptors, i.e.
file descriptors for events monitoring threads that exited.
I.e. the following 'trace' command now exits as expected, instead
of staying in an eternal loop:
$ sleep 5s &
$ trace -p `pidof sleep`
Reported-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-6qegv786zbf6i8us6t4rxug9@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Wed, 13 Aug 2014 14:33:59 +0000 (11:33 -0300)]
perf record: Filter out POLLHUP'ed file descriptors
So that we don't continue polling on vanished file descriptors, i.e.
file descriptors for events monitoring threads that exited.
I.e. the following 'perf record' command now exits as expected, instead
of staying in an eternal loop:
$ sleep 5s &
$ perf record -p `pidof sleep`
Reported-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-8dg8o21t2ntzly2bfh53p3sg@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Mon, 8 Sep 2014 14:27:49 +0000 (11:27 -0300)]
perf evlist: Unmap when all refcounts to fd are gone and events drained
As noticed by receiving a POLLHUP for all its pollfd entries.
That will remove the refcount taken in perf_evlist__mmap_per_evsel(),
and when all events are consumed via perf_evlist__mmap_read() +
perf_evlist__mmap_consume(), the ring buffer will be unmap'ed.
Thanks to Jiri Olsa for pointing out that we must wait till all events
are consumed, not being ok to unmmap just when receiving all the
POLLHUPs.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jean Pihet <jean.pihet@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-t10w1xk4myp7ca7m9fvip6a0@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Mon, 8 Sep 2014 14:24:01 +0000 (11:24 -0300)]
tools lib fd array: Allow associating an integer cookie with each entry
We will use this in perf's evlist class so that it can, at
fdarray__filter() time, to unmap the associated ring buffer.
We may need to have further info associated with each fdarray entry, in
that case we'll make that int array a 'union fdarray_priv' one and put a
pointer there so that users can stash whatever they want there. For now,
an int is enough tho.
v2: Add clarification to the per array entry priv area, as well as make
it a union, which makes usage a bit longer, but if/when we make it
use more space by allowing per entry pointers existing users source
code will not have to be changed, just rebuilt.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jean Pihet <jean.pihet@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/n/tip-0p00bn83quck3fio3kcs9vca@git.kernel.org
Arnaldo Carvalho de Melo [Mon, 8 Sep 2014 16:26:35 +0000 (13:26 -0300)]
perf evlist: Refcount mmaps
We need to know how many fds are using a perf mmap via
PERF_EVENT_IOC_SET_OUTPUT, so that we can know when to ditch an mmap,
refcount it.
v2: Automatically unmap it when the refcount hits one, which will happen
when all fds are filtered by perf_evlist__filter_pollfd(), in later
patches.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jean Pihet <jean.pihet@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20140908153824.GG2773@kernel.org
Link: http://lkml.kernel.org/n/tip-cpv7v2lw0g74ucmxa39xdpms@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Wed, 3 Sep 2014 21:02:59 +0000 (18:02 -0300)]
tools lib api: Adopt fdarray class from perf's evlist
The extensible file description array that grew in the perf_evlist class
can be useful for other tools, as it is not something that only evlists
need, so move it to tools/lib/api/fd to ease sharing it.
v2: Don't use {} like in:
libapi_dirs:
$(QUIET_MKDIR)mkdir -p $(OUTPUT){fs,fd}/
in Makefiles, as it will not work in some systems, as in ubuntu13.10.
v3: Add fd/*.[ch] to LIBAPIKFS_SOURCES (Fix from Jiri Olsa)
v4: Leave the fcntl(fd, O_NONBLOCK) in the evlist layer, remains to
be checked if it is really needed there, but has no place in the
fdarray class (Fix from Jiri Olsa)
v5: Remove evlist details from fdarray grow/filter tests. Improve it a
bit doing more tests about expected internal state.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jean Pihet <jean.pihet@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-kleuni3hckbc3s0lu6yb9x40@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Mon, 18 Aug 2014 20:25:59 +0000 (17:25 -0300)]
perf evlist: Introduce poll method for common code idiom
Since we have access two evlist members in all these poll calls, provide
a helper.
This will also help to make the patch introducing the pollfd class more
clear, as the evlist specific uses will be hiden away
perf_evlist__poll().
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jean Pihet <jean.pihet@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-jr9d4aop4lvy9453qahbcgp0@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Mon, 18 Aug 2014 20:12:30 +0000 (17:12 -0300)]
perf kvm stat live: Use perf_evlist__add_pollfd() instead of local equivalent
Since we can add file descriptors to the evlist pollfd and it will
autogrow, no need to copy all events to a local pollfd array, just add
the timer and stdin file descriptors.
Reviewed-by: David Ahern <dsahern@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jean Pihet <jean.pihet@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-2hvp9iromiheh6rl4oaa08x5@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Mon, 18 Aug 2014 19:49:00 +0000 (16:49 -0300)]
perf tests: Add pollfd growing test
[acme@ssdandy linux]$ perf test "Add fd"
34: Add fd to pollfd array, making it autogrow : Ok
[acme@ssdandy linux]$ perf test -v "Add fd"
34: Add fd to pollfd array, making it autogrow :
--- start ---
test child forked, pid 19817
before growing array: 2 [ 1, 2 ]
after 3rd add_pollfd: 3 [ 1, 2, 35 ]
after 4th add_pollfd: 4 [ 1, 2, 35, 88 ]
test child finished with 0
---- end ----
Add fd to pollfd array, making it autogrow: Ok
[acme@ssdandy linux]$
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jean Pihet <jean.pihet@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-smflpyta146bzog7z0effjss@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Mon, 18 Aug 2014 19:44:06 +0000 (16:44 -0300)]
perf evlist: Allow growing pollfd on add method
This way we will be able to add more file descriptors to be polled,
like stdin or some timer fd.
At this point we might as well yank the pollfd class from evlist so that
it can be used in other places.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jean Pihet <jean.pihet@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-o2mzsjl7taumsoc35ryol00i@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Mon, 8 Sep 2014 15:55:12 +0000 (12:55 -0300)]
perf evlist: We need to poll all event file descriptors
Because we want to notice when they get POLLHUP'ed, so that we can
figure out when all threads exited in a workload being monitored.
We can't just monitor the fds that were mmaped, we need to notice when
all the fds that were PERF_EVENT_IOC_SET_OUTPUT'ed too, because the mmap
stays even after the fd that originally was used to do the mmap call
went away, its only when all the set-output fds for a mmap are gone that
the mmap is.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20140908151016.GH17728@krava.brq.redhat.com
Link: http://lkml.kernel.org/n/tip-24omlq5asrfg4uo3muuzn2bl@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Wed, 13 Aug 2014 14:26:21 +0000 (11:26 -0300)]
perf evlist: Monitor POLLERR and POLLHUP events too
We want to know when the fd went away, like when a monitored thread
exits.
If we do not monitor such events, then the tools will wait forever on
events from a vanished thread, like when running:
$ sleep 5s &
$ perf record -p `pidof sleep`
This builds upon the kernel patch by Jiri Olsa that actually makes a
poll on those file descriptors to return POLLHUP.
It is also needed to change the tools to use
perf_evlist__filter_pollfd() to check if there are remainings fds to
monitor or if all are gone, in which case they will exit the
poll/mmap/read loop.
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-a4fslwspov0bs69nj825hqpq@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Wed, 13 Aug 2014 02:34:06 +0000 (23:34 -0300)]
perf tests: Add test for perf_evlist__filter_pollfd()
That will use a synthetic evlist with just what is touched by this new
method to check that it works as expected.
Output in verbose mode:
$ perf test -v pollfd
33: Filter fds with revents mask in a pollfd array :
--- start ---
filtering all but pollfd[2]:
before: 5 [ 5, 4, 3, 2, 1 ]
after: 1 [ 3 ]
filtering all but (pollfd[0], pollfd[3]):
before: 5 [ 5, 4, 3, 2, 1 ]
after: 2 [ 5, 2 ]
test child finished with 0
---- end ----
Filter fds with revents mask in a pollfd array: Ok
$
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-x7c8liszdvc3ocmanf2cet8p@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Wed, 13 Aug 2014 02:04:11 +0000 (23:04 -0300)]
perf evlist: Introduce perf_evlist__filter_pollfd method
To remove all entries in evlist->pollfd[] that have revents matching at
least one of the bits in the specified mask.
It'll adjust evlist->nr_fds to the number of unfiltered fds and will
return this value, as a convenience and to avoid requiring direct access
to internal state of perf_evlist objects.
This will be used after polling the evlist fds so that we remove fds
that were closed by the kernel.
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-y2sca7z3wicvvy40a50lozwm@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Stephane Eranian [Wed, 17 Sep 2014 09:06:16 +0000 (11:06 +0200)]
perf/x86/intel/uncore: Update support for client uncore IMC PMU
This patch restructures the memory controller (IMC) uncore PMU support
for client SNB/IVB/HSW processors. The main change is that it can now
cope with more than one PCI device ID per processor model. There are
many flavors of memory controllers for each processor. They have
different PCI device ID, yet they behave the same w.r.t. the memory
controller PMU that we are interested in.
The patch now supports two distinct memory controllers for IVB
processors: one for mobile, one for desktop.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140917090616.GA11281@quad
Cc: ak@linux.intel.com
Cc: kan.liang@intel.com
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andi Kleen [Thu, 4 Sep 2014 23:08:29 +0000 (16:08 -0700)]
perf/x86/intel/uncore: Fix PCU filter setup for Sandy/Ivy/Haswell EP
The PCU frequency band filters use 8 bit each in a register.
When setting up the value the shift value was not correctly
scaled, which resulted in all filters except for band 0 to
be zero. Fix the scaling.
This allows to correctly monitor multiple uncore frequency bands.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1409872109-31645-5-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andi Kleen [Thu, 4 Sep 2014 23:08:28 +0000 (16:08 -0700)]
perf/x86/intel/uncore: Add missing cbox filter flags on IvyBridge-EP uncore driver
The IvyBridge-EP uncore driver was missing three filter flags:
NC, ISOC, C6 which are useful in some cases. Support them in the same way
as the Haswell EP driver, by allowing to set them and exposing
them in the sysfs formats.
Also fix a typo in a define.
Relies on the Haswell EP driver to be applied earlier.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1409872109-31645-4-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Yan, Zheng [Thu, 4 Sep 2014 23:08:27 +0000 (16:08 -0700)]
perf/x86/intel/uncore: Register the PMU only if the uncore pci device exists
Current code registers PMUs for all possible uncore pci devices.
This is not good because, on some machines, one or more uncore pci
devices can be missing. The missing pci device make corresponding
PMU unusable. Register the PMU only if the uncore device exists.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1409872109-31645-3-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Yan, Zheng [Thu, 4 Sep 2014 23:08:26 +0000 (16:08 -0700)]
perf/x86/intel/uncore: Add Haswell-EP uncore support
The uncore subsystem in Haswell-EP is similar to Sandy/Ivy
Bridge-EP. There are some differences in config register
encoding and pci device IDs. The Haswell-EP uncore also
supports a few new events. Add the Haswell-EP driver to
the snbep split driver.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
[ Add missing break. Add imc events. Add cbox nc/isoc/c6. ]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1409872109-31645-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andi Kleen [Tue, 2 Sep 2014 18:44:15 +0000 (11:44 -0700)]
perf/x86/intel: Use Broadwell cache event list for Haswell
Use the newly added Broadwell cache event list for Haswell too.
All Haswell and Broadwell events and offcore masks used in these lists
are identical.
However Haswell is very different from the Sandy Bridge
list that was used previously. That fixes a wide range of mis-counting
cache events.
The node events are now only for retired memory events, so prefetching
and speculative memory accesses are not included. They are PEBS
capable now, which makes it much easier to sample for them, plus it's
possible to create address maps with -d.
The prefetch events are gone now. They way the hardware counts
them is very misleading (some prefetches included, others not), so
it seemed best to leave them out.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1409683455-29168-5-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andi Kleen [Tue, 2 Sep 2014 18:44:14 +0000 (11:44 -0700)]
perf/x86: Add INST_RETIRED.ALL workarounds
On Broadwell INST_RETIRED.ALL cannot be used with any period
that doesn't have the lowest 6 bits cleared. And the period
should not be smaller than 128.
Add a new callback to enforce this, and set it for Broadwell.
This is erratum BDM57 and BDM11.
How does this handle the case when an app requests a specific
period with some of the bottom bits set
The apps thinks it is sampling at X occurences per sample, when it is
in fact at X - 63 (worst case).
Short answer:
Any useful instruction sampling period needs to be 4-6 orders
of magnitude larger than 128, as an PMI every 128 instructions
would instantly overwhelm the system and be throttled.
So the +-64 error from this is really small compared to the
period, much smaller than normal system jitter.
Long answer:
<write up by Peter:>
IFF we guarantee perf_event_attr::sample_period >= 128.
Suppose we start out with sample_period=192; then we'll set period_left
to 192, we'll end up with left = 128 (we truncate the lower bits). We
get an interrupt, find that period_left = 64 (>0 so we return 0 and
don't get an overflow handler), up that to 128. Then we trigger again,
at n=256. Then we find period_left = -64 (<=0 so we return 1 and do get
an overflow). We increment with sample_period so we get left = 128. We
fire again, at n=384, period_left = 0 (<=0 so we return 1 and get an
overflow). And on and on.
So while the individual interrupts are 'wrong' we get then with
interval=256,128 in exactly the right ratio to average out at 192. And
this works for everything >=128.
So the num_samples*fixed_period thing is still entirely correct +- 127,
which is good enough I'd say, as you already have that error anyhow.
So no need to 'fix' the tools, al we need to do is refuse to create
INST_RETIRED:ALL events with sample_period < 128.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
Cc: Mark Davies <junk@eslaf.co.uk>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1409683455-29168-4-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andi Kleen [Tue, 2 Sep 2014 18:44:13 +0000 (11:44 -0700)]
perf/x86/intel: Add Broadwell core support
Add Broadwell support for Broadwell Client to perf. This is very
similar to Haswell. It uses a new cache event table, because there
were various changes there.
The constraint list has one new event that needs to be handled over
Haswell.
The PEBS event list is the same, so we reuse Haswell's.
[fengguang.wu: make intel_bdw_event_constraints[] static]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1409683455-29168-3-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andi Kleen [Tue, 2 Sep 2014 18:44:12 +0000 (11:44 -0700)]
perf/x86/intel: Document all Haswell models
Add names for each Haswell model as requested by Peter.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1409683455-29168-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andi Kleen [Tue, 2 Sep 2014 18:44:11 +0000 (11:44 -0700)]
perf/x86/intel: Remove incorrect model number from Haswell perf
71 is a Broadwell, not a Haswell. The model number was added
by mistake earlier.
Remove it for now, until it can be re-added later with
real Broadwell support.
In practice it does not cause a lot of issues because the Broadwell
PMU is very similar to Haswell, but some details were wrong,
and it's better to handle it correctly.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: eranian@google.com
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: http://lkml.kernel.org/r/1409683455-29168-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Jiri Olsa [Fri, 12 Sep 2014 11:18:28 +0000 (13:18 +0200)]
Revert "perf: Do not allow optimized switch for non-cloned events"
This reverts commit
1f9a7268c67f0290837aada443d28fd953ddca90.
With the fix of the initial state for the cloned event we now correctly
handle the error described in:
1f9a7268c67f perf: Do not allow optimized switch for non-cloned events
so we can revert it.
I made an automated test for this, but its not suitable for automated
perf tests framework. It needs to be customized for each machine (the
more cpu the higher numbers for GROUPS/WORKERS/BYTES) and it could take
longer time to hit the issue.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140910143535.GD2409@krava.brq.redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Jiri Olsa [Fri, 12 Sep 2014 11:18:27 +0000 (13:18 +0200)]
perf: Fix child event initial state setup
Currently we initialize the child event based on the original
parent state. This is wrong, because the original parent event
(and its state) is not related to current fork and also could
be already gone.
We need to initialize the child state based on the immediate
parent event state.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1410520708-19275-2-git-send-email-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Jiri Olsa [Fri, 12 Sep 2014 11:18:26 +0000 (13:18 +0200)]
perf: Do not POLLHUP event if it has children
Currently we return POLLHUP in event polling if the monitored
process is done, but we didn't consider possible children,
that might be still running and producing data.
Before returning POLLHUP making sure that:
1) the monitored task has exited and that
2) we don't have any children to monitor
Also adding parent wakeup when the child event is gone.
Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1410520708-19275-1-git-send-email-jolsa@kernel.org
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Fri, 19 Sep 2014 05:13:36 +0000 (07:13 +0200)]
Merge tag 'perf-core-for-mingo' of git://git./linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
User visible changes:
o Add +field argument support for --sort option (Jiri Olsa)
o Do not access kallsyms when analyzing user binaries with 'probe' (Masami Hiramatsu)
o Ignore stripped vmlinux and fallback to kallsyms (Anton Blanchard)
o Add path to Ubuntu kernel debuginfo file (Anton Blanchard)
o Disable kernel symbol demangling by default (Avi Kivity)
Infrastructure changes:
o More intel PT prep work, from Adrian Hunter, including:
- Let a user specify a PMU event without any config terms
- Add perf-with-kcore script
- Let default config be defined for a PMU
- Add perf_pmu__scan_file()
o "perf kvm stat report" improvements by Alexander Yarygin:
o Save pid string in opts.target.pid
o Enable the target.system_wide flag
o Unify the title bar output
o Fix build issue on powerpc when DWARF support is disabled (Anton Blanchard)
o Allow to specify lib compile variable for spec usage (Jiri Olsa)
o Fix build on ARM (Stephane Eranian)
o Fix build on powerpc when DWARF support is disabled (Anton Blanchard)
o Don't include sys/poll.h directly (Arnaldo Carvalho de Melo)
o Use ring buffer consume method to look like other tools (Arnaldo Carvalho de Melo)
o Allow to specify lib compile variable for spec usage (Jiri Olsa)
o Fix GNU-only grep usage in Makefile (John Spencer)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Arnaldo Carvalho de Melo [Wed, 17 Sep 2014 19:42:58 +0000 (16:42 -0300)]
perf record: Use ring buffer consume method to look like other tools
All builtins that consume events from perf's ring buffer now end up
calling perf_evlist__mmap_consume(), which will allow unmapping the ring
buffer when all the fds gets closed and all events in the buffer
consumed.
This is in preparation for the patchkit that will notice POLLHUP on
perf events file descriptors.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-8vhaeeoq11ppz0713el4xcps@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Masami Hiramatsu [Wed, 17 Sep 2014 08:41:01 +0000 (08:41 +0000)]
perf probe: Do not use dwfl_module_addrsym if dwarf_diename finds symbol name
Do not use dwfl_module_addrsym if dwarf_diename can find the symbol
name, since dwfl_module_addrsym can be failed on shared libraries.
Without this patch
----
$ perf probe -x ../lib/traceevent/libtraceevent.so -V create_arg_op
Failed to find symbol at 0x11df1
Failed to find the address of create_arg_op
Error: Failed to show vars.
----
With this patch
----
$ perf probe -x ../lib/traceevent/libtraceevent.so -V create_arg_op
Available variables at create_arg_op
@<create_arg_op+0>
enum filter_op_type btype
struct filter_arg* arg
----
This bug was reported on linux-perf-users@vger.kernel.org.
Reported-by: david lerner <dlernerdroid@gmail.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: david lerner <dlernerdroid@gmail.com>
Cc: linux-perf-user@vger.kernel.org
Cc: yrl.pp-manager.tt@hitachi.com
Link: http://permalink.gmane.org/gmane.linux.kernel.perf.user/1691
Link: http://lkml.kernel.org/r/20140917084101.3722.25299.stgit@kbuild-f20.novalocal
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Masami Hiramatsu [Wed, 17 Sep 2014 08:40:54 +0000 (08:40 +0000)]
perf probe: Do not access kallsyms when analyzing user binaries
Do not access kallsyms to show available variables and show source lines
in user binaries.
This behavior always requires the root privilege when sysctl sets
kernel.kptr_restrict=1, but we don't need it just for analyzing user
binaries.
Without this patch (by normal user, kptr_restrict=1):
----
$ perf probe -x ./perf -V add_cmdname
Failed to init vmlinux path.
Error: Failed to show vars.
$ perf probe -x ./perf -L add_cmdname
Failed to init vmlinux path.
Error: Failed to show lines.
----
With this patch:
----
$ perf probe -x ./perf -V add_cmdname
Available variables at add_cmdname
@<perf_unknown_cmd_config+144>
(No matched variables)
@<list_commands_in_dir+160>
(No matched variables)
@<add_cmdname+0>
char* name
size_t len
struct cmdnames* cmds
$ perf probe -x ./perf -L add_cmdname
<add_cmdname@/home/fedora/ksrc/linux-3/tools/perf/util/help.c:0>
0 void add_cmdname(struct cmdnames *cmds, const char *name, size_t len)
1 {
2 struct cmdname *ent = malloc(sizeof(*ent) + len + 1);
4 ent->len = len;
5 memcpy(ent->name, name, len);
6 ent->name[len] = 0;
...
----
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: david lerner <dlernerdroid@gmail.com>
Cc: linux-perf-user@vger.kernel.org
Cc: yrl.pp-manager.tt@hitachi.com
Link: http://lkml.kernel.org/r/20140917084054.3722.73975.stgit@kbuild-f20.novalocal
[ Added missing 'bool user' argument to the !DWARF show_line_range() stub ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Anton Blanchard [Mon, 15 Sep 2014 19:57:56 +0000 (16:57 -0300)]
perf symbols: Add path to Ubuntu kernel debuginfo file
Ubuntu places the kernel debuginfo in /usr/lib/debug/boot/vmlinux-*
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-`ranpwd
Link: http://lkml.kernel.org/r/20140909091152.2698c0f7@kryten
[ Adapted it to use the perf.data file kernel version as in
0a7e6d1b6844 ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Anton Blanchard [Mon, 8 Sep 2014 22:59:29 +0000 (08:59 +1000)]
perf symbols: Ignore stripped vmlinux and fallback to kallsyms
If a vmlinux is stripped, perf will use it and ignore kallsyms. We
end up with useless profiles where everything maps to a few
runtime symbols:
63.39% swapper [kernel.kallsyms] [k] hcall_real_table
4.90% beam.smp [kernel.kallsyms] [k] hcall_real_table
4.44% beam.smp [kernel.kallsyms] [k] __sched_text_start
3.72% beam.smp [kernel.kallsyms] [k] __run_at_kexec
Detect this case and fallback to using kallsyms. This fixes the issue:
62.81% swapper [kernel.kallsyms] [k] snooze_loop
4.44% beam.smp [kernel.kallsyms] [k] __schedule
0.91% beam.smp [kernel.kallsyms] [k] _switch
0.73% beam.smp [kernel.kallsyms] [k] put_prev_entity
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20140909085929.4a5a81f0@kryten
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Chanho Park [Fri, 12 Sep 2014 02:10:17 +0000 (11:10 +0900)]
perf tools: define _DEFAULT_SOURCE for glibc_2.20
_BSD_SOURCE was deprecated in favour of _DEFAULT_SOURCE since glibc
2.20[1]. To avoid build warning on glibc2.20, _DEFAULT_SOURCE should
also be defined.
[1]: https://sourceware.org/glibc/wiki/Release/2.20
Signed-off-by: Chanho Park <chanho61.park@samsung.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1410487817-13403-1-git-send-email-chanho61.park@samsung.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Mon, 15 Sep 2014 18:54:34 +0000 (15:54 -0300)]
perf tools: Don't include sys/poll.h directly
Include poll.h instead.
Fixes the following warning in systems with musl's libc:
/usr/include/sys/poll.h:1:2: warning: #warning redirecting incorrect #include
<sys/poll.h> to <poll.h> [-Wcpp]
Reported-by: John Spencer <maillist-linux@barfooze.de>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://thread.gmane.org/gmane.linux.kernel.perf.user/1687/focus=1690
Link: http://lkml.kernel.org/n/tip-k4ocrq1de3fk146oevy346bi@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
John Spencer [Mon, 25 Aug 2014 19:36:32 +0000 (21:36 +0200)]
perf tools: Fix GNU-only grep usage in Makefile
This makes it work with non-GNU grep's as well.
Signed-off-by: John Spencer <maillist-linux@barfooze.de>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: http://thread.gmane.org/gmane.linux.kernel.perf.user/1686
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Avi Kivity [Sat, 13 Sep 2014 04:15:05 +0000 (07:15 +0300)]
perf tools: Disable kernel symbol demangling by default
Some Linux symbols (for example __vt_event_wait) are interpreted by the
demangler as C++ mangled names, which of course they aren't.
Disable kernel symbol demangling by default to avoid this, and allow
enabling it with a new option --demangle-kernel for those who wish it.
Reported-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1410581705-26968-1-git-send-email-avi@cloudius-systems.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Stephane Eranian [Fri, 5 Sep 2014 04:21:04 +0000 (06:21 +0200)]
perf tool: fix compilation for ARM
This patch fixes ARM compile of the perf tool. The debug.h header file
was missing from a couple of unwind related modules.
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140905042103.GA3091@quad
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Thu, 31 Jul 2014 06:00:50 +0000 (09:00 +0300)]
perf tools: Add perf_pmu__scan_file()
Add a function to scan a sysfs file within the pmu device directory.
This will be used to read capability values from the PMU 'caps'
subdirectory.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1406786474-9306-8-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Thu, 31 Jul 2014 06:00:49 +0000 (09:00 +0300)]
perf tools: Let default config be defined for a PMU
This allows default config terms to be provided for a PMU. So, for
example, when the Intel PT PMU is added, it will be possible to specify:
intel_pt//
which will be the same as:
intel_pt/tsc=1,noretcomp=0/
meaning that the trace should contain TSC timestamps and perform 'return
compression'.
An important consideration of this patch is that it must be possible to
overwrite the default values. That has meant changing the logic so that
a zero value can replace a non-zero value.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1406786474-9306-7-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Thu, 31 Jul 2014 06:01:12 +0000 (09:01 +0300)]
perf tools: Add perf-with-kcore script
Decoding an Intel PT trace of the kernel requires an accurate kernel
object image. This is provided by making a copy of kcore. However the
copy needs to be made under the same conditions as the original
recording, and then it needs to be associated with the perf.data file.
The perf-with-kcore script does that.
The script also checks the permissions on the buildid cache and can be
used to fix them. That is needed for distributions where root does not
have a home directory and consequently writes to the same buildid cache
as the user, resulting in cached files that the user does not have
access to.
Example:
$ ./perf-with-kcore
Usage: perf-with-kcore <perf sub-command> <perf.data directory> [<sub-command options> [ -- <workload>]]
<perf sub-command> can be record, script, report or inject
or: perf-with-kcore fix_buildid_cache_permissions
$ ./perf-with-kcore record pt_uname -e intel_pt// -- uname
Recording
Using /home/ahunter/bin/perf
perf version 3.15.rc3.g4549ba
/home/ahunter/bin/perf record -o pt_uname/perf.data -e intel_pt// -- uname
Linux
[ perf record: Woken up 3 times to write data ]
[ perf record: Captured and wrote 0.023 MB pt_uname/perf.data ]
Copying kcore
[sudo] password for ahunter:
Done
$ tools/perf/perf-with-kcore.sh script pt_uname | head
Using /home/ahunter/bin/perf
perf version 3.15.rc3.g4549ba
/home/ahunter/bin/perf script -i pt_uname/perf.data --kallsyms=pt_uname/kcore_dir/kallsyms
swapper 0 [002] 161533.969666: sched:sched_switch: swapper/2:0 [120] R ==> perf:11316 [120]
:11315 11315 [003] 161533.969704: sched:sched_switch: perf:11315 [120] S ==> swapper/3:0 [120]
:11316 11316 [002] 161533.969783: sched:sched_switch: perf:11316 [120] R ==> migration/2:33 [0]
:33 33 [002] 161533.969791: sched:sched_switch: migration/2:33 [0] S ==> swapper/2:0 [120]
swapper 0 [003] 161533.969792: sched:sched_switch: swapper/3:0 [120] R ==> perf:11316 [120]
:11316 11316 [003] 161533.970062: branches: 0 [unknown] ([unknown]) =>
ffffffff810532fa native_write_msr_safe ([kernel.kallsyms])
:11316 11316 [003] 161533.970062: branches:
ffffffff810532fd native_write_msr_safe ([kernel.kallsyms]) =>
ffffffff81035b31 pt_config_start ([kernel.kallsyms])
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1406786474-9306-30-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Fri, 15 Aug 2014 19:08:40 +0000 (22:08 +0300)]
perf tools: Let a user specify a PMU event without any config terms
This enables a PMU event to be specified in the form:
pmu//
which is effectively the same as:
pmu/config=0/
This patch is a precursor to defining default config for a PMU.
Further explanation extracted from lkml thread:
Imagine that the 'tsc' term did not exist.
Intel PT trace data would not contain TSC packets, and the decoder would
not know how to decode them.
Then imagine that a new version of the hardware adds 'tsc'.
It is such a useful feature that we want it by default, but older
versions of the tools don't know how to decode it, so the kernel cannot
turn it on by default.
It is similar to why the kernel does not select perf_event_attr.mmap2 by
default.
The kernel doesn't know whether the tool supports it.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1408129739-17368-6-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Mon, 25 Aug 2014 14:55:52 +0000 (16:55 +0200)]
perf tools: Allow to specify lib compile variable for spec usage
We need a way to specify $(lib) part of the installation path for
traceevent plugin libraries. Currently we use 'lib64' for x86_64 and
'lib' otherwise.
Instead of listing all possible values, this change allows the rpm spec
code to specify the correct $(lib) part based on processed architecture,
like
$ make ... lib=%{_lib}
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Kyle McMartin <kyle@mcmartin.ca>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1408978552-17131-1-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Alexander Yarygin [Mon, 1 Sep 2014 13:44:55 +0000 (17:44 +0400)]
perf kvm stat report: Unify the title bar output
The 'live' command prints additional information to the "Analyze events
for " title bar about the current target. Let's print the same title
for the 'report' command.
Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Acked-by: David Ahern <dsahern@gmail.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1409579095-12963-4-git-send-email-yarygin@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Alexander Yarygin [Mon, 1 Sep 2014 13:44:54 +0000 (17:44 +0400)]
perf kvm stat report: Enable the target.system_wide flag
The 'perf kvm stat report' command can be used to analyze events either
for system wide or for specific pids.
Let's enable kvm->opts.target.system_wide flag when 'report' command is
running for system-wide analyzing. This helps to sync kvm->opts.target
values in 'report' and 'live' commands.
Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Acked-by: David Ahern <dsahern@gmail.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1409579095-12963-3-git-send-email-yarygin@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Alexander Yarygin [Mon, 1 Sep 2014 13:44:53 +0000 (17:44 +0400)]
perf kvm stat report: Save pid string in opts.target.pid
The 'perf kvm stat report' command uses the kvm->pid_str field to keep
the value of the --pid option. Let's use kvm->opts.target.pid instead.
Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Acked-by: David Ahern <dsahern@gmail.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1409579095-12963-2-git-send-email-yarygin@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Anton Blanchard [Mon, 25 Aug 2014 08:25:06 +0000 (18:25 +1000)]
perf tools powerpc: Fix build issue when DWARF support is disabled
The powerpc skip callchain code uses DWARF, so we must disable it if
DWARF is disabled.
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20140825182506.2be6512d@kryten
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Sat, 23 Aug 2014 12:59:48 +0000 (14:59 +0200)]
perf tools: Add +field argument support for --sort option
Adding support to add field(s) to default sort order via using the '+'
prefix, like for report:
$ perf report
Samples: 2K of event 'cycles', Event count (approx.):
882172583
Overhead Command Shared Object Symbol
7.39% swapper [kernel.kallsyms] [k] intel_idle
1.97% firefox libpthread-2.17.so [.] pthread_mutex_lock
1.39% firefox [snd_hda_intel] [k] azx_get_position
1.11% firefox libpthread-2.17.so [.] pthread_mutex_unlock
$ perf report -s +cpu
Samples: 2K of event 'cycles', Event count (approx.):
882172583
Overhead Command Shared Object Symbol CPU
2.89% swapper [kernel.kallsyms] [k] intel_idle 000
2.61% swapper [kernel.kallsyms] [k] intel_idle 002
1.20% swapper [kernel.kallsyms] [k] intel_idle 001
0.82% firefox libpthread-2.17.so [.] pthread_mutex_lock 002
Works in general for commands using --sort option.
v2 with changes suggested:
- Use dynamic memory instead static buffer
- Fix error message typo
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jean Pihet <jean.pihet@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20140823125948.GA1193@krava.brq.redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Mon, 8 Sep 2014 14:31:07 +0000 (16:31 +0200)]
perf: Do not check PERF_EVENT_STATE_EXIT on syscall read path
Revert PERF_EVENT_STATE_EXIT check on read syscall path.
It breaks standard way to read counter, which is to open
the counter, wait for the monitored process to die and
read the counter.
Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Stephane Eranian <eranian@google.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: David Ahern <dsahern@gmail.com>
Link: http://lkml.kernel.org/r/20140908143107.GG17728@krava.brq.redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andi Kleen [Fri, 29 Aug 2014 17:20:58 +0000 (10:20 -0700)]
perf/x86: Fix section mismatch in split uncore driver
The new split Intel uncore driver code that recently went
into tip added a section mismatch, which the build process
complains about.
uncore_pmu_register() can be called from uncore_pci_probe,()
which is not __init and can be called from pci driver ->probe.
I'm not fully sure if it's actually possible to call the probe
function later, but it seems safer to mark uncore_pmu_register
not __init.
This also fixes the warning.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1409332858-29039-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Mathias Krause [Tue, 26 Aug 2014 16:49:45 +0000 (18:49 +0200)]
perf/x86/intel: Mark initialization code as such
A few of the initialization functions are missing the __init annotation.
Fix this and thereby allow ~680 additional bytes of code to be released
after initialization.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/1409071785-26015-1-git-send-email-minipli@googlemail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andreea-Cristina Bernat [Fri, 22 Aug 2014 13:26:05 +0000 (16:26 +0300)]
perf/core: Replace rcu_assign_pointer() with RCU_INIT_POINTER()
The use of "rcu_assign_pointer()" is NULLing out the pointer.
According to RCU_INIT_POINTER()'s block comment:
"1. This use of RCU_INIT_POINTER() is NULLing out the pointer"
it is better to use it instead of rcu_assign_pointer() because it has a
smaller overhead.
The following Coccinelle semantic patch was used:
@@
@@
- rcu_assign_pointer
+ RCU_INIT_POINTER
(..., NULL)
Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: http://lkml.kernel.org/r/20140822132605.GA20130@ada
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andreea-Cristina Bernat [Fri, 22 Aug 2014 14:15:36 +0000 (17:15 +0300)]
perf/callchain: Replace rcu_assign_pointer() with RCU_INIT_POINTER()
The use of "rcu_assign_pointer()" is NULLing out the pointer.
According to RCU_INIT_POINTER()'s block comment:
"1. This use of RCU_INIT_POINTER() is NULLing out the pointer"
it is better to use it instead of rcu_assign_pointer() because it has a
smaller overhead.
The following Coccinelle semantic patch was used:
@@
@@
- rcu_assign_pointer
+ RCU_INIT_POINTER
(..., NULL)
Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: paulmck@linux.vnet.ibm.com
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: http://lkml.kernel.org/r/20140822141536.GA32051@ada
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Tue, 9 Sep 2014 04:48:07 +0000 (06:48 +0200)]
Merge tag 'v3.17-rc4' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Linus Torvalds [Sun, 7 Sep 2014 23:09:43 +0000 (16:09 -0700)]
Linux 3.17-rc4
Sudip Mukherjee [Sun, 7 Sep 2014 18:26:12 +0000 (11:26 -0700)]
Documentation: new page link in SubmittingPatches
new link for - How to piss off a Linux kernel subsystem maintainer
Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paul Bolle [Sun, 7 Sep 2014 18:25:55 +0000 (11:25 -0700)]
Documentation: NFS/RDMA: Document separate Kconfig symbols
The NFS/RDMA Kconfig symbol was split into separate options for client
and server in commit
2e8c12e1b765 ("xprtrdma: add separate Kconfig
options for NFSoRDMA client and server support").
Update the documentation to reflect this split.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: "J. Bruce Fields" <bfields@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Masanari Iida [Sun, 7 Sep 2014 18:25:45 +0000 (11:25 -0700)]
Documentation: misc-devices: Rename freefall.c from hpfall.c in lis2lv02d
hpfall.c was renamed to freefall.c in 3.16, but this file still refer to
hpfall.c instead of freefall.c
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jose Manuel Alarcon Roldan [Sun, 7 Sep 2014 18:25:00 +0000 (11:25 -0700)]
Documentation: i2c: rename variable "register" to "reg"
The example code provided with the i2c device interface documentation
won't compile since it uses the reserved word "register" to name a
variable.
The compiler fails with this error message:
error: expected identifier or '(' before '=' token
__u8 register = 0x20; /* Device register to access */
^
Rename the variable "register" to simply "reg" in the example code.
Another couple of typos has been fixed as well.
[Change "! =" to "!=".]
Signed-off-by: Jose Alarcon Roldan <jose.alarcon.roldan@gmail.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rob Jones [Sun, 7 Sep 2014 18:24:40 +0000 (11:24 -0700)]
Documentation: seq_file: Document seq_open_private(), seq_release_private()
Despite the fact that these functions have been around for years, they
are little used (only 15 uses in 13 files at the preseht time) even
though many other files use work-arounds to achieve the same result.
By documenting them, hopefully they will become more widely used.
Signed-off-by: Rob Jones <rob.jones@codethink.co.uk>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 7 Sep 2014 18:57:27 +0000 (11:57 -0700)]
Merge tag 'pm+acpi-3.17-rc4' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI and power management fixes from Rafael Wysocki:
"These are regression fixes (ACPI sysfs, ACPI video, suspend test),
ACPI cpuidle deadlock fix, missing runtime validation of ACPI _DSD
output, a fix and a new CPU ID for the RAPL driver, new blacklist
entry for the ACPI EC driver and a couple of trivial cleanups
(intel_pstate and generic PM domains).
Specifics:
- Fix for recently broken test_suspend= command line argument (Rafael
Wysocki).
- Fixes for regressions related to the ACPI video driver caused by
switching the default to native backlight handling in 3.16 from
Hans de Goede.
- Fix for a sysfs attribute of ACPI device objects that returns stale
values sometimes due to the fact that they are cached instead of
executing the appropriate method (_SUN) every time (broken in
3.14). From Yasuaki Ishimatsu.
- Fix for a deadlock between cpuidle_lock and cpu_hotplug.lock in the
ACPI processor driver from Jiri Kosina.
- Runtime output validation for the ACPI _DSD device configuration
object missing from the support for it that has been introduced
recently. From Mika Westerberg.
- Fix for an unuseful and misleading RAPL (Running Average Power
Limit) domain detection message in the RAPL driver from Jacob Pan.
- New Intel Haswell CPU ID for the RAPL driver from Jason Baron.
- New Clevo W350etq blacklist entry for the ACPI EC driver from Lan
Tianyu.
- Cleanup for the intel_pstate driver and the core generic PM domains
code from Gabriele Mazzotta and Geert Uytterhoeven"
* tag 'pm+acpi-3.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / cpuidle: fix deadlock between cpuidle_lock and cpu_hotplug.lock
ACPI / scan: not cache _SUN value in struct acpi_device_pnp
cpufreq: intel_pstate: Remove unneeded variable
powercap / RAPL: change domain detection message
powercap / RAPL: add support for CPU model 0x3f
PM / domains: Make generic_pm_domain.name const
PM / sleep: Fix test_suspend= command line option
ACPI / EC: Add msi quirk for Clevo W350etq
ACPI / video: Disable native_backlight on HP ENVY 15 Notebook PC
ACPI / video: Add a disable_native_backlight quirk
ACPI / video: Fix use_native_backlight selection logic
ACPICA: ACPI 5.1: Add support for runtime validation of _DSD package.
Linus Torvalds [Sun, 7 Sep 2014 17:59:58 +0000 (10:59 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs
Pull filesystem fixes from Al Viro:
"Several bugfixes (all of them -stable fodder).
Alexey's one deals with double mutex_lock() in UFS (apparently, nobody
has tried to test "ufs: sb mutex merge + mutex_destroy" on something
like file creation/removal on ufs). Mine deal with two kinds of
umount bugs, in umount propagation and in handling of automounted
submounts, both resulting in bogus transient EBUSY from umount"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
ufs: fix deadlocks introduced by sb mutex merge
fix EBUSY on umount() from MNT_SHRINKABLE
get rid of propagate_umount() mistakenly treating slaves as busy.
Linus Torvalds [Sun, 7 Sep 2014 17:51:42 +0000 (10:51 -0700)]
Merge branch 'core-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull RCU fix from Ingo Molnar:
"A boot hang fix for the offloaded callback RCU model (RCU_NOCB_CPU=y
&& (TREE_CPU=y || TREE_PREEMPT_RC)) in certain bootup scenarios"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rcu: Make nocb leader kthreads process pending callbacks after spawning
Linus Torvalds [Sun, 7 Sep 2014 17:37:48 +0000 (10:37 -0700)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
"Three fixlets from the timer departement:
- Update the timekeeper before updating vsyscall and pvclock. This
fixes the kvm-clock regression reported by Chris and Paolo.
- Use the proper irq work interface from NMI. This fixes the
regression reported by Catalin and Dave.
- Clarify the compat_nanosleep error handling mechanism to avoid
future confusion"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timekeeping: Update timekeeper before updating vsyscall and pvclock
compat: nanosleep: Clarify error handling
nohz: Restore NMI safe local irq work for local nohz kick
Alexey Khoroshilov [Tue, 2 Sep 2014 07:40:17 +0000 (11:40 +0400)]
ufs: fix deadlocks introduced by sb mutex merge
Commit
0244756edc4b ("ufs: sb mutex merge + mutex_destroy") introduces
deadlocks in ufs_new_inode() and ufs_free_inode().
Most callers of that functions acqure the mutex by themselves and
ufs_{new,free}_inode() do that via lock_ufs(),
i.e we have an unavoidable double lock.
The patch proposes to resolve the issue by making sure that
ufs_{new,free}_inode() are not called with the mutex held.
Found by Linux Driver Verification project (linuxtesting.org).
Cc: stable@vger.kernel.org # 3.16
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Linus Torvalds [Sat, 6 Sep 2014 23:42:12 +0000 (16:42 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"A smattering of bug fixes across most architectures"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
powerpc/kvm/cma: Fix panic introduces by signed shift operation
KVM: s390/mm: Fix guest storage key corruption in ptep_set_access_flags
KVM: s390/mm: Fix storage key corruption during swapping
arm/arm64: KVM: Complete WFI/WFE instructions
ARM/ARM64: KVM: Nuke Hyp-mode tlbs before enabling MMU
KVM: s390/mm: try a cow on read only pages for key ops
KVM: s390: Fix user triggerable bug in dead code
Linus Torvalds [Sat, 6 Sep 2014 19:37:43 +0000 (12:37 -0700)]
Merge tag 'fixes-for-linus' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Kevin Hilman:
"Another round of fixes from arm-soc land, which are mostly DT fixes
for:
- OMAP: handful of DT fixes devices on newly supported hardware
- davinci: fix 2nd EDMA channel
- ux500: extend previous pinctrl fix to another board
- at91: clock registration fixes, compatibility string precision
And one more fix for event cleanup in drivers/bus/arm-ccn"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
bus: arm-ccn: Move event cleanup routine
ARM: at91/dt: rm9200: fix usb clock definition
ARM: at91: rm9200: fix clock registration
ARM: at91/dt: sam9g20: set at91sam9g20 pllb driver
ARM: dts: dra7-evm: Add vtt regulator support
ARM: dts: dra7-evm: Fix spi1 mux documentation
ARM: dts: am43x-epos-evm: Disable QSPI to prevent conflict with GPMC-NAND
ARM: OMAP2+: gpmc: Don't complain if wait pin is used without r/w monitoring
ARM: dts: am43xx-epos-evm: Don't use read/write wait monitoring
ARM: dts: am437x-gp-evm: Don't use read/write wait monitoring
ARM: dts: am437x-gp-evm: Use BCH16 ECC scheme instead of BCH8
ARM: dts: am43x-epos-evm: Use BCH16 ECC scheme instead of BCH8
ARM: dts: am4372: fix USB regs size
ARM: dts: am437x-gp: switch i2c0 to 100KHz
ARM: dts: dra7-evm: Fix 8th NAND partition's name
ARM: dts: dra7-evm: Fix i2c3 pinmux and frequency
ARM: ux500: disable msp2 node on Snowball
ARM: edma: Fix configuration parsing for SoCs with multiple eDMA3 CC
ARM: dts: set 'ti,set-rate-parent' for dpll4_m5x2 clock
Linus Torvalds [Sat, 6 Sep 2014 19:13:17 +0000 (12:13 -0700)]
Merge tag 'xfs-for-linus-3.17-rc3' of git://git./linux/kernel/git/dgc/linux-xfs
Pull xfs fixes from Dave Chinner:
"The fixes all address recently discovered data corruption issues.
The original Direct IO issue was discovered by Chris Mason @ Facebook
on a production workload which mixed buffered reads with direct reads
and writes IO to the same file. The fix for that exposed other issues
with page invalidation (exposed by millions of fsx operations) failing
due to dirty buffers beyond EOF.
Finally, the collapse_range code could also cause problems due to
racing writeback changing the extent map while it was being shifted
around. The commits for that problem are simple mitigation fixes that
prevent the problem from occuring. A more robust fix for 3.18 that
addresses the underlying problem is currently being worked on by
Brian.
Summary of fixes:
- a direct IO read/buffered read data corruption
- the associated fallout from the DIO data corruption fix
- collapse range bugs that are potential data corruption issues"
* tag 'xfs-for-linus-3.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
xfs: trim eofblocks before collapse range
xfs: xfs_file_collapse_range is delalloc challenged
xfs: don't log inode unless extent shift makes extent modifications
xfs: use ranged writeback and invalidation for direct IO
xfs: don't zero partial page cache pages during O_DIRECT writes
xfs: don't zero partial page cache pages during O_DIRECT writes
xfs: don't dirty buffers beyond EOF
Linus Torvalds [Sat, 6 Sep 2014 19:12:09 +0000 (12:12 -0700)]
Merge tag 'for-linus-
20140905' of git://git.infradead.org/linux-mtd
Pull mtd fixes from Brian Norris:
"Two trivial MTD updates for 3.17-rc4:
- a tiny comment tweak, to kill a bunch of DocBook warnings added
during the merge window
- a small fixup to the OTP routines' error handling"
* tag 'for-linus-
20140905' of git://git.infradead.org/linux-mtd:
mtd: nand: fix DocBook warnings on nand_sdr_timings doc
mtd: cfi_cmdset_0002: check return code for get_chip()
Thomas Gleixner [Sat, 6 Sep 2014 10:24:49 +0000 (12:24 +0200)]
timekeeping: Update timekeeper before updating vsyscall and pvclock
The update_walltime() code works on the shadow timekeeper to make the
seqcount protected region as short as possible. But that update to the
shadow timekeeper does not update all timekeeper fields because it's
sufficient to do that once before it becomes life. One of these fields
is tkr.base_mono. That stays stale in the shadow timekeeper unless an
operation happens which copies the real timekeeper to the shadow.
The update function is called after the update calls to vsyscall and
pvclock. While not correct, it did not cause any problems because none
of the invoked update functions used base_mono.
commit
cbcf2dd3b3d4 (x86: kvm: Make kvm_get_time_and_clockread()
nanoseconds based) changed that in the kvm pvclock update function, so
the stale mono_base value got used and caused kvm-clock to malfunction.
Put the update where it belongs and fix the issue.
Reported-by: Chris J Arges <chris.j.arges@canonical.com>
Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1409050000570.3333@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Thomas Gleixner [Sat, 6 Sep 2014 10:18:07 +0000 (12:18 +0200)]
compat: nanosleep: Clarify error handling
The error handling in compat_sys_nanosleep() is correct, but
completely non obvious. Document it and restrict it to the
-ERESTART_RESTARTBLOCK return value for clarity.
Reported-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Linus Torvalds [Fri, 5 Sep 2014 20:45:09 +0000 (13:45 -0700)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux
Pull i2c bugfixes from Wolfram Sang:
"I2C driver bugfixes for the 3.17 release. Details can be found in the
commit messages, yet I think this is typical driver stuff"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
Revert "i2c: rcar: remove spinlock"
i2c: at91: add bound checking on SMBus block length bytes
i2c: rk3x: fix bug that cause transfer fails in master receive mode
i2c: at91: Fix a race condition during signal handling in at91_do_twi_xfer.
i2c: mv64xxx: continue probe when clock-frequency is missing
i2c: rcar: fix MNR interrupt handling
Kevin Hilman [Fri, 5 Sep 2014 20:28:33 +0000 (13:28 -0700)]
Merge tag 'at91-fixes' of git://github.com/at91linux/linux-at91 into fixes
Merge "at91: fixes for 3.17 #1" from Nicols Ferre:
First AT91 fixes batch for 3.17:
- compatibility string precision
- clock registration and USB DT fix for at91rm9200
* tag 'at91-fixes' of git://github.com/at91linux/linux-at91:
ARM: at91/dt: rm9200: fix usb clock definition
ARM: at91: rm9200: fix clock registration
ARM: at91/dt: sam9g20: set at91sam9g20 pllb driver
Signed-off-by: Kevin Hilman <khilman@linaro.org>
Pawel Moll [Tue, 2 Sep 2014 15:26:11 +0000 (16:26 +0100)]
bus: arm-ccn: Move event cleanup routine
The function cleaning up an initialized event
was called from the "event_del" handler, instead
of being used as the "destroy" callback. In case of
events group allocation this caused NULL pointer
dereference (as events are added and deleted
multiple times then). Fixed now.
Signed-off-by: Pawel Moll <mail@pawelmoll.com>
Signed-off-by: Kevin Hilman <khilman@linaro.org>
Linus Torvalds [Fri, 5 Sep 2014 15:43:48 +0000 (08:43 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/geert/linux-m68k
Pull m68k updates from Geert Uytterhoeven:
"Wire up new syscalls getrandom and memfd_create"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
m68k: Wire up memfd_create
m68k: Wire up getrandom
Alexandre Belloni [Fri, 5 Sep 2014 14:45:12 +0000 (16:45 +0200)]
ARM: at91/dt: rm9200: fix usb clock definition
The atmel,clk-divisors property is taking 4 divisors, if less are
provided, the clock registration will fail.
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Alexandre Belloni [Fri, 5 Sep 2014 14:15:33 +0000 (16:15 +0200)]
ARM: at91: rm9200: fix clock registration
Actually register clocks from device tree when using the common clock
framework.
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
[nicolas.ferre@atmel.com: add at91 to function name]
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Gaël PORTAY [Mon, 1 Sep 2014 21:29:46 +0000 (23:29 +0200)]
ARM: at91/dt: sam9g20: set at91sam9g20 pllb driver
The at91sam9g20 SOC uses its own pllb implementation which is different
from the one inherited from at91sam9260 SOC.
Signed-off-by: Gaël PORTAY <gael.portay@gmail.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Johannes Weiner [Fri, 5 Sep 2014 12:43:57 +0000 (08:43 -0400)]
mm: memcontrol: revert use of root_mem_cgroup res_counter
Dave Hansen reports a massive scalability regression in an uncontained
page fault benchmark with more than 30 concurrent threads, which he
bisected down to
05b843012335 ("mm: memcontrol: use root_mem_cgroup
res_counter") and pin-pointed on res_counter spinlock contention.
That change relied on the per-cpu charge caches to mostly swallow the
res_counter costs, but it's apparent that the caches don't scale yet.
Revert memcg back to bypassing res_counters on the root level in order
to restore performance for uncontained workloads.
Reported-by: Dave Hansen <dave@sr71.net>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Tested-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Vladimir Davydov <vdavydov@parallels.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>