Wang Nan [Wed, 13 Apr 2016 08:21:05 +0000 (08:21 +0000)]
perf data: Add perf_data_file__switch() helper
perf_data_file__switch() closes current output file, renames it, then
open a new one to continue recording. It will be used by 'perf record'
to split output into multiple perf.data files.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1460535673-159866-3-git-send-email-wangnan0@huawei.com
Signed-off-by: He Kuang <hekuang@huawei.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Wang Nan [Wed, 13 Apr 2016 08:21:04 +0000 (08:21 +0000)]
perf session: Make ordered_events reusable
ordered_events__free() leaves linked lists and timestamps not cleared,
so unable to be reused after ordered_events__free(). Which is inconvenient
after 'perf record' supports generating multiple perf.data output and
process build-ids for each of them.
Use ordered_events__reinit() for this.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1460535673-159866-2-git-send-email-wangnan0@huawei.com
Signed-off-by: He Kuang <hekuang@huawei.com>
[ Split from larger patch ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Wang Nan [Wed, 13 Apr 2016 08:21:04 +0000 (08:21 +0000)]
perf ordered_events: Introduce reinit()
'perf record' will use this when outputting multiple perf.data files.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1460535673-159866-2-git-send-email-wangnan0@huawei.com
Signed-off-by: He Kuang <hekuang@huawei.com>
[ Split from larger patch ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Wed, 13 Apr 2016 15:10:19 +0000 (12:10 -0300)]
perf trace: Move eventfd beautifiers to trace/beauty/ directory
To better organize all these beautifiers.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-zrw5zz7cnrs44o5osouyutvt@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Wed, 13 Apr 2016 15:05:44 +0000 (12:05 -0300)]
perf trace: Move mmap beautifiers to trace/beauty/ directory
To better organize all these beautifiers.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-zbr27mdy9ssdhux3ib2nfa7j@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Wed, 13 Apr 2016 14:55:18 +0000 (11:55 -0300)]
perf trace: Add getrandom beautifier related defines for older systems
Were the detached tarball (make perf-tar-src-pkg) build was failing because
those definitions aren't available in the system headers.
On RHEL7, for instance:
builtin-trace.c: In function ‘syscall_arg__scnprintf_getrandom_flags’:
builtin-trace.c:1113:14: error: ‘GRND_RANDOM’ undeclared (first use in this function)
P_FLAG(RANDOM);
^
builtin-trace.c:1114:14: error: ‘GRND_NONBLOCK’ undeclared (first use in this function)
P_FLAG(NONBLOCK);
^
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-r8496g24a3kbqynvk6617b0e@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Wed, 13 Apr 2016 14:50:23 +0000 (11:50 -0300)]
perf trace: Add seccomp beautifier related defines for older systems
Were the detached tarball (make perf-tar-src-pkg) build was failing because
those definitions aren't available in the system headers.
On RHEL7, for instance:
builtin-trace.c: In function ‘syscall_arg__scnprintf_seccomp_op’:
builtin-trace.c:1069:7: error: ‘SECCOMP_SET_MODE_STRICT’ undeclared (first use in this function)
P_SECCOMP_SET_MODE_OP(STRICT);
^
builtin-trace.c:1069:7: note: each undeclared identifier is reported only once for each function it appears in
builtin-trace.c:1070:7: error: ‘SECCOMP_SET_MODE_FILTER’ undeclared (first use in this function)
P_SECCOMP_SET_MODE_OP(FILTER);
^
builtin-trace.c: In function ‘syscall_arg__scnprintf_seccomp_flags’:
builtin-trace.c:1091:14: error: ‘SECCOMP_FILTER_FLAG_TSYNC’ undeclared (first use in this function)
P_FLAG(TSYNC);
^
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-4f8dzzwd7g6l5dzz693u7kul@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ingo Molnar [Wed, 13 Apr 2016 18:27:58 +0000 (20:27 +0200)]
Merge tag 'perf-core-for-mingo-
20160413' of git://git./linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
User visible changes:
- Print callchains asked for events requested via 'perf trace --event' too:
(Arnaldo Carvalho de Melo)
# trace -e nanosleep --call dwarf --event sched:sched_switch/call-graph=fp/ usleep 1
0.346 (0.005 ms): usleep/24428 nanosleep(rqtp: 0x7fffa15a0540) ...
0.346 ( ): sched:sched_switch:usleep:24428 [120] S ==> swapper/3:0 [120])
__schedule+0xfe200402 ([kernel.kallsyms])
schedule+0xfe200035 ([kernel.kallsyms])
do_nanosleep+0xfe20006f ([kernel.kallsyms])
hrtimer_nanosleep+0xfe2000dc ([kernel.kallsyms])
sys_nanosleep+0xfe20007a ([kernel.kallsyms])
do_syscall_64+0xfe200062 ([kernel.kallsyms])
return_from_SYSCALL_64+0xfe200000 ([kernel.kallsyms])
__nanosleep+0xffff005b8d602010 (/usr/lib64/libc-2.22.so)
0.400 (0.059 ms): usleep/24428 ... [continued]: nanosleep()) = 0
__nanosleep+0x10 (/usr/lib64/libc-2.22.so)
usleep+0x34 (/usr/lib64/libc-2.22.so)
main+0x1eb (/usr/bin/usleep)
__libc_start_main+0xf0 (/usr/lib64/libc-2.22.so)
_start+0x29 (/usr/bin/usleep)
- Allow requesting that some CPUs or PIDs be highlighted in 'perf sched map' (Jiri Olsa)
- Compact 'perf sched map' to show just CPUs with activity, improving the output
in high core count systems (Jiri Olsa)
- Fix segfault with 'perf trace --no-syscalls -e syscall-names' by bailing out
such request, doesn't make sense to ask for no syscalls and then specify which
ones should be printed (Arnaldo Carvalho de Melo)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Arnaldo Carvalho de Melo [Tue, 12 Apr 2016 19:05:02 +0000 (16:05 -0300)]
perf trace: Do not accept --no-syscalls together with -e
Doesn't make sense and was causing a segfault, fix it.
# trace -e clone --no-syscalls --event sched:*exec firefox
The -e option can't be used with --no-syscalls.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-ccrahezikdk2uebptzr1eyyi@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Tue, 12 Apr 2016 18:16:15 +0000 (15:16 -0300)]
perf evsel: Move some methods from session.[ch] to evsel.[ch]
Those were converted to be evsel methods long ago, move the
source to where it belongs.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-vja8rjmkw3gd5ungaeyb5s2j@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Tue, 12 Apr 2016 13:29:31 +0000 (15:29 +0200)]
perf sched map: Display only given cpus
Introducing --cpus option that will display only given cpus. Could be
used together with color-cpus option.
$ perf sched map --cpus 0,1
*A0 309999.786924 secs A0 => rcu_sched:7
*. 309999.786930 secs
*B0 . 309999.786931 secs B0 => rcuos/2:25
B0 *A0 309999.786947 secs
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1460467771-26532-9-git-send-email-jolsa@kernel.org
[ Added entry to man page ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Tue, 12 Apr 2016 13:29:30 +0000 (15:29 +0200)]
perf sched map: Color given cpus
Adding --color-cpus option to display selected cpus with background
color (red by default). It helps on navigating through the perf sched
map output.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1460467771-26532-8-git-send-email-jolsa@kernel.org
[ Added entry to man page ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Tue, 12 Apr 2016 13:29:29 +0000 (15:29 +0200)]
perf sched map: Color given pids
Adding --color-pids option to display selected pids in color (blue by
default). It helps on navigating through the 'perf sched map' output.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1460467771-26532-7-git-send-email-jolsa@kernel.org
[ Added entry to man page ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Tue, 12 Apr 2016 13:29:28 +0000 (15:29 +0200)]
perf thread_map: Make new_by_tid_str constructor public
It will be used in following patch.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1460467771-26532-6-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Tue, 12 Apr 2016 13:29:27 +0000 (15:29 +0200)]
perf sched: Use color_fprintf for output
As preparation for next patch.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1460467771-26532-5-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Tue, 12 Apr 2016 13:29:26 +0000 (15:29 +0200)]
perf sched: Add compact display option
Add compact map display that does not output the whole cpu matrix, only
cpus that got event.
$ perf sched map --compact
*A0
1082427.094098 secs A0 => perf:19404 (CPU 2)
A0 *.
1082427.094127 secs . => swapper:0 (CPU 1)
A0 . *B0
1082427.094174 secs B0 => rcuos/2:25 (CPU 3)
A0 . *.
1082427.094177 secs
*C0 . .
1082427.094187 secs C0 => migration/2:21
C0 *A0 .
1082427.094193 secs
*. A0 .
1082427.094195 secs
*D0 A0 .
1082427.094402 secs D0 => rngd:968
*. A0 .
1082427.094406 secs
. *E0 .
1082427.095221 secs E0 => kworker/1:1:5333
. E0 *F0
1082427.095227 secs F0 => xterm:3342
It helps to display sane output for small thread loads on big cpu
servers.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1460467771-26532-4-git-send-email-jolsa@kernel.org
[ Add entry in 'perf sched' man page ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Tue, 12 Apr 2016 13:29:25 +0000 (15:29 +0200)]
perf cpu_map: Add has() method
Adding cpu_map__has() to return bool of cpu presence in cpus map.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1460467771-26532-3-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Tue, 12 Apr 2016 13:29:24 +0000 (15:29 +0200)]
perf thread_map: Add has() method
Adding thread_map__has() to return bool of pid presence in threads map.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1460467771-26532-2-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Tue, 12 Apr 2016 13:11:07 +0000 (10:11 -0300)]
perf trace: Support callchains for --event too
We already were able to ask for callchains for a specific event:
# trace -e nanosleep --call dwarf --event sched:sched_switch/call-graph=fp/ usleep 1
This would enable tracing just the "nanosleep" syscall, with callchains
at syscall exit and would ask the kernel for frame pointer callchains to
be enabled for the "sched:sched_switch" tracepoint event, its just that
we were not resolving the callchain and printing it in 'perf trace', do
it:
# trace -e nanosleep --call dwarf --event sched:sched_switch/call-graph=fp/ usleep 1
0.425 ( 0.013 ms): usleep/6718 nanosleep(rqtp: 0x7ffcc1d16e20) ...
0.425 ( ): sched:sched_switch:usleep:6718 [120] S ==> swapper/2:0 [120])
__schedule+0xfe200402 ([kernel.kallsyms])
schedule+0xfe200035 ([kernel.kallsyms])
do_nanosleep+0xfe20006f ([kernel.kallsyms])
hrtimer_nanosleep+0xfe2000dc ([kernel.kallsyms])
sys_nanosleep+0xfe20007a ([kernel.kallsyms])
do_syscall_64+0xfe200062 ([kernel.kallsyms])
return_from_SYSCALL_64+0xfe200000 ([kernel.kallsyms])
__nanosleep+0xffff008b8cbe2010 (/usr/lib64/libc-2.22.so)
0.486 ( 0.073 ms): usleep/6718 ... [continued]: nanosleep()) = 0
__nanosleep+0x10 (/usr/lib64/libc-2.22.so)
usleep+0x34 (/usr/lib64/libc-2.22.so)
main+0x1eb (/usr/bin/usleep)
__libc_start_main+0xf0 (/usr/lib64/libc-2.22.so)
_start+0x29 (/usr/bin/usleep)
#
Pretty compact, huh? DWARF callchains for raw_syscalls:sys_exit + frame
pointer callchains for a tracepoint, if your hardware supports LBR, go
wild with /call-graph=lbr/, guess the next step is to lift this from
'perf script':
-F, --fields <str> comma separated output fields prepend with 'type:'. Valid types: hw,sw,trace,raw.
Fields: comm,tid,pid,time,cpu,event,trace,ip,sym,dso,addr,symoff,period,iregs,brstack,brstacksym,flags
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-2e7yiv5hqdm8jywlmfivvx2v@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Peter Zijlstra [Mon, 4 Apr 2016 14:02:08 +0000 (16:02 +0200)]
perf/x86/amd/uncore: Do not register a task ctx for uncore PMUs
The new sanity check introduced by:
26657848502b ("perf/core: Verify we have a single perf_hw_context PMU")
... triggered on the AMD uncore driver.
Uncore PMUs are per node, they cannot have per-task counters. Fix it.
Reported-by: Borislav Petkov <bp@suse.de>
Reported-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@redhat.com
Cc: alexander.shishkin@linux.intel.com
Cc: eranian@google.com
Cc: jolsa@redhat.com
Cc: linux-tip-commits@vger.kernel.org
Cc: vincent.weaver@maine.edu
Link: http://lkml.kernel.org/r/20160404140208.GA3448@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Alexander Shishkin [Wed, 6 Apr 2016 14:35:07 +0000 (17:35 +0300)]
perf/x86/intel/pt: Use boot_cpu_has() because it's there
At the moment, initialization path is using test_cpu_cap(&boot_cpu_data),
to detect PT, which is just open coding boot_cpu_has(). Use the latter
instead.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: eranian@google.com
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/1459953307-14372-1-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Julia Lawall [Sat, 9 Apr 2016 11:17:29 +0000 (13:17 +0200)]
uprobes/x86: Constify uprobe_xol_ops structures
The uprobe_xol_ops structures are never modified, so declare them as const.
Done with the help of Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-janitors@vger.kernel.org
Link: http://lkml.kernel.org/r/1460200649-32526-1-git-send-email-Julia.Lawall@lip6.fr
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Wed, 13 Apr 2016 07:02:07 +0000 (09:02 +0200)]
Merge tag 'perf-core-for-mingo-
20160411' of git://git./linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements from Arnaldo Carvalho de Melo:
User visible changes:
- Automagically create a 'bpf-output' event, easing the setup of BPF
C "scripts" that produce output via the perf ring buffer. Now it is
just a matter of calling any perf tool, such as 'trace', with a C
source file that references the __bpf_stdout__ output channel and
that channel will be created and connected to the script:
# trace -e nanosleep --event test_bpf_stdout.c usleep 1
0.013 ( 0.013 ms): usleep/2818 nanosleep(rqtp: 0x7ffcead45f40 ) ...
0.013 ( ): __bpf_stdout__:Raise a BPF event!..)
0.015 ( ): perf_bpf_probe:func_begin:(
ffffffff81112460))
0.261 ( ): __bpf_stdout__:Raise a BPF event!..)
0.262 ( ): perf_bpf_probe:func_end:(
ffffffff81112460 <-
ffffffff81003d92))
0.264 ( 0.264 ms): usleep/2818 ... [continued]: nanosleep()) = 0
#
Further work is needed to reduce the number of lines in a perf bpf C source
file, this being the part where we greatly reduce the command line setup (Wang Nan)
- 'perf trace' now supports callchains, with 'trace --call-graph dwarf' using
libunwind, just like 'perf top', to ask the kernel for stack dumps for CFI
processing. This reduces the overhead by asking just for userspace callchains
and also only for the syscall exit tracepoint (raw_syscalls:sys_exit)
(Milian Wolff, Arnaldo Carvalho de Melo)
Try it with, for instance:
# perf trace --call dwarf ping 127.0.0.1
An excerpt of a system wide 'perf trace --call dwarf" session is at:
https://fedorapeople.org/~acme/perf/perf-trace--call-graph-dwarf--all-cpus.txt
You may need to bump the number of mmap pages, using -m/--mmap-pages,
but on a Broadwell machine the defaults allowed system wide tracing to
work without losing that many records, experiment with just some
syscalls, like:
# perf trace --call dwarf -e nanosleep,futex
All the targets available for 'perf record', 'perf top' (--pid, --tid, --cpu,
etc) should work. Also --duration may be interesting to try.
To get filenames from in various syscalls pointer args (open, ettc), add this
to the mix:
# perf probe 'vfs_getname=getname_flags:72 pathname=filename:string'
Making this work is next in line:
# trace --call dwarf --ev sched:sched_switch/call-graph=fp/ usleep 1
I.e. honouring per-tracepoint callchains in 'perf trace' in addition to
in raw_syscalls:sys_exit.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Wed, 13 Apr 2016 06:57:50 +0000 (08:57 +0200)]
Merge tag 'perf-core-for-mingo-
20160408' of git://git./linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
User visible changes:
- Beautify more syscall arguments in 'perf trace', using the type column in
tracepoint /format fields to attach, for instance, a pid_t resolver to the
thread COMM, also attach a mode_t beautifier in the same fashion
(Arnaldo Carvalho de Melo)
- Build the syscall table id <-> name resolver using the same .tbl file
used in the kernel to generate headers, to avoid the delay in getting
new syscalls supported in the audit-libs external dependency, done so
far only for x86_64 (Arnaldo Carvalho de Melo)
- Improve the documentation of event specifications (Andi Kleen)
- Process update events in 'perf script', fixing up this use case:
# perf stat -a -I 1000 -e cycles record | perf script -s script.py
- Shared object symbol adjustment fixes, fixing symbol resolution in
Android (Wang Nan)
Infrastructure changes:
- Add dedicated unwind addr_space member into thread struct, to allow
tools to use thread->priv, noticed while working on having callchains
in 'perf trace' (Jiri Olsa)
Build fixes:
- Fix the build in Ubuntu 12.04 (Arnaldo Carvalho de Melo, Vinson Lee)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Wed, 13 Apr 2016 06:57:03 +0000 (08:57 +0200)]
Merge tag 'v4.6-rc3' into perf/core, to refresh the tree
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Arnaldo Carvalho de Melo [Tue, 12 Apr 2016 01:08:55 +0000 (22:08 -0300)]
perf trace: Print unresolved symbol names as addresses
Instead of having "[unknown]" as the name used for unresolved symbols,
use the address in the callchain, in hexadecimal form:
28.801 ( 0.007 ms): qemu-system-x8/10065 ppoll(ufds: 0x55c98b39e400, nfds: 72, tsp: 0x7fffe4e4fe60, sigsetsize: 8) = 0 Timeout
ppoll+0x91 (/usr/lib64/libc-2.22.so)
[0x337309] (/usr/bin/qemu-system-x86_64)
[0x336ab4] (/usr/bin/qemu-system-x86_64)
main+0x1724 (/usr/bin/qemu-system-x86_64)
__libc_start_main+0xf0 (/usr/lib64/libc-2.22.so)
[0xc59a9] (/usr/bin/qemu-system-x86_64)
35.265 (14.805 ms): gnome-shell/2287 ... [continued]: poll()) = 1
[0xf6fdd] (/usr/lib64/libc-2.22.so)
g_main_context_iterate.isra.29+0x17c (/usr/lib64/libglib-2.0.so.0.4600.2)
g_main_loop_run+0xc2 (/usr/lib64/libglib-2.0.so.0.4600.2)
meta_run+0x2c (/usr/lib64/libmutter.so.0.0.0)
main+0x3f7 (/usr/bin/gnome-shell)
__libc_start_main+0xf0 (/usr/lib64/libc-2.22.so)
[0x2909] (/usr/bin/gnome-shell)
Suggested-by: Milian Wolff <milian.wolff@kdab.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-fja1ods5vqpg42mdz09xcz3r@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Tue, 12 Apr 2016 01:03:56 +0000 (22:03 -0300)]
perf evsel: Allow unresolved symbol names to be printed as addresses
The fprintf_sym() and fprintf_callchain() methods now allow users to
change the existing behaviour of showing "[unknown]" as the name of
unresolved symbols to instead show "[0x123456]", i.e. its address.
The current patch doesn't change tools to use this facility, the results
from 'perf trace' and 'perf script' cotinue like:
70.109 ( 0.001 ms): qemu-system-x8/10153 poll(ufds: 0x7f2d93ffe870, nfds: 1) = 0 Timeout
[unknown] (/usr/lib64/libc-2.22.so)
[unknown] (/usr/lib64/libspice-server.so.1.10.0)
[unknown] (/usr/lib64/libspice-server.so.1.10.0)
[unknown] (/usr/lib64/libspice-server.so.1.10.0)
start_thread+0xca (/usr/lib64/libpthread-2.22.so)
__clone+0x6d (/usr/lib64/libc-2.22.so)
The next patch will make 'perf trace' use the new formatting.
Suggested-by: Milian Wolff <milian.wolff@kdab.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-fja1ods5vqpg42mdz09xcz3r@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Mon, 11 Apr 2016 21:42:37 +0000 (18:42 -0300)]
perf trace: Make "--call-graph" affect just "raw_syscalls:sys_exit"
We don't need the callchains at the syscall enter tracepoint, just when
finishing it at syscall exit, so reduce the overhead by asking for
callchains just at syscall exit.
Suggested-by: Milian Wolff <milian.wolff@kdab.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-fja1ods5vqpg42mdz09xcz3r@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Mon, 11 Apr 2016 21:39:37 +0000 (18:39 -0300)]
perf evsel: Rename config_callgraph() to config_callchain() and make it public
The rename is for consistency with the parameter name.
Make it public for fine grained control of which evsels should have
callchains enabled, like, for instance, will be done in the next
changesets in 'perf trace', to enable callchains just on the
"raw_syscalls:sys_exit" tracepoint.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-og8vup111rn357g4yagus3ao@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Mon, 11 Apr 2016 21:37:45 +0000 (18:37 -0300)]
perf evlist: Add (reset,set)_sample_bit methods
For fiddling with sample_type fields in all evsels in an evlist.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-dg6yavctt0hzl2tsgfb43qsr@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Mon, 11 Apr 2016 21:15:29 +0000 (18:15 -0300)]
perf evsel: Do not use globals in config()
Instead receive a callchain_param pointer to configure callchain
aspects, not doing so if NULL is passed.
This will allow fine grained control over which evsels in an evlist
gets callchains enabled.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-2mupip6khc92mh5x4nw9to82@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Mon, 11 Apr 2016 18:49:11 +0000 (15:49 -0300)]
perf trace: Exclude the kernel part of the callchain leading to a syscall
The kernel parts are not that useful:
# trace -m 512 -e nanosleep --call dwarf usleep 1
0.065 ( 0.065 ms): usleep/18732 nanosleep(rqtp: 0x7ffc4ee4e200) = 0
syscall_slow_exit_work ([kernel.kallsyms])
do_syscall_64 ([kernel.kallsyms])
return_from_SYSCALL_64 ([kernel.kallsyms])
__nanosleep (/usr/lib64/libc-2.22.so)
usleep (/usr/lib64/libc-2.22.so)
main (/usr/bin/usleep)
__libc_start_main (/usr/lib64/libc-2.22.so)
_start (/usr/bin/usleep)
#
So lets just use perf_event_attr.exclude_callchain_kernel to avoid
collecting it in the ring buffer:
# trace -m 512 -e nanosleep --call dwarf usleep 1
0.063 ( 0.063 ms): usleep/19212 nanosleep(rqtp: 0x7ffc3df10fb0) = 0
__nanosleep (/usr/lib64/libc-2.22.so)
usleep (/usr/lib64/libc-2.22.so)
main (/usr/bin/usleep)
__libc_start_main (/usr/lib64/libc-2.22.so)
_start (/usr/bin/usleep)
#
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-qctu3gqhpim0dfbcp9d86c91@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Mon, 11 Apr 2016 15:15:48 +0000 (12:15 -0300)]
perf evsel: Introduce fprintf_callchain() method out of fprintf_sym()
In 'perf trace' we're just interested in printing callchains, and we
don't want to use the symbol_conf.use_callchain, so move the callchain
part to a new method.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-kcn3romzivcpxb3u75s9nz33@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Mon, 11 Apr 2016 14:14:06 +0000 (11:14 -0300)]
perf evsel: Rename print_ip() to fprintf_sym()
As it receives a FILE, and its more than just the IP, which can even be
requested not to be printed.
For consistency with other similar methods in tools/perf/, name it as
perf_evsel__fprintf_sym() and make it return the number of bytes
printed, just like 'fprintf(3)'
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-84gawlqa3lhk63nf0t9vnqnn@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Milian Wolff [Fri, 8 Apr 2016 11:34:15 +0000 (13:34 +0200)]
perf trace: Add support for printing call chains on sys_exit events.
Now, one can print the call chain for every encountered sys_exit event,
e.g.:
$ perf trace -e nanosleep --call-graph dwarf path/to/ex_sleep
1005.757 (1000.090 ms): ex_sleep/13167 nanosleep(...) = 0
syscall_slow_exit_work ([kernel.kallsyms])
syscall_return_slowpath ([kernel.kallsyms])
int_ret_from_sys_call ([kernel.kallsyms])
__nanosleep (/usr/lib/libc-2.23.so)
[unknown] (/usr/lib/libQt5Core.so.5.6.0)
QThread::sleep (/usr/lib/libQt5Core.so.5.6.0)
main (path/to/ex_sleep)
__libc_start_main (/usr/lib/libc-2.23.so)
_start (path/to/ex_sleep)
Note that it is advised to increase the number of mmap pages to prevent
event losses when using this new feature. Often, adding `-m 10M` to the
`perf trace` invocation is enough.
This feature is also available in strace when built with libunwind via
`strace -k`. Performance wise, this solution is much better:
$ time find path/to/linux &> /dev/null
real 0m0.051s
user 0m0.013s
sys 0m0.037s
$ time perf trace -m 800M --call-graph dwarf find path/to/linux &> /dev/null
real 0m2.624s
user 0m1.203s
sys 0m1.333s
$ time strace -k find path/to/linux &> /dev/null
real 0m35.398s
user 0m10.403s
sys 0m23.173s
Note that it is currently not possible to configure the print output.
Adding such a feature, similar to what is available in `perf script` via
its `--fields` knob can be added later on.
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
LPU-Reference:
1460115255-17648-1-git-send-email-milian.wolff@kdab.com
[ Split from a larger patch, do not print the IP, left align,
remove dup call symbol__init(), added man page entry ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Mon, 11 Apr 2016 13:53:51 +0000 (10:53 -0300)]
perf evsel: Allow passing a left alignment when printing a symbol
For callchains, etc where we want it to align just below the syscall
name, for instance, in 'perf trace'
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-uk9ekchd67651c625ltaur5y@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Milian Wolff [Mon, 11 Apr 2016 13:18:11 +0000 (10:18 -0300)]
perf evsel: Allow specifying a file to output in perf_evsel__print_ip
As this function will be used in 'perf trace'.
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/n/tip-8x297v9utnxq77onikevvlse@git.kernel.org
[ Split from a larger patch ]
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Wang Nan [Fri, 8 Apr 2016 15:07:25 +0000 (15:07 +0000)]
perf bpf: Automatically create bpf-output event __bpf_stdout__
This patch removes the need to set a bpf-output event in cmdline. By
referencing a map named '__bpf_stdout__', perf automatically creates an
event for it.
For example:
# perf record -e ./test_bpf_trace.c usleep 100000
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.012 MB perf.data (2 samples) ]
# perf script
usleep 4639 [000] 261895.307826: 0 __bpf_stdout__:
ffffffff810eb9a1 ...
BPF output: 0000: 52 61 69 73 65 20 61 20 Raise a
0008: 42 50 46 20 65 76 65 6e BPF even
0010: 74 21 00 00 t!..
BPF string: "Raise a BPF event!"
usleep 4639 [000] 261895.407883: 0 __bpf_stdout__:
ffffffff8105d609 ...
BPF output: 0000: 52 61 69 73 65 20 61 20 Raise a
0008: 42 50 46 20 65 76 65 6e BPF even
0010: 74 21 00 00 t!..
BPF string: "Raise a BPF event!"
perf record -e ./test_bpf_trace.c usleep 100000
equals to:
perf record -e bpf-output/no-inherit=1,name=__bpf_stdout__/ \
-e ./test_bpf_trace.c/map:__bpf_stdout__.event=__bpf_stdout__/ \
usleep 100000
Where test_bpf_trace.c is:
/************************ BEGIN **************************/
#include <uapi/linux/bpf.h>
struct bpf_map_def {
unsigned int type;
unsigned int key_size;
unsigned int value_size;
unsigned int max_entries;
};
#define SEC(NAME) __attribute__((section(NAME), used))
static u64 (*ktime_get_ns)(void) =
(void *)BPF_FUNC_ktime_get_ns;
static int (*trace_printk)(const char *fmt, int fmt_size, ...) =
(void *)BPF_FUNC_trace_printk;
static int (*get_smp_processor_id)(void) =
(void *)BPF_FUNC_get_smp_processor_id;
static int (*perf_event_output)(void *, struct bpf_map_def *, int, void *, unsigned long) =
(void *)BPF_FUNC_perf_event_output;
struct bpf_map_def SEC("maps") __bpf_stdout__ = {
.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
.key_size = sizeof(int),
.value_size = sizeof(u32),
.max_entries = __NR_CPUS__,
};
static inline int __attribute__((always_inline))
func(void *ctx, int type)
{
char output_str[] = "Raise a BPF event!";
char err_str[] = "BAD %d\n";
int err;
err = perf_event_output(ctx, &__bpf_stdout__, get_smp_processor_id(),
&output_str, sizeof(output_str));
if (err)
trace_printk(err_str, sizeof(err_str), err);
return 1;
}
SEC("func_begin=sys_nanosleep")
int func_begin(void *ctx) {return func(ctx, 1);}
SEC("func_end=sys_nanosleep%return")
int func_end(void *ctx) { return func(ctx, 2);}
char _license[] SEC("license") = "GPL";
int _version SEC("version") = LINUX_VERSION_CODE;
/************************* END ***************************/
Committer note:
Testing with 'perf trace':
# trace -e nanosleep --ev test_bpf_stdout.c usleep 1
0.007 ( 0.007 ms): usleep/729 nanosleep(rqtp: 0x7ffc5bbc5fe0) ...
0.007 ( ): __bpf_stdout__:Raise a BPF event!..)
0.008 ( ): perf_bpf_probe:func_begin:(
ffffffff81112460))
0.069 ( ): __bpf_stdout__:Raise a BPF event!..)
0.070 ( ): perf_bpf_probe:func_end:(
ffffffff81112460 <-
ffffffff81003d92))
0.072 ( 0.072 ms): usleep/729 ... [continued]: nanosleep()) = 0
#
Suggested-and-Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1460128045-97310-5-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Wang Nan [Fri, 8 Apr 2016 15:07:24 +0000 (15:07 +0000)]
perf bpf: Clone bpf stdout events in multiple bpf scripts
This patch allows cloning bpf-output event configuration among multiple
bpf scripts. If there exist a map named '__bpf_output__' and not
configured using 'map:__bpf_output__.event=', this patch clones the
configuration of another '__bpf_stdout__' map. For example, following
command:
# perf trace --ev bpf-output/no-inherit,name=evt/ \
--ev ./test_bpf_trace.c/map:__bpf_stdout__.event=evt/ \
--ev ./test_bpf_trace2.c usleep 100000
equals to:
# perf trace --ev bpf-output/no-inherit,name=evt/ \
--ev ./test_bpf_trace.c/map:__bpf_stdout__.event=evt/ \
--ev ./test_bpf_trace2.c/map:__bpf_stdout__.event=evt/ \
usleep 100000
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1460128045-97310-4-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Linus Torvalds [Mon, 11 Apr 2016 00:58:30 +0000 (17:58 -0700)]
Linux 4.6-rc3
Linus Torvalds [Mon, 11 Apr 2016 00:48:17 +0000 (17:48 -0700)]
Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
"A couple of small fixes, and wiring up the new syscalls which appeared
during the merge window"
* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
ARM: 8550/1: protect idiv patching against undefined gcc behavior
ARM: wire up preadv2 and pwritev2 syscalls
ARM: SMP enable of cache maintanence broadcast
Linus Torvalds [Mon, 11 Apr 2016 00:38:55 +0000 (17:38 -0700)]
Merge tag 'mmc-v4.6-rc1' of git://git.linaro.org/people/ulf.hansson/mmc
Pull MMC fixes from Ulf Hansson:
"Here are a couple of mmc fixes intended for v4.6 rc3:
MMC host:
- sdhci: Fix regression setting power on Trats2 board
- sdhci-pci: Add support and PCI IDs for more Broxton host controllers"
* tag 'mmc-v4.6-rc1' of git://git.linaro.org/people/ulf.hansson/mmc:
mmc: sdhci-pci: Add support and PCI IDs for more Broxton host controllers
mmc: sdhci: Fix regression setting power on Trats2 board
Linus Torvalds [Mon, 11 Apr 2016 00:04:42 +0000 (17:04 -0700)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Some bugfixes from I2C:
- fix a uevent triggered boot problem by removing a useless debug
print
- fix sysfs-attributes of the new i2c-demux-pinctrl driver to follow
standard kernel behaviour
- fix a potential division-by-zero error (needed two takes)"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: jz4780: really prevent potential division by zero
Revert "i2c: jz4780: prevent potential division by zero"
i2c: jz4780: prevent potential division by zero
i2c: mux: demux-pinctrl: Update docs to new sysfs-attributes
i2c: mux: demux-pinctrl: Clean up sysfs attributes
i2c: prevent endless uevent loop with CONFIG_I2C_DEBUG_CORE
Linus Torvalds [Sun, 10 Apr 2016 23:52:24 +0000 (16:52 -0700)]
Revert "ext4: allow readdir()'s of large empty directories to be interrupted"
This reverts commit
1028b55bafb7611dda1d8fed2aeca16a436b7dff.
It's broken: it makes ext4 return an error at an invalid point, causing
the readdir wrappers to write the the position of the last successful
directory entry into the position field, which means that the next
readdir will now return that last successful entry _again_.
You can only return fatal errors (that terminate the readdir directory
walk) from within the filesystem readdir functions, the "normal" errors
(that happen when the readdir buffer fills up, for example) happen in
the iterorator where we know the position of the actual failing entry.
I do have a very different patch that does the "signal_pending()"
handling inside the iterator function where it is allowable, but while
that one passes all the sanity checks, I screwed up something like four
times while emailing it out, so I'm not going to commit it today.
So my track record is not good enough, and the stars will have to align
better before that one gets committed. And it would be good to get some
review too, of course, since celestial alignments are always an iffy
debugging model.
IOW, let's just revert the commit that caused the problem for now.
Reported-by: Greg Thelen <gthelen@google.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 9 Apr 2016 21:10:20 +0000 (14:10 -0700)]
Merge branch 'parisc-4.6-3' of git://git./linux/kernel/git/deller/parisc-linux
Pull parisc fixes from Helge Deller:
"Since commit
0de798584bde ("parisc: Use generic extable search and
sort routines") module loading is boken on parisc, because the parisc
module loader wasn't prepared for the new R_PARISC_PCREL32 relocations.
In addition, due to that breakage, Mikulas Patocka noticed that
handling exceptions from modules probably never worked on parisc. It
was just masked by the fact that exceptions from modules don't happen
during normal use.
This patch series fixes those issues and survives the tests of the
lib/test_user_copy kernel module test. Some patches are tagged for
stable"
* 'parisc-4.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Update comment regarding relative extable support
parisc: Unbreak handling exceptions from kernel modules
parisc: Fix kernel crash with reversed copy_from_user()
parisc: Avoid function pointers for kernel exception routines
parisc: Handle R_PARISC_PCREL32 relocations in kernel modules
Linus Torvalds [Sat, 9 Apr 2016 21:05:45 +0000 (14:05 -0700)]
Merge branch 'libnvdimm-fixes' of git://git./linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
"Three fixes, the first two are tagged for -stable:
- The ndctl utility/library gained expanded unit tests illuminating a
long standing bug in the libnvdimm SMART data retrieval
implementation.
It has been broken since its initial implementation, now fixed.
- Another one line fix for the detection of stale info blocks.
Without this change userspace can get into a situation where it is
unable to reconfigure a namespace.
- Fix the badblock initialization path in the presence of the new (in
v4.6-rc1) section alignment workarounds.
Without this change badblocks will be reported at the wrong offset.
These have received a build success report from the kbuild robot and
have appeared in -next with no reported issues"
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
libnvdimm, pfn: fix nvdimm_namespace_add_poison() vs section alignment
libnvdimm, pfn: fix uuid validation
libnvdimm: fix smart data retrieval
Linus Torvalds [Sat, 9 Apr 2016 20:28:50 +0000 (13:28 -0700)]
Merge tag 'gpio-v4.6-3' of git://git./linux/kernel/git/linusw/linux-gpio
Pull GPIO fixes from Linus Walleij:
"Here is a set of four GPIO fixes. The two fixes to the core are
serious as they are regressing minor architectures.
Core fixes:
- Defer GPIO device setup until after gpiolib is initialized.
It turns out that a few very tightly integrated GPIO platform
drivers initialize so early (befor core_initcall()) so that the
gpiolib isn't even initialized itself. That limits what the
library can do, and we cannot reference uninitialized fields until
later.
Defer some of the initialization until right after the gpiolib is
initialized in these (rare) cases.
- As a consequence: do not use devm_* resources when allocating the
states in the initial set-up of the gpiochip.
Driver fixes:
- In ACPI retrieveal: ignore GpioInt when looking for output GPIOs.
- Fix legacy builds on the PXA without a backing pin controller.
- Use correct datatype on pca953x register writes"
* tag 'gpio-v4.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: pca953x: Use correct u16 value for register word write
gpiolib: Defer gpio device setup until after gpiolib initialization
gpiolib: Do not use devm functions when registering gpio chip
gpio: pxa: fix legacy non pinctrl aware builds
gpio / ACPI: ignore GpioInt() GPIOs when requesting GPIO_OUT_*
Linus Torvalds [Sat, 9 Apr 2016 19:32:44 +0000 (12:32 -0700)]
Merge tag 'tty-4.6-rc3' of git://git./linux/kernel/git/gregkh/tty
Pull tty fixes from Greg KH:
"Here are two tty fixes for issues found.
One was due to a merge error in 4.6-rc1, and the other a regression
fix for UML consoles that broke in 4.6-rc1.
Both have been in linux-next for a while"
* tag 'tty-4.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty: Fix merge of "tty: Refactor tty_open()"
tty: Fix UML console breakage
Linus Torvalds [Sat, 9 Apr 2016 19:23:02 +0000 (12:23 -0700)]
Merge tag 'usb-4.6-rc3' of git://git./linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some USB fixes and new device ids for 4.6-rc3.
Nothing major, the normal USB gadget fixes and usb-serial driver ids,
along with some other fixes mixed in. All except the USB serial ids
have been tested in linux-next, the id additions should be fine as
they are 'trivial'"
* tag 'usb-4.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (25 commits)
USB: option: add "D-Link DWM-221 B1" device id
USB: serial: cp210x: Adding GE Healthcare Device ID
USB: serial: ftdi_sio: Add support for ICP DAS I-756xU devices
usb: dwc3: keystone: drop dma_mask configuration
usb: gadget: udc-core: remove manual dma configuration
usb: dwc3: pci: add ID for one more Intel Broxton platform
usb: renesas_usbhs: fix to avoid using a disabled ep in usbhsg_queue_done()
usb: dwc2: do not override forced dr_mode in gadget setup
usb: gadget: f_midi: unlock on error
USB: digi_acceleport: do sanity checking for the number of ports
USB: cypress_m8: add endpoint sanity check
USB: mct_u232: add sanity checking in probe
usb: fix regression in SuperSpeed endpoint descriptor parsing
USB: usbip: fix potential out-of-bounds write
usb: renesas_usbhs: disable TX IRQ before starting TX DMAC transfer
usb: renesas_usbhs: avoid NULL pointer derefernce in usbhsf_pkt_handler()
usb: gadget: f_midi: Fixed a bug when buflen was smaller than wMaxPacketSize
usb: phy: qcom-8x16: fix regulator API abuse
usb: ch9: Fix SSP Device Cap wFunctionalitySupport type
usb: gadget: composite: Access SSP Dev Cap fields properly
...
Linus Torvalds [Sat, 9 Apr 2016 19:09:37 +0000 (12:09 -0700)]
Merge tag 'staging-4.6-rc3' of git://git./linux/kernel/git/gregkh/staging
Pull staging and IIO driver fixes from Greg KH:
"Here are some IIO driver fixes, along with two staging driver fixes
for 4.6-rc3.
One staging driver patch reverts the deletion of a driver that
happened in 4.6-rc1. We thought that laptop.org was dead, but it's
still alive and kicking, and has users that were mad we broke their
hardware by deleting a driver for their machines. So that driver is
added back and everyone is happy again.
All of these have been in linux-next for a while with no reported
issues"
* tag 'staging-4.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
Revert "Staging: olpc_dcon: Remove obsolete driver"
staging/rdma/hfi1: select CRC32
iio: gyro: bmg160: fix buffer read values
iio: gyro: bmg160: fix endianness when reading axes
iio: accel: bmc150: fix endianness when reading axes
iio: st_magn: always define ST_MAGN_TRIGGER_SET_STATE
iio: fix config watermark initial value
iio: health: max30100: correct FIFO check condition
iio: imu: Fix inv_mpu6050 dependencies
iio: adc: Fix build error of missing devm_ioremap_resource on UM
iio: light: apds9960: correct FIFO check condition
iio: adc: max1363: correct reference voltage
iio: adc: max1363: add missing adc to max1363_id
Linus Torvalds [Sat, 9 Apr 2016 19:00:42 +0000 (12:00 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"This is a set of eight fixes.
Two are trivial gcc-6 updates (brace additions and unused variable
removal). There's a couple of cxlflash regressions, a correction for
sd being overly chatty on revalidation (causing excess log increases).
A VPD issue which could crash USB devices because they seem very
intolerant to VPD inquiries, an ALUA deadlock fix and a mpt3sas buffer
overrun fix"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: Do not attach VPD to devices that don't support it
sd: Fix excessive capacity printing on devices with blocks bigger than 512 bytes
scsi_dh_alua: Fix a recently introduced deadlock
scsi: Declare local symbols static
cxlflash: Move to exponential back-off when cmd_room is not available
cxlflash: Fix regression issue with re-ordering patch
mpt3sas: Don't overreach ioc->reply_post[] during initialization
aacraid: add missing curly braces
Linus Torvalds [Sat, 9 Apr 2016 18:23:27 +0000 (11:23 -0700)]
Merge tag 'md/4.6-rc2-fix' of git://git./linux/kernel/git/shli/md
Pull MD fixes from Shaohua Li:
"This update mainly fixes bugs:
- fix error handling (Guoqing)
- fix a crash when a disk is hotremoved (me)
- fix a dead loop (Wei Fang)"
* tag 'md/4.6-rc2-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
md/bitmap: clear bitmap if bitmap_create failed
MD: add rdev reference for super write
md: fix a trivial typo in comments
md:raid1: fix a dead loop when read from a WriteMostly disk
Linus Torvalds [Sat, 9 Apr 2016 18:03:48 +0000 (11:03 -0700)]
Merge tag 'pm+acpi-4.6-rc3' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management and ACPI fixes from Rafael Wysocki:
"Fixes for some issues discovered after recent changes and for some
that have just been found lately regardless of those changes
(intel_pstate, intel_idle, PM core, mailbox/pcc, turbostat) plus
support for some new CPU models (intel_idle, Intel RAPL driver,
turbostat) and documentation updates (intel_pstate, PM core).
Specifics:
- intel_pstate fixes for two issues exposed by the recent switch over
from using timers and for one issue introduced during the 4.4 cycle
plus new comments describing data structures used by the driver
(Rafael Wysocki, Srinivas Pandruvada).
- intel_idle fixes related to CPU offline/online (Richard Cochran).
- intel_idle support (new CPU IDs and state definitions mostly) for
Skylake-X and Kabylake processors (Len Brown).
- PCC mailbox driver fix for an out-of-bounds memory access that may
cause the kernel to panic() (Shanker Donthineni).
- New (missing) CPU ID for one apparently overlooked Haswell model in
the Intel RAPL power capping driver (Srinivas Pandruvada).
- Fix for the PM core's wakeup IRQs framework to make it work after
wakeup settings reconfiguration from sysfs (Grygorii Strashko).
- Runtime PM documentation update to make it describe what needs to
be done during device removal more precisely (Krzysztof Kozlowski).
- Stale comment removal cleanup in the cpufreq-dt driver (Viresh
Kumar).
- turbostat utility fixes and support for Broxton, Skylake-X and
Kabylake processors (Len Brown)"
* tag 'pm+acpi-4.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (28 commits)
PM / wakeirq: fix wakeirq setting after wakup re-configuration from sysfs
tools/power turbostat: work around RC6 counter wrap
tools/power turbostat: initial KBL support
tools/power turbostat: initial SKX support
tools/power turbostat: decode BXT TSC frequency via CPUID
tools/power turbostat: initial BXT support
tools/power turbostat: print IRTL MSRs
tools/power turbostat: SGX state should print only if --debug
intel_idle: Add KBL support
intel_idle: Add SKX support
intel_idle: Clean up all registered devices on exit.
intel_idle: Propagate hot plug errors.
intel_idle: Don't overreact to a cpuidle registration failure.
intel_idle: Setup the timer broadcast only on successful driver load.
intel_idle: Avoid a double free of the per-CPU data.
intel_idle: Fix dangling registration on error path.
intel_idle: Fix deallocation order on the driver exit path.
intel_idle: Remove redundant initialization calls.
intel_idle: Fix a helper function's return value.
intel_idle: remove useless return from void function.
...
Linus Torvalds [Sat, 9 Apr 2016 17:50:44 +0000 (10:50 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Stale SKB data pointer access across pskb_may_pull() calls in L2TP,
from Haishuang Yan.
2) Fix multicast frame handling in mac80211 AP code, from Felix
Fietkau.
3) mac80211 station hashtable insert errors not handled properly, fix
from Johannes Berg.
4) Fix TX descriptor count limit handling in e1000, from Alexander
Duyck.
5) Revert a buggy netdev refcount fix in netpoll, from Bjorn Helgaas.
6) Must assign rtnl_link_ops of the device before registering it, fix
in ip6_tunnel from Thadeu Lima de Souza Cascardo.
7) Memory leak fix in tc action net exit, from WANG Cong.
8) Add missing AF_KCM entries to name tables, from Dexuan Cui.
9) Fix regression in GRE handling of csums wrt. FOU, from Alexander
Duyck.
10) Fix memory allocation alignment and congestion map corruption in
RDS, from Shamir Rabinovitch.
11) Fix default qdisc regression in tuntap driver, from Jason Wang.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
bridge, netem: mark mailing lists as moderated
tuntap: restore default qdisc
mpls: find_outdev: check for err ptr in addition to NULL check
ipv6: Count in extension headers in skb->network_header
RDS: fix congestion map corruption for PAGE_SIZE > 4k
RDS: memory allocated must be align to 8
GRE: Disable segmentation offloads w/ CSUM and we are encapsulated via FOU
net: add the AF_KCM entries to family name tables
MAINTAINERS: intel-wired-lan list is moderated
lib/test_bpf: Add additional BPF_ADD tests
lib/test_bpf: Add test to check for result of 32-bit add that overflows
lib/test_bpf: Add tests for unsigned BPF_JGT
lib/test_bpf: Fix JMP_JSET tests
VSOCK: Detach QP check should filter out non matching QPs.
stmmac: fix adjust link call in case of a switch is attached
af_packet: tone down the Tx-ring unsupported spew.
net_sched: fix a memory leak in tc action
samples/bpf: Enable powerpc support
samples/bpf: Use llc in PATH, rather than a hardcoded value
samples/bpf: Fix build breakage with map_perf_test_user.c
...
Linus Torvalds [Sat, 9 Apr 2016 17:41:34 +0000 (10:41 -0700)]
Merge branch 'for-linus-4.6' of git://git./linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
"These are bug fixes, including a really old fsync bug, and a few trace
points to help us track down problems in the quota code"
* 'for-linus-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix file/data loss caused by fsync after rename and new inode
btrfs: Reset IO error counters before start of device replacing
btrfs: Add qgroup tracing
Btrfs: don't use src fd for printk
btrfs: fallback to vmalloc in btrfs_compare_tree
btrfs: handle non-fatal errors in btrfs_qgroup_inherit()
btrfs: Output more info for enospc_debug mount option
Btrfs: fix invalid reference in replace_path
Btrfs: Improve FL_KEEP_SIZE handling in fallocate
Linus Torvalds [Sat, 9 Apr 2016 17:33:58 +0000 (10:33 -0700)]
Merge tag 'for-linus-4.6-ofs1' of git://git./linux/kernel/git/hubcap/linux
Pull orangefs fixes from Mike Marshall:
"Orangefs cleanups and a strncpy vulnerability fix.
Cleanups:
- remove an unused variable from orangefs_readdir.
- clean up printk wrapper used for ofs "gossip" debugging.
- clean up truncate ctime and mtime setting in inode.c
- remove a useless null check found by coccinelle.
- optimize some memcpy/memset boilerplate code.
- remove some useless sanity checks from xattr.c
Fix:
- fix a potential strncpy vulnerability"
* tag 'for-linus-4.6-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
orangefs: remove unused variable
orangefs: Add KERN_<LEVEL> to gossip_<level> macros
orangefs: strncpy -> strscpy
orangefs: clean up truncate ctime and mtime setting
Orangefs: fix ifnullfree.cocci warnings
Orangefs: optimize boilerplate code.
Orangefs: xattr.c cleanup
Linus Torvalds [Sat, 9 Apr 2016 17:23:45 +0000 (10:23 -0700)]
Merge tag 'iommu-fixes-v4.6-rc2' of git://git./linux/kernel/git/joro/iommu
Pull IOMMU fixes from Joerg Roedel:
- compile-time fixes (warnings and failures)
- a bug in iommu core code which could cause the group->domain pointer
to be falsly cleared
- fix in scatterlist handling of the ARM common DMA-API code
- stall detection fix for the Rockchip IOMMU driver
* tag 'iommu-fixes-v4.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/vt-d: Silence an uninitialized variable warning
iommu/rockchip: Fix "is stall active" check
iommu: Don't overwrite domain pointer when there is no default_domain
iommu/dma: Restore scatterlist offsets correctly
iommu: provide of_xlate pointer unconditionally
Wolfram Sang [Sun, 3 Apr 2016 21:32:00 +0000 (23:32 +0200)]
i2c: jz4780: really prevent potential division by zero
Make sure we avoid a division-by-zero OOPS in case clock-frequency is
set too low in DT. Add missing '\n' while we are here.
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Acked-by: Axel Lin <axel.lin@ingics.com>
Wolfram Sang [Sat, 9 Apr 2016 06:32:37 +0000 (08:32 +0200)]
Revert "i2c: jz4780: prevent potential division by zero"
This reverts commit
34cf2acdafaa31a13821e45de5ee896adcd307b1. 'ret' is
not set when bailing out. Also, there is a better place to check for 0.
Reported-by: Axel Lin <axel.lin@ingics.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Greg Kroah-Hartman [Fri, 8 Apr 2016 22:41:58 +0000 (15:41 -0700)]
Merge tag 'usb-serial-4.6-rc3' of git://git./linux/kernel/git/johan/usb-serial into usb-linus
Johan writes:
USB-serial fixes for v4.6-rc3
Here are some new device ids.
Signed-off-by: Johan Hovold <johan@kernel.org>
David S. Miller [Fri, 8 Apr 2016 20:41:28 +0000 (16:41 -0400)]
Merge tag 'mac80211-for-davem-2016-04-06' of git://git./linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
For the current RC series, we have the following fixes:
* TDLS fixes from Arik and Ilan
* rhashtable fixes from Ben and myself
* documentation fixes from Luis
* U-APSD fixes from Emmanuel
* a TXQ fix from Felix
* and a compiler warning suppression from Jeff
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Tue, 5 Apr 2016 20:43:53 +0000 (13:43 -0700)]
bridge, netem: mark mailing lists as moderated
I moderate these (lightly loaded) lists to block spam.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Helge Deller [Fri, 8 Apr 2016 19:36:06 +0000 (21:36 +0200)]
parisc: Update comment regarding relative extable support
Update the comment to reflect the changes of commit
0de7985 (parisc: Use
generic extable search and sort routines).
Signed-off-by: Helge Deller <deller@gmx.de>
Helge Deller [Fri, 8 Apr 2016 16:32:52 +0000 (18:32 +0200)]
parisc: Unbreak handling exceptions from kernel modules
Handling exceptions from modules never worked on parisc.
It was just masked by the fact that exceptions from modules
don't happen during normal use.
When a module triggers an exception in get_user() we need to load the
main kernel dp value before accessing the exception_data structure, and
afterwards restore the original dp value of the module on exit.
Noticed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org
Helge Deller [Fri, 8 Apr 2016 16:18:48 +0000 (18:18 +0200)]
parisc: Fix kernel crash with reversed copy_from_user()
The kernel module testcase (lib/test_user_copy.c) exhibited a kernel
crash on parisc if the parameters for copy_from_user were reversed
("illegal reversed copy_to_user" testcase).
Fix this potential crash by checking the fault handler if the faulting
address is in the exception table.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org
Cc: Kees Cook <keescook@chromium.org>
Helge Deller [Fri, 8 Apr 2016 16:11:33 +0000 (18:11 +0200)]
parisc: Avoid function pointers for kernel exception routines
We want to avoid the kernel module loader to create function pointers
for the kernel fixup routines of get_user() and put_user(). Changing
the external reference from function type to int type fixes this.
This unbreaks exception handling for get_user() and put_user() when
called from a kernel module.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org
Helge Deller [Fri, 8 Apr 2016 20:10:35 +0000 (22:10 +0200)]
parisc: Handle R_PARISC_PCREL32 relocations in kernel modules
Commit
0de7985 (parisc: Use generic extable search and sort routines)
changed the exception tables to use 32bit relative offsets.
This patch now adds support to the kernel module loader to handle such
R_PARISC_PCREL32 relocations for 32- and 64-bit modules.
Signed-off-by: Helge Deller <deller@gmx.de>
Jason Wang [Fri, 8 Apr 2016 05:26:48 +0000 (13:26 +0800)]
tuntap: restore default qdisc
After commit
f84bb1eac027 ("net: fix IFF_NO_QUEUE for drivers using
alloc_netdev"), default qdisc was changed to noqueue because
tuntap does not set tx_queue_len during .setup(). This patch restores
default qdisc by setting tx_queue_len in tun_setup().
Fixes:
f84bb1eac027 ("net: fix IFF_NO_QUEUE for drivers using alloc_netdev")
Cc: Phil Sutter <phil@nwl.cc>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin Brandenburg [Mon, 4 Apr 2016 20:26:38 +0000 (16:26 -0400)]
orangefs: remove unused variable
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Rafael J. Wysocki [Fri, 8 Apr 2016 19:46:56 +0000 (21:46 +0200)]
Merge branches 'pm-core', 'powercap' and 'pm-tools'
* pm-core:
PM / wakeirq: fix wakeirq setting after wakup re-configuration from sysfs
PM / runtime: Document steps for device removal
* powercap:
powercap: intel_rapl: Add missing Haswell model
* pm-tools:
tools/power turbostat: work around RC6 counter wrap
tools/power turbostat: initial KBL support
tools/power turbostat: initial SKX support
tools/power turbostat: decode BXT TSC frequency via CPUID
tools/power turbostat: initial BXT support
tools/power turbostat: print IRTL MSRs
tools/power turbostat: SGX state should print only if --debug
Rafael J. Wysocki [Fri, 8 Apr 2016 19:46:05 +0000 (21:46 +0200)]
Merge branches 'pm-cpufreq', 'pm-cpuidle' and 'acpi-cppc'
* pm-cpufreq:
cpufreq: dt: Drop stale comment
cpufreq: intel_pstate: Documenation for structures
cpufreq: intel_pstate: fix inconsistency in setting policy limits
intel_pstate: Avoid extra invocation of intel_pstate_sample()
intel_pstate: Do not set utilization update hook too early
* pm-cpuidle:
intel_idle: Add KBL support
intel_idle: Add SKX support
intel_idle: Clean up all registered devices on exit.
intel_idle: Propagate hot plug errors.
intel_idle: Don't overreact to a cpuidle registration failure.
intel_idle: Setup the timer broadcast only on successful driver load.
intel_idle: Avoid a double free of the per-CPU data.
intel_idle: Fix dangling registration on error path.
intel_idle: Fix deallocation order on the driver exit path.
intel_idle: Remove redundant initialization calls.
intel_idle: Fix a helper function's return value.
intel_idle: remove useless return from void function.
* acpi-cppc:
mailbox: pcc: Don't access an unmapped memory address space
Joe Perches [Sun, 27 Mar 2016 21:34:52 +0000 (14:34 -0700)]
orangefs: Add KERN_<LEVEL> to gossip_<level> macros
Emit the logging messages at the appropriate levels.
Miscellanea:
o Change format to fmt
o Use the more common ##__VA_ARGS__
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Martin Brandenburg [Fri, 8 Apr 2016 17:33:21 +0000 (13:33 -0400)]
orangefs: strncpy -> strscpy
It would have been possible for a rogue client-core to send in a symlink
target which is not NUL terminated. This returns EIO if the client-core
gives us corrupt data.
Leave debugfs and superblock code as is for now.
Other dcache.c and namei.c strncpy instances are safe because
ORANGEFS_NAME_MAX = NAME_MAX + 1; there is always enough space for a
name plus a NUL byte.
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Martin Brandenburg [Mon, 4 Apr 2016 20:26:36 +0000 (16:26 -0400)]
orangefs: clean up truncate ctime and mtime setting
The ctime and mtime are always updated on a successful ftruncate and
only updated on a successful truncate where the size changed.
We handle the ``if the size changed'' bit.
This matches FUSE's behavior.
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
kbuild test robot [Sat, 26 Mar 2016 18:54:23 +0000 (02:54 +0800)]
Orangefs: fix ifnullfree.cocci warnings
fs/orangefs/orangefs-debugfs.c:130:2-26: WARNING: NULL check before freeing functions like kfree, debugfs_remove, debugfs_remove_recursive or usb_free_urb is not needed. Maybe consider reorganizing relevant code to avoid passing NULL values.
NULL check before some freeing functions is not needed.
Based on checkpatch warning
"kfree(NULL) is safe this check is probably not required"
and kfreeaddr.cocci by Julia Lawall.
Generated by: scripts/coccinelle/free/ifnullfree.cocci
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Mike Marshall [Wed, 6 Apr 2016 15:19:37 +0000 (11:19 -0400)]
Orangefs: optimize boilerplate code.
Suggested by David Binderman <dcb314@hotmail.com>
The former can potentially be a performance win over the latter.
memcpy(d, s, len);
memset(d+len, c, size-len);
memset(d, c, size);
memcpy(d, s, len);
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Mike Marshall [Wed, 6 Apr 2016 14:52:38 +0000 (10:52 -0400)]
Orangefs: xattr.c cleanup
1. It is nonsense to test for negative size_t, suggested by
David Binderman <dcb314@hotmail.com>
2. By the time Orangefs gets called, the vfs has ensured that
name != NULL, and that buffer and size are sane.
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Roopa Prabhu [Fri, 8 Apr 2016 04:28:38 +0000 (21:28 -0700)]
mpls: find_outdev: check for err ptr in addition to NULL check
find_outdev calls inet{,6}_fib_lookup_dev() or dev_get_by_index() to
find the output device. In case of an error, inet{,6}_fib_lookup_dev()
returns error pointer and dev_get_by_index() returns NULL. But the function
only checks for NULL and thus can end up calling dev_put on an ERR_PTR.
This patch adds an additional check for err ptr after the NULL check.
Before: Trying to add an mpls route with no oif from user, no available
path to 10.1.1.8 and no default route:
$ip -f mpls route add 100 as 200 via inet 10.1.1.8
[ 822.337195] BUG: unable to handle kernel NULL pointer dereference at
00000000000003a3
[ 822.340033] IP: [<
ffffffff8148781e>] mpls_nh_assign_dev+0x10b/0x182
[ 822.340033] PGD
1db38067 PUD
1de9e067 PMD 0
[ 822.340033] Oops: 0000 [#1] SMP
[ 822.340033] Modules linked in:
[ 822.340033] CPU: 0 PID: 11148 Comm: ip Not tainted 4.5.0-rc7+ #54
[ 822.340033] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS
rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org
04/01/2014
[ 822.340033] task:
ffff88001db82580 ti:
ffff88001dad4000 task.ti:
ffff88001dad4000
[ 822.340033] RIP: 0010:[<
ffffffff8148781e>] [<
ffffffff8148781e>]
mpls_nh_assign_dev+0x10b/0x182
[ 822.340033] RSP: 0018:
ffff88001dad7a88 EFLAGS:
00010282
[ 822.340033] RAX:
ffffffffffffff9b RBX:
ffffffffffffff9b RCX:
0000000000000002
[ 822.340033] RDX:
00000000ffffff9b RSI:
0000000000000008 RDI:
0000000000000000
[ 822.340033] RBP:
ffff88001ddc9ea0 R08:
ffff88001e9f1768 R09:
0000000000000000
[ 822.340033] R10:
ffff88001d9c1100 R11:
ffff88001e3c89f0 R12:
ffffffff8187e0c0
[ 822.340033] R13:
ffffffff8187e0c0 R14:
ffff88001ddc9e80 R15:
0000000000000004
[ 822.340033] FS:
00007ff9ed798700(0000) GS:
ffff88001fc00000(0000)
knlGS:
0000000000000000
[ 822.340033] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 822.340033] CR2:
00000000000003a3 CR3:
000000001de89000 CR4:
00000000000006f0
[ 822.340033] Stack:
[ 822.340033]
0000000000000000 0000000100000000 0000000000000000
0000000000000000
[ 822.340033]
0000000000000000 0801010a00000000 0000000000000000
0000000000000000
[ 822.340033]
0000000000000004 ffffffff8148749b ffffffff8187e0c0
000000000000001c
[ 822.340033] Call Trace:
[ 822.340033] [<
ffffffff8148749b>] ? mpls_rt_alloc+0x2b/0x3e
[ 822.340033] [<
ffffffff81488e66>] ? mpls_rtm_newroute+0x358/0x3e2
[ 822.340033] [<
ffffffff810e7bbc>] ? get_page+0x5/0xa
[ 822.340033] [<
ffffffff813b7d94>] ? rtnetlink_rcv_msg+0x17e/0x191
[ 822.340033] [<
ffffffff8111794e>] ? __kmalloc_track_caller+0x8c/0x9e
[ 822.340033] [<
ffffffff813c9393>] ?
rht_key_hashfn.isra.20.constprop.57+0x14/0x1f
[ 822.340033] [<
ffffffff813b7c16>] ? __rtnl_unlock+0xc/0xc
[ 822.340033] [<
ffffffff813cb794>] ? netlink_rcv_skb+0x36/0x82
[ 822.340033] [<
ffffffff813b4507>] ? rtnetlink_rcv+0x1f/0x28
[ 822.340033] [<
ffffffff813cb2b1>] ? netlink_unicast+0x106/0x189
[ 822.340033] [<
ffffffff813cb5b3>] ? netlink_sendmsg+0x27f/0x2c8
[ 822.340033] [<
ffffffff81392ede>] ? sock_sendmsg_nosec+0x10/0x1b
[ 822.340033] [<
ffffffff81393df1>] ? ___sys_sendmsg+0x182/0x1e3
[ 822.340033] [<
ffffffff810e4f35>] ?
__alloc_pages_nodemask+0x11c/0x1e4
[ 822.340033] [<
ffffffff8110619c>] ? PageAnon+0x5/0xd
[ 822.340033] [<
ffffffff811062fe>] ? __page_set_anon_rmap+0x45/0x52
[ 822.340033] [<
ffffffff810e7bbc>] ? get_page+0x5/0xa
[ 822.340033] [<
ffffffff810e85ab>] ? __lru_cache_add+0x1a/0x3a
[ 822.340033] [<
ffffffff81087ea9>] ? current_kernel_time64+0x9/0x30
[ 822.340033] [<
ffffffff813940c4>] ? __sys_sendmsg+0x3c/0x5a
[ 822.340033] [<
ffffffff8148f597>] ?
entry_SYSCALL_64_fastpath+0x12/0x6a
[ 822.340033] Code: 83 08 04 00 00 65 ff 00 48 8b 3c 24 e8 40 7c f2 ff
eb 13 48 c7 c3 9f ff ff ff eb 0f 89 ce e8 f1 ae f1 ff 48 89 c3 48 85 db
74 15 <48> 8b 83 08 04 00 00 65 ff 08 48 81 fb 00 f0 ff ff 76 0d eb 07
[ 822.340033] RIP [<
ffffffff8148781e>] mpls_nh_assign_dev+0x10b/0x182
[ 822.340033] RSP <
ffff88001dad7a88>
[ 822.340033] CR2:
00000000000003a3
[ 822.435363] ---[ end trace
98cc65e6f6b8bf11 ]---
After patch:
$ip -f mpls route add 100 as 200 via inet 10.1.1.8
RTNETLINK answers: Network is unreachable
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Reported-by: David Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnaldo Carvalho de Melo [Fri, 8 Apr 2016 15:04:29 +0000 (12:04 -0300)]
perf dwarf: Guard !x86_64 definitions under #ifdef else clause
To fix the build on Fedora Rawhide (gcc 6.0.0
20160311 (Red Hat 6.0.0-0.17):
CC /tmp/build/perf/arch/x86/util/dwarf-regs.o
arch/x86/util/dwarf-regs.c:66:36: error: 'x86_32_regoffset_table' defined but not used [-Werror=unused-const-variable=]
static const struct pt_regs_offset x86_32_regoffset_table[] = {
^~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-fghuksc1u8ln82bof4lwcj0o@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Fri, 8 Apr 2016 14:53:02 +0000 (11:53 -0300)]
perf tools: Use readdir() instead of deprecated readdir_r()
The readdir() function is thread safe as long as just one thread uses a
DIR, which is the case when parsing tracepoint event definitions, to
avoid breaking the build with glibc-2.23.90 (upcoming 2.24), use it
instead of readdir_r().
See: http://man7.org/linux/man-pages/man3/readdir.3.html
"However, in modern implementations (including the glibc implementation),
concurrent calls to readdir() that specify different directory streams
are thread-safe. In cases where multiple threads must read from the
same directory stream, using readdir() with external synchronization is
still preferable to the use of the deprecated readdir_r(3) function."
Noticed while building on a Fedora Rawhide docker container.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-wddn49r6bz6wq4ee3dxbl7lo@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Fri, 8 Apr 2016 14:32:15 +0000 (11:32 -0300)]
perf tools: Use readdir() instead of deprecated readdir_r()
The readdir() function is thread safe as long as just one thread uses a
DIR, which is the case when synthesizing events for pre-existing threads
by traversing /proc, so, to avoid breaking the build with glibc-2.23.90
(upcoming 2.24), use it instead of readdir_r().
See: http://man7.org/linux/man-pages/man3/readdir.3.html
"However, in modern implementations (including the glibc implementation),
concurrent calls to readdir() that specify different directory streams
are thread-safe. In cases where multiple threads must read from the
same directory stream, using readdir() with external synchronization is
still preferable to the use of the deprecated readdir_r(3) function."
Noticed while building on a Fedora Rawhide docker container.
CC /tmp/build/perf/util/event.o
util/event.c: In function '__event__synthesize_thread':
util/event.c:466:2: error: 'readdir_r' is deprecated [-Werror=deprecated-declarations]
while (!readdir_r(tasks, &dirent, &next) && next) {
^~~~~
In file included from /usr/include/features.h:368:0,
from /usr/include/stdint.h:25,
from /usr/lib/gcc/x86_64-redhat-linux/6.0.0/include/stdint.h:9,
from /git/linux/tools/include/linux/types.h:6,
from util/event.c:1:
/usr/include/dirent.h:189:12: note: declared here
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-i1vj7nyjp2p750rirxgrfd3c@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Fri, 8 Apr 2016 14:31:24 +0000 (11:31 -0300)]
perf thread_map: Use readdir() instead of deprecated readdir_r()
The readdir() function is thread safe as long as just one thread uses a
DIR, which is the case in thread_map, so, to avoid breaking the build
with glibc-2.23.90 (upcoming 2.24), use it instead of readdir_r().
See: http://man7.org/linux/man-pages/man3/readdir.3.html
"However, in modern implementations (including the glibc implementation),
concurrent calls to readdir() that specify different directory streams
are thread-safe. In cases where multiple threads must read from the
same directory stream, using readdir() with external synchronization is
still preferable to the use of the deprecated readdir_r(3) function."
Noticed while building on a Fedora Rawhide docker container.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-del8h2a0f40z75j4r42l96l0@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Fri, 8 Apr 2016 14:25:59 +0000 (11:25 -0300)]
perf script: Use readdir() instead of deprecated readdir_r()
The readdir() function is thread safe as long as just one thread uses a
DIR, which is the case in 'perf script', so, to avoid breaking the build
with glibc-2.23.90 (upcoming 2.24), use it instead of readdir_r().
See: http://man7.org/linux/man-pages/man3/readdir.3.html
"However, in modern implementations (including the glibc implementation),
concurrent calls to readdir() that specify different directory streams
are thread-safe. In cases where multiple threads must read from the
same directory stream, using readdir() with external synchronization is
still preferable to the use of the deprecated readdir_r(3) function."
Noticed while building on a Fedora Rawhide docker container.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-mt3xz7n2hl49ni2vx7kuq74g@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Wang Nan [Thu, 7 Apr 2016 10:24:31 +0000 (10:24 +0000)]
perf symbols: Adjust symbol for shared objects
He Kuang reported a problem that perf fails to get correct symbol on
Android platform in [1]. The problem can be reproduced on normal x86_64
platform. I will describe the reproducing steps in detail at the end of
commit message.
The reason of this problem is the missing of symbol adjustment for normal
shared objects. In most of the cases skipping adjustment is okay. However,
when '.text' section have different 'address' and 'offset' the result is wrong.
I checked all shared objects in my working platform, only wine dll objects and
debug objects (in .debug) have this problem. However, it is common on Android.
For example:
$ readelf -S ./libsurfaceflinger.so | grep \.text
[10] .text PROGBITS
0000000000029030 00012030
This patch enables symbol adjustment for dynamic objects so the symbol
address got from elfutils would be adjusted correctly.
Now nearly all types of ELF files should adjust symbols. Makes
ss->adjust_symbols default to true.
Steps to reproduce the problem:
$ cat ./Makefile
PWD := $(shell pwd)
LDFLAGS += "-Wl,-rpath=$(PWD)"
CFLAGS += -g
main: main.c libbuggy.so
libbuggy.so: buggy.c
gcc -g -shared -fPIC -Wl,-Ttext-segment=0x200000 $< -o $@
clean:
rm -rf main libbuggy.so *.o
$ cat ./buggy.c
int fib(int x)
{
return (x == 0) ? 1 : (x == 1) ? 1 : fib(x - 1) + fib(x - 2);
}
$ cat ./main.c
#include <stdio.h>
extern int fib(int x);
int main()
{
int i;
for (i = 0; i < 40; i++)
printf("%d\n", fib(i));
return 0;
}
$ make
$ perf record ./main
...
$ perf report --stdio
# Overhead Command Shared Object Symbol
# ........ ....... ................. ...............................
#
14.97% main libbuggy.so [.] 0x000000000000066c
8.68% main libbuggy.so [.] 0x00000000000006aa
8.52% main libbuggy.so [.] fib@plt
7.95% main libbuggy.so [.] 0x0000000000000664
5.94% main libbuggy.so [.] 0x00000000000006a9
5.35% main libbuggy.so [.] 0x0000000000000678
...
The correct result should be (after this patch):
# Overhead Command Shared Object Symbol
# ........ ....... ................. ...............................
#
91.47% main libbuggy.so [.] fib
8.52% main libbuggy.so [.] fib@plt
0.00% main [kernel.kallsyms] [k] kmem_cache_free
[1] http://lkml.kernel.org/g/
1452567507-54013-1-git-send-email-hekuang@huawei.com
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Cody P Schafer <dev@codyps.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kirill Smelkov <kirr@nexedi.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1460024671-64774-3-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Wang Nan [Thu, 7 Apr 2016 10:24:30 +0000 (10:24 +0000)]
perf symbols: Record text offset in dso to calculate objdump address
In this patch, the offset of '.text' section is stored into dso
and used here to re-calculate address to objdump.
In most of the cases, executable code is in '.text' section, so the
adjustment made to a symbol in dso__load_sym (using
sym.st_value -= shdr.sh_addr - shdr.sh_offset) should equal to
'sym.st_value -= dso->text_offset'. Therefore, adding text_offset back
get objdump address from symbol address (rip). However, it is not true
for kernel and kernel module since there could be multiple executable
sections with different offset. Exclude kernel for this reason.
After this patch, even dso->adjust_symbols is set to true for shared
objects, map__rip_2objdump() and map__objdump_2mem() would return
correct result, so perf behavior of annotate won't be changed.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Cody P Schafer <dev@codyps.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kirill Smelkov <kirr@nexedi.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1460024671-64774-2-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Mon, 4 Apr 2016 22:05:36 +0000 (19:05 -0300)]
perf tools: Build syscall table .c header from kernel's syscall_64.tbl
We used libaudit to map ids to syscall names and vice-versa, but that
imposes a delay in supporting new syscalls, having to wait for libaudit
to get those new syscalls on its tables.
To remove that delay, for x86_64 initially, grab a copy of
arch/x86/entry/syscalls/syscall_64.tbl and use it to generate those
tables.
Syscalls currently not available in audit-libs:
# trace -e copy_file_range,membarrier,mlock2,pread64,pwrite64,timerfd_create,userfaultfd
Error: Invalid syscall copy_file_range, membarrier, mlock2, pread64, pwrite64, timerfd_create, userfaultfd
Hint: try 'perf list syscalls:sys_enter_*'
Hint: and: 'man syscalls'
#
With this patch:
# trace -e copy_file_range,membarrier,mlock2,pread64,pwrite64,timerfd_create,userfaultfd
8505.733 ( 0.010 ms): gnome-shell/2519 timerfd_create(flags: 524288) = 36
8506.688 ( 0.005 ms): gnome-shell/2519 timerfd_create(flags: 524288) = 40
30023.097 ( 0.025 ms): qemu-system-x8/24629 pwrite64(fd: 18, buf: 0x7f63ae382000, count: 4096, pos:
529592320) = 4096
31268.712 ( 0.028 ms): qemu-system-x8/24629 pwrite64(fd: 18, buf: 0x7f63afd8b000, count: 4096, pos:
2314133504) = 4096
31268.854 ( 0.016 ms): qemu-system-x8/24629 pwrite64(fd: 18, buf: 0x7f63afda2000, count: 4096, pos:
2314137600) = 4096
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-51xfjbxevdsucmnbc4ka5r88@git.kernel.org
[ Added make dep for 'prepare' in 'LIBPERF_IN', fix by Wang Nan to fix parallell build ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Mon, 4 Apr 2016 20:52:18 +0000 (17:52 -0300)]
perf tools: Allow generating per-arch syscall table arrays
Tools should use a mechanism similar to arch/x86/entry/syscalls/ to
generate a header file with the definitions for two variables:
static const char *syscalltbl_x86_64[] = {
[0] = "read",
[1] = "write",
<SNIP>
[324] = "membarrier",
[325] = "mlock2",
[326] = "copy_file_range",
};
static const int syscalltbl_x86_64_max_id = 326;
In a per arch file that should then be included in
tools/perf/util/syscalltbl.c.
First one will be for x86_64.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-02uuamkxgccczdth8komspgp@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Mon, 4 Apr 2016 16:32:20 +0000 (13:32 -0300)]
perf trace: Move syscall table id <-> name routines to separate class
We're using libaudit for doing name to id and id to syscall name
translations, but that makes 'perf trace' to have to wait for newer
libaudit versions supporting recently added syscalls, such as
"userfaultfd" at the time of this changeset.
We have all the information right there, in the kernel sources, so move
this code to a separate place, wrapped behind functions that will
progressively use the kernel source files to extract the syscall table
for use in 'perf trace'.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-i38opd09ow25mmyrvfwnbvkj@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Thu, 7 Apr 2016 15:05:51 +0000 (12:05 -0300)]
perf trace: Beautify mode_t arguments
When reading the syscall tracepoint /format file, look for arguments of type
"mode_t" and attach a beautifier:
[root@jouet ~]# cat ~/bin/tp_with_fields_of_type
#!/bin/bash
grep -w $1 /sys/kernel/tracing/events/syscalls/*/format | sed -r 's%.*sys_enter_(.*)/format.*%\1%g' | paste -d, -s
# tp_with_fields_of_type umode_t
chmod,creat,fchmodat,fchmod,mkdirat,mkdir,mknodat,mknod,mq_open,openat,open
#
Testing it:
#define S_ISUID
0004000
#define S_ISGID
0002000
#define S_ISVTX
0001000
#define S_IRWXU
0000700
#define S_IRUSR
0000400
#define S_IWUSR
0000200
#define S_IXUSR
0000100
#define S_IRWXG
0000070
#define S_IRGRP
0000040
#define S_IWGRP
0000020
#define S_IXGRP
0000010
#define S_IRWXO
0000007
#define S_IROTH
0000004
#define S_IWOTH
0000002
#define S_IXOTH
0000001
# for mode in 4000 2000 1000 700 400 200 100 70 40 20 10 7 4 2 1 ; do \
echo -n $mode '->' ; trace --no-inherit -e chmod,fchmodat,fchmod chmod $mode x; \
done
4000 -> 0.338 ( 0.012 ms): fchmodat(dfd: CWD, filename: x, mode: ISUID) = 0
2000 -> 0.438 ( 0.015 ms): fchmodat(dfd: CWD, filename: x, mode: ISGID) = 0
1000 -> 0.677 ( 0.040 ms): fchmodat(dfd: CWD, filename: x, mode: ISVTX) = 0
700 -> 0.394 ( 0.013 ms): fchmodat(dfd: CWD, filename: x, mode: IRWXU) = 0
400 -> 0.337 ( 0.010 ms): fchmodat(dfd: CWD, filename: x, mode: IRUSR) = 0
200 -> 0.259 ( 0.008 ms): fchmodat(dfd: CWD, filename: x, mode: IWUSR) = 0
100 -> 0.249 ( 0.008 ms): fchmodat(dfd: CWD, filename: x, mode: IXUSR) = 0
70 -> 0.266 ( 0.008 ms): fchmodat(dfd: CWD, filename: x, mode: IRWXG) = 0
40 -> 0.329 ( 0.009 ms): fchmodat(dfd: CWD, filename: x, mode: IRGRP) = 0
20 -> 0.250 ( 0.009 ms): fchmodat(dfd: CWD, filename: x, mode: IWGRP) = 0
10 -> 0.259 ( 0.008 ms): fchmodat(dfd: CWD, filename: x, mode: IXGRP) = 0
7 -> 0.249 ( 0.009 ms): fchmodat(dfd: CWD, filename: x, mode: IRWXO) = 0
4 -> 0.278 ( 0.011 ms): fchmodat(dfd: CWD, filename: x, mode: IROTH) = 0
2 -> 0.276 ( 0.009 ms): fchmodat(dfd: CWD, filename: x, mode: IWOTH) = 0
1 -> 0.250 ( 0.008 ms): fchmodat(dfd: CWD, filename: x, mode: IXOTH) = 0
#
# trace --no-inherit -e chmod,fchmodat,fchmod chmod 7777 x
0.258 ( 0.011 ms): fchmodat(dfd: CWD, filename: x, mode: IALLUGO) = 0
# trace --no-inherit -e chmod,fchmodat,fchmod chmod 7770 x
0.258 ( 0.008 ms): fchmodat(dfd: CWD, filename: x, mode: ISUID|ISGID|ISVTX|IRWXU|IRWXG) = 0
# trace --no-inherit -e chmod,fchmodat,fchmod chmod 777 x
0.293 ( 0.012 ms): fchmodat(dfd: CWD, filename: x, mode: IRWXUGO
#
Now lets see if check by using the tracepoint for that specific syscall,
instead of raw_syscalls:sys_enter as 'trace' does for its strace fu:
# trace --no-inherit --ev syscalls:sys_enter_fchmodat -e fchmodat chmod 666 x
0.255 ( ): syscalls:sys_enter_fchmodat:dfd: 0xffffffffffffff9c, filename: 0x55db32a3f0f0, mode: 0x000001b6)
0.268 ( 0.012 ms): fchmodat(dfd: CWD, filename: x, mode: IRUGO|IWUGO ) = 0
#
Perfect, 0x1bc == 0666.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-18e8zfgbkj83xo87yoom43kd@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Thu, 7 Apr 2016 07:11:13 +0000 (09:11 +0200)]
perf script: Process event update events
Andreas reported following command produces no output:
# cat test.py
#!/usr/bin/env python
def stat__krava(cpu, thread, time, val, ena, run):
print "event %s cpu %d, thread %d, time %d, val %d, ena %d, run %d" % \
("krava", cpu, thread, time, val, ena, run)
# perf stat -a -I 1000 -e cycles,"cpu/config=0x6530160,name=krava/" record | perf script -s test.py
^C
#
The reason is that 'perf script' does not process event update events and
will never get the event name update thus the python callback is never
called.
The fix is just to add already existing callback we use in 'perf stat
report'.
Committer note:
After the patch:
# perf stat -a -I 1000 -e cycles,"cpu/config=0x6530160,name=krava/" record | perf script -s test.py
event krava cpu -1, thread -1, time
1000239179, val
1789051, ena
4000690920, run
4000690920
event krava cpu -1, thread -1, time
2000479061, val
2391338, ena
4000879596, run
4000879596
event krava cpu -1, thread -1, time
3000740802, val
1939121, ena
4000977209, run
4000977209
event krava cpu -1, thread -1, time
4001006730, val
2356115, ena
4001000489, run
4001000489
^C
#
Reported-by: Andreas Hollmann <hollmann@in.tum.de>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1460013073-18444-3-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Thu, 7 Apr 2016 07:11:12 +0000 (09:11 +0200)]
perf tools: Add dedicated unwind addr_space member into thread struct
Milian reported issue with thread::priv, which was double booked by perf
trace and DWARF unwind code. So using those together is impossible at
the moment.
Moving DWARF unwind private data into separate variable so perf trace
can keep using thread::priv.
Reported-and-Tested-by: Milian Wolff <milian.wolff@kdab.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andreas Hollmann <hollmann@in.tum.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1460013073-18444-2-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Yong Li [Wed, 30 Mar 2016 06:49:14 +0000 (14:49 +0800)]
gpio: pca953x: Use correct u16 value for register word write
The current implementation only uses the first byte in val,
the second byte is always 0. Change it to use cpu_to_le16
to write the two bytes into the register
Cc: stable@vger.kernel.org
Signed-off-by: Yong Li <sdliyong@gmail.com>
Reviewed-by: Phil Reid <preid@electromag.com.au>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Guenter Roeck [Thu, 31 Mar 2016 15:11:30 +0000 (08:11 -0700)]
gpiolib: Defer gpio device setup until after gpiolib initialization
Since commit
ff2b13592299 ("gpio: make the gpiochip a real device"),
attempts to add a gpio chip prior to gpiolib initialization cause
the system to crash. This happens because gpio_bus_type has not been
registered yet. Defer creating gpio devices until after gpiolib has
been initialized to fix the problem.
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: Alexandre Courbot <gnurou@gmail.com>
Fixes:
ff2b13592299 ("gpio: make the gpiochip a real device")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Guenter Roeck [Thu, 31 Mar 2016 15:11:29 +0000 (08:11 -0700)]
gpiolib: Do not use devm functions when registering gpio chip
It is possible that a gpio chip is registered before the gpiolib
initialization code has run. This means we can not use devm_ functions
to allocate memory at that time. Do it the old fashioned way.
Cc: Alexandre Courbot <gnurou@gmail.com>
Cc: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Robert Jarzmik [Tue, 29 Mar 2016 08:04:00 +0000 (10:04 +0200)]
gpio: pxa: fix legacy non pinctrl aware builds
In legacy pxa builds, ie. non device-tree and platform-data only builds,
pinctrl is not yet available. As a consequence, the pinctrl gpio
direction change function is a stub, returning always success.
In the current state, the gpio driver direction function believes the
pinctrl direction change was successful, and exits without actually
changing the gpio direction.
This patch changes the logic :
- if the pinctrl direction function fails, gpio direction will report
that failure
- if the pinctrl direction function succeeds, gpio direction is changed
by the gpio driver anyway.
This is sub optimal in the pinctrl aware case, as the gpio direction
will be changed twice: once by pinctrl function and another time by
the gpio direction function.
Yet it should be acceptable in this form, as this is functional for all
pxa platforms (device-tree and platform-data), and moreover changing a
gpio direction is very very seldom, usually in machine initialization,
seldom in drivers probe, and an exception for ac97 reset bug.
Fixes:
a770d946371e ("gpio: pxa: add pin control gpio direction and request")
Reported-by: Guenter Roeck <guenter@roeck-us.net>
Tested-by: Guenter Roeck <guenter@roeck-us.net>
Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Dmitry Torokhov [Thu, 24 Mar 2016 17:50:25 +0000 (10:50 -0700)]
gpio / ACPI: ignore GpioInt() GPIOs when requesting GPIO_OUT_*
When firmware does not use _DSD properties that allow properly name GPIO
resources, the kernel falls back on parsing _CRS resources, and will
return entries described as GpioInt() as general purpose GPIOs even
though they are meant to be used simply as interrupt sources for the
device:
Device (ETSA)
{
Name (_HID, "ELAN0001")
...
Method(_CRS, 0x0, NotSerialized)
{
Name(BUF0,ResourceTemplate ()
{
I2CSerialBus(
0x10, /* SlaveAddress */
ControllerInitiated, /* SlaveMode */
400000, /* ConnectionSpeed */
AddressingMode7Bit, /* AddressingMode */
"\\_SB.I2C1", /* ResourceSource */
)
GpioInt (Edge, ActiveLow, ExclusiveAndWake, PullNone,,
"\\_SB.GPSW") { BOARD_TOUCH_GPIO_INDEX }
} )
Return (BUF0)
}
...
}
This gives troubles with drivers such as Elan Touchscreen driver
(elants_i2c) that uses devm_gpiod_get to look up "reset" GPIO line and
decide whether the driver is responsible for powering up and resetting
the device, or firmware is. In the above case the lookup succeeds, we
map GPIO as output and later fail to request client->irq interrupt that
is mapped to the same GPIO.
Let's ignore resources described as GpioInt() while parsing _CRS when
requesting output GPIOs (but allow them when requesting GPIOD_ASIS or
GPIOD_IN as some drivers, such as i2c-hid, do request GPIO as input and
then map it to interrupt with gpiod_to_irq).
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Dmitry Torokhov <dtor@chromium.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Dan Williams [Fri, 8 Apr 2016 03:02:06 +0000 (20:02 -0700)]
libnvdimm, pfn: fix nvdimm_namespace_add_poison() vs section alignment
When section alignment padding is in effect we need to shift / truncate
the range that is queried for poison by the 'start_pad' or 'end_trunc'
reservations.
It's easiest if we just pass in an adjusted resource range rather than
deriving it from the passed in namespace. With the resource range
resolution pushed out to the caller we can also push the
namespace-to-region lookup to the caller and drop the implicit pmem-type
assumption about the passed in namespace object.
Cc: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Fri, 8 Apr 2016 02:59:27 +0000 (19:59 -0700)]
libnvdimm, pfn: fix uuid validation
If we detect a namespace has a stale info block in the init path, we
should overwrite with the latest configuration. In fact, we already
return -ENODEV when the parent uuid is invalid, the same should be done
for the 'self' uuid. Otherwise we can get into a condition where
userspace is unable to reconfigure the pfn-device without directly /
manually invalidating the info block.
Cc: <stable@vger.kernel.org>
Reported-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Fri, 8 Apr 2016 02:58:44 +0000 (19:58 -0700)]
libnvdimm: fix smart data retrieval
It appears that smart data retrieval has been broken the since the
initial implementation. Fix the payload size to be 128-bytes per the
specification.
Cc: <stable@vger.kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Jakub Sitnicki [Tue, 5 Apr 2016 16:41:08 +0000 (18:41 +0200)]
ipv6: Count in extension headers in skb->network_header
When sending a UDPv6 message longer than MTU, account for the length
of fragmentable IPv6 extension headers in skb->network_header offset.
Same as we do in alloc_new_skb path in __ip6_append_data().
This ensures that later on __ip6_make_skb() will make space in
headroom for fragmentable extension headers:
/* move skb->data to ip header from ext header */
if (skb->data < skb_network_header(skb))
__skb_pull(skb, skb_network_offset(skb));
Prevents a splat due to skb_under_panic:
skbuff: skb_under_panic: text:
ffffffff8143397b len:2126 put:14 \
head:
ffff880005bacf50 data:
ffff880005bacf4a tail:0x48 end:0xc0 dev:lo
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:104!
invalid opcode: 0000 [#1] KASAN
CPU: 0 PID: 160 Comm: reproducer Not tainted 4.6.0-rc2 #65
[...]
Call Trace:
[<
ffffffff813eb7b9>] skb_push+0x79/0x80
[<
ffffffff8143397b>] eth_header+0x2b/0x100
[<
ffffffff8141e0d0>] neigh_resolve_output+0x210/0x310
[<
ffffffff814eab77>] ip6_finish_output2+0x4a7/0x7c0
[<
ffffffff814efe3a>] ip6_output+0x16a/0x280
[<
ffffffff815440c1>] ip6_local_out+0xb1/0xf0
[<
ffffffff814f1115>] ip6_send_skb+0x45/0xd0
[<
ffffffff81518836>] udp_v6_send_skb+0x246/0x5d0
[<
ffffffff8151985e>] udpv6_sendmsg+0xa6e/0x1090
[...]
Reported-by: Ji Jianwen <jiji@redhat.com>
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>