Adrian Hunter [Tue, 21 Jul 2015 09:44:05 +0000 (12:44 +0300)]
perf script: Don't assume evsel position of tracking events
The tracking event does not have to be the first event so replace
perf_evlist__first() with perf_evlist__id2evsel() which uses the event
ID to find the correct evsel.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Pawel Moll <pawel.moll@arm.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1437471846-26995-5-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Tue, 21 Jul 2015 09:44:04 +0000 (12:44 +0300)]
perf record: Add option --switch-events to select PERF_RECORD_SWITCH events
Add an option to select PERF_RECORD_SWITCH events.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Pawel Moll <pawel.moll@arm.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1437471846-26995-4-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Tue, 21 Jul 2015 09:44:03 +0000 (12:44 +0300)]
perf tools: Add new PERF_RECORD_SWITCH event
Support processing of PERF_RECORD_SWITCH events and
PERF_RECORD_SWITCH_CPU_WIDE events. There is a single
tools callback for them both so that the tool must
check the event type before using the extra members
in PERF_RECORD_SWITCH_CPU_WIDE.
There is still no way to select the events, though.
That is added in a subsequest patch.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Pawel Moll <pawel.moll@arm.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1437471846-26995-3-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Tue, 21 Jul 2015 09:44:02 +0000 (12:44 +0300)]
perf: Add PERF_RECORD_SWITCH to indicate context switches
There are already two events for context switches, namely the tracepoint
sched:sched_switch and the software event context_switches.
Unfortunately neither are suitable for use by non-privileged users for
the purpose of synchronizing hardware trace data (e.g. Intel PT) to the
context switch.
Tracepoints are no good at all for non-privileged users because they
need either CAP_SYS_ADMIN or /proc/sys/kernel/perf_event_paranoid <= -1.
On the other hand, kernel software events need either CAP_SYS_ADMIN or
/proc/sys/kernel/perf_event_paranoid <= 1.
Now many distributions do default perf_event_paranoid to 1 making
context_switches a contender, except it has another problem (which is
also shared with sched:sched_switch) which is that it happens before
perf schedules events out instead of after perf schedules events in.
Whereas a privileged user can see all the events anyway, a
non-privileged user only sees events for their own processes, in other
words they see when their process was scheduled out not when it was
scheduled in. That presents two problems to use the event:
1. the information comes too late, so tools have to look ahead in the
event stream to find out what the current state is
2. if they are unlucky tracing might have stopped before the
context-switches event is recorded.
This new PERF_RECORD_SWITCH event does not have those problems
and it also has a couple of other small advantages.
It is easier to use because it is an auxiliary event (like mmap, comm
and task events) which can be enabled by setting a single bit. It is
smaller than sched:sched_switch and easier to parse.
To make the event useful for privileged users also, if the
context is cpu-wide then the event record will be
PERF_RECORD_SWITCH_CPU_WIDE which is the same as
PERF_RECORD_SWITCH except it also provides the next or
previous pid/tid.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Pawel Moll <pawel.moll@arm.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1437471846-26995-2-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Wed, 22 Jul 2015 20:02:18 +0000 (17:02 -0300)]
perf tools: Stop copying kallsyms into the perf.data file header
Since we now ask libtraceevent, the only user of this payload, to use
perf's symbol resolution routines, there is no need to carry about
~4.5MB per perf.data when we can get it from one of the places the perf
symbol resolution looks for that symtab (debuginfo, ~/.debug/,
/proc/kallsyms, --symfs, etc), using the kernel and modules build-ids to
make sure the right table is used.
Acked-by: David Ahern <dsahern@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/n/tip-h89ituf9rso2rv1v7kjrbeda@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Wed, 22 Jul 2015 19:48:16 +0000 (16:48 -0300)]
perf tools: Stop reading the kallsyms data from perf.data
As it is not used anymore, since 'perf script' switched to asking
libtraceevent to use tools/perf's symbol resolution routines.
Acked-by: David Ahern <dsahern@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/n/tip-4ilhofz4b7o8yokvutjt9yzz@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Wed, 22 Jul 2015 19:43:37 +0000 (16:43 -0300)]
perf script: Switch from perf.data's kallsyms to perf's symbol resolver
We were storing a copy of kallsyms inside perf.data file so that we
could resolve kernel addresses to function (start, name, mod) tuples,
but that can be achieved using the symbol resolving routines we have
in symbols.c, and that are used elsewhere in tools/perf.
So, do just like 'perf trace' did and ask libtraceevent to use perf's
symbol resolution routines.
The next step is to just skip whatever kallsyms data is embedded in
older perf.data files and finally to stop storing kallsyms in the perf
data file, as the 20-bytes build-id stored in perf.data's header is
enough to find out the right symtab (be it ELF, kcore, kallsyms, etc) to
use.
Acked-by: David Ahern <dsahern@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/n/tip-d0rtb8tk9j72pz0ehw5fnp24@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Wed, 22 Jul 2015 19:16:16 +0000 (16:16 -0300)]
perf trace: Provide libtracevent with a kernel symbol resolver
So that beautifiers wanting to resolve kernel function addresses to
names can do its work, now, for instance, the 'timer' tracepoints
beautifiers works with 'perf trace', see the "function=tick..." part:
# perf trace --event timer:hrtimer_start
<SNIP>
0.000 timer:hrtimer_start:hrtimer=0xffff88026f3101c0 function=tick_sched_timer/0x0 expires=
52098339000000 softexpires=
52098339000000)
0.003 timer:hrtimer_start:hrtimer=0xffff88026f3101c0 function=tick_sched_timer/0x0 expires=
52098339000000 softexpires=
52098339000000)
<SNIP>
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: David Ahern <dsahern@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/n/tip-n4i0hxpbl1tnleiqkok47fw2@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Wed, 22 Jul 2015 19:14:29 +0000 (16:14 -0300)]
perf symbols: Provide libtraceevent callback to resolve kernel symbols
That provides the function signature expected by libtraceevent's
pevent_set_function_resolver().
Acked-by: David Ahern <dsahern@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/n/tip-ie6hvlb6u15y4ulg9j1612zg@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Wed, 22 Jul 2015 15:36:55 +0000 (12:36 -0300)]
tools lib traceevent: Allow setting an alternative symbol resolver
The perf tools have a symbol resolver that includes solving kernel
symbols using either kallsyms or ELF symtabs, and it also is using
libtraceevent to format the trace events fields, including via
subsystem specific plugins, like the "timer" one.
To solve fields like "timer:hrtimer_start"'s "function", libtraceevent
needs a way to map from its value to a function name and addr.
This patch provides a way for tools that already have symbol resolving
facilities to ask libtraceevent to use it when needing to resolve
kernel symbols.
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: David Ahern <dsahern@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-fdx1fazols17w5py26ia3bwh@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Thu, 23 Jul 2015 14:06:16 +0000 (11:06 -0300)]
perf symbols: Introduce map__is_(kernel,kmodule)()
To, with members we already have, check if a kernel level map is for the
kernel proper or for a module.
Acked-by: David Ahern <dsahern@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-m5ic7h0z2crmtj7vi1a1rj3b@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Wed, 22 Jul 2015 15:52:17 +0000 (12:52 -0300)]
perf symbols: Add front end cache for DSO symbol lookup
Acked-by: David Ahern <dsahern@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-voo94tow8wpkcc76mlkny6sc@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Tue, 21 Jul 2015 12:31:31 +0000 (14:31 +0200)]
perf header: Use argv style storage for cmdline feature data
We will reuse argv style data in following change to display counters
header showing monitored command line.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1437481927-29538-12-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Tue, 21 Jul 2015 12:31:29 +0000 (14:31 +0200)]
perf evlist: Tolerate NULL maps in propagate_maps
Tolerating NULL maps in perf_evlist__propagate_maps, so we dont need to
pass evlist with both cpus and threads maps defined.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1437481927-29538-10-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Tue, 21 Jul 2015 12:31:28 +0000 (14:31 +0200)]
perf evlist: Use bool instead of target argument in propagate_maps()
We need only bool info wether user defined her own set of cpus.
Switching target argument to bool so it could be used from places
without target object defined in following patches.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1437481927-29538-9-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Tue, 21 Jul 2015 12:31:30 +0000 (14:31 +0200)]
perf evlist: Force perf_evlist__set_maps to propagate maps through events
Forcing perf_evlist__set_maps to propagate maps through events, so
cpu/thread maps get set within evlist.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1437481927-29538-11-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Tue, 21 Jul 2015 12:31:21 +0000 (14:31 +0200)]
perf test: Check for refcnt in thread_map test
Checking also for refcnt in thread_map test.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1437481927-29538-2-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ingo Molnar [Tue, 21 Jul 2015 05:58:06 +0000 (07:58 +0200)]
Merge tag 'perf-core-for-mingo' of git://git./linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
New features:
- Allow filtering out of perf's PID via 'perf record --exclude-perf'. (Wang Nan)
- 'perf trace' now supports syscall groups, like strace, i.e:
$ trace -e file touch file
Will expand 'file' into multiple, file related, syscalls. More work needed to
add extra groups for other syscall groups, and also to complement what was
added for the 'file' group, included as a proof of concept. (Arnaldo Carvalho de Melo)
- Add lock_pi stresser to 'perf bench futex', to test the kernel code
related to FUTEX_(UN)LOCK_PI. (Davidlohr Bueso)
User visible fixes:
- Apply --filter to all events in a glob matching, not just the last one. (Wang Nan)
Documentation changes:
- Document setting '-e pmu/period=N/' in the 'perf record' man page. (Kan Liang)
Infrastructure changes:
- 'perf probe' code simplifications and movements to separate files. (Masami Hiramatsu)
- Fix makefile generation under 'dash'. (Sergei Trofimovich)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Tue, 21 Jul 2015 05:57:44 +0000 (07:57 +0200)]
Merge branch 'linus' into perf/core, to refresh the branch
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Davidlohr Bueso [Tue, 7 Jul 2015 08:55:53 +0000 (01:55 -0700)]
perf bench futex: Add lock_pi stresser
Allows a way of measuring low level kernel implementation of FUTEX_LOCK_PI and
FUTEX_UNLOCK_PI.
The program comes in two flavors:
(i) single futex (default), all threads contend on the same uaddr. For the
sake of the benchmark, we call into kernel space even when the lock is
uncontended. The kernel will set it to TID, any waters that come in and
contend for the pi futex will be handled respectively by the kernel.
(ii) -M option for multiple futexes, each thread deals with its own futex. This
is a trivial scenario and only measures kernel handling of 0->TID transition.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Mel Gorman <mgorman@suse.de>
Link: http://lkml.kernel.org/r/1436259353.12255.78.camel@stgolabs.net
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Sergei Trofimovich [Sun, 19 Jul 2015 09:30:05 +0000 (10:30 +0100)]
perf tools: Fix makefile generation under dash
Under dash 'echo -n' yields '-n' to stdout. Use printf "" instead.
Signed-off-by: Sergei Trofimovich <siarheit@google.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1437298205-29305-1-git-send-email-siarheit@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Masami Hiramatsu [Wed, 15 Jul 2015 09:14:28 +0000 (18:14 +0900)]
perf buildid: Use SBUILD_ID_SIZE macro
Introduce SBUILD_ID_SIZE macro and use it instead of using BUILD_ID_SIZE
* 2 + 1.
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Hemant Kumar <hemant@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20150715091428.8915.75265.stgit@localhost.localdomain
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Masami Hiramatsu [Wed, 15 Jul 2015 09:14:07 +0000 (18:14 +0900)]
perf probe: Move ftrace probe-event operations to probe-file.c
Move ftrace probe-event operations to probe-file.c from probe-event.c.
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Hemant Kumar <hemant@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20150715091407.8915.14316.stgit@localhost.localdomain
[ Fixed up strlist__new() calls wrt
4a77e2183fc0 ("perf strlist: Make dupstr be the...") ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Masami Hiramatsu [Wed, 15 Jul 2015 09:14:00 +0000 (18:14 +0900)]
perf probe: Simplify __add_probe_trace_events code
Simplify the __add_probe_trace_events() code by taking out the
probe_trace_event__set_name() and updating show_perf_probe_event()
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Hemant Kumar <hemant@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20150715091400.8915.85501.stgit@localhost.localdomain
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Wang Nan [Fri, 10 Jul 2015 07:36:10 +0000 (07:36 +0000)]
perf record: Allow filtering perf's pid via --exclude-perf
This patch allows 'perf record' to exclude events issued by perf itself
by '--exclude-perf' option.
Before this patch, when doing something like:
# perf record -a -e syscalls:sys_enter_write <cmd>
One could easily get result like this:
# /tmp/perf report --stdio
...
# Overhead Command Shared Object Symbol
# ........ ....... .................. ....................
#
99.99% perf libpthread-2.18.so [.] __write_nocancel
0.01% ls libc-2.18.so [.] write
0.01% sshd libc-2.18.so [.] write
...
Where most events are generated by perf itself.
A shell trick can be done to filter perf itself out:
# cat << EOF > ./tmp
> #!/bin/sh
> exec perf record -e ... --filter="common_pid != \$\$" -a sleep 10
> EOF
# chmod a+x ./tmp
# ./tmp
However, doing so is user unfriendly.
This patch extracts evsel iteration framework introduced by patch 'perf
record: Apply filter to all events in a glob matching' into
foreach_evsel_in_last_glob(), and makes exclude_perf() function append
new filter expression to each evsel selected by a '-e' selector.
To avoid losing filters if user pass '--filter' after '--exclude-perf',
this patch uses perf_evsel__append_filter() in both case, instead of
perf_evsel__set_filter() which removes old filter. As a side effect, now
it is possible to use multiple '--filter' option for one selector. They
are combinded with '&&'.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1436513770-8896-2-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Linus Torvalds [Mon, 20 Jul 2015 19:18:40 +0000 (12:18 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/s390/linux
Pull s390 fix from Martin Schwidefsky:
"Fast path fix for the thread_struct breakage"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: adapt entry.S to the move of thread_struct
Linus Torvalds [Mon, 20 Jul 2015 19:17:55 +0000 (12:17 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/egtvedt/linux-avr32
Pull AVR32 update from Hans-Christian Egtvedt.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/egtvedt/linux-avr32:
AVR32/time: Migrate to new 'set-state' interface
Wang Nan [Fri, 10 Jul 2015 07:36:09 +0000 (07:36 +0000)]
perf record: Apply filter to all events in a glob matching
There is an old problem in perf's filter applying which first posted at
Sep. 2014 at https://lkml.org/lkml/2014/9/9/944 that, if passing
multiple events in a glob matching expression in cmdline then add
'--filter' after them, the filter will be applied on only the last one.
For example:
# dd if=/dev/zero of=/dev/null &
[1] 464
# perf record -a -e 'syscalls:sys_*_read' --filter 'common_pid != 464' sleep 0.1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.239 MB perf.data (2094 samples) ]
# perf report --stdio | tee
...
# Samples: 2K of event 'syscalls:sys_enter_read'
# Event count (approx.): 2092
...
# Samples: 2 of event 'syscalls:sys_exit_read'
# Event count (approx.): 2
...
In this example, filter only applied on 'syscalls:sys_exit_read', and
there's no way to set filter for ''syscalls:sys_enter_read'.
This patch adds a 'cmdline_group_boundary' for 'struct evsel', and
apply filter on all events between two boundary marks.
After applying this patch:
# perf record -a -e 'syscalls:sys_*_read' --filter 'common_pid != 464' sleep 0.1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.031 MB perf.data (3 samples) ]
# perf report --stdio | tee
...
# Samples: 1 of event 'syscalls:sys_enter_read'
# Event count (approx.): 1
...
# Samples: 2 of event 'syscalls:sys_exit_read'
# Event count (approx.): 2
...
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Reported-by: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1436513770-8896-1-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Mon, 20 Jul 2015 15:02:09 +0000 (12:02 -0300)]
perf trace: Support 'strace' syscall event groups
I.e.:
$ cat ~/share/perf-core/strace/groups/file
access
chmod
creat
execve
faccessat
getcwd
lstat
mkdir
open
openat
quotactl
readlink
rename
rmdir
stat
statfs
symlink
unlink
$
Then, on a quiet desktop, try running this and then moving your mouse to
see the deluge of mouse related activity:
# perf probe 'vfs_getname=getname_flags:72 pathname=filename:string'
Added new event:
probe:vfs_getname (on getname_flags:72 with pathname=filename:string)
You can now use it in all perf tools, such as:
perf record -e probe:vfs_getname -aR sleep 1
#
# trace --ev probe:vfs_getname --filter-pids 2232 -e file
0.042 (0.042 ms): mousetweaks/2235 open(filename: 0x14e3910, mode: 438 ) ...
0.042 ( ): probe:vfs_getname:(
ffffffff812230bc) pathname="/home/acme/.icons/Adwaita/cursors/xterm")
0.100 (0.100 ms): mousetweaks/2235 ... [continued]: open()) = -1 ENOENT No such file or directory
0.142 (0.018 ms): mousetweaks/2235 open(filename: 0x14c3c10, mode: 438 ) ...
0.142 ( ): probe:vfs_getname:(
ffffffff812230bc) pathname="/home/acme/.icons/Adwaita/index.theme")
0.192 (0.069 ms): mousetweaks/2235 ... [continued]: open()) = -1 ENOENT No such file or directory
0.230 (0.017 ms): mousetweaks/2235 open(filename: 0x14c3c10, mode: 438 ) ...
0.230 ( ): probe:vfs_getname:(
ffffffff812230bc) pathname="/usr/share/icons/Adwaita/cursors/xterm")
0.253 (0.041 ms): mousetweaks/2235 ... [continued]: open()) = 14
0.459 (0.008 ms): mousetweaks/2235 open(filename: 0x14e3910, mode: 438 ) ...
0.459 ( ): probe:vfs_getname:(
ffffffff812230bc) pathname="/home/acme/.icons/Adwaita/cursors/left_side")
0.468 (0.017 ms): mousetweaks/2235 ... [continued]: open()) = -1 ENOENT No such file or directory
Need to combine that raw_syscalls:sys_enter(open) + probe:vfs_getname +
raw_syscalls:sys_exit(open) sequence...
Now, if you're bored, please write some more syscall groups, like the ones
in 'strace' and send it our way :-)
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Milian Wolff <mail@milianw.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-a42xklu59lcbxp7bbnic74a8@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Fri, 17 Jul 2015 18:10:33 +0000 (15:10 -0300)]
perf strlist: Make parse_list() private
It is not used anywhere, expose it when/if needed.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-f6in51stj17avhk4rv11gjgg@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Fri, 17 Jul 2015 15:07:25 +0000 (12:07 -0300)]
perf strlist: Allow substitutions from file contents in a given directory
So, if we have an strlist equal to:
"file,close"
And we call it as:
struct strlist_config *config = { .dirname = "~/strace/groups", };
struct strlist *slist = strlist__new("file, close", &config);
And we have:
$ cat ~/strace/groups/file
access
open
openat
statfs
Then the resulting strlist will have these contents:
[ "access", "open", "openat", "statfs", "close" ]
This will be used to implement strace syscall groups in 'perf trace',
but can be used in some other tool, thus being implemented in 'strlist'.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-wi6l6qtomqlywwr6005jvs05@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Mon, 20 Jul 2015 15:13:34 +0000 (12:13 -0300)]
perf strlist: Make dupstr be the default and part of an extensible config parm
So that we can pass more info to strlist__new() without having to change
its function signature, just adding entries to the strlist_config struct
with sensible defaults for when those fields are not specified.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-5uaaler4931i0s9sedxjquhq@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Martin Schwidefsky [Mon, 20 Jul 2015 08:01:46 +0000 (10:01 +0200)]
s390: adapt entry.S to the move of thread_struct
git commit
0c8c0f03e3a292e031596484275c14cf39c0ab7a
"x86/fpu, sched: Dynamically allocate 'struct fpu'"
moved the thread_struct to the end of the task_struct.
This causes some of the offsets used in entry.S to overflow their
instruction operand field. To fix this use aghi to create a
dedicated pointer for the thread_struct.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Viresh Kumar [Thu, 16 Jul 2015 11:26:15 +0000 (16:56 +0530)]
AVR32/time: Migrate to new 'set-state' interface
Migrate avr32 driver to the new 'set-state' interface provided by
clockevents core, the earlier 'set-mode' interface is marked obsolete
now.
This also enables us to implement callbacks for new states of clockevent
devices, for example: ONESHOT_STOPPED.
We want to call cpu_idle_poll_ctrl() in shutdown only if we were in
oneshot or resume state earlier. Create another variable to save this
information and check that in shutdown callback.
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Linus Torvalds [Sun, 19 Jul 2015 21:45:02 +0000 (14:45 -0700)]
Linux 4.2-rc3
Linus Torvalds [Sun, 19 Jul 2015 21:18:00 +0000 (14:18 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two fairly simple fixes: one is a change that causes us to have a very
low queue depth leading to performance issues and the other is a null
deref occasionally in tapes thanks to use after put"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: fix host max depth checking for the 'queue_depth' sysfs interface
st: null pointer dereference panic caused by use after kref_put by st_open
Linus Torvalds [Sun, 19 Jul 2015 21:12:22 +0000 (14:12 -0700)]
Merge branch 'upstream' of git://git.linux-mips.org/ralf/upstream-linus
Pull MIPS fixes from Ralf Baechle:
"Another round of MIPS fixes for 4.2.
Things are looking quite decent at this stage but the recent work on
the FPU support took its toll:
- fix an incorrect overly restrictive ifdef
- select O32 64-bit FP support for O32 binary compatibility
- remove workarounds for Sibyte SB1250 Pass1 parts. There are rare
fixing the workarounds is not worth the effort.
- patch up an outdated and now incorrect comment"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
MIPS: fpu.h: Allow 64-bit FPU on a 64-bit MIPS R6 CPU
MIPS: SB1: Remove support for Pass 1 parts.
MIPS: Require O32 FP64 support for MIPS64 with O32 compat
MIPS: asm-offset.c: Patch up various comments refering to the old filename.
Linus Torvalds [Sun, 19 Jul 2015 20:46:24 +0000 (13:46 -0700)]
Merge branch 'parisc-4.2-2' of git://git./linux/kernel/git/deller/parisc-linux
Pull parisc fix from Helge Deller:
"A memory leak fix from Christophe Jaillet which was introduced with
kernel 4.0 and which leads to kernel crashes on parisc after 1-3 days"
* 'parisc-4.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: mm: Fix a memory leak related to pmd not attached to the pgd
Linus Torvalds [Sun, 19 Jul 2015 20:37:44 +0000 (13:37 -0700)]
Merge tag 'armsoc-fixes' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
"By far most of the fixes here are updates to DTS files to deal with
some mostly minor bugs.
There's also a fix to deal with non-PM kernel configs on i.MX, a
regression fix for ethernet on PXA platforms and a dependency fix for
OMAP"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: keystone: dts: rename pcie nodes to help override status
ARM: keystone: dts: fix dt bindings for PCIe
ARM: pxa: fix dm9000 platform data regression
ARM: dts: Correct audio input route & set mic bias for am335x-pepper
ARM: OMAP2+: Add HAVE_ARM_SCU for AM43XX
MAINTAINERS: digicolor: add dts files
ARM: ux500: fix MMC/SD card regression
ARM: ux500: define serial port aliases
ARM: dts: OMAP5: Add #iommu-cells property to IOMMUs
ARM: dts: OMAP4: Add #iommu-cells property to IOMMUs
ARM: dts: Fix frequency scaling on Gumstix Pepper
ARM: dts: configure regulators for Gumstix Pepper
ARM: dts: omap3: overo: Update LCD panel names
ARM: dts: cros-ec-keyboard: Add support for some Japanese keys
ARM: imx6: gpc: always enable PU domain if CONFIG_PM is not set
ARM: dts: imx53-qsb: fix TVE entry
ARM: dts: mx23: fix iio-hwmon support
ARM: dts: imx27: Adjust the GPT compatible string
ARM: socfpga: dts: Fix entries order
ARM: socfpga: dts: Fix adxl34x formating and compatible string
Markos Chandras [Thu, 16 Jul 2015 14:30:04 +0000 (15:30 +0100)]
MIPS: fpu.h: Allow 64-bit FPU on a 64-bit MIPS R6 CPU
Commit
6134d94923d0 ("MIPS: asm: fpu: Allow 64-bit FPU on MIPS32 R6")
added support for 64-bit FPU on a 32-bit MIPS R6 processor but it missed
the 64-bit CPU case leading to FPU failures when requesting FR=1 mode
(which is always the case for MIPS R6 userland) when running a 32-bit
kernel on a 64-bit CPU. We also fix the MIPS R2 case.
Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Fixes:
6134d94923d0 ("MIPS: asm: fpu: Allow 64-bit FPU on MIPS32 R6")
Reviewed-by: Paul Burton <paul.burton@imgtec.com>
Cc: <stable@vger.kernel.org> # 4.0+
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/10734/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Christophe Jaillet [Mon, 13 Jul 2015 09:32:43 +0000 (11:32 +0200)]
parisc: mm: Fix a memory leak related to pmd not attached to the pgd
Commit
0e0da48dee8d ("parisc: mm: don't count preallocated pmds")
introduced a memory leak.
After this commit, the 'return' statement in pmd_free is executed in all
cases. Even for pmd that are not attached to the pgd. So 'free_pages'
can never be called anymore, leading to a memory leak.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # v4.0+
Signed-off-by: Helge Deller <deller@gmx.de>
Olof Johansson [Sun, 19 Jul 2015 04:06:10 +0000 (21:06 -0700)]
Merge tag 'pxa-fixes-v4.2-rc2' of https://github.com/rjarzmik/linux into fixesD
Merge "pxa fixes for v4.2" from Robert Jarzmik:
ARM: pxa: fixes for v4.2-rc2
This single fix reenables ethernet cards for several pxa boards,
broken by regulator addition to dm9000 driver.
* tag 'pxa-fixes-v4.2-rc2' of https://github.com/rjarzmik/linux:
ARM: pxa: fix dm9000 platform data regression
Linus Torvalds [Sat, 18 Jul 2015 18:03:48 +0000 (11:03 -0700)]
Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
"A small set of ARM fixes for -rc3, most of them not far off
one-liners, with the exception of fixing the V7 cache invalidation for
incoming SMP processors which was causing problems for SoCFPGA
devices"
* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
ARM: fix __virt_to_idmap build error on !MMU
ARM: invalidate L1 before enabling coherency
ARM: 8404/1: dma-mapping: fix off-by-one error in bitmap size check
ARM: 8402/1: perf: Don't use of_node after putting it
ARM: 8400/1: use virt_to_idmap to get phys_reset address
Linus Torvalds [Sat, 18 Jul 2015 17:49:57 +0000 (10:49 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Two families of fixes:
- Fix an FPU context related boot crash on newer x86 hardware with
larger context sizes than what most people test. To fix this
without ugly kludges or extensive reverts we had to touch core task
allocator, to allow x86 to determine the task size dynamically, at
boot time.
I've tested it on a number of x86 platforms, and I cross-built it
to a handful of architectures:
(warns) (warns)
testing x86-64: -git: pass ( 0), -tip: pass ( 0)
testing x86-32: -git: pass ( 0), -tip: pass ( 0)
testing arm: -git: pass ( 1359), -tip: pass ( 1359)
testing cris: -git: pass ( 1031), -tip: pass ( 1031)
testing m32r: -git: pass ( 1135), -tip: pass ( 1135)
testing m68k: -git: pass ( 1471), -tip: pass ( 1471)
testing mips: -git: pass ( 1162), -tip: pass ( 1162)
testing mn10300: -git: pass ( 1058), -tip: pass ( 1058)
testing parisc: -git: pass ( 1846), -tip: pass ( 1846)
testing sparc: -git: pass ( 1185), -tip: pass ( 1185)
... so I hope the cross-arch impact 'none', as intended.
(by Dave Hansen)
- Fix various NMI handling related bugs unearthed by the big asm code
rewrite and generally make the NMI code more robust and more
maintainable while at it. These changes are a bit late in the
cycle, I hope they are still acceptable.
(by Andy Lutomirski)"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fpu, sched: Introduce CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT and use it on x86
x86/fpu, sched: Dynamically allocate 'struct fpu'
x86/entry/64, x86/nmi/64: Add CONFIG_DEBUG_ENTRY NMI testing code
x86/nmi/64: Make the "NMI executing" variable more consistent
x86/nmi/64: Minor asm simplification
x86/nmi/64: Use DF to avoid userspace RSP confusing nested NMI detection
x86/nmi/64: Reorder nested NMI checks
x86/nmi/64: Improve nested NMI comments
x86/nmi/64: Switch stacks on userspace NMI entry
x86/nmi/64: Remove asm code that saves CR2
x86/nmi: Enable nested do_nmi() handling for 64-bit kernels
Linus Torvalds [Sat, 18 Jul 2015 17:49:11 +0000 (10:49 -0700)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull timer fix from Ingo Molnar:
"Fix for a misplaced export that can cause build failures in certain
(rare) Kconfig situations"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick: Move the export of tick_broadcast_oneshot_control to the proper place
Linus Torvalds [Sat, 18 Jul 2015 17:47:44 +0000 (10:47 -0700)]
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull scheduler fix from Ingo Molnar:
"A oneliner rq throttling fix"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Test list head instead of list entry in throttle_cfs_rq()
Linus Torvalds [Sat, 18 Jul 2015 17:44:21 +0000 (10:44 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Mostly tooling fixes, plus a static key fix fixing /sys/devices/cpu/rdpmc"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf tools: Really allow to specify custom CC, AR or LD
perf auxtrace: Fix misplaced check for HAVE_SYNC_COMPARE_AND_SWAP_SUPPORT
perf hists browser: Take the --comm, --dsos, etc filters into account
perf symbols: Store if there is a filter in place
x86, perf: Fix static_key bug in load_mm_cr4()
tools: Copy lib/hweight.c from the kernel sources
perf tools: Fix the detached tarball wrt rbtree copy
perf thread_map: Fix the sizeof() calculation for map entries
tools lib: Improve clean target
perf stat: Fix shadow declaration of close
perf tools: Fix lockup using 32-bit compat vdso
Linus Torvalds [Sat, 18 Jul 2015 17:27:12 +0000 (10:27 -0700)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull irq fixes from Ingo Molnar:
"Misc irq fixes:
- two driver fixes
- a Xen regression fix
- a nested irq thread crash fix"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gicv3-its: Fix mapping of LPIs to collections
genirq: Prevent resend to interrupts marked IRQ_NESTED_THREAD
genirq: Revert sparse irq locking around __cpu_up() and move it to x86 for now
gpio/davinci: Fix race in installing chained irq handler
Linus Torvalds [Sat, 18 Jul 2015 17:01:04 +0000 (10:01 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge fixes from Andrew Morton:
"25 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (25 commits)
lib/decompress: set the compressor name to NULL on error
mm/cma_debug: correct size input to bitmap function
mm/cma_debug: fix debugging alloc/free interface
mm/page_owner: set correct gfp_mask on page_owner
mm/page_owner: fix possible access violation
fsnotify: fix oops in fsnotify_clear_marks_by_group_flags()
/proc/$PID/cmdline: fixup empty ARGV case
dma-debug: skip debug_dma_assert_idle() when disabled
hexdump: fix for non-aligned buffers
checkpatch: fix long line messages about patch context
mm: clean up per architecture MM hook header files
MAINTAINERS: uclinux-h8-devel is moderated for non-subscribers
mailmap: update Sudeep Holla's email id
Update Viresh Kumar's email address
mm, meminit: suppress unused memory variable warning
configfs: fix kernel infoleak through user-controlled format string
include, lib: add __printf attributes to several function prototypes
s390/hugetlb: add hugepages_supported define
mm: hugetlb: allow hugepages_supported to be architecture specific
revert "s390/mm: make hugepages_supported a boot time decision"
...
Linus Torvalds [Sat, 18 Jul 2015 04:46:57 +0000 (21:46 -0700)]
Merge branch 'for-linus-4.2' of git://git./linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
"These are all from Filipe, and cover a few problems we've had reported
on the list recently (along with ones he found on his own)"
* 'for-linus-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix file corruption after cloning inline extents
Btrfs: fix order by which delayed references are run
Btrfs: fix list transaction->pending_ordered corruption
Btrfs: fix memory leak in the extent_same ioctl
Btrfs: fix shrinking truncate when the no_holes feature is enabled
Linus Torvalds [Sat, 18 Jul 2015 04:24:31 +0000 (21:24 -0700)]
Merge tag 'rtc-v4.2-2' of git://git./linux/kernel/git/abelloni/linux
Pull rtc fixes from Alexandre Belloni:
"A few fixes for the RTC susbsystem for 4.2.
The mt6397 driver was introduce in 4.2 so it is worth fixing before
the final release. I though the compilation warning for armada38x was
fixed by akpm in commit
f98b733e93e0 ("rtc-armada38x.c: remove unused
local `flags'") but he actually missed some occurrences of the
variables. Since I received 4 patches for that, I think we can
include it now.
Summary:
- fix mt6397 wakealarm creation
- remove a compilation warning for armada38x that was forgotten"
* tag 'rtc-v4.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
rtc: armada38x: Remove unused variable from armada38x_rtc_set_time()
rtc: mt6397: enable wakeup before registering rtc device
Linus Torvalds [Sat, 18 Jul 2015 03:53:57 +0000 (20:53 -0700)]
Merge tag 'dm-4.2-fixes-2' of git://git./linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- revert a request-based DM core change that caused IO latency to
increase and adversely impact both throughput and system load
- fix for a use after free bug in DM core's device cleanup
- a couple DM btree removal fixes (used by dm-thinp)
- a DM thinp fix for order-5 allocation failure
- a DM thinp fix to not degrade to read-only metadata mode when in
out-of-data-space mode for longer than the 'no_space_timeout'
- fix a long-standing oversight in both dm-thinp and dm-cache by now
exporting 'needs_check' in status if it was set in metadata
- fix an embarrassing dm-cache busy-loop that caused worker threads to
eat cpu even if no IO was actively being issued to the cache device
* tag 'dm-4.2-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm cache: avoid calls to prealloc_free_structs() if possible
dm cache: avoid preallocation if no work in writeback_some_dirty_blocks()
dm cache: do not wake_worker() in free_migration()
dm cache: display 'needs_check' in status if it is set
dm thin: display 'needs_check' in status if it is set
dm thin: stay in out-of-data-space mode once no_space_timeout expires
dm: fix use after free crash due to incorrect cleanup sequence
Revert "dm: only run the queue on completion if congested or no requests pending"
dm btree: silence lockdep lock inversion in dm_btree_del()
dm thin: allocate the cell_sort_array dynamically
dm btree remove: fix bug in redistribute3
Ingo Molnar [Fri, 17 Jul 2015 10:28:12 +0000 (12:28 +0200)]
x86/fpu, sched: Introduce CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT and use it on x86
Don't burden architectures without dynamic task_struct sizing
with the overhead of dynamic sizing.
Also optimize the x86 code a bit by caching task_struct_size.
Acked-and-Tested-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1437128892-9831-3-git-send-email-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Dave Hansen [Fri, 17 Jul 2015 10:28:11 +0000 (12:28 +0200)]
x86/fpu, sched: Dynamically allocate 'struct fpu'
The FPU rewrite removed the dynamic allocations of 'struct fpu'.
But, this potentially wastes massive amounts of memory (2k per
task on systems that do not have AVX-512 for instance).
Instead of having a separate slab, this patch just appends the
space that we need to the 'task_struct' which we dynamically
allocate already. This saves from doing an extra slab
allocation at fork().
The only real downside here is that we have to stick everything
and the end of the task_struct. But, I think the
BUILD_BUG_ON()s I stuck in there should keep that from being too
fragile.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1437128892-9831-2-git-send-email-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Aneesh Kumar K.V [Fri, 17 Jul 2015 23:24:26 +0000 (16:24 -0700)]
lib/decompress: set the compressor name to NULL on error
Without this we end up using the previous name of the compressor in the
loop in unpack_rootfs. For example we get errors like "compression
method gzip not configured" even when we have CONFIG_DECOMPRESS_GZIP
enabled.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joonsoo Kim [Fri, 17 Jul 2015 23:24:23 +0000 (16:24 -0700)]
mm/cma_debug: correct size input to bitmap function
In CMA, 1 bit in bitmap means 1 << order_per_bits pages so size of
bitmap is cma->count >> order_per_bits rather than just cma->count.
This patch fixes it.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Stefan Strogin <stefan.strogin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joonsoo Kim [Fri, 17 Jul 2015 23:24:20 +0000 (16:24 -0700)]
mm/cma_debug: fix debugging alloc/free interface
CMA has alloc/free interface for debugging. It is intended that
alloc/free occurs in specific CMA region, but, currently, alloc/free
interface is on root dir due to the bug so we can't select CMA region
where alloc/free happens.
This patch fixes this problem by making alloc/free interface per CMA
region.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Stefan Strogin <stefan.strogin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joonsoo Kim [Fri, 17 Jul 2015 23:24:18 +0000 (16:24 -0700)]
mm/page_owner: set correct gfp_mask on page_owner
Currently, we set wrong gfp_mask to page_owner info in case of isolated
freepage by compaction and split page. It causes incorrect mixed
pageblock report that we can get from '/proc/pagetypeinfo'. This metric
is really useful to measure fragmentation effect so should be accurate.
This patch fixes it by setting correct information.
Without this patch, after kernel build workload is finished, number of
mixed pageblock is 112 among roughly 210 movable pageblocks.
But, with this fix, output shows that mixed pageblock is just 57.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joonsoo Kim [Fri, 17 Jul 2015 23:24:15 +0000 (16:24 -0700)]
mm/page_owner: fix possible access violation
When I tested my new patches, I found that page pointer which is used
for setting page_owner information is changed. This is because page
pointer is used to set new migratetype in loop. After this work, page
pointer could be out of bound. If this wrong pointer is used for
page_owner, access violation happens. Below is error message that I
got.
BUG: unable to handle kernel paging request at
0000000000b00018
IP: [<
ffffffff81025f30>] save_stack_address+0x30/0x40
PGD
1af2d067 PUD
166e0067 PMD 0
Oops: 0002 [#1] SMP
...snip...
Call Trace:
print_context_stack+0xcf/0x100
dump_trace+0x15f/0x320
save_stack_trace+0x2f/0x50
__set_page_owner+0x46/0x70
__isolate_free_page+0x1f7/0x210
split_free_page+0x21/0xb0
isolate_freepages_block+0x1e2/0x410
compaction_alloc+0x22d/0x2d0
migrate_pages+0x289/0x8b0
compact_zone+0x409/0x880
compact_zone_order+0x6d/0x90
try_to_compact_pages+0x110/0x210
__alloc_pages_direct_compact+0x3d/0xe6
__alloc_pages_nodemask+0x6cd/0x9a0
alloc_pages_current+0x91/0x100
runtest_store+0x296/0xa50
simple_attr_write+0xbd/0xe0
__vfs_write+0x28/0xf0
vfs_write+0xa9/0x1b0
SyS_write+0x46/0xb0
system_call_fastpath+0x16/0x75
This patch fixes this error by moving up set_page_owner().
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Kara [Fri, 17 Jul 2015 23:24:12 +0000 (16:24 -0700)]
fsnotify: fix oops in fsnotify_clear_marks_by_group_flags()
fsnotify_clear_marks_by_group_flags() can race with
fsnotify_destroy_marks() so when fsnotify_destroy_mark_locked() drops
mark_mutex, a mark from the list iterated by
fsnotify_clear_marks_by_group_flags() can be freed and we dereference free
memory in the loop there.
Fix the problem by keeping mark_mutex held in
fsnotify_destroy_mark_locked(). The reason why we drop that mutex is that
we need to call a ->freeing_mark() callback which may acquire mark_mutex
again. To avoid this and similar lock inversion issues, we move the call
to ->freeing_mark() callback to the kthread destroying the mark.
Signed-off-by: Jan Kara <jack@suse.cz>
Reported-by: Ashish Sangwan <a.sangwan@samsung.com>
Suggested-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Fri, 17 Jul 2015 23:24:09 +0000 (16:24 -0700)]
/proc/$PID/cmdline: fixup empty ARGV case
/proc/*/cmdline code checks if it should look at ENVP area by checking
last byte of ARGV area:
rv = access_remote_vm(mm, arg_end - 1, &c, 1, 0);
if (rv <= 0)
goto out_free_page;
If ARGV is somehow made empty (by doing execve(..., NULL, ...) or
manually setting ->arg_start and ->arg_end to equal values), the decision
will be based on byte which doesn't even belong to ARGV/ENVP.
So, quickly check if ARGV area is empty and report 0 to match previous
behaviour.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Haggai Eran [Fri, 17 Jul 2015 23:24:06 +0000 (16:24 -0700)]
dma-debug: skip debug_dma_assert_idle() when disabled
If dma-debug is disabled due to a memory error, DMA unmaps do not affect
the dma_active_cacheline radix tree anymore, and debug_dma_assert_idle()
can print false warnings.
Disable debug_dma_assert_idle() when dma_debug_disabled() is true.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Fixes:
0abdd7a81b7e ("dma-debug: introduce debug_dma_assert_idle()")
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Vinod Koul <vinod.koul@intel.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Horia Geanta <horia.geanta@freescale.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Horacio Mijail Anton Quiles [Fri, 17 Jul 2015 23:24:04 +0000 (16:24 -0700)]
hexdump: fix for non-aligned buffers
A hexdump with a buf not aligned to the groupsize causes
non-naturally-aligned memory accesses. This was causing a kernel panic
on the processor BlackFin BF527, when such an unaligned buffer was fed
by the function ubifs_scanned_corruption in fs/ubifs/scan.c .
To fix this, change accesses to the contents of the buffer so they go
through get_unaligned(). This change should be harmless to unaligned-
access-capable architectures, and any performance hit should be anyway
dwarfed by the snprintf() processing time.
Signed-off-by: Horacio Mijail Antón Quiles <hmijail@gmail.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Joe Perches <joe@perches.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Fri, 17 Jul 2015 23:24:01 +0000 (16:24 -0700)]
checkpatch: fix long line messages about patch context
Changes in ("checkpatch: categorize some long line length checks")
now erroneously reports long line defects in patch context.
Fix it.
Signed-off-by: Joe Perches <joe@perches.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Laurent Dufour [Fri, 17 Jul 2015 23:23:58 +0000 (16:23 -0700)]
mm: clean up per architecture MM hook header files
Commit
2ae416b142b6 ("mm: new mm hook framework") introduced an empty
header file (mm-arch-hooks.h) for every architecture, even those which
doesn't need to define mm hooks.
As suggested by Geert Uytterhoeven, this could be cleaned through the use
of a generic header file included via each per architecture
asm/include/Kbuild file.
The PowerPC architecture is not impacted here since this architecture has
to defined the arch_remap MM hook.
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Geert Uytterhoeven [Fri, 17 Jul 2015 23:23:55 +0000 (16:23 -0700)]
MAINTAINERS: uclinux-h8-devel is moderated for non-subscribers
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sudeep Holla [Fri, 17 Jul 2015 23:23:53 +0000 (16:23 -0700)]
mailmap: update Sudeep Holla's email id
Since the get_maintainer script still reports my old email id based on
few old commits, update mailmap to report new/updated address. It also
helps to fix email address for 'git shortlog'
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Viresh Kumar [Fri, 17 Jul 2015 23:23:50 +0000 (16:23 -0700)]
Update Viresh Kumar's email address
Switch to my kernel.org alias instead of a badly named gmail address,
which I rarely use.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Fri, 17 Jul 2015 23:23:48 +0000 (16:23 -0700)]
mm, meminit: suppress unused memory variable warning
The kbuild test robot reported the following
tree: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
head:
14a6f1989dae9445d4532941bdd6bbad84f4c8da
commit:
3b242c66ccbd60cf47ab0e8992119d9617548c23 x86: mm: enable deferred struct page initialisation on x86-64
date: 3 days ago
config: x86_64-randconfig-x006-201527 (attached as .config)
reproduce:
git checkout
3b242c66ccbd60cf47ab0e8992119d9617548c23
# save the attached .config to linux build tree
make ARCH=x86_64
All warnings (new ones prefixed by >>):
mm/page_alloc.c: In function 'early_page_uninitialised':
>> mm/page_alloc.c:247:6: warning: unused variable 'nid' [-Wunused-variable]
int nid = early_pfn_to_nid(pfn);
It's due to the NODE_DATA macro ignoring the nid parameter on !NUMA
configurations. This patch avoids the warning by not declaring nid.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nicolas Iooss [Fri, 17 Jul 2015 23:23:45 +0000 (16:23 -0700)]
configfs: fix kernel infoleak through user-controlled format string
Some modules call config_item_init_type_name() and config_group_init_type_name()
with parameter "name" directly controlled by userspace. These two
functions call config_item_set_name() with this name used as a format
string, which can be used to leak information such as content of the
stack to userspace.
For example, make_netconsole_target() in netconsole module calls
config_item_init_type_name() with the name of a newly-created directory.
This means that the following commands give some unexpected output, with
configfs mounted in /sys/kernel/config/ and on a system with a
configured eth0 ethernet interface:
# modprobe netconsole
# mkdir /sys/kernel/config/netconsole/target_%lx
# echo eth0 > /sys/kernel/config/netconsole/target_%lx/dev_name
# echo 1 > /sys/kernel/config/netconsole/target_%lx/enabled
# echo eth0 > /sys/kernel/config/netconsole/target_%lx/dev_name
# dmesg |tail -n1
[ 142.697668] netconsole: target (target_ffffffffc0ae8080) is
enabled, disable to update parameters
The directory name is correct but %lx has been interpreted in the
internal item name, displayed here in the error message used by
store_dev_name() in drivers/net/netconsole.c.
To fix this, update every caller of config_item_set_name to use "%s"
when operating on untrusted input.
This issue was found using -Wformat-security gcc flag, once a __printf
attribute has been added to config_item_set_name().
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Felipe Balbi <balbi@ti.com>
Acked-by: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nicolas Iooss [Fri, 17 Jul 2015 23:23:42 +0000 (16:23 -0700)]
include, lib: add __printf attributes to several function prototypes
Using __printf attributes helps to detect several format string issues
at compile time (even though -Wformat-security is currently disabled in
Makefile). For example it can detect when formatting a pointer as a
number, like the issue fixed in commit
a3fa71c40f18 ("wl18xx: show
rx_frames_per_rates as an array as it really is"), or when the arguments
do not match the format string, c.f. for example commit
5ce1aca81435
("reiserfs: fix __RASSERT format string").
To prevent similar bugs in the future, add a __printf attribute to every
function prototype which needs one in include/linux/ and lib/. These
functions were mostly found by using gcc's -Wsuggest-attribute=format
flag.
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dominik Dingel [Fri, 17 Jul 2015 23:23:39 +0000 (16:23 -0700)]
s390/hugetlb: add hugepages_supported define
On s390 we only can enable hugepages if the underlying hardware/hypervisor
also does support this. Common code now would assume this to be
signaled by setting HPAGE_SHIFT to 0. But on s390, where we only
support one hugepage size, there is a link between HPAGE_SHIFT and
pageblock_order.
So instead of setting HPAGE_SHIFT to 0, we will implement the check for
the hardware capability.
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dominik Dingel [Fri, 17 Jul 2015 23:23:37 +0000 (16:23 -0700)]
mm: hugetlb: allow hugepages_supported to be architecture specific
s390 has a constant hugepage size, by setting HPAGE_SHIFT we also change
e.g. the pageblock_order, which should be independent in respect to
hugepage support.
With this patch every architecture is free to define how to check
for hugepage support.
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dominik Dingel [Fri, 17 Jul 2015 23:23:34 +0000 (16:23 -0700)]
revert "s390/mm: make hugepages_supported a boot time decision"
Heiko noticed that the current check for hugepage support on s390 is a
little bit too harsh as systems which do not support will crash.
The reason is that pageblock_order can now get negative when we set
HPAGE_SHIFT to 0. To avoid all this and to avoid opening another can of
worms with enabling HUGETLB_PAGE_SIZE_VARIABLE I think it would be best
to simply allow architectures to define their own hugepages_supported().
Revert
bea41197ead3 ("s390/mm: make hugepages_supported a boot time
decision") in preparation.
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dominik Dingel [Fri, 17 Jul 2015 23:23:31 +0000 (16:23 -0700)]
revert "s390/mm: change HPAGE_SHIFT type to int"
Heiko noticed that the current check for hugepage support on s390 is a
little bit too harsh as systems which do not support will crash.
The reason is that pageblock_order can now get negative when we set
HPAGE_SHIFT to 0. To avoid all this and to avoid opening another can of
worms with enabling HUGETLB_PAGE_SIZE_VARIABLE I think it would be best
to simply allow architectures to define their own hugepages_supported().
This patch (of 4): revert commit
cf54e2fce51c ("s390/mm: change
HPAGE_SHIFT type to int") in preparation.
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Fri, 17 Jul 2015 23:23:28 +0000 (16:23 -0700)]
openrisc: fix CONFIG_UID16 setting
openrisc-allnoconfig:
kernel/uid16.c: In function 'SYSC_setgroups16':
kernel/uid16.c:184:2: error: implicit declaration of function 'groups_alloc'
kernel/uid16.c:184:13: warning: assignment makes pointer from integer without a cast
openrisc shouldn't be setting CONFIG_UID16 when CONFIG_MULTIUSER=n.
Fixes:
2813893f8b197a1 ("kernel: conditionally support non-root users, groups and capabilities")
Reported-by: Fengguang Wu <fengguang.wu@gmail.com>
Cc: Iulia Manda <iulia.manda21@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Fri, 17 Jul 2015 23:23:26 +0000 (16:23 -0700)]
MAINTAINERS: change mhocko's email address to
I am moving from mhocko@suse.cz to mhocko@kernel.org for kernel related
stuff.
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Iago López Galeiras [Fri, 17 Jul 2015 23:23:23 +0000 (16:23 -0700)]
fs, proc: add help for CONFIG_PROC_CHILDREN
The purpose of the option was documented in
Documentation/filesystems/proc.txt but the help text was missing.
Add small help text that also points to the documentation.
Signed-off-by: Iago López Galeiras <iago@endocode.com>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jiri Slaby [Fri, 17 Jul 2015 23:23:20 +0000 (16:23 -0700)]
MAINTAINERS: switch to suse.com many-in-one (fwd)
Since suse.{de,cz} is deprecated to use (but will still work for some
time), switch to suse.com which is now to be used instead.
Signed-off-by: Jiri Slaby <jslaby@suse.com>
Acked-by: Alexander Graf <agraf@suse.de>
Acked-by: David Sterba <dsterba@suse.com>
Acked-by: Hannes Reinecke <hare@suse.com>
Acked-by: Jan Kara <jack@suse.com>
Acked-by: Jean Delvare <jdelvare@suse.com>
Acked-by: Jiri Kosina <jkosina@suse.com>
Acked-by: Michal Marek <mmarek@suse.com>
Acked-by: NeilBrown <neilb@suse.com>
Acked-by: Oliver Neukum <oneukum@suse.de>
Acked-by: Takashi Iwai <tiwai@suse.com>
Acked-by: Thomas Renninger <trenn@suse.de>
Acked-by: Tomas Cech <sleep_walker@suse.com>
Acked-by: Vojtech Pavlik <vojtech@suse.com>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fabio Estevam [Sat, 4 Jul 2015 18:27:34 +0000 (15:27 -0300)]
rtc: armada38x: Remove unused variable from armada38x_rtc_set_time()
Remove the 'flags' variable in order to fix the following warning:
drivers/rtc/rtc-armada38x.c:91:22: warning: unused variable 'flags' [-Wunused-variable]
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Wei-Ning Huang [Thu, 2 Jul 2015 08:36:56 +0000 (16:36 +0800)]
rtc: mt6397: enable wakeup before registering rtc device
rtc_sysfs_add_device checks if device can wakeup before creating the
wakealarm file in sysfs. Thus the driver must set wakeup capability
before registering the rtc device.
Signed-off-by: Wei-Ning Huang <wnhuang@google.com>
Acked-by: Eddie Huang <eddie.huang@mediatek.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Linus Torvalds [Fri, 17 Jul 2015 18:30:59 +0000 (11:30 -0700)]
Merge tag 'staging-4.2-rc3' of git://git./linux/kernel/git/gregkh/staging
Pull staging and IIO driver fixes from Greg KH:
"Here's some staging and IIO driver fixes for 4.2-rc3.
Nothing major, the majority are IIO issues that were reported, with a
few other minor staging driver fixes. All have been in linux-next for
a while with no reported issues"
* tag 'staging-4.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (25 commits)
staging: vt6656: check ieee80211_bss_conf bssid not NULL
staging: vt6655: check ieee80211_bss_conf bssid not NULL
staging:lustre: remove irq.h from socklnd.h
staging: make board support depend on OF_IRQ and CLKDEV_LOOKUP
iio: tmp006: Check channel info on write
iio: sx9500: Add missing init in sx9500_buffer_pre{en,dis}able()
iio:light:ltr501: fix regmap dependency
iio:light:ltr501: fix variable in ltr501_init
iio: sx9500: fix bug in compensation code
iio: sx9500: rework error handling of raw readings
iio: magnetometer: mmc35240: fix available sampling frequencies
iio:light:stk3310: Fix REGMAP_I2C dependency
iio: light: STK3310: un-invert proximity values
iio:adc:cc10001_adc: fix Kconfig dependency
iio: light: tcs3414: Fix bug preventing to set integration time
iio:accel:bmc150-accel: fix counting direction
iio:light:cm3323: clear bitmask before set
iio: adc: at91_adc: allow to use full range of startup time
iio: DAC: ad5624r_spi: fix bit shift of output data value
iio: proximity: sx9500: Fix proximity value
...
Linus Torvalds [Fri, 17 Jul 2015 18:24:31 +0000 (11:24 -0700)]
Merge tag 'usb-4.2-rc3' of git://git./linux/kernel/git/gregkh/usb
Pull USB driver fixes from Greg KH:
"Here's some USB driver fixes for 4.2-rc3.
The ususal number of gadget driver fixes are in here, along with some
new device ids and a build fix for the mn10300 arch which required
some symbols to be renamed in the mos7720 driver.
All have been in linux-next for a while with no reported issues"
* tag 'usb-4.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
USB: serial: Destroy serial_minors IDR on module exit
usb: gadget: f_midi: fix error recovery path
usb: phy: mxs: suspend to RAM causes NULL pointer dereference
usb: gadget: udc: fix free_irq() after request_irq() failed
usb: gadget: composite: Fix NULL pointer dereference
usb: gadget: f_fs: do not set cancel function on synchronous {read,write}
usb: f_mass_storage: limit number of reported LUNs
usb: dwc3: core: avoid NULL pointer dereference
usb: dwc2: embed storage for reg backup in struct dwc2_hsotg
usb: dwc2: host: allocate qtd before atomic enqueue
usb: dwc2: host: allocate qh before atomic enqueue
usb: musb: host: rely on port_mode to call musb_start()
USB: cp210x: add ID for Aruba Networks controllers
USB: mos7720: rename registers
USB: option: add 2020:4000 ID
Linus Torvalds [Fri, 17 Jul 2015 17:54:30 +0000 (10:54 -0700)]
Merge tag 'sound-4.2-rc3' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"There are two small fixes for HD-audio and USB LINE6, and the rest are
a few new quirks and device ID addition that are good enough to get
into 4.2"
* tag 'sound-4.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek: Enable HP amp and mute LED on HP Folio 9480m [v3]
ALSA: line6: Fix -EBUSY error during active monitoring
ALSA: hda - Fix a wrong busy check in alt PCM open
ALSA: hda - add codec ID for Broxton display audio codec
ALSA: usb-audio: Add MIDI support for Steinberg MI2/MI4
Linus Torvalds [Fri, 17 Jul 2015 17:40:45 +0000 (10:40 -0700)]
Merge tag 'gpio-v4.2-2' of git://git./linux/kernel/git/linusw/linux-gpio
Pull GPIO fixes from Linus Walleij:
"This is a first set of GPIO fixes for the v4.2 series, all hitting
individual drivers and nothing else (except for a documentation
oneliner. I intended to send a request earlier but life intervened)"
* tag 'gpio-v4.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: pca953x: fix nested irqs rescheduling
gpio: omap: prevent module from being unloaded while in use
gpio: max732x: Add missing dev reference to gpiochip
gpio/xilinx: Use correct address when setting initial values.
gpio: zynq: Fix problem with unbalanced pm_runtime_enable
gpio: omap: add missed spin_unlock_irqrestore in omap_gpio_irq_type
gpio: brcmstb: fix null ptr dereference in driver remove
gpio: Remove double "base" in comment
Olof Johansson [Fri, 17 Jul 2015 17:10:22 +0000 (10:10 -0700)]
Merge tag 'keystone-dts-fixes' of git://git./linux/kernel/git/ssantosh/linux-keystone into fixes
Merge "ARM: Couple of dts fixes for v4.2-rcx" from Santosh Shilimkar:
Couple of DTS fixes 4.2-rcx for Keystone EVMs:
K2E EVM boot hangs because of missing serdes driver which is needed to bring up
PCIe on K2E. These couple of fixes makes the PCIE disabled on common default and
let the specific board DTS to enable it.
* tag 'keystone-dts-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ssantosh/linux-keystone:
ARM: keystone: dts: rename pcie nodes to help override status
ARM: keystone: dts: fix dt bindings for PCIe
Signed-off-by: Olof Johansson <olof@lixom.net>
Linus Torvalds [Fri, 17 Jul 2015 17:05:00 +0000 (10:05 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Fixes all over the place.
The rockchip and imx fixes I missed while on holidays, so I've queued
them now which makes this a bit bigger.
The rest is misc amdgpu, radeon, i915 and armada.
I think the most important thing is the ioctl fix, we dropped the
avoid compat ball, so we get to add a compat wrapper.
There is also an i915 revert to avoid a regression with existing
userspace"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (43 commits)
drm/ttm: improve uncached page deallocation.
drm/ttm: fix uncached page deallocation to properly fill page pool v3.
drm/amdgpu/dce8: Re-set VBLANK interrupt state when enabling a CRTC
drm/radeon/ci: silence a harmless PCC warning
drm/amdgpu/cz: silence some dpm debug output
drm/amdgpu/cz: store the forced dpm level
drm/amdgpu/cz: unforce dpm levels before forcing to low/high
drm/amdgpu: remove bogus check in gfx8 rb setup
drm/amdgpu: set proper index/data pair for smc regs on CZ (v2)
drm/amdgpu: disable the IP module if early_init returns -ENOENT (v2)
drm/amdgpu: stop context leak in the error path
drm/amdgpu: validate the context id in the dependencies
drm/radeon: fix user ptr race condition
drm/radeon: Don't flush the GART TLB if rdev->gart.ptr == NULL
drm/radeon: add a dpm quirk for Sapphire Radeon R9 270X 2GB GDDR5
drm/armada: avoid saving the adjusted mode to crtc->mode
drm/armada: fix overlay when partially off-screen
drm/armada: convert overlay to use drm_plane_helper_check_update()
drm/armada: fix gem object free after failed prime import
drm/armada: fix incorrect overlay plane cleanup
...
Russell King [Fri, 17 Jul 2015 09:33:04 +0000 (10:33 +0100)]
ARM: fix __virt_to_idmap build error on !MMU
Fengguang Wu reports that building ARM with !MMU results in the
following build error:
arch/arm/kernel/built-in.o: In function `__soft_restart':
>> :(.text+0x1624): undefined reference to `arch_virt_to_idmap'
Fix this by adding an appropriate IS_ENABLED(CONFIG_MMU) into the
__virt_to_idmap() inline function.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Wed, 8 Jul 2015 23:30:24 +0000 (00:30 +0100)]
ARM: invalidate L1 before enabling coherency
We must invalidate the L1 cache before enabling coherency, otherwise
secondary CPUs can inject invalid cache lines into the coherent CPU
cluster, which could then be migrated to other CPUs. This fixes a
recent regression with SoCFPGA randomly failing to boot.
Fixes:
02b4e2756e01 ("ARM: v7 setup function should invalidate L1 cache")
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Marek Szyprowski [Wed, 8 Jul 2015 12:21:55 +0000 (13:21 +0100)]
ARM: 8404/1: dma-mapping: fix off-by-one error in bitmap size check
nr_bitmaps member of mapping structure stores the number of already
allocated bitmaps and it is interpreted as loop iterator (it starts from
0 not from 1), so a comparison against number of possible bitmap
extensions should include this fact. This patch fixes this by changing
the extension failure condition. This issue has been introduced by
commit
4d852ef8c2544ce21ae41414099a7504c61164a0 ("arm: dma-mapping: Add
support to extend DMA IOMMU mappings").
Reported-by: Hyungwon Hwang <human.hwang@samsung.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Hyungwon Hwang <human.hwang@samsung.com>
Cc: stable@vger.kernel.org # v3.15+
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Stephen Boyd [Tue, 7 Jul 2015 17:17:05 +0000 (18:17 +0100)]
ARM: 8402/1: perf: Don't use of_node after putting it
It's possible, albeit unlikely, that using the of_node here will
reference freed memory. Call of_node_put() after printing the
name to be safe.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Vitaly Andrianov [Mon, 6 Jul 2015 15:43:18 +0000 (16:43 +0100)]
ARM: 8400/1: use virt_to_idmap to get phys_reset address
This patch is to get correct physical address of the reset function for
PAE systems, which use aliased physical memory for booting.
See the "ARM: mm: Introduce virt_to_idmap() with an arch hook" for details.
Signed-off-by: Vitaly Andrianov <vitalya@ti.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Ingo Molnar [Fri, 17 Jul 2015 12:17:19 +0000 (14:17 +0200)]
Merge tag 'perf-urgent-for-mingo' of git://git./linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
- Fix misplaced check for HAVE_SYNC_COMPARE_AND_SWAP_SUPPORT in
the auxtrace code, which made 'perf record' fail straight away
in some architectures, even when auxtrace wasn't involved. (Adrian Hunter)
- Really allow to specify custom CC, AR or LD (Alexey Brodkin)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andy Lutomirski [Wed, 15 Jul 2015 17:29:41 +0000 (10:29 -0700)]
x86/entry/64, x86/nmi/64: Add CONFIG_DEBUG_ENTRY NMI testing code
It turns out to be rather tedious to test the NMI nesting code.
Make it easier: add a new CONFIG_DEBUG_ENTRY option that causes
the NMI handler to pre-emptively unmask NMIs.
With this option set, errors in the repeat_nmi logic or failures
to detect that we're in a nested NMI will result in quick panics
under perf (especially if multiple counters are running at high
frequency) instead of requiring an unusual workload that
generates page faults or breakpoints inside NMIs.
I called it CONFIG_DEBUG_ENTRY instead of CONFIG_DEBUG_NMI_ENTRY
because I want to add new non-NMI checks elsewhere in the entry
code in the future, and I'd rather not add too many new config
options or add this option and then immediately rename it.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andy Lutomirski [Wed, 15 Jul 2015 17:29:40 +0000 (10:29 -0700)]
x86/nmi/64: Make the "NMI executing" variable more consistent
Currently, "NMI executing" is one the first time an outermost
NMI hits repeat_nmi and zero thereafter. Change it to be zero
each time for consistency.
This is intended to help NMI handling fail harder if it's buggy.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andy Lutomirski [Wed, 15 Jul 2015 17:29:39 +0000 (10:29 -0700)]
x86/nmi/64: Minor asm simplification
Replace LEA; MOV with an equivalent SUB. This saves one
instruction.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andy Lutomirski [Wed, 15 Jul 2015 17:29:38 +0000 (10:29 -0700)]
x86/nmi/64: Use DF to avoid userspace RSP confusing nested NMI detection
We have a tricky bug in the nested NMI code: if we see RSP
pointing to the NMI stack on NMI entry from kernel mode, we
assume that we are executing a nested NMI.
This isn't quite true. A malicious userspace program can point
RSP at the NMI stack, issue SYSCALL, and arrange for an NMI to
happen while RSP is still pointing at the NMI stack.
Fix it with a sneaky trick. Set DF in the region of code that
the RSP check is intended to detect. IRET will clear DF
atomically.
( Note: other than paravirt, there's little need for all this
complexity. We could check RIP instead of RSP. )
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andy Lutomirski [Wed, 15 Jul 2015 17:29:37 +0000 (10:29 -0700)]
x86/nmi/64: Reorder nested NMI checks
Check the repeat_nmi .. end_repeat_nmi special case first. The
next patch will rework the RSP check and, as a side effect, the
RSP check will no longer detect repeat_nmi .. end_repeat_nmi, so
we'll need this ordering of the checks.
Note: this is more subtle than it appears. The check for
repeat_nmi .. end_repeat_nmi jumps straight out of the NMI code
instead of adjusting the "iret" frame to force a repeat. This
is necessary, because the code between repeat_nmi and
end_repeat_nmi sets "NMI executing" and then writes to the
"iret" frame itself. If a nested NMI comes in and modifies the
"iret" frame while repeat_nmi is also modifying it, we'll end up
with garbage. The old code got this right, as does the new
code, but the new code is a bit more explicit.
If we were to move the check right after the "NMI executing"
check, then we'd get it wrong and have random crashes.
( Because the "NMI executing" check would jump to the code that would
modify the "iret" frame without checking if the interrupted NMI was
currently modifying it. )
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andy Lutomirski [Wed, 15 Jul 2015 17:29:36 +0000 (10:29 -0700)]
x86/nmi/64: Improve nested NMI comments
I found the nested NMI documentation to be difficult to follow.
Improve the comments.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andy Lutomirski [Wed, 15 Jul 2015 17:29:35 +0000 (10:29 -0700)]
x86/nmi/64: Switch stacks on userspace NMI entry
Returning to userspace is tricky: IRET can fail, and ESPFIX can
rearrange the stack prior to IRET.
The NMI nesting fixup relies on a precise stack layout and
atomic IRET. Rather than trying to teach the NMI nesting fixup
to handle ESPFIX and failed IRET, punt: run NMIs that came from
user mode on the normal kernel stack.
This will make some nested NMIs visible to C code, but the C
code is okay with that.
As a side effect, this should speed up perf: it eliminates an
RDMSR when NMIs come from user mode.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>