Andy Lutomirski [Mon, 12 Sep 2016 22:05:51 +0000 (15:05 -0700)]
x86/entry/64: Clean up and document espfix64 stack setup
The espfix64 setup code was a bit inscrutible and contained an
unnecessary push of RAX. Remove that push, update all the stack
offsets to match, and document the whole mess.
Reported-By: Borislav Petkov <bp@alien8.de>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/e5459eb10cf1175c8b36b840bc425f210d045f35.1473717910.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andy Lutomirski [Mon, 12 Sep 2016 22:05:50 +0000 (15:05 -0700)]
selftests/x86/sigreturn: Use CX, not AX, as the scratch register
RAX is handled specially in ESPFIX64. Use CX as our scratch
register so that, if something goes wrong with RAX handling, we'll
notice.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/9ceeb24ea56925586c330dc46306f757ddea9fb5.1473717910.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Josh Poimboeuf [Wed, 24 Aug 2016 16:50:18 +0000 (11:50 -0500)]
x86/dumpstack: Remove unnecessary stack pointer arguments
When calling show_stack_log_lvl() or dump_trace() with a regs argument,
providing a stack pointer or frame pointer is redundant.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>d
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1694e2e955e3b9a73a3c3d5ba2634344014dd550.1472057064.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Josh Poimboeuf [Wed, 24 Aug 2016 16:50:17 +0000 (11:50 -0500)]
x86/dumpstack: Add get_stack_pointer() and get_frame_pointer()
The various functions involved in dumping the stack all do similar
things with regard to getting the stack pointer and the frame pointer
based on the regs and task arguments. Create helper functions to
do that instead.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/f448914885a35f333fe04da1b97a6c2cc1f80974.1472057064.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Josh Poimboeuf [Wed, 24 Aug 2016 16:50:16 +0000 (11:50 -0500)]
x86/dumpstack: Make printk_stack_address() more generally useful
Change printk_stack_address() to be useful when called by an unwinder
outside the context of dump_trace().
Specifically:
- printk_stack_address()'s 'data' argument is always used as the log
level string. Make that explicit.
- Call touch_nmi_watchdog().
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/9fbe0db05bacf66d337c162edbf61450d0cff1e2.1472057064.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Josh Poimboeuf [Wed, 24 Aug 2016 16:50:15 +0000 (11:50 -0500)]
oprofile/x86: Add regs->ip to oprofile trace
dump_trace() doesn't add the interrupted instruction's address to the
trace, so add it manually. This makes the profile more useful, and also
makes it more consistent with what perf profiling does.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Robert Richter <rric@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/6c745a83dbd69fc6857ef9b2f8be0f011d775936.1472057064.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Josh Poimboeuf [Wed, 24 Aug 2016 16:50:14 +0000 (11:50 -0500)]
perf/x86: Check perf_callchain_store() error
Add a check to perf_callchain_kernel() so that it returns early if the
callchain entry array is already full.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/dce6d60bab08be2600efd90021d9b85620646161.1472057064.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Colin Ian King [Sun, 28 Aug 2016 10:51:06 +0000 (11:51 +0100)]
selftests/x86: Fix spelling mistake "preseve" -> "preserve"
Trivial fix to spelling mistakes in printf messages.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kselftest@vger.kernel.org
Link: http://lkml.kernel.org/r/20160828105106.9763-1-colin.king@canonical.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andy Lutomirski [Tue, 30 Aug 2016 15:04:15 +0000 (08:04 -0700)]
virtio_console: Stop doing DMA on the stack
virtio_console uses a small DMA buffer for control requests. Move
that buffer into heap memory.
Doing virtio DMA on the stack is normally okay on non-DMA-API virtio
systems (which is currently most of them), but it breaks completely
if the stack is virtually mapped (CONFIG_VMAP_STACK=y).
Tested by typing both directions using picocom aimed at /dev/hvc0.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: virtualization@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/0afe68f9b4be6c95af9e7672b07acd0274c26dfe.1472569320.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andy Lutomirski [Wed, 31 Aug 2016 00:27:57 +0000 (17:27 -0700)]
x86/mm: Improve stack-overflow #PF handling
If we get a page fault indicating kernel stack overflow, invoke
handle_stack_overflow(). To prevent us from overflowing the stack
again while handling the overflow (because we are likely to have
very little stack space left), call handle_stack_overflow() on the
double-fault stack.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/6d6cf96b3fb9b4c9aa303817e1dc4de0c7c36487.1472603235.git.luto@kernel.org
[ Minor edit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Thu, 8 Sep 2016 06:41:52 +0000 (08:41 +0200)]
Merge branch 'x86/mm' into x86/asm, to unify the two branches for simplicity
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Brian Gerst [Sat, 13 Aug 2016 16:38:22 +0000 (12:38 -0400)]
sched: Remove __schedule() non-standard frame annotation
Now that the x86 switch_to() uses the standard C calling convention,
the STACK_FRAME_NON_STANDARD() annotation is no longer needed.
Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1471106302-10159-8-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Brian Gerst [Sat, 13 Aug 2016 16:38:21 +0000 (12:38 -0400)]
sched/x86: Fix thread_saved_pc()
thread_saved_pc() was using a completely bogus method to get the return
address. Since switch_to() was previously inlined, there was no sane way
to know where on the stack the return address was stored. Now with the
frame of a sleeping thread well defined, this can be implemented correctly.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1471106302-10159-7-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Brian Gerst [Sat, 13 Aug 2016 16:38:20 +0000 (12:38 -0400)]
sched/x86: Pass kernel thread parameters in 'struct fork_frame'
Instead of setting up a fake pt_regs context, put the kernel thread
function pointer and arg into the unused callee-restored registers
of 'struct fork_frame'.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1471106302-10159-6-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Brian Gerst [Sat, 13 Aug 2016 16:38:19 +0000 (12:38 -0400)]
sched/x86: Rewrite the switch_to() code
Move the low-level context switch code to an out-of-line asm stub instead of
using complex inline asm. This allows constructing a new stack frame for the
child process to make it seamlessly flow to ret_from_fork without an extra
test and branch in __switch_to(). It also improves code generation for
__schedule() by using the C calling convention instead of clobbering all
registers.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1471106302-10159-5-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Brian Gerst [Sat, 13 Aug 2016 16:38:18 +0000 (12:38 -0400)]
sched/x86: Add 'struct inactive_task_frame' to better document the sleeping task stack frame
Add 'struct inactive_task_frame', which defines the layout of the stack for
a sleeping process. For now, the only defined field is the BP register
(frame pointer).
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1471106302-10159-4-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Brian Gerst [Sat, 13 Aug 2016 16:38:17 +0000 (12:38 -0400)]
sched/x86/64, kgdb: Clear GDB_PS on 64-bit
switch_to() no longer saves EFLAGS, so it's bogus to look for it on the
stack. Set it to zero like 32-bit.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1471106302-10159-3-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Brian Gerst [Sat, 13 Aug 2016 16:38:16 +0000 (12:38 -0400)]
sched/x86/32, kgdb: Don't use thread.ip in sleeping_thread_to_gdb_regs()
Match 64-bit and set gdb_regs[GDB_PC] to zero. thread.ip is always the
same point in the scheduler (except for newly forked processes), and will
be removed in a future patch.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1471106302-10159-2-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Josh Poimboeuf [Fri, 19 Aug 2016 11:53:02 +0000 (06:53 -0500)]
x86/dumpstack/ftrace: Don't print unreliable addresses in print_context_stack_bp()
When function graph tracing is enabled, print_context_stack_bp() can
report return_to_handler() as an unreliable address, which is confusing
and misleading: return_to_handler() is really only useful as a hint for
debugging, whereas print_context_stack_bp() users only care about the
actual 'reliable' call path.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/c51aef578d8027791b38d2ad9bac0c7f499fde91.1471607358.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Josh Poimboeuf [Fri, 19 Aug 2016 11:53:01 +0000 (06:53 -0500)]
x86/dumpstack/ftrace: Mark function graph handler function as unreliable
When function graph tracing is enabled for a function, its return
address on the stack is replaced with the address of an ftrace handler
(return_to_handler).
Currently 'return_to_handler' can be reported as reliable. That's not
ideal, and can actually be misleading. When saving or dumping the
stack, you normally only care about what led up to that point (the call
path), rather than what will happen in the future (the return path).
That's especially true in the non-oops stack trace case, which isn't
used for debugging. For example, in a perf profiling operation,
reporting return_to_handler() in the trace would just be confusing.
And in the oops case, where debugging is important, "unreliable" is also
more appropriate there because it serves as a hint that graph tracing
was involved, instead of trying to imply that return_to_handler() was
the real caller.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/f8af15749c7d632d3e7f815995831d5b7f82950d.1471607358.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Josh Poimboeuf [Fri, 19 Aug 2016 11:53:00 +0000 (06:53 -0500)]
ftrace/x86: Implement HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
Use the more reliable version of ftrace_graph_ret_addr() so we no longer
have to worry about the unwinder getting out of sync with the function
graph ret_stack index, which can happen if the unwinder skips any frames
before calling ftrace_graph_ret_addr().
This fixes this issue (and several others like it):
$ cat /proc/self/stack
[<
ffffffff810489a2>] save_stack_trace_tsk+0x22/0x40
[<
ffffffff81311a89>] proc_pid_stack+0xb9/0x110
[<
ffffffff813127c4>] proc_single_show+0x54/0x80
[<
ffffffff812be088>] seq_read+0x108/0x3e0
[<
ffffffff812923d7>] __vfs_read+0x37/0x140
[<
ffffffff812929d9>] vfs_read+0x99/0x140
[<
ffffffff81293f28>] SyS_read+0x58/0xc0
[<
ffffffff818af97c>] entry_SYSCALL_64_fastpath+0x1f/0xbd
[<
ffffffffffffffff>] 0xffffffffffffffff
$ echo function_graph > /sys/kernel/debug/tracing/current_tracer
$ cat /proc/self/stack
[<
ffffffff818b2428>] return_to_handler+0x0/0x27
[<
ffffffff810394cc>] print_context_stack+0xfc/0x100
[<
ffffffff818b2428>] return_to_handler+0x0/0x27
[<
ffffffff8103891b>] dump_trace+0x12b/0x350
[<
ffffffff818b2428>] return_to_handler+0x0/0x27
[<
ffffffff810489a2>] save_stack_trace_tsk+0x22/0x40
[<
ffffffff818b2428>] return_to_handler+0x0/0x27
[<
ffffffff81311a89>] proc_pid_stack+0xb9/0x110
[<
ffffffff818b2428>] return_to_handler+0x0/0x27
[<
ffffffff813127c4>] proc_single_show+0x54/0x80
[<
ffffffff818b2428>] return_to_handler+0x0/0x27
[<
ffffffff812be088>] seq_read+0x108/0x3e0
[<
ffffffff818b2428>] return_to_handler+0x0/0x27
[<
ffffffff812923d7>] __vfs_read+0x37/0x140
[<
ffffffff818b2428>] return_to_handler+0x0/0x27
[<
ffffffff812929d9>] vfs_read+0x99/0x140
[<
ffffffffffffffff>] 0xffffffffffffffff
Enabling function graph tracing causes the stack trace to change in two
ways:
First, the real call addresses are confusingly interspersed with
'return_to_handler' addresses. This issue will be fixed by the next
patch.
Second, the stack trace is offset by two frames, because the unwinder
skipped the first two frames and got out of sync with the ret_stack
index. This patch fixes this issue.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/a6d623e36f8d08f9a17bd74d804d201177a23afd.1471607358.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Josh Poimboeuf [Fri, 19 Aug 2016 11:52:59 +0000 (06:52 -0500)]
x86/dumpstack/ftrace: Convert dump_trace() callbacks to use ftrace_graph_ret_addr()
Convert print_context_stack() and print_context_stack_bp() to use the
arch-independent ftrace_graph_ret_addr() helper.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/56ec97cafc1bf2e34d1119e6443d897db406da86.1471607358.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Josh Poimboeuf [Fri, 19 Aug 2016 11:52:58 +0000 (06:52 -0500)]
ftrace: Add ftrace_graph_ret_addr() stack unwinding helpers
When function graph tracing is enabled for a function, ftrace modifies
the stack by replacing the original return address with the address of a
hook function (return_to_handler).
Stack unwinders need a way to get the original return address. Add an
arch-independent helper function for that named ftrace_graph_ret_addr().
This adds two variations of the function: one depends on
HAVE_FUNCTION_GRAPH_RET_ADDR_PTR, and the other relies on an index state
variable.
The former is recommended because, in some cases, the latter can cause
problems when the unwinder skips stack frames. It can get out of sync
with the ret_stack index and wrong addresses can be reported for the
stack trace.
Once all arches have been ported to use
HAVE_FUNCTION_GRAPH_RET_ADDR_PTR, we can get rid of the distinction.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/36bd90f762fc5e5af3929e3797a68a64906421cf.1471607358.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Josh Poimboeuf [Fri, 19 Aug 2016 11:52:57 +0000 (06:52 -0500)]
ftrace: Add return address pointer to ftrace_ret_stack
Storing this value will help prevent unwinders from getting out of sync
with the function graph tracer ret_stack. Now instead of needing a
stateful iterator, they can compare the return address pointer to find
the right ret_stack entry.
Note that an array of 50 ftrace_ret_stack structs is allocated for every
task. So when an arch implements this, it will add either 200 or 400
bytes of memory usage per task (depending on whether it's a 32-bit or
64-bit platform).
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/a95cfcc39e8f26b89a430c56926af0bb217bc0a1.1471607358.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Josh Poimboeuf [Fri, 19 Aug 2016 11:52:56 +0000 (06:52 -0500)]
ftrace: Only allocate the ret_stack 'fp' field when needed
This saves some memory when HAVE_FUNCTION_GRAPH_FP_TEST isn't defined.
On x86_64 with newer versions of gcc which have -mfentry, it saves 400
bytes per task.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/5c7747d9ea7b5cb47ef0a8ce8a6cea6bf7aa94bf.1471607358.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Josh Poimboeuf [Fri, 19 Aug 2016 11:52:55 +0000 (06:52 -0500)]
ftrace: Remove CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST from config
Make HAVE_FUNCTION_GRAPH_FP_TEST a normal define, independent from
kconfig. This removes some config file pollution and simplifies the
checking for the fp test.
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/2c4e5f05054d6d367f702fd153af7a0109dd5c81.1471607358.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andy Lutomirski [Thu, 11 Aug 2016 09:35:23 +0000 (02:35 -0700)]
x86/mm/64: Enable vmapped stacks (CONFIG_HAVE_ARCH_VMAP_STACK=y)
This allows x86_64 kernels to enable vmapped stacks by setting
HAVE_ARCH_VMAP_STACK=y - which enables the CONFIG_VMAP_STACK=y
high level Kconfig option.
There are a couple of interesting bits:
First, x86 lazily faults in top-level paging entries for the vmalloc
area. This won't work if we get a page fault while trying to access
the stack: the CPU will promote it to a double-fault and we'll die.
To avoid this problem, probe the new stack when switching stacks and
forcibly populate the pgd entry for the stack when switching mms.
Second, once we have guard pages around the stack, we'll want to
detect and handle stack overflow.
I didn't enable it on x86_32. We'd need to rework the double-fault
code a bit and I'm concerned about running out of vmalloc virtual
addresses under some workloads.
This patch, by itself, will behave somewhat erratically when the
stack overflows while RSP is still more than a few tens of bytes
above the bottom of the stack. Specifically, we'll get #PF and make
it to no_context and them oops without reliably triggering a
double-fault, and no_context doesn't know about stack overflows.
The next patch will improve that case.
Thank you to Nadav and Brian for helping me pay enough attention to
the SDM to hopefully get this right.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/c88f3e2920b18e6cc621d772a04a62c06869037e.1470907718.git.luto@kernel.org
[ Minor edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andy Lutomirski [Thu, 11 Aug 2016 09:35:22 +0000 (02:35 -0700)]
dma-api: Teach the "DMA-from-stack" check about vmapped stacks
If we're using CONFIG_VMAP_STACK=y and we manage to point an sg entry
at the stack, then either the sg page will be in highmem or sg_virt()
will return the direct-map alias. In neither case will the existing
check_for_stack() implementation realize that it's a stack page.
Fix it by explicitly checking for stack pages.
This has no effect by itself. It's broken out for ease of review.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/448460622731312298bf19dcbacb1606e75de7a9.1470907718.git.luto@kernel.org
[ Minor edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andy Lutomirski [Thu, 11 Aug 2016 09:35:21 +0000 (02:35 -0700)]
fork: Add generic vmalloced stack support
If CONFIG_VMAP_STACK=y is selected, kernel stacks are allocated with
__vmalloc_node_range().
Grsecurity has had a similar feature (called GRKERNSEC_KSTACKOVERFLOW=y)
for a long time.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/14c07d4fd173a5b117f51e8b939f9f4323e39899.1470907718.git.luto@kernel.org
[ Minor edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Wed, 24 Aug 2016 10:11:29 +0000 (12:11 +0200)]
Merge tag 'v4.8-rc3' into x86/asm, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Borislav Petkov [Tue, 23 Aug 2016 17:23:56 +0000 (19:23 +0200)]
x86/entry: Remove outdated comment about SYSCALL targets
The comment probably meant some old AMD64 incarnation which most likely
never saw the light of day. STAR and LSTAR are two different registers
and STAR sets CS/SS(DS) selectors for *all* modes, not only 32-bit.
So simply remove that comment.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160823172356.15879-1-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Linus Torvalds [Sun, 21 Aug 2016 23:14:10 +0000 (16:14 -0700)]
Linux 4.8-rc3
Linus Torvalds [Sun, 21 Aug 2016 21:28:24 +0000 (14:28 -0700)]
Merge branch 'parisc-4.8-2' of git://git./linux/kernel/git/deller/parisc-linux
Pull two parisc fixes from Helge Deller:
"The first patch ensures that the high-res cr16 clocksource (which was
added in kernel 4.7) gets choosen as default clocksource for parisc.
The second patch moves the #define of EREFUSED down inside errno.h and
thus unbreaks building the gccgo compiler"
* 'parisc-4.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Fix order of EREFUSED define in errno.h
parisc: Fix automatic selection of cr16 clocksource
Tony Luck [Sat, 20 Aug 2016 23:27:58 +0000 (16:27 -0700)]
EDAC, skx_edac: Add EDAC driver for Skylake
This is an entirely new driver instead of yet another set of patches
to sb_edac.c because:
1) Mapping from PCI devices to socket/memory controller is significantly
different. Skylake scatters devices on a socket across a number of
PCI buses.
2) There is an extra level of interleaving via the "mcroute" register
that would be a little messy to squeeze into the old driver.
3) Validation is getting too expensive. Changes to sb_edac need to
be checked against Sandy Bridge, Ivy Bridge, Haswell, Broadwell and
Knights Landing.
Acked-by: Aristeu Rozanski <aris@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Helge Deller [Sat, 20 Aug 2016 09:51:38 +0000 (11:51 +0200)]
parisc: Fix order of EREFUSED define in errno.h
When building gccgo in userspace, errno.h gets parsed and the go include file
sysinfo.go is generated.
Since EREFUSED is defined to the same value as ECONNREFUSED, and ECONNREFUSED
is defined later on in errno.h, this leads to go complaining that EREFUSED
isn't defined yet.
Fix this trivial problem by moving the define of EREFUSED down after
ECONNREFUSED in errno.h (and clean up the indenting while touching this line).
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org
Helge Deller [Fri, 19 Aug 2016 20:39:02 +0000 (22:39 +0200)]
parisc: Fix automatic selection of cr16 clocksource
Commit
54b66800907 (parisc: Add native high-resolution sched_clock()
implementation) added support to use the CPU-internal cr16 counters as reliable
clocksource with the help of HAVE_UNSTABLE_SCHED_CLOCK.
Sadly the commit missed to remove the hack which prevented cr16 to become the
default clocksource even on SMP systems.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # 4.7+
Linus Torvalds [Fri, 19 Aug 2016 19:47:01 +0000 (12:47 -0700)]
Make the hardened user-copy code depend on having a hardened allocator
The kernel test robot reported a usercopy failure in the new hardened
sanity checks, due to a page-crossing copy of the FPU state into the
task structure.
This happened because the kernel test robot was testing with SLOB, which
doesn't actually do the required book-keeping for slab allocations, and
as a result the hardening code didn't realize that the task struct
allocation was one single allocation - and the sanity checks fail.
Since SLOB doesn't even claim to support hardening (and you really
shouldn't use it), the straightforward solution is to just make the
usercopy hardening code depend on the allocator supporting it.
Reported-by: kernel test robot <xiaolong.ye@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 19 Aug 2016 19:10:06 +0000 (12:10 -0700)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"I2C has some pretty standard driver bugfixes and one minor cleanup"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: meson: Use complete() instead of complete_all()
i2c: brcmstb: Use complete() instead of complete_all()
i2c: bcm-kona: Use complete() instead of complete_all()
i2c: bcm-iproc: Use complete() instead of complete_all()
i2c: at91: fix support of the "alternative command" feature
i2c: ocores: add missed clk_disable_unprepare() on failure paths
i2c: cros-ec-tunnel: Fix usage of cros_ec_cmd_xfer()
i2c: mux: demux-pinctrl: properly roll back when adding adapter fails
Linus Torvalds [Fri, 19 Aug 2016 16:32:48 +0000 (09:32 -0700)]
Merge tag 'dm-4.8-fixes-2' of git://git./linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- a stable fix for DM round robin multipath path selector to disable
preemption before using this_cpu_ptr()
- a slight increase in DM crypt's mempool reserves to make swap ontop
of DM crypt more performant
- a few DM raid fixes to issues found while testing changes that were
merged in v4.8-rc1
* tag 'dm-4.8-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm raid: support raid0 with missing metadata devices
dm raid: enhance attempt_restore_of_faulty_devices() to support more devices
dm raid: fix restoring of failed devices regression
dm raid: fix frozen recovery regression
dm crypt: increase mempool reserve to better support swapping
dm round robin: do not use this_cpu_ptr() without having preemption disabled
Linus Torvalds [Fri, 19 Aug 2016 16:22:50 +0000 (09:22 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Six fairly small fixes. The ipr, mpt3sas and ses ones all trigger
oopses. The megaraid one fixes an attach failure on io mapped only
cards, the fcoe one is an obvious problem in the error path and the
aacraid one is a theoretical security issue (ability to trick the
kernel into a buffer overrun)"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
ses: Fix racy cleanup of /sys in remove_dev()
mpt3sas: Fix resume on WarpDrive flash cards
ipr: Fix sync scsi scan
megaraid_sas: Fix probing cards without io port
aacraid: Check size values after double-fetch from user
fcoe: Use kfree_skb() instead of kfree()
Linus Torvalds [Fri, 19 Aug 2016 16:21:24 +0000 (09:21 -0700)]
Merge tag 'usb-4.8-rc3' of git://git./linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are a number of USB fixes for reported issues for your tree.
The normal amount of gadget fixes, xhci fixes, new device ids, and a
few other minor things. All of them have been in linux-next for a
while, the full details are in the shortlog below"
* tag 'usb-4.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (43 commits)
xhci: don't dereference a xhci member after removing xhci
usb: xhci: Fix panic if disconnect
xhci: really enqueue zero length TRBs.
xhci: always handle "Command Ring Stopped" events
cdc-acm: fix wrong pipe type on rx interrupt xfers
usb: misc: usbtest: add fix for driver hang
usb: dwc3: gadget: stop processing on HWO set
usb: dwc3: don't set last bit for ISOC endpoints
usb: gadget: rndis: free response queue during REMOTE_NDIS_RESET_MSG
usb: udc: core: fix error handling
usb: gadget: fsl_qe_udc: off by one in setup_received_handle()
usb/gadget: fix gadgetfs aio support.
usb: gadget: composite: Fix return value in case of error
usb: gadget: uvc: Fix return value in case of error
usb: gadget: fix check in sync read from ep in gadgetfs
usb: misc: usbtest: usbtest_do_ioctl may return positive integer
usb: dwc3: fix missing platform_set_drvdata() in dwc3_of_simple_probe()
usb: phy: omap-otg: Fix missing platform_set_drvdata() in omap_otg_probe()
usb: gadget: configfs: add mutex lock before unregister gadget
usb: gadget: u_ether: fix dereference after null check coverify warning
...
Linus Torvalds [Fri, 19 Aug 2016 16:06:41 +0000 (09:06 -0700)]
Merge tag 'xfs-iomap-for-linus-4.8-rc3' of git://git./linux/kernel/git/dgc/linux-xfs
Pull xfs and iomap fixes from Dave Chinner:
"Changes in this update:
Regression fixes for XFS changes introduce in 4.8-rc1:
- buffer IO accounting assert failure
- ENOSPC block accounting reservation issue
- DAX IO path page cache invalidation fix
- rmapbt on-disk block count in agf
- correct classification of rmap block type when updating AGFL.
- iomap support for attribute fork mapping
Regression fixes for iomap infrastructure in 4.8-rc1:
- fiemap: honor FIEMAP_FLAG_SYNC
- fiemap: implement FIEMAP_FLAG_XATTR support to fix XFS regression
- make mark_page_accessed and pagefault_disable usage consistent with
other IO paths"
* tag 'xfs-iomap-for-linus-4.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
xfs: remove OWN_AG rmap when allocating a block from the AGFL
xfs: (re-)implement FIEMAP_FLAG_XATTR
xfs: simplify xfs_file_iomap_begin
iomap: mark ->iomap_end as optional
iomap: prepare iomap_fiemap for attribute mappings
iomap: fiemap should honor the FIEMAP_FLAG_SYNC flag
iomap: remove superflous pagefault_disable from iomap_write_actor
iomap: remove superflous mark_page_accessed from iomap_write_actor
xfs: store rmapbt block count in the AGF
xfs: don't invalidate whole file on DAX read/write
xfs: fix bogus space reservation in xfs_iomap_write_allocate
xfs: don't assert fail on non-async buffers on ioacct decrement
Linus Torvalds [Fri, 19 Aug 2016 15:52:17 +0000 (08:52 -0700)]
Merge tag 'hwmon-for-linus-v4.8-rc2' of git://git./linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
"Fix a bug in it87 driver and URLs in ftsteutates driver"
* tag 'hwmon-for-linus-v4.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (ftsteutates) Correct ftp urls in driver documentation
hwmon: (it87) Features mask must be 32 bit wide
Linus Torvalds [Fri, 19 Aug 2016 02:38:18 +0000 (19:38 -0700)]
Merge tag 'drm-fixes-for-4.8-rc3-2' of git://people.freedesktop.org/~airlied/linux
Pull more drm fixes from Dave Airlie:
"Daniel pointed out I'd missed some i915 fixes, and I also found a
single etnaviv fix I missed.
So here they are"
* tag 'drm-fixes-for-4.8-rc3-2' of git://people.freedesktop.org/~airlied/linux:
drm/etnaviv: take GPU lock later in the submit process
drm/i915: Fix modeset handling during gpu reset, v5.
drm/i915: fix aliasing_ppgtt leak
drm/i915: fix WaInsertDummyPushConstPs
drm/i915: Fix iboost setting for SKL Y/U DP DDI buffer translation entry 2
drm/i915/gen9: Give one extra block per line for SKL plane WM calculations
drm/i915: Acquire audio powerwell for HD-Audio registers
drm/i915: Add missing rpm wakelock to GGTT pread
drm/i915/fbc: FBC causes display flicker when VT-d is enabled on Skylake
drm/i915: Clean up the extra RPM ref on CHV with i915.enable_rc6=0
drm/i915: Program iboost settings for HDMI/DVI on SKL
drm/i915: Fix iboost setting for DDI with 4 lanes on SKL
drm/i915: Handle ENOSPC after failing to insert a mappable node
drm/i915: Flush GT idle status upon reset
Linus Torvalds [Fri, 19 Aug 2016 02:31:08 +0000 (19:31 -0700)]
Merge tag 'devicetree-fixes-for-4.8' of git://git./linux/kernel/git/robh/linux
Pull DeviceTree fixes from Rob Herring:
- a couple of DT node ref counting fixes
- fix __unflatten_device_tree for PPC PCI hotplug case
- rework marking irq controllers as OF_POPULATED in cases where real
driver is used.
- disable of_platform_default_populate_init on PPC. The change in
initcall order causes problems which need to be sorted out later.
* tag 'devicetree-fixes-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
of: fix reference counting in of_graph_get_endpoint_by_regs
of/platform: disable the of_platform_default_populate_init() for all the ppc boards
ARM: imx6: mark GPC node as not populated after irq init to probe pm domain driver
of/irq: Mark interrupt controllers as populated before initialisation
drivers/of: Validate device node in __unflatten_device_tree()
of: Delete an unnecessary check before the function call "of_node_put"
Linus Torvalds [Fri, 19 Aug 2016 01:54:40 +0000 (18:54 -0700)]
Merge tag '4.8-doc-fixes' of git://git.lwn.net/linux
Pull documentation fixes from Jonathan Corbet:
"Three small fixes for Sphinx-formatted documentation generation"
* tag '4.8-doc-fixes' of git://git.lwn.net/linux:
doc-rst: customize RTD theme, drop padding of inline literal
docs: kernel-documentation: remove some highlight directives
docs: Set the Sphinx default highlight language to "guess"
Dave Airlie [Thu, 18 Aug 2016 22:51:13 +0000 (08:51 +1000)]
Merge tag 'drm-intel-fixes-2016-08-15' of git://anongit.freedesktop.org/drm-intel into drm-fixes
Collection of i915 fixes.
* tag 'drm-intel-fixes-2016-08-15' of git://anongit.freedesktop.org/drm-intel:
drm/i915: Fix modeset handling during gpu reset, v5.
drm/i915: fix aliasing_ppgtt leak
drm/i915: fix WaInsertDummyPushConstPs
drm/i915: Fix iboost setting for SKL Y/U DP DDI buffer translation entry 2
drm/i915/gen9: Give one extra block per line for SKL plane WM calculations
drm/i915: Acquire audio powerwell for HD-Audio registers
drm/i915: Add missing rpm wakelock to GGTT pread
drm/i915/fbc: FBC causes display flicker when VT-d is enabled on Skylake
drm/i915: Clean up the extra RPM ref on CHV with i915.enable_rc6=0
drm/i915: Program iboost settings for HDMI/DVI on SKL
drm/i915: Fix iboost setting for DDI with 4 lanes on SKL
drm/i915: Handle ENOSPC after failing to insert a mappable node
drm/i915: Flush GT idle status upon reset
Dave Airlie [Thu, 18 Aug 2016 22:50:42 +0000 (08:50 +1000)]
Merge branch 'drm-etnaviv-fixes' of git://git.pengutronix.de/git/lst/linux into drm-fixes
Single GPU recovery fix
* 'drm-etnaviv-fixes' of git://git.pengutronix.de/git/lst/linux:
drm/etnaviv: take GPU lock later in the submit process
Linus Torvalds [Thu, 18 Aug 2016 22:09:41 +0000 (15:09 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"An initrd microcode loading fix, and an SMP bootup topology setup fix
to resolve crashes on SGI/UV systems if the BIOS is configured in a
certain way"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/smp: Fix __max_logical_packages value setup
x86/microcode/AMD: Fix initrd loading with CONFIG_RANDOMIZE_MEMORY=y
Linus Torvalds [Thu, 18 Aug 2016 22:08:31 +0000 (15:08 -0700)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull timer fixes from Ingo Molnar:
"Three clocksource driver fixes"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource/drivers/mips-gic-timer: Make gic_clocksource_of_init() return int
clocksource/drivers/kona: Fix get_counter() error handling
clocksource/drivers/time-armada-370-xp: Fix the clock reference
Linus Torvalds [Thu, 18 Aug 2016 22:07:21 +0000 (15:07 -0700)]
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Two cputime fixes - hopefully the last ones"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/cputime: Resync steal time when guest & host lose sync
sched/cputime: Fix NO_HZ_FULL getrusage() monotonicity regression
Linus Torvalds [Thu, 18 Aug 2016 22:04:53 +0000 (15:04 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Mostly tooling fixes, but also start/stop filter related fixes, a perf
event read() fix, a fix uncovered by fuzzing, and an uprobes leak fix"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Check return value of the perf_event_read() IPI
perf/core: Enable mapping of the stop filters
perf/core: Update filters only on executable mmap
perf/core: Fix file name handling for start/stop filters
perf/core: Fix event_function_local()
uprobes: Fix the memcg accounting
perf intel-pt: Fix occasional decoding errors when tracing system-wide
tools: Sync kvm related header files for arm64 and s390
perf probe: Release resources on error when handling exit paths
perf probe: Check for dup and fdopen failures
perf symbols: Fix annotation of objects with debuginfo files
perf script: Don't disable use_callchain if input is pipe
perf script: Show proper message when failed list scripts
perf jitdump: Add the right header to get the major()/minor() definitions
perf ppc64le: Fix build failure when libelf is not present
perf tools mem: Fix -t store option for record command
perf intel-pt: Fix ip compression
Linus Torvalds [Thu, 18 Aug 2016 20:45:48 +0000 (13:45 -0700)]
Merge branch 'locking-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull locking fixes from Ingo Molnar:
"Two lockless_dereference() related fixes"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/barriers: Suppress sparse warnings in lockless_dereference()
Revert "drm/fb-helper: Reduce READ_ONCE(master) to lockless_dereference"
Linus Torvalds [Thu, 18 Aug 2016 18:17:13 +0000 (11:17 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Avoid a literal load with the MMU off on the CPU resume path
(potential inconsistency between cache and RAM)
- Build error with CONFIG_ACPI=n fixed
- Compiler warning in the arch/arm64/mm/dump.c code fixed
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Fix shift warning in arch/arm64/mm/dump.c
arm64: kernel: avoid literal load of virtual address with MMU off
arm64: Fix NUMA build error when !CONFIG_ACPI
Linus Torvalds [Thu, 18 Aug 2016 18:13:20 +0000 (11:13 -0700)]
Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
"Only three fixes this time:
- Emil found an overflow problem with the memory layout sanity check.
- Ard Biesheuvel noticed that late-allocated page tables (for EFI)
weren't being properly constructed.
- Guenter Roeck reported a problem found on qemu caused by the recent
addr_limit changes"
* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: fix address limit restoration for undefined instructions
ARM: 8591/1: mm: use fully constructed struct pages for EFI pgd allocations
ARM: 8590/1: sanity_check_meminfo(): avoid overflow on vmalloc_limit
Linus Torvalds [Thu, 18 Aug 2016 18:09:43 +0000 (11:09 -0700)]
Merge tag 'pm-4.8-rc3' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"More hibernation-related material: one fix for a recent regression in
the core, one small cleanup of the x86-64 resume code and a
documentation update.
Specifics:
- Fix a hibernate core regression resulting from uncovering a latent
bug in its implementation of memory bitmaps by a recent commit
(James Morse).
- Use __pa() to compute a physical address in the x86-64 code
finalizing resume from hibernation (Rafael Wysocki).
- Update power management documentation related to system sleep
states to remove outdated information from it and to add a
description of a recently introduced hibernation debug feature to
it (Rafael Wysocki)"
* tag 'pm-4.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / hibernate: Fix rtree_next_node() to avoid walking off list ends
x86/power/64: Use __pa() for physical address computation
PM / sleep: Update some system sleep documentation
Linus Torvalds [Thu, 18 Aug 2016 17:58:50 +0000 (10:58 -0700)]
Merge tag 'drm-fixes-for-4.8-rc3' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Pretty quiet so far:
- a few amdgpu/radeon fixup for pcie pm changes
- a couple of amdgpu fixes
- some build fixes
- printk fix"
* tag 'drm-fixes-for-4.8-rc3' of git://people.freedesktop.org/~airlied/linux:
drm/amdgpu: Change GART offset to 64-bit
drm/mediatek: add ARM_SMCCC dependency
drm/mediatek: add CONFIG_OF dependency
drm/mediatek: add COMMON_CLK dependency
drm/amdgpu: Fix memory trashing if UVD ring test fails
drm/amdgpu: fix vm init error path
drm/amdkfd: print doorbell offset as a hex value
Revert "drm/radeon: work around lack of upstream ACPI support for D3cold"
Revert "drm/amdgpu: work around lack of upstream ACPI support for D3cold"
Josh Poimboeuf [Thu, 18 Aug 2016 15:59:08 +0000 (10:59 -0500)]
x86/dumpstack: Remove 64-byte gap at end of irq stack
There has been a 64-byte gap at the end of the irq stack for at least 12
years. It predates git history, and I can't find any good reason for
it. Remove it. What's the worst that could happen?
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/14f9281c5475cc44af95945ea7546bff2e3836db.1471535549.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Josh Poimboeuf [Thu, 18 Aug 2016 15:59:07 +0000 (10:59 -0500)]
proc: Fix return address printk conversion specifer in /proc/<pid>/stack
When printing call return addresses found on a stack, /proc/<pid>/stack
can sometimes give a confusing result. If the call instruction was the
last instruction in the function (which can happen when calling a
noreturn function), '%pS' will incorrectly display the name of the
function which happens to be next in the object code, rather than the
name of the actual calling function.
Use '%pB' instead, which was created for this exact purpose.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/47ad2821e5ebdbed1fbf83fb85424ae4fbdf8b6e.1471535549.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Josh Poimboeuf [Thu, 18 Aug 2016 15:59:06 +0000 (10:59 -0500)]
x86/dumpstack: Fix x86_32 kernel_stack_pointer() previous stack access
On x86_32, when an interrupt happens from kernel space, SS and SP aren't
pushed and the existing stack is used. So pt_regs is effectively two
words shorter, and the previous stack pointer is normally the memory
after the shortened pt_regs, aka '®s->sp'.
But in the rare case where the interrupt hits right after the stack
pointer has been changed to point to an empty stack, like for example
when call_on_stack() is used, the address immediately after the
shortened pt_regs is no longer on the stack. In that case, instead of
'®s->sp', the previous stack pointer should be retrieved from the
beginning of the current stack page.
kernel_stack_pointer() wants to do that, but it forgets to dereference
the pointer. So instead of returning a pointer to the previous stack,
it returns a pointer to the beginning of the current stack.
Note that it's probably outside of kernel_stack_pointer()'s scope to be
switching stacks at all. The x86_64 version of this function doesn't do
it, and it would be better for the caller to do it if necessary. But
that's a patch for another day. This just fixes the original intent.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes:
0788aa6a23cb ("x86: Prepare removal of previous_esp from i386 thread_info structure")
Link: http://lkml.kernel.org/r/472453d6e9f6a2d4ab16aaed4935f43117111566.1471535549.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Josh Poimboeuf [Thu, 18 Aug 2016 15:59:05 +0000 (10:59 -0500)]
x86/head: Remove useless zeroed word
This zeroed word has no apparent purpose, so remove it.
Brian Gerst says:
"FYI the word used to be the SS segment selector for the LSS
instruction, which isn't needed in 64-bit mode."
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/b056855c295bbb3825b97c1e9f7958539a4d6cf2.1471535549.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Josh Poimboeuf [Thu, 18 Aug 2016 15:59:04 +0000 (10:59 -0500)]
x86/dumpstack: Remove extra brackets around "<EOE>"
When starting the dump of an exception stack, it shows "<<EOE>>" instead
of "<EOE>". print_trace_stack() already adds brackets, no need to add
them again.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/77f185fd5b81845869b400aa619415458df6b6cc.1471535549.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Josh Poimboeuf [Thu, 18 Aug 2016 15:59:03 +0000 (10:59 -0500)]
x86/asm/head: Rename 'stack_start' -> 'initial_stack'
The 'stack_start' variable is similar in usage to 'initial_code' and
'initial_gs': they're all stored in head_64.S and they're all updated by
SMP and ACPI suspend before starting a CPU.
Rename it to 'initial_stack' to be consistent with the others.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/87063d773a3212051b77e17b0ee427f6582a5050.1471535549.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Josh Poimboeuf [Thu, 18 Aug 2016 15:59:02 +0000 (10:59 -0500)]
x86/asm/head: Remove unused init_rsp variable extern
There is no init_rsp variable. Remove its extern.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/c183bbecd5730d84e8c6aff3824537c1c1bf3591.1471535549.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Josh Poimboeuf [Thu, 18 Aug 2016 15:59:01 +0000 (10:59 -0500)]
x86/dumpstack: Remove show_trace()
There are a bewildering array of options for dumping the stack.
Simplify things a little by removing show_trace(), which is unused.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/fe02292eac9d409001ec0cf6d06f90ced242570d.1471535549.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Thu, 18 Aug 2016 16:41:12 +0000 (18:41 +0200)]
Merge branch 'x86/urgent' into x86/asm, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Johannes Berg [Thu, 11 Aug 2016 09:50:22 +0000 (11:50 +0200)]
locking/barriers: Suppress sparse warnings in lockless_dereference()
After Peter's commit:
331b6d8c7afc ("locking/barriers: Validate lockless_dereference() is used on a pointer type")
... we get a lot of sparse warnings (one for every rcu_dereference, and more)
since the expression here is assigning to the wrong address space.
Instead of validating that 'p' is a pointer this way, instead make
it fail compilation when it's not by using sizeof(*(p)). This will
not cause any sparse warnings (tested, likely since the address
space is irrelevant for sizeof), and will fail compilation when
'p' isn't a pointer type.
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes:
331b6d8c7afc ("locking/barriers: Validate lockless_dereference() is used on a pointer type")
Link: http://lkml.kernel.org/r/1470909022-687-2-git-send-email-johannes@sipsolutions.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Johannes Berg [Thu, 11 Aug 2016 09:50:21 +0000 (11:50 +0200)]
Revert "drm/fb-helper: Reduce READ_ONCE(master) to lockless_dereference"
This reverts commit:
fa7d81bb3c269 ("drm/fb-helper: Reduce READ_ONCE(master) to lockless_dereference")
As Peter explained:
[...] lockless_dereference() is _stronger_ than READ_ONCE(), not weaker.
[...]
Also, clue is in the name: 'dereference', you don't actually dereference
the pointer here, only load it.
My next patch breaks the compile without this revert, because it assumes
you want to deference and thus also need the struct type visible (which
it isn't here), so revert it.
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1470909022-687-1-git-send-email-johannes@sipsolutions.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Catalin Marinas [Fri, 5 Dec 2014 12:34:54 +0000 (12:34 +0000)]
arm64: Fix shift warning in arch/arm64/mm/dump.c
When building with 48-bit VAs and 16K page configuration, it's possible
to get the following warning when building the arm64 page table dumping
code:
arch/arm64/mm/dump.c: In function ‘walk_pud’:
arch/arm64/mm/dump.c:274:102: warning: right shift count >= width of type [-Wshift-count-overflow]
This is because pud_offset(pgd, 0) performs a shift to the right by 36
while the value 0 has the type 'int' by default, therefore 32-bit.
This patch modifies all the p*_offset() uses in arch/arm64/mm/dump.c to
use 0UL for the address argument.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Wanpeng Li [Wed, 17 Aug 2016 02:05:46 +0000 (10:05 +0800)]
sched/cputime: Resync steal time when guest & host lose sync
Commit:
57430218317e ("sched/cputime: Count actually elapsed irq & softirq time")
... fixed a bug but also triggered a regression:
On an i5 laptop, 4 pCPUs, 4vCPUs for one full dynticks guest, there are four
CPU hog processes(for loop) running in the guest, I hot-unplug the pCPUs
on host one by one until there is only one left, then observe CPU utilization
via 'top' in the guest, it shows:
100% st for cpu0(housekeeping)
75% st for other CPUs (nohz full mode)
However, w/o this commit it shows the correct 75% for all four CPUs.
When a guest is interrupted for a longer amount of time, missed clock ticks
are not redelivered later. Because of that, we should not limit the amount
of steal time accounted to the amount of time that the calling functions
think have passed.
However, the interval returned by account_other_time() is NOT rounded down
to the nearest jiffy, while the base interval in get_vtime_delta() it is
subtracted from is, so the max cputime limit is required to avoid underflow.
This patch fixes the regression by limiting the account_other_time() from
get_vtime_delta() to avoid underflow, and lets the other three call sites
(in account_other_time() and steal_account_process_time()) account however
much steal time the host told us elapsed.
Suggested-by: Rik van Riel <riel@redhat.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm@vger.kernel.org
Link: http://lkml.kernel.org/r/1471399546-4069-1-git-send-email-wanpeng.li@hotmail.com
[ Improved the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Mon, 15 Aug 2016 16:38:42 +0000 (18:38 +0200)]
sched/cputime: Fix NO_HZ_FULL getrusage() monotonicity regression
Mike reports:
Roughly 10% of the time, ltp testcase getrusage04 fails:
getrusage04 0 TINFO : Expected timers granularity is 4000 us
getrusage04 0 TINFO : Using 1 as multiply factor for max [us]time increment (1000+4000us)!
getrusage04 0 TINFO : utime: 0us; stime: 179us
getrusage04 0 TINFO : utime: 3751us; stime: 0us
getrusage04 1 TFAIL : getrusage04.c:133: stime increased > 5000us:
And tracked it down to the case where the task simply doesn't get
_any_ [us]time ticks.
Update the code to assume all rtime is utime when we lack information,
thus ensuring a task that elides the tick gets time accounted.
Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Tested-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Fredrik Markstrom <fredrik.markstrom@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim <rkrcmar@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: stable@vger.kernel.org # 4.3+
Fixes:
9d7fb0427648 ("sched/cputime: Guarantee stime + utime == rtime")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
David Carrillo-Cisneros [Wed, 17 Aug 2016 20:55:04 +0000 (13:55 -0700)]
perf/core: Check return value of the perf_event_read() IPI
The call to smp_call_function_single in perf_event_read() may fail if
an invalid or not online CPU index is passed. Warn user if such bug is
present and return error.
Signed-off-by: David Carrillo-Cisneros <davidcc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1471467307-61171-2-git-send-email-davidcc@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Mathieu Poirier [Mon, 18 Jul 2016 16:43:07 +0000 (10:43 -0600)]
perf/core: Enable mapping of the stop filters
At this time the perf_addr_filter_needs_mmap() function will _not_
return true on a user space 'stop' filter. But stop filters need
exactly the same kind of mapping that range and start filters get.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1468860187-318-4-git-send-email-mathieu.poirier@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Mathieu Poirier [Mon, 18 Jul 2016 16:43:06 +0000 (10:43 -0600)]
perf/core: Update filters only on executable mmap
Function perf_event_mmap() is called by the MM subsystem each time
part of a binary is loaded in memory. There can be several mapping
for a binary, many times unrelated to the code section.
Each time a section of a binary is mapped address filters are
updated, event when the map doesn't pertain to the code section.
The end result is that filters are configured based on the last map
event that was received rather than the last mapping of the code
segment.
For example if we have an executable 'main' that calls library
'libcstest.so.1.0', and that we want to collect traces on code
that is in that library. The perf cmd line for this scenario
would be:
perf record -e cs_etm// --filter 'filter 0x72c/0x40@/opt/lib/libcstest.so.1.0' --per-thread ./main
Resulting in binaries being mapped this way:
root@linaro-nano:~# cat /proc/1950/maps
00400000-
00401000 r-xp
00000000 08:02 33169 /home/linaro/main
00410000-
00411000 r--p
00000000 08:02 33169 /home/linaro/main
00411000-
00412000 rw-p
00001000 08:02 33169 /home/linaro/main
7fa2464000-
7fa2474000 rw-p
00000000 00:00 0
7fa2474000-
7fa25a4000 r-xp
00000000 08:02 543 /lib/aarch64-linux-gnu/libc-2.21.so
7fa25a4000-
7fa25b3000 ---p
00130000 08:02 543 /lib/aarch64-linux-gnu/libc-2.21.so
7fa25b3000-
7fa25b7000 r--p
0012f000 08:02 543 /lib/aarch64-linux-gnu/libc-2.21.so
7fa25b7000-
7fa25b9000 rw-p
00133000 08:02 543 /lib/aarch64-linux-gnu/libc-2.21.so
7fa25b9000-
7fa25bd000 rw-p
00000000 00:00 0
7fa25bd000-
7fa25be000 r-xp
00000000 08:02 38308 /opt/lib/libcstest.so.1.0
7fa25be000-
7fa25cd000 ---p
00001000 08:02 38308 /opt/lib/libcstest.so.1.0
7fa25cd000-
7fa25ce000 r--p
00000000 08:02 38308 /opt/lib/libcstest.so.1.0
7fa25ce000-
7fa25cf000 rw-p
00001000 08:02 38308 /opt/lib/libcstest.so.1.0
7fa25cf000-
7fa25eb000 r-xp
00000000 08:02 574 /lib/aarch64-linux-gnu/ld-2.21.so
7fa25ef000-
7fa25f2000 rw-p
00000000 00:00 0
7fa25f7000-
7fa25f9000 rw-p
00000000 00:00 0
7fa25f9000-
7fa25fa000 r--p
00000000 00:00 0 [vvar]
7fa25fa000-
7fa25fb000 r-xp
00000000 00:00 0 [vdso]
7fa25fb000-
7fa25fc000 r--p
0001c000 08:02 574 /lib/aarch64-linux-gnu/ld-2.21.so
7fa25fc000-
7fa25fe000 rw-p
0001d000 08:02 574 /lib/aarch64-linux-gnu/ld-2.21.so
7ff2ea8000-
7ff2ec9000 rw-p
00000000 00:00 0 [stack]
root@linaro-nano:~#
Before 'main()' can execute 'libcstest.so.1.0' has to be loaded in
memory. Once that has been done perf_event_mmap() has been called
4 times, with the last map starting at address 0x7fa25ce000 and
the address filter configured to start filtering when the
IP has passed over address 0x0x7fa25ce72c (0x7fa25ce000 + 0x72c).
But that is wrong since the code segment for library 'libcstest.so.1.0'
as been mapped at 0x7fa25bd000, resulting in traces not being
collected.
This patch corrects the situation by requesting that address
filters be updated only if the mapped event is for a code
segment.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1468860187-318-3-git-send-email-mathieu.poirier@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Mathieu Poirier [Mon, 18 Jul 2016 16:43:05 +0000 (10:43 -0600)]
perf/core: Fix file name handling for start/stop filters
Binary file names have to be supplied for both range and start/stop
filters but the current code only processes the filename if an
address range filter is specified. This code adds processing of
the filename for start/stop filters.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1468860187-318-2-git-send-email-mathieu.poirier@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Tue, 16 Aug 2016 11:33:26 +0000 (13:33 +0200)]
perf/core: Fix event_function_local()
Vincent reported triggering the WARN_ON_ONCE() in event_function_local().
While thinking through cases I noticed that by using event_function()
directly, we miss the inactive case usually handled by
event_function_call().
Therefore construct a blend of event_function_call() and
event_function() that handles the cases relevant to
event_function_local().
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org # 4.5+
Fixes:
fae3fde65138 ("perf: Collapse and fix event_function_call() users")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Jiri Olsa [Mon, 15 Aug 2016 10:17:00 +0000 (12:17 +0200)]
x86/smp: Fix __max_logical_packages value setup
Frank reported kernel panic when he disabled several cores in BIOS
via following option:
Core Disable Bitmap(Hex) [0]
with number 0xFFE, which leaves 16 CPUs in system (out of 48).
The kernel panic below goes along with following messages:
smpboot: Max logical packages: 2^M
smpboot: APIC(0) Converting physical 0 to logical package 0^M
smpboot: APIC(20) Converting physical 1 to logical package 1^M
smpboot: APIC(40) Package 2 exceeds logical package map^M
smpboot: CPU 8 APICId 40 disabled^M
smpboot: APIC(60) Package 3 exceeds logical package map^M
smpboot: CPU 12 APICId 60 disabled^M
...
general protection fault: 0000 [#1] SMP^M
Modules linked in:^M
CPU: 15 PID: 1 Comm: swapper/0 Not tainted 4.7.0-rc5+ #1^M
Hardware name: SGI UV300/UV300, BIOS SGI UV 300 series BIOS 05/25/2016^M
task:
ffff8801673e0000 ti:
ffff8801673ac000 task.ti:
ffff8801673ac000^M
RIP: 0010:[<
ffffffff81014d54>] [<
ffffffff81014d54>] uncore_change_context+0xd4/0x180^M
...
[<
ffffffff810158ac>] uncore_event_init_cpu+0x6c/0x70^M
[<
ffffffff81d8c91c>] intel_uncore_init+0x1c2/0x2dd^M
[<
ffffffff81d8c75a>] ? uncore_cpu_setup+0x17/0x17^M
[<
ffffffff81002190>] do_one_initcall+0x50/0x190^M
[<
ffffffff810ab193>] ? parse_args+0x293/0x480^M
[<
ffffffff81d87365>] kernel_init_freeable+0x1a5/0x249^M
[<
ffffffff81d86a35>] ? set_debug_rodata+0x12/0x12^M
[<
ffffffff816dc19e>] kernel_init+0xe/0x110^M
[<
ffffffff816e93bf>] ret_from_fork+0x1f/0x40^M
[<
ffffffff816dc190>] ? rest_init+0x80/0x80^M
The reason for the panic is wrong value of __max_logical_packages,
which lets logical_package_map uninitialized and the uncore code
relying on this map being properly initialized (maybe we should
add some safety checks there as well).
The __max_logical_packages is computed as:
DIV_ROUND_UP(total_cpus, ncpus);
- ncpus being number of cores
With above BIOS setup we get total_cpus == 16 which set
__max_logical_packages to 2 (ncpus is 12).
Once topology_update_package_map processes CPU with logical
pkg over 2 we display above messages and fail to initialize
the physical_to_logical_pkg map, which makes the uncore code
crash.
The fix is to remove logical_package_map bitmap completely
and keep and update the logical_packages number instead.
After we enumerate all the present CPUs, we check if the
enumerated logical packages count is within its computed
maximum from BIOS data.
If it's not the case, we set this maximum to the new enumerated
value and freeze any new addition of logical packages.
The freeze is because lot of init code like uncore/rapl/cqm
depends on having maximum logical package value set to allocate
their data, so we can't change it later on.
Prarit Bhargava tested the patch and confirms that it solves
the problem:
From dmidecode:
Core Count: 24
Core Enabled: 24
Thread Count: 48
Orig kernel boot log:
[ 0.464981] smpboot: Max logical packages: 19
[ 0.469861] smpboot: APIC(0) Converting physical 0 to logical package 0
[ 0.477261] smpboot: APIC(40) Converting physical 1 to logical package 1
[ 0.484760] smpboot: APIC(80) Converting physical 2 to logical package 2
[ 0.492258] smpboot: APIC(c0) Converting physical 3 to logical package 3
1. nr_cpus=8, should stop enumerating in package 0:
[ 0.533664] smpboot: APIC(0) Converting physical 0 to logical package 0
[ 0.539596] smpboot: Max logical packages: 19
2. max_cpus=8, should still enumerate all packages:
[ 0.526494] smpboot: APIC(0) Converting physical 0 to logical package 0
[ 0.532428] smpboot: APIC(40) Converting physical 1 to logical package 1
[ 0.538456] smpboot: APIC(80) Converting physical 2 to logical package 2
[ 0.544486] smpboot: APIC(c0) Converting physical 3 to logical package 3
[ 0.550524] smpboot: Max logical packages: 19
3. nr_cpus=49 ( 2 socket + 1 core on 3rd socket), should stop enumerating in
package 2:
[ 0.521378] smpboot: APIC(0) Converting physical 0 to logical package 0
[ 0.527314] smpboot: APIC(40) Converting physical 1 to logical package 1
[ 0.533345] smpboot: APIC(80) Converting physical 2 to logical package 2
[ 0.539368] smpboot: Max logical packages: 19
4. maxcpus=49, should still enumerate all packages:
[ 0.525591] smpboot: APIC(0) Converting physical 0 to logical package 0
[ 0.531525] smpboot: APIC(40) Converting physical 1 to logical package 1
[ 0.537547] smpboot: APIC(80) Converting physical 2 to logical package 2
[ 0.543579] smpboot: APIC(c0) Converting physical 3 to logical package 3
[ 0.549624] smpboot: Max logical packages: 19
5. kdump (nr_cpus=1) works as well.
Reported-by: Frank Ramsay <framsay@redhat.com>
Tested-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Reviewed-by: Prarit Bhargava <prarit@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160815101700.GA30090@krava
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Borislav Petkov [Wed, 17 Aug 2016 11:33:14 +0000 (13:33 +0200)]
x86/microcode/AMD: Fix initrd loading with CONFIG_RANDOMIZE_MEMORY=y
Similar to:
efaad554b4ff ("x86/microcode/intel: Fix initrd loading with CONFIG_RANDOMIZE_MEMORY=y")
... fix microcode loading from the initrd on AMD by adding the
randomization offset to the microcode patch container within the initrd.
Reported-and-tested-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-tip-commits@vger.kernel.org
Link: http://lkml.kernel.org/r/20160817113314.GA19221@nazgul.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Oleg Nesterov [Wed, 17 Aug 2016 15:36:29 +0000 (17:36 +0200)]
uprobes: Fix the memcg accounting
__replace_page() wronlgy calls mem_cgroup_cancel_charge() in "success" path,
it should only do this if page_check_address() fails.
This means that every enable/disable leads to unbalanced mem_cgroup_uncharge()
from put_page(old_page), it is trivial to underflow the page_counter->count
and trigger OOM.
Reported-and-tested-by: Brenden Blanco <bblanco@plumgrid.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: stable@vger.kernel.org # 3.17+
Fixes:
00501b531c47 ("mm: memcontrol: rewrite charge API")
Link: http://lkml.kernel.org/r/20160817153629.GB29724@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Dave Airlie [Thu, 18 Aug 2016 02:51:27 +0000 (12:51 +1000)]
Merge branch 'drm-fixes-4.8' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
Single 64-bit gart size fix.
* 'drm-fixes-4.8' of git://people.freedesktop.org/~agd5f/linux:
drm/amdgpu: Change GART offset to 64-bit
Rafael J. Wysocki [Thu, 18 Aug 2016 01:27:08 +0000 (03:27 +0200)]
Merge branch 'pm-sleep'
* pm-sleep:
PM / hibernate: Fix rtree_next_node() to avoid walking off list ends
x86/power/64: Use __pa() for physical address computation
PM / sleep: Update some system sleep documentation
Linus Torvalds [Thu, 18 Aug 2016 00:26:58 +0000 (17:26 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Buffers powersave frame test is reversed in cfg80211, fix from Felix
Fietkau.
2) Remove bogus WARN_ON in openvswitch, from Jarno Rajahalme.
3) Fix some tg3 ethtool logic bugs, and one that would cause no
interrupts to be generated when rx-coalescing is set to 0. From
Satish Baddipadige and Siva Reddy Kallam.
4) QLCNIC mailbox corruption and napi budget handling fix from Manish
Chopra.
5) Fix fib_trie logic when walking the trie during /proc/net/route
output than can access a stale node pointer. From David Forster.
6) Several sctp_diag fixes from Phil Sutter.
7) PAUSE frame handling fixes in mlxsw driver from Ido Schimmel.
8) Checksum fixup fixes in bpf from Daniel Borkmann.
9) Memork leaks in nfnetlink, from Liping Zhang.
10) Use after free in rxrpc, from David Howells.
11) Use after free in new skb_array code of macvtap driver, from Jason
Wang.
12) Calipso resource leak, from Colin Ian King.
13) mediatek bug fixes (missing stats sync init, etc.) from Sean Wang.
14) Fix bpf non-linear packet write helpers, from Daniel Borkmann.
15) Fix lockdep splats in macsec, from Sabrina Dubroca.
16) hv_netvsc bug fixes from Vitaly Kuznetsov, mostly to do with VF
handling.
17) Various tc-action bug fixes, from CONG Wang.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits)
net_sched: allow flushing tc police actions
net_sched: unify the init logic for act_police
net_sched: convert tcf_exts from list to pointer array
net_sched: move tc offload macros to pkt_cls.h
net_sched: fix a typo in tc_for_each_action()
net_sched: remove an unnecessary list_del()
net_sched: remove the leftover cleanup_a()
mlxsw: spectrum: Allow packets to be trapped from any PG
mlxsw: spectrum: Unmap 802.1Q FID before destroying it
mlxsw: spectrum: Add missing rollbacks in error path
mlxsw: reg: Fix missing op field fill-up
mlxsw: spectrum: Trap loop-backed packets
mlxsw: spectrum: Add missing packet traps
mlxsw: spectrum: Mark port as active before registering it
mlxsw: spectrum: Create PVID vPort before registering netdevice
mlxsw: spectrum: Remove redundant errors from the code
mlxsw: spectrum: Don't return upon error in removal path
i40e: check for and deal with non-contiguous TCs
ixgbe: Re-enable ability to toggle VLAN filtering
ixgbe: Force VLNCTRL.VFE to be set in all VMDq paths
...
David S. Miller [Wed, 17 Aug 2016 23:27:58 +0000 (19:27 -0400)]
Merge branch 'tc_action-fixes'
Cong Wang says:
====================
net_sched: tc action fixes and updates
This patchset fixes a few regressions caused by the previous
code refactor and more. Thanks to Jamal for catching them!
Note, patch 3/7 and 4/7 are not strictly necessary for this patchset,
I just want to carry them together.
---
v4: adjust an indention for Jamal
add two more patches
v3: avoid list for fast path, suggested by Jamal
v2: replace flex_array with regular dynamic array
keep tcf_action_stats_update() in act_api.h
fix macro typos found by Amir
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Roman Mashak [Sun, 14 Aug 2016 05:35:02 +0000 (22:35 -0700)]
net_sched: allow flushing tc police actions
The act_police uses its own code to walk the
action hashtable, which leads to that we could
not flush standalone tc police actions, so just
switch to tcf_generic_walker() like other actions.
(Joint work from Roman and Cong.)
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Sun, 14 Aug 2016 05:35:01 +0000 (22:35 -0700)]
net_sched: unify the init logic for act_police
Jamal reported a crash when we create a police action
with a specific index, this is because the init logic
is not correct, we should always create one for this
case. Just unify the logic with other tc actions.
Fixes:
a03e6fe56971 ("act_police: fix a crash during removal")
Reported-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Sun, 14 Aug 2016 05:35:00 +0000 (22:35 -0700)]
net_sched: convert tcf_exts from list to pointer array
As pointed out by Jamal, an action could be shared by
multiple filters, so we can't use list to chain them
any more after we get rid of the original tc_action.
Instead, we could just save pointers to these actions
in tcf_exts, since they are refcount'ed, so convert
the list to an array of pointers.
The "ugly" part is the action API still accepts list
as a parameter, I just introduce a helper function to
convert the array of pointers to a list, instead of
relying on the C99 feature to iterate the array.
Fixes:
a85a970af265 ("net_sched: move tc_action into tcf_common")
Reported-by: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Sun, 14 Aug 2016 05:34:59 +0000 (22:34 -0700)]
net_sched: move tc offload macros to pkt_cls.h
struct tcf_exts belongs to filters, should not be visible
to plain tc actions.
Cc: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Sun, 14 Aug 2016 05:34:58 +0000 (22:34 -0700)]
net_sched: fix a typo in tc_for_each_action()
It is harmless because all users pass 'a' to this macro.
Fixes:
00175aec941e ("net/sched: Macro instead of CONFIG_NET_CLS_ACT ifdef")
Cc: Amir Vadai <amir@vadai.me>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Sun, 14 Aug 2016 05:34:57 +0000 (22:34 -0700)]
net_sched: remove an unnecessary list_del()
This list_del() for tc action is not needed actually,
because we only use this list to chain bulk operations,
therefore should not be carried for latter operations.
Fixes:
ec0595cc4495 ("net_sched: get rid of struct tcf_common")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Sun, 14 Aug 2016 05:34:56 +0000 (22:34 -0700)]
net_sched: remove the leftover cleanup_a()
After refactoring tc_action into tcf_common, we no
longer need to cleanup temporary "actions" in list,
they are permanently stored in the hashtable.
Fixes:
a85a970af265 ("net_sched: move tc_action into tcf_common")
Reported-by: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 17 Aug 2016 23:20:24 +0000 (19:20 -0400)]
Merge branch '1GbE' of git://git./linux/kernel/git/jkirsher/net-queue
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2016-08-16
This series contains fixes to e1000e, igb, ixgbe and i40e.
Kshitiz Gupta provides a fix for igb to resolve the PHY delay compensation
math in several functions.
Jarod Wilson provides a fix for e1000e which had to broken up into 2
patches, first is prepares the driver for expanding the list of NICs
that have occasional ~10 hour clock jumps when being used for PTP.
Second patch actually fixes i218 silicon which has been experiencing
the clock jumps while using PTP.
Alex provides 2 patches for ixgbe now that he is back at Intel. First
fixes setting VLNCTRL.VFE bit, which was left unchanged in earlier patches
which resulted in disabling VLAN filtering for all the VFs. Second
corrects the support for disabling the VLAN tag filtering via the
feature bit.
Lastly, David fixes i40e which was causing a kernel panic when
non-contiguous traffic classes or traffic classes not starting with TC0,
were configured on a link partner switch. To fix this, changed the
logic when determining the total number of TCs enabled.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 17 Aug 2016 23:18:34 +0000 (19:18 -0400)]
Merge branch 'mlxsw-fixes'
Jiri Pirko says:
====================
mlxsw: IPv4 UC router fixes
Ido says:
Patches 1-3 fix a long standing problem in the driver's init sequence,
which manifests itself quite often when routing daemons try to configure
an IP address on registered netdevs that don't yet have an associated
vPort.
Patches 4-9 add missing packet traps for the router to work properly and
also fix ordering issue following the recent changes to the driver's init
sequence.
The last patch isn't related to the router, but fixes a general problem
in which under certain conditions packets aren't trapped to CPU.
v1->v2:
- Change order of patch 7
- Add patch 6 following Ilan's comment
- Add patchset name and cover letter
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 17 Aug 2016 14:39:37 +0000 (16:39 +0200)]
mlxsw: spectrum: Allow packets to be trapped from any PG
When packets enter the device they are classified to a priority group
(PG) buffer based on their PCP value. After their egress port and
traffic class are determined they are moved to the switch's shared
buffer and await transmission, if:
(Ingress{Port}.Usage < Thres && Ingress{Port,PG}.Usage < Thres &&
Egress{Port}.Usage < Thres && Egress{Port,TC}.Usage < Thres)
||
(Ingress{Port}.Usage < Min || Ingress{Port,PG} < Min ||
Egress{Port}.Usage < Min || Egress{Port,TC}.Usage < Min)
Packets scheduled to transmission through CPU port (trapped to CPU) use
traffic class 7, which has a zero maximum and minimum quotas. However,
when such packets arrive from PG 0 they are admitted to the shared
buffer as PG 0 has a non-zero minimum quota.
Allow all packets to be trapped to the CPU - regardless of the PG they
were classified to - by assigning a 10KB minimum quota for CPU port and
TC7.
Fixes:
8e8dfe9fdf06 ("mlxsw: spectrum: Add IEEE 802.1Qaz ETS support")
Reported-by: Tamir Winetroub <tamirw@mellanox.com>
Tested-by: Tamir Winetroub <tamirw@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 17 Aug 2016 14:39:36 +0000 (16:39 +0200)]
mlxsw: spectrum: Unmap 802.1Q FID before destroying it
Before destroying the 802.1Q FID we should first remove the VID-to-FID
mapping. This makes mlxsw_sp_fid_destroy() symmetric with regards to
mlxsw_sp_fid_create().
Fixes:
14d39461b3f4 ("mlxsw: spectrum: Use per-FID struct for the VLAN-aware bridge")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 17 Aug 2016 14:39:35 +0000 (16:39 +0200)]
mlxsw: spectrum: Add missing rollbacks in error path
While going over the code I noticed we are missing two rollbacks in the
port's creation error path. Add them and adjust the place of one of them
in the port's removal sequence so that both are symmetric.
Fixes:
56ade8fe3fe1 ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 17 Aug 2016 14:39:34 +0000 (16:39 +0200)]
mlxsw: reg: Fix missing op field fill-up
Ralue pack function needs to set op, otherwise it is 0 for add always.
Fixes:
d5a1c749d22 ("mlxsw: reg: Add Router Algorithmic LPM Unicast Entry Register definition")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 17 Aug 2016 14:39:33 +0000 (16:39 +0200)]
mlxsw: spectrum: Trap loop-backed packets
One of the conditions to generate an ICMP Redirect Message is that "the
packet is being forwarded out the same physical interface that it was
received from" (RFC 1812).
Therefore, we need to be able to trap such packets and let the kernel
decide what to do with them.
For each RIF, enable the loop-back filter, which will raise the LBERROR
trap whenever the ingress RIF equals the egress RIF.
Fixes:
99724c18fc66 ("mlxsw: spectrum: Introduce support for router interfaces")
Reported-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Elad Raz [Wed, 17 Aug 2016 14:39:32 +0000 (16:39 +0200)]
mlxsw: spectrum: Add missing packet traps
Add the following traps:
1) MTU Error: Trap packets whose size is bigger than the egress RIF's
MTU. If DF bit isn't set, traffic will continue to be routed in slow
path.
2) TTL Error: Trap packets whose TTL expired. This allows traceroute to
work properly.
3) OSPF packets.
Fixes:
7b27ce7bb9cd ("mlxsw: spectrum: Add traps needed for router implementation")
Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 17 Aug 2016 14:39:31 +0000 (16:39 +0200)]
mlxsw: spectrum: Mark port as active before registering it
Commit
bbf2a4757b30 ("mlxsw: spectrum: Initialize ports at the end of
init sequence") moved ports initialization to the end of the init
sequence, which means ports are the first to be removed during fini.
Since the FDB delayed work is still active when ports are removed it's
possible for it to process FDB notifications of inactive ports,
resulting in a warning message.
Fix that by marking ports as inactive only after unregistering them. The
NETDEV_UNREGISTER event will invoke bridge's driver port removal
sequence that will cause the FDB (and FDB notifications) to be flushed.
Fixes:
bbf2a4757b30 ("mlxsw: spectrum: Initialize ports at the end of init sequence")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 17 Aug 2016 14:39:30 +0000 (16:39 +0200)]
mlxsw: spectrum: Create PVID vPort before registering netdevice
After registering a netdevice it's possible for user space applications
to configure an IP address on it. From the driver's perspective, this
means a router interface (RIF) should be created for the PVID vPort.
Therefore, we must create the PVID vPort before registering the
netdevice.
Fixes:
99724c18fc66 ("mlxsw: spectrum: Introduce support for router interfaces")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>