Josh Poimboeuf [Tue, 29 Aug 2017 17:51:03 +0000 (12:51 -0500)]
objtool: Handle GCC stack pointer adjustment bug
Arnd Bergmann reported the following warning with GCC 7.1.1:
fs/fs_pin.o: warning: objtool: pin_kill()+0x139: stack state mismatch: cfa1=7+88 cfa2=7+96
And the kbuild robot reported the following warnings with GCC 5.4.1:
fs/fs_pin.o: warning: objtool: pin_kill()+0x182: return with modified stack frame
fs/quota/dquot.o: warning: objtool: dquot_alloc_inode()+0x140: stack state mismatch: cfa1=7+120 cfa2=7+128
fs/quota/dquot.o: warning: objtool: dquot_free_inode()+0x11a: stack state mismatch: cfa1=7+112 cfa2=7+120
Those warnings are caused by an unusual GCC non-optimization where it
uses an intermediate register to adjust the stack pointer. It does:
lea 0x8(%rsp), %rcx
...
mov %rcx, %rsp
Instead of the obvious:
add $0x8, %rsp
It makes no sense to use an intermediate register, so I opened a GCC bug
to track it:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81813
But it's not exactly a high-priority bug and it looks like we'll be
stuck with this issue for a while. So for now we have to track register
values when they're loaded with stack pointer offsets.
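The register tracking described above can be sketched roughly as follows. This is a minimal illustration in C, not objtool's actual code: the register numbering, `struct stack_state`, and both helper names are hypothetical, and offsets are kept relative to the stack pointer's value at function entry.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical register numbering, for illustration only. */
enum reg { REG_RCX, REG_RSP, NUM_REGS };

/* Per-register record: does it hold "entry %rsp + offset"? */
struct reg_val {
	bool sp_based;
	int offset;
};

struct stack_state {
	int sp_offset;			/* current %rsp relative to its entry value */
	struct reg_val vals[NUM_REGS];
};

/* lea disp(%rsp), %dst: remember that dst now holds a stack address. */
static void track_lea_from_sp(struct stack_state *s, enum reg dst, int disp)
{
	assert(dst < NUM_REGS && dst != REG_RSP);
	s->vals[dst].sp_based = true;
	s->vals[dst].offset = s->sp_offset + disp;
}

/* mov %src, %rsp: succeeds only if src is known to hold a stack address. */
static bool track_mov_to_sp(struct stack_state *s, enum reg src)
{
	assert(src < NUM_REGS);
	if (!s->vals[src].sp_based)
		return false;		/* unknown value: the stack cannot be followed */
	s->sp_offset = s->vals[src].offset;
	return true;
}
```

With the stack 96 bytes deep, `lea 0x8(%rsp), %rcx` followed by `mov %rcx, %rsp` leaves it 88 bytes deep, the same result as `add $0x8, %rsp`.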
This is kind of a big workaround for a tiny problem, but c'est la vie.
I hope to eventually create a GCC plugin to implement a big chunk of
objtool's functionality. Hopefully at that point we'll be able to
remove a lot of these GCC-isms from the objtool code.
Reported-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/6a41a96884c725e7f05413bb7df40cfe824b2444.1504028945.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Jiri Slaby [Thu, 24 Aug 2017 08:06:24 +0000 (10:06 +0200)]
x86/entry/64: Use ENTRY() instead of ALIGN+GLOBAL for stub32_clone()
ALIGN+GLOBAL is effectively what ENTRY() does, so use ENTRY() which is
dedicated for exactly this purpose -- global functions.
Note that stub32_clone() is a C-like leaf function -- it has a standard
call frame -- it only switches one argument and continues by jumping
into C. Since each ENTRY() should be balanced by some END*() marker, we
add a corresponding ENDPROC() to stub32_clone() too.
Besides that, x86's custom GLOBAL macro is going to die very soon.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170824080624.7768-2-jslaby@suse.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Jiri Slaby [Thu, 24 Aug 2017 08:06:23 +0000 (10:06 +0200)]
x86/fpu/math-emu: Add ENDPROC to functions
Functions in math-emu are annotated as ENTRY() symbols, but their
ends are not annotated at all. But these are standard functions
called from C, with proper stack register update etc.
Omitting the ends means:
* the annotations are not paired and we cannot deal with such functions
e.g. in objtool
* the symbols are not marked as functions in the object file
* there are no sizes of the functions in the object file
So fix this by adding ENDPROC() to each such case in math-emu.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170824080624.7768-1-jslaby@suse.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Jiri Slaby [Thu, 24 Aug 2017 07:33:27 +0000 (09:33 +0200)]
x86/boot/64: Extract efi_pe_entry() from startup_64()
Similarly to the 32-bit code, the efi_pe_entry() body is somehow squashed into
startup_64().
In the old days, we forced startup_64() to start at offset 0x200 and efi_pe_entry()
to start at 0x210. But this requirement was removed a long time ago, in:
99f857db8857 ("x86, build: Dynamically find entry points in compressed startup code")
The way it is now makes the code illogical and less readable. Given
we can now safely extract the inlined efi_pe_entry() body from
startup_64() into a separate function, we do so.
We also annotate the function appropriately with ENTRY+ENDPROC.
ABI offsets are preserved:
0000000000000000 T startup_32
0000000000000200 T startup_64
0000000000000390 T efi64_stub_entry
On the top-level, it looked like:
.org 0x200
ENTRY(startup_64)
#ifdef CONFIG_EFI_STUB ; start of inlined
jmp preferred_addr
GLOBAL(efi_pe_entry)
... ; a lot of assembly (efi_pe_entry)
leaq preferred_addr(%rax), %rax
jmp *%rax
preferred_addr:
#endif ; end of inlined
... ; a lot of assembly (startup_64)
ENDPROC(startup_64)
And it is now converted into:
.org 0x200
ENTRY(startup_64)
... ; a lot of assembly (startup_64)
ENDPROC(startup_64)
#ifdef CONFIG_EFI_STUB
ENTRY(efi_pe_entry)
... ; a lot of assembly (efi_pe_entry)
leaq startup_64(%rax), %rax
jmp *%rax
ENDPROC(efi_pe_entry)
#endif
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: ard.biesheuvel@linaro.org
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20170824073327.4129-2-jslaby@suse.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Jiri Slaby [Thu, 24 Aug 2017 07:33:26 +0000 (09:33 +0200)]
x86/boot/32: Extract efi_pe_entry() from startup_32()
The efi_pe_entry() body is somehow squashed into startup_32(). In the old days,
we forced startup_32() to start at offset 0x00 and efi_pe_entry() to start
at 0x10.
But this requirement was removed a long time ago, in:
99f857db8857 ("x86, build: Dynamically find entry points in compressed startup code")
The way it is now makes the code illogical and less readable. Given
we can now safely extract the inlined efi_pe_entry() body from
startup_32() into a separate function, we do so and split it into two
functions as they are marked already: efi_pe_entry() + efi32_stub_entry().
We also annotate the functions appropriately with ENTRY+ENDPROC.
ABI offset is preserved:
0000 128 FUNC GLOBAL DEFAULT 6 startup_32
0080 60 FUNC GLOBAL DEFAULT 6 efi_pe_entry
00bc 68 FUNC GLOBAL DEFAULT 6 efi32_stub_entry
On the top-level, it looked like this:
ENTRY(startup_32)
#ifdef CONFIG_EFI_STUB ; start of inlined
jmp preferred_addr
ENTRY(efi_pe_entry)
... ; a lot of assembly (efi_pe_entry)
ENTRY(efi32_stub_entry)
... ; a lot of assembly (efi32_stub_entry)
leal preferred_addr(%eax), %eax
jmp *%eax
preferred_addr:
#endif ; end of inlined
... ; a lot of assembly (startup_32)
ENDPROC(startup_32)
And it is now converted into:
ENTRY(startup_32)
... ; a lot of assembly (startup_32)
ENDPROC(startup_32)
#ifdef CONFIG_EFI_STUB
ENTRY(efi_pe_entry)
... ; a lot of assembly (efi_pe_entry)
ENDPROC(efi_pe_entry)
ENTRY(efi32_stub_entry)
... ; a lot of assembly (efi32_stub_entry)
leal startup_32(%eax), %eax
jmp *%eax
ENDPROC(efi32_stub_entry)
#endif
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: ard.biesheuvel@linaro.org
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20170824073327.4129-1-jslaby@suse.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Juergen Gross [Wed, 16 Aug 2017 17:31:57 +0000 (19:31 +0200)]
x86/lguest: Remove lguest support
Lguest seems to be rather unused these days. For the last two years it
has seen only patches that keep it building, and its official status is
"Odd Fixes".
Remove it in order to be able to clean up the paravirt code.
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: boris.ostrovsky@oracle.com
Cc: lguest@lists.ozlabs.org
Cc: rusty@rustcorp.com.au
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20170816173157.8633-3-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Juergen Gross [Wed, 16 Aug 2017 17:31:56 +0000 (19:31 +0200)]
x86/paravirt/xen: Remove xen_patch()
Xen's paravirt patch function xen_patch() does some special casing for
irq_ops functions to apply relocations when those functions can be
patched inline instead of calls.
Unfortunately none of the special case function replacements is small
enough to be patched inline, so the special case never applies.
As xen_patch() will call paravirt_patch_default() in all cases, it can
simply be dropped. xen-asm.h doesn't seem necessary without xen_patch()
as the only thing left in it would be the definition of XEN_EFLAGS_NMI
used only once. So move that definition and remove xen-asm.h.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: boris.ostrovsky@oracle.com
Cc: lguest@lists.ozlabs.org
Cc: rusty@rustcorp.com.au
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20170816173157.8633-2-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Josh Poimboeuf [Fri, 11 Aug 2017 17:24:15 +0000 (12:24 -0500)]
objtool: Fix objtool fallthrough detection with function padding
When GCC adds NOP padding between functions, those NOPs aren't
associated with a function symbol, which breaks objtool's detection of a
function falling through to another function. Instead it shows
confusing errors like:
drivers/mtd/chips/cfi_util.o: warning: objtool: cfi_qry_mode_on()+0x8b: return with modified stack frame
drivers/mtd/chips/cfi_util.o: warning: objtool: cfi_qry_mode_on()+0x0: stack state mismatch: cfa1=-4-32 cfa2=7+8
drivers/mtd/chips/cfi_cmdset_0002.o: warning: objtool: fixup_use_fwh_lock()+0x8: unknown stack-related register move
drivers/mtd/chips/cfi_cmdset_0002.o: warning: objtool: fixup_use_fwh_lock()+0x0: stack state mismatch: cfa1=6+16 cfa2=7+8
drivers/mtd/chips/cfi_cmdset_0002.o: warning: objtool: do_otp_write()+0xa: unsupported stack pointer realignment
drivers/mtd/chips/cfi_cmdset_0002.o: warning: objtool: do_otp_write()+0x0: stack state mismatch: cfa1=-4-40 cfa2=7+8
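One way to picture the padding problem is the sketch below. It is only an illustration under stated assumptions, not the change this commit makes: `struct insn`, `next_real_insn()` and `falls_into_next_func()` are hypothetical names, and an instruction is treated as padding when it is a NOP with no owning function symbol.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical decoded instruction: owning function (NULL if none) and kind. */
struct insn {
	const char *func;
	bool is_nop;
};

/* Index of the first instruction after i that is not symbol-less NOP padding, or -1. */
static int next_real_insn(const struct insn *insns, int count, int i)
{
	for (int j = i + 1; j < count; j++) {
		if (insns[j].func == NULL && insns[j].is_nop)
			continue;	/* inter-function padding: skip it */
		return j;
	}
	return -1;
}

/*
 * insns[last] is assumed to be a function's final instruction and not a
 * return or unconditional jump, so execution continues past it.
 */
static bool falls_into_next_func(const struct insn *insns, int count, int last)
{
	int next = next_real_insn(insns, count, last);

	if (next < 0 || insns[next].func == NULL || insns[last].func == NULL)
		return false;
	return strcmp(insns[next].func, insns[last].func) != 0;
}
```

Skipping the padding lets the check compare the two functions directly instead of stopping at an instruction that belongs to neither.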
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/43e7aae9a7a7710cd6df597fa9dc501da4ba0602.1502472193.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andy Lutomirski [Tue, 15 Aug 2017 05:36:19 +0000 (22:36 -0700)]
x86/xen/64: Fix the reported SS and CS in SYSCALL
When I cleaned up the Xen SYSCALL entries, I inadvertently changed
the reported segment registers. Before my patch, regs->ss was
__USER(32)_DS and regs->cs was __USER(32)_CS. After the patch, they
are FLAT_USER_CS/DS(32).
This had a couple of unfortunate effects. It confused the
opportunistic fast return logic. It also significantly increased
the risk of triggering a nasty glibc bug:
https://sourceware.org/bugzilla/show_bug.cgi?id=21269
Update the Xen entry code to change it back.
Reported-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel@lists.xenproject.org
Fixes: 8a9949bc71a7 ("x86/xen/64: Rearrange the SYSCALL entries")
Link: http://lkml.kernel.org/r/daba8351ea2764bb30272296ab9ce08a81bd8264.1502775273.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Josh Poimboeuf [Thu, 10 Aug 2017 21:37:26 +0000 (16:37 -0500)]
objtool: Track DRAP separately from callee-saved registers
When GCC realigns a function's stack, it sometimes uses %r13 as the DRAP
register, like:
push %r13
lea 0x10(%rsp), %r13
and $0xfffffffffffffff0, %rsp
pushq -0x8(%r13)
push %rbp
mov %rsp, %rbp
push %r13
...
mov -0x8(%rbp),%r13
leaveq
lea -0x10(%r13), %rsp
pop %r13
retq
Since %r13 was pushed onto the stack twice, its two stack locations need
to be stored separately. The first push of %r13 is its original value,
and the second push of %r13 is the caller's stack frame address.
Since %r13 is a callee-saved register, we need to track the stack
location of its original value separately from the DRAP register.
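The separation described above might look like the following sketch. It is a simplified model, not objtool's data structures: `struct frame_state`, its field names, and `track_push()` are hypothetical, and the stack realignment step is ignored so that offsets are simply bytes pushed since function entry.

```c
#include <stdbool.h>

/* Hypothetical register numbering, for illustration only. */
enum { REG_RBP, REG_R13, NUM_REGS };

#define NOT_SAVED 0	/* offset 0 is never a valid save slot in this model */

struct frame_state {
	int sp_offset;			/* bytes pushed so far, as a negative offset */
	int saved_at[NUM_REGS];		/* where each register's original value lives */
	bool drap_active;		/* true once a DRAP register has been set up */
	int drap_reg;
	int drap_saved_at;		/* where the DRAP value itself was pushed */
};

static void track_push(struct frame_state *s, int reg)
{
	s->sp_offset -= 8;
	if (s->drap_active && reg == s->drap_reg)
		s->drap_saved_at = s->sp_offset;	/* second push: the DRAP value */
	else if (s->saved_at[reg] == NOT_SAVED)
		s->saved_at[reg] = s->sp_offset;	/* first push: the original value */
}
```

Keeping `drap_saved_at` apart from `saved_at[]` means the second push of %r13 no longer overwrites the record of where its original value was saved.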
This fixes the following false positive warning:
lib/ubsan.o: warning: objtool: val_to_string.constprop.7()+0x97: leave instruction with modified stack frame
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: baa41469a7b9 ("objtool: Implement stack validation 2.0")
Link: http://lkml.kernel.org/r/3da23a6d4c5b3c1e21fc2ccc21a73941b97ff20a.1502401017.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Josh Poimboeuf [Thu, 10 Aug 2017 21:37:25 +0000 (16:37 -0500)]
objtool: Fix validate_branch() return codes
The validate_branch() function should never return a negative value.
Errors are treated as warnings so that even if something goes wrong,
objtool does its best to generate ORC data for the rest of the file.
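The "errors are warnings" convention can be illustrated with a small sketch. Everything here is hypothetical (`check_insn()`, `validate_all()`, and the idea that a negative number marks a bad instruction); it only shows a validator that counts problems and keeps going instead of returning a negative error code.

```c
#include <stdio.h>

/* Hypothetical stand-in for a per-instruction check: nonzero means a problem. */
static int check_insn(int insn)
{
	return insn < 0;
}

/*
 * Returns the number of warnings (never negative), so the caller can keep
 * going and still emit data for everything that validated cleanly.
 */
static int validate_all(const int *insns, int count)
{
	int warnings = 0;

	for (int i = 0; i < count; i++) {
		if (check_insn(insns[i])) {
			fprintf(stderr, "warning: insn %d looks wrong\n", i);
			warnings++;	/* record it and continue */
		}
	}
	return warnings;
}
```

A caller can then treat any nonzero result as "warnings were printed" without aborting the rest of the file.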
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: baa41469a7b9 ("objtool: Implement stack validation 2.0")
Link: http://lkml.kernel.org/r/d86671cfde823b50477cd2f6f548dfe54871e24d.1502401017.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Mon, 31 Jul 2017 10:21:54 +0000 (12:21 +0200)]
x86: Clarify/fix no-op barriers for text_poke_bp()
So I was looking at text_poke_bp() today and I couldn't make sense of
the barriers there.
How about something like this?
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: masami.hiramatsu.pt@hitachi.com
Link: http://lkml.kernel.org/r/20170731102154.f57cvkjtnbmtctk6@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andy Lutomirski [Tue, 1 Aug 2017 14:11:37 +0000 (07:11 -0700)]
x86/switch_to/64: Rewrite FS/GS switching yet again to fix AMD CPUs
Switching FS and GS is a mess, and the current code is still subtly
wrong: it assumes that "Loading a nonzero value into FS sets the
index and base", which is false on AMD CPUs if the value being
loaded is 1, 2, or 3.
(The current code came from commit 3e2b68d752c9 ("x86/asm,
sched/x86: Rewrite the FS and GS context switch code"), which made
it better but didn't fully fix it.)
Rewrite it to be much simpler and more obviously correct. This
should fix it fully on AMD CPUs and shouldn't adversely affect
performance.
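For readers wondering why 1, 2, and 3 are special: an x86 segment selector keeps its requested privilege level in bits 0-1, the table indicator in bit 2, and the descriptor index in bits 3-15, so the values 0 through 3 all name GDT entry 0 and are null selectors even though three of them are nonzero. The helper below is a hypothetical illustration of that layout, not code from this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 segment selector layout: bits 0-1 RPL, bit 2 TI, bits 3-15 index. */
#define SEL_RPL_MASK 0x3u

/* True for selectors 0..3: GDT index 0 with any RPL. */
static bool selector_is_null(uint16_t sel)
{
	return (sel & ~SEL_RPL_MASK) == 0;
}
```

A test for "nonzero" and a test for "non-null" therefore disagree on exactly the selectors 1, 2, and 3.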
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chang Seok <chang.seok.bae@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andy Lutomirski [Tue, 1 Aug 2017 14:11:36 +0000 (07:11 -0700)]
selftests/x86/fsgsbase: Test selectors 1, 2, and 3
Those are funny cases. Make sure they work.
(Something is screwy with signal handling if a selector is 1, 2, or 3.
Anyone who wants to dive into that rabbit hole is welcome to do so.)
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chang Seok <chang.seok.bae@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andy Lutomirski [Tue, 1 Aug 2017 14:11:35 +0000 (07:11 -0700)]
x86/fsgsbase/64: Report FSBASE and GSBASE correctly in core dumps
In ELF_COPY_CORE_REGS, we're copying from the current task, so
accessing thread.fsbase and thread.gsbase makes no sense. Just read
the values from the CPU registers.
In practice, the old code would have been correct most of the time
simply because thread.fsbase and thread.gsbase usually matched the
CPU registers.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chang Seok <chang.seok.bae@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andy Lutomirski [Tue, 1 Aug 2017 14:11:34 +0000 (07:11 -0700)]
x86/fsgsbase/64: Fully initialize FS and GS state in start_thread_common
execve used to leak FSBASE and GSBASE on AMD CPUs. Fix it.
The security impact of this bug is small but not quite zero -- it
could weaken ASLR when a privileged task execs a less privileged
program, but only if the program changed bitness across the exec, or the
child binary was highly unusual or actively malicious. A child
program that was compromised after the exec would not have access to
the leaked base.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chang Seok <chang.seok.bae@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Josh Poimboeuf [Mon, 7 Aug 2017 14:38:05 +0000 (09:38 -0500)]
x86/asm: Fix UNWIND_HINT_REGS macro for older binutils
Apparently the binutils 2.20 assembler can't handle the '&&' operator in
the UNWIND_HINT_REGS macro. Rearrange the macro to do without it.
This fixes the following error:
arch/x86/entry/entry_64.S: Assembler messages:
arch/x86/entry/entry_64.S:521: Error: non-constant expression in ".if" statement
arch/x86/entry/entry_64.S:521: Error: non-constant expression in ".if" statement
arch/x86/entry/entry_64.S:521: Error: non-constant expression in ".if" statement
arch/x86/entry/entry_64.S:521: Error: non-constant expression in ".if" statement
arch/x86/entry/entry_64.S:521: Error: non-constant expression in ".if" statement
arch/x86/entry/entry_64.S:521: Error: non-constant expression in ".if" statement
arch/x86/entry/entry_64.S:521: Error: non-constant expression in ".if" statement
arch/x86/entry/entry_64.S:521: Error: non-constant expression in ".if" statement
arch/x86/entry/entry_64.S:521: Error: non-constant expression in ".if" statement
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 39358a033b2e ("objtool, x86: Add facility for asm code to provide unwind hints")
Link: http://lkml.kernel.org/r/e2ad97c1ae49a484644b4aaa4dd3faa4d6d969b2.1502116651.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andy Lutomirski [Wed, 9 Aug 2017 21:39:45 +0000 (14:39 -0700)]
x86/asm/32: Fix regs_get_register() on segment registers
The segment register high words on x86_32 may contain garbage.
Teach regs_get_register() to read them as u16 instead of unsigned
long.
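The idea of reading a segment slot as 16 bits can be sketched in user-space C as below. This is an illustration only: `struct fake_regs` and `fake_get_register()` are hypothetical stand-ins rather than the kernel's `struct pt_regs` or `regs_get_register()`, and the 16-bit read assumes a little-endian layout, as on x86.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, trimmed-down register frame; not the kernel's struct pt_regs. */
struct fake_regs {
	unsigned long ax;
	unsigned long cs;	/* only the low 16 bits are meaningful */
	unsigned long ss;
};

/* Read a register slot by byte offset; segment slots are read as 16 bits. */
static unsigned long fake_get_register(const struct fake_regs *regs,
				       size_t offset, int is_segment)
{
	const unsigned char *base = (const unsigned char *)regs;

	if (is_segment) {
		uint16_t v;

		memcpy(&v, base + offset, sizeof(v));	/* low word on little-endian */
		return v;
	} else {
		unsigned long v;

		memcpy(&v, base + offset, sizeof(v));
		return v;
	}
}
```

Reading only the low word means any garbage left in the upper bits of a segment slot never reaches the caller.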
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/0b76f6dbe477b7b1a81938fddcc3c483d48f0ff2.1502314765.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andy Lutomirski [Tue, 8 Aug 2017 03:59:21 +0000 (20:59 -0700)]
x86/xen/64: Rearrange the SYSCALL entries
Xen's raw SYSCALL entries are much less weird than native. Rather
than fudging them to look like native entries, use the Xen-provided
stack frame directly.
This lets us eliminate entry_SYSCALL_64_after_swapgs and two uses of
the SWAPGS_UNSAFE_STACK paravirt hook. The SYSENTER code would
benefit from similar treatment.
This makes one change to the native code path: the compat
instruction that clears the high 32 bits of %rax is moved slightly
later. I'd be surprised if this affects performance at all.
Tested-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/7c88ed36805d36841ab03ec3b48b4122c4418d71.1502164668.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Thu, 10 Aug 2017 11:14:15 +0000 (13:14 +0200)]
Merge branch 'x86/urgent' into x86/asm, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andy Lutomirski [Tue, 8 Aug 2017 02:43:13 +0000 (19:43 -0700)]
x86/asm/64: Clear AC on NMI entries
This closes a hole in our SMAP implementation.
This patch comes from grsecurity. Good catch!
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/314cc9f294e8f14ed85485727556ad4f15bb1659.1502159503.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Linus Torvalds [Wed, 9 Aug 2017 21:30:34 +0000 (14:30 -0700)]
Merge tag 'pinctrl-v4.13-2' of git://git./linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
"These are the pin control fixes I have gathered since the return from
my vacation. They boiled in -next a while so let's get them in.
Apart from the documentation build it is purely driver fixes. Which is
nice. The Intel fixes seem kind of important.
- Fix the documentation build as the docs were moved
- Correct the UART pin list on the Intel Merrifield
- Fix pin assignment and number of pins on the Marvell Armada 37xx
pin controller
- Cover the Setzer models in the Chromebook DMI quirk in the Intel
cherryview driver so they start working
- Add the missing "sim" function to the sunxi driver
- Fix USB pin definitions on Uniphier Pro4
- Smatch fix for invalid reference in the zx pin control driver"
* tag 'pinctrl-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: generic: update references to Documentation/pinctrl.txt
pinctrl: intel: merrifield: Correct UART pin lists
pinctrl: armada-37xx: Fix number of pin in south bridge
pinctrl: armada-37xx: Fix the pin 23 on south bridge
pinctrl: cherryview: Add Setzer models to the Chromebook DMI quirk
pinctrl: sunxi: add a missing function of A10/A20 pinctrl driver
pinctrl: uniphier: fix USB3 pin assignment for Pro4
pinctrl: zte: fix dereference of 'data' in zx_set_mux()
Mel Gorman [Wed, 9 Aug 2017 07:27:11 +0000 (08:27 +0100)]
futex: Remove unnecessary warning from get_futex_key
Commit 65d8fc777f6d ("futex: Remove requirement for lock_page() in
get_futex_key()") removed an unnecessary lock_page() with the
side-effect that page->mapping needed to be treated very carefully.
Two defensive warnings were added in case any assumption was missed and
the first warning assumed a correct application would not alter a
mapping backing a futex key. Since merging, it has not triggered for
any unexpected case but Mark Rutland reported the following bug
triggering due to the first warning.
kernel BUG at kernel/futex.c:679!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 3695 Comm: syz-executor1 Not tainted 4.13.0-rc3-00020-g307fec773ba3 #3
Hardware name: linux,dummy-virt (DT)
task: ffff80001e271780 task.stack: ffff000010908000
PC is at get_futex_key+0x6a4/0xcf0 kernel/futex.c:679
LR is at get_futex_key+0x6a4/0xcf0 kernel/futex.c:679
pc : [<ffff00000821ac14>] lr : [<ffff00000821ac14>] pstate: 80000145
The fact that it's a bug instead of a warning was due to an unrelated
arm64 problem, but the warning itself triggered because the underlying
mapping changed.
This is an application issue but from a kernel perspective it's a
recoverable situation and the warning is unnecessary so this patch
removes the warning. The warning may potentially be triggered with the
following test program from Mark although it may be necessary to adjust
NR_FUTEX_THREADS to be a value smaller than the number of CPUs in the
system.
#include <linux/futex.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/time.h>
#include <unistd.h>
#define NR_FUTEX_THREADS 16
pthread_t threads[NR_FUTEX_THREADS];
void *mem;
#define MEM_PROT (PROT_READ | PROT_WRITE)
#define MEM_SIZE 65536
static int futex_wrapper(int *uaddr, int op, int val,
const struct timespec *timeout,
int *uaddr2, int val3)
{
return syscall(SYS_futex, uaddr, op, val, timeout, uaddr2, val3);
}
void *poll_futex(void *unused)
{
for (;;) {
futex_wrapper(mem, FUTEX_CMP_REQUEUE_PI, 1, NULL, mem + 4, 1);
}
}
int main(int argc, char *argv[])
{
int i;
mem = mmap(NULL, MEM_SIZE, MEM_PROT,
MAP_SHARED | MAP_ANONYMOUS, -1, 0);
printf("Mapping @ %p\n", mem);
printf("Creating futex threads...\n");
for (i = 0; i < NR_FUTEX_THREADS; i++)
pthread_create(&threads[i], NULL, poll_futex, NULL);
printf("Flipping mapping...\n");
for (;;) {
mmap(mem, MEM_SIZE, MEM_PROT,
MAP_FIXED | MAP_SHARED | MAP_ANONYMOUS, -1, 0);
}
return 0;
}
Reported-and-tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org # 4.7+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Wed, 9 Aug 2017 20:21:28 +0000 (13:21 -0700)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"The main thing is to allow empty id_tables for ACPI to make some
drivers get probed again. It looks a bit bigger than usual because it
needs some internal renaming, too.
Other than that, there is a fix for broken DSTDs, a super simple
enablement for ARM MPS, and two documentation fixes which I'd like to
see in v4.13 already"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: rephrase explanation of I2C_CLASS_DEPRECATED
i2c: allow i2c-versatile for ARM MPS platforms
i2c: designware: Some broken DSTDs use 1MiHz instead of 1MHz
i2c: designware: Print clock freq on invalid clock freq error
i2c: core: Allow empty id_table in ACPI case as well
i2c: mux: pinctrl: mention correct module name in Kconfig help text
Linus Torvalds [Wed, 9 Aug 2017 17:37:35 +0000 (10:37 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Three patches that should go into this release.
Two of them are from Paolo and fix up some corner cases with BFQ, and
the last patch is from Ming and fixes up a potential usage count
imbalance regression due to the recent NOWAIT work"
* 'for-linus' of git://git.kernel.dk/linux-block:
blk-mq: don't leak preempt counter/q_usage_counter when allocating rq failed
block, bfq: consider also in_service_entity to state whether an entity is active
block, bfq: reset in_service_entity if it becomes idle
Linus Torvalds [Wed, 9 Aug 2017 17:33:49 +0000 (10:33 -0700)]
Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
"Fix two regressions in the inside-secure driver with respect to
hmac(sha1)"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: inside-secure - fix the sha state length in hmac_sha1_setkey
crypto: inside-secure - fix invalidation check in hmac_sha1_setkey
Linus Torvalds [Wed, 9 Aug 2017 17:14:04 +0000 (10:14 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
"The pull requests are getting smaller, that's progress I suppose :-)
1) Fix infinite loop in CIPSO option parsing, from Yujuan Qi.
2) Fix remote checksum handling in VXLAN and GUE tunneling drivers,
from Koichiro Den.
3) Missing u64_stats_init() calls in several drivers, from Florian
Fainelli.
4) TCP can set the congestion window to an invalid ssthresh value
after congestion window reductions, from Yuchung Cheng.
5) Fix BPF jit branch generation on s390, from Daniel Borkmann.
6) Correct MIPS ebpf JIT merge, from David Daney.
7) Correct byte order test in BPF test_verifier.c, from Daniel
Borkmann.
8) Fix various crashes and leaks in ASIX driver, from Dean Jenkins.
9) Handle SCTP checksums properly in mlx4 driver, from Davide
Caratti.
10) We can potentially enter tcp_connect() with a cached route
already, due to fastopen, so we have to explicitly invalidate it.
11) skb_warn_bad_offload() can bark in legitimate situations, fix from
Willem de Bruijn"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (52 commits)
net: avoid skb_warn_bad_offload false positives on UFO
qmi_wwan: fix NULL deref on disconnect
ppp: fix xmit recursion detection on ppp channels
rds: Reintroduce statistics counting
tcp: fastopen: tcp_connect() must refresh the route
net: sched: set xt_tgchk_param par.net properly in ipt_init_target
net: dsa: mediatek: add adjust link support for user ports
net/mlx4_en: don't set CHECKSUM_COMPLETE on SCTP packets
qed: Fix a memory allocation failure test in 'qed_mcp_cmd_init()'
hysdn: fix to a race condition in put_log_buffer
s390/qeth: fix L3 next-hop in xmit qeth hdr
asix: Fix small memory leak in ax88772_unbind()
asix: Ensure asix_rx_fixup_info members are all reset
asix: Add rx->ax_skb = NULL after usbnet_skb_return()
bpf: fix selftest/bpf/test_pkt_md_access on s390x
netvsc: fix race on sub channel creation
bpf: fix byte order test in test_verifier
xgene: Always get clk source, but ignore if it's missing for SGMII ports
MIPS: Add missing file for eBPF JIT.
bpf, s390: fix build for libbpf and selftest suite
...
Willem de Bruijn [Tue, 8 Aug 2017 18:22:55 +0000 (14:22 -0400)]
net: avoid skb_warn_bad_offload false positives on UFO
skb_warn_bad_offload triggers a warning when an skb enters the GSO
stack at __skb_gso_segment that does not have CHECKSUM_PARTIAL
checksum offload set.
Commit b2504a5dbef3 ("net: reduce skb_warn_bad_offload() noise")
observed that SKB_GSO_DODGY producers can trigger the check and
that passing those packets through the GSO handlers will fix it
up. But, the software UFO handler will set ip_summed to
CHECKSUM_NONE.
When __skb_gso_segment is called from the receive path, this
triggers the warning again.
Make UFO set CHECKSUM_UNNECESSARY instead of CHECKSUM_NONE. On
Tx these two are equivalent. On Rx, this better matches the
skb state (checksum computed), as CHECKSUM_NONE here means no
checksum computed.
See also this thread for context:
http://patchwork.ozlabs.org/patch/799015/
Fixes: b2504a5dbef3 ("net: reduce skb_warn_bad_offload() noise")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
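The reasoning in the commit above can be modelled in a few lines of userspace C. This is a simplified sketch, not the kernel code: the enum, `needs_check()` and `ufo_result()` are hypothetical names, and the sketch assumes the receive path only complains about CHECKSUM_NONE while the transmit path accepts both CHECKSUM_PARTIAL and CHECKSUM_UNNECESSARY.

```c
#include <stdbool.h>

/* Illustrative checksum states; names follow the commit text, values are arbitrary. */
enum csum_state { CSUM_NONE, CSUM_UNNECESSARY, CSUM_COMPLETE, CSUM_PARTIAL };

/* Hypothetical model of the test that decides whether the warning fires. */
static bool needs_check(enum csum_state s, bool tx_path)
{
	if (tx_path)
		return s != CSUM_PARTIAL && s != CSUM_UNNECESSARY;

	/* Receive path: only "no checksum computed" is suspicious. */
	return s == CSUM_NONE;
}

/* State left behind by the software UFO handler, before and after the fix. */
static enum csum_state ufo_result(bool fixed)
{
	return fixed ? CSUM_UNNECESSARY : CSUM_NONE;
}
```

In this model, `needs_check(ufo_result(false), false)` is true (the old false positive on Rx), while the fixed handler passes the check on both paths.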
Bjørn Mork [Tue, 8 Aug 2017 16:02:11 +0000 (18:02 +0200)]
qmi_wwan: fix NULL deref on disconnect
qmi_wwan_disconnect is called twice when disconnecting devices with
separate control and data interfaces. The first invocation will set
the interface data to NULL for both interfaces to flag that the
disconnect has been handled. But the matching NULL check was left
out when qmi_wwan_disconnect was added, resulting in this oops:
usb 2-1.4: USB disconnect, device number 4
qmi_wwan 2-1.4:1.6 wwp0s29u1u4i6: unregister 'qmi_wwan' usb-0000:00:1d.0-1.4, WWAN/QMI device
BUG: unable to handle kernel NULL pointer dereference at 00000000000000e0
IP: qmi_wwan_disconnect+0x25/0xc0 [qmi_wwan]
PGD 0
P4D 0
Oops: 0000 [#1] SMP
Modules linked in: <stripped irrelevant module list>
CPU: 2 PID: 33 Comm: kworker/2:1 Tainted: G E 4.12.3-nr44-normandy-r1500619820+ #1
Hardware name: LENOVO 4291LR7/4291LR7, BIOS CBET4000 4.6-810-g50522254fb 07/21/2017
Workqueue: usb_hub_wq hub_event [usbcore]
task: ffff8c882b716040 task.stack: ffffb8e800d84000
RIP: 0010:qmi_wwan_disconnect+0x25/0xc0 [qmi_wwan]
RSP: 0018:ffffb8e800d87b38 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000001 RSI: ffff8c8824f3f1d0 RDI: ffff8c8824ef6400
RBP: ffff8c8824ef6400 R08: 0000000000000000 R09: 0000000000000000
R10: ffffb8e800d87780 R11: 0000000000000011 R12: ffffffffc07ea0e8
R13: ffff8c8824e2e000 R14: ffff8c8824e2e098 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff8c8835300000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000e0 CR3: 0000000229ca5000 CR4: 00000000000406e0
Call Trace:
? usb_unbind_interface+0x71/0x270 [usbcore]
? device_release_driver_internal+0x154/0x210
? qmi_wwan_unbind+0x6d/0xc0 [qmi_wwan]
? usbnet_disconnect+0x6c/0xf0 [usbnet]
? qmi_wwan_disconnect+0x87/0xc0 [qmi_wwan]
? usb_unbind_interface+0x71/0x270 [usbcore]
? device_release_driver_internal+0x154/0x210
Reported-and-tested-by: Nathaniel Roach <nroach44@gmail.com>
Fixes: c6adf77953bc ("net: usb: qmi_wwan: add qmap mux protocol support")
Cc: Daniele Palmas <dnlplm@gmail.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
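The double invocation described in the commit above can be sketched in plain C. The types and the `model_disconnect()` helper below are hypothetical stand-ins, not the driver's API; they only show why the second call needs the NULL check.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the per-device state and a USB interface. */
struct dev_state { int disconnects; };
struct intf { struct dev_state *data; };   /* models the interface data pointer */

/*
 * Model of a disconnect handler that runs once per interface. The first
 * call tears the device down and clears the data pointer of both
 * interfaces; the second call must see the NULL and return early.
 */
static void model_disconnect(struct intf *self, struct intf *sibling)
{
	struct dev_state *dev = self->data;

	if (!dev)               /* already handled via the other interface */
		return;

	dev->disconnects++;     /* the real teardown would happen here */
	self->data = NULL;
	sibling->data = NULL;
}
```

Calling the handler for the control interface and then for the data interface leaves `disconnects` at 1; without the early return the second call would dereference NULL, which is the oops shown in the commit message.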
Guillaume Nault [Tue, 8 Aug 2017 09:43:24 +0000 (11:43 +0200)]
ppp: fix xmit recursion detection on ppp channels
Commit e5dadc65f9e0 ("ppp: Fix false xmit recursion detect with two ppp
devices") dropped the xmit_recursion counter incrementation in
devices") dropped the xmit_recursion counter incrementation in
ppp_channel_push() and relied on ppp_xmit_process() for this task.
But __ppp_channel_push() can also send packets directly (using the
.start_xmit() channel callback), in which case the xmit_recursion
counter isn't incremented anymore. If such packets get routed back to
the parent ppp unit, ppp_xmit_process() won't notice the recursion and
will call ppp_channel_push() on the same channel, effectively creating
the deadlock situation that the xmit_recursion mechanism was supposed
to prevent.
This patch re-introduces the xmit_recursion counter incrementation in
ppp_channel_push(). Since the xmit_recursion variable is now part of
the parent ppp unit, incrementation is skipped if the channel doesn't
have any. This is fine because only packets routed through the parent
unit may enter the channel recursively.
Finally, we have to ensure that pch->ppp is not going to be modified
while executing ppp_channel_push(). Instead of taking this lock only
while calling ppp_xmit_process(), we now have to hold it for the full
ppp_channel_push() execution. This respects the ppp locks ordering
which requires locking ->upl before ->downl.
Fixes: e5dadc65f9e0 ("ppp: Fix false xmit recursion detect with two ppp devices")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
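A minimal single-threaded model of the recursion counter described in the commit above, under stated assumptions: `struct unit`, `struct channel`, `channel_push()` and `unit_xmit()` are hypothetical names, the counter is a plain int rather than per-CPU data, and locking is left out entirely, so the sketch shows only the detection logic and not the deadlock itself.

```c
#include <stdbool.h>
#include <stddef.h>

struct unit { int xmit_recursion; int sent; int dropped; };
struct channel { struct unit *ppp; bool loops_back; };  /* ppp: parent unit, may be NULL */

static void unit_xmit(struct unit *u, struct channel *ch);

/* Channel transmit. The packet may be routed back to the parent unit. */
static void channel_push(struct channel *ch)
{
	struct unit *u = ch->ppp;

	if (u)
		u->xmit_recursion++;        /* the increment the patch re-introduces */
	if (u && ch->loops_back)
		unit_xmit(u, ch);           /* packet comes back to the parent unit */
	if (u)
		u->xmit_recursion--;
}

/* Unit transmit. Refuses to run when a transmit is already in progress. */
static void unit_xmit(struct unit *u, struct channel *ch)
{
	if (u->xmit_recursion) {
		u->dropped++;               /* recursion detected, drop the packet */
		return;
	}
	u->xmit_recursion++;
	u->sent++;
	channel_push(ch);
	u->xmit_recursion--;
}
```

With the increment in `channel_push()`, a packet sent directly on a looping channel is dropped as soon as it re-enters `unit_xmit()`. A channel without a parent unit skips the counter, matching the commit's remark that only packets routed through the parent unit can recurse.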
Håkon Bugge [Tue, 8 Aug 2017 09:13:32 +0000 (11:13 +0200)]
rds: Reintroduce statistics counting
In commit 7e3f2952eeb1 ("rds: don't let RDS shutdown a connection
while senders are present"), refilling the receive queue was removed
from rds_ib_recv(), along with the increment of
s_ib_rx_refill_from_thread.
Commit 73ce4317bf98 ("RDS: make sure we post recv buffers")
re-introduces filling the receive queue from rds_ib_recv(), but does
not add the statistics counter. rds_ib_recv() was later renamed to
rds_ib_recv_path().
This commit reintroduces the statistics counting of
s_ib_rx_refill_from_thread and s_ib_rx_refill_from_cq.
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 8 Aug 2017 08:41:58 +0000 (01:41 -0700)]
tcp: fastopen: tcp_connect() must refresh the route
With the new TCP_FASTOPEN_CONNECT socket option, there is a possibility
to call tcp_connect() while socket sk_dst_cache is either NULL
or invalid.
+0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 4
+0 fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
+0 setsockopt(4, SOL_TCP, TCP_FASTOPEN_CONNECT, [1], 4) = 0
+0 connect(4, ..., ...) = 0
<< sk->sk_dst_cache becomes obsolete, or even set to NULL >>
+1 sendto(4, ..., 1000, MSG_FASTOPEN, ..., ...) = 1000
We need to refresh the route otherwise bad things can happen,
especially when syzkaller is running on the host :/
Fixes: 19f6d3f3c8422 ("net/tcp-fastopen: Add new API support")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Wei Wang <weiwan@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Wei Wang <weiwan@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Tue, 8 Aug 2017 07:25:25 +0000 (15:25 +0800)]
net: sched: set xt_tgchk_param par.net properly in ipt_init_target
Now xt_tgchk_param par in ipt_init_target is a local variable,
par.net is not initialized there. Later when xt_check_target
calls target's checkentry in which it may access par.net, it
would cause kernel panic.
Jaroslav found this panic when running:
# ip link add TestIface type dummy
# tc qd add dev TestIface ingress handle ffff:
# tc filter add dev TestIface parent ffff: u32 match u32 0 0 \
action xt -j CONNMARK --set-mark 4
This patch is to pass net param into ipt_init_target and set
par.net with it properly in there.
v1->v2:
As Wang Cong pointed out, I missed ipt_net_id != xt_net_id, so fix
it by also passing net_id to __tcf_ipt_init.
v2->v3:
Missed the fixes tag, so add it.
Fixes: ecb2421b5ddf ("netfilter: add and use nf_ct_netns_get/put")
Reported-by: Jaroslav Aster <jaster@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John Crispin [Mon, 7 Aug 2017 14:20:49 +0000 (16:20 +0200)]
net: dsa: mediatek: add adjust link support for user ports
Manually adjust the port settings of user ports once PHY polling has
completed. This patch extends the adjust_link callback to configure the
per port PMCR register, applying the proper values polled from the PHY.
Without this patch flow control was not always getting set up properly.
Signed-off-by: Shashidhar Lakkavalli <shashidhar.lakkavalli@openmesh.com>
Signed-off-by: Muciri Gatimu <muciri@openmesh.com>
Signed-off-by: John Crispin <john@phrozen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Davide Caratti [Thu, 3 Aug 2017 20:54:48 +0000 (22:54 +0200)]
net/mlx4_en: don't set CHECKSUM_COMPLETE on SCTP packets
If the NIC fails to validate the checksum on TCP/UDP, and validation of IP
checksum is successful, the driver subtracts the pseudo-header checksum
from the value obtained by the hardware and sets CHECKSUM_COMPLETE. Don't
do that if protocol is IPPROTO_SCTP, otherwise CRC32c validation fails.
V2: don't test MLX4_CQE_STATUS_IPV6 if MLX4_CQE_STATUS_IPV4 is set
Reported-by: Shuang Li <shuali@redhat.com>
Fixes: f8c6455bb04b ("net/mlx4_en: Extend checksum offloading by CHECKSUM COMPLETE")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
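The decision described in the commit above can be sketched as a small pure function. This is an illustrative model, not the driver code: the enum, the `rx_csum_decision()` name and its boolean inputs are hypothetical, and the branches for "hardware validated everything" and "fall back to no offload" are assumptions added to make the sketch complete.

```c
#include <stdbool.h>

enum csum_result { RESULT_NONE, RESULT_UNNECESSARY, RESULT_COMPLETE };

/* IANA protocol numbers, defined locally to keep the sketch portable. */
enum { PROTO_TCP = 6, PROTO_UDP = 17, PROTO_SCTP = 132 };

static enum csum_result rx_csum_decision(bool hw_l4_ok, bool hw_ip_ok, int l4_proto)
{
	/* Assumption: a fully validated packet needs no further checking. */
	if (hw_l4_ok && hw_ip_ok)
		return RESULT_UNNECESSARY;

	/*
	 * L4 validation failed but the IP checksum is fine: report the raw
	 * hardware sum, except for SCTP, whose CRC32c is not a ones'
	 * complement sum and cannot be derived from it.
	 */
	if (hw_ip_ok && l4_proto != PROTO_SCTP)
		return RESULT_COMPLETE;

	/* Assumption: everything else is left for software to validate. */
	return RESULT_NONE;
}
```

For example, `rx_csum_decision(false, true, PROTO_UDP)` yields `RESULT_COMPLETE`, while the same inputs with `PROTO_SCTP` yield `RESULT_NONE`.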
Linus Torvalds [Tue, 8 Aug 2017 18:42:33 +0000 (11:42 -0700)]
Merge tag 'for-linus' of git://git./linux/kernel/git/dledford/rdma
Pull rdma fixes from Doug Ledford:
"Third set of -rc fixes for 4.13 cycle
- small set of miscellaneous fixes
- a reasonably sizable set of IPoIB fixes that deal with multiple
long standing issues"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
IB/hns: checking for IS_ERR() instead of NULL
RDMA/mlx5: Fix existence check for extended address vector
IB/uverbs: Fix device cleanup
RDMA/uverbs: Prevent leak of reserved field
IB/core: Fix race condition in resolving IP to MAC
IB/ipoib: Notify on modify QP failure only when relevant
Revert "IB/core: Allow QP state transition from reset to error"
IB/ipoib: Remove double pointer assigning
IB/ipoib: Clean error paths in add port
IB/ipoib: Add get statistics support to SRIOV VF
IB/ipoib: Add multicast packets statistics
IB/ipoib: Set IPOIB_NEIGH_TBL_FLUSH after flushed completion initialization
IB/ipoib: Prevent setting negative values to max_nonsrq_conn_qp
IB/ipoib: Make sure no in-flight joins while leaving that mcast
IB/ipoib: Use cancel_delayed_work_sync when needed
IB/ipoib: Fix race between light events and interface restart
Joe Perches [Sun, 6 Aug 2017 01:45:49 +0000 (18:45 -0700)]
parse-maintainers: Move matching sections from MAINTAINERS
Allow any number of command line arguments to match either the
section header or the section contents and create new files.
Create MAINTAINERS.new and SECTION.new.
This allows scripting of the movement of various sections from
MAINTAINERS.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Sun, 6 Aug 2017 01:45:48 +0000 (18:45 -0700)]
parse-maintainers: Use perl hash references and specific filenames
Instead of reading STDIN and writing STDOUT, use specific filenames of
MAINTAINERS and MAINTAINERS.new.
Use hash references instead of global hash %hash so future modifications
can read and write specific hashes to split up MAINTAINERS into multiple
files using a script.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Sun, 6 Aug 2017 01:45:47 +0000 (18:45 -0700)]
parse-maintainers: Add section pattern sorting
Section [A-Z]: patterns are not currently in any required sorting order.
Add a specific sorting sequence to MAINTAINERS entries.
Sort F: and X: patterns in alphabetic order.
The preferred section ordering is:
SECTION HEADER
M: Maintainers
R: Reviewers
P: Named persons without email addresses
L: Mailing list addresses
S: Status of this section (Supported, Maintained, Orphan, etc...)
W: Any relevant URLs
T: Source code control type (git, quilt, etc)
Q: Patchwork patch acceptance queue site
B: Bug tracking URIs
C: Chat URIs
F: Files with wildcard patterns (alphabetic ordered)
X: Excluded files with wildcard patterns (alphabetic ordered)
N: Files with regex patterns
K: Keyword regexes in source code for maintainership identification
Miscellaneous perl neatening:
- Rename %map to %hash, map has a different meaning in perl
- Avoid using \& and local variables for function indirection
- Use return for a little c like clarity
- Use c-like function call style instead of &function
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
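The preferred ordering listed in the commit above can be expressed as a comparator. The script itself is Perl; the C below is only a sketch of the same ordering rule. `struct line`, `tag_rank()`, `cmp_lines()` and `sort_section()` are hypothetical names, and keeping the original relative order for tags other than F: and X: is an assumption of this sketch.

```c
#include <stdlib.h>
#include <string.h>

/* Preferred tag order from the commit message. */
static const char tag_order[] = "MRPLSWTQBCFXNK";

struct line { const char *text; size_t pos; };  /* pos: original position */

/* Rank of a tag letter; unknown tags sort after all known ones. */
static int tag_rank(char tag)
{
	const char *p = tag ? strchr(tag_order, tag) : NULL;

	return p ? (int)(p - tag_order) : (int)sizeof(tag_order);
}

static int cmp_lines(const void *a, const void *b)
{
	const struct line *la = a, *lb = b;
	int ra = tag_rank(la->text[0]), rb = tag_rank(lb->text[0]);

	if (ra != rb)
		return ra - rb;
	if (la->text[0] == 'F' || la->text[0] == 'X') {
		int c = strcmp(la->text, lb->text);   /* alphabetic within F: and X: */

		if (c)
			return c;
	}
	/* Tie-break on original position because qsort() is not stable. */
	return (la->pos > lb->pos) - (la->pos < lb->pos);
}

/* Sort the lines of one section (header excluded) in place. */
static void sort_section(struct line *lines, size_t n)
{
	for (size_t i = 0; i < n; i++)
		lines[i].pos = i;
	qsort(lines, n, sizeof(*lines), cmp_lines);
}
```

Given the lines `F: drivers/b/`, `S: Maintained`, `F: drivers/a/`, `M: B`, `L: list`, `M: A`, the sketch produces both M: lines in their original order, then L:, S:, and the two F: patterns sorted alphabetically.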
Joe Perches [Sat, 5 Aug 2017 04:45:48 +0000 (21:45 -0700)]
get_maintainer: Prepare for separate MAINTAINERS files
Allow for MAINTAINERS to become a directory and if it is,
read all the files in the directory for maintained sections.
Optionally look for all files named MAINTAINERS in directories
excluding the .git directory by using --find-maintainer-files.
This optional feature adds ~.3 seconds of CPU on an Intel
i5-6200 with an SSD.
Miscellanea:
- Create a read_maintainer_file subroutine from the existing code
- Test only the existence of MAINTAINERS, not whether it's a file
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Wed, 2 Aug 2017 17:57:45 +0000 (10:57 -0700)]
MAINTAINERS: openbmc mailing list is moderated
The openbmc mailing list is moderated for non-subscribers.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Brendan Higgins <brendanhiggins@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Joel Stanley <joel@jms.id.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sedat Dilek [Tue, 25 Jul 2017 12:53:42 +0000 (14:53 +0200)]
MAINTAINERS: greybus: Fix typo s/LOOBACK/LOOPBACK
Fixes: f47e07bc5f1a5c48 ("Fix up MAINTAINERS file problems")
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 8 Aug 2017 16:38:41 +0000 (09:38 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two small fixes, one re-fix of a previous fix and five patches sorting
out hotplug in the bnx2X class of drivers. The latter is rather
involved, but necessary because these drivers have started dropping
lockdep recursion warnings on the hotplug lock because of its
conversion to a percpu rwsem"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: sg: only check for dxfer_len greater than 256M
scsi: aacraid: reading out of bounds
scsi: qedf: Limit number of CQs
scsi: bnx2i: Simplify cpu hotplug code
scsi: bnx2fc: Simplify CPU hotplug code
scsi: bnx2i: Prevent recursive cpuhotplug locking
scsi: bnx2fc: Prevent recursive cpuhotplug locking
scsi: bnx2fc: Plug CPU hotplug race
Helge Deller [Tue, 8 Aug 2017 16:28:41 +0000 (18:28 +0200)]
random: fix warning message on ia64 and parisc
Fix the warning message on the parisc and IA64 architectures to show the
correct function name of the caller by using %pS instead of %pF. The
message is printed with the value of _RET_IP_ which calls
__builtin_return_address(0) and as such returns the IP address of the caller
instead of a pointer to a function descriptor of the caller.
The effect of this patch is visible on the parisc and ia64 architectures
only since those are the ones which use function descriptors while on
all others %pS and %pF will behave the same.
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Fixes: eecabf567422 ("random: suppress spammy warnings about unseeded randomness")
Fixes: d06bfd1989fe ("random: warn when kernel uses unseeded randomness")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 8 Aug 2017 01:58:10 +0000 (18:58 -0700)]
Merge tag 'xtensa-20170807' of git://github.com/jcmvbkbc/linux-xtensa
Pull Xtensa fixes from Max Filippov:
- use asm-generic instances of asm/param.h and asm/device.h instead of
exact copies in arch/xtensa/include/asm;
- fix build error for xtensa cores with aliasing WT cache: define cache
flushing functions and copy_{to,from}_user_page;
- add missing EXPORT_SYMBOLs for clear_user_highpage, copy_user_highpage,
flush_dcache_page, local_flush_cache_range, local_flush_cache_page,
csum_partial and csum_partial_copy_generic.
* tag 'xtensa-20170807' of git://github.com/jcmvbkbc/linux-xtensa:
xtensa: mm/cache: add missing EXPORT_SYMBOLs
xtensa: don't limit csum_partial export by CONFIG_NET
xtensa: fix cache aliasing handling code for WT cache
xtensa: remove wrapper header for asm/param.h
xtensa: remove wrapper header for asm/device.h
Linus Torvalds [Tue, 8 Aug 2017 01:40:18 +0000 (18:40 -0700)]
Merge tag 'for-linus-20170807' of git://git.infradead.org/linux-mtd
Pull MTD fixes from Brian Norris:
"I missed getting these out for rc4, but here are some MTD fixes.
Just NAND fixes (in both the core handling, and a few drivers). Notes
stolen from Boris:
Core fixes:
- fix data interface setup for ONFI NANDs that do not support the SET
FEATURES command
- fix a kernel doc header
- fix potential integer overflow when retrieving timing information
from the parameter page
- fix wrong OOB layout for small page NANDs
Driver fixes:
- fix potential division-by-zero bug
- fix backward compat with old atmel-nand DT bindings
- fix ->setup_data_interface() in the atmel NAND driver"
* tag 'for-linus-20170807' of git://git.infradead.org/linux-mtd:
mtd: nand: atmel: Fix EDO mode check
mtd: nand: Declare tBERS, tR and tPROG as u64 to avoid integer overflow
mtd: nand: Fix timing setup for NANDs that do not support SET FEATURES
mtd: nand: Fix a docs build warning
mtd: nand: sunxi: fix potential divide-by-zero error
nand: fix wrong default oob layout for small pages using soft ecc
mtd: nand: atmel: Fix DT backward compatibility in pmecc.c
Linus Torvalds [Tue, 8 Aug 2017 01:16:22 +0000 (18:16 -0700)]
Merge tag 'xfs-4.13-fixes-3' of git://git./fs/xfs/xfs-linux
Pull xfs fixes from Darrick Wong:
"I have a couple more bug fixes for you today:
- fix memory leak when issuing discard
- fix propagation of the dax inode flag"
* tag 'xfs-4.13-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: Fix per-inode DAX flag inheritance
xfs: Fix leak of discard bio
Christophe Jaillet [Sun, 6 Aug 2017 22:00:17 +0000 (00:00 +0200)]
qed: Fix a memory allocation failure test in 'qed_mcp_cmd_init()'
We allocate 'p_info->mfw_mb_cur' and 'p_info->mfw_mb_shadow' but we check
'p_info->mfw_mb_addr' instead of 'p_info->mfw_mb_cur'.
'p_info->mfw_mb_addr' is never 0, because it is initialized a few lines
above in 'qed_load_mcp_offsets()'.
Update the test and check the result of the 2 'kzalloc()' instead.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
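The corrected check from the commit above, sketched in userspace C. Only the three field names come from the commit message; `struct mcp_info`, `mcp_cmd_init()` and the injectable allocator are hypothetical, and `calloc()` stands in for `kzalloc()`.

```c
#include <errno.h>
#include <stdlib.h>

struct mcp_info {
	unsigned long mfw_mb_addr;   /* set earlier, never 0, so useless as a check */
	void *mfw_mb_cur;
	void *mfw_mb_shadow;
};

/* Allocator is passed in so that a failure can be simulated. */
typedef void *(*alloc_fn)(size_t);

static void *zalloc_ok(size_t n)   { return calloc(1, n); }
static void *zalloc_fail(size_t n) { (void)n; return NULL; }

static int mcp_cmd_init(struct mcp_info *p, size_t size, alloc_fn zalloc)
{
	p->mfw_mb_cur = zalloc(size);
	p->mfw_mb_shadow = zalloc(size);

	/* Test the two pointers that were just allocated, not mfw_mb_addr. */
	if (!p->mfw_mb_cur || !p->mfw_mb_shadow) {
		free(p->mfw_mb_cur);        /* free(NULL) is a no-op */
		free(p->mfw_mb_shadow);
		p->mfw_mb_cur = NULL;
		p->mfw_mb_shadow = NULL;
		return -ENOMEM;
	}
	return 0;
}
```

Passing `zalloc_fail` makes the function return `-ENOMEM`; a check against the always non-zero `mfw_mb_addr` would have reported success in the same situation.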
Anton Volkov [Mon, 7 Aug 2017 12:54:14 +0000 (15:54 +0300)]
hysdn: fix to a race condition in put_log_buffer
The synchronization type that was used earlier to guard the loop that
deletes unused log buffers may lead to a situation that prevents any
thread from going through the loop.
The patch deletes the previously used synchronization mechanism and moves
the loop under the spin_lock so that similar cases won't be feasible in
the future.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Anton Volkov <avolkov@ispras.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Mon, 7 Aug 2017 11:28:39 +0000 (13:28 +0200)]
s390/qeth: fix L3 next-hop in xmit qeth hdr
On L3, the qeth_hdr struct needs to be filled with the next-hop
IP address.
The current code accesses rtable->rt_gateway without checking that
rtable is a valid address. The accidental access to a lowcore area
results in a random next-hop address in the qeth_hdr.
rtable (or more precisely, skb_dst(skb)) can be NULL in rare cases
(for instance together with AF_PACKET sockets).
This patch adds the missing NULL-ptr checks.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Fixes: 87e7597b5a3 qeth: Move away from using neighbour entries in qeth_l3_fill_header()
Signed-off-by: David S. Miller <davem@davemloft.net>
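A sketch of the kind of NULL-pointer check the commit above describes. `struct route` and `l3_next_hop()` are hypothetical, addresses are plain 32-bit integers, and falling back to the packet's destination address when no route or gateway is available is an assumption of this sketch; the commit message only states that the missing checks were added.

```c
#include <stddef.h>
#include <stdint.h>

struct route { uint32_t rt_gateway; };   /* 0 means "no gateway" in this model */

/* Pick the next-hop address to place in the transmit header. */
static uint32_t l3_next_hop(const struct route *rt, uint32_t daddr)
{
	/* rt may be NULL, for instance with AF_PACKET sockets. */
	if (rt && rt->rt_gateway)
		return rt->rt_gateway;

	return daddr;
}
```

The point of the guard is that `rt->rt_gateway` is never read through a NULL pointer, so the header cannot pick up whatever happens to live at that address.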
Doug Ledford [Mon, 7 Aug 2017 17:30:40 +0000 (13:30 -0400)]
Merge tag 'rdma-rc-2017-07-26' of git://git./linux/kernel/git/leon/linux-rdma into leon-ipoib
IPoIB fixes for 4.13
The patchset provides various fixes for IPoIB. It is a combination of
fixes to various issues discovered during verification along with
static checkers cleanup patches.
Most of the patches are from pre-git era and hence lack Fixes lines.
There is one exception in this IPoIB group - addition of patch revert:
Revert "IB/core: Allow QP state transition from reset to error", but
it is followed by a proper fix to the annoying print, so I thought it is
appropriate to include it.
Signed-off-by: Doug Ledford <dledford@redhat.com>
David S. Miller [Mon, 7 Aug 2017 17:10:19 +0000 (10:10 -0700)]
Merge branch 'asix-Improve-robustness'
Dean Jenkins says:
====================
asix: Improve robustness
Please consider taking these patches to improve the robustness of the ASIX USB
to Ethernet driver.
Failures prompting an ASIX driver code review
=============================================
On an ARM i.MX6 embedded platform some strange one-off and two-off failures were
observed in and around the ASIX USB to Ethernet driver. This was observed on a
highly modified kernel 3.14 with the ASIX driver containing back-ported changes
from kernel.org up to kernel 4.8 approximately.
a) A one-off failure in asix_rx_fixup_internal():
There was an occurrence of an attempt to write off the end of the netdev buffer
which was trapped by skb_over_panic() in skb_put().
[20030.846440] skbuff: skb_over_panic: text:7f2271c0 len:120 put:60 head:8366ecc0 data:8366ed02 tail:0x8366ed7a end:0x8366ed40 dev:eth0
[20030.863007] Kernel BUG at 8044ce38 [verbose debug info unavailable]
[20031.215345] Backtrace:
[20031.217884] [<8044cde0>] (skb_panic) from [<8044d50c>] (skb_put+0x50/0x5c)
[20031.227408] [<8044d4bc>] (skb_put) from [<7f2271c0>] (asix_rx_fixup_internal+0x1c4/0x23c [asix])
[20031.242024] [<7f226ffc>] (asix_rx_fixup_internal [asix]) from [<7f22724c>] (asix_rx_fixup_common+0x14/0x18 [asix])
[20031.260309] [<7f227238>] (asix_rx_fixup_common [asix]) from [<7f21f7d4>] (usbnet_bh+0x74/0x224 [usbnet])
[20031.269879] [<7f21f760>] (usbnet_bh [usbnet]) from [<8002f834>] (call_timer_fn+0xa4/0x1f0)
[20031.283961] [<8002f790>] (call_timer_fn) from [<80030834>] (run_timer_softirq+0x230/0x2a8)
[20031.302782] [<80030604>] (run_timer_softirq) from [<80028780>] (__do_softirq+0x15c/0x37c)
[20031.321511] [<80028624>] (__do_softirq) from [<80028c38>] (irq_exit+0x8c/0xe8)
[20031.339298] [<80028bac>] (irq_exit) from [<8000e9c8>] (handle_IRQ+0x8c/0xc8)
[20031.350038] [<8000e93c>] (handle_IRQ) from [<800085c8>] (gic_handle_irq+0xb8/0xf8)
[20031.365528] [<80008510>] (gic_handle_irq) from [<8050de80>] (__irq_svc+0x40/0x70)
Analysis of the logic of the ASIX driver (containing backported changes from
kernel.org up to kernel 4.8 approximately) suggested that the software could not
trigger skb_over_panic(). The analysis of the kernel BUG() crash information
suggested that the netdev buffer was written with 2 minimal 60 octet length
Ethernet frames (ASIX hardware drops the 4 octet FCS field) and the 2nd Ethernet
frame attempted to write off the end of the netdev buffer.
Note that the netdev buffer should only contain 1 Ethernet frame so if an
attempt to write 2 Ethernet frames into the buffer is made then that is wrong.
However, the logic of the asix_rx_fixup_internal() only allows 1 Ethernet frame
to be written into the netdev buffer.
Potentially this failure was due to memory corruption because it was only seen
once.
b) Two-off failures in the NAPI layer's backlog queue:
There were 2 crashes in the NAPI layer's backlog queue presumably after
asix_rx_fixup_internal() called usbnet_skb_return().
[24097.273945] Unable to handle kernel NULL pointer dereference at virtual address 00000004
[24097.398944] PC is at process_backlog+0x80/0x16c
[24097.569466] Backtrace:
[24097.572007] [<8045ad98>] (process_backlog) from [<8045b64c>] (net_rx_action+0xcc/0x248)
[24097.591631] [<8045b580>] (net_rx_action) from [<80028780>] (__do_softirq+0x15c/0x37c)
[24097.610022] [<80028624>] (__do_softirq) from [<800289cc>] (run_ksoftirqd+0x2c/0x84)
and
[ 1059.828452] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 1059.953715] PC is at process_backlog+0x84/0x16c
[ 1060.140896] Backtrace:
[ 1060.143434] [<8045ad98>] (process_backlog) from [<8045b64c>] (net_rx_action+0xcc/0x248)
[ 1060.163075] [<8045b580>] (net_rx_action) from [<80028780>] (__do_softirq+0x15c/0x37c)
[ 1060.181474] [<80028624>] (__do_softirq) from [<80028c38>] (irq_exit+0x8c/0xe8)
[ 1060.199256] [<80028bac>] (irq_exit) from [<8000e9c8>] (handle_IRQ+0x8c/0xc8)
[ 1060.210006] [<8000e93c>] (handle_IRQ) from [<800085c8>] (gic_handle_irq+0xb8/0xf8)
[ 1060.225492] [<80008510>] (gic_handle_irq) from [<8050de80>] (__irq_svc+0x40/0x70)
The embedded board was only using an ASIX USB to Ethernet adaptor eth0.
Analysis suggested that the doubly-linked list pointers of the backlog queue had
been corrupted because one of the link pointers was NULL.
Potentially this failure was due to memory corruption because it was only seen
twice.
Results of the ASIX driver code review
======================================
During the code review some weaknesses were observed in the ASIX driver and the
following patches have been created to improve the robustness.
Brief overview of the patches
-----------------------------
1. asix: Add rx->ax_skb = NULL after usbnet_skb_return()
The current ASIX driver sends the received Ethernet frame to the NAPI layer of
the network stack via the call to usbnet_skb_return() in
asix_rx_fixup_internal() but retains the rx->ax_skb pointer to the netdev
buffer. The driver no longer needs the rx->ax_skb pointer at this point because
the NAPI layer now has the Ethernet frame.
This means that asix_rx_fixup_internal() must not use rx->ax_skb after the call
to usbnet_skb_return() because it could corrupt the handling of the Ethernet
frame within the network layer.
Therefore, to remove the risk of erroneous usage of rx->ax_skb, set rx->ax_skb
to NULL after the call to usbnet_skb_return(). This avoids potential erroneous
freeing of rx->ax_skb and erroneous writing to the netdev buffer. If the
software now somehow inappropriately reused rx->ax_skb, then a NULL pointer
dereference of rx->ax_skb would occur which makes investigation easier.
2. asix: Ensure asix_rx_fixup_info members are all reset
This patch creates reset_asix_rx_fixup_info() to allow all the
asix_rx_fixup_info structure members to be consistently reset to initial
conditions.
Call reset_asix_rx_fixup_info() upon each detectable error condition so that the
next URB is processed from a known state.
Otherwise, there is a risk that some members of the asix_rx_fixup_info structure
may be incorrect after an error occurred so potentially leading to a
malfunction.
3. asix: Fix small memory leak in ax88772_unbind()
This patch creates asix_rx_fixup_common_free() to allow the rx->ax_skb to be
freed when necessary.
asix_rx_fixup_common_free() is called from ax88772_unbind() before the parent
private data structure is freed.
Without this patch, there is a risk of a small netdev buffer memory leak each
time ax88772_unbind() is called during the reception of an Ethernet frame that
spans across 2 URBs.
Testing
=======
The patches have been sanity tested on a 64-bit Linux laptop running kernel
4.13-rc2 with the 3 patches applied on top.
The ASIX USB to Ethernet adaptor used for testing was (output of lsusb):
ID 0b95:772b ASIX Electronics Corp. AX88772B
Test #1
-------
The test ran a flood ping test script which slowly incremented the ICMP Echo
Request's payload from 0 to 5000 octets. This eventually causes IPv4
fragmentation to occur which causes Ethernet frames to be sent very close to
each other so increases the probability that an Ethernet frame will span 2 URBs.
The test showed that all pings were successful. The test took about 15 minutes
to complete.
Test #2
-------
A script was run on the laptop to periodically run ifdown and ifup every second
so that the ASIX USB to Ethernet adaptor was up for 1 second and down for 1 second.
From a Linux PC connected to the laptop, the following ping command was used
ping -f -s 5000 <ip address of laptop>
The large ICMP payload causes IPv4 fragmentation resulting in multiple
Ethernet frames per original IP packet.
Kernel debug within the ASIX driver was enabled to see whether any ASIX errors
were generated. The test was run for about 24 hours and no ASIX errors were
seen.
Patches
=======
The 3 patches have been rebased off the net-next repo master branch with HEAD
fbbeefd net: fec: Allow reception of frames bigger than 1522 bytes
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Dean Jenkins [Mon, 7 Aug 2017 08:50:16 +0000 (09:50 +0100)]
asix: Fix small memory leak in ax88772_unbind()
When Ethernet frames span multiple URBs, the netdev buffer memory
pointed to by the asix_rx_fixup_info structure remains allocated
during the time gap between the 2 executions of asix_rx_fixup_internal().
This means that if ax88772_unbind() is called within this time
gap to free the memory of the parent private data structure then
a memory leak of the part filled netdev buffer memory will occur.
Therefore, create a new function asix_rx_fixup_common_free() to
free the memory of the netdev buffer and add a call to
asix_rx_fixup_common_free() from inside ax88772_unbind().
Consequently when an unbind occurs part way through receiving
an Ethernet frame, the netdev buffer memory that is holding part
of the received Ethernet frame will now be freed.
Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
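The leak and its fix, as described in the commit above, can be modelled with ordinary heap buffers. All names here are hypothetical stand-ins (a byte buffer plays the role of the netdev buffer, and `live_buffers` exists only so the effect can be observed); the sketch is not the driver's implementation.

```c
#include <stdlib.h>

static int live_buffers;   /* accounting for the demo only */

static unsigned char *buf_alloc(size_t n)
{
	unsigned char *b = malloc(n);

	if (b)
		live_buffers++;
	return b;
}

static void buf_free(unsigned char *b)
{
	if (b) {
		free(b);
		live_buffers--;
	}
}

struct rx_fixup_info { unsigned char *ax_skb; };          /* partly filled frame, or NULL */
struct common_private { struct rx_fixup_info rx_fixup_info; };

/* Release a frame that was still waiting for its second URB. */
static void rx_fixup_common_free(struct common_private *dp)
{
	if (!dp)
		return;
	buf_free(dp->rx_fixup_info.ax_skb);
	dp->rx_fixup_info.ax_skb = NULL;
}

/* Model of unbind: free the pending frame first, then the parent structure. */
static void model_unbind(struct common_private **priv)
{
	rx_fixup_common_free(*priv);
	free(*priv);
	*priv = NULL;
}
```

If `model_unbind()` skipped the call to `rx_fixup_common_free()`, a frame received halfway would stay allocated after the parent structure is gone, which is the small leak the patch closes.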
Dean Jenkins [Mon, 7 Aug 2017 08:50:15 +0000 (09:50 +0100)]
asix: Ensure asix_rx_fixup_info members are all reset
There is a risk that the members of the structure asix_rx_fixup_info
become unsynchronised leading to the possibility of a malfunction.
For example, rx->split_head was not being set to false after an
error was detected so potentially could cause a malformed 32-bit
Data header word to be formed.
Therefore add function reset_asix_rx_fixup_info() to reset all the
members of asix_rx_fixup_info so that future processing will start
with known initial conditions.
Also, if (skb->len != offset) becomes true then call
reset_asix_rx_fixup_info() so that the processing of the next URB
starts with known initial conditions. Without the call, the check
does nothing which potentially could lead to a malfunction
when the next URB is processed.
In addition, for robustness, call reset_asix_rx_fixup_info() before
every error path's "return 0". This ensures that the next URB is
processed from known initial conditions.
Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dean Jenkins [Mon, 7 Aug 2017 08:50:14 +0000 (09:50 +0100)]
asix: Add rx->ax_skb = NULL after usbnet_skb_return()
In asix_rx_fixup_internal() there is a risk that rx->ax_skb gets
reused after passing the Ethernet frame into the network stack via
usbnet_skb_return().
The risks include:
a) asynchronously freeing rx->ax_skb after passing the netdev buffer
to the NAPI layer which might corrupt the backlog queue.
b) erroneously reusing rx->ax_skb such as calling skb_put_data() multiple
times which causes writing off the end of the netdev buffer.
Therefore add a defensive rx->ax_skb = NULL after usbnet_skb_return()
so that it is not possible to free rx->ax_skb or to apply
skb_put_data() too many times.
Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Richter [Mon, 7 Aug 2017 08:16:36 +0000 (10:16 +0200)]
bpf: fix selftest/bpf/test_pkt_md_access on s390x
Commit 18f3d6be6be1 ("selftests/bpf: Add test cases to test narrower ctx field loads")
introduced new eBPF test cases. One of them (test_pkt_md_access.c)
fails on s390x. The BPF verifier error message is:
[root@s8360046 bpf]# ./test_progs
test_pkt_access:PASS:ipv4 349 nsec
test_pkt_access:PASS:ipv6 212 nsec
[....]
libbpf: load bpf program failed: Permission denied
libbpf: -- BEGIN DUMP LOG ---
libbpf:
0: (71) r2 = *(u8 *)(r1 +0)
invalid bpf_context access off=0 size=1
libbpf: -- END LOG --
libbpf: failed to load program 'test1'
libbpf: failed to load object './test_pkt_md_access.o'
Summary: 29 PASSED, 1 FAILED
[root@s8360046 bpf]#
This is caused by a byte endianness issue. S390x is a big endian
architecture. Pointer access to the lowest byte or halfword of a
four byte value needs to add an offset.
On little endian architectures this offset is not needed.
Fix this and use the same approach as the originator used for other files
(for example test_verifier.c) in his original commit.
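The offset rule can be sketched outside of BPF. The helper below is purely illustrative (lowest_byte_offset is a made-up name, not part of the selftests); it computes where the least significant bytes of a 4-byte field live for each byte order and checks the result against struct.pack.

```python
import struct


def lowest_byte_offset(byte_order: str, field_size: int = 4,
                       access_size: int = 1) -> int:
    """Offset of the least significant `access_size` bytes inside a field."""
    if byte_order == "little":
        return 0
    if byte_order == "big":
        return field_size - access_size
    raise ValueError(byte_order)


value = 0x11223344
for order, fmt in (("little", "<I"), ("big", ">I")):
    raw = struct.pack(fmt, value)          # memory image of the 4-byte field
    off = lowest_byte_offset(order)
    print(order, "offset", off, "byte", hex(raw[off]))
```

On a big endian layout a byte load of the lowest byte has to use offset 3 and a halfword load offset 2, while on little endian both use offset 0.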
With this fix the test program test_progs succeeds on s390x:
[root@s8360046 bpf]# ./test_progs
test_pkt_access:PASS:ipv4 236 nsec
test_pkt_access:PASS:ipv6 217 nsec
test_xdp:PASS:ipv4 3624 nsec
test_xdp:PASS:ipv6 1722 nsec
test_l4lb:PASS:ipv4 926 nsec
test_l4lb:PASS:ipv6 1322 nsec
test_tcp_estats:PASS: 0 nsec
test_bpf_obj_id:PASS:get-fd-by-notexist-prog-id 0 nsec
test_bpf_obj_id:PASS:get-fd-by-notexist-map-id 0 nsec
test_bpf_obj_id:PASS:get-prog-info(fd) 0 nsec
test_bpf_obj_id:PASS:get-map-info(fd) 0 nsec
test_bpf_obj_id:PASS:get-prog-info(fd) 0 nsec
test_bpf_obj_id:PASS:get-map-info(fd) 0 nsec
test_bpf_obj_id:PASS:get-prog-fd(next_id) 0 nsec
test_bpf_obj_id:PASS:get-prog-info(next_id->fd) 0 nsec
test_bpf_obj_id:PASS:get-prog-fd(next_id) 0 nsec
test_bpf_obj_id:PASS:get-prog-info(next_id->fd) 0 nsec
test_bpf_obj_id:PASS:check total prog id found by get_next_id 0 nsec
test_bpf_obj_id:PASS:get-map-fd(next_id) 0 nsec
test_bpf_obj_id:PASS:get-map-fd(next_id) 0 nsec
test_bpf_obj_id:PASS:get-map-fd(next_id) 0 nsec
test_bpf_obj_id:PASS:get-map-fd(next_id) 0 nsec
test_bpf_obj_id:PASS:get-map-fd(next_id) 0 nsec
test_bpf_obj_id:PASS:get-map-fd(next_id) 0 nsec
test_bpf_obj_id:PASS:get-map-fd(next_id) 0 nsec
test_bpf_obj_id:PASS:check get-map-info(next_id->fd) 0 nsec
test_bpf_obj_id:PASS:get-map-fd(next_id) 0 nsec
test_bpf_obj_id:PASS:check get-map-info(next_id->fd) 0 nsec
test_bpf_obj_id:PASS:check total map id found by get_next_id 0 nsec
test_pkt_md_access:PASS: 277 nsec
Summary: 30 PASSED, 0 FAILED
[root@s8360046 bpf]#
Fixes: 18f3d6be6be1 ("selftests/bpf: Add test cases to test narrower ctx field loads")
Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ludovic Desroches [Sun, 6 Aug 2017 14:00:05 +0000 (16:00 +0200)]
pinctrl: generic: update references to Documentation/pinctrl.txt
Update deprecated references to Documentation/pinctrl.txt since it has been
moved to Documentation/driver-api/pinctl.rst.
Signed-off-by: Ludovic Desroches <ludovic.desroches@o2linux.fr>
Fixes: 5a9b73832e9e ("pinctrl.txt: move it to the driver-api book")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Andy Shevchenko [Fri, 4 Aug 2017 16:26:34 +0000 (19:26 +0300)]
pinctrl: intel: merrifield: Correct UART pin lists
The UART pin lists consist of GPIO numbers, which is simply wrong.
Replace them with pin numbers.
Fixes: 4e80c8f50574 ("pinctrl: intel: Add Intel Merrifield pin controller support")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Gregory CLEMENT [Tue, 1 Aug 2017 15:57:20 +0000 (17:57 +0200)]
pinctrl: armada-37xx: Fix number of pin in south bridge
On the south bridge we have pins from 0 to 29, which gives 30 pins (and not
29).
Without this patch the kernel complains with the following traces:
cat /sys/kernel/debug/pinctrl/d0018800.pinctrl/pingroups
[ 154.530205] armada-37xx-pinctrl d0018800.pinctrl: failed to get pin(29) name
[ 154.537567] ------------[ cut here ]------------
[ 154.542348] WARNING: CPU: 1 PID: 1347 at /home/gclement/open/kernel/marvell-mainline-linux/drivers/pinctrl/core.c:1610 pinctrl_groups_show+0x15c/0x1a0
[ 154.555918] Modules linked in:
[ 154.558890] CPU: 1 PID: 1347 Comm: cat Tainted: G W 4.13.0-rc1-00001-g19e1b9fa219d #525
[ 154.568316] Hardware name: Marvell Armada 3720 Development Board DB-88F3720-DDR3 (DT)
[ 154.576311] task: ffff80001d32d100 task.stack: ffff80001bdc0000
[ 154.583048] PC is at pinctrl_groups_show+0x15c/0x1a0
[ 154.587816] LR is at pinctrl_groups_show+0x148/0x1a0
[ 154.592847] pc : [<ffff0000083e3adc>] lr : [<ffff0000083e3ac8>] pstate: 00000145
[ 154.600840] sp : ffff80001bdc3c80
[ 154.604255] x29: ffff80001bdc3c80 x28: 00000000f7750000
[ 154.609825] x27: ffff80001d05d198 x26: 0000000000000009
[ 154.615224] x25: ffff0000089ead20 x24: 0000000000000002
[ 154.620705] x23: ffff000008c8e1d0 x22: ffff80001be55700
[ 154.626187] x21: ffff80001d05d100 x20: 0000000000000005
[ 154.631667] x19: 0000000000000006 x18: 0000000000000010
[ 154.637238] x17: 0000000000000000 x16: ffff0000081fc4b8
[ 154.642726] x15: 0000000000000006 x14: ffff0000899e537f
[ 154.648214] x13: ffff0000099e538d x12: 206f742064656c69
[ 154.653613] x11: 6166203a6c727463 x10: 0000000005f5e0ff
[ 154.659094] x9 : ffff80001bdc38c0 x8 : 286e697020746567
[ 154.664576] x7 : ffff000008551870 x6 : 000000000000011b
[ 154.670146] x5 : 0000000000000000 x4 : 0000000000000000
[ 154.675544] x3 : 0000000000000000 x2 : 0000000000000000
[ 154.681025] x1 : ffff000008c8e1d0 x0 : ffff80001be55700
[ 154.686507] Call trace:
[ 154.688668] Exception stack(0xffff80001bdc3ab0 to 0xffff80001bdc3be0)
[ 154.695224] 3aa0: 0000000000000006 0001000000000000
[ 154.703310] 3ac0: ffff80001bdc3c80 ffff0000083e3adc ffff80001bdc3bb0 00000000ffffffd8
[ 154.711304] 3ae0: 4554535953425553 6f6674616c703d4d 4349564544006d72 6674616c702b3d45
[ 154.719478] 3b00: 313030643a6d726f 6e69702e30303838 ffff80006c727463 ffff0000089635d8
[ 154.727562] 3b20: ffff80001d1ca0cb ffff000008af0fa4 ffff80001bdc3b40 ffff000008c8e1dc
[ 154.735648] 3b40: ffff80001bdc3bc0 ffff000008223174 ffff80001be55700 ffff000008c8e1d0
[ 154.743731] 3b60: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 154.752354] 3b80: 000000000000011b ffff000008551870 286e697020746567 ffff80001bdc38c0
[ 154.760446] 3ba0: 0000000005f5e0ff 6166203a6c727463 206f742064656c69 ffff0000099e538d
[ 154.767910] 3bc0: ffff0000899e537f 0000000000000006 ffff0000081fc4b8 0000000000000000
[ 154.776085] [<ffff0000083e3adc>] pinctrl_groups_show+0x15c/0x1a0
[ 154.782823] [<ffff000008222abc>] seq_read+0x184/0x460
[ 154.787505] [<ffff000008344120>] full_proxy_read+0x60/0xa8
[ 154.793431] [<ffff0000081f9bec>] __vfs_read+0x1c/0x110
[ 154.799001] [<ffff0000081faff4>] vfs_read+0x84/0x140
[ 154.803860] [<ffff0000081fc4fc>] SyS_read+0x44/0xa0
[ 154.808983] [<ffff000008082f30>] el0_svc_naked+0x24/0x28
[ 154.814459] ---[ end trace 4cbb00a92d616b95 ]---
Cc: stable@vger.kernel.org
Fixes: 87466ccd9401 ("pinctrl: armada-37xx: Add pin controller support for Armada 37xx")
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Gregory CLEMENT [Tue, 1 Aug 2017 15:57:19 +0000 (17:57 +0200)]
pinctrl: armada-37xx: Fix the pin 23 on south bridge
Pin 23 on South bridge does not belong to the rgmii group. It belongs to
a separate group which can have 3 functions.
Because of this, the fix also has to update the way the functions are
managed. Until now each group used NB_FUNCS (which was 2) functions. For
mpp23, 3 functions are available, but it is the only group that needs
them, so in the loops involving NB_FUNCS an extra test was added to handle
only the functions that were added.
The bug became visible with the merge of commit 07d065abf93d ("arm64:
dts: marvell: armada-3720-db: Add vqmmc regulator for SD slot"): the gpio
regulator used gpio 23, so the whole rgmii group was set up
as gpio, which broke Ethernet support on the Armada 3720 DB
board. Thanks to this patch, the UHS SD cards (which need the vqmmc)
_and_ the Ethernet work again.
Cc: stable@vger.kernel.org
Fixes: 87466ccd9401 ("pinctrl: armada-37xx: Add pin controller support for Armada 37xx")
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
stephen hemminger [Fri, 4 Aug 2017 00:13:54 +0000 (17:13 -0700)]
netvsc: fix race on sub channel creation
The existing sub channel code did not wait for all the sub-channels
to completely initialize. This could lead to a race causing a crash
in netif_napi_del() from a bad list. The existing code would send
an init message, then wait only for the initial response that
the init message was received. It thought it was waiting for
sub channels but really the init response did the wakeup.
The new code keeps track of the number of open channels and
waits until that many are open.
Other issues here were:
* host might return fewer sub-channels than were requested.
* the new init status is not valid until after init was completed.
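The counting approach described above can be sketched in user space with a condition variable. This is only a toy model, not the netvsc code: SubChannelTracker, channel_opened() and wait_for() are hypothetical names, and threads stand in for the host's channel-open callbacks.

```python
import threading


class SubChannelTracker:
    """Toy model: wait until every sub-channel the host granted is open."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._open = 0

    def channel_opened(self) -> None:
        # Called once per sub-channel when it has finished initializing.
        with self._cond:
            self._open += 1
            self._cond.notify_all()

    def wait_for(self, granted: int, timeout: float = 5.0) -> bool:
        # `granted` is the count the host actually returned, which may be
        # smaller than the count that was requested.
        with self._cond:
            return self._cond.wait_for(lambda: self._open >= granted, timeout)


def demo(requested: int = 4, granted: int = 3) -> bool:
    tracker = SubChannelTracker()
    workers = [threading.Thread(target=tracker.channel_opened)
               for _ in range(granted)]
    for worker in workers:
        worker.start()
    # Wait on the granted count, not on the first response that arrives.
    done = tracker.wait_for(min(requested, granted))
    for worker in workers:
        worker.join()
    return done
```

The point of the sketch is that the waiter checks a count of open channels rather than being woken by whichever response happens to arrive first.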
Fixes: b3e6b82a0099 ("hv_netvsc: Wait for sub-channels to be processed during probe")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 7 Aug 2017 01:44:49 +0000 (18:44 -0700)]
Linux 4.13-rc4
Linus Torvalds [Sun, 6 Aug 2017 23:11:34 +0000 (16:11 -0700)]
Merge tag 'platform-drivers-x86-v4.13-4' of git://git.infradead.org/linux-platform-drivers-x86
Pull x86 platform driver fix from Darren Hart:
"Fix loop preventing some platforms from waking up via the power button
in s2idle:
- intel-vbtn: match power button on press rather than release"
* tag 'platform-drivers-x86-v4.13-4' of git://git.infradead.org/linux-platform-drivers-x86:
platform/x86: intel-vbtn: match power button on press rather than release
Linus Torvalds [Sun, 6 Aug 2017 19:31:17 +0000 (12:31 -0700)]
Merge tag 'ext4_for_linus_stable' of git://git./linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"A large number of ext4 bug fixes and cleanups for v4.13"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix copy paste error in ext4_swap_extents()
ext4: fix overflow caused by missing cast in ext4_resize_fs()
ext4, project: expand inode extra size if possible
ext4: cleanup ext4_expand_extra_isize_ea()
ext4: restructure ext4_expand_extra_isize
ext4: fix forgetten xattr lock protection in ext4_expand_extra_isize
ext4: make xattr inode reads faster
ext4: inplace xattr block update fails to deduplicate blocks
ext4: remove unused mode parameter
ext4: fix warning about stack corruption
ext4: fix dir_nlink behaviour
ext4: silence array overflow warning
ext4: fix SEEK_HOLE/SEEK_DATA for blocksize < pagesize
ext4: release discard bio after sending discard commands
ext4: convert swap_inode_data() over to use swap() on most of the fields
ext4: error should be cleared if ea_inode isn't added to the cache
ext4: Don't clear SGID when inheriting ACLs
ext4: preserve i_mode if __ext4_set_acl() fails
ext4: remove unused metadata accounting variables
ext4: correct comment references to ext4_ext_direct_IO()
Linus Torvalds [Sun, 6 Aug 2017 18:52:01 +0000 (11:52 -0700)]
Merge branch 'upstream' of git://git.linux-mips.org/ralf/upstream-linus
Pull MIPS fixes from Ralf Baechle:
"This fixes two build issues for ralink platforms, both due to missing
#includes which used to be included indirectly via other headers"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
MIPS: ralink: mt7620: Add missing header
MIPS: ralink: Fix build error due to missing header
Dmitry V. Levin [Sat, 5 Aug 2017 20:00:50 +0000 (23:00 +0300)]
Fix compat_sys_sigpending breakage
The latest change of compat_sys_sigpending in commit 8f13621abced
("sigpending(): move compat to native") has broken it in two ways.
First, it tries to write 4 bytes more than userspace expects:
sizeof(old_sigset_t) == sizeof(long) == 8 instead of
sizeof(compat_old_sigset_t) == sizeof(u32) == 4.
Second, on big endian architectures these bytes are being written in the
wrong order.
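Both effects can be reproduced with plain byte strings. The sketch below is an illustration only: mask_bytes() is a hypothetical helper and the mask value is made up; it simply compares what a 32-bit caller expects with what an 8-byte write would place in its buffer.

```python
def mask_bytes(pending: int, size: int, byte_order: str) -> bytes:
    """Memory image of a pending-signal mask stored in `size` bytes."""
    return pending.to_bytes(size, byte_order)


pending = 0x00000200  # hypothetical mask with a single pending signal

# What a 32-bit caller expects: sizeof(compat_old_sigset_t) == 4.
expected_le = mask_bytes(pending, 4, "little")
expected_be = mask_bytes(pending, 4, "big")

# What an 8-byte write produces: sizeof(old_sigset_t) == 8.
broken_le = mask_bytes(pending, 8, "little")
broken_be = mask_bytes(pending, 8, "big")

# First problem: 4 bytes too many land in the caller's buffer.
print(len(broken_le) - len(expected_le))

# Second problem: on big endian the first 4 bytes are the high half of
# the 64-bit value, so the caller reads zeros instead of its mask.
print(broken_le[:4] == expected_le, broken_be[:4] == expected_be)
```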
This bug was found by the strace test suite.
Reported-by: Anatoly Pugachev <matorola@gmail.com>
Inspired-by: Eugene Syromyatnikov <evgsyr@gmail.com>
Fixes: 8f13621abced ("sigpending(): move compat to native")
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Maninder Singh [Sun, 6 Aug 2017 05:33:07 +0000 (01:33 -0400)]
ext4: fix copy paste error in ext4_swap_extents()
This bug was found by a static code checker tool for copy paste
problems.
Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
Signed-off-by: Vaneet Narang <v.narang@samsung.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Jerry Lee [Sun, 6 Aug 2017 05:18:31 +0000 (01:18 -0400)]
ext4: fix overflow caused by missing cast in ext4_resize_fs()
On a 32-bit platform, the value of n_blocks_count may be wrong when
the file system is resized to a size larger than 2^32 blocks. This may
cause the superblock to be corrupted with a zero blocks count.
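The effect of a missing cast can be shown with masked integer arithmetic. The sketch below is not the ext4 code: the function names and the group geometry are assumptions chosen so that the product is exactly 2^32.

```python
U32_MASK = 0xFFFFFFFF
U64_MASK = 0xFFFFFFFFFFFFFFFF


def blocks_count_without_cast(groups: int, blocks_per_group: int) -> int:
    # Both operands are 32-bit, so the product is truncated to 32 bits
    # before it is stored in the 64-bit result.
    return (groups * blocks_per_group) & U32_MASK


def blocks_count_with_cast(groups: int, blocks_per_group: int) -> int:
    # Widening one operand first makes the whole multiplication 64-bit.
    return (groups * blocks_per_group) & U64_MASK


groups, per_group = 131072, 32768  # hypothetical geometry: 2^17 * 2^15
print(blocks_count_without_cast(groups, per_group))
print(blocks_count_with_cast(groups, per_group))
```

With this geometry the truncated product is 0, which matches the zero blocks count mentioned above, while the widened product is 2^32.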
Fixes: 1c6bd7173d66
Signed-off-by: Jerry Lee <jerrylee@qnap.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org # 3.7+
Miao Xie [Sun, 6 Aug 2017 05:00:49 +0000 (01:00 -0400)]
ext4, project: expand inode extra size if possible
When upgrading from the old format, the first attempt to set a project id
on an old file returns EOVERFLOW, but if that file is dirtied (by touch,
etc.), changing the project id is then allowed. This might be confusing
for users, so try to expand @i_extra_isize here too.
Reported-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Miao Xie [Sun, 6 Aug 2017 04:55:48 +0000 (00:55 -0400)]
ext4: cleanup ext4_expand_extra_isize_ea()
Clean up some goto statements to make ext4_expand_extra_isize_ea() clearer.
Signed-off-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Miao Xie [Sun, 6 Aug 2017 04:40:01 +0000 (00:40 -0400)]
ext4: restructure ext4_expand_extra_isize
The current ext4_expand_extra_isize only tries to expand the extra isize; if
someone is holding the xattr lock or some check fails, it gives up.
So rename it to ext4_try_to_expand_extra_isize.
Besides that, clean up unnecessary checking and move some related checks
into it.
Signed-off-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Miao Xie [Sun, 6 Aug 2017 04:27:38 +0000 (00:27 -0400)]
ext4: fix forgetten xattr lock protection in ext4_expand_extra_isize
We should avoid the contention between the i_extra_isize update and
the inline data insertion, so move the xattr trylock in front of
i_extra_isize update.
Signed-off-by: Miao Xie <miaoxie@huawei.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Tahsin Erdogan [Sun, 6 Aug 2017 04:07:01 +0000 (00:07 -0400)]
ext4: make xattr inode reads faster
ext4_xattr_inode_read() currently reads each block sequentially,
waiting for each I/O operation to complete before moving on to the next
block. This prevents request merging in the block layer.
Add an ext4_bread_batch() function that starts reads for all blocks
then optionally waits for them to complete. A similar logic is used
in ext4_find_entry(), so update that code to use the new function.
Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Tahsin Erdogan [Sun, 6 Aug 2017 02:41:42 +0000 (22:41 -0400)]
ext4: inplace xattr block update fails to deduplicate blocks
When an xattr block has a single reference, the block is updated in place
and reinserted into the cache. Later, a cache lookup is performed
to see whether an existing block has the same contents. This cache
lookup will most of the time return the just inserted entry so
deduplication is not achieved.
Running the following test script will produce two xattr blocks which
can be observed in "File ACL: " line of debugfs output:
mke2fs -b 1024 -I 128 -F -O extent /dev/sdb 1G
mount /dev/sdb /mnt/sdb
touch /mnt/sdb/{x,y}
setfattr -n user.1 -v aaa /mnt/sdb/x
setfattr -n user.2 -v bbb /mnt/sdb/x
setfattr -n user.1 -v aaa /mnt/sdb/y
setfattr -n user.2 -v bbb /mnt/sdb/y
debugfs -R 'stat x' /dev/sdb | cat
debugfs -R 'stat y' /dev/sdb | cat
This patch defers the reinsertion to the cache so that we can locate
other blocks with the same contents.
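The ordering problem can be modelled with a small content-keyed cache. This is a toy model rather than the ext4 cache code: BlockCache and update_in_place() are hypothetical, and the "newest entry wins" lookup is an assumption standing in for "will most of the time return the just inserted entry".

```python
class BlockCache:
    """Toy cache mapping xattr block contents to block numbers."""

    def __init__(self) -> None:
        self._by_content = {}

    def insert(self, content: bytes, block: int) -> None:
        self._by_content.setdefault(content, []).append(block)

    def lookup(self, content: bytes):
        # Assumption: the most recently inserted entry is returned first.
        blocks = self._by_content.get(content)
        return blocks[-1] if blocks else None


def update_in_place(cache: BlockCache, own_block: int, new_content: bytes,
                    defer_reinsert: bool) -> int:
    """Return the block number the inode ends up pointing at."""
    if not defer_reinsert:
        cache.insert(new_content, own_block)   # old order: insert, then look up
    found = cache.lookup(new_content)
    if found is not None and found != own_block:
        return found                           # share the existing block
    if defer_reinsert:
        cache.insert(new_content, own_block)   # new order: look up, then insert
    return own_block
```

In this model, with block 100 already cached for file x, updating block 200 of file y to the same contents keeps block 200 under the old order and reuses block 100 when the reinsertion is deferred.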
Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Tahsin Erdogan [Sun, 6 Aug 2017 02:15:45 +0000 (22:15 -0400)]
ext4: remove unused mode parameter
ext4_alloc_file_blocks() does not use its mode parameter. Remove it.
Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Arnd Bergmann [Sun, 6 Aug 2017 01:57:46 +0000 (21:57 -0400)]
ext4: fix warning about stack corruption
After commit 62d1034f53e3 ("fortify: use WARN instead of BUG for now"),
we get a warning about possible stack overflow from a memcpy that
was not strictly bounded to the size of the local variable:
inlined from 'ext4_mb_seq_groups_show' at fs/ext4/mballoc.c:2322:2:
include/linux/string.h:309:9: error: '__builtin_memcpy': writing between 161 and 1116 bytes into a region of size 160 overflows the destination [-Werror=stringop-overflow=]
We actually had a bug here that would have been found by the warning,
but it was already fixed last year in commit 30a9d7afe70e ("ext4: fix
stack memory corruption with 64k block size").
This replaces the fixed-length structure on the stack with a variable-length
structure, using the correct upper bound that tells the compiler that
everything is really fine here. I also change the loop count to check
for the same upper bound for consistency, but the existing code is
already correct here.
Note that while clang won't allow certain kinds of variable-length arrays
in structures, this particular instance is fine, as the array is at the
end of the structure, and the size is strictly bounded.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Andreas Dilger [Sat, 5 Aug 2017 23:47:34 +0000 (19:47 -0400)]
ext4: fix dir_nlink behaviour
The dir_nlink feature has been enabled by default for new ext4
filesystems since e2fsprogs-1.41 in 2008, and was automatically
enabled by the kernel for older ext4 filesystems since the
dir_nlink feature was added with ext4 in kernel 2.6.28+ when
the subdirectory count exceeded EXT4_LINK_MAX-1.
Automatically adding the file system features such as dir_nlink is
generally frowned upon, since it could cause the file system to not be
mountable on older kernels, thus preventing the administrator from
rolling back to an older kernel if necessary.
In this case, the administrator might also want to disable the feature
because glibc's fts_read() function does not correctly optimize
directory traversal for directories that use an st_nlink field of 1 to
indicate that the number of links in the directory is not tracked by
the file system, and could fail to traverse the full directory
hierarchy. Fortunately, in the past ten years very few users have
complained about incomplete file system traversal by glibc's
fts_read().
This commit also changes ext4_inc_count() to allow i_nlinks to reach
the full EXT4_LINK_MAX links on the parent directory (including "."
and "..") before changing i_links_count to be 1.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=196405
Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Dan Carpenter [Sat, 5 Aug 2017 23:00:31 +0000 (19:00 -0400)]
ext4: silence array overflow warning
I get a static checker warning:
fs/ext4/ext4.h:3091 ext4_set_de_type()
error: buffer overflow 'ext4_type_by_mode' 15 <= 15
It seems unlikely that we would hit this read overflow in real life, but
it's also simple enough to make the array 16 bytes instead of 15.
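The "15 <= 15" in the warning follows from the index arithmetic. The sketch below assumes the table is indexed by the file-type bits of the mode shifted right by 12, which is how such lookup tables are usually built; the constants are the standard POSIX values, and type_index() is a hypothetical helper.

```python
S_IFMT = 0o170000   # mask for the file-type bits of st_mode
S_SHIFT = 12        # assumed shift used to turn the type bits into an index


def type_index(mode: int) -> int:
    """Index into a by-mode lookup table for the given mode."""
    return (mode & S_IFMT) >> S_SHIFT


# Walk every possible value of the file-type bits.
max_index = max(type_index(m) for m in range(0, 0o200000, 0o10000))
print(max_index, "-> table needs", max_index + 1, "entries")
```

Since the largest possible index equals S_IFMT >> S_SHIFT, a table with exactly that many entries is one element short.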
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Jan Kara [Sat, 5 Aug 2017 21:43:24 +0000 (17:43 -0400)]
ext4: fix SEEK_HOLE/SEEK_DATA for blocksize < pagesize
ext4_find_unwritten_pgoff() does not properly handle the situation where
the starting index is in the middle of a page and blocksize < pagesize. The
following command shows the bug on a filesystem with 1k blocksize:
xfs_io -f -c "falloc 0 4k" \
-c "pwrite 1k 1k" \
-c "pwrite 3k 1k" \
-c "seek -a -r 0" foo
In this example, neither lseek(fd, 1024, SEEK_HOLE) nor lseek(fd, 2048,
SEEK_DATA) will return the correct result.
Fix the problem by ignoring buffers in a page that lie before the starting offset.
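For reference, the results a correct implementation should give for this layout can be derived from a pure model of the file. The sketch below does not touch a filesystem; it assumes the two pwrite ranges are the only data in a 4k file and that end of file counts as a hole, as lseek(2) describes.

```python
DATA = [(1024, 2048), (3072, 4096)]  # [start, end) ranges written by pwrite
FILE_SIZE = 4096                     # size after "falloc 0 4k"


def seek_data(offset: int, data=DATA, size: int = FILE_SIZE):
    """First offset >= `offset` that holds data, or None (ENXIO)."""
    if offset >= size:
        return None
    for start, end in data:          # ranges are sorted and non-overlapping
        if offset < end:
            return max(offset, start)
    return None


def seek_hole(offset: int, data=DATA, size: int = FILE_SIZE):
    """First offset >= `offset` that is a hole; end of file counts as one."""
    if offset >= size:
        return None
    for start, end in data:
        if start <= offset < end:
            offset = end             # skip to the end of this data range
    return offset


print(seek_hole(1024), seek_data(2048))
```

In this model SEEK_HOLE from 1024 lands on 2048 and SEEK_DATA from 2048 lands on 3072, both of which fall in the middle of a 4k page.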
Reported-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
CC: stable@vger.kernel.org # 3.8+
Mario Limonciello [Fri, 4 Aug 2017 17:00:06 +0000 (12:00 -0500)]
platform/x86: intel-vbtn: match power button on press rather than release
This fixes a problem where the system gets stuck in a loop,
unable to wake up via the power button in s2idle.
The problem happens because:
- press power button:
  - system emits 0xc0 (power press), event ignored
  - system emits 0xc1 (power release), event processed,
    emitted as KEY_POWER
  - set wakeup_mode to true
- system goes to s2idle
- press power button
  - system emits 0xc0 (power press), wakeup_mode is true,
    system wakes
  - system emits 0xc1 (power release), event processed,
    emitted as KEY_POWER
- system goes to s2idle again
To avoid this situation, process the presses (which matches what
intel-hid does too).
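The sequence above can be replayed with a small simulation. This is a toy model, not the intel-vbtn driver: it assumes that userspace suspends whenever KEY_POWER is reported, that the suspend takes effect once the button cycle is over, and that the first event seen while suspended only wakes the system.

```python
PRESS, RELEASE = 0xC0, 0xC1


def simulate(cycles: int, key_power_event: int) -> list:
    """Replay `cycles` power-button presses; return the state after each."""
    suspended = False
    states = []
    for _ in range(cycles):
        suspend_requested = False
        for event in (PRESS, RELEASE):
            if suspended:
                # wakeup_mode: the event wakes the system and is consumed.
                suspended = False
                continue
            if event == key_power_event:
                # Reported as KEY_POWER; userspace asks for s2idle.
                suspend_requested = True
        if suspend_requested:
            suspended = True
        states.append("s2idle" if suspended else "awake")
    return states


print("match on release:", simulate(2, RELEASE))
print("match on press:  ", simulate(2, PRESS))
```

When the driver matches on release, the second button press wakes the system and the following release immediately suspends it again; when it matches on press, the press is consumed by the wakeup and the release is ignored.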
Verified on a Dell XPS 9365
Signed-off-by: Mario Limonciello <mario.limonciello@dell.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
Linus Torvalds [Sat, 5 Aug 2017 21:09:26 +0000 (14:09 -0700)]
Merge tag 'media/v4.13-2' of git://git./linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
"This series is larger than I would like to submit for -rc4. My
original intent was to send it for either -rc2 or -rc3. Unfortunately,
due to my vacations, I got a lot of pending stuff after my return, and
had to do some biz trips, which prevented me from sending this earlier.
Several fixes:
- some fixes at atomisp staging driver
- several gcc 7 warning fixes
- cleanup media SVG files, in order to fix PDF build on some distros
- fix random Kconfig build of venus driver
- some fixes for the venus driver
- some changes from semaphore to mutex in ngene's driver
- some locking fixes at dib0700 driver
- several fixes on ngene's driver and frontends to make it properly
support some new boards added on Kernel 4.13
- some fixes to CEC drivers
- omap_vout: vrfb: convert to dmaengine
- docs-rst: document EBUSY for VIDIOC_S_FMT
Please notice that the big diffstat changes here are at the SVG files.
Visually, the images look the same, but the file size is now a lot
smaller than before, and they don't use some XML tags that would cause
them to be badly parsed by some ImageMagick versions, or to require a
lot of memory by TeTex, which would break PDF output on some
distributions"
* tag 'media/v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (68 commits)
media: atomisp2: array underflow in imx_enum_frame_size()
media: atomisp2: array underflow in ap1302_enum_frame_size()
media: atomisp2: Array underflow in atomisp_enum_input()
media: platform: davinci: drop VPFE_CMD_S_CCDC_RAW_PARAMS
media: platform: davinci: return -EINVAL for VPFE_CMD_S_CCDC_RAW_PARAMS ioctl
media: venus: don't abuse dma_alloc for non-DMA allocations
media: venus: hfi: fix error handling in hfi_sys_init_done()
media: venus: fix compile-test build on non-qcom ARM platform
media: venus: mark PM functions as __maybe_unused
media: cec-notifier: small improvements
media: pulse8-cec: persistent_config should be off by default
media: cec: cec_transmit_attempt_done: ignore CEC_TX_STATUS_MAX_RETRIES
media: staging: atomisp: array underflow in ioctl
media: lirc: LIRC_GET_REC_RESOLUTION should return microseconds
media: svg: avoid too long lines
media: svg files: simplify files
media: selection.svg: simplify the SVG file
media: vimc: set id_table for platform drivers
media: staging: atomisp: disable warnings with cc-disable-warning
media: davinci: variable 'common' set but not used
...
Daeho Jeong [Sat, 5 Aug 2017 17:11:57 +0000 (13:11 -0400)]
ext4: release discard bio after sending discard commands
We've changed the discard command handling to run in parallel.
But, in this change, I forgot to decrease the usage count of the bio
which was used to send the discard request. I'm sorry about that.
Fixes: a015434480dc ("ext4: send parallel discards on commit completions")
Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Linus Torvalds [Sat, 5 Aug 2017 13:55:13 +0000 (06:55 -0700)]
Merge tag 'gpio-v4.13-2' of git://git./linux/kernel/git/linusw/linux-gpio
Pull GPIO fixes from Linus Walleij:
- LP87565: set the proper output level for direction_output.
- stm32: fix the kernel build by selecting the hierarchical irqdomain
symbol properly - this happens to be done in the pin control
framework but whatever, it had dependencies to GPIO so we need to
apply it here.
- Select the hierarchical IRQ domain also for Xgene.
- Fix wakeups to work on MXC.
- Fix up the device tree binding on Exar that went astray, also add the
right bindings.
- Fix the unwanted events for edges from the library.
- Fix the unbalanced chained IRQ on the Tegra.
* tag 'gpio-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: tegra: fix unbalanced chained_irq_enter/exit
gpiolib: skip unwanted events, don't convert them to opposite edge
gpio: exar: Use correct property prefix and document bindings
gpio: gpio-mxc: Fix: higher 16 GPIOs usable as wake source
gpio: xgene-sb: select IRQ_DOMAIN_HIERARCHY
pinctrl: stm32: select IRQ_DOMAIN_HIERARCHY instead of depends on
gpio: lp87565: Set proper output level and direction for direction_output
MAINTAINERS: Add entry for Whiskey Cove PMIC GPIO driver
Brian Norris [Sat, 5 Aug 2017 01:42:37 +0000 (18:42 -0700)]
Merge tag 'nand/fixes-for-4.13-rc4' of git://git.infradead.org/l2-mtd into MTD
"""
This PR contains both core and drivers fixes for 4.13.
Core fixes:
- Fix data interface setup for ONFI NANDs that do not support the SET
FEATURES command
- Fix a kernel doc header
- Fix potential integer overflow when retrieving timing information
from the parameter page
- Fix wrong OOB layout for small page NANDs
Driver fixes:
- Fix potential division-by-zero bug
- Fix backward compat with old atmel-nand DT bindings
- Fix ->setup_data_interface() in the atmel NAND driver
"""
Linus Torvalds [Fri, 4 Aug 2017 23:45:29 +0000 (16:45 -0700)]
Merge tag 'clk-fixes-for-linus' of git://git./linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"A handful of critical fixes for changes introduced this merge window.
- The TI sci_clk_get() API was pretty broken and nobody noticed.
- There were some CPUfreq crashes on C.H.I.P devices because we
failed to propagate rates up the clk tree.
- Also, the Intel Atom PMC clk driver needs to mark a clk critical if
the firmware has it enabled already so that audio doesn't get
killed on Baytrail.
- Gemini devices have a dead serial console because the reset control
usage in the serial driver assumes one method of reset that gemini
doesn't support (this will be fixed in the next version in the
reset framework so this is the small fix for -rc series).
- Finally we have two rate calculation fixes, one for Exynos and one
for Meson SoCs, that fix rate inconsistencies"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: keystone: sci-clk: Fix sci_clk_get
clk: meson: mpll: fix mpll0 fractional part ignored
clk: samsung: exynos5420: The EPLL rate table corrections
clk: sunxi-ng: sun5i: Add clk_set_rate_parent to the CPU clock
clk: x86: Do not gate clocks enabled by the firmware
clk: gemini: Fix reset regression
Daniel Borkmann [Fri, 4 Aug 2017 20:24:41 +0000 (22:24 +0200)]
bpf: fix byte order test in test_verifier
We really must check with #if __BYTE_ORDER == XYZ instead of
just presence of #ifdef __LITTLE_ENDIAN. I noticed that when
actually running this on big endian machine, the latter test
resolves to true for user space, same for #ifdef __BIG_ENDIAN.
E.g., looking at endian.h from libc, both are also defined
there, so we really must test this against __BYTE_ORDER instead
for proper insns selection. For the kernel, such checks are
fine (see, e.g., 13da9e200fe4 ("Revert "endian: #define
__BYTE_ORDER"") and 415586c9e6d3 ("UAPI: fix endianness conditionals
in M32R's asm/stat.h") for some more context), but not for
user space. Let's also make sure to properly include endian.h.
After that, suite passes for me:
./test_verifier: ELF 64-bit MSB executable, [...]
Linux foo 4.13.0-rc3+ #4 SMP Fri Aug 4 06:59:30 EDT 2017 s390x s390x s390x GNU/Linux
Before fix: Summary: 505 PASSED, 11 FAILED
After fix: Summary: 516 PASSED, 0 FAILED
Fixes: 18f3d6be6be1 ("selftests/bpf: Add test cases to test narrower ctx field loads")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong <yhs@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 4 Aug 2017 22:18:27 +0000 (15:18 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull KVM fixes from Radim Krčmář:
"ARM:
- Yet another race with VM destruction plugged
- A set of small vgic fixes
x86:
- Preserve pending INIT
- RCU fixes in paravirtual async pf, VM teardown, and VMXOFF
emulation
- nVMX interrupt injection and dirty tracking fixes
- initialize to make UBSAN happy"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: arm/arm64: vgic: Use READ_ONCE fo cmpxchg
KVM: nVMX: Fix interrupt window request with "Acknowledge interrupt on exit"
KVM: nVMX: mark vmcs12 pages dirty on L2 exit
kvm: nVMX: don't flush VMCS12 during VMXOFF or VCPU teardown
KVM: nVMX: do not pin the VMCS12
KVM: avoid using rcu_dereference_protected
KVM: X86: init irq->level in kvm_pv_kick_cpu_op
KVM: X86: Fix loss of pending INIT due to race
KVM: async_pf: make rcu irq exit if not triggered from idle task
KVM: nVMX: fixes to nested virt interrupt injection
KVM: nVMX: do not fill vm_exit_intr_error_code in prepare_vmcs12
KVM: arm/arm64: Handle hva aging while destroying the vm
KVM: arm/arm64: PMU: Fix overflow interrupt injection
KVM: arm/arm64: Fix bug in advertising KVM_CAP_MSI_DEVID capability
Linus Torvalds [Fri, 4 Aug 2017 22:16:09 +0000 (15:16 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fix from Thomas Gleixner:
"The recent irq core changes unearthed API abuse in the HPET code,
which manifested itself in a suspend/resume regression.
The fix replaces the cruft with the proper function calls and cures
the regression"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/hpet: Cure interface abuse in the resume path
Linus Torvalds [Fri, 4 Aug 2017 22:14:09 +0000 (15:14 -0700)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
"A single fix for a multiplication overflow in the timer code on 32bit
systems"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timers: Fix overflow in get_next_timer_interrupt
Linus Torvalds [Fri, 4 Aug 2017 22:12:15 +0000 (15:12 -0700)]
Merge tag 'armsoc-fixes' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Arnd Bergmann:
"This comes a bit later than I planned, and as a consequence is
larger than it should be.
Most of the changes are devicetree fixes, across lots of platforms:
Renesas, Samsung Exynos, Marvell EBU, TI OMAP, Rockchips, Amlogic
Meson, Sigma Designs Tango, Allwinner SUNxi and TI Davinci.
Also across many platforms, I applied an older series of simple
randconfig build fixes. This includes making the CONFIG_MTD_XIP option
compile again, which had been broken for many years and probably has
not been missed, but it felt wrong to just remove it completely.
The only other changes are:
- We enable HWSPINLOCK in defconfig to get some Qualcomm boards to
work out of the box.
- A few regression fixes for Texas Instruments OMAP2+.
- A boot regression fix for the Renesas regulator quirk.
- A suspend/resume fix for Uniphier SoCs, fixing the resume of the
system bus"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (43 commits)
ARM: dts: tango4: Request RGMII RX and TX clock delays
bus: uniphier-system-bus: set up registers when resuming
ARM64: dts: marvell: armada-37xx: Fix the number of GPIO on south bridge
ARM: shmobile: rcar-gen2: Fix deadlock in regulator quirk
arm64: defconfig: enable missing HWSPINLOCK
ARM: pxa: select both FB and FB_W100 for eseries
ARM: ixp4xx: fix ioport_unmap definition
ARM: ep93xx: use ARM_PATCH_PHYS_VIRT correctly
ARM: mmp: mark usb_dma_mask as __maybe_unused
ARM: omap2: mark unused functions as __maybe_unused
ARM: omap1: avoid unused variable warning
ARM: sirf: mark sirfsoc_init_late as __maybe_unused
ARM: ixp4xx: use normal prototype for {read,write}s{b,w,l}
ARM: omap1/ams-delta: warn about failed regulator enable
ARM: rpc: rename RAM_SIZE macro
ARM: w90x900: normalize clk API
ARM: ep93xx: normalize clk API
ARM: dts: sun8i: a83t: Switch to CCU device tree binding macros
arm64: allwinner: sun50i-a64: Correct emac register size
ARM: dts: sunxi: h3/h5: Correct emac register size
...
Lukas Czerner [Thu, 3 Aug 2017 20:19:13 +0000 (13:19 -0700)]
xfs: Fix per-inode DAX flag inheritance
According to the commit that implemented the per-inode DAX flag:
commit 58f88ca2df72 ("xfs: introduce per-inode DAX enablement")
the flag is supposed to act as an "inherit flag".
Currently this only works in situations where the parent directory
already has a flag in di_flags set; otherwise inheritance does not
work. This is because setting the XFS_DIFLAG2_DAX flag is done in the
wrong branch, the one designated for di_flags, not di_flags2.
Fix this by moving the code to the branch designated for setting di_flags2,
which does test for flags in di_flags2.
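The shape of the bug can be sketched in a simplified user-space model. The struct, flag values, and function names below are hypothetical stand-ins, not the XFS code; only the branching pattern follows the description above:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the on-disk flag bits. */
#define DIFLAG_NOATIME	(1u << 0)	/* kept in di_flags  */
#define DIFLAG2_DAX	(1u << 0)	/* kept in di_flags2 */

struct inode_flags {
	uint16_t di_flags;
	uint64_t di_flags2;
};

/* Buggy shape: DAX inheritance is nested under the di_flags test. */
static struct inode_flags inherit_buggy(const struct inode_flags *parent)
{
	struct inode_flags child = { 0, 0 };

	if (parent->di_flags) {
		child.di_flags |= parent->di_flags & DIFLAG_NOATIME;
		if (parent->di_flags2 & DIFLAG2_DAX)
			child.di_flags2 |= DIFLAG2_DAX;
	}
	return child;
}

/* Fixed shape: DAX inheritance is decided by a test on di_flags2. */
static struct inode_flags inherit_fixed(const struct inode_flags *parent)
{
	struct inode_flags child = { 0, 0 };

	if (parent->di_flags)
		child.di_flags |= parent->di_flags & DIFLAG_NOATIME;
	if (parent->di_flags2 & DIFLAG2_DAX)
		child.di_flags2 |= DIFLAG2_DAX;
	return child;
}
```

With a parent that has only the DAX bit set in di_flags2 and nothing in di_flags, the buggy variant returns a child without DAX, while the fixed variant inherits it.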
Fixes: 58f88ca2df72 ("xfs: introduce per-inode DAX enablement")
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Jan Kara [Wed, 2 Aug 2017 19:37:16 +0000 (12:37 -0700)]
xfs: Fix leak of discard bio
The bio describing the discard operation is allocated by
__blkdev_issue_discard(), which returns us a reference to it. That
reference is never released and thus we leak this bio. Drop the bio
reference once it completes in xlog_discard_endio().
CC: stable@vger.kernel.org
Fixes: 4560e78f40cb55bd2ea8f1ef4001c5baa88531c7
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Linus Torvalds [Fri, 4 Aug 2017 19:11:48 +0000 (12:11 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"Here are some more arm64 fixes for 4.13. The main one is the PTE race
with the hardware walker, but there are a couple of other things too.
- Report correct timer frequency to userspace when trapping
CNTFRQ_EL0
- Fix race with hardware page table updates when updating access
flags
- Silence clang overflow warning in VA_START and PAGE_OFFSET
calculations"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: avoid overflow in VA_START and PAGE_OFFSET
arm64: Fix potential race with hardware DBM in ptep_set_access_flags()
arm64: Use arch_timer_get_rate when trapping CNTFRQ_EL0
Dan Carpenter [Fri, 4 Aug 2017 08:12:08 +0000 (11:12 +0300)]
IB/hns: checking for IS_ERR() instead of NULL
The hns_roce_v1_create_lp_qp() returns NULL on error, not error pointers.
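Why an IS_ERR() check misses a NULL return can be shown with a user-space model of the error-pointer convention. The helpers below are lowercase re-implementations written for this sketch (assuming the usual range of the last 4095 addresses for error pointers), and create_object() is a hypothetical function that, like the one named above, returns NULL on error:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static inline void *err_ptr(long error)
{
	return (void *)error;
}

/* True only for pointers in the top MAX_ERRNO addresses. */
static inline bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical allocator that signals failure with NULL. */
static void *create_object(bool fail)
{
	static int object;

	return fail ? NULL : &object;
}

/* Wrong check for a NULL-returning function: never sees the failure. */
static bool failed_by_is_err(bool fail)
{
	return is_err(create_object(fail));
}

/* Right check for a NULL-returning function. */
static bool failed_by_null(bool fail)
{
	return create_object(fail) == NULL;
}
```

NULL is address 0, which is far outside the error-pointer range, so the is_err() variant reports success even when creation failed.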
Fixes: bfcc681bd09d ("IB/hns: Fix the bug when free mr")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Thomas Bogendoerfer [Thu, 3 Aug 2017 13:43:14 +0000 (15:43 +0200)]
xgene: Always get clk source, but ignore if it's missing for SGMII ports
Even though the driver doesn't do anything with the clk source for SGMII
ports, it needs to be enabled by doing a devm_clk_get() if there is
a clk source in DT.
Fixes: 0db01097cabd ('xgene: Don't fail probe, if there is no clk resource for SGMII interfaces')
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Tested-by: Laura Abbott <labbott@redhat.com>
Acked-by: Iyappan Subramanian <isubramanian@apm.com>
Tested-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Leon Romanovsky [Tue, 1 Aug 2017 06:41:37 +0000 (09:41 +0300)]
RDMA/mlx5: Fix existence check for extended address vector
The extended address vector is the highest bit in a be32 variable,
but it was compared with the lowest. This patch fixes the endianness
of that check and removes an already declared define.
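A hedged user-space sketch of testing the highest bit of a big-endian 32-bit field. EXT_FLAG and the function names are hypothetical and this is not the driver's code; htonl()/ntohl() stand in for the kernel's cpu_to_be32()/be32_to_cpu():

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name: highest bit of the 32-bit field, in CPU order. */
#define EXT_FLAG 0x80000000u

/* Correct: convert the mask to big endian, then test the raw field. */
static bool has_ext_flag(uint32_t be_field)
{
	return (be_field & htonl(EXT_FLAG)) != 0;
}

/* Equally correct: convert the field to CPU order, then test. */
static bool has_ext_flag_cpu(uint32_t be_field)
{
	return (ntohl(be_field) & EXT_FLAG) != 0;
}

/*
 * Pitfall: mixing a big-endian field with a CPU-order mask. On a little
 * endian host this tests a different bit than intended.
 */
static bool has_ext_flag_unconverted(uint32_t be_field)
{
	return (be_field & EXT_FLAG) != 0;
}
```

The first two variants give the same answer on any host; the third only happens to work on big endian machines.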
Fixes: 17d2f88f92ce ("IB/mlx5: Add ODP atomics support")
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Yishai Hadas [Tue, 1 Aug 2017 06:41:36 +0000 (09:41 +0300)]
IB/uverbs: Fix device cleanup
A uverbs device should be cleaned up only when there is no
remaining potential use of it.
As part of ib_uverbs_remove_one(), which might be triggered by a reset flow,
the device reference count is decreased as expected, leaving the final
cleanup to the FDs that were opened.
The current code increases the reference count when a new command FD is
opened and decreases it when the file is closed. The event FD is opened
internally and relies on the command FD by taking a reference on it.
If the command FD is closed first and the event FD only later, we
must ensure that device resources such as the srcu are still alive, as they
are still in use.
Fix the above by moving the reference count decrease to the place
where the command FD is really freed, instead of doing it when the FD is
just closed.
Fixes: 036b10635739 ("IB/uverbs: Enable device removal when there are active user space applications")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Tested-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Leon Romanovsky [Tue, 1 Aug 2017 06:41:35 +0000 (09:41 +0300)]
RDMA/uverbs: Prevent leak of reserved field
Initialize the response structure to zero to prevent
leakage of the "resp.reserved" field.
drivers/infiniband/core/uverbs_cmd.c:1178 ib_uverbs_resize_cq() warn:
check that 'resp.reserved' doesn't leak information
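The pattern can be sketched in user space as follows. The struct layout and function name are hypothetical, chosen only to mirror a response with a field the code never assigns:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical response layout with a field the handler never sets. */
struct resize_resp {
	uint32_t cqe;
	uint32_t reserved;
};

/*
 * Zero the whole structure before filling it in, so that fields which
 * are not explicitly assigned do not carry stale memory contents back
 * to the caller.
 */
static void fill_resp(struct resize_resp *resp, uint32_t cqe)
{
	memset(resp, 0, sizeof(*resp));
	resp->cqe = cqe;
}
```

Without the memset(), resp->reserved would keep whatever bytes happened to be in that memory, and copying the structure to user space would expose them.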
Fixes: 33b9b3ee9709 ("IB: Add userspace support for resizing CQs")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Parav Pandit [Tue, 1 Aug 2017 06:41:34 +0000 (09:41 +0300)]
IB/core: Fix race condition in resolving IP to MAC
Currently, while resolving an IP address to a MAC address, a single delayed
work item is used for resolving multiple such resolve requests. This single
work item essentially performs two tasks:
(a) any retry needed to resolve, and
(b) executing the callback function for all completed requests.
While the work is executing callbacks, any new work scheduled on this
workqueue is lost, because the workqueue has completed looking at all pending
requests and is now looking at callbacks, but the work is still under
execution. Any further retry to look at pending requests in
process_req() after executing callbacks would lead to a similar race
condition (it may reduce the probability further but doesn't eliminate it).
Retrying to enqueue work from the queue_req() context is not something
the rest of the kernel modules have followed.
Therefore the fix in this patch utilizes the kernel facility to enqueue
multiple work items to a workqueue. This ensures that no such requests
get lost in synchronization. The request list is still maintained so that
rdma_cancel_addr() can unlink the request and get the completion with
an error sooner. Neighbour update event handling continues to be handled in
the same way as before.
Additionally, the process_req() work entry cancels any pending work for a
request that gets completed while processing those requests.
Originally ib_addr was an ST workqueue, but it became an MT workqueue with
the patch of [1]. This patch again makes it similar to ST so that
the neighbour update event handler work item doesn't race with
other work items.
In one such trace below (though on a 4.5 based kernel), it can be seen
that process_req() never executed the callback, which is likely for an
event that was scheduled by queue_req() when the previous callback was
getting executed by the workqueue.
[<ffffffff816b0dde>] schedule+0x3e/0x90
[<ffffffff816b3c45>] schedule_timeout+0x1b5/0x210
[<ffffffff81618c37>] ? ip_route_output_flow+0x27/0x70
[<ffffffffa027f9c9>] ? addr_resolve+0x149/0x1b0 [ib_addr]
[<ffffffff816b228f>] wait_for_completion+0x10f/0x170
[<ffffffff810b6140>] ? try_to_wake_up+0x210/0x210
[<ffffffffa027f220>] ? rdma_copy_addr+0xa0/0xa0 [ib_addr]
[<ffffffffa0280120>] rdma_addr_find_l2_eth_by_grh+0x1d0/0x278 [ib_addr]
[<ffffffff81321297>] ? sub_alloc+0x77/0x1c0
[<ffffffffa02943b7>] ib_init_ah_from_wc+0x3a7/0x5a0 [ib_core]
[<ffffffffa0457aba>] cm_req_handler+0xea/0x580 [ib_cm]
[<ffffffff81015982>] ? __switch_to+0x212/0x5e0
[<ffffffffa04582fd>] cm_work_handler+0x6d/0x150 [ib_cm]
[<ffffffff810a14c1>] process_one_work+0x151/0x4b0
[<ffffffff810a1940>] worker_thread+0x120/0x480
[<ffffffff816b074b>] ? __schedule+0x30b/0x890
[<ffffffff810a1820>] ? process_one_work+0x4b0/0x4b0
[<ffffffff810a1820>] ? process_one_work+0x4b0/0x4b0
[<ffffffff810a6b1e>] kthread+0xce/0xf0
[<ffffffff810a6a50>] ? kthread_freezable_should_stop+0x70/0x70
[<ffffffff816b53a2>] ret_from_fork+0x42/0x70
[<ffffffff810a6a50>] ? kthread_freezable_should_stop+0x70/0x70
INFO: task kworker/u144:1:156520 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
kworker/u144:1 D ffff883ffe1d7600 0 156520 2 0x00000080
Workqueue: ib_addr process_req [ib_addr]
ffff883f446fbbd8 0000000000000046 ffff881f95280000 ffff881ff24de200
ffff883f66120000 ffff883f446f8008 ffff881f95280000 ffff883f6f9208c4
ffff883f6f9208c8 00000000ffffffff ffff883f446fbbf8 ffffffff816b0dde
[1] http://lkml.iu.edu/hypermail/linux/kernel/1608.1/05834.html
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
David Daney [Fri, 4 Aug 2017 00:10:12 +0000 (17:10 -0700)]
MIPS: Add missing file for eBPF JIT.
Inexplicably, commit f381bf6d82f0 ("MIPS: Add support for eBPF JIT.")
lost a file somewhere on its path to Linus' tree. Add back the
missing ebpf_jit.c so that we can build with CONFIG_BPF_JIT selected.
This version of ebpf_jit.c is identical to the original except for two
minor changes needed to resolve conflicts with changes merged from the
BPF branch:
A) Set prog->jited_len = image_size;
B) Use BPF_TAIL_CALL instead of BPF_CALL | BPF_X
Fixes: f381bf6d82f0 ("MIPS: Add support for eBPF JIT.")
Signed-off-by: David Daney <david.daney@cavium.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>