Mark Rutland [Tue, 5 Jan 2016 17:33:34 +0000 (17:33 +0000)]
UPSTREAM: arm64: entry: remove pointless SPSR mode check
In work_pending, we may skip work if the stacked SPSR value represents
anything other than an EL0 context. We then immediately invoke the
kernel_exit 0 macro as part of ret_to_user, assuming a return to EL0.
This is somewhat confusing.
We use work_pending as part of the ret_to_user/ret_fast_syscall state
machine. We only use ret_fast_syscall in the return from an SVC issued
from EL0. We use ret_to_user for return from EL0 exception handlers and
also for return from ret_from_fork in the case the task was not a kernel
thread (i.e. it is a user task).
Thus in all cases the stacked SPSR value must represent an EL0 context,
and the check is redundant. This patch removes it, along with the now
unused no_work_pending label.
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Bug:
30369029
Patchset: rework-pagetable
(cherry picked from commit
ee03353bc04f8e460cc4e3da80d9721d9ecb89f1)
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Change-Id: I2738149342ef469e8cde7c5f8f7c65daab93fb2b
Will Deacon [Tue, 5 Jan 2016 15:36:59 +0000 (15:36 +0000)]
UPSTREAM: arm64: mm: move pgd_cache initialisation to pgtable_cache_init
Initialising the suppport for EFI runtime services requires us to
allocate a pgd off the back of an early_initcall. On systems where the
PGD_SIZE is smaller than PAGE_SIZE (e.g. 64k pages and 48-bit VA), the
pgd_cache isn't initialised at this stage, and we panic with a NULL
dereference during boot:
Unable to handle kernel NULL pointer dereference at virtual address
00000000
__create_mapping.isra.5+0x84/0x350
create_pgd_mapping+0x20/0x28
efi_create_mapping+0x5c/0x6c
arm_enable_runtime_services+0x154/0x1e4
do_one_initcall+0x8c/0x190
kernel_init_freeable+0x84/0x1ec
kernel_init+0x10/0xe0
ret_from_fork+0x10/0x50
This patch fixes the problem by initialising the pgd_cache earlier, in
the pgtable_cache_init callback, which sounds suspiciously like what it
was intended for.
Reported-by: Dennis Chen <dennis.chen@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Bug:
30369029
Patchset: rework-pagetable
(cherry picked from commit
39b5be9b4233a9f212b98242bddf008f379b5122)
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Change-Id: I124f03d19299be93124af35641294bf73c13bb22
Will Deacon [Mon, 21 Dec 2015 16:44:27 +0000 (16:44 +0000)]
UPSTREAM: arm64: traps: address fallout from printk -> pr_* conversion
Commit
ac7b406c1a9d ("arm64: Use pr_* instead of printk") was a fairly
mindless s/printk/pr_*/ change driven by a complaint from checkpatch.
As is usual with such changes, this has led to some odd behaviour on
arm64:
* syslog now picks up the "pr_emerg" line from dump_backtrace, but not
the actual trace, which leads to a bunch of "kernel:Call trace:"
lines in the log
* __{pte,pmd,pgd}_error print at KERN_CRIT, as opposed to KERN_ERR
which is used by other architectures.
This patch restores the original printk behaviour for dump_backtrace
and downgrade the pgtable error macros to KERN_ERR.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Bug:
30369029
Patchset: rework-pagetable
(cherry picked from commit
c9cd0ed925c0b927283d4739bfe689eb9d1e9dfd)
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Change-Id: Iaf028e5368df4623f7257ef432a7f9da86261609
AKASHI Takahiro [Tue, 15 Dec 2015 08:33:41 +0000 (17:33 +0900)]
UPSTREAM: arm64: ftrace: fix a stack tracer's output under function graph tracer
Function graph tracer modifies a return address (LR) in a stack frame
to hook a function return. This will result in many useless entries
(return_to_handler) showing up in
a) a stack tracer's output
b) perf call graph (with perf record -g)
c) dump_backtrace (at panic et al.)
For example, in case of a),
$ echo function_graph > /sys/kernel/debug/tracing/current_tracer
$ echo 1 > /proc/sys/kernel/stack_trace_enabled
$ cat /sys/kernel/debug/tracing/stack_trace
Depth Size Location (54 entries)
----- ---- --------
0) 4504 16 gic_raise_softirq+0x28/0x150
1) 4488 80 smp_cross_call+0x38/0xb8
2) 4408 48 return_to_handler+0x0/0x40
3) 4360 32 return_to_handler+0x0/0x40
...
In case of b),
$ echo function_graph > /sys/kernel/debug/tracing/current_tracer
$ perf record -e mem:XXX:x -ag -- sleep 10
$ perf report
...
| | |--0.22%-- 0x550f8
| | | 0x10888
| | | el0_svc_naked
| | | sys_openat
| | | return_to_handler
| | | return_to_handler
...
In case of c),
$ echo function_graph > /sys/kernel/debug/tracing/current_tracer
$ echo c > /proc/sysrq-trigger
...
Call trace:
[<
ffffffc00044d3ac>] sysrq_handle_crash+0x24/0x30
[<
ffffffc000092250>] return_to_handler+0x0/0x40
[<
ffffffc000092250>] return_to_handler+0x0/0x40
...
This patch replaces such entries with real addresses preserved in
current->ret_stack[] at unwind_frame(). This way, we can cover all
the cases.
Reviewed-by: Jungseok Lee <jungseoklee85@gmail.com>
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
[will: fixed minor context changes conflicting with irq stack bits]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Bug:
30369029
Patchset: rework-pagetable
(cherry picked from commit
20380bb390a443b2c5c8800cec59743faf8151b4)
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Change-Id: I6360182f8d04fdd2e31c0cb6054aefa2adb216e7
AKASHI Takahiro [Tue, 15 Dec 2015 08:33:40 +0000 (17:33 +0900)]
UPSTREAM: arm64: pass a task parameter to unwind_frame()
Function graph tracer modifies a return address (LR) in a stack frame
to hook a function's return. This will result in many useless entries
(return_to_handler) showing up in a call stack list.
We will fix this problem in a later patch ("arm64: ftrace: fix a stack
tracer's output under function graph tracer"). But since real return
addresses are saved in ret_stack[] array in struct task_struct,
unwind functions need to be notified of, in addition to a stack pointer
address, which task is being traced in order to find out real return
addresses.
This patch extends unwind functions' interfaces by adding an extra
argument of a pointer to task_struct.
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Bug:
30369029
Patchset: rework-pagetable
(cherry picked from commit
fe13f95b720075327a761fe6ddb45b0c90cab504)
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Change-Id: I92a9a07468c182d5abbacaa73a90984ab11ad535
AKASHI Takahiro [Tue, 15 Dec 2015 08:33:39 +0000 (17:33 +0900)]
UPSTREAM: arm64: ftrace: modify a stack frame in a safe way
Function graph tracer modifies a return address (LR) in a stack frame by
calling ftrace_prepare_return() in a traced function's function prologue.
The current code does this modification before preserving an original
address at ftrace_push_return_trace() and there is always a small window
of inconsistency when an interrupt occurs.
This doesn't matter, as far as an interrupt stack is introduced, because
stack tracer won't be invoked in an interrupt context. But it would be
better to proactively minimize such a window by moving the LR modification
after ftrace_push_return_trace().
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Bug:
30369029
Patchset: rework-pagetable
(cherry picked from commit
79fdee9b6355c9720f14717e1ad66af51bb331b5)
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Change-Id: Ief0a136aa01348b4d0d3f447544f21fd77835c67
James Morse [Fri, 18 Dec 2015 16:01:47 +0000 (16:01 +0000)]
UPSTREAM: arm64: remove irq_count and do_softirq_own_stack()
sysrq_handle_reboot() re-enables interrupts while on the irq stack. The
irq_stack implementation wrongly assumed this would only ever happen
via the softirq path, allowing it to update irq_count late, in
do_softirq_own_stack().
This means if an irq occurs in sysrq_handle_reboot(), during
emergency_restart() the stack will be corrupted, as irq_count wasn't
updated.
Lose the optimisation, and instead of moving the adding/subtracting of
irq_count into irq_stack_entry/irq_stack_exit, remove it, and compare
sp_el0 (struct thread_info) with sp & ~(THREAD_SIZE - 1). This tells us
if we are on a task stack, if so, we can safely switch to the irq stack.
Finally, remove do_softirq_own_stack(), we don't need it anymore.
Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
[will: use get_thread_info macro]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Bug:
30369029
Patchset: rework-pagetable
(cherry picked from commit
d224a69e3d80fe08f285d1f41d21b590bae4fa9f)
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Change-Id: I1f613279bf875443b10d65b1cd1ed4a6abfcb605
David Woods [Thu, 17 Dec 2015 19:31:26 +0000 (14:31 -0500)]
UPSTREAM: arm64: hugetlb: add support for PTE contiguous bit
The arm64 MMU supports a Contiguous bit which is a hint that the TTE
is one of a set of contiguous entries which can be cached in a single
TLB entry. Supporting this bit adds new intermediate huge page sizes.
The set of huge page sizes available depends on the base page size.
Without using contiguous pages the huge page sizes are as follows.
4KB: 2MB 1GB
64KB: 512MB
With a 4KB granule, the contiguous bit groups together sets of 16 pages
and with a 64KB granule it groups sets of 32 pages. This enables two new
huge page sizes in each case, so that the full set of available sizes
is as follows.
4KB: 64KB 2MB 32MB 1GB
64KB: 2MB 512MB 16GB
If a 16KB granule is used then the contiguous bit groups 128 pages
at the PTE level and 32 pages at the PMD level.
If the base page size is set to 64KB then 2MB pages are enabled by
default. It is possible in the future to make 2MB the default huge
page size for both 4KB and 64KB granules.
Reviewed-by: Chris Metcalf <cmetcalf@ezchip.com>
Reviewed-by: Steve Capper <steve.capper@linaro.org>
Signed-off-by: David Woods <dwoods@ezchip.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Bug:
30369029
Patchset: rework-pagetable
(cherry picked from commit
66b3923a1a0f77a563b43f43f6ad091354abbfe9)
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Change-Id: I5e99c5165bc5eb966adf4d4523632fd9eedd9602
Ashok Kumar [Thu, 17 Dec 2015 09:38:32 +0000 (01:38 -0800)]
BACKPORT: arm64: Use PoU cache instr for I/D coherency
In systems with three levels of cache(PoU at L1 and PoC at L3),
PoC cache flush instructions flushes L2 and L3 caches which could affect
performance.
For cache flushes for I and D coherency, PoU should suffice.
So changing all I and D coherency related cache flushes to PoU.
Introduced a new __clean_dcache_area_pou API for dcache flush till PoU
and provided a common macro for __flush_dcache_area and
__clean_dcache_area_pou.
Also, now in __sync_icache_dcache, icache invalidation for non-aliasing
VIPT icache is done only for that particular page instead of the earlier
__flush_icache_all.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ashok Kumar <ashoks@broadcom.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Bug:
30369029
Patchset: rework-pagetable
(cherry picked from commit
0a28714c53fd4f7aea709be7577dfbe0095c8c3e)
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Change-Id: I64f065140d5e8783e91ed53ae9c7a2e33a3e515a
Lorenzo Pieralisi [Wed, 13 Jan 2016 14:50:03 +0000 (14:50 +0000)]
BACKPORT: arm64: kernel: fix architected PMU registers unconditional access
The Performance Monitors extension is an optional feature of the
AArch64 architecture, therefore, in order to access Performance
Monitors registers safely, the kernel should detect the architected
PMU unit presence through the ID_AA64DFR0_EL1 register PMUVer field
before accessing them.
This patch implements a guard by reading the ID_AA64DFR0_EL1 register
PMUVer field to detect the architected PMU presence and prevent accessing
PMU system registers if the Performance Monitors extension is not
implemented in the core.
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: <stable@vger.kernel.org>
Fixes:
60792ad349f3 ("arm64: kernel: enforce pmuserenr_el0 initialization and restore")
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Bug:
30369029
Patchset: rework-pagetable
(cherry picked from commit
f436b2ac90a095746beb6729b8ee8ed87c9eaede)
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Change-Id: Ie442b9feba5d143bd6b6c82d70190fc8bc95298d
Ashok Kumar [Thu, 17 Dec 2015 09:38:31 +0000 (01:38 -0800)]
UPSTREAM: arm64: Defer dcache flush in __cpu_copy_user_page
Defer dcache flushing to __sync_icache_dcache by calling
flush_dcache_page which clears PG_dcache_clean flag.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ashok Kumar <ashoks@broadcom.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Bug:
30369029
Patchset: rework-pagetable
(cherry picked from commit
e6b1185f77351aa154e63bd54b05d07ff99d4ffa)
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Change-Id: I2e02ce2f43f68287337bed30e3c3455c0eee4164
James Morse [Tue, 15 Dec 2015 11:21:25 +0000 (11:21 +0000)]
UPSTREAM: arm64: reduce stack use in irq_handler
The code for switching to irq_stack stores three pieces of information on
the stack, fp+lr, as a fake stack frame (that lets us walk back onto the
interrupted tasks stack frame), and the address of the struct pt_regs that
contains the register values from kernel entry. (which dump_backtrace()
will print in any stack trace).
To reduce this, we store fp, and the pointer to the struct pt_regs.
unwind_frame() can recognise this as the irq_stack dummy frame, (as it only
appears at the top of the irq_stack), and use the struct pt_regs values
to find the missing interrupted link-register.
Suggested-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Bug:
30369029
Patchset: rework-pagetable
(cherry picked from commit
971c67ce37cfeeaf560e792a2c3bc21d8b67163a)
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Change-Id: I84cbb04857a441083d331e875c3e228d24ec2276
Will Deacon [Tue, 17 Nov 2015 14:45:47 +0000 (14:45 +0000)]
UPSTREAM: arm64: Documentation: add list of software workarounds for errata
It's not immediately obvious which hardware errata are worked around in
the Linux kernel for an arbitrary kernel tree, so add a file to keep
track of what we're working around.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Bug:
30369029
Patchset: rework-pagetable
(cherry picked from commit
9cb9c9e5ba8453537e8e645318edf231fe54eaf9)
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Change-Id: I139972ff6e10e12c4b4f27cd047f55b3dd8b4118
Mark Rutland [Fri, 11 Dec 2015 11:04:31 +0000 (11:04 +0000)]
UPSTREAM: arm64: mm: place __cpu_setup in .text
We drop __cpu_setup in .text.init, which ends up being part of .text.
The .text.init section was a legacy section name which has been unused
elsewhere for a long time.
The ".text.init" name is misleading if read as a synonym for
".init.text". Any CPU may execute __cpu_setup before turning the MMU on,
so it should simply live in .text.
Remove the pointless section assignment. This will leave __cpu_setup in
the .text section.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Bug:
30369029
Patchset: rework-pagetable
(cherry picked from commit
f00083cae331e5d3eecade6b4fdc35d0825e73ef)
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Change-Id: I2e9b154cd6de92662c70c2b957479448252c661a
Mark Brown [Thu, 10 Dec 2015 16:54:32 +0000 (16:54 +0000)]
UPSTREAM: arm64: cmpxchg: Don't incldue linux/mmdebug.h
The arm64 asm/cmpxchg.h includes linux/mmdebug.h but doesn't so far as I
can tell actually use anything from it. Removing the inclusion reduces
spurious header dependency rebuilds and also avoids issues with
recursive inclusions of headers causing build breaks due to attempts to
use things before they are defined if linux/mmdebug.h starts pulling in
more low level headers.
Such errors have happened in -next recently, for example:
In file included from include/linux/completion.h:11:0,
from include/linux/rcupdate.h:43,
from include/linux/tracepoint.h:19,
from include/linux/mmdebug.h:6,
from ./arch/arm64/include/asm/cmpxchg.h:22,
from ./arch/arm64/include/asm/atomic.h:41,
from include/linux/atomic.h:4,
from include/linux/spinlock.h:406,
from include/linux/seqlock.h:35,
from include/linux/time.h:5,
from include/uapi/linux/timex.h:56,
from include/linux/timex.h:56,
from include/linux/sched.h:19,
from arch/arm64/kernel/asm-offsets.c:21:
include/linux/wait.h: In function 'wait_on_atomic_t':
include/linux/wait.h:1218:2: error: implicit declaration of function 'atomic_read' [-Werror=implicit-function-declaration]
if (atomic_read(val) == 0)
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Bug:
30369029
Patchset: rework-pagetable
(cherry picked from commit
4a6ccf30263f4e265c0f171561bf4c40bed5f273)
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Change-Id: Idf44176dad0abc11e67b4e416b02a3fba14f3f1b
Mark Rutland [Wed, 9 Dec 2015 12:44:38 +0000 (12:44 +0000)]
UPSTREAM: arm64: mm: fold alternatives into .init
Currently we treat the alternatives separately from other data that's
only used during initialisation, using separate .altinstructions and
.altinstr_replacement linker sections. These are freed for general
allocation separately from .init*. This is problematic as:
* We do not remove execute permissions, as we do for .init, leaving the
memory executable.
* We pad between them, making the kernel Image bianry up to PAGE_SIZE
bytes larger than necessary.
This patch moves the two sections into the contiguous region used for
.init*. This saves some memory, ensures that we remove execute
permissions, and allows us to remove some code made redundant by this
reorganisation.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jeremy Linton <jeremy.linton@arm.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Bug:
30369029
Patchset: rework-pagetable
(cherry picked from commit
9aa4ec1571da62366cfddc20f3b923609604fe63)
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Change-Id: I3fee118ead5c73ade50ea10d436881c9424a549c
Mark Rutland [Wed, 9 Dec 2015 12:44:37 +0000 (12:44 +0000)]
BACKPORT: arm64: Remove redundant padding from linker script
Currently we place an ALIGN_DEBUG_RO between text and data for the .text
and .init sections, and depending on configuration each of these may
result in up to SECTION_SIZE bytes worth of padding (for
DEBUG_RODATA_ALIGN).
We make no distinction between the text and data in each of these
sections at any point when creating the initial page tables in head.S.
We also make no distinction when modifying the tables; __map_memblock,
fixup_executable, mark_rodata_ro, and fixup_init only work at section
granularity. Thus this padding is unnecessary.
For the spit between init text and data we impose a minimum alignment of
16 bytes, but this is also unnecessary. The init data is output
immediately after the padding before any symbols are defined, so this is
not required to keep a symbol for linker a section array correctly
associated with the data. Any objects within the section will be given
at least their usual alignment regardless.
This patch removes the redundant padding.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jeremy Linton <jeremy.linton@arm.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Bug:
30369029
Patchset: rework-pagetable
(cherry picked from commit
5b28cd9d084eca8ddc46270d2720305bfd40e348)
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Change-Id: I5540ba1f4d90e2ae8fafa22e98c389bc4e975ac7
Mark Rutland [Wed, 9 Dec 2015 12:44:36 +0000 (12:44 +0000)]
UPSTREAM: arm64: mm: remove pointless PAGE_MASKing
As pgd_offset{,_k} shift the input address by PGDIR_SHIFT, the sub-page
bits will always be shifted out. There is no need to apply PAGE_MASK
before this.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jeremy Linton <jeremy.linton@arm.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Bug:
30369029
Patchset: rework-pagetable
(cherry picked from commit
e2c30ee320eb96304896c7ab84499e5bc5e5fb6e)
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Change-Id: I86d3438aecf7295d5895e1430c1e19fbc82c9719
Lorenzo Colitti [Wed, 7 Sep 2016 15:42:25 +0000 (00:42 +0900)]
net: inet: diag: expose the socket mark to privileged processes.
This adds the capability for a process that has CAP_NET_ADMIN on
a socket to see the socket mark in socket dumps.
Commit
a52e95abf772 ("net: diag: allow socket bytecode filters to
match socket marks") recently gave privileged processes the
ability to filter socket dumps based on mark. This patch is
complementary: it ensures that the mark is also passed to
userspace in the socket's netlink attributes. It is useful for
tools like ss which display information about sockets.
[backport of net-next
d545caca827b65aab557a9e9dcdcf1e5a3823c2d]
Change-Id: I33336ed9c3ee3fb78fe05c4c47b7fd18c6e33ef1
Tested: https://android-review.googlesource.com/270210
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lorenzo Colitti [Wed, 7 Sep 2016 04:38:35 +0000 (13:38 +0900)]
net: diag: make udp_diag_destroy work for mapped addresses.
udp_diag_destroy does look up the IPv4 UDP hashtable for mapped
addresses, but it gets the IPv4 address to look up from the
beginning of the IPv6 address instead of the end.
[cherry-pick of net-next
f95bf346226b9b79352e05508beececc807cc37a]
Change-Id: Ia369482c4645bcade320b2c33a763f1ce4378ff1
Tested: https://android-review.googlesource.com/269874
Fixes:
5d77dca82839 ("net: diag: support SOCK_DESTROY for UDP sockets")
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Wed, 24 Aug 2016 04:06:33 +0000 (21:06 -0700)]
net: diag: support SOCK_DESTROY for UDP sockets
This implements SOCK_DESTROY for UDP sockets similar to what was done
for TCP with commit
c1e64e298b8ca ("net: diag: Support destroying TCP
sockets.") A process with a UDP socket targeted for destroy is awakened
and recvmsg fails with ECONNABORTED.
[cherry-pick of
5d77dca82839ef016a93ad7acd7058b14d967752]
Change-Id: I4b4862548e6e3c05dde27781e7daa0b18b93bd81
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lorenzo Colitti [Wed, 24 Aug 2016 06:46:26 +0000 (15:46 +0900)]
net: diag: allow socket bytecode filters to match socket marks
This allows a privileged process to filter by socket mark when
dumping sockets via INET_DIAG_BY_FAMILY. This is useful on
systems that use mark-based routing such as Android.
The ability to filter socket marks requires CAP_NET_ADMIN, which
is consistent with other privileged operations allowed by the
SOCK_DIAG interface such as the ability to destroy sockets and
the ability to inspect BPF filters attached to packet sockets.
[cherry-pick of
a52e95abf772b43c9226e9a72d3c1353903ba96f]
Change-Id: I8b90b814264d9808bda050cdba8f104943bdb9a8
Tested: https://android-review.googlesource.com/261350
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lorenzo Colitti [Wed, 24 Aug 2016 06:46:25 +0000 (15:46 +0900)]
net: diag: slightly refactor the inet_diag_bc_audit error checks.
This simplifies the code a bit and also allows inet_diag_bc_audit
to send to userspace an error that isn't EINVAL.
[cherry-pick of net-next
627cc4add53c0470bfd118002669205d222d3a54]
Change-Id: Iee3d2bbb19f3110d71f0698ffb293f9bdffc8ef1
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Fri, 24 Jun 2016 01:42:51 +0000 (18:42 -0700)]
net: diag: Add support to filter on device index
Add support to inet_diag facility to filter sockets based on device
index. If an interface index is in the filter only sockets bound
to that index (sk_bound_dev_if) are returned.
[cherry-pick of net-next
637c841dd7a5f9bd97b75cbe90b526fa1a52e530]
Change-Id: I6b6bcdcf15d3142003f1ee53b4d82f2fabbb8250
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
James Morse [Thu, 10 Dec 2015 10:22:41 +0000 (10:22 +0000)]
UPSTREAM: arm64: don't call C code with el0's fp register
On entry from el0, we save all the registers on the kernel stack, and
restore them before returning. x29 remains unchanged when we call out
to C code, which will store x29 as the frame-pointer on the stack.
Instead, write 0 into x29 after entry from el0, to avoid any risk of
tracing into user space.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Bug:
30369029
Patchset: per-cpu-irq-stack
(cherry picked from commit
49003a8d6b35e128ef5e51433e60e783a46fbe5f)
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Change-Id: Ifae7003018e4088d5de038cef25fa210211a75b6
James Morse [Thu, 10 Dec 2015 10:22:40 +0000 (10:22 +0000)]
UPSTREAM: arm64: when walking onto the task stack, check sp & fp are in current->stack
When unwind_frame() reaches the bottom of the irq_stack, the last fp
points to the original task stack. unwind_frame() uses
IRQ_STACK_TO_TASK_STACK() to find the sp value. If either values is
wrong, we may end up walking a corrupt stack.
Check these values are sane by testing if they are both on the stack
pointed to by current->stack.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Bug:
30369029
Patchset: per-cpu-irq-stack
(cherry picked from commit
1ffe199b1c9b72a8e752a9ae2a7af10128ab2ca1)
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Change-Id: I2e5bf1ce899a1018f1c5b8ccb4f7c816d61bba21
James Morse [Thu, 10 Dec 2015 10:22:39 +0000 (10:22 +0000)]
UPSTREAM: arm64: Add this_cpu_ptr() assembler macro for use in entry.S
irq_stack is a per_cpu variable, that needs to be access from entry.S.
Use an assembler macro instead of the unreadable details.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Bug:
30369029
Patchset: per-cpu-irq-stack
(cherry picked from commit
aa4d5d3cbc258c355151a3903211b27359390ec5)
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Change-Id: I2ccc11f442c0303c62e1c3e0a05a088958c922b8
Will Deacon [Wed, 9 Dec 2015 13:58:42 +0000 (13:58 +0000)]
UPSTREAM: arm64: irq: fix walking from irq stack to task stack
Running with CONFIG_DEBUG_SPINLOCK=y can trigger a BUG with the new IRQ
stack code:
BUG: spinlock lockup suspected on CPU#1
This is due to the IRQ_STACK_TO_TASK_STACK macro incorrectly retrieving
the task stack pointer stashed at the top of the IRQ stack.
Sayeth James:
| Yup, this is what is happening. Its an off-by-one due to broken
| thinking about how the stack works. My broken thinking was:
|
| > top ------------
| > | dummy_lr | <- irq_stack_ptr
| > ------------
| > | x29 |
| > ------------
| > | x19 | <- irq_stack_ptr - 0x10
| > ------------
| > | xzr |
| > ------------
|
| But the stack-pointer is decreased before use. So it actually looks
| like this:
|
| > ------------
| > | | <- irq_stack_ptr
| > top ------------
| > | dummy_lr |
| > ------------
| > | x29 | <- irq_stack_ptr - 0x10
| > ------------
| > | x19 |
| > ------------
| > | xzr | <- irq_stack_ptr - 0x20
| > ------------
|
| The value being used as the original stack is x29, which in all the
| tests is sp but without the current frames data, hence there are no
| missing frames in the output.
|
| Jungseok Lee picked it up with a 32bit user space because aarch32
| can't use x29, so it remains 0 forever. The fix he posted is correct.
This patch fixes the macro and adds some of this wisdom to a comment,
so that the layout of the IRQ stack is well understood.
Cc: James Morse <james.morse@arm.com>
Reported-by: Jungseok Lee <jungseoklee85@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Bug:
30369029
Patchset: per-cpu-irq-stack
(cherry picked from commit
7596abf2e5661d52c4f414f37addeed54e098880)
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Change-Id: Ic65c0d0d90a30a5caf8b3791d1e856400bd2b5f5
James Morse [Fri, 4 Dec 2015 11:02:27 +0000 (11:02 +0000)]
UPSTREAM: arm64: Add do_softirq_own_stack() and enable irq_stacks
entry.S is modified to switch to the per_cpu irq_stack during el{0,1}_irq.
irq_count is used to detect recursive interrupts on the irq_stack, it is
updated late by do_softirq_own_stack(), when called on the irq_stack, before
__do_softirq() re-enables interrupts to process softirqs.
do_softirq_own_stack() is added by this patch, but does not yet switch
stack.
This patch adds the dummy stack frame and data needed by the previous
stack tracing patches.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Bug:
30369029
Patchset: per-cpu-irq-stack
(cherry picked from commit
8e23dacd12a48e58125b84c817da50850b73280a)
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Change-Id: I9f79437af3da0398cb12e7afd1aa9f473f59b2e6
AKASHI Takahiro [Fri, 4 Dec 2015 11:02:26 +0000 (11:02 +0000)]
UPSTREAM: arm64: Modify stack trace and dump for use with irq_stack
This patch allows unwind_frame() to traverse from interrupt stack to task
stack correctly. It requires data from a dummy stack frame, created
during irq_stack_entry(), added by a later patch.
A similar approach is taken to modify dump_backtrace(), which expects to
find struct pt_regs underneath any call to functions marked __exception.
When on an irq_stack, the struct pt_regs is stored on the old task stack,
the location of which is stored in the dummy stack frame.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
[james.morse: merged two patches, reworked for per_cpu irq_stacks, and
no alignment guarantees, added irq_stack definitions]
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Bug:
30369029
Patchset: per-cpu-irq-stack
(cherry picked from commit
132cd887b5c54758d04bf25c52fa48f45e843a30)
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Change-Id: I60b29291620a71ab7b6564730299d29f41ceb199
Jungseok Lee [Fri, 4 Dec 2015 11:02:25 +0000 (11:02 +0000)]
UPSTREAM: arm64: Store struct thread_info in sp_el0
There is need for figuring out how to manage struct thread_info data when
IRQ stack is introduced. struct thread_info information should be copied
to IRQ stack under the current thread_info calculation logic whenever
context switching is invoked. This is too expensive to keep supporting
the approach.
Instead, this patch pays attention to sp_el0 which is an unused scratch
register in EL1 context. sp_el0 utilization not only simplifies the
management, but also prevents text section size from being increased
largely due to static allocated IRQ stack as removing masking operation
using THREAD_SIZE in many places.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Jungseok Lee <jungseoklee85@gmail.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Bug:
30369029
Patchset: per-cpu-irq-stack
(cherry picked from commit
6cdf9c7ca687e01840d0215437620a20263012fc)
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Change-Id: I53c9f44a0772b8649f302a65a7a6519d8eebcb91
Catalin Marinas [Fri, 4 Dec 2015 12:42:29 +0000 (12:42 +0000)]
UPSTREAM: arm64: Add trace_hardirqs_off annotation in ret_to_user
When a kernel is built with CONFIG_TRACE_IRQFLAGS the following warning
is produced when entering userspace for the first time:
WARNING: at /work/Linux/linux-2.6-aarch64/kernel/locking/lockdep.c:3519
Modules linked in:
CPU: 1 PID: 1 Comm: systemd Not tainted 4.4.0-rc3+ #639
Hardware name: Juno (DT)
task:
ffffffc9768a0000 ti:
ffffffc9768a8000 task.ti:
ffffffc9768a8000
PC is at check_flags.part.22+0x19c/0x1a8
LR is at check_flags.part.22+0x19c/0x1a8
pc : [<
ffffffc0000fba6c>] lr : [<
ffffffc0000fba6c>] pstate:
600001c5
sp :
ffffffc9768abe10
x29:
ffffffc9768abe10 x28:
ffffffc9768a8000
x27:
0000000000000000 x26:
0000000000000001
x25:
00000000000000a6 x24:
ffffffc00064be6c
x23:
ffffffc0009f249e x22:
ffffffc9768a0000
x21:
ffffffc97fea5480 x20:
00000000000001c0
x19:
ffffffc00169a000 x18:
0000005558cc7b58
x17:
0000007fb78e3180 x16:
0000005558d2e238
x15:
ffffffffffffffff x14:
0ffffffffffffffd
x13:
0000000000000008 x12:
0101010101010101
x11:
7f7f7f7f7f7f7f7f x10:
fefefefefefeff63
x9 :
7f7f7f7f7f7f7f7f x8 :
6e655f7371726964
x7 :
0000000000000001 x6 :
ffffffc0001079c4
x5 :
0000000000000000 x4 :
0000000000000001
x3 :
ffffffc001698438 x2 :
0000000000000000
x1 :
ffffffc9768a0000 x0 :
000000000000002e
Call trace:
[<
ffffffc0000fba6c>] check_flags.part.22+0x19c/0x1a8
[<
ffffffc0000fc440>] lock_is_held+0x80/0x98
[<
ffffffc00064bafc>] __schedule+0x404/0x730
[<
ffffffc00064be6c>] schedule+0x44/0xb8
[<
ffffffc000085bb0>] ret_to_user+0x0/0x24
possible reason: unannotated irqs-off.
irq event stamp: 502169
hardirqs last enabled at (502169): [<
ffffffc000085a98>] el0_irq_naked+0x1c/0x24
hardirqs last disabled at (502167): [<
ffffffc0000bb3bc>] __do_softirq+0x17c/0x298
softirqs last enabled at (502168): [<
ffffffc0000bb43c>] __do_softirq+0x1fc/0x298
softirqs last disabled at (502143): [<
ffffffc0000bb830>] irq_exit+0xa0/0xf0
This happens because we disable interrupts in ret_to_user before calling
schedule() in work_resched. This patch adds the necessary
trace_hardirqs_off annotation.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Bug:
30369029
(cherry picked from commit
db3899a6477a4dccd26cbfb7f408b6be2cc068e0)
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Change-Id: I868c6c64c548f12156467959e2b8df09caae282f
Li Bin [Fri, 4 Dec 2015 03:38:40 +0000 (11:38 +0800)]
UPSTREAM: arm64: ftrace: fix the comments for ftrace_modify_code
There is no need to worry about module and __init text disappearing
case, because that ftrace has a module notifier that is called when
a module is being unloaded and before the text goes away and this
code grabs the ftrace_lock mutex and removes the module functions
from the ftrace list, such that it will no longer do any
modifications to that module's text, the update to make functions
be traced or not is done under the ftrace_lock mutex as well.
And by now, __init section codes should not been modified
by ftrace, because it is black listed in recordmcount.c and
ignored by ftrace.
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Li Bin <huawei.libin@huawei.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Bug:
30369029
Patchset: arm64-ftrace
(cherry picked from commit
004ab584e028093996cf5b8e220b8bc50c5111cf)
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Change-Id: I69df7eddbf9e17031920b950312399dc4d36c09e
Li Bin [Fri, 4 Dec 2015 03:38:39 +0000 (11:38 +0800)]
UPSTREAM: arm64: ftrace: stop using kstop_machine to enable/disable tracing
For ftrace on arm64, kstop_machine which is hugely disruptive
to a running system is not needed to convert nops to ftrace calls
or back, because that to be modified instrucions, that NOP, B or BL,
are all safe instructions which called "concurrent modification
and execution of instructions", that can be executed by one
thread of execution as they are being modified by another thread
of execution without requiring explicit synchronization.
Signed-off-by: Li Bin <huawei.libin@huawei.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Bug:
30369029
Patchset: arm64-ftrace
(cherry picked from commit
81a6a146e88eca5d6726569779778d61489d85aa)
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Change-Id: I54e2c0d49bd68f9547bd9f0da8b7520e02bb0714
Will Deacon [Thu, 19 Nov 2015 17:48:31 +0000 (17:48 +0000)]
UPSTREAM: arm64: spinlock: serialise spin_unlock_wait against concurrent lockers
Boqun Feng reported a rather nasty ordering issue with spin_unlock_wait
on architectures implementing spin_lock with LL/SC sequences and acquire
semantics:
| CPU 1 CPU 2 CPU 3
| ================== ==================== ==============
| spin_unlock(&lock);
| spin_lock(&lock):
| r1 = *lock; // r1 == 0;
| o = READ_ONCE(object); // reordered here
| object = NULL;
| smp_mb();
| spin_unlock_wait(&lock);
| *lock = 1;
| smp_mb();
| o->dead = true;
| if (o) // true
| BUG_ON(o->dead); // true!!
The crux of the problem is that spin_unlock_wait(&lock) can return on
CPU 1 whilst CPU 2 is in the process of taking the lock. This can be
resolved by upgrading spin_unlock_wait to a LOCK operation, forcing it
to serialise against a concurrent locker and giving it acquire semantics
in the process (although it is not at all clear whether this is needed -
different callers seem to assume different things about the barrier
semantics and architectures are similarly disjoint in their
implementations of the macro).
This patch implements spin_unlock_wait using an LL/SC sequence with
acquire semantics on arm64. For v8.1 systems with the LSE atomics, the
exclusive writeback is omitted, since the spin_lock operation is
indivisible and no intermediate state can be observed.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Bug:
30369029
(cherry picked from commit
d86b8da04dfa4771a68bdbad6c424d40f22f0d14)
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Change-Id: I27d88458c99dfd475d0326bd1d408ec3e575a7dd
Will Deacon [Mon, 23 Nov 2015 15:12:59 +0000 (15:12 +0000)]
UPSTREAM: arm64: enable HAVE_IRQ_TIME_ACCOUNTING
arm64 relies on the arm_arch_timer for sched_clock, so we can select
HAVE_IRQ_TIME_ACCOUNTING and have the core sched-clock code enable the
feature at runtime based on the rate.
Reported-by: Mario Smarduch <m.smarduch@samsung.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Bug:
30369029
(cherry picked from commit
24da208db32ee1e4757ceaba898c47add8e5361e)
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Change-Id: I849e010459dbbde0ac0d44d665dadcea9f8bf12d
Yury Norov [Wed, 2 Dec 2015 14:00:10 +0000 (14:00 +0000)]
UPSTREAM: arm64: fix COMPAT_SHMLBA definition for large pages
ARM glibc uses (4 * __getpagesize()) for SHMLBA, which is correct for
4KB pages and works fine for 64KB pages, but the kernel uses a hardcoded
16KB that is too small for 64KB page based kernels. This changes the
definition to what user space sees when using 64KB pages.
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Bug:
30369029
(cherry picked from commit
b9b7aebb42d1b1392f3111de61136bb6cf3aae3f)
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Change-Id: I698814d005a28c7fe3963cded9756f88660d4aa0
Jisheng Zhang [Fri, 20 Nov 2015 09:59:10 +0000 (17:59 +0800)]
UPSTREAM: arm64: add __init/__initdata section marker to some functions/variables
These functions/variables are not needed after booting, so mark them
as __init or __initdata.
Signed-off-by: Jisheng Zhang <jszhang@marvell.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Bug:
30369029
(cherry picked from commit
a7c61a3452d39078919f0e1f493ff966fb64f0db)
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Change-Id: I50a3362e186750e139d2440d2c1e1d49ace896e1
Lars-Peter Clausen [Thu, 14 Apr 2016 15:01:17 +0000 (17:01 +0200)]
UPSTREAM: usb: gadget: f_fs: Fix use-after-free
(cherry picked from commit
38740a5b87d53ceb89eb2c970150f6e94e00373a)
When using asynchronous read or write operations on the USB endpoints the
issuer of the IO request is notified by calling the ki_complete() callback
of the submitted kiocb when the URB has been completed.
Calling this ki_complete() callback will free kiocb. Make sure that the
structure is no longer accessed beyond that point, otherwise undefined
behaviour might occur.
Fixes:
2e4c7553cd6f ("usb: gadget: f_fs: add aio support")
Cc: <stable@vger.kernel.org> # v3.15+
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Change-Id: I3c7b643f6440c4fb6160a57c1058523030b46a6c
Bug:
30950866
Arend van Spriel [Fri, 2 Sep 2016 08:37:24 +0000 (09:37 +0100)]
UPSTREAM: brcmfmac: avoid potential stack overflow in brcmf_cfg80211_start_ap()
commit
ded89912156b1a47d940a0c954c43afbabd0c42c upstream
User-space can choose to omit NL80211_ATTR_SSID and only provide raw
IE TLV data. When doing so it can provide SSID IE with length exceeding
the allowed size. The driver further processes this IE copying it
into a local variable without checking the length. Hence stack can be
corrupted and used as exploit.
Cc: stable@vger.kernel.org # v4.4, v4.1
Reported-by: Daxing Guo <freener.gdx@gmail.com>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Will Deacon [Fri, 30 Oct 2015 18:56:19 +0000 (18:56 +0000)]
UPSTREAM: arm64: pgtable: implement pte_accessible()
This patch implements the pte_accessible() macro, which can be used to
test whether or not a given pte is a candidate for allocation in the
TLB.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Bug:
30369029
(cherry picked from commit
76c714be0e5e60c935a53b31be58939510ba1d0f)
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Change-Id: I249e2d15665870149dd17d1cdb3850008f5a56fd
Mark Rutland [Mon, 23 Nov 2015 13:26:20 +0000 (13:26 +0000)]
UPSTREAM: arm64: mm: allow sections for unaligned bases
Callees of __create_mapping may decide to create section mappings if
sufficient low bits of the physical and virtual addresses they were
passed are zero. While __create_mapping rounds the virtual base address
down, it does not similarly round the physical base address down, and
hence non-zero bits in the physical address can prevent use of a section
mapping, even where a whole next-level table would be used instead.
Round down the physical base address in __create_mapping to enable all
callees to always create section mappings when such a mapping is
possible.
Cc: Laura Abbott <labbott@fedoraproject.org>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Steve Capper <steve.capper@linaro.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Bug:
30369029
Patchset: __create_mapping-fixes
(cherry picked from commit
9c4e08a3022b6df90d31ef4007291faabfce5431)
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Change-Id: Ic447350efdba3cd9f9e101c72183e04e39dd28d2
Mark Rutland [Mon, 23 Nov 2015 13:26:19 +0000 (13:26 +0000)]
UPSTREAM: arm64: mm: detect bad __create_mapping uses
If a caller of __create_mapping provides a PA and VA which have
different sub-page offsets, it is not clear which offset they expect to
apply to the mapping, and is indicative of a bad caller.
In some cases, the region we wish to map may validly have a sub-page
offset in the physical and virtual addresses. For example, EFI runtime
regions have 4K granularity, yet may be mapped by a 64K page kernel. So
long as the physical and virtual offsets are the same, the region will
be mapped at the expected VAs.
Disallow calls with differing sub-page offsets, and WARN when they are
encountered, so that we can detect and fix such cases.
Cc: Laura Abbott <labbott@fedoraproject.org>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Steve Capper <steve.capper@linaro.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Bug:
30369029
Patchset: __create_mapping-fixes
(cherry picked from commit
cc5d2b3b95cdbb3fed4e38e667d17b9ac7250f7a)
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Change-Id: I114a1265b10ff76daff385728d2125e618c313a1
Linus Torvalds [Tue, 23 Feb 2016 22:58:52 +0000 (14:58 -0800)]
UPSTREAM: x86: fix SMAP in 32-bit environments
(cherry picked from commit
de9e478b9d49f3a0214310d921450cf5bb4a21e6)
In commit
11f1a4b9755f ("x86: reorganize SMAP handling in user space
accesses") I changed how the stac/clac instructions were generated
around the user space accesses, which then made it possible to do
batched accesses efficiently for user string copies etc.
However, in doing so, I completely spaced out, and didn't even think
about the 32-bit case. And nobody really even seemed to notice, because
SMAP doesn't even exist until modern Skylake processors, and you'd have
to be crazy to run 32-bit kernels on a modern CPU.
Which brings us to Andy Lutomirski.
He actually tested the 32-bit kernel on new hardware, and noticed that
it doesn't work. My bad. The trivial fix is to add the required
uaccess begin/end markers around the raw accesses in <asm/uaccess_32.h>.
I feel a bit bad about this patch, just because that header file really
should be cleaned up to avoid all the duplicated code in it, and this
commit just expands on the problem. But this just fixes the bug without
any bigger cleanup surgery.
Reported-and-tested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: Ic044ebfe658a13179984111d062ca3a0b1404110
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Paul Moore [Tue, 19 Jul 2016 21:42:57 +0000 (17:42 -0400)]
UPSTREAM: audit: fix a double fetch in audit_log_single_execve_arg()
(cherry picked from commit
43761473c254b45883a64441dd0bc85a42f3645c)
There is a double fetch problem in audit_log_single_execve_arg()
where we first check the execve(2) argumnets for any "bad" characters
which would require hex encoding and then re-fetch the arguments for
logging in the audit record[1]. Of course this leaves a window of
opportunity for an unsavory application to munge with the data.
This patch reworks things by only fetching the argument data once[2]
into a buffer where it is scanned and logged into the audit
records(s). In addition to fixing the double fetch, this patch
improves on the original code in a few other ways: better handling
of large arguments which require encoding, stricter record length
checking, and some performance improvements (completely unverified,
but we got rid of some strlen() calls, that's got to be a good
thing).
As part of the development of this patch, I've also created a basic
regression test for the audit-testsuite, the test can be tracked on
GitHub at the following link:
* https://github.com/linux-audit/audit-testsuite/issues/25
[1] If you pay careful attention, there is actually a triple fetch
problem due to a strnlen_user() call at the top of the function.
[2] This is a tiny white lie, we do make a call to strnlen_user()
prior to fetching the argument data. I don't like it, but due to the
way the audit record is structured we really have no choice unless we
copy the entire argument at once (which would require a rather
wasteful allocation). The good news is that with this patch the
kernel no longer relies on this strnlen_user() value for anything
beyond recording it in the log, we also update it with a trustworthy
value whenever possible.
Reported-by: Pengfei Wang <wpengfeinudt@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Change-Id: I10e979e94605e3cf8d461e3e521f8f9837228aa5
Bug:
30956807
Jungseung Lee [Mon, 28 Dec 2015 20:47:00 +0000 (04:47 +0800)]
UPSTREAM: ARM: 8494/1: mm: Enable PXN when running non-LPAE kernel on LPAE processor
The VMSA field of MMFR0 (bottom 4 bits) is incremented for each
added feature. PXN is supported if the value is >= 4 and LPAE
is supported if it is >= 5.
In case a kernel with CONFIG_ARM_LPAE disabled is used on a
processor that supports LPAE, we can still use PXN in short
descriptors. So check for >= 4 not == 4.
Signed-off-by: Jungseung Lee <js07.lee@samsung.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patrick Bellasi [Wed, 24 Aug 2016 10:27:27 +0000 (11:27 +0100)]
FIXUP: sched/tune: update accouting before CPU capacity
The SchedTune tasks accounting is used to identify how many tasks are in
a boostgroup and thus to bias the selection of an OPP based on the
maximum boost value of the active boostgroups.
The current implementation however update the accounting after CPU
capacity has been update. This has two effects:
a) when we enqueue a boosted task, we do not immediately boost its CPU
b) when we dequeue a boosted task, we can keep a CPU boosted even if not
required
This patch change the order of the SchedTune accounting and SchedFreq
updated to ensure to have always an updated representation of which
boosted tasks are runnable on a CPU before updating its capacity.
Reported-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Patrick Bellasi [Wed, 24 Aug 2016 10:02:29 +0000 (11:02 +0100)]
FIXUP: sched/tune: add fixes missing from a previous patch
The previous patch:
e7ce26f - FIXUP: sched/tune: fix accounting for runnable tasks
squashed together patches of a series to fix SchedTune's accounting
issues. However, in the consolidation and cleanup of the series to merge
in the Android Common Kernel, we somehow missed a couple of important
changes:
1) the schedtune_exit function is not more required, because
e7ce26f
fixes accounting of exiting tasks in a different way
2) the schedtune_initialized flag was not set at the end of
scheddtune_init_cgroup() thus failing to enabled SchedTune at boot.
This patch thus is to be considered an integration of
e7ce26f.
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
[jstultz: Cherry-picked from android-3.18. It should be noted that
some of this patch was already applied in the 4.4 patches (schedtune_exit
doesn't exist for example), but this patch just ensures things are totally
synced up]
Signed-off-by: John Stultz <john.stultz@linaro.org>
Jon Medhurst [Thu, 2 Jun 2016 12:18:08 +0000 (12:18 +0000)]
arm: Fix #if/#ifdef typo in topology.c
Probably a typo in arch/arm/kernel/topology.c
This patch fixes the warning...
arch/arm/kernel/topology.c: In function 'scale_cpu_capacity':
arch/arm/kernel/topology.c:47:5: warning: "CONFIG_CPU_FREQ" is not defined [-Wundef]
Fixes: Change-Id: If5e9e0ba8ff5a5d3236b373dbce8c72ea71b5e18
("arm: Enable max freq invariant scheduler load-tracking and capacity support")
Signed-off-by: Jon Medhurst <tixy@linaro.org>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
[jstultz: Cherry-picked from android-3.18]
Signed-off-by: John Stultz <john.stultz@linaro.org>
Steve Muckle [Thu, 5 May 2016 01:56:45 +0000 (18:56 -0700)]
arm: Fix build error "conflicting types for 'scale_cpu_capacity'"
Commit "arm: Update arch_scale_cpu_capacity() to reflect change to
define" introduced a dependency on struct sched_domain in
arch/arm/include/asm/topologoy.h, but that structure is only currently
defined if CONFIG_CPU_FREQ is enabled, which causes
include/linux/cpufreq.h to get pulled in which defines it.
Include <linux/cpufreq.h> regardless of CONFIG_CPU_FREQ so struct
sched_domain is always defined.
Fixes: Change-Id: I372bd5e4c1e203428d72b18c8a806b06f3567ef6
("arm: Update arch_scale_cpu_capacity() to reflect change to define")
Signed-off-by: Steve Muckle <smuckle@linaro.org>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
[jstultz: Cherry-picked from android-3.18]
Signed-off-by: John Stultz <john.stultz@linaro.org>
Amit Pundir [Wed, 24 Aug 2016 06:22:17 +0000 (11:52 +0530)]
sched/walt: use do_div instead of division operator
Use do_div() instead of "/" operator to fix undefined references to
"__aeabi_uldivmod" build error for ARCH=arm.
Also in TP_fast_assign(), along with do_div() usage, replace "," with
";" which would have resulted in a syntax error (!), because
'#define TP_fast_assign(args...) args' would have stripped off the ","
and left white space between these two assignments after CPP phase.
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
[jstultz: Cherry-picked from common/android-3.18]
Signed-off-by: John Stultz <john.stultz@linaro.org>
Amit Pundir [Mon, 29 Aug 2016 14:18:17 +0000 (19:48 +0530)]
DEBUG: cpufreq: fix cpu_capacity tracing build for non-smp systems
cpu curr capacity can only be traced for SMP systems. Non-SMP builds
will fail with:
drivers/cpufreq/cpufreq.c: In function ‘cpufreq_freq_transition_begin’:
drivers/cpufreq/cpufreq.c:438:22: error: implicit declaration of function ‘capacity_curr_of’ [-Werror=implicit-function-declaration]
trace_cpu_capacity(capacity_curr_of(cpu), cpu);
^
Fixes: Change-Id: Icd0930d11068fcb7d2b6a9a48e7ed974904e1081
("DEBUG: sched,cpufreq: add cpu_capacity change tracepoint")
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
[jstultz: Cherry-picked from common/android-3.18]
Signed-off-by: John Stultz <john.stultz@linaro.org>
Vladis Dronov [Thu, 31 Mar 2016 16:05:43 +0000 (12:05 -0400)]
UPSTREAM: ALSA: usb-audio: Fix double-free in error paths after snd_usb_add_audio_stream() call
(cherry picked from commit
836b34a935abc91e13e63053d0a83b24dfb5ea78)
create_fixed_stream_quirk(), snd_usb_parse_audio_interface() and
create_uaxx_quirk() functions allocate the audioformat object by themselves
and free it upon error before returning. However, once the object is linked
to a stream, it's freed again in snd_usb_audio_pcm_free(), thus it'll be
double-freed, eventually resulting in a memory corruption.
This patch fixes these failures in the error paths by unlinking the audioformat
object before freeing it.
Based on a patch by Takashi Iwai <tiwai@suse.de>
[Note for stable backports:
this patch requires the commit
902eb7fd1e4a ('ALSA: usb-audio: Minor
code cleanup in create_fixed_stream_quirk()')]
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=
1283358
Reported-by: Ralf Spenneberg <ralf@spenneberg.net>
Cc: <stable@vger.kernel.org> # see the note above
Signed-off-by: Vladis Dronov <vdronov@redhat.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Change-Id: I7073a17d8c99886d2f6ed7981892712ba7dd5873
Bug:
30952477
Takashi Iwai [Tue, 15 Mar 2016 11:14:49 +0000 (12:14 +0100)]
BACKPORT: ALSA: usb-audio: Minor code cleanup in create_fixed_stream_quirk()
(cherry picked from commit
902eb7fd1e4af3ac69b9b30f8373f118c92b9729)
Just a minor code cleanup: unify the error paths.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Change-Id: I8253a86235df2ac1258153c9e128fa158527567f
Bug:
30952477
Amit Pundir [Thu, 25 Aug 2016 05:36:37 +0000 (11:06 +0530)]
sched/walt: include missing header for arm_timer_read_counter()
Include clocksource/arm_arch_timer.h to fix implicit function
declaration of ‘arch_timer_read_counter’ build error for ARCH=arm.
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
[jstultz: Cherry-picked from common/android-3.18]
Signed-off-by: John Stultz <john.stultz@linaro.org>
Dmitry Shmidt [Fri, 9 Sep 2016 00:02:15 +0000 (17:02 -0700)]
Merge remote-tracking branch 'linaro-ext/EAS/v4.4-easv5.2+aosp-changes' into android-4.4
Change-Id: Ic24b43ee867bc4f70b31bedaad734717b64b86a1
John Stultz [Thu, 8 Sep 2016 23:43:21 +0000 (16:43 -0700)]
cpufreq: Kconfig: Fixup incorrect selection by CPU_FREQ_DEFAULT_GOV_SCHED
The CPU_FREQ_DEFAULT_GOV_SCHED option is incorrectly selecting
CPU_FREQ_GOV_INTERACTIVE, when it should be selecting
CPU_FREQ_GOV_SCHED.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Mark Salyzyn [Wed, 31 Aug 2016 15:09:04 +0000 (08:09 -0700)]
FROMLIST: pstore: drop pmsg bounce buffer
(from https://lkml.org/lkml/2016/9/1/428)
(cherry pick from android-3.10 commit
b58133100b38f2bf83cad2d7097417a3a196ed0b)
Removing a bounce buffer copy operation in the pmsg driver path is
always better. We also gain in overall performance by not requesting
a vmalloc on every write as this can cause precious RT tasks, such
as user facing media operation, to stall while memory is being
reclaimed. Added a write_buf_user to the pstore functions, a backup
platform write_buf_user that uses the small buffer that is part of
the instance, and implemented a ramoops write_buf_user that only
supports PSTORE_TYPE_PMSG.
Signed-off-by: Mark Salyzyn <salyzyn@google.com>
Bug:
31057326
Change-Id: I4cdee1cd31467aa3e6c605bce2fbd4de5b0f8caa
Kees Cook [Wed, 7 Sep 2016 16:54:34 +0000 (09:54 -0700)]
UPSTREAM: usercopy: remove page-spanning test for now
A custom allocator without __GFP_COMP that copies to userspace has been
found in vmw_execbuf_process[1], so this disables the page-span checker
by placing it behind a CONFIG for future work where such things can be
tracked down later.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=
1373326
Reported-by: Vinson Lee <vlee@freedesktop.org>
Fixes:
f5509cc18daa ("mm: Hardened usercopy")
Signed-off-by: Kees Cook <keescook@chromium.org>
Change-Id: I4177c0fb943f14a5faf5c70f5e54bf782c316f43
(cherry picked from commit
8e1f74ea02cf4562404c48c6882214821552c13f)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Kees Cook [Wed, 7 Sep 2016 16:39:32 +0000 (09:39 -0700)]
UPSTREAM: usercopy: force check_object_size() inline
Just for good measure, make sure that check_object_size() is always
inlined too, as already done for copy_*_user() and __copy_*_user().
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Change-Id: Ibfdf4790d03fe426e68d9a864c55a0d1bbfb7d61
(cherry picked from commit
a85d6b8242dc78ef3f4542a0f979aebcbe77fc4e)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Kees Cook [Wed, 31 Aug 2016 23:04:21 +0000 (16:04 -0700)]
BACKPORT: usercopy: fold builtin_const check into inline function
Instead of having each caller of check_object_size() need to remember to
check for a const size parameter, move the check into check_object_size()
itself. This actually matches the original implementation in PaX, though
this commit cleans up the now-redundant builtin_const() calls in the
various architectures.
Signed-off-by: Kees Cook <keescook@chromium.org>
Change-Id: I348809399c10ffa051251866063be674d064b9ff
(cherry picked from
81409e9e28058811c9ea865345e1753f8f677e44)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Kees Cook [Tue, 6 Sep 2016 18:56:01 +0000 (11:56 -0700)]
UPSTREAM: x86/uaccess: force copy_*_user() to be inlined
As already done with __copy_*_user(), mark copy_*_user() as __always_inline.
Without this, the checks for things like __builtin_const_p() won't work
consistently in either hardened usercopy nor the recent adjustments for
detecting usercopy overflows at compile time.
The change in kernel text size is detectable, but very small:
text data bss dec hex filename
12118735 5768608 14229504 32116847 1ea106f vmlinux.before
12120207 5768608 14229504 32118319 1ea162f vmlinux.after
Signed-off-by: Kees Cook <keescook@chromium.org>
Change-Id: I284c85c2a782145f46655a91d4f83874c90eba61
(cherry picked from commit
e6971009a95a74f28c58bbae415c40effad1226c)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Benjamin Tissoires [Tue, 19 Jan 2016 11:34:58 +0000 (12:34 +0100)]
UPSTREAM: HID: core: prevent out-of-bound readings
(cherry picked from commit
50220dead1650609206efe91f0cc116132d59b3f)
Plugging a Logitech DJ receiver with KASAN activated raises a bunch of
out-of-bound readings.
The fields are allocated up to MAX_USAGE, meaning that potentially, we do
not have enough fields to fit the incoming values.
Add checks and silence KASAN.
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Change-Id: Iaf25e882a6696884439d7091b5fbb0b350d893d3
Bug:
30951261
Omar Sandoval [Fri, 1 Jul 2016 07:39:35 +0000 (00:39 -0700)]
UPSTREAM: block: fix use-after-free in sys_ioprio_get()
(cherry picked from commit
8ba8682107ee2ca3347354e018865d8e1967c5f4)
get_task_ioprio() accesses the task->io_context without holding the task
lock and thus can race with exit_io_context(), leading to a
use-after-free. The reproducer below hits this within a few seconds on
my 4-core QEMU VM:
#define _GNU_SOURCE
#include <assert.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/wait.h>
int main(int argc, char **argv)
{
pid_t pid, child;
long nproc, i;
/* ioprio_set(IOPRIO_WHO_PROCESS, 0, IOPRIO_PRIO_VALUE(IOPRIO_CLASS_IDLE, 0)); */
syscall(SYS_ioprio_set, 1, 0, 0x6000);
nproc = sysconf(_SC_NPROCESSORS_ONLN);
for (i = 0; i < nproc; i++) {
pid = fork();
assert(pid != -1);
if (pid == 0) {
for (;;) {
pid = fork();
assert(pid != -1);
if (pid == 0) {
_exit(0);
} else {
child = wait(NULL);
assert(child == pid);
}
}
}
pid = fork();
assert(pid != -1);
if (pid == 0) {
for (;;) {
/* ioprio_get(IOPRIO_WHO_PGRP, 0); */
syscall(SYS_ioprio_get, 2, 0);
}
}
}
for (;;) {
/* ioprio_get(IOPRIO_WHO_PGRP, 0); */
syscall(SYS_ioprio_get, 2, 0);
}
return 0;
}
This gets us KASAN dumps like this:
[ 35.526914] ==================================================================
[ 35.530009] BUG: KASAN: out-of-bounds in get_task_ioprio+0x7b/0x90 at addr
ffff880066f34e6c
[ 35.530009] Read of size 2 by task ioprio-gpf/363
[ 35.530009] =============================================================================
[ 35.530009] BUG blkdev_ioc (Not tainted): kasan: bad access detected
[ 35.530009] -----------------------------------------------------------------------------
[ 35.530009] Disabling lock debugging due to kernel taint
[ 35.530009] INFO: Allocated in create_task_io_context+0x2b/0x370 age=0 cpu=0 pid=360
[ 35.530009] ___slab_alloc+0x55d/0x5a0
[ 35.530009] __slab_alloc.isra.20+0x2b/0x40
[ 35.530009] kmem_cache_alloc_node+0x84/0x200
[ 35.530009] create_task_io_context+0x2b/0x370
[ 35.530009] get_task_io_context+0x92/0xb0
[ 35.530009] copy_process.part.8+0x5029/0x5660
[ 35.530009] _do_fork+0x155/0x7e0
[ 35.530009] SyS_clone+0x19/0x20
[ 35.530009] do_syscall_64+0x195/0x3a0
[ 35.530009] return_from_SYSCALL_64+0x0/0x6a
[ 35.530009] INFO: Freed in put_io_context+0xe7/0x120 age=0 cpu=0 pid=1060
[ 35.530009] __slab_free+0x27b/0x3d0
[ 35.530009] kmem_cache_free+0x1fb/0x220
[ 35.530009] put_io_context+0xe7/0x120
[ 35.530009] put_io_context_active+0x238/0x380
[ 35.530009] exit_io_context+0x66/0x80
[ 35.530009] do_exit+0x158e/0x2b90
[ 35.530009] do_group_exit+0xe5/0x2b0
[ 35.530009] SyS_exit_group+0x1d/0x20
[ 35.530009] entry_SYSCALL_64_fastpath+0x1a/0xa4
[ 35.530009] INFO: Slab 0xffffea00019bcd00 objects=20 used=4 fp=0xffff880066f34ff0 flags=0x1fffe0000004080
[ 35.530009] INFO: Object 0xffff880066f34e58 @offset=3672 fp=0x0000000000000001
[ 35.530009] ==================================================================
Fix it by grabbing the task lock while we poke at the io_context.
Cc: stable@vger.kernel.org
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Change-Id: I3f5858cc9a1b9d4124ae7a6578660dec219d2c57
Bug:
30946378
Mohan Srinivasan [Thu, 8 Sep 2016 00:39:42 +0000 (17:39 -0700)]
Android: Fix build breakages.
The IO latency histogram change broke allmodconfig and
allnoconfig builds. This fixes those breakages.
Change-Id: I9cdae655b40ed155468f3cef25cdb74bb56c4d3e
Signed-off-by: Mohan Srinivasan <srmohan@google.com>
Peter Hurley [Fri, 27 Nov 2015 19:30:21 +0000 (14:30 -0500)]
UPSTREAM: tty: Prevent ldisc drivers from re-using stale tty fields
(cherry picked from commit
dd42bf1197144ede075a9d4793123f7689e164bc)
Line discipline drivers may mistakenly misuse ldisc-related fields
when initializing. For example, a failure to initialize tty->receive_room
in the N_GIGASET_M101 line discipline was recently found and fixed [1].
Now, the N_X25 line discipline has been discovered accessing the previous
line discipline's already-freed private data [2].
Harden the ldisc interface against misuse by initializing revelant
tty fields before instancing the new line discipline.
[1]
commit
fd98e9419d8d622a4de91f76b306af6aa627aa9c
Author: Tilman Schmidt <tilman@imap.cc>
Date: Tue Jul 14 00:37:13 2015 +0200
isdn/gigaset: reset tty->receive_room when attaching ser_gigaset
[2] Report from Sasha Levin <sasha.levin@oracle.com>
[ 634.336761] ==================================================================
[ 634.338226] BUG: KASAN: use-after-free in x25_asy_open_tty+0x13d/0x490 at addr
ffff8800a743efd0
[ 634.339558] Read of size 4 by task syzkaller_execu/8981
[ 634.340359] =============================================================================
[ 634.341598] BUG kmalloc-512 (Not tainted): kasan: bad access detected
...
[ 634.405018] Call Trace:
[ 634.405277] dump_stack (lib/dump_stack.c:52)
[ 634.405775] print_trailer (mm/slub.c:655)
[ 634.406361] object_err (mm/slub.c:662)
[ 634.406824] kasan_report_error (mm/kasan/report.c:138 mm/kasan/report.c:236)
[ 634.409581] __asan_report_load4_noabort (mm/kasan/report.c:279)
[ 634.411355] x25_asy_open_tty (drivers/net/wan/x25_asy.c:559 (discriminator 1))
[ 634.413997] tty_ldisc_open.isra.2 (drivers/tty/tty_ldisc.c:447)
[ 634.414549] tty_set_ldisc (drivers/tty/tty_ldisc.c:567)
[ 634.415057] tty_ioctl (drivers/tty/tty_io.c:2646 drivers/tty/tty_io.c:2879)
[ 634.423524] do_vfs_ioctl (fs/ioctl.c:43 fs/ioctl.c:607)
[ 634.427491] SyS_ioctl (fs/ioctl.c:622 fs/ioctl.c:613)
[ 634.427945] entry_SYSCALL_64_fastpath (arch/x86/entry/entry_64.S:188)
Cc: Tilman Schmidt <tilman@imap.cc>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: Ibed6feadfb9706d478f93feec3b240aecfc64af3
Bug:
30951112
Phil Turnbull [Tue, 2 Feb 2016 18:36:45 +0000 (13:36 -0500)]
UPSTREAM: netfilter: nfnetlink: correctly validate length of batch messages
(cherry picked from commit
c58d6c93680f28ac58984af61d0a7ebf4319c241)
If nlh->nlmsg_len is zero then an infinite loop is triggered because
'skb_pull(skb, msglen);' pulls zero bytes.
The calculation in nlmsg_len() underflows if 'nlh->nlmsg_len <
NLMSG_HDRLEN' which bypasses the length validation and will later
trigger an out-of-bound read.
If the length validation does fail then the malformed batch message is
copied back to userspace. However, we cannot do this because the
nlh->nlmsg_len can be invalid. This leads to an out-of-bounds read in
netlink_ack:
[ 41.455421] ==================================================================
[ 41.456431] BUG: KASAN: slab-out-of-bounds in memcpy+0x1d/0x40 at addr
ffff880119e79340
[ 41.456431] Read of size
4294967280 by task a.out/987
[ 41.456431] =============================================================================
[ 41.456431] BUG kmalloc-512 (Not tainted): kasan: bad access detected
[ 41.456431] -----------------------------------------------------------------------------
...
[ 41.456431] Bytes b4
ffff880119e79310: 00 00 00 00 d5 03 00 00 b0 fb fe ff 00 00 00 00 ................
[ 41.456431] Object
ffff880119e79320: 20 00 00 00 10 00 05 00 00 00 00 00 00 00 00 00 ...............
[ 41.456431] Object
ffff880119e79330: 14 00 0a 00 01 03 fc 40 45 56 11 22 33 10 00 05 .......@EV."3...
[ 41.456431] Object
ffff880119e79340: f0 ff ff ff 88 99 aa bb 00 14 00 0a 00 06 fe fb ................
^^ start of batch nlmsg with
nlmsg_len=
4294967280
...
[ 41.456431] Memory state around the buggy address:
[ 41.456431]
ffff880119e79400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 41.456431]
ffff880119e79480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 41.456431] >
ffff880119e79500: 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc
[ 41.456431] ^
[ 41.456431]
ffff880119e79580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 41.456431]
ffff880119e79600: fc fc fc fc fc fc fc fc fc fc fb fb fb fb fb fb
[ 41.456431] ==================================================================
Fix this with better validation of nlh->nlmsg_len and by setting
NFNL_BATCH_FAILURE if any batch message fails length validation.
CAP_NET_ADMIN is required to trigger the bugs.
Fixes:
9ea2aa8b7dba ("netfilter: nfnetlink: validate nfnetlink header from batch")
Signed-off-by: Phil Turnbull <phil.turnbull@oracle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: Id3e15c40cb464bf2791af907c235d8a316b2449c
Bug:
30947055
Riley Andrews [Tue, 6 Sep 2016 22:16:25 +0000 (15:16 -0700)]
cpuset: Make cpusets restore on hotplug
This deliberately changes the behavior of the per-cpuset
cpus file to not be effected by hotplug. When a cpu is offlined,
it will be removed from the cpuset/cpus file. When a cpu is onlined,
if the cpuset originally requested that that cpu was part of the cpuset,
that cpu will be restored to the cpuset. The cpus files still
have to be hierachical, but the ranges no longer have to be out of
the currently online cpus, just the physically present cpus.
Change-Id: I22cdf33e7d312117bcefba1aeb0125e1ada289a9
Signed-off-by: Dmitry Shmidt <dimitrysh@google.com>
Joonsoo Kim [Tue, 15 Mar 2016 21:55:12 +0000 (14:55 -0700)]
UPSTREAM: mm/slub: support left redzone
SLUB already has a redzone debugging feature. But it is only positioned
at the end of object (aka right redzone) so it cannot catch left oob.
Although current object's right redzone acts as left redzone of next
object, first object in a slab cannot take advantage of this effect.
This patch explicitly adds a left red zone to each object to detect left
oob more precisely.
Background:
Someone complained to me that left OOB doesn't catch even if KASAN is
enabled which does page allocation debugging. That page is out of our
control so it would be allocated when left OOB happens and, in this
case, we can't find OOB. Moreover, SLUB debugging feature can be
enabled without page allocator debugging and, in this case, we will miss
that OOB.
Before trying to implement, I expected that changes would be too
complex, but, it doesn't look that complex to me now. Almost changes
are applied to debug specific functions so I feel okay.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: Ib893a17ecabd692e6c402e864196bf89cd6781a5
(cherry picked from commit
d86bd1bece6fc41d59253002db5441fe960a37f6)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Linus Torvalds [Thu, 17 Dec 2015 17:45:09 +0000 (09:45 -0800)]
UPSTREAM: x86: reorganize SMAP handling in user space accesses
This reorganizes how we do the stac/clac instructions in the user access
code. Instead of adding the instructions directly to the same inline
asm that does the actual user level access and exception handling, add
them at a higher level.
This is mainly preparation for the next step, where we will expose an
interface to allow users to mark several accesses together as being user
space accesses, but it does already clean up some code:
- the inlined trivial cases of copy_in_user() now do stac/clac just
once over the accesses: they used to do one pair around the user
space read, and another pair around the write-back.
- the {get,put}_user_ex() macros that are used with the catch/try
handling don't do any stac/clac at all, because that happens in the
try/catch surrounding them.
Other than those two cleanups that happened naturally from the
re-organization, this should not make any difference. Yet.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: Iaad8756bc8e95876e2e2a7d7bbd333fc478ab441
(cherry picked from commit
11f1a4b9755f5dbc3e822a96502ebe9b044b14d8)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Linus Torvalds [Fri, 19 Aug 2016 19:47:01 +0000 (12:47 -0700)]
UPSTREAM: Make the hardened user-copy code depend on having a hardened allocator
The kernel test robot reported a usercopy failure in the new hardened
sanity checks, due to a page-crossing copy of the FPU state into the
task structure.
This happened because the kernel test robot was testing with SLOB, which
doesn't actually do the required book-keeping for slab allocations, and
as a result the hardening code didn't realize that the task struct
allocation was one single allocation - and the sanity checks fail.
Since SLOB doesn't even claim to support hardening (and you really
shouldn't use it), the straightforward solution is to just make the
usercopy hardening code depend on the allocator supporting it.
Reported-by: kernel test robot <xiaolong.ye@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: I37d51f866f873341bf7d5297249899b852e1c6ce
(cherry picked from commit
6040e57658eee6eb1315a26119101ca832d1f854)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Mohan Srinivasan [Fri, 26 Aug 2016 01:31:01 +0000 (18:31 -0700)]
Android: MMC/UFS IO Latency Histograms.
This patch adds a new sysfs node (latency_hist) and reports
IO (svc time) latency histograms. Disabled by default, can be
enabled by echoing 0 into latency_hist, stats can be cleared
by writing 2 into latency_hist. This commit fixes the 32 bit
build breakage in the previous commit. Tested on both 32 bit
and 64 bit arm devices.
Bug:
30677035
Change-Id: I9a615a16616d80f87e75676ac4d078a5c429dcf9
Signed-off-by: Mohan Srinivasan <srmohan@google.com>
Josh Poimboeuf [Mon, 22 Aug 2016 16:53:59 +0000 (11:53 -0500)]
UPSTREAM: usercopy: fix overlap check for kernel text
When running with a local patch which moves the '_stext' symbol to the
very beginning of the kernel text area, I got the following panic with
CONFIG_HARDENED_USERCOPY:
usercopy: kernel memory exposure attempt detected from
ffff88103dfff000 (<linear kernel text>) (4096 bytes)
------------[ cut here ]------------
kernel BUG at mm/usercopy.c:79!
invalid opcode: 0000 [#1] SMP
...
CPU: 0 PID: 4800 Comm: cp Not tainted 4.8.0-rc3.after+ #1
Hardware name: Dell Inc. PowerEdge R720/0X3D66, BIOS 2.5.4 01/22/2016
task:
ffff880817444140 task.stack:
ffff880816274000
RIP: 0010:[<
ffffffff8121c796>] __check_object_size+0x76/0x413
RSP: 0018:
ffff880816277c40 EFLAGS:
00010246
RAX:
000000000000006b RBX:
ffff88103dfff000 RCX:
0000000000000000
RDX:
0000000000000000 RSI:
ffff88081f80dfa8 RDI:
ffff88081f80dfa8
RBP:
ffff880816277c90 R08:
000000000000054c R09:
0000000000000000
R10:
0000000000000005 R11:
0000000000000006 R12:
0000000000001000
R13:
ffff88103e000000 R14:
ffff88103dffffff R15:
0000000000000001
FS:
00007fb9d1750800(0000) GS:
ffff88081f800000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
00000000021d2000 CR3:
000000081a08f000 CR4:
00000000001406f0
Stack:
ffff880816277cc8 0000000000010000 000000043de07000 0000000000000000
0000000000001000 ffff880816277e60 0000000000001000 ffff880816277e28
000000000000c000 0000000000001000 ffff880816277ce8 ffffffff8136c3a6
Call Trace:
[<
ffffffff8136c3a6>] copy_page_to_iter_iovec+0xa6/0x1c0
[<
ffffffff8136e766>] copy_page_to_iter+0x16/0x90
[<
ffffffff811970e3>] generic_file_read_iter+0x3e3/0x7c0
[<
ffffffffa06a738d>] ? xfs_file_buffered_aio_write+0xad/0x260 [xfs]
[<
ffffffff816e6262>] ? down_read+0x12/0x40
[<
ffffffffa06a61b1>] xfs_file_buffered_aio_read+0x51/0xc0 [xfs]
[<
ffffffffa06a6692>] xfs_file_read_iter+0x62/0xb0 [xfs]
[<
ffffffff812224cf>] __vfs_read+0xdf/0x130
[<
ffffffff81222c9e>] vfs_read+0x8e/0x140
[<
ffffffff81224195>] SyS_read+0x55/0xc0
[<
ffffffff81003a47>] do_syscall_64+0x67/0x160
[<
ffffffff816e8421>] entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:[<
00007fb9d0c33c00>] 0x7fb9d0c33c00
RSP: 002b:
00007ffc9c262f28 EFLAGS:
00000246 ORIG_RAX:
0000000000000000
RAX:
ffffffffffffffda RBX:
fffffffffff8ffff RCX:
00007fb9d0c33c00
RDX:
0000000000010000 RSI:
00000000021c3000 RDI:
0000000000000004
RBP:
00000000021c3000 R08:
0000000000000000 R09:
00007ffc9c264d6c
R10:
00007ffc9c262c50 R11:
0000000000000246 R12:
0000000000010000
R13:
00007ffc9c2630b0 R14:
0000000000000004 R15:
0000000000010000
Code: 81 48 0f 44 d0 48 c7 c6 90 4d a3 81 48 c7 c0 bb b3 a2 81 48 0f 44 f0 4d 89 e1 48 89 d9 48 c7 c7 68 16 a3 81 31 c0 e8 f4 57 f7 ff <0f> 0b 48 8d 90 00 40 00 00 48 39 d3 0f 83 22 01 00 00 48 39 c3
RIP [<
ffffffff8121c796>] __check_object_size+0x76/0x413
RSP <
ffff880816277c40>
The checked object's range [
ffff88103dfff000,
ffff88103e000000) is
valid, so there shouldn't have been a BUG. The hardened usercopy code
got confused because the range's ending address is the same as the
kernel's text starting address at 0xffff88103e000000. The overlap check
is slightly off.
Fixes:
f5509cc18daa ("mm: Hardened usercopy")
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Change-Id: I839dbf4ddbb4d9874026a42abed557eb9b3f8bef
(cherry picked from commit
94cd97af690dd9537818dc9841d0ec68bb1dd877)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Eric Biggers [Fri, 19 Aug 2016 19:15:22 +0000 (12:15 -0700)]
UPSTREAM: usercopy: avoid potentially undefined behavior in pointer math
check_bogus_address() checked for pointer overflow using this expression,
where 'ptr' has type 'const void *':
ptr + n < ptr
Since pointer wraparound is undefined behavior, gcc at -O2 by default
treats it like the following, which would not behave as intended:
(long)n < 0
Fortunately, this doesn't currently happen for kernel code because kernel
code is compiled with -fno-strict-overflow. But the expression should be
fixed anyway to use well-defined integer arithmetic, since it could be
treated differently by different compilers in the future or could be
reported by tools checking for undefined behavior.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Change-Id: I73b13be651cf35c03482f2014bf2c3dd291518ab
(cherry picked from commit
7329a655875a2f4bd6984fe8a7e00a6981e802f3)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Kees Cook [Thu, 23 Jun 2016 22:24:05 +0000 (15:24 -0700)]
UPSTREAM: mm: SLUB hardened usercopy support
Under CONFIG_HARDENED_USERCOPY, this adds object size checking to the
SLUB allocator to catch any copies that may span objects. Includes a
redzone handling fix discovered by Michael Ellerman.
Based on code from PaX and grsecurity.
Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Reviwed-by: Laura Abbott <labbott@redhat.com>
Change-Id: I52dc6fb3a3492b937d52b5cf9c046bf03dc40a3a
(cherry picked from commit
ed18adc1cdd00a5c55a20fbdaed4804660772281)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Kees Cook [Thu, 23 Jun 2016 22:20:59 +0000 (15:20 -0700)]
UPSTREAM: mm: SLAB hardened usercopy support
Under CONFIG_HARDENED_USERCOPY, this adds object size checking to the
SLAB allocator to catch any copies that may span objects.
Based on code from PaX and grsecurity.
Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Change-Id: Ib910a71fdc2ab808e1a45b6d33e9bae1681a1f4a
(cherry picked from commit
04385fc5e8fffed84425d909a783c0f0c587d847)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Kees Cook [Thu, 23 Jun 2016 22:59:42 +0000 (15:59 -0700)]
BACKPORT: arm64/uaccess: Enable hardened usercopy
Enables CONFIG_HARDENED_USERCOPY checks on arm64. As done by KASAN in -next,
renames the low-level functions to __arch_copy_*_user() so a static inline
can do additional work before the copy.
Signed-off-by: Kees Cook <keescook@chromium.org>
Change-Id: I1286cae8e6ffcf12ea54ddd62f1a6d2ce742c8d0
(cherry picked from commit
faf5b63e294151d6ac24ca6906d6f221bd3496cd)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Kees Cook [Thu, 23 Jun 2016 22:06:53 +0000 (15:06 -0700)]
BACKPORT: ARM: uaccess: Enable hardened usercopy
Enables CONFIG_HARDENED_USERCOPY checks on arm.
Based on code from PaX and grsecurity.
Signed-off-by: Kees Cook <keescook@chromium.org>
Change-Id: I03a44ca7a8c56832f15a6a74ac32e9330df3ac3b
(cherry picked from commit
dfd45b6103c973bfcea2341d89e36faf947dbc33)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Kees Cook [Thu, 23 Jun 2016 22:04:01 +0000 (15:04 -0700)]
BACKPORT: x86/uaccess: Enable hardened usercopy
Enables CONFIG_HARDENED_USERCOPY checks on x86. This is done both in
copy_*_user() and __copy_*_user() because copy_*_user() actually calls
down to _copy_*_user() and not __copy_*_user().
Based on code from PaX and grsecurity.
Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Change-Id: I260db1d4572bdd2f779200aca99d03a170658440
(cherry picked from commit
5b710f34e194c6b7710f69fdb5d798fdf35b98c1)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Kees Cook [Tue, 7 Jun 2016 18:05:33 +0000 (11:05 -0700)]
BACKPORT: mm: Hardened usercopy
This is the start of porting PAX_USERCOPY into the mainline kernel. This
is the first set of features, controlled by CONFIG_HARDENED_USERCOPY. The
work is based on code by PaX Team and Brad Spengler, and an earlier port
from Casey Schaufler. Additional non-slab page tests are from Rik van Riel.
This patch contains the logic for validating several conditions when
performing copy_to_user() and copy_from_user() on the kernel object
being copied to/from:
- address range doesn't wrap around
- address range isn't NULL or zero-allocated (with a non-zero copy size)
- if on the slab allocator:
- object size must be less than or equal to copy size (when check is
implemented in the allocator, which appear in subsequent patches)
- otherwise, object must not span page allocations (excepting Reserved
and CMA ranges)
- if on the stack
- object must not extend before/after the current process stack
- object must be contained by a valid stack frame (when there is
arch/build support for identifying stack frames)
- object must not overlap with kernel text
Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Change-Id: Iff3b5f1ddb04acd99ccf9a9046c7797363962b2a
(cherry picked from commit
f5509cc18daa7f82bcc553be70df2117c8eedc16)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Kees Cook [Tue, 12 Jul 2016 23:19:48 +0000 (16:19 -0700)]
BACKPORT: mm: Implement stack frame object validation
This creates per-architecture function arch_within_stack_frames() that
should validate if a given object is contained by a kernel stack frame.
Initial implementation is on x86.
This is based on code from PaX.
Signed-off-by: Kees Cook <keescook@chromium.org>
Change-Id: I1f3b299bb8991d65dcdac6af85d633d4b7776df1
(cherry picked from commit
0f60a8efe4005ab5e65ce000724b04d4ca04a199)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Laura Abbott [Tue, 19 Jul 2016 22:00:04 +0000 (15:00 -0700)]
UPSTREAM: mm: Add is_migrate_cma_page
Code such as hardened user copy[1] needs a way to tell if a
page is CMA or not. Add is_migrate_cma_page in a similar way
to is_migrate_isolate_page.
[1]http://article.gmane.org/gmane.linux.kernel.mm/155238
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Change-Id: I1f9aa13d8d063038fa70b93282a836648fbb4f6d
(cherry picked from commit
7c15d9bb8231f998ae7dc0b72415f5215459f7fb)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Linus Torvalds [Mon, 8 Aug 2016 20:02:01 +0000 (13:02 -0700)]
UPSTREAM: unsafe_[get|put]_user: change interface to use a error target label
When I initially added the unsafe_[get|put]_user() helpers in commit
5b24a7a2aa20 ("Add 'unsafe' user access functions for batched
accesses"), I made the mistake of modeling the interface on our
traditional __[get|put]_user() functions, which return zero on success,
or -EFAULT on failure.
That interface is fairly easy to use, but it's actually fairly nasty for
good code generation, since it essentially forces the caller to check
the error value for each access.
In particular, since the error handling is already internally
implemented with an exception handler, and we already use "asm goto" for
various other things, we could fairly easily make the error cases just
jump directly to an error label instead, and avoid the need for explicit
checking after each operation.
So switch the interface to pass in an error label, rather than checking
the error value in the caller. Best do it now before we start growing
more users (the signal handling code in particular would be a good place
to use the new interface).
So rather than
if (unsafe_get_user(x, ptr))
... handle error ..
the interface is now
unsafe_get_user(x, ptr, label);
where an error during the user mode fetch will now just cause a jump to
'label' in the caller.
Right now the actual _implementation_ of this all still ends up being a
"if (err) goto label", and does not take advantage of any exception
label tricks, but for "unsafe_put_user()" in particular it should be
fairly straightforward to convert to using the exception table model.
Note that "unsafe_get_user()" is much harder to convert to a clever
exception table model, because current versions of gcc do not allow the
use of "asm goto" (for the exception) with output values (for the actual
value to be fetched). But that is hopefully not a limitation in the
long term.
[ Also note that it might be a good idea to switch unsafe_get_user() to
actually _return_ the value it fetches from user space, but this
commit only changes the error handling semantics ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: Ib905a84a04d46984320f6fd1056da4d72f3d6b53
(cherry picked from commit
1bd4403d86a1c06cb6cc9ac87664a0c9d3413d51)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Ard Biesheuvel [Thu, 23 Jun 2016 13:53:17 +0000 (15:53 +0200)]
BACKPORT: arm64: mm: fix location of _etext
As Kees Cook notes in the ARM counterpart of this patch [0]:
The _etext position is defined to be the end of the kernel text code,
and should not include any part of the data segments. This interferes
with things that might check memory ranges and expect executable code
up to _etext.
In particular, Kees is referring to the HARDENED_USERCOPY patch set [1],
which rejects attempts to call copy_to_user() on kernel ranges containing
executable code, but does allow access to the .rodata segment. Regardless
of whether one may or may not agree with the distinction, it makes sense
for _etext to have the same meaning across architectures.
So let's put _etext where it belongs, between .text and .rodata, and fix
up existing references to use __init_begin instead, which unlike _end_rodata
includes the exception and notes sections as well.
The _etext references in kaslr.c are left untouched, since its references
to [_stext, _etext) are meant to capture potential jump instruction targets,
and so disregarding .rodata is actually an improvement here.
[0] http://article.gmane.org/gmane.linux.kernel/
2245084
[1] http://thread.gmane.org/gmane.linux.kernel.hardened.devel/2502
Reported-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Change-Id: I8f6582525217b9ca324f6a382ea52d30ce1d0dbd
(cherry picked from commit
9fdc14c55cd6579d619ccd9d40982e0805e62b6d)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Kees Cook [Thu, 23 Jun 2016 20:28:47 +0000 (21:28 +0100)]
BACKPORT: ARM: 8583/1: mm: fix location of _etext
The _etext position is defined to be the end of the kernel text code,
and should not include any part of the data segments. This interferes
with things that might check memory ranges and expect executable code
up to _etext. Just to be conservative, leave the kernel resource as
it was, using __init_begin instead of _etext as the end mark.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Change-Id: Ida514d1359dbe6f782f562ce29b4ba09ae72bfc0
(cherry picked from commit
14c4a533e0996f95a0a64dfd0b6252d788cebc74)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Linus Torvalds [Thu, 17 Dec 2015 18:05:19 +0000 (10:05 -0800)]
UPSTREAM: Use the new batched user accesses in generic user string handling
This converts the generic user string functions to use the batched user
access functions.
It makes a big difference on Skylake, which is the first x86
microarchitecture to implement SMAP. The STAC/CLAC instructions are not
very fast, and doing them for each access inside the loop that copies
strings from user space (which is what the pathname handling does for
every pathname the kernel uses, for example) is very inefficient.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: Ic39a686b4bb1ad9cd16ad0887bb669beed6fe8aa
(cherry picked from commit
9fd4470ff4974c41b1db43c3b355b9085af9c12a)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Linus Torvalds [Thu, 17 Dec 2015 17:57:27 +0000 (09:57 -0800)]
UPSTREAM: Add 'unsafe' user access functions for batched accesses
The naming is meant to discourage random use: the helper functions are
not really any more "unsafe" than the traditional double-underscore
functions (which need the address range checking), but they do need even
more infrastructure around them, and should not be used willy-nilly.
In addition to checking the access range, these user access functions
require that you wrap the user access with a "user_acess_{begin,end}()"
around it.
That allows architectures that implement kernel user access control
(x86: SMAP, arm64: PAN) to do the user access control in the wrapping
user_access_begin/end part, and then batch up the actual user space
accesses using the new interfaces.
The main (and hopefully only) use for these are for core generic access
helpers, initially just the generic user string functions
(strnlen_user() and strncpy_from_user()).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: Ic64efea41f97171bdbdabe3e531489aebd9b6fac
(cherry picked from commit
5b24a7a2aa2040c8c50c3b71122901d01661ff78)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Mohamad Ayyash [Wed, 11 May 2016 20:18:35 +0000 (13:18 -0700)]
BACKPORT: Don't show empty tag stats for unprivileged uids
BUG:
27577101
BUG:
27532522
Signed-off-by: Mohamad Ayyash <mkayyash@google.com>
Eric Dumazet [Wed, 17 Aug 2016 12:56:26 +0000 (05:56 -0700)]
UPSTREAM: tcp: fix use after free in tcp_xmit_retransmit_queue()
(cherry picked from commit
bb1fceca22492109be12640d49f5ea5a544c6bb4)
When tcp_sendmsg() allocates a fresh and empty skb, it puts it at the
tail of the write queue using tcp_add_write_queue_tail()
Then it attempts to copy user data into this fresh skb.
If the copy fails, we undo the work and remove the fresh skb.
Unfortunately, this undo lacks the change done to tp->highest_sack and
we can leave a dangling pointer (to a freed skb)
Later, tcp_xmit_retransmit_queue() can dereference this pointer and
access freed memory. For regular kernels where memory is not unmapped,
this might cause SACK bugs because tcp_highest_sack_seq() is buggy,
returning garbage instead of tp->snd_nxt, but with various debug
features like CONFIG_DEBUG_PAGEALLOC, this can crash the kernel.
This bug was found by Marco Grassi thanks to syzkaller.
Fixes:
6859d49475d4 ("[TCP]: Abstract tp->highest_sack accessing & point to next skb")
Reported-by: Marco Grassi <marco.gra@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: I58bb02d6e4e399612e8580b9e02d11e661df82f5
Bug:
31183296
Amit Pundir [Fri, 2 Sep 2016 04:43:21 +0000 (10:13 +0530)]
ANDROID: base-cfg: drop SECCOMP_FILTER config
Don't need to set SECCOMP_FILTER explicitly since CONFIG_SECCOMP=y will
select that config anyway.
Fixes:
a49dcf2e745c ("ANDROID: base-cfg: enable SECCOMP config")
Change-Id: Iff18ed4d2db5a55b9f9480d5ecbeef7b818b3837
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Mathias Krause [Thu, 5 May 2016 23:22:26 +0000 (16:22 -0700)]
UPSTREAM: proc: prevent accessing /proc/<PID>/environ until it's ready
(cherry picked from commit
8148a73c9901a8794a50f950083c00ccf97d43b3)
If /proc/<PID>/environ gets read before the envp[] array is fully set up
in create_{aout,elf,elf_fdpic,flat}_tables(), we might end up trying to
read more bytes than are actually written, as env_start will already be
set but env_end will still be zero, making the range calculation
underflow, allowing to read beyond the end of what has been written.
Fix this as it is done for /proc/<PID>/cmdline by testing env_end for
zero. It is, apparently, intentionally set last in create_*_tables().
This bug was found by the PaX size_overflow plugin that detected the
arithmetic underflow of 'this_len = env_end - (env_start + src)' when
env_end is still zero.
The expected consequence is that userland trying to access
/proc/<PID>/environ of a not yet fully set up process may get
inconsistent data as we're in the middle of copying in the environment
variables.
Fixes: https://forums.grsecurity.net/viewtopic.php?f=3&t=4363
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=116461
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: Pax Team <pageexec@freemail.hu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Mateusz Guzik <mguzik@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: Ia2f58d48c15478ed4b6e237b63e704c70ff21e96
Bug:
30951939
Dan Carpenter [Wed, 3 Feb 2016 15:34:00 +0000 (13:34 -0200)]
UPSTREAM: [media] xc2028: unlock on error in xc2028_set_config()
(cherry picked from commit
210bd104c6acd31c3c6b8b075b3f12d4a9f6b60d)
We have to unlock before returning -ENOMEM.
Fixes:
8dfbcc4351a0 ('[media] xc2028: avoid use after free')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Change-Id: I7b6ba9fde5c6e29467e6de23d398af2fe56e2547
Bug:
30946097
Mauro Carvalho Chehab [Thu, 28 Jan 2016 11:22:44 +0000 (09:22 -0200)]
UPSTREAM: [media] xc2028: avoid use after free
(cherry picked from commit
8dfbcc4351a0b6d2f2d77f367552f48ffefafe18)
If struct xc2028_config is passed without a firmware name,
the following trouble may happen:
[11009.907205] xc2028 5-0061: type set to XCeive xc2028/xc3028 tuner
[11009.907491] ==================================================================
[11009.907750] BUG: KASAN: use-after-free in strcmp+0x96/0xb0 at addr
ffff8803bd78ab40
[11009.907992] Read of size 1 by task modprobe/28992
[11009.907994] =============================================================================
[11009.907997] BUG kmalloc-16 (Tainted: G W ): kasan: bad access detected
[11009.907999] -----------------------------------------------------------------------------
[11009.908008] INFO: Allocated in xhci_urb_enqueue+0x214/0x14c0 [xhci_hcd] age=0 cpu=3 pid=28992
[11009.908012] ___slab_alloc+0x581/0x5b0
[11009.908014] __slab_alloc+0x51/0x90
[11009.908017] __kmalloc+0x27b/0x350
[11009.908022] xhci_urb_enqueue+0x214/0x14c0 [xhci_hcd]
[11009.908026] usb_hcd_submit_urb+0x1e8/0x1c60
[11009.908029] usb_submit_urb+0xb0e/0x1200
[11009.908032] usb_serial_generic_write_start+0xb6/0x4c0
[11009.908035] usb_serial_generic_write+0x92/0xc0
[11009.908039] usb_console_write+0x38a/0x560
[11009.908045] call_console_drivers.constprop.14+0x1ee/0x2c0
[11009.908051] console_unlock+0x40d/0x900
[11009.908056] vprintk_emit+0x4b4/0x830
[11009.908061] vprintk_default+0x1f/0x30
[11009.908064] printk+0x99/0xb5
[11009.908067] kasan_report_error+0x10a/0x550
[11009.908070] __asan_report_load1_noabort+0x43/0x50
[11009.908074] INFO: Freed in xc2028_set_config+0x90/0x630 [tuner_xc2028] age=1 cpu=3 pid=28992
[11009.908077] __slab_free+0x2ec/0x460
[11009.908080] kfree+0x266/0x280
[11009.908083] xc2028_set_config+0x90/0x630 [tuner_xc2028]
[11009.908086] xc2028_attach+0x310/0x8a0 [tuner_xc2028]
[11009.908090] em28xx_attach_xc3028.constprop.7+0x1f9/0x30d [em28xx_dvb]
[11009.908094] em28xx_dvb_init.part.3+0x8e4/0x5cf4 [em28xx_dvb]
[11009.908098] em28xx_dvb_init+0x81/0x8a [em28xx_dvb]
[11009.908101] em28xx_register_extension+0xd9/0x190 [em28xx]
[11009.908105] em28xx_dvb_register+0x10/0x1000 [em28xx_dvb]
[11009.908108] do_one_initcall+0x141/0x300
[11009.908111] do_init_module+0x1d0/0x5ad
[11009.908114] load_module+0x6666/0x9ba0
[11009.908117] SyS_finit_module+0x108/0x130
[11009.908120] entry_SYSCALL_64_fastpath+0x16/0x76
[11009.908123] INFO: Slab 0xffffea000ef5e280 objects=25 used=25 fp=0x (null) flags=0x2ffff8000004080
[11009.908126] INFO: Object 0xffff8803bd78ab40 @offset=2880 fp=0x0000000000000001
[11009.908130] Bytes b4
ffff8803bd78ab30: 01 00 00 00 2a 07 00 00 9d 28 00 00 01 00 00 00 ....*....(......
[11009.908133] Object
ffff8803bd78ab40: 01 00 00 00 00 00 00 00 b0 1d c3 6a 00 88 ff ff ...........j....
[11009.908137] CPU: 3 PID: 28992 Comm: modprobe Tainted: G B W 4.5.0-rc1+ #43
[11009.908140] Hardware name: /NUC5i7RYB, BIOS RYBDWi35.86A.0350.2015.0812.1722 08/12/2015
[11009.908142]
ffff8803bd78a000 ffff8802c273f1b8 ffffffff81932007 ffff8803c6407a80
[11009.908148]
ffff8802c273f1e8 ffffffff81556759 ffff8803c6407a80 ffffea000ef5e280
[11009.908153]
ffff8803bd78ab40 dffffc0000000000 ffff8802c273f210 ffffffff8155ccb4
[11009.908158] Call Trace:
[11009.908162] [<
ffffffff81932007>] dump_stack+0x4b/0x64
[11009.908165] [<
ffffffff81556759>] print_trailer+0xf9/0x150
[11009.908168] [<
ffffffff8155ccb4>] object_err+0x34/0x40
[11009.908171] [<
ffffffff8155f260>] kasan_report_error+0x230/0x550
[11009.908175] [<
ffffffff81237d71>] ? trace_hardirqs_off_caller+0x21/0x290
[11009.908179] [<
ffffffff8155e926>] ? kasan_unpoison_shadow+0x36/0x50
[11009.908182] [<
ffffffff8155f5c3>] __asan_report_load1_noabort+0x43/0x50
[11009.908185] [<
ffffffff8155ea00>] ? __asan_register_globals+0x50/0xa0
[11009.908189] [<
ffffffff8194cea6>] ? strcmp+0x96/0xb0
[11009.908192] [<
ffffffff8194cea6>] strcmp+0x96/0xb0
[11009.908196] [<
ffffffffa13ba4ac>] xc2028_set_config+0x15c/0x630 [tuner_xc2028]
[11009.908200] [<
ffffffffa13bac90>] xc2028_attach+0x310/0x8a0 [tuner_xc2028]
[11009.908203] [<
ffffffff8155ea78>] ? memset+0x28/0x30
[11009.908206] [<
ffffffffa13ba980>] ? xc2028_set_config+0x630/0x630 [tuner_xc2028]
[11009.908211] [<
ffffffffa157a59a>] em28xx_attach_xc3028.constprop.7+0x1f9/0x30d [em28xx_dvb]
[11009.908215] [<
ffffffffa157aa2a>] ? em28xx_dvb_init.part.3+0x37c/0x5cf4 [em28xx_dvb]
[11009.908219] [<
ffffffffa157a3a1>] ? hauppauge_hvr930c_init+0x487/0x487 [em28xx_dvb]
[11009.908222] [<
ffffffffa01795ac>] ? lgdt330x_attach+0x1cc/0x370 [lgdt330x]
[11009.908226] [<
ffffffffa01793e0>] ? i2c_read_demod_bytes.isra.2+0x210/0x210 [lgdt330x]
[11009.908230] [<
ffffffff812e87d0>] ? ref_module.part.15+0x10/0x10
[11009.908233] [<
ffffffff812e56e0>] ? module_assert_mutex_or_preempt+0x80/0x80
[11009.908238] [<
ffffffffa157af92>] em28xx_dvb_init.part.3+0x8e4/0x5cf4 [em28xx_dvb]
[11009.908242] [<
ffffffffa157a6ae>] ? em28xx_attach_xc3028.constprop.7+0x30d/0x30d [em28xx_dvb]
[11009.908245] [<
ffffffff8195222d>] ? string+0x14d/0x1f0
[11009.908249] [<
ffffffff8195381f>] ? symbol_string+0xff/0x1a0
[11009.908253] [<
ffffffff81953720>] ? uuid_string+0x6f0/0x6f0
[11009.908257] [<
ffffffff811a775e>] ? __kernel_text_address+0x7e/0xa0
[11009.908260] [<
ffffffff8104b02f>] ? print_context_stack+0x7f/0xf0
[11009.908264] [<
ffffffff812e9846>] ? __module_address+0xb6/0x360
[11009.908268] [<
ffffffff8137fdc9>] ? is_ftrace_trampoline+0x99/0xe0
[11009.908271] [<
ffffffff811a775e>] ? __kernel_text_address+0x7e/0xa0
[11009.908275] [<
ffffffff81240a70>] ? debug_check_no_locks_freed+0x290/0x290
[11009.908278] [<
ffffffff8104a24b>] ? dump_trace+0x11b/0x300
[11009.908282] [<
ffffffffa13e8143>] ? em28xx_register_extension+0x23/0x190 [em28xx]
[11009.908285] [<
ffffffff81237d71>] ? trace_hardirqs_off_caller+0x21/0x290
[11009.908289] [<
ffffffff8123ff56>] ? trace_hardirqs_on_caller+0x16/0x590
[11009.908292] [<
ffffffff812404dd>] ? trace_hardirqs_on+0xd/0x10
[11009.908296] [<
ffffffffa13e8143>] ? em28xx_register_extension+0x23/0x190 [em28xx]
[11009.908299] [<
ffffffff822dcbb0>] ? mutex_trylock+0x400/0x400
[11009.908302] [<
ffffffff810021a1>] ? do_one_initcall+0x131/0x300
[11009.908306] [<
ffffffff81296dc7>] ? call_rcu_sched+0x17/0x20
[11009.908309] [<
ffffffff8159e708>] ? put_object+0x48/0x70
[11009.908314] [<
ffffffffa1579f11>] em28xx_dvb_init+0x81/0x8a [em28xx_dvb]
[11009.908317] [<
ffffffffa13e81f9>] em28xx_register_extension+0xd9/0x190 [em28xx]
[11009.908320] [<
ffffffffa0150000>] ? 0xffffffffa0150000
[11009.908324] [<
ffffffffa0150010>] em28xx_dvb_register+0x10/0x1000 [em28xx_dvb]
[11009.908327] [<
ffffffff810021b1>] do_one_initcall+0x141/0x300
[11009.908330] [<
ffffffff81002070>] ? try_to_run_init_process+0x40/0x40
[11009.908333] [<
ffffffff8123ff56>] ? trace_hardirqs_on_caller+0x16/0x590
[11009.908337] [<
ffffffff8155e926>] ? kasan_unpoison_shadow+0x36/0x50
[11009.908340] [<
ffffffff8155e926>] ? kasan_unpoison_shadow+0x36/0x50
[11009.908343] [<
ffffffff8155e926>] ? kasan_unpoison_shadow+0x36/0x50
[11009.908346] [<
ffffffff8155ea37>] ? __asan_register_globals+0x87/0xa0
[11009.908350] [<
ffffffff8144da7b>] do_init_module+0x1d0/0x5ad
[11009.908353] [<
ffffffff812f2626>] load_module+0x6666/0x9ba0
[11009.908356] [<
ffffffff812e9c90>] ? symbol_put_addr+0x50/0x50
[11009.908361] [<
ffffffffa1580037>] ? em28xx_dvb_init.part.3+0x5989/0x5cf4 [em28xx_dvb]
[11009.908366] [<
ffffffff812ebfc0>] ? module_frob_arch_sections+0x20/0x20
[11009.908369] [<
ffffffff815bc940>] ? open_exec+0x50/0x50
[11009.908374] [<
ffffffff811671bb>] ? ns_capable+0x5b/0xd0
[11009.908377] [<
ffffffff812f5e58>] SyS_finit_module+0x108/0x130
[11009.908379] [<
ffffffff812f5d50>] ? SyS_init_module+0x1f0/0x1f0
[11009.908383] [<
ffffffff81004044>] ? lockdep_sys_exit_thunk+0x12/0x14
[11009.908394] [<
ffffffff822e6936>] entry_SYSCALL_64_fastpath+0x16/0x76
[11009.908396] Memory state around the buggy address:
[11009.908398]
ffff8803bd78aa00: 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[11009.908401]
ffff8803bd78aa80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[11009.908403] >
ffff8803bd78ab00: fc fc fc fc fc fc fc fc 00 00 fc fc fc fc fc fc
[11009.908405] ^
[11009.908407]
ffff8803bd78ab80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[11009.908409]
ffff8803bd78ac00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[11009.908411] ==================================================================
In order to avoid it, let's set the cached value of the firmware
name to NULL after freeing it. While here, return an error if
the memory allocation fails.
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Change-Id: I945c841dcfb45de2056267e4aa50bbe176b527cf
Bug:
30946097
Vegard Nossum [Fri, 29 Jul 2016 08:40:31 +0000 (10:40 +0200)]
UPSTREAM: block: fix use-after-free in seq file
(cherry picked from commit
77da160530dd1dc94f6ae15a981f24e5f0021e84)
I got a KASAN report of use-after-free:
==================================================================
BUG: KASAN: use-after-free in klist_iter_exit+0x61/0x70 at addr
ffff8800b6581508
Read of size 8 by task trinity-c1/315
=============================================================================
BUG kmalloc-32 (Not tainted): kasan: bad access detected
-----------------------------------------------------------------------------
Disabling lock debugging due to kernel taint
INFO: Allocated in disk_seqf_start+0x66/0x110 age=144 cpu=1 pid=315
___slab_alloc+0x4f1/0x520
__slab_alloc.isra.58+0x56/0x80
kmem_cache_alloc_trace+0x260/0x2a0
disk_seqf_start+0x66/0x110
traverse+0x176/0x860
seq_read+0x7e3/0x11a0
proc_reg_read+0xbc/0x180
do_loop_readv_writev+0x134/0x210
do_readv_writev+0x565/0x660
vfs_readv+0x67/0xa0
do_preadv+0x126/0x170
SyS_preadv+0xc/0x10
do_syscall_64+0x1a1/0x460
return_from_SYSCALL_64+0x0/0x6a
INFO: Freed in disk_seqf_stop+0x42/0x50 age=160 cpu=1 pid=315
__slab_free+0x17a/0x2c0
kfree+0x20a/0x220
disk_seqf_stop+0x42/0x50
traverse+0x3b5/0x860
seq_read+0x7e3/0x11a0
proc_reg_read+0xbc/0x180
do_loop_readv_writev+0x134/0x210
do_readv_writev+0x565/0x660
vfs_readv+0x67/0xa0
do_preadv+0x126/0x170
SyS_preadv+0xc/0x10
do_syscall_64+0x1a1/0x460
return_from_SYSCALL_64+0x0/0x6a
CPU: 1 PID: 315 Comm: trinity-c1 Tainted: G B 4.7.0+ #62
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
ffffea0002d96000 ffff880119b9f918 ffffffff81d6ce81 ffff88011a804480
ffff8800b6581500 ffff880119b9f948 ffffffff8146c7bd ffff88011a804480
ffffea0002d96000 ffff8800b6581500 fffffffffffffff4 ffff880119b9f970
Call Trace:
[<
ffffffff81d6ce81>] dump_stack+0x65/0x84
[<
ffffffff8146c7bd>] print_trailer+0x10d/0x1a0
[<
ffffffff814704ff>] object_err+0x2f/0x40
[<
ffffffff814754d1>] kasan_report_error+0x221/0x520
[<
ffffffff8147590e>] __asan_report_load8_noabort+0x3e/0x40
[<
ffffffff83888161>] klist_iter_exit+0x61/0x70
[<
ffffffff82404389>] class_dev_iter_exit+0x9/0x10
[<
ffffffff81d2e8ea>] disk_seqf_stop+0x3a/0x50
[<
ffffffff8151f812>] seq_read+0x4b2/0x11a0
[<
ffffffff815f8fdc>] proc_reg_read+0xbc/0x180
[<
ffffffff814b24e4>] do_loop_readv_writev+0x134/0x210
[<
ffffffff814b4c45>] do_readv_writev+0x565/0x660
[<
ffffffff814b8a17>] vfs_readv+0x67/0xa0
[<
ffffffff814b8de6>] do_preadv+0x126/0x170
[<
ffffffff814b92ec>] SyS_preadv+0xc/0x10
This problem can occur in the following situation:
open()
- pread()
- .seq_start()
- iter = kmalloc() // succeeds
- seqf->private = iter
- .seq_stop()
- kfree(seqf->private)
- pread()
- .seq_start()
- iter = kmalloc() // fails
- .seq_stop()
- class_dev_iter_exit(seqf->private) // boom! old pointer
As the comment in disk_seqf_stop() says, stop is called even if start
failed, so we need to reinitialise the private pointer to NULL when seq
iteration stops.
An alternative would be to set the private pointer to NULL when the
kmalloc() in disk_seqf_start() fails.
Cc: stable@vger.kernel.org
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
Change-Id: I07b33f4b38341f60a37806cdd45b0a0c3ab4d84d
Bug:
30942273
Jerome Marchand [Wed, 6 Apr 2016 13:06:48 +0000 (14:06 +0100)]
UPSTREAM: assoc_array: don't call compare_object() on a node
(cherry picked from commit
8d4a2ec1e0b41b0cf9a0c5cd4511da7f8e4f3de2)
Changes since V1: fixed the description and added KASan warning.
In assoc_array_insert_into_terminal_node(), we call the
compare_object() method on all non-empty slots, even when they're
not leaves, passing a pointer to an unexpected structure to
compare_object(). Currently it causes an out-of-bound read access
in keyring_compare_object detected by KASan (see below). The issue
is easily reproduced with keyutils testsuite.
Only call compare_object() when the slot is a leave.
KASan warning:
==================================================================
BUG: KASAN: slab-out-of-bounds in keyring_compare_object+0x213/0x240 at addr
ffff880060a6f838
Read of size 8 by task keyctl/1655
=============================================================================
BUG kmalloc-192 (Not tainted): kasan: bad access detected
-----------------------------------------------------------------------------
Disabling lock debugging due to kernel taint
INFO: Allocated in assoc_array_insert+0xfd0/0x3a60 age=69 cpu=1 pid=1647
___slab_alloc+0x563/0x5c0
__slab_alloc+0x51/0x90
kmem_cache_alloc_trace+0x263/0x300
assoc_array_insert+0xfd0/0x3a60
__key_link_begin+0xfc/0x270
key_create_or_update+0x459/0xaf0
SyS_add_key+0x1ba/0x350
entry_SYSCALL_64_fastpath+0x12/0x76
INFO: Slab 0xffffea0001829b80 objects=16 used=8 fp=0xffff880060a6f550 flags=0x3fff8000004080
INFO: Object 0xffff880060a6f740 @offset=5952 fp=0xffff880060a6e5d1
Bytes b4
ffff880060a6f730: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object
ffff880060a6f740: d1 e5 a6 60 00 88 ff ff 0e 00 00 00 00 00 00 00 ...`............
Object
ffff880060a6f750: 02 cf 8e 60 00 88 ff ff 02 c0 8e 60 00 88 ff ff ...`.......`....
Object
ffff880060a6f760: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object
ffff880060a6f770: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object
ffff880060a6f780: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object
ffff880060a6f790: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object
ffff880060a6f7a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object
ffff880060a6f7b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object
ffff880060a6f7c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object
ffff880060a6f7d0: 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object
ffff880060a6f7e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object
ffff880060a6f7f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
CPU: 0 PID: 1655 Comm: keyctl Tainted: G B 4.5.0-rc4-kasan+ #291
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
0000000000000000 000000001b2800b4 ffff880060a179e0 ffffffff81b60491
ffff88006c802900 ffff880060a6f740 ffff880060a17a10 ffffffff815e2969
ffff88006c802900 ffffea0001829b80 ffff880060a6f740 ffff880060a6e650
Call Trace:
[<
ffffffff81b60491>] dump_stack+0x85/0xc4
[<
ffffffff815e2969>] print_trailer+0xf9/0x150
[<
ffffffff815e9454>] object_err+0x34/0x40
[<
ffffffff815ebe50>] kasan_report_error+0x230/0x550
[<
ffffffff819949be>] ? keyring_get_key_chunk+0x13e/0x210
[<
ffffffff815ec62d>] __asan_report_load_n_noabort+0x5d/0x70
[<
ffffffff81994cc3>] ? keyring_compare_object+0x213/0x240
[<
ffffffff81994cc3>] keyring_compare_object+0x213/0x240
[<
ffffffff81bc238c>] assoc_array_insert+0x86c/0x3a60
[<
ffffffff81bc1b20>] ? assoc_array_cancel_edit+0x70/0x70
[<
ffffffff8199797d>] ? __key_link_begin+0x20d/0x270
[<
ffffffff8199786c>] __key_link_begin+0xfc/0x270
[<
ffffffff81993389>] key_create_or_update+0x459/0xaf0
[<
ffffffff8128ce0d>] ? trace_hardirqs_on+0xd/0x10
[<
ffffffff81992f30>] ? key_type_lookup+0xc0/0xc0
[<
ffffffff8199e19d>] ? lookup_user_key+0x13d/0xcd0
[<
ffffffff81534763>] ? memdup_user+0x53/0x80
[<
ffffffff819983ea>] SyS_add_key+0x1ba/0x350
[<
ffffffff81998230>] ? key_get_type_from_user.constprop.6+0xa0/0xa0
[<
ffffffff828bcf4e>] ? retint_user+0x18/0x23
[<
ffffffff8128cc7e>] ? trace_hardirqs_on_caller+0x3fe/0x580
[<
ffffffff81004017>] ? trace_hardirqs_on_thunk+0x17/0x19
[<
ffffffff828bc432>] entry_SYSCALL_64_fastpath+0x12/0x76
Memory state around the buggy address:
ffff880060a6f700: fc fc fc fc fc fc fc fc 00 00 00 00 00 00 00 00
ffff880060a6f780: 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc
>
ffff880060a6f800: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^
ffff880060a6f880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff880060a6f900: fc fc fc fc fc fc 00 00 00 00 00 00 00 00 00 00
==================================================================
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: stable@vger.kernel.org
Change-Id: I903935a221a5b9fb14cec14ef64bd2b6fa8eb222
Bug:
30513364
Yongqin Liu [Thu, 1 Sep 2016 16:36:04 +0000 (22:06 +0530)]
ANDROID: base-cfg: enable SECCOMP config
Enable following seccomp configs
CONFIG_SECCOMP=y
CONFIG_SECCOMP_FILTER=y
Otherwise we will get mediacode error like this on Android N:
E /system/bin/mediaextractor: libminijail: prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER): Invalid argument
Change-Id: I2477b6a2cfdded5c0ebf6ffbb6150b0e5fe2ba12
Signed-off-by: Yongqin Liu <yongqin.liu@linaro.org>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Guenter Roeck [Wed, 31 Aug 2016 16:52:16 +0000 (09:52 -0700)]
ANDROID: rcu_sync: Export rcu_sync_lockdep_assert
x86_64:allmodconfig fails to build with the following error.
ERROR: "rcu_sync_lockdep_assert" [kernel/locking/locktorture.ko] undefined!
Introduced by commit
3228c5eb7af2 ("RFC: FROMLIST: locking/percpu-rwsem:
Optimize readers and reduce global impact"). The applied upstream version
exports the missing symbol, so let's do the same.
Change-Id: If4e516715c3415fe8c82090f287174857561550d
Fixes:
3228c5eb7af2 ("RFC: FROMLIST: locking/percpu-rwsem: Optimize ...")
Signed-off-by: Guenter Roeck <groeck@chromium.org>
Badhri Jagan Sridharan [Tue, 30 Aug 2016 20:39:02 +0000 (13:39 -0700)]
UPSTREAM: USB: cdc-acm: more sanity checking
commit
8835ba4a39cf53f705417b3b3a94eb067673f2c9 upstream.
An attack has become available which pretends to be a quirky
device circumventing normal sanity checks and crashes the kernel
by an insufficient number of interfaces. This patch adds a check
to the code path for quirky devices.
BUG:
28242610
Signed-off-by: Oliver Neukum <ONeukum@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Badhri Jagan Sridharan <Badhri@google.com>
Change-Id: I9a5f7f3c704b65e866335054f470451fcfae9d1c
Badhri Jagan Sridharan [Tue, 30 Aug 2016 20:37:07 +0000 (13:37 -0700)]
UPSTREAM: USB: iowarrior: fix oops with malicious USB descriptors
commit
4ec0ef3a82125efc36173062a50624550a900ae0 upstream.
The iowarrior driver expects at least one valid endpoint. If given
malicious descriptors that specify 0 for the number of endpoints,
it will crash in the probe function. Ensure there is at least
one endpoint on the interface before using it.
The full report of this issue can be found here:
http://seclists.org/bugtraq/2016/Mar/87
BUG:
28242610
Reported-by: Ralf Spenneberg <ralf@spenneberg.net>
Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Badhri Jagan Sridharan <Badhri@google.com>
Change-Id: If5161c23928e9ef77cb3359cba9b36622b1908df
Badhri Jagan Sridharan [Tue, 30 Aug 2016 20:35:32 +0000 (13:35 -0700)]
UPSTREAM: USB: usb_driver_claim_interface: add sanity checking
commit
0b818e3956fc1ad976bee791eadcbb3b5fec5bfd upstream.
Attacks that trick drivers into passing a NULL pointer
to usb_driver_claim_interface() using forged descriptors are
known. This thwarts them by sanity checking.
BUG:
28242610
Signed-off-by: Oliver Neukum <ONeukum@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Badhri Jagan Sridharan <Badhri@google.com>
Change-Id: Ib43ec5edb156985a9db941785a313f6801df092a