GitHub/MotorolaMobilityLLC/kernel-slsi.git
2 years ago [RAMEN9610-22013]futex: Fix inode life-time issue android-11-release-rsa MMI-RSA31.Q1-48-36-11 MMI-RSB31.Q1-48-36-11
Peter Zijlstra [Wed, 4 Mar 2020 10:28:31 +0000 (11:28 +0100)]
[RAMEN9610-22013]futex: Fix inode life-time issue

commit 8019ad13ef7f64be44d4f892af9c840179009254 upstream.

As reported by Jann, ihold() does not in fact guarantee inode
persistence. And instead of making it so, replace the usage of inode
pointers with a per boot, machine wide, unique inode identifier.

This sequence number is global, but shared (file backed) futexes are
rare enough that this should not become a performance issue.

Change-Id: I71ce183a546dc536fddd1c1f982fa96a577e5a2d
Reported-by: Jann Horn <jannh@google.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 years ago [RAMEN9610-22013]HID: make arrays usage and value to be the same
Will McVicker [Sat, 5 Dec 2020 00:48:48 +0000 (00:48 +0000)]
[RAMEN9610-22013]HID: make arrays usage and value to be the same

commit ed9be64eefe26d7d8b0b5b9fa3ffdf425d87a01f upstream.

The HID subsystem allows an "HID report field" to have a different
number of "values" and "usages" when it is allocated. When a field
struct is created, the size of the usage array is guaranteed to be at
least as large as the values array, but it may be larger. This leads to
a potential out-of-bounds write in
__hidinput_change_resolution_multipliers() and an out-of-bounds read in
hidinput_count_leds().

To fix this, let's make sure that both the usage and value arrays are
the same size.
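A minimal sketch of the "same size for both arrays" rule, assuming hypothetical userspace types (`demo_field`, `demo_usage`) rather than the real HID structures: both arrays are allocated with the larger of the two requested counts, so an index that is valid for one is valid for the other.

```c
#include <assert.h>
#include <stdlib.h>

struct demo_usage {
    unsigned hid;
};

struct demo_field {
    unsigned capacity;          /* number of slots in BOTH usage[] and value[] */
    struct demo_usage *usage;
    int *value;
};

static void demo_field_free(struct demo_field *f)
{
    if (!f)
        return;
    free(f->usage);
    free(f->value);
    free(f);
}

/* Allocate a field whose usage and value arrays have identical length. */
static struct demo_field *demo_field_alloc(unsigned usages, unsigned values)
{
    unsigned n = usages > values ? usages : values;
    struct demo_field *f;

    if (n == 0)
        n = 1;  /* avoid zero-sized allocations */

    f = calloc(1, sizeof(*f));
    if (!f)
        return NULL;
    f->usage = calloc(n, sizeof(*f->usage));
    f->value = calloc(n, sizeof(*f->value));
    if (!f->usage || !f->value) {
        demo_field_free(f);
        return NULL;
    }
    f->capacity = n;
    return f;
}

/* A single bound now covers accesses to either array. */
static void demo_field_set(struct demo_field *f, unsigned idx, unsigned hid, int v)
{
    assert(idx < f->capacity);
    f->usage[idx].hid = hid;
    f->value[idx] = v;
}
```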

Change-Id: I96857aa2e17b88d900a50aa16a016f4aca6b1fa5
Cc: stable@vger.kernel.org
Signed-off-by: Will McVicker <willmcvicker@google.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years ago [RAMEN9610-22009]tty: Fix ->pgrp locking in tiocspgrp()
Jann Horn [Thu, 3 Dec 2020 01:25:04 +0000 (02:25 +0100)]
[RAMEN9610-22009]tty: Fix ->pgrp locking in tiocspgrp()

commit 54ffccbf053b5b6ca4f6e45094b942fab92a25fc upstream.

tiocspgrp() takes two tty_struct pointers: One to the tty that userspace
passed to ioctl() (`tty`) and one to the TTY being changed (`real_tty`).
These pointers are different when ioctl() is called with a master fd.

To properly lock real_tty->pgrp, we must take real_tty->ctrl_lock.

This bug makes it possible for racing ioctl(TIOCSPGRP, ...) calls on
both sides of a PTY pair to corrupt the refcount of `struct pid`,
leading to use-after-free errors.
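The locking rule can be illustrated with a small userspace model. This is a sketch, not the kernel code: `demo_tty`, the `atomic_flag` spinlock and `demo_set_pgrp()` are hypothetical stand-ins. The point is that the lock taken must belong to the object whose field is written.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct demo_tty {
    atomic_flag ctrl_lock;  /* protects pgrp of THIS tty only */
    int pgrp;
};

static void demo_lock(atomic_flag *l)
{
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
        ;   /* spin until the flag was previously clear */
}

static void demo_unlock(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

/*
 * tty:      the end the caller's file descriptor refers to
 * real_tty: the end whose process group is actually changed
 */
static void demo_set_pgrp(struct demo_tty *tty, struct demo_tty *real_tty, int pgrp)
{
    assert(tty != NULL && real_tty != NULL);

    /* Taking tty->ctrl_lock here would not serialize writers of real_tty->pgrp. */
    demo_lock(&real_tty->ctrl_lock);
    real_tty->pgrp = pgrp;
    demo_unlock(&real_tty->ctrl_lock);
}
```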

Change-Id: I000d97f52d398cdf171fb8fb53d9b91456e9dc4c
Fixes: 47f86834bbd4 ("redo locking of tty->pgrp")
CC: stable@kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Jiri Slaby <jirislaby@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years ago [RAMEN9610-21992]icmp: randomize the global rate limiter
Eric Dumazet [Thu, 15 Oct 2020 18:42:00 +0000 (11:42 -0700)]
[RAMEN9610-21992]icmp: randomize the global rate limiter

[ Upstream commit b38e7819cae946e2edf869e604af1e65a5d241c5 ]

Keyu Man reported that the ICMP rate limiter could be used
by attackers to get useful signal. Details will be provided
in an upcoming academic publication.

Our solution is to add some noise, so that the attackers
no longer can get help from the predictable token bucket limiter.
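One way to add such noise to a token bucket is sketched below. The names (`demo_bucket`, `demo_allow()`) and the choice of a random cost of 0 to 2 tokens are assumptions for illustration, and `rand()` merely stands in for a kernel random source; it is not suitable for security use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct demo_bucket {
    unsigned credit;    /* tokens currently available */
    unsigned burst;     /* maximum tokens the bucket can hold */
};

/*
 * Refill the bucket by `incr` tokens, then try to send one message.
 * Each allowed message costs 0, 1 or 2 tokens chosen at random, which
 * averages one token but hides the exact bucket state from an observer.
 */
static bool demo_allow(struct demo_bucket *b, unsigned incr)
{
    assert(b != NULL);

    unsigned credit = b->credit + incr;
    bool allowed = false;

    if (credit > b->burst)
        credit = b->burst;

    if (credit > 0) {
        unsigned cost = (unsigned)(rand() % 3);

        credit = credit > cost ? credit - cost : 0;
        allowed = true;
    }

    b->credit = credit;
    return allowed;
}
```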

Fixes: 4cdf507d5452 ("icmp: add a global rate limitation")
Change-Id: Ib1a0ee70e55123a17d88b79a83664a3b2ac70034
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Keyu Man <kman001@ucr.edu>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years ago [RAMEN9610-21992]mm: thp: make the THP mapcount atomic against __split_huge_pmd_locked()
Andrea Arcangeli [Wed, 27 May 2020 23:06:24 +0000 (19:06 -0400)]
[RAMEN9610-21992]mm: thp: make the THP mapcount atomic against __split_huge_pmd_locked()

commit c444eb564fb16645c172d550359cb3d75fe8a040 upstream.

Write protect anon page faults require an accurate mapcount to decide
whether to break the COW or not. This is implemented in the THP path with
reuse_swap_page() ->
page_trans_huge_map_swapcount()/page_trans_huge_mapcount().

If the COW triggers while the other processes sharing the page are
under a huge pmd split, to do an accurate reading, we must ensure the
mapcount isn't computed while it's being transferred from the head
page to the tail pages.

reuse_swap_page() already runs serialized by the page lock, so it's
enough to add the page lock around __split_huge_pmd_locked too, in
order to add the missing serialization.

Note: the commit in "Fixes" is just to facilitate the backporting,
because the code before such commit didn't try to do an accurate THP
mapcount calculation and it instead used the page_count() to decide
whether to COW or not. Both the page_count and the pin_count are THP-wide
refcounts, so they're inaccurate if used in
reuse_swap_page(). Reverting such commit (besides the unrelated fix to
the local anon_vma assignment) would have also opened the window for
memory corruption side effects to certain workloads as documented in
such commit header.

Change-Id: I702663cf38fd55f76fa2772e4b1cc8b1d0a45a0a
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Suggested-by: Jann Horn <jannh@google.com>
Reported-by: Jann Horn <jannh@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Fixes: 6d0a07edd17c ("mm: thp: calculate the mapcount correctly for THP pages during WP faults")
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years ago [RAMEN9610-21992]block: Fix use-after-free in blkdev_get()
Jason Yan [Tue, 16 Jun 2020 12:16:55 +0000 (20:16 +0800)]
[RAMEN9610-21992]block: Fix use-after-free in blkdev_get()

[ Upstream commit 2d3a8e2deddea6c89961c422ec0c5b851e648c14 ]

In blkdev_get() we call __blkdev_get() to do some internal jobs and if
there are errors in __blkdev_get(), bdput() is called, which
means we have released the refcount of the bdev (actually the refcount of
the bdev inode). This means we cannot access bdev after that point. But
bdev is actually still accessed in blkdev_get() after calling
__blkdev_get(). This results in a use-after-free if the refcount is the
last one we released in __blkdev_get(). Let's take a look at the
following scenario:

  CPU0            CPU1                    CPU2
blkdev_open     blkdev_open           Remove disk
                  bd_acquire
                  blkdev_get
                    __blkdev_get      del_gendisk
                                        bdev_unhash_inode
  bd_acquire          bdev_get_gendisk
    bd_forget           failed because of unhashed
          bdput
                      bdput (the last one)
                        bdev_evict_inode

                    access bdev => use after free

[  459.350216] BUG: KASAN: use-after-free in __lock_acquire+0x24c1/0x31b0
[  459.351190] Read of size 8 at addr ffff88806c815a80 by task syz-executor.0/20132
[  459.352347]
[  459.352594] CPU: 0 PID: 20132 Comm: syz-executor.0 Not tainted 4.19.90 #2
[  459.353628] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[  459.354947] Call Trace:
[  459.355337]  dump_stack+0x111/0x19e
[  459.355879]  ? __lock_acquire+0x24c1/0x31b0
[  459.356523]  print_address_description+0x60/0x223
[  459.357248]  ? __lock_acquire+0x24c1/0x31b0
[  459.357887]  kasan_report.cold+0xae/0x2d8
[  459.358503]  __lock_acquire+0x24c1/0x31b0
[  459.359120]  ? _raw_spin_unlock_irq+0x24/0x40
[  459.359784]  ? lockdep_hardirqs_on+0x37b/0x580
[  459.360465]  ? _raw_spin_unlock_irq+0x24/0x40
[  459.361123]  ? finish_task_switch+0x125/0x600
[  459.361812]  ? finish_task_switch+0xee/0x600
[  459.362471]  ? mark_held_locks+0xf0/0xf0
[  459.363108]  ? __schedule+0x96f/0x21d0
[  459.363716]  lock_acquire+0x111/0x320
[  459.364285]  ? blkdev_get+0xce/0xbe0
[  459.364846]  ? blkdev_get+0xce/0xbe0
[  459.365390]  __mutex_lock+0xf9/0x12a0
[  459.365948]  ? blkdev_get+0xce/0xbe0
[  459.366493]  ? bdev_evict_inode+0x1f0/0x1f0
[  459.367130]  ? blkdev_get+0xce/0xbe0
[  459.367678]  ? destroy_inode+0xbc/0x110
[  459.368261]  ? mutex_trylock+0x1a0/0x1a0
[  459.368867]  ? __blkdev_get+0x3e6/0x1280
[  459.369463]  ? bdev_disk_changed+0x1d0/0x1d0
[  459.370114]  ? blkdev_get+0xce/0xbe0
[  459.370656]  blkdev_get+0xce/0xbe0
[  459.371178]  ? find_held_lock+0x2c/0x110
[  459.371774]  ? __blkdev_get+0x1280/0x1280
[  459.372383]  ? lock_downgrade+0x680/0x680
[  459.373002]  ? lock_acquire+0x111/0x320
[  459.373587]  ? bd_acquire+0x21/0x2c0
[  459.374134]  ? do_raw_spin_unlock+0x4f/0x250
[  459.374780]  blkdev_open+0x202/0x290
[  459.375325]  do_dentry_open+0x49e/0x1050
[  459.375924]  ? blkdev_get_by_dev+0x70/0x70
[  459.376543]  ? __x64_sys_fchdir+0x1f0/0x1f0
[  459.377192]  ? inode_permission+0xbe/0x3a0
[  459.377818]  path_openat+0x148c/0x3f50
[  459.378392]  ? kmem_cache_alloc+0xd5/0x280
[  459.379016]  ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  459.379802]  ? path_lookupat.isra.0+0x900/0x900
[  459.380489]  ? __lock_is_held+0xad/0x140
[  459.381093]  do_filp_open+0x1a1/0x280
[  459.381654]  ? may_open_dev+0xf0/0xf0
[  459.382214]  ? find_held_lock+0x2c/0x110
[  459.382816]  ? lock_downgrade+0x680/0x680
[  459.383425]  ? __lock_is_held+0xad/0x140
[  459.384024]  ? do_raw_spin_unlock+0x4f/0x250
[  459.384668]  ? _raw_spin_unlock+0x1f/0x30
[  459.385280]  ? __alloc_fd+0x448/0x560
[  459.385841]  do_sys_open+0x3c3/0x500
[  459.386386]  ? filp_open+0x70/0x70
[  459.386911]  ? trace_hardirqs_on_thunk+0x1a/0x1c
[  459.387610]  ? trace_hardirqs_off_caller+0x55/0x1c0
[  459.388342]  ? do_syscall_64+0x1a/0x520
[  459.388930]  do_syscall_64+0xc3/0x520
[  459.389490]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  459.390248] RIP: 0033:0x416211
[  459.390720] Code: 75 14 b8 02 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83
04 19 00 00 c3 48 83 ec 08 e8 0a fa ff ff 48 89 04 24 b8 02 00 00 00 0f
   05 <48> 8b 3c 24 48 89 c2 e8 53 fa ff ff 48 89 d0 48 83 c4 08 48 3d
      01
[  459.393483] RSP: 002b:00007fe45dfe9a60 EFLAGS: 00000293 ORIG_RAX: 0000000000000002
[  459.394610] RAX: ffffffffffffffda RBX: 00007fe45dfea6d4 RCX: 0000000000416211
[  459.395678] RDX: 00007fe45dfe9b0a RSI: 0000000000000002 RDI: 00007fe45dfe9b00
[  459.396758] RBP: 000000000076bf20 R08: 0000000000000000 R09: 000000000000000a
[  459.397930] R10: 0000000000000075 R11: 0000000000000293 R12: 00000000ffffffff
[  459.399022] R13: 0000000000000bd9 R14: 00000000004cdb80 R15: 000000000076bf2c
[  459.400168]
[  459.400430] Allocated by task 20132:
[  459.401038]  kasan_kmalloc+0xbf/0xe0
[  459.401652]  kmem_cache_alloc+0xd5/0x280
[  459.402330]  bdev_alloc_inode+0x18/0x40
[  459.402970]  alloc_inode+0x5f/0x180
[  459.403510]  iget5_locked+0x57/0xd0
[  459.404095]  bdget+0x94/0x4e0
[  459.404607]  bd_acquire+0xfa/0x2c0
[  459.405113]  blkdev_open+0x110/0x290
[  459.405702]  do_dentry_open+0x49e/0x1050
[  459.406340]  path_openat+0x148c/0x3f50
[  459.406926]  do_filp_open+0x1a1/0x280
[  459.407471]  do_sys_open+0x3c3/0x500
[  459.408010]  do_syscall_64+0xc3/0x520
[  459.408572]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  459.409415]
[  459.409679] Freed by task 1262:
[  459.410212]  __kasan_slab_free+0x129/0x170
[  459.410919]  kmem_cache_free+0xb2/0x2a0
[  459.411564]  rcu_process_callbacks+0xbb2/0x2320
[  459.412318]  __do_softirq+0x225/0x8ac

Fix this by delaying bdput() to the end of blkdev_get() which means we
have finished accessing bdev.
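The "drop the reference last" pattern can be shown with a toy refcounted object. Everything here (`demo_bdev`, `demo_inner_get()`, `demo_get()`, the `openers` field) is a hypothetical userspace model, not the block layer code.

```c
#include <assert.h>
#include <stdlib.h>

struct demo_bdev {
    int refcount;   /* object is freed when this reaches zero */
    int openers;    /* bookkeeping that the outer function also touches */
};

static int demo_freed;  /* counts frees so the behaviour can be observed */

static struct demo_bdev *demo_bdev_new(void)
{
    struct demo_bdev *b = calloc(1, sizeof(*b));

    assert(b != NULL);
    b->refcount = 1;
    return b;
}

static void demo_put(struct demo_bdev *b)
{
    if (--b->refcount == 0) {
        demo_freed++;
        free(b);
    }
}

/* Inner step: may fail, but leaves the caller's reference alone. */
static int demo_inner_get(struct demo_bdev *b, int fail)
{
    if (fail)
        return -1;
    b->openers++;
    return 0;
}

/* Outer step: still uses b after the inner call, so the put comes last. */
static int demo_get(struct demo_bdev *b, int fail)
{
    int res = demo_inner_get(b, fail);

    if (res)
        b->openers = 0;     /* error-path bookkeeping needs a live object */

    if (res)
        demo_put(b);        /* final access: drop the reference on failure */
    return res;
}
```

If `demo_inner_get()` dropped the reference itself on failure, the write to `b->openers` in `demo_get()` would touch freed memory, which is the shape of the bug described above.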

Fixes: 77ea887e433a ("implement in-kernel gendisk events handling")
Change-Id: I9ec1b9ea3520f3e1374b95cee48ac8cc38476117
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
3 years ago [RAMEN9610-21968]ANDROID: xt_qtaguid: Remove tag_entry from process list on untag
Kalesh Singh [Mon, 11 Jan 2021 06:26:18 +0000 (01:26 -0500)]
[RAMEN9610-21968]ANDROID: xt_qtaguid: Remove tag_entry from process list on untag

A sock_tag_entry can only be part of one process's
pqd_entry->sock_tag_list. Retagging the socket only updates
sock_tag_entry->tag, and does not add the tag entry to the current
process's pqd_entry list, nor update sock_tag_entry->pid.
So the sock_tag_entry is only ever present in the
pqd_entry list of the process that initially tagged the socket.

A sock_tag_entry can also get created and not be added to any process's
pqd_entry list. This happens if the process that initially tags the
socket has not opened /dev/xt_qtaguid.

ctrl_cmd_untag() supports untagging from a context other than the
process that initially tagged the socket. Currently, the sock_tag_entry is
only removed from its containing pqd_entry->sock_tag_list if the
process that does the untagging has opened /dev/xt_qtaguid. However, the
tag entry should always be deleted from its pqd entry list (if present).

Bug: 176919394
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: I5b6f0c36c0ebefd98cc6873a4057104c7d885ccc
(cherry picked from commit c2ab93b45b5cdc426868fb8793ada2cac20568ef)

3 years ago tracing: do not leak kernel addresses
Nick Desaulniers [Fri, 3 Mar 2017 23:40:12 +0000 (15:40 -0800)]
tracing: do not leak kernel addresses

This likely breaks tracing tools like trace-cmd.  It logs in the same
format but now addresses are all 0x0.

Bug: 34277115
MOT-CRs-fixed: (CR)

Change-Id: Ifb0d4d2a184bf0d95726de05b1acee0287a375d9
Reviewed-on: https://gerrit.mot.com/1887836
SME-Granted: SME Approvals Granted
SLTApproved: Slta Waiver
Tested-by: Jira Key
Reviewed-by: Wang Wang <wangwang1@mt.com>
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key

3 years ago arm/config: support for hid-nintendo driver
sunyue5 [Wed, 30 Dec 2020 09:54:19 +0000 (17:54 +0800)]
arm/config: support for hid-nintendo driver

CTS will test this device so we should enable it in the
kernel config
CTS cases:
run cts -m CtsHardwareTestCases -t
android.hardware.input.cts.tests.NintendoSwitchProTest#testAllKeys
android.hardware.input.cts.tests.NintendoSwitchProTest#testAllMotions

Change-Id: I4c0f9dcb468fbfa629ae82b984ff55dc170a1acc
Signed-off-by: sunyue5 <sunyue5@motorola.com>
Reviewed-on: https://gerrit.mot.com/1839437
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Tested-by: Jira Key
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key

3 years ago arm64/defconfig: Disable CONFIG_SCSC_WLAN_WIFI_SHARING
sunyue5 [Fri, 11 Dec 2020 01:25:53 +0000 (09:25 +0800)]
arm64/defconfig: Disable CONFIG_SCSC_WLAN_WIFI_SHARING

SCSC_WLAN_WIFI_SHARING is to enable APSTA function which means
the device can start wifi hotspot with wifi station enabled.
In this case, wlan1 will be for the AP mode while wlan0 will be
for the STA mode.
Disable this function since we have disabled it in the frameworks

Change-Id: Icd0d10badec991b3235735c0e53de3214368ba87
Signed-off-by: sunyue5 <sunyue5@motorola.com>
Reviewed-on: https://gerrit.mot.com/1825115
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Tested-by: Jira Key
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key

3 years ago enable blkio for exynos9610
lulei1 [Fri, 27 Nov 2020 06:01:27 +0000 (14:01 +0800)]
enable blkio for exynos9610

IO cgroup should be enabled to restrict background IO R/W.

Change-Id: I4736eb38d208a23afb183c27a5d90296e79e5f39
Reviewed-on: https://gerrit.mot.com/1812665
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Tested-by: Jira Key
Reviewed-by: Guolin Wang <wanggl3@mt.com>
Submit-Approved: Jira Key

3 years ago arm64/defconfig: sync Q defconfig
zhaoxp3 [Fri, 13 Nov 2020 07:25:04 +0000 (15:25 +0800)]
arm64/defconfig: sync Q defconfig

1. sync kane defconfig change
2. pick wifi change
[RAMEN9610-21775]wlbt: Enable Wifi Configurations in Defconfig.
Enable Wifi Configurations in Defconfi

Change-Id: Iea3d6c430f6b3a039c7124b5fa2a331c15bad27c
Signed-off-by: zhaoxp3 <zhaoxp3@motorola.com>
3 years ago UPSTREAM: signal: don't silently convert SI_USER signals to non-current pidfd
Jann Horn [Sat, 30 Mar 2019 02:12:32 +0000 (03:12 +0100)]
UPSTREAM: signal: don't silently convert SI_USER signals to non-current pidfd

The current sys_pidfd_send_signal() silently turns signals with explicit
SI_USER context that are sent to non-current tasks into signals with
kernel-generated siginfo.
This is unlike do_rt_sigqueueinfo(), which returns -EPERM in this case.
If a user actually wants to send a signal with kernel-provided siginfo,
they can do that with pidfd_send_signal(pidfd, sig, NULL, 0); so allowing
this case is unnecessary.

Instead of silently replacing the siginfo, just bail out with an error;
this is consistent with other interfaces and avoids special-casing behavior
based on security checks.
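The resulting policy can be sketched as a small predicate. This is a simplified userspace model: `demo_check_siginfo()` and its parameters are hypothetical, and the test on `si_code` is reduced to the SI_USER case discussed above rather than mirroring the kernel's exact condition.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Decide whether a caller-supplied siginfo may be delivered.
 *   sender: pid of the calling task
 *   target: pid the pidfd refers to
 *   info:   caller-supplied siginfo, or NULL to let the kernel build one
 * Returns 0 if the request is acceptable, -EPERM otherwise.
 */
static int demo_check_siginfo(pid_t sender, pid_t target, const siginfo_t *info)
{
    if (info == NULL)
        return 0;   /* kernel-provided siginfo is always fine */

    /* Explicit SI_USER context is only accepted when signalling yourself. */
    if (sender != target && info->si_code == SI_USER)
        return -EPERM;

    return 0;
}
```

Returning an error instead of rewriting the caller's siginfo keeps the behaviour consistent with do_rt_sigqueueinfo(), as described above.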

Fixes: 3eb39f47934f ("signal: add pidfd_send_signal() syscall")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Christian Brauner <christian@brauner.io>
(cherry picked from commit 556a888a14afe27164191955618990fb3ccc9aad)

Mot-CRs-fixed: (CR)

Bug: 135608568
Test: test program using syscall(__NR_pidfd_send_signal,..) to send SIGKILL
Change-Id: I493af671b82c43bff1425ee24550d2fb9aa6d961
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-on: https://gerrit.mot.com/1505848
Tested-by: Jira Key
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Reviewed-by: Wang Wang <wangwang1@mt.com>
Reviewed-by: Yonghui Jia <jiayh2@motorola.com>
Submit-Approved: Jira Key
Reviewed-on: https://gerrit.mot.com/1796156
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
3 years ago UPSTREAM: pidfd: add polling support
Joel Fernandes (Google) [Tue, 30 Apr 2019 16:21:53 +0000 (12:21 -0400)]
UPSTREAM: pidfd: add polling support

This patch adds polling support to pidfd.

Android low memory killer (LMK) needs to know when a process dies once
it is sent the kill signal. It does so by checking for the existence of
/proc/pid which is both racy and slow. For example, if a PID is reused
between when LMK sends a kill signal and when it checks for the existence
of the PID, the check may be made against the wrong process.
Using the polling support, LMK will be able to get notified when a process
exits in a race-free and fast way, and it can do other things
(such as polling on other fds) while awaiting the process being killed
to die.

For notification to polling processes, we follow the same existing
mechanism in the kernel used when the parent of the task group is to be
notified of a child's death (do_notify_parent). This is precisely when the
tasks waiting on a poll of pidfd are also awakened in this patch.

We have decided to include the waitqueue in struct pid for the following
reasons:
1. The wait queue has to survive for the lifetime of the poll. Including
   it in task_struct would not be an option in this case because the task can
   be reaped and destroyed before the poll returns.

2. Including the waitqueue in struct pid means that during
   de_thread(), the new thread group leader automatically gets the new
   waitqueue/pid even though its task_struct is different.

Appropriate test cases are added in the second patch to provide coverage of
all the cases the patch is handling.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Daniel Colascione <dancol@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Tim Murray <timmurray@google.com>
Cc: Jonathan Kowalski <bl0pbl33p@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: kernel-team@android.com
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Co-developed-by: Daniel Colascione <dancol@google.com>
Signed-off-by: Daniel Colascione <dancol@google.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Christian Brauner <christian@brauner.io>
(cherry picked from commit b53b0b9d9a613c418057f6cb921c2f40a6f78c24)

Mot-CRs-fixed: (CR)

Bug: 135608568
Test: test program using syscall(__NR_sys_pidfd_open,..) and poll()
Change-Id: I02f259d2875bec46b198d580edfbb067f077084e
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-on: https://gerrit.mot.com/1505855
SME-Granted: SME Approvals Granted
SLTApproved: Slta Waiver
Tested-by: Jira Key
Reviewed-by: Wang Wang <wangwang1@mt.com>
Reviewed-by: Yonghui Jia <jiayh2@motorola.com>
Submit-Approved: Jira Key
Reviewed-on: https://gerrit.mot.com/1796163
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
3 years ago BACKPORT: signal: support CLONE_PIDFD with pidfd_send_signal
Christian Brauner [Wed, 17 Apr 2019 20:50:25 +0000 (22:50 +0200)]
BACKPORT: signal: support CLONE_PIDFD with pidfd_send_signal

Let pidfd_send_signal() use pidfds retrieved via CLONE_PIDFD.  With this
patch pidfd_send_signal() becomes independent of procfs.  This fulfills
the request made when we merged the pidfd_send_signal() patchset.  The
pidfd_send_signal() syscall is now always available allowing for it to
be used by users without procfs mounted or even users without procfs
support compiled into the kernel.

Signed-off-by: Christian Brauner <christian@brauner.io>
Co-developed-by: Jann Horn <jannh@google.com>
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: David Howells <dhowells@redhat.com>
Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
Cc: Andy Lutomirsky <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
(cherry picked from commit 2151ad1b067275730de1b38c7257478cae47d29e)

Conflicts:
        kernel/sys_ni.c

(1. Replaced COND_SYSCALL with cond_syscall.)

Mot-CRs-fixed: (CR)

Bug: 135608568
Test: test program using syscall(__NR_sys_pidfd_open,..) and poll()
Change-Id: I621fe6547397e0e68c560d7da60ef7715deb290c
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-on: https://gerrit.mot.com/1505852
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Tested-by: Jira Key
Reviewed-by: Wang Wang <wangwang1@mt.com>
Reviewed-by: Yonghui Jia <jiayh2@motorola.com>
Submit-Approved: Jira Key
Reviewed-on: https://gerrit.mot.com/1796160
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
3 years ago UPSTREAM: fork: do not release lock that wasn't taken
Christian Brauner [Fri, 10 May 2019 09:53:46 +0000 (11:53 +0200)]
UPSTREAM: fork: do not release lock that wasn't taken

Avoid calling cgroup_threadgroup_change_end() without having called
cgroup_threadgroup_change_begin() first.

During process creation we need to check whether the cgroup we are in
allows us to fork. To perform this check the cgroup needs to guard itself
against threadgroup changes and takes a lock.
Prior to CLONE_PIDFD the cleanup target "bad_fork_free_pid" would also need
to call cgroup_threadgroup_change_end() because said lock had already been
taken.
However, this is not the case anymore with the addition of CLONE_PIDFD. We
are now allocating a pidfd before we check whether the cgroup we're in can
fork and thus prior to taking the lock. So when copy_process() fails at the
right step it would release a lock we haven't taken.
This bug is not even very subtle to be honest. It's just not very clear
from the naming of cgroup_threadgroup_change_{begin,end}() that a lock is
taken.
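The cleanup-ordering rule behind this fix can be modelled with a goto-based unwind in which each label releases only what was acquired before the jump. The function and label names below are hypothetical, and a simple depth counter stands in for the real lock.

```c
#include <assert.h>

static int demo_lock_depth;     /* greater than zero while the lock is held */

static void demo_change_begin(void)
{
    demo_lock_depth++;
}

static void demo_change_end(void)
{
    assert(demo_lock_depth > 0);    /* releasing an untaken lock is a bug */
    demo_lock_depth--;
}

/*
 * fail_at: 0 = succeed,
 *          1 = fail while creating the pidfd (before the lock is taken),
 *          2 = fail in the fork permission check (after the lock is taken).
 */
static int demo_copy_process(int fail_at)
{
    int ret = 0;

    if (fail_at == 1) {
        ret = -1;
        goto bad_free_pid;          /* lock not taken yet: skip its release */
    }

    demo_change_begin();
    if (fail_at == 2) {
        ret = -1;
        goto bad_change_end;
    }

    demo_change_end();
    return 0;

bad_change_end:
    demo_change_end();              /* pairs with demo_change_begin() above */
bad_free_pid:
    /* only resources acquired before the lock are released here */
    return ret;
}
```

With the release placed under the shared early label instead, the `fail_at == 1` path would call `demo_change_end()` at depth zero, which is the condition the lockdep warning below reports.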

Here's the relevant splat:

entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139
RIP: 0023:0xf7fec849
Code: 85 d2 74 02 89 0a 5b 5d c3 8b 04 24 c3 8b 14 24 c3 8b 3c 24 c3 90 90
90 90 90 90 90 90 90 90 90 90 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90
90 90 90 eb 0d 90 90 90 90 90 90 90 90 90 90 90 90
RSP: 002b:00000000ffed5a8c EFLAGS: 00000246 ORIG_RAX: 0000000000000078
RAX: ffffffffffffffda RBX: 0000000000003ffc RCX: 0000000000000000
RDX: 00000000200005c0 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000012 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
------------[ cut here ]------------
DEBUG_LOCKS_WARN_ON(depth <= 0)
WARNING: CPU: 1 PID: 7744 at kernel/locking/lockdep.c:4052 __lock_release
kernel/locking/lockdep.c:4052 [inline]
WARNING: CPU: 1 PID: 7744 at kernel/locking/lockdep.c:4052
lock_release+0x667/0xa00 kernel/locking/lockdep.c:4321
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 7744 Comm: syz-executor007 Not tainted 5.1.0+ #4
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x172/0x1f0 lib/dump_stack.c:113
  panic+0x2cb/0x65c kernel/panic.c:214
  __warn.cold+0x20/0x45 kernel/panic.c:566
  report_bug+0x263/0x2b0 lib/bug.c:186
  fixup_bug arch/x86/kernel/traps.c:179 [inline]
  fixup_bug arch/x86/kernel/traps.c:174 [inline]
  do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:272
  do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:291
  invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:972
RIP: 0010:__lock_release kernel/locking/lockdep.c:4052 [inline]
RIP: 0010:lock_release+0x667/0xa00 kernel/locking/lockdep.c:4321
Code: 0f 85 a0 03 00 00 8b 35 77 66 08 08 85 f6 75 23 48 c7 c6 a0 55 6b 87
48 c7 c7 40 25 6b 87 4c 89 85 70 ff ff ff e8 b7 a9 eb ff <0f> 0b 4c 8b 85
70 ff ff ff 4c 89 ea 4c 89 e6 4c 89 c7 e8 52 63 ff
RSP: 0018:ffff888094117b48 EFLAGS: 00010086
RAX: 0000000000000000 RBX: 1ffff11012822f6f RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff815af236 RDI: ffffed1012822f5b
RBP: ffff888094117c00 R08: ffff888092bfc400 R09: fffffbfff113301d
R10: fffffbfff113301c R11: ffffffff889980e3 R12: ffffffff8a451df8
R13: ffffffff8142e71f R14: ffffffff8a44cc80 R15: ffff888094117bd8
  percpu_up_read.constprop.0+0xcb/0x110 include/linux/percpu-rwsem.h:92
  cgroup_threadgroup_change_end include/linux/cgroup-defs.h:712 [inline]
  copy_process.part.0+0x47ff/0x6710 kernel/fork.c:2222
  copy_process kernel/fork.c:1772 [inline]
  _do_fork+0x25d/0xfd0 kernel/fork.c:2338
  __do_compat_sys_x86_clone arch/x86/ia32/sys_ia32.c:240 [inline]
  __se_compat_sys_x86_clone arch/x86/ia32/sys_ia32.c:236 [inline]
  __ia32_compat_sys_x86_clone+0xbc/0x140 arch/x86/ia32/sys_ia32.c:236
  do_syscall_32_irqs_on arch/x86/entry/common.c:334 [inline]
  do_fast_syscall_32+0x281/0xd54 arch/x86/entry/common.c:405
  entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139
RIP: 0023:0xf7fec849
Code: 85 d2 74 02 89 0a 5b 5d c3 8b 04 24 c3 8b 14 24 c3 8b 3c 24 c3 90 90
90 90 90 90 90 90 90 90 90 90 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90
90 90 90 eb 0d 90 90 90 90 90 90 90 90 90 90 90 90
RSP: 002b:00000000ffed5a8c EFLAGS: 00000246 ORIG_RAX: 0000000000000078
RAX: ffffffffffffffda RBX: 0000000000003ffc RCX: 0000000000000000
RDX: 00000000200005c0 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000012 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Kernel Offset: disabled
Rebooting in 86400 seconds..

Reported-and-tested-by: syzbot+3286e58549edc479faae@syzkaller.appspotmail.com
Fixes: b3e583825266 ("clone: add CLONE_PIDFD")
Signed-off-by: Christian Brauner <christian@brauner.io>
(cherry picked from commit c3b7112df86b769927a60a6d7175988ca3d60f09)

Mot-CRs-fixed: (CR)

Bug: 135608568
Test: test program using syscall(__NR_sys_pidfd_open,..) and poll()
Change-Id: Ib9ecb1e5c0c6e2d062b89c25109ec571570eb497
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-on: https://gerrit.mot.com/1505853
Tested-by: Jira Key
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Reviewed-by: Wang Wang <wangwang1@mt.com>
Reviewed-by: Yonghui Jia <jiayh2@motorola.com>
Submit-Approved: Jira Key
Reviewed-on: https://gerrit.mot.com/1796161
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
3 years ago BACKPORT: clone: add CLONE_PIDFD
Christian Brauner [Wed, 27 Mar 2019 12:04:15 +0000 (13:04 +0100)]
BACKPORT: clone: add CLONE_PIDFD

This patchset makes it possible to retrieve pid file descriptors at
process creation time by introducing the new flag CLONE_PIDFD to the
clone() system call.  Linus originally suggested to implement this as a
new flag to clone() instead of making it a separate system call.  As
spotted by Linus, there is exactly one bit for clone() left.

CLONE_PIDFD creates file descriptors based on the anonymous inode
implementation in the kernel that will also be used to implement the new
mount api.  They serve as a simple opaque handle on pids.  Logically,
this makes it possible to interpret a pidfd differently, narrowing or
widening the scope of various operations (e.g. signal sending).  Thus, a
pidfd cannot just refer to a tgid, but also a tid, or in theory - given
appropriate flag arguments in relevant syscalls - a process group or
session. A pidfd does not represent a privilege.  This does not imply it
cannot ever be that way but for now this is not the case.

A pidfd comes with additional information in fdinfo if the kernel supports
procfs.  The fdinfo file contains the pid of the process in the caller's
pid namespace in the same format as the procfs status file, i.e. "Pid:\t%d".

As suggested by Oleg, with CLONE_PIDFD the pidfd is returned in the
parent_tidptr argument of clone.  This has the advantage that we can
give back the associated pid and the pidfd at the same time.

To remove worries about missing metadata access this patchset comes with
a sample program that illustrates how a combination of CLONE_PIDFD, and
pidfd_send_signal() can be used to gain race-free access to process
metadata through /proc/<pid>.  The sample program can easily be
translated into a helper that would be suitable for inclusion in libc so
that users don't have to worry about writing it themselves.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christian Brauner <christian@brauner.io>
Co-developed-by: Jann Horn <jannh@google.com>
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: David Howells <dhowells@redhat.com>
Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
Cc: Andy Lutomirsky <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
(cherry picked from commit b3e5838252665ee4cfa76b82bdf1198dca81e5be)

Conflicts:
        kernel/fork.c

(1. Replaced proc_pid_ns() with its direct implementation.)

Mot-CRs-fixed: (CR)

Bug: 135608568
Test: test program using syscall(__NR_sys_pidfd_open,..) and poll()
Change-Id: I3c804a92faea686e5bf7f99df893fe3a5d87ddf7
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-on: https://gerrit.mot.com/1505851
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Tested-by: Jira Key
Reviewed-by: Wang Wang <wangwang1@mt.com>
Reviewed-by: Yonghui Jia <jiayh2@motorola.com>
Submit-Approved: Jira Key
Reviewed-on: https://gerrit.mot.com/1796159
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
3 years ago UPSTREAM: Make anon_inodes unconditional
David Howells [Mon, 5 Nov 2018 17:40:31 +0000 (17:40 +0000)]
UPSTREAM: Make anon_inodes unconditional

Make the anon_inodes facility unconditional so that it can be used by core
VFS code.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
(cherry picked from commit dadd2299ab61fc2b55b95b7b3a8f674cdd3b69c9)

Mot-CRs-fixed: (CR)

Bug: 135608568
Test: test program using syscall(__NR_sys_pidfd_open,..) and poll()
Change-Id: I2f97bda4f360d8d05bbb603de839717b3d8067ae
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-on: https://gerrit.mot.com/1505850
Tested-by: Jira Key
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Reviewed-by: Wang Wang <wangwang1@mt.com>
Reviewed-by: Yonghui Jia <jiayh2@motorola.com>
Submit-Approved: Jira Key
Reviewed-on: https://gerrit.mot.com/1796158
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
3 years agoBACKPORT: pid: add pidfd_open()
Christian Brauner [Fri, 24 May 2019 10:43:51 +0000 (12:43 +0200)]
BACKPORT: pid: add pidfd_open()

This adds the pidfd_open() syscall. It allows a caller to retrieve pollable
pidfds for a process which did not get created via CLONE_PIDFD, i.e. for a
process that is created via traditional fork()/clone() calls that is only
referenced by a PID:

int pidfd = pidfd_open(1234, 0);
ret = pidfd_send_signal(pidfd, SIGSTOP, NULL, 0);

With the introduction of pidfds through CLONE_PIDFD it is possible to
create pidfds at process creation time.
However, a lot of processes get created with traditional PID-based calls
such as fork() or clone() (without CLONE_PIDFD). For these processes a
caller can currently not create a pollable pidfd. This is a problem for
Android's low memory killer (LMK) and service managers such as systemd.
Both are examples of tools that want to make use of pidfds to get reliable
notification of process exit for non-parents (pidfd polling) and race-free
signal sending (pidfd_send_signal()). They intend to switch to this API for
process supervision/management as soon as possible. Having no way to get
pollable pidfds from PID-only processes is one of the biggest blockers for
them in adopting this API. With pidfd_open() making it possible to retrieve
pidfds for PID-based processes we enable them to adopt this API.
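
As an illustration of the exit-notification use case described above, here
is a minimal sketch that obtains a pidfd for an existing PID and polls it
until the process exits. The helper names do_pidfd_open() and
wait_for_exit() are hypothetical and not part of this patch. The sketch
assumes that the running kernel carries both pidfd_open() and pidfd polling
support and that the installed headers define __NR_pidfd_open; otherwise
the wrapper fails with ENOSYS.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <poll.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper: libc may not provide pidfd_open(), so use syscall(). */
static int do_pidfd_open(pid_t pid, unsigned int flags)
{
#ifdef __NR_pidfd_open
        return (int)syscall(__NR_pidfd_open, pid, flags);
#else
        errno = ENOSYS;
        return -1;
#endif
}

/*
 * Hypothetical helper: wait up to timeout_ms for the process to exit.
 * Returns 1 if the pidfd became readable (process exited), 0 on timeout
 * and -1 on error with errno set.
 */
static int wait_for_exit(pid_t pid, int timeout_ms)
{
        struct pollfd pfd;
        int pidfd, ret, saved_errno;

        pidfd = do_pidfd_open(pid, 0);
        if (pidfd < 0)
                return -1;

        pfd.fd = pidfd;
        pfd.events = POLLIN;
        pfd.revents = 0;

        /* Restart the wait if a signal handler interrupts poll(). */
        do {
                ret = poll(&pfd, 1, timeout_ms);
        } while (ret < 0 && errno == EINTR);

        /* Preserve poll()'s errno across close(). */
        saved_errno = errno;
        close(pidfd);
        errno = saved_errno;

        if (ret < 0)
                return -1;

        return (ret > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```

A supervisor that is not the parent of the process could call
wait_for_exit(1234, -1) to block until PID 1234 exits. Because the pidfd
refers to the process it was opened for, the wait is not affected by later
PID reuse.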

In line with Arnd's recent changes to consolidate syscall numbers across
architectures, I have added the pidfd_open() syscall to all architectures
at the same time.

Signed-off-by: Christian Brauner <christian@brauner.io>
Reviewed-by: David Howells <dhowells@redhat.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jann Horn <jannh@google.com>
Cc: Andy Lutomirsky <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-api@vger.kernel.org
(cherry picked from commit 32fcb426ec001cb6d5a4a195091a8486ea77e2df)

Conflicts:
        kernel/pid.c

(1. Replaced PIDTYPE_TGID with PIDTYPE_PID and thread_group_leader() check in pidfd_open() call)

Mot-CRs-fixed: (CR)

Bug: 135608568
Test: test program using syscall(__NR_sys_pidfd_open,..) and poll()
Change-Id: I52a93a73722d7f7754dae05f63b94b4ca4a71a75
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-on: https://gerrit.mot.com/1505856
SME-Granted: SME Approvals Granted
SLTApproved: Slta Waiver
Tested-by: Jira Key
Reviewed-by: Wang Wang <wangwang1@mt.com>
Reviewed-by: Yonghui Jia <jiayh2@motorola.com>
Submit-Approved: Jira Key
Reviewed-on: https://gerrit.mot.com/1796165
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
3 years agoUPSTREAM: pidfd: fix a poll race when setting exit_state
Suren Baghdasaryan [Wed, 17 Jul 2019 17:21:00 +0000 (13:21 -0400)]
UPSTREAM: pidfd: fix a poll race when setting exit_state

There is a race between reading task->exit_state in pidfd_poll and
writing it after do_notify_parent calls do_notify_pidfd. Expected
sequence of events is:

CPU 0                            CPU 1
------------------------------------------------
exit_notify
  do_notify_parent
    do_notify_pidfd
  tsk->exit_state = EXIT_DEAD
                                  pidfd_poll
                                     if (tsk->exit_state)

However nothing prevents the following sequence:

CPU 0                            CPU 1
------------------------------------------------
exit_notify
  do_notify_parent
    do_notify_pidfd
                                   pidfd_poll
                                      if (tsk->exit_state)
  tsk->exit_state = EXIT_DEAD

This causes a polling task to wait forever, since poll blocks because
exit_state is 0 and the waiting task is not notified again. A stress
test continuously doing pidfd poll and process exits uncovered this bug.

To fix it, we make sure that the task's exit_state is always set before
calling do_notify_pidfd.

Fixes: b53b0b9d9a6 ("pidfd: add polling support")
Cc: kernel-team@android.com
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Link: https://lore.kernel.org/r/20190717172100.261204-1-joel@joelfernandes.org
[christian@brauner.io: adapt commit message and drop unneeded changes from wait_task_zombie]
Signed-off-by: Christian Brauner <christian@brauner.io>
(cherry picked from commit b191d6491be67cef2b3fa83015561caca1394ab9)

Mot-CRs-fixed: (CR)

Bug: 135608568
Test: test program using syscall(__NR_sys_pidfd_open,..) and poll()
Change-Id: I043e54c9b69f25de88f6f19ae167920af8532de2
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-on: https://gerrit.mot.com/1505858
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Tested-by: Jira Key
Reviewed-by: Wang Wang <wangwang1@mt.com>
Reviewed-by: Yonghui Jia <jiayh2@motorola.com>
Submit-Approved: Jira Key
Reviewed-on: https://gerrit.mot.com/1796167
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
3 years agoEnable PSI
huangzq2 [Fri, 6 Nov 2020 11:10:37 +0000 (19:10 +0800)]
Enable PSI

Change-Id: I02a9d4b95606250a6727479ed1a1be01dd208e50
Signed-off-by: huangzq2 <huangzq2@motorola.com>
Reviewed-on: https://gerrit.mot.com/1796168
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Tested-by: Jira Key
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key

3 years agoUPSTREAM: signal: improve comments
Christian Brauner [Tue, 4 Jun 2019 13:18:43 +0000 (15:18 +0200)]
UPSTREAM: signal: improve comments

Improve the comments for pidfd_send_signal().
First, the comment still referred to a file descriptor for a process as a
"task file descriptor", which stems from way back at the beginning of the
discussion. Replace this with "pidfd" for consistency.
Second, the wording for the explanation of the arguments to the syscall
was a bit inconsistent, e.g. some used the past tense and some used the
present tense. Make the wording more consistent.

Signed-off-by: Christian Brauner <christian@brauner.io>
(cherry picked from commit c732327f04a3818f35fa97d07b1d64d31b691d78)

Mot-CRs-fixed: (CR)

Bug: 135608568
Test: test program using syscall(__NR_sys_pidfd_open,..) and poll()
Change-Id: I06c6bdd1dddaeb8ac75a78dd21f9cdd0dc139a4c
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-on: https://gerrit.mot.com/1505854
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Tested-by: Jira Key
Reviewed-by: Wang Wang <wangwang1@mt.com>
Reviewed-by: Yonghui Jia <jiayh2@motorola.com>
Submit-Approved: Jira Key
Reviewed-on: https://gerrit.mot.com/1796162
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
3 years agoBACKPORT: arch: wire-up pidfd_open()
Christian Brauner [Fri, 24 May 2019 10:44:59 +0000 (12:44 +0200)]
BACKPORT: arch: wire-up pidfd_open()

This wires up the pidfd_open() syscall into all arches at once.

Signed-off-by: Christian Brauner <christian@brauner.io>
Reviewed-by: David Howells <dhowells@redhat.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jann Horn <jannh@google.com>
Cc: Andy Lutomirsky <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-api@vger.kernel.org
Cc: linux-alpha@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-m68k@lists.linux-m68k.org
Cc: linux-mips@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: linux-arch@vger.kernel.org
Cc: x86@kernel.org
(cherry picked from commit 7615d9e1780e26e0178c93c55b73309a5dc093d7)

Conflicts:

        arch/alpha/kernel/syscalls/syscall.tbl
        arch/arm/tools/syscall.tbl
        arch/ia64/kernel/syscalls/syscall.tbl
        arch/m68k/kernel/syscalls/syscall.tbl
        arch/microblaze/kernel/syscalls/syscall.tbl
        arch/mips/kernel/syscalls/syscall_n32.tbl
        arch/mips/kernel/syscalls/syscall_n64.tbl
        arch/mips/kernel/syscalls/syscall_o32.tbl
        arch/parisc/kernel/syscalls/syscall.tbl
        arch/powerpc/kernel/syscalls/syscall.tbl
        arch/s390/kernel/syscalls/syscall.tbl
        arch/sh/kernel/syscalls/syscall.tbl
        arch/sparc/kernel/syscalls/syscall.tbl
        arch/xtensa/kernel/syscalls/syscall.tbl
        arch/x86/entry/syscalls/syscall_32.tbl
        arch/x86/entry/syscalls/syscall_64.tbl

(1. Skipped syscall.tbl modifications for missing architectures.
 2. Removed __ia32_sys_pidfd_open in arch/x86/entry/syscalls/syscall_32.tbl.
 3. Replaced __x64_sys_pidfd_open with sys_pidfd_open in arch/x86/entry/syscalls/syscall_64.tbl.)

Mot-CRs-fixed: (CR)

Bug: 135608568
Test: test program using syscall(__NR_sys_pidfd_open,..) and poll()
Change-Id: I294aa33dea5ed2662e077340281d7aa0452f7471
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-on: https://gerrit.mot.com/1505857
Tested-by: Jira Key
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Reviewed-by: Wang Wang <wangwang1@mt.com>
Reviewed-by: Yonghui Jia <jiayh2@motorola.com>
Submit-Approved: Jira Key
Reviewed-on: https://gerrit.mot.com/1796166
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
3 years agoUPSTREAM: signal: use fdget() since we don't allow O_PATH
Christian Brauner [Thu, 18 Apr 2019 10:18:39 +0000 (12:18 +0200)]
UPSTREAM: signal: use fdget() since we don't allow O_PATH

As stated in the original commit for pidfd_send_signal() we don't allow
signaling processes through O_PATH file descriptors since it is
semantically equivalent to a write on the pidfd.

We already correctly error out right now and return EBADF if an O_PATH
fd is passed.  This is because we use file->f_op to detect whether a
pidfd is passed and O_PATH fds have their file->f_op set to empty_fops
in do_dentry_open() and thus fail the test.

Thus, there is no regression.  It's just semantically correct to use
fdget() and return an error right from there instead of taking a
reference and returning an error later.

Signed-off-by: Christian Brauner <christian@brauner.io>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jann Horn <jann@thejh.net>
Cc: David Howells <dhowells@redhat.com>
Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
Cc: Andy Lutomirsky <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 738a7832d21e3d911fcddab98ce260b79010b461)

Mot-CRs-fixed: (CR)

Bug: 135608568
Test: test program using syscall(__NR_pidfd_send_signal,..) to send SIGKILL
Change-Id: IMontanaeaadf9da371fb2d9caae4df49627760de7229
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-on: https://gerrit.mot.com/1505849
Tested-by: Jira Key
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Reviewed-by: Wang Wang <wangwang1@mt.com>
Reviewed-by: Yonghui Jia <jiayh2@motorola.com>
Submit-Approved: Jira Key
Reviewed-on: https://gerrit.mot.com/1796157
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.c
3 years agoBACKPORT: signal: add pidfd_send_signal() syscall
Christian Brauner [Sun, 18 Nov 2018 23:51:56 +0000 (00:51 +0100)]
BACKPORT: signal: add pidfd_send_signal() syscall

The kill() syscall operates on process identifiers (pid). After a process
has exited its pid can be reused by another process. If a caller sends a
signal to a reused pid it will end up signaling the wrong process. This
issue has often surfaced and there has been a push to address this problem [1].

This patch uses file descriptors (fd) from /proc/<pid> as stable handles on
struct pid. Even if a pid is recycled the handle will not change. The fd
can be used to send signals to the process it refers to.
Thus, the new syscall pidfd_send_signal() is introduced to solve this
problem. Instead of pids it operates on process fds (pidfd).

/* prototype and arguments */
long pidfd_send_signal(int pidfd, int sig, siginfo_t *info, unsigned int flags);

/* syscall number 424 */
The syscall number was chosen to be 424 to align with Arnd's rework in his
y2038 work and thereby minimize merge conflicts (cf. [25]).

In addition to the pidfd and signal argument it takes an additional
siginfo_t and flags argument. If the siginfo_t argument is NULL then
pidfd_send_signal() is equivalent to kill(<positive-pid>, <signal>). If it
is not NULL pidfd_send_signal() is equivalent to rt_sigqueueinfo().
The flags argument is added to allow for future extensions of this syscall.
It currently needs to be passed as 0. Failing to do so will cause EINVAL.
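
The non-NULL siginfo_t case can be sketched as follows. This is only an
illustration under stated assumptions: queue_value_to_self() and
on_sigusr1() are hypothetical names, the pidfd is a directory fd for
/proc/self as used by this patch, and the sketch assumes (as in the
upstream implementation) that info->si_signo has to match the sig
argument. The process signals itself so that no permission checks on
si_code get in the way.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Payload observed by the handler; -1 means "nothing received yet". */
static volatile sig_atomic_t received_value = -1;

static void on_sigusr1(int sig, siginfo_t *info, void *ucontext)
{
        (void)sig;
        (void)ucontext;
        received_value = info->si_value.sival_int;
}

static int do_pidfd_send_signal(int pidfd, int sig, siginfo_t *info,
                                unsigned int flags)
{
#ifdef __NR_pidfd_send_signal
        return (int)syscall(__NR_pidfd_send_signal, pidfd, sig, info, flags);
#else
        errno = ENOSYS;
        return -1;
#endif
}

/*
 * Hypothetical helper: queue SIGUSR1 carrying `value` to the calling
 * process, rt_sigqueueinfo()-style. Returns 0 on success, -1 on error.
 */
static int queue_value_to_self(int value)
{
        struct sigaction sa;
        siginfo_t info;
        int fd, ret, saved_errno;

        memset(&sa, 0, sizeof(sa));
        sa.sa_sigaction = on_sigusr1;
        sa.sa_flags = SA_SIGINFO;
        sigemptyset(&sa.sa_mask);
        if (sigaction(SIGUSR1, &sa, NULL) < 0)
                return -1;

        /* /proc/self resolves to /proc/<pid> of the caller. */
        fd = open("/proc/self", O_DIRECTORY | O_CLOEXEC);
        if (fd < 0)
                return -1;

        memset(&info, 0, sizeof(info));
        info.si_signo = SIGUSR1;        /* assumed to have to match sig */
        info.si_code = SI_QUEUE;        /* same code sigqueue(3) uses */
        info.si_pid = getpid();
        info.si_uid = getuid();
        info.si_value.sival_int = value;

        /* flags has to be 0 for now. */
        ret = do_pidfd_send_signal(fd, SIGUSR1, &info, 0);

        saved_errno = errno;
        close(fd);
        errno = saved_errno;

        return ret;
}
```

Passing NULL instead of &info turns the same call into the kill(2)-like
form, in which the kernel fills in the signal information itself.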

/* pidfd_send_signal() replaces multiple pid-based syscalls */
The pidfd_send_signal() syscall currently takes on the job of
rt_sigqueueinfo(2) and parts of the functionality of kill(2), namely when a
positive pid is passed to kill(2). It will however be possible to also
replace tgkill(2) and rt_tgsigqueueinfo(2) if this syscall is extended.

/* sending signals to threads (tid) and process groups (pgid) */
Specifically, the pidfd_send_signal() syscall does currently not operate on
process groups or threads. This is left for future extensions.
In order to extend the syscall to allow sending signals to threads and
process groups, appropriately named flags (e.g. PIDFD_TYPE_PGID and
PIDFD_TYPE_TID) should be added. This implies that the flags argument will
determine what is signaled and not the file descriptor itself. Put in other
words, grouping in this api is a property of the flags argument not a
property of the file descriptor (cf. [13]). Clarification for this has been
requested by Eric (cf. [19]).
When appropriate extensions through the flags argument are added then
pidfd_send_signal() can additionally replace the part of kill(2) which
operates on process groups as well as the tgkill(2) and
rt_tgsigqueueinfo(2) syscalls.
How such an extension could be implemented has been very roughly sketched
in [14], [15], and [16]. However, this should not be taken as a commitment
to a particular implementation. There might be better ways to do it.
Right now this is intentionally left out to keep this patchset as simple as
possible (cf. [4]).

/* naming */
The syscall had various names throughout iterations of this patchset:
- procfd_signal()
- procfd_send_signal()
- taskfd_send_signal()
In the last round of reviews it was pointed out that, since the flags
argument rather than the type of fd decides the scope of the signal, it
might make sense to settle for either "procfd_" or "pidfd_" as the
prefix. The community was willing to accept either (cf. [17] and [18]).
Given that one developer expressed a strong preference for the "pidfd_"
prefix (cf. [13]) and that other developers were less opinionated about
the name, we should settle for "pidfd_" to avoid further bikeshedding.

The "_send_signal" suffix was chosen to reflect the fact that the syscall
takes on the job of multiple syscalls. It is therefore intentional that the
name is reminiscent of neither kill(2) nor rt_sigqueueinfo(2). Not the
former because it might imply that pidfd_send_signal() is a replacement for
kill(2), and not the latter because it is a hassle to remember the correct
spelling - especially for non-native speakers - and because it is not
descriptive enough of what the syscall actually does. The name
"pidfd_send_signal" makes it very clear that its job is to send signals.

/* zombies */
Zombies can be signaled just as any other process. No special error will be
reported since a zombie state is an unreliable state (cf. [3]). However,
this can be added as an extension through the @flags argument if the need
ever arises.

/* cross-namespace signals */
The patch currently enforces that the signaler and signalee either are in
the same pid namespace or that the signaler's pid namespace is an ancestor
of the signalee's pid namespace. This is done for the sake of simplicity
and because it is unclear what values certain members of
siginfo_t would need to be set to (cf. [5], [6]).

/* compat syscalls */
It became clear that we would like to avoid adding compat syscalls
(cf. [7]).  The compat syscall handling is now done in kernel/signal.c
itself by adding __copy_siginfo_from_user_generic() which lets us avoid
compat syscalls (cf. [8]). It should be noted that the addition of
__copy_siginfo_from_user_any() is caused by a bug in the original
implementation of rt_sigqueueinfo(2) (cf. [12]).
With upcoming rework for syscall handling things might improve
significantly (cf. [11]) and __copy_siginfo_from_user_any() will not gain
any additional callers.

/* testing */
This patch was tested on x64 and x86.

/* userspace usage */
An asciinema recording for the basic functionality can be found under [9].
With this patch a process can be killed via:

 #define _GNU_SOURCE
 #include <errno.h>
 #include <fcntl.h>
 #include <signal.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <sys/stat.h>
 #include <sys/syscall.h>
 #include <sys/types.h>
 #include <unistd.h>

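 /*
  * Thin wrapper around the raw syscall. Like syscall(), it returns -1 and
  * sets errno on failure, which is what main() below relies on.
  */
 static inline int do_pidfd_send_signal(int pidfd, int sig, siginfo_t *info,
                                         unsigned int flags)
 {
 #ifdef __NR_pidfd_send_signal
         return syscall(__NR_pidfd_send_signal, pidfd, sig, info, flags);
 #else
         errno = ENOSYS;
         return -1;
 #endif
 }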

 int main(int argc, char *argv[])
 {
         int fd, ret, saved_errno, sig;

         if (argc < 3)
                 exit(EXIT_FAILURE);

         fd = open(argv[1], O_DIRECTORY | O_CLOEXEC);
         if (fd < 0) {
                 printf("%s - Failed to open \"%s\"\n", strerror(errno), argv[1]);
                 exit(EXIT_FAILURE);
         }

         sig = atoi(argv[2]);

         printf("Sending signal %d to process %s\n", sig, argv[1]);
         ret = do_pidfd_send_signal(fd, sig, NULL, 0);

         saved_errno = errno;
         close(fd);
         errno = saved_errno;

         if (ret < 0) {
                 printf("%s - Failed to send signal %d to process %s\n",
                        strerror(errno), sig, argv[1]);
                 exit(EXIT_FAILURE);
         }

         exit(EXIT_SUCCESS);
 }

/* Q&A
 * Given that it seems the same questions get asked again by people who are
 * late to the party it makes sense to add a Q&A section to the commit
 * message so it's hopefully easier to avoid duplicate threads.
 *
 * For the sake of progress please consider these arguments settled unless
 * there is a new point that desperately needs to be addressed. Please
 * check the threads linked in this commit message to see whether a point
 * has already been covered.
 */
Q-01: (Florian Weimer [20], Andrew Morton [21])
      What happens when the target process has exited?
A-01: Sending the signal will fail with ESRCH (cf. [22]).

Q-02:  (Andrew Morton [21])
       Is the task_struct pinned by the fd?
A-02:  No. A reference to struct pid is kept. struct pid - as far as I
       understand - was created exactly so that struct task_struct does
       not need to be pinned (cf. [22]).

Q-03: (Andrew Morton [21])
      Does the entire procfs directory remain visible? Just one entry
      within it?
A-03: The same thing that happens right now when you hold a file descriptor
      to /proc/<pid> open (cf. [22]).

Q-04: (Andrew Morton [21])
      Does the pid remain reserved?
A-04: No. This patchset guarantees a stable handle, not that pids are not
      recycled (cf. [22]).

Q-05: (Andrew Morton [21])
      Do attempts to signal that fd return errors?
A-05: See {Q,A}-01.

Q-06: (Andrew Morton [21])
      Is there a cleaner way of obtaining the fd? Another syscall perhaps.
A-06: Userspace can already trivially retrieve file descriptors from procfs
      so this is something that we will need to support anyway. Hence,
      there's no immediate need to add another syscall just to make
      pidfd_send_signal() not dependent on the presence of procfs. However,
      adding a syscall to get such file descriptors is planned for a
      future patchset (cf. [22]).

Q-07: (Andrew Morton [21] and others)
      This fd-for-a-process sounds like a handy thing and people may well
      think up other uses for it in the future, probably unrelated to
      signals. Are the code and the interface designed to permit such
      future applications?
A-07: Yes (cf. [22]).

Q-08: (Andrew Morton [21] and others)
      Now I think about it, why a new syscall? This thing is looking
      rather like an ioctl?
A-08: This has been extensively discussed. It was agreed that a syscall is
      preferred for a variety of reasons. Here are just a few taken from
      prior threads. Syscalls are safer than ioctl()s especially when
      signaling to fds. Processes are a core kernel concept so a syscall
      seems more appropriate. The layout of the syscall with its four
      arguments would require the addition of a custom struct for the
      ioctl() thereby causing at least the same amount or even more
      complexity for userspace than a simple syscall. The new syscall will
      replace multiple other pid-based syscalls (see description above).
      The file-descriptors-for-processes concept introduced with this
      syscall will be extended with other syscalls in the future. See also
      [22], [23] and various other threads already linked in here.

Q-09: (Florian Weimer [24])
      What happens if you use the new interface with an O_PATH descriptor?
A-09:
      pidfds opened as O_PATH fds cannot be used to send signals to a
      process (cf. [2]). Signaling processes through pidfds is the
      equivalent of writing to a file. Thus, this is not an operation that
      operates "purely at the file descriptor level" as required by the
      open(2) manpage. See also [4].
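
The behaviour described in {Q,A}-01 and {Q,A}-04 can be observed with a
small sketch like the one below. signal_reaped_child() is a hypothetical
helper, not part of this patch. It assumes procfs is mounted on /proc and
that the kernel provides pidfd_send_signal(). Signal 0 is used as a pure
existence check, as with kill(2).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int do_pidfd_send_signal(int pidfd, int sig, siginfo_t *info,
                                unsigned int flags)
{
#ifdef __NR_pidfd_send_signal
        return (int)syscall(__NR_pidfd_send_signal, pidfd, sig, info, flags);
#else
        errno = ENOSYS;
        return -1;
#endif
}

/*
 * Hypothetical helper: fork a child, take a /proc/<pid> fd for it, let it
 * exit and get reaped, then try to signal it through the stale handle.
 * Returns the errno of the failed signal attempt (ESRCH is expected on a
 * kernel with this patch), 0 if the signal unexpectedly succeeded, or -1
 * if the setup itself failed.
 */
static int signal_reaped_child(void)
{
        char path[64];
        pid_t pid;
        int fd, ret, err;

        pid = fork();
        if (pid < 0)
                return -1;
        if (pid == 0)
                _exit(0);

        /* The child stays visible in /proc (possibly as a zombie) until reaped. */
        snprintf(path, sizeof(path), "/proc/%d", (int)pid);
        fd = open(path, O_DIRECTORY | O_CLOEXEC);
        if (fd < 0) {
                waitpid(pid, NULL, 0);
                return -1;
        }

        /* Reap the child; from here on its PID may be recycled. */
        if (waitpid(pid, NULL, 0) < 0) {
                close(fd);
                return -1;
        }

        /* The fd still refers to the old process, so this should fail. */
        ret = do_pidfd_send_signal(fd, 0, NULL, 0);
        err = ret < 0 ? errno : 0;
        close(fd);

        return err;
}
```

With plain kill(2) the same sequence could end up signaling an unrelated
process that has since been assigned the recycled PID. With the fd-based
handle the attempt is expected to fail instead.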

/* References */
[1]:  https://lore.kernel.org/lkml/20181029221037.87724-1-dancol@google.com/
[2]:  https://lore.kernel.org/lkml/874lbtjvtd.fsf@oldenburg2.str.redhat.com/
[3]:  https://lore.kernel.org/lkml/20181204132604.aspfupwjgjx6fhva@brauner.io/
[4]:  https://lore.kernel.org/lkml/20181203180224.fkvw4kajtbvru2ku@brauner.io/
[5]:  https://lore.kernel.org/lkml/20181121213946.GA10795@mail.hallyn.com/
[6]:  https://lore.kernel.org/lkml/20181120103111.etlqp7zop34v6nv4@brauner.io/
[7]:  https://lore.kernel.org/lkml/36323361-90BD-41AF-AB5B-EE0D7BA02C21@amacapital.net/
[8]:  https://lore.kernel.org/lkml/87tvjxp8pc.fsf@xmission.com/
[9]:  https://asciinema.org/a/IQjuCHew6bnq1cr78yuMv16cy
[11]: https://lore.kernel.org/lkml/F53D6D38-3521-4C20-9034-5AF447DF62FF@amacapital.net/
[12]: https://lore.kernel.org/lkml/87zhtjn8ck.fsf@xmission.com/
[13]: https://lore.kernel.org/lkml/871s6u9z6u.fsf@xmission.com/
[14]: https://lore.kernel.org/lkml/20181206231742.xxi4ghn24z4h2qki@brauner.io/
[15]: https://lore.kernel.org/lkml/20181207003124.GA11160@mail.hallyn.com/
[16]: https://lore.kernel.org/lkml/20181207015423.4miorx43l3qhppfz@brauner.io/
[17]: https://lore.kernel.org/lkml/CAGXu5jL8PciZAXvOvCeCU3wKUEB_dU-O3q0tDw4uB_ojMvDEew@mail.gmail.com/
[18]: https://lore.kernel.org/lkml/20181206222746.GB9224@mail.hallyn.com/
[19]: https://lore.kernel.org/lkml/20181208054059.19813-1-christian@brauner.io/
[20]: https://lore.kernel.org/lkml/8736rebl9s.fsf@oldenburg.str.redhat.com/
[21]: https://lore.kernel.org/lkml/20181228152012.dbf0508c2508138efc5f2bbe@linux-foundation.org/
[22]: https://lore.kernel.org/lkml/20181228233725.722tdfgijxcssg76@brauner.io/
[23]: https://lwn.net/Articles/773459/
[24]: https://lore.kernel.org/lkml/8736rebl9s.fsf@oldenburg.str.redhat.com/
[25]: https://lore.kernel.org/lkml/CAK8P3a0ej9NcJM8wXNPbcGUyOUZYX+VLoDFdbenW3s3114oQZw@mail.gmail.com/

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Jann Horn <jannh@google.com>
Cc: Andy Lutomirsky <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Christian Brauner <christian@brauner.io>
Reviewed-by: Tycho Andersen <tycho@tycho.ws>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Howells <dhowells@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Serge Hallyn <serge@hallyn.com>
Acked-by: Aleksa Sarai <cyphar@cyphar.com>
(cherry picked from commit 3eb39f47934f9d5a3027fe00d906a45fe3a15fad)

Conflicts:
        arch/x86/entry/syscalls/syscall_32.tbl - trivial manual merge
        arch/x86/entry/syscalls/syscall_64.tbl - trivial manual merge
        include/linux/proc_fs.h - trivial manual merge
        include/linux/syscalls.h - trivial manual merge
        include/uapi/asm-generic/unistd.h - trivial manual merge
        kernel/signal.c - struct kernel_siginfo does not exist in 4.14
        kernel/sys_ni.c - cond_syscall is used instead of COND_SYSCALL
        arch/x86/entry/syscalls/syscall_32.tbl
        arch/x86/entry/syscalls/syscall_64.tbl

(1. manual merges because of 4.14 differences
 2. change prepare_kill_siginfo() to use struct siginfo instead of kernel_siginfo
 3. use copy_from_user() instead of copy_siginfo_from_user() in copy_siginfo_from_user_any()
 4. replaced COND_SYSCALL with cond_syscall
 5. Removed __ia32_sys_pidfd_send_signal in arch/x86/entry/syscalls/syscall_32.tbl.
 6. Replaced __x64_sys_pidfd_send_signal with sys_pidfd_send_signal in arch/x86/entry/syscalls/syscall_64.tbl.)

Mot-CRs-fixed: (CR)

Bug: 135608568
Test: test program using syscall(__NR_pidfd_send_signal,..) to send SIGKILL
Change-Id: I34da11c63ac8cafb0353d9af24c820cef519ec27
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-on: https://gerrit.mot.com/1505847
Tested-by: Jira Key
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Reviewed-by: Wang Wang <wangwang1@mt.com>
Reviewed-by: Yonghui Jia <jiayh2@motorola.com>
Submit-Approved: Jira Key
Reviewed-on: https://gerrit.mot.com/1796155
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
3 years agoinet: switch IP ID generator to siphash
Eric Dumazet [Wed, 27 Mar 2019 19:40:33 +0000 (12:40 -0700)]
inet: switch IP ID generator to siphash

[ Upstream commit df453700e8d81b1bdafdf684365ee2b9431fb702 ]

According to Amit Klein and Benny Pinkas, IP ID generation is too weak
and might be used by attackers.

Even with recent net_hash_mix() fix (netns: provide pure entropy for net_hash_mix())
having 64bit key and Jenkins hash is risky.

It is time to switch to siphash and its 128bit keys.

Mot-CRs-fixed: (CR)
CVE-Fixed: CVE-2019-18282
Bug: 148588557

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Amit Klein <aksecurity@gmail.com>
Reported-by: Benny Pinkas <benny@pinkas.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jignesh Patel <jignesh@motorola.com>
Change-Id: I9593781a735940aaedf8e6b38fef02b48169bd12
Reviewed-on: https://gerrit.mot.com/1572721
SME-Granted: SME Approvals Granted
SLTApproved: Slta Waiver
Tested-by: Jira Key
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key

3 years agoANDROID: fix binder change in merge of 4.9.188
Todd Kjos [Thu, 23 Jan 2020 05:14:53 +0000 (10:44 +0530)]
ANDROID: fix binder change in merge of 4.9.188

The 4.9.188 merge was missing the change to the
binder driver associated with the linux-4.9.y
commit 16903f1a5ba7 ("coredump: fix race condition
between mmget_not_zero()/get_task_mm() and core dumping").
It was left out because the android-4.9 binder
driver has been significantly refactored compared
to linux-4.9.y.

This patch applies the missing change from that
patch to the binder driver.

Mot-CRs-fixed: (CR)
CVE-Fixed: CVE-2019-11599
BUG: 131964235

Change-Id: I1402cf3c28f1336da9d942abeb322f71a9b8138b
Signed-off-by: Pachipulusu Bhanu Prakash <bhprakas@motorola.com>
Reviewed-on: https://gerrit.mot.com/1473937
SME-Granted: SME Approvals Granted
SLTApproved: Slta Waiver
Tested-by: Jira Key
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key

3 years agoRevert "(CR) arm64: dts: Keep VCCQ power when S2R mode for Sandisk UFS"
Dingwei Luo [Tue, 17 Dec 2019 02:55:38 +0000 (20:55 -0600)]
Revert "(CR) arm64: dts: Keep VCCQ power when S2R mode for Sandisk UFS"

This change affects the APM current: the current increases by about 6mA.

This reverts commit 7eb50b69c6197393012ccc02b97c8ad54c8f6ba3.

Change-Id: I27586d466caa39a9632136b87734fc9447090092
Reviewed-on: https://gerrit.mot.com/1473008
Tested-by: Jira Key
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key

3 years agoarm64: dts: Keep VCCQ power when S2R mode for Sandisk UFS
luodw1 [Thu, 12 Dec 2019 05:51:06 +0000 (13:51 +0800)]
arm64: dts: Keep VCCQ power when S2R mode for Sandisk UFS

Change-Id: Ib55c4b86a0608ec3e436dd2b8ae36cb1fd44287e
Reviewed-on: https://gerrit.mot.com/1470812
SME-Granted: SME Approvals Granted
SLTApproved: Slta Waiver
Tested-by: Jira Key
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key

3 years agousb:Restore linked_func list in none-secure mode
a17671 [Mon, 18 Nov 2019 07:56:16 +0000 (15:56 +0800)]
usb:Restore linked_func list in none-secure mode

In non-secure mode there is still a low chance of hitting the
gadget NULL case. Restore the bound functions list back to the
linked_func list so that the next unbind of the functions can be
handled correctly without memory corruption.
This is a Samsung platform only issue.

Change-Id: Ie46fc52d3eaa6ef60c1a4f6bb83a56229Montana854d
Signed-off-by: a17671 <a17671@motorola.com>
Reviewed-on: https://gerrit.mot.com/1456923
SME-Granted: SME Approvals Granted
SLTApproved: Slta Waiver
Tested-by: Jira Key
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key

3 years agoRevert "(CR) psi:kernel:enable PSI configuration"
wangwang [Thu, 14 Nov 2019 09:05:25 +0000 (17:05 +0800)]
Revert "(CR) psi:kernel:enable PSI configuration"

This reverts commit 9ea0893bec0beb7328429c222097427d33981714.

Conflicts:
arch/arm64/configs/ext_config/moto-erd9610.config

Change-Id: I8e2c88a7b2c932fa877416564d5cbf294afe0d5a
Reviewed-on: https://gerrit.mot.com/1455315
SME-Granted: SME Approvals Granted
SLTApproved: Slta Waiver
Tested-by: Jira Key
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key

3 years agoarm64/defconfig: define CONFIG_LOCALVERSION to user
zhaoxp3 [Wed, 13 Nov 2019 09:49:56 +0000 (17:49 +0800)]
arm64/defconfig: define CONFIG_LOCALVERSION to user

add user defconfig
Change-Id: Ib8113f551270eee70178e9638dabeb8083e5b675
Signed-off-by: zhaoxp3 <zhaoxp3@motorola.com>
Reviewed-on: https://gerrit.mot.com/1453937
SME-Granted: SME Approvals Granted
SLTApproved: Slta Waiver
Tested-by: Jira Key
Submit-Approved: Jira Key

3 years agoEnable process reclaim
huangzq2 [Tue, 12 Nov 2019 07:28:15 +0000 (15:28 +0800)]
Enable process reclaim

Change-Id: Icda8271812c13fa2e4677e42ec38f8a52dd50721
Signed-off-by: huangzq2 <huangzq2@motorola.com>
Reviewed-on: https://gerrit.mot.com/1453732
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Tested-by: Jira Key
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key

3 years agomm: mm_event: remove get/put_online_cpus call
Martin Liu [Mon, 6 May 2019 16:57:20 +0000 (00:57 +0800)]
mm: mm_event: remove get/put_online_cpus call

Remove the get/put_online_cpus() calls since they can cause a
deadlock in the CPU hotplug path. Dropping them might introduce a
race, but it should be rare and we should be able to correct it
with the next dump.

=======================================================
    Task name: Binder:897_2 pid: 3255 cpu: 0 start: 0xffffffd39e2f5700
    state: 0x2 exit_state: 0x0 stack base: 0xffffff80241f0000 Prio: 116
    Stack:
    [<ffffff9048f3163c>] __switch_to.cfi+0x138
    [<ffffff904a947808>] __schedule+0xb7c
    [<ffffff904a94dbdc>] rwsem_down_read_failed.cfi+0x270
    [<ffffff904900a0a4>] __percpu_down_read.cfi+0x164
    [<ffffff90491bc4e8>] record_stat+0x6c0
    [<ffffff90491bbdfc>] mm_event_end.cfi+0x14c
    [<ffffff904916c280>] try_to_free_pages.cfi+0xaf4
    [<ffffff904914a598>] __alloc_pages_nodemask.cfi+0x9c8
    [<ffffff9049aa6350>] zcomp_cpu_up_prepare.cfi+0x88
    [<ffffff9048f66da8>] cpuhp_invoke_callback+0x378
    [<ffffff9048f664e0>] _cpu_up+0x1bc
    [<ffffff9048f6aadc>] enable_nonboot_cpus.cfi+0x208
    [<ffffff90490135f4>] suspend_devices_and_enter.cfi+0xc20
    [<ffffff9049012844>] pm_suspend.cfi+0xb30
    [<ffffff9049010568>] state_store.cfi+0x94
    [<ffffff904a9339dc>] kobj_attr_store.cfi+0x34
    [<ffffff90492ed9ec>] sysfs_kf_write.cfi+0x64
    [<ffffff90492eb51c>] kernfs_fop_write.cfi+0x1a4
    [<ffffff90491fbf34>] __vfs_write.cfi+0x50
    [<ffffff90491fbd58>] vfs_write.cfi+0xcc
    [<ffffff90491fefd4>] SyS_write.cfi+0xa4
    [<ffffff9048e84080>] el0_svc_naked+0x34

Test: manual suspend/resume test

Mot-CRs-fixed: (CR)

Bug: 132011965
Change-Id: I112ca0d25e825bb4e0e8979d9b4f1d8e6090147f
Signed-off-by: Martin Liu <liumartin@google.com>
(cherry picked from commit 5f419093ab253702847a6b3a8417e47c2acfb652)
Reviewed-on: https://gerrit.mot.com/1453731
Tested-by: Jira Key
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key

3 years agomm: mm_event: fix compact_scan
Minchan Kim [Mon, 28 Jan 2019 11:00:33 +0000 (20:00 +0900)]
mm: mm_event: fix compact_scan

It fixes double counting of COMPACTFREE_SCANNED.

Mot-CRs-fixed: (CR)

Bug: 80168800
Change-Id: I38ef432ecf44ba94988f5a4ec9c69bcb5d20fdce
Signed-off-by: Minchan Kim <minchan@google.com>
Reviewed-on: https://gerrit.mot.com/1453730
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Tested-by: Jira Key
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key

3 years agomm: synchronize period update interval
Minchan Kim [Thu, 17 Jan 2019 02:14:14 +0000 (11:14 +0900)]
mm: synchronize period update interval

Wei pointed out that the period update is racy, so a reader could see a
partial update and potentially lose a lot of trace data.

To close the period_ms race between updating and reading, use an rwlock
to reduce contention.
To close the vmstat_period_ms race between updating and reading,
use vmstat_lock.

This patch also includes a small refactoring.

Mot-CRs-fixed: (CR)

Bug: 80168800
Change-Id: I7f84cff758b533b7881f47889c7662b743bc3c12
Signed-off-by: Minchan Kim <minchan@google.com>
Reviewed-on: https://gerrit.mot.com/1453729
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Tested-by: Jira Key
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key

3 years agomm: mm_event supports vmstat
Minchan Kim [Tue, 15 Jan 2019 04:54:07 +0000 (13:54 +0900)]
mm: mm_event supports vmstat

Vmstat is very important for investigating MM problems.
We have solved many problems by asking users to collect vmstat
data periodically from the device, but that manual approach is
painful once the device has been released or when the scenario is
hard to reproduce.

This patch adds a periodic vmstat dump to mm_event. It works
only if there are some events in compaction or reclaim. Thus,
unless there is memory pressure, it doesn't gather any vmstat
data. The default interval between dumps is 1000ms.
The admin can tweak it via

echo 2000 > /sys/kernel/debug/mm_event/vmstat_period_ms

Mot-CRs-fixed: (CR)

Bug: 80168800
Change-Id: I4c0e7237d7764c4ea79da00952e5de34ccbe4187
Signed-off-by: Minchan Kim <minchan@google.com>
Reviewed-on: https://gerrit.mot.com/1453728
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Tested-by: Jira Key
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key

3 years agomm: per-process reclaim
Minchan Kim [Fri, 9 Jan 2015 13:06:55 +0000 (18:36 +0530)]
mm: per-process reclaim

These days there are many platforms available in the embedded market,
and they are smarter than the kernel, which has very limited information
about the working set, so they want to be involved in memory management
more heavily, as with Android's low memory killer and ashmem or the many
recent low-memory notifiers.

One simple scenario that shows the value of userspace's intelligence is
that the platform can manage tasks as foreground and background, so it
would be better to reclaim the background tasks' pages for the sake of
end-user *responsiveness*, even if those pages are frequently referenced.

This patch adds a new knob, "reclaim", under /proc/<pid>/ so that a task
manager can reclaim any target process anytime, anywhere. It gives the
platform another method for using memory efficiently.

It can avoid killing processes to get free memory, which was a really
terrible experience: I lost my best-ever game score after I switched
to a phone call while I was enjoying the game.

Reclaim file-backed pages only.
echo file > /proc/PID/reclaim
Reclaim anonymous pages only.
echo anon > /proc/PID/reclaim
Reclaim all pages
echo all > /proc/PID/reclaim
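
A minimal sketch of how a userspace task manager might drive this knob from
C instead of the shell. It assumes a kernel with this patch applied and a
caller allowed to write to the file. The helper names (reclaim_path,
reclaim_mode_valid, reclaim_write) are hypothetical; only the path layout and
the three keywords come from the description above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Build "/proc/<pid>/reclaim" into buf. Returns 0, or -1 if buf is too small. */
int reclaim_path(char *buf, size_t len, pid_t pid)
{
	int n;

	assert(buf != NULL);
	n = snprintf(buf, len, "/proc/%ld/reclaim", (long)pid);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Accept only the three keywords named in the commit message. */
int reclaim_mode_valid(const char *mode)
{
	return mode != NULL &&
	       (strcmp(mode, "file") == 0 ||
		strcmp(mode, "anon") == 0 ||
		strcmp(mode, "all") == 0);
}

/* Write the mode keyword to the knob. Returns 0 on success, -1 on error. */
int reclaim_write(const char *path, const char *mode)
{
	FILE *f;
	int ok;

	if (path == NULL || !reclaim_mode_valid(mode))
		return -1;
	f = fopen(path, "w");
	if (f == NULL)
		return -1;
	ok = fputs(mode, f) >= 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

A caller would first build the path with reclaim_path(buf, sizeof(buf), pid)
and then call reclaim_write(buf, "anon"), treating a -1 return as "knob not
available or not permitted".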

Mot-CRs-fixed: (CR)

Bug: 122047783
Change-Id: I2f629f7a43289af114df27044b1d2af4a6e785bc
Signed-off-by: Tim Murray <timmurray@google.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-on: https://gerrit.mot.com/1453727
Tested-by: Jira Key
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key

3 years agomm: emit tracepoint when rss watermark is hit
Joel Fernandes [Sat, 5 May 2018 21:58:08 +0000 (14:58 -0700)]
mm: emit tracepoint when rss watermark is hit

Useful to track how rss is changing per tgid. Required for the
memory visibility work being done for Android.

Original patch by Tim Murray:
https://partner-android-review.googlesource.com/c/kernel/private/msm-google/+/1081280

Changes from original patch:
- don't bloat mm_struct
- add some noise reduction to rss tracking

Mot-CRs-fixed: (CR)

Change-Id: Ief904334235ff4380244e5803d7853579e70d202
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-on: https://gerrit.mot.com/1453726
SME-Granted: SME Approvals Granted
Tested-by: Jira Key
SLTApproved: Slta Waiver
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key

3 years agomm: mm_event: add read io stat
Minchan Kim [Sun, 12 Aug 2018 23:15:32 +0000 (08:15 +0900)]
mm: mm_event: add read io stat

Read IO latency, as well as filemap faults, can affect system
performance, so this patch keeps track of it.

Mot-CRs-fixed: (CR)

Bug: 80168800
Change-Id: I761b7110339cf1e5ef24530ad32aedd784d00d07
Signed-off-by: Minchan Kim <minchan@google.com>
Signed-off-by: Cho KyongHo <pullip.cho@samsung.com>
Reviewed-on: https://gerrit.mot.com/1453725
Tested-by: Jira Key
SME-Granted: SME Approvals Granted
SLTApproved: Slta Waiver
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key

3 years agomm: mm_event: add special kernel allocation stat
Minchan Kim [Mon, 6 Aug 2018 06:12:44 +0000 (15:12 +0900)]
mm: mm_event: add special kernel allocation stat

Record the count of special page allocations in process context.

This patch aims to account for special page allocations, which
the Android system consumes in large amounts.
At the moment, the ION system heap is a good candidate (it could cover
other kernel allocations in the future).
With that, we can keep track of the owner of bursty kernel allocations,
which helps find the places that cause lmk, reclaim, and
compaction latency.

Mot-CRs-fixed: (CR)

Bug: 80168800
Change-Id: I5942fd940d98baa2eb814f66b076cb37ecd3b4aa
Signed-off-by: Minchan Kim <minchan@google.com>
Reviewed-on: https://gerrit.mot.com/1453724
Tested-by: Jira Key
SME-Granted: SME Approvals Granted
SLTApproved: Slta Waiver
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key

3 years agomm: mm_event: add swapin stat
Minchan Kim [Mon, 6 Aug 2018 06:07:57 +0000 (15:07 +0900)]
mm: mm_event: add swapin stat

Many embedded devices use zram as swap. Compared to storage swap
(e.g. UFS), swapin from zram (i.e., decompression) is extremely fast,
so it might count as a minor fault rather than a major one. This patch
therefore provides swapin latency tracking to distinguish such swapins
from major faults served by storage.

Mot-CRs-fixed: (CR)

Bug: 80168800
Change-Id: I1c32430e32a051916ede5219bd5f40a9002652bc
Signed-off-by: Minchan Kim <minchan@google.com>
Reviewed-on: https://gerrit.mot.com/1453723
Tested-by: Jira Key
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key

3 years agomm: mm_event: add compaction stat
Minchan Kim [Wed, 27 Jun 2018 13:04:37 +0000 (22:04 +0900)]
mm: mm_event: add compaction stat

This patch adds a compaction mm_event stat so that we can keep track of
the latency of compaction as well as the count of the event.

Under heavy memory fragmentation, high-order page allocation (e.g.
fork, ION memory allocation) triggers compaction, which is
another major source of latency. Let's track it down, too.

Mot-CRs-fixed: (CR)

Bug: 80168800
Change-Id: Ia3da9324f123ba2542863eafaf72024b5351785b
Signed-off-by: Minchan Kim <minchan@google.com>
Reviewed-on: https://gerrit.mot.com/1453722
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Tested-by: Jira Key
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key

3 years agomm: mm_event: add reclaim stat
Minchan Kim [Wed, 27 Jun 2018 13:04:07 +0000 (22:04 +0900)]
mm: mm_event: add reclaim stat

This patch adds a page reclaim mm_event stat so that we can
keep track of the [avg|max]_latency of handling the event
as well as the count of the event.

Direct reclaim is usually the most common source of latency
caused by memory pressure, so we need to track it to hunt
down applications' jank problems.

Mot-CRs-fixed: (CR)

Bug: 80168800
Change-Id: I215c3972f76389404da7c4806a776bf753daac01
Signed-off-by: Minchan Kim <minchan@google.com>
Reviewed-on: https://gerrit.mot.com/1453721
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Tested-by: Jira Key
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key

3 years agomm: mm_event: add page fault stat
Minchan Kim [Wed, 27 Jun 2018 12:54:47 +0000 (21:54 +0900)]
mm: mm_event: add page fault stat

This patch adds major and minor fault mm_event stats so that we can
keep track of the [avg|max]_latency of handling the event
as well as the count of the event.

With major faults, we can see how long the IO is delayed. It's very
tightly coupled with the application's latency.

With major+minor faults, we can see how many pages are allocated
for the process in the period. That helps to spot memory spikes.

Mot-CRs-fixed: (CR)

Bug: 80168800
Change-Id: I8a4434493e3ec291227961939a24c3d57a18fd5b
Signed-off-by: Minchan Kim <minchan@google.com>
Signed-off-by: Cho KyongHo <pullip.cho@samsung.com>
Reviewed-on: https://gerrit.mot.com/1453720
SME-Granted: SME Approvals Granted
SLTApproved: Slta Waiver
Tested-by: Jira Key
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key

3 years agomm: mm_event: make capture period configurable
Minchan Kim [Mon, 6 Aug 2018 06:02:06 +0000 (15:02 +0900)]
mm: mm_event: make capture period configurable

This patch makes the per-process mm event capture interval configurable.
The default is 500ms, but the admin can change it via the knob below.

/sys/kernel/debug/mm_event/period_ms

The unit is millisecond.

Mot-CRs-fixed: (CR)

Bug: 80168800
Change-Id: I3b2de3dd5c4a519a2e5e20f1ef0d5f9a4c7afc8a
Signed-off-by: Minchan Kim <minchan@google.com>
Reviewed-on: https://gerrit.mot.com/1453719
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Tested-by: Jira Key
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key

3 years agomm: introduce per-process mm event tracking feature
Minchan Kim [Mon, 6 Aug 2018 06:00:19 +0000 (15:00 +0900)]
mm: introduce per-process mm event tracking feature

Linux supports /proc/meminfo and /proc/vmstat stats as memory health metrics.
Android uses them too. If users see something go wrong (e.g., sluggishness, jank)
on their system, they can capture and report the system state to developers
for debugging.

That shows the memory stats at the moment the bug report is captured. However,
it's not enough to investigate an application's jank problem caused by memory
shortage, because:

1. It just shows event counts, which don't quantify the latency of the
application well. Jank can happen for various reasons, and one simple
scenario is dropping frames for a second. An app should draw a frame every
16ms. Raw stat counts (e.g., allocstall or pgmajfault) can't
represent how much time the app spends handling the event.

2. At bugreport time, a dump of vmstat and meminfo is never helpful because
it's too late to capture the moment when the problem happens.
By the time the user notices the problem and tries to capture the system
state, the problem has already gone.

3. Even if we could capture MM stats at the moment the bug happens, it
wouldn't help much because MM stats usually fluctuate a lot, so we need
historical data rather than a one-time snapshot to see the MM trend.

To solve the above problems, this patch introduces a per-process, lightweight
mm event stat. Basically, it tracks the minor/major fault, reclaim and compaction
latency of each process as well as the event count, and records the data into a
global buffer.
To limit memory overhead, it doesn't record every MM event of the process
to the buffer but just drains the accumulated stats to the buffer every 0.5sec.
If there isn't any event, it skips the recording.
For latency data, it keeps the average/max latency of each event in that period.

With that, we can keep useful information in a small buffer so that
we no longer miss precious information even if the capture time
is rather late. This patch introduces the basic facility of the MM event stat.

After all patches in this patchset are applied, the output format is as follows;
dumpstate can use it for VM debugging in the future.

<...>-1665  [001] d...   217.575173: mm_event_record: min_flt count=203 avg_lat=3 max_lat=58
<...>-1665  [001] d...   217.575183: mm_event_record: maj_flt count=1 avg_lat=1994 max_lat=1994
<...>-1665  [001] d...   217.575184: mm_event_record: kern_alloc count=227 avg_lat=0 max_lat=0
<...>-626   [000] d...   217.578096: mm_event_record: kern_alloc count=4 avg_lat=0 max_lat=0
<...>-6547  [000] ....   217.581913: mm_event_record: min_flt count=7 avg_lat=7 max_lat=20
<...>-6547  [000] ....   217.581955: mm_event_record: kern_alloc count=4 avg_lat=0 max_lat=0

This feature uses event trace as its output buffer so that we can use all the
general benefits of event trace (e.g., buffer size management, filtering and
so on). To keep unrelated trace events from overflowing the ring buffer,
it is highly recommended to create a separate tracing instance
under /sys/kernel/debug/tracing/instances/
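
Since the records come out as ordinary trace text, a tool such as dumpstate
could pick the fields apart with a few lines of C. The sketch below assumes
the line layout shown in the sample output earlier in this message
(an "mm_event_record: " tag followed by the event name and
count/avg_lat/max_lat pairs). The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fields of one mm_event_record trace line. */
struct mm_event_line {
	char name[32];
	unsigned long count;
	unsigned long avg_lat;
	unsigned long max_lat;
};

/* Parse one trace line. Returns 0 on success, -1 if it is not a record. */
int parse_mm_event(const char *line, struct mm_event_line *out)
{
	static const char tag[] = "mm_event_record: ";
	const char *p;

	assert(line != NULL && out != NULL);
	p = strstr(line, tag);
	if (p == NULL)
		return -1;
	p += sizeof(tag) - 1; /* skip past the tag itself */
	if (sscanf(p, "%31s count=%lu avg_lat=%lu max_lat=%lu",
		   out->name, &out->count, &out->avg_lat, &out->max_lat) != 4)
		return -1;
	return 0;
}
```

Lines from other trace events simply return -1, so the same loop can be run
over a shared trace buffer as well as over a dedicated instance.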

I had a concern about adding overhead. Actually, major fault, compaction and
reclaim are already costly, so they should not be a concern. Minor
fault and kern alloc are where overhead would hurt, so I ran a micro benchmark
to measure minor page fault overhead.

The test scenario creates 40 threads, and each of them takes minor
page faults over a 25M range (ranges do not overlap).
I didn't see any noticeable regression.

Base:
fault/wsec avg: 758489.8288

minor faults=13123118, major faults=0 ctx switch=139234
    User   System     Wall        fault/wsec
  39.55s   41.73s   17.49s        749995.768
minor faults=13123135, major faults=0 ctx switch=139627
    User   System     Wall        fault/wsec
  34.59s   41.61s   16.95s        773906.976
minor faults=13123061, major faults=0 ctx switch=139254
    User   System     Wall        fault/wsec
  39.03s   41.55s   16.97s        772966.334
minor faults=13123131, major faults=0 ctx switch=139970
    User   System     Wall        fault/wsec
  36.71s   42.12s   17.04s        769941.019
minor faults=13123027, major faults=0 ctx switch=138524
    User   System     Wall        fault/wsec
  42.08s   42.24s   18.08s        725639.047

Base + MM event + event trace enable:
fault/wsec avg: 759626.1488

minor faults=13123488, major faults=0 ctx switch=140303
    User   System     Wall        fault/wsec
  37.66s   42.21s   17.48s        750414.257
minor faults=13123066, major faults=0 ctx switch=138119
    User   System     Wall        fault/wsec
  36.77s   42.14s   17.49s        750010.107
minor faults=13123505, major faults=0 ctx switch=140021
    User   System     Wall        fault/wsec
  38.51s   42.50s   17.54s        748022.219
minor faults=13123431, major faults=0 ctx switch=138517
    User   System     Wall        fault/wsec
  36.74s   41.49s   17.03s        770255.610
minor faults=13122955, major faults=0 ctx switch=137174
    User   System     Wall        fault/wsec
  40.68s   40.97s   16.83s        779428.551
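
As a quick check of the two "fault/wsec avg" figures, the per-run values
above can be averaged and compared. This is only arithmetic on the numbers
already listed; the array and function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* fault/wsec of the five runs on the base kernel, copied from above. */
const double base_runs[5] = {
	749995.768, 773906.976, 772966.334, 769941.019, 725639.047,
};

/* fault/wsec of the five runs with MM event and event trace enabled. */
const double mm_event_runs[5] = {
	750414.257, 750010.107, 748022.219, 770255.610, 779428.551,
};

/* Arithmetic mean of n samples. */
double mean(const double *v, size_t n)
{
	double sum = 0.0;
	size_t i;

	assert(v != NULL && n > 0);
	for (i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Change of b relative to a, in percent. */
double rel_change_pct(double a, double b)
{
	assert(a != 0.0);
	return (b - a) / a * 100.0;
}
```

mean(base_runs, 5) and mean(mm_event_runs, 5) reproduce the reported
758489.8288 and 759626.1488, and the relative change between them is about
0.15%, far smaller than the spread between individual runs of either
configuration.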

Mot-CRs-fixed: (CR)

Bug: 80168800
Change-Id: I4e69c994f47402766481c58ab5ec2071180964b8
Signed-off-by: Minchan Kim <minchan@google.com>
Signed-off-by: Cho KyongHo <pullip.cho@samsung.com>
Reviewed-on: https://gerrit.mot.com/1453718
SME-Granted: SME Approvals Granted
SLTApproved: Slta Waiver
Tested-by: Jira Key
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key

3 years agopsi:kernel:enable PSI configuration
wangwang [Wed, 13 Nov 2019 06:26:15 +0000 (14:26 +0800)]
psi:kernel:enable PSI configuration

support Google PSI memory management in lmkd

Change-Id: I437daa54c55c4caa9d8a67ba5bf7ac529d61da87
Reviewed-on: https://gerrit.mot.com/1453674
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Tested-by: Jira Key
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key

3 years agopsi:kernel:oom reaper porting into samsung platform
wangwang [Wed, 13 Nov 2019 06:04:33 +0000 (14:04 +0800)]
psi:kernel:oom reaper porting into samsung platform

reaper can help to reclaim the memory in time, the knob will be set to true
when init parses the init.rc conf file.

Change-Id: I59f1173c0e46202904da6eeacb2fecc32c53232c

3 years agoBACKPORT: kernel: cgroup: add poll file operation
Johannes Weiner [Wed, 30 Jan 2019 23:41:54 +0000 (10:41 +1100)]
BACKPORT: kernel: cgroup: add poll file operation

Cgroup has a standardized poll/notification mechanism for waking all
pollers on all fds when a filesystem node changes.  To allow polling for
custom events, add a .poll callback that can override the default.

This is in preparation for pollable cgroup pressure files which have
per-fd trigger configurations.
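
The "optional callback that overrides the default" pattern can be modelled in
userspace C as below. This is a simplified illustration, not the kernel code:
the demo_* types and functions are hypothetical, and the default mask used
here is only a stand-in for whatever the generic notification path reports.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>

struct demo_file;

struct demo_ops {
	/* Optional. When NULL, the generic notification path is used. */
	unsigned int (*poll)(struct demo_file *f);
};

struct demo_file {
	const struct demo_ops *ops;
	int node_changed; /* generic "the node changed" flag */
	int custom_event; /* state only a custom handler understands */
};

/* Generic behaviour: report an event whenever the node changed. */
static unsigned int demo_generic_poll(struct demo_file *f)
{
	return f->node_changed ? (POLLERR | POLLPRI) : 0;
}

/* Dispatch: prefer the per-file callback, else fall back to the default. */
unsigned int demo_poll(struct demo_file *f)
{
	assert(f != NULL);
	if (f->ops != NULL && f->ops->poll != NULL)
		return f->ops->poll(f);
	return demo_generic_poll(f);
}

/* Example custom handler, e.g. for a per-fd trigger. */
unsigned int demo_custom_poll(struct demo_file *f)
{
	return f->custom_event ? POLLPRI : 0;
}
```

Files that do not set the callback keep their existing behaviour, which is
why adding the hook does not disturb current users.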

Link: http://lkml.kernel.org/r/20190124211518.244221-3-surenb@google.com
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
(cherry picked from commit: dc50537bdd1a0804fa2cbc990565ee9a944e66fa)

Conflicts:
        include/linux/cgroup-defs.h
        kernel/cgroup.c

(1. replaced __poll_t with unsigned int)

Bug: 127712811
Test: lmkd in PSI mode
Change-Id: I21aff1d9d31e3d4b45e257aa4d299405a2ce6de3
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
3 years agoFROMLIST: psi: introduce psi monitor
Suren Baghdasaryan [Tue, 4 Dec 2018 01:36:42 +0000 (17:36 -0800)]
FROMLIST: psi: introduce psi monitor

Psi monitor aims to provide a low-latency short-term pressure
detection mechanism configurable by users. It allows users to
monitor psi metrics growth and trigger events whenever a metric
rises above a user-defined threshold within a user-defined time window.

Time window and threshold are both expressed in usecs. Multiple psi
resources with different thresholds and window sizes can be monitored
concurrently.

Psi monitors activate when system enters stall state for the monitored
psi metric and deactivate upon exit from the stall state. While system
is in the stall state psi signal growth is monitored at a rate of 10 times
per tracking window. Min window size is 500ms, therefore the min monitoring
interval is 50ms. Max window size is 10s with monitoring interval of 1s.

When activated psi monitor stays active for at least the duration of one
tracking window to avoid repeated activations/deactivations when psi
signal is bouncing.

Notifications to the users are rate-limited to one per tracking window.
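
The window arithmetic described above can be written down as a small sketch.
The constants mirror the limits stated in this message (500ms minimum, 10s
maximum, 10 updates per window); the macro and function names are
hypothetical and the rate-limit check is a simplified model of "one
notification per tracking window".

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_WINDOW_MIN_US        500000ULL /* 500ms */
#define DEMO_WINDOW_MAX_US      10000000ULL /* 10s */
#define DEMO_UPDATES_PER_WINDOW       10ULL

/*
 * Monitoring interval for a tracking window, in usecs.
 * Returns 0 when the window is outside the accepted range.
 */
uint64_t demo_monitor_interval_us(uint64_t window_us)
{
	if (window_us < DEMO_WINDOW_MIN_US || window_us > DEMO_WINDOW_MAX_US)
		return 0;
	return window_us / DEMO_UPDATES_PER_WINDOW;
}

/* Allow a new notification only once a full window has passed. */
int demo_may_notify(uint64_t now_us, uint64_t last_event_us,
		    uint64_t window_us)
{
	assert(now_us >= last_event_us);
	return now_us - last_event_us >= window_us;
}
```

With these limits the shortest window gives a 50ms interval and the longest
gives 1s, matching the figures quoted above.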

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
(not upstream yet, latest version published at: https://lore.kernel.org/patchwork/patch/1052418/)

Conflicts:
        include/linux/psi.h
        kernel/cgroup/cgroup.c
        kernel/sched/psi.c

(1. replaced __poll_t with unsigned int
2. replaced EPOLLERR/EPOLLPRI with POLLERR/POLLPRI (values are the same)
3. include <linux/cgroup-defs.h> in include/linux/psi.h
4. include <uapi/linux/sched/types.h> in kernel/sched/psi.c)

Bug: 127712811
Bug: 129157727
Test: lmkd in PSI mode
Change-Id: I1688f047e98e1f109627dad72a33d2f70e575268
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
3 years agoFROMLIST: refactor header includes to allow kthread.h inclusion in psi_types.h
Suren Baghdasaryan [Sun, 17 Feb 2019 23:07:38 +0000 (15:07 -0800)]
FROMLIST: refactor header includes to allow kthread.h inclusion in psi_types.h

kthread.h can't be included in psi_types.h because it creates a circular
inclusion with kthread.h eventually including psi_types.h and complaining
on kthread structures not being defined because they are defined further
in the kthread.h. Resolve this by removing psi_types.h inclusion from the
headers included from kthread.h.

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
(not upstream yet, latest version published at: https://lore.kernel.org/patchwork/patch/1052417/)

Conflicts:
        include/linux/kthread.h
        kernel/kthread.c

(1. <linux/cgroup.h> include is already missing in kthread.h
2. <linux/cgroup.h> is already included in kthread.c)

Bug: 127712811
Bug: 129157727
Test: lmkd in PSI mode
Change-Id: I07c1f4fddf0c43b3095f505e062d9d179d041544
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
3 years agoFROMLIST: psi: track changed states
Suren Baghdasaryan [Wed, 6 Mar 2019 18:25:50 +0000 (10:25 -0800)]
FROMLIST: psi: track changed states

Introduce changed_states parameter into collect_percpu_times to track
the states changed since the last update.

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
(not upstream yet, latest version published at: https://lore.kernel.org/patchwork/patch/1052420/)

Bug: 127712811
Bug: 129157727
Test: lmkd in PSI mode
Change-Id: Idb2f7d73013bff16bb101b62a2609917a5353bf9
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
3 years agoFROMLIST: psi: split update_stats into parts
Suren Baghdasaryan [Wed, 6 Mar 2019 17:52:23 +0000 (09:52 -0800)]
FROMLIST: psi: split update_stats into parts

Split update_stats into collect_percpu_times and update_averages for
collect_percpu_times to be reused later inside psi monitor.

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
(not upstream yet, latest version published at: https://lore.kernel.org/patchwork/patch/1052419/)

Bug: 127712811
Bug: 129157727
Test: lmkd in PSI mode
Change-Id: Ic5dca1924a3f8997b49b5d16289f53bcc43b88fa
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
3 years agoFROMLIST: psi: rename psi fields in preparation for psi trigger addition
Suren Baghdasaryan [Wed, 6 Mar 2019 17:21:03 +0000 (09:21 -0800)]
FROMLIST: psi: rename psi fields in preparation for psi trigger addition

Rename the psi_group structure member fields used for calculating psi totals
and averages to clearly distinguish them from the trigger-related fields
that will be added next.

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
(not upstream yet, latest version published at: https://lore.kernel.org/patchwork/patch/1052416/)

Bug: 127712811
Bug: 129157727
Test: lmkd in PSI mode
Change-Id: I7aaadfc558950b54b02a051d63e508e8fe233b49
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
3 years agoFROMLIST: psi: make psi_enable static
Suren Baghdasaryan [Wed, 30 Jan 2019 23:41:54 +0000 (10:41 +1100)]
FROMLIST: psi: make psi_enable static

psi_enable is not used outside of psi.c, make it static.

Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
(not upstream yet, latest version published at: https://lore.kernel.org/patchwork/patch/1052415/)

Bug: 127712811
Bug: 129157727
Test: lmkd in PSI mode
Change-Id: I3c422d6c0c4299095c6ba05cfe942a2b00705f29
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
3 years agoFROMLIST: psi: introduce state_mask to represent stalled psi states
Suren Baghdasaryan [Wed, 30 Jan 2019 23:41:54 +0000 (10:41 +1100)]
FROMLIST: psi: introduce state_mask to represent stalled psi states

The psi monitoring patches will need to determine the same states as
record_times().  To avoid calculating them twice, maintain a state mask
that can be consulted cheaply.  Do this in a separate patch to keep the
churn in the main feature patch at a minimum.

This adds 4-byte state_mask member into psi_group_cpu struct which results
in its first cacheline-aligned part becoming 52 bytes long.  Add explicit
values to enumeration element counters that affect psi_group_cpu struct
size.

Link: http://lkml.kernel.org/r/20190124211518.244221-4-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
(not upstream yet, latest version published at: https://lore.kernel.org/patchwork/patch/1052414/)

Bug: 127712811
Bug: 129157727
Test: lmkd in PSI mode
Change-Id: I7807b687e2a5d78aed44c5e33be1621aa11451cb
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
3 years agoBACKPORT: fs: kernfs: add poll file operation
Johannes Weiner [Wed, 30 Jan 2019 23:41:54 +0000 (10:41 +1100)]
BACKPORT: fs: kernfs: add poll file operation

Patch series "psi: pressure stall monitors", v3.

Android is adopting psi to detect and remedy memory pressure that results
in stuttering and decreased responsiveness on mobile devices.

Psi gives us the stall information, but because we're dealing with
latencies in the millisecond range, periodically reading the pressure
files to detect stalls in a timely fashion is not feasible.  Psi also
doesn't aggregate its averages at a high enough frequency right now.

This patch series extends the psi interface such that users can configure
sensitive latency thresholds and use poll() and friends to be notified
when these are breached.

As high-frequency aggregation is costly, it implements an aggregation
method that is optimized for fast, short-interval averaging, and makes the
aggregation frequency adaptive, such that high-frequency updates only
happen while monitored stall events are actively occurring.

With these patches applied, Android can monitor for, and ward off,
mounting memory shortages before they cause problems for the user.  For
example, using memory stall monitors in userspace low memory killer daemon
(lmkd) we can detect mounting pressure and kill less important processes
before device becomes visibly sluggish.  In our memory stress testing psi
memory monitors produce roughly 10x less false positives compared to
vmpressure signals.  Having ability to specify multiple triggers for the
same psi metric allows other parts of Android framework to monitor memory
state of the device and act accordingly.

The new interface is straightforward.  The user opens one of the pressure
files for writing and writes a trigger description into the file
descriptor that defines the stall state - some or full, and the maximum
stall time over a given window of time.  E.g.:

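        /* Signal when stall time exceeds 100ms of a 1s window */
        char trigger[] = "full 100000 1000000";
        struct pollfd fds;

        /* Needs <fcntl.h>, <poll.h>, <string.h> and <unistd.h> */
        fds.fd = open("/proc/pressure/memory", O_RDWR | O_NONBLOCK);
        fds.events = POLLPRI;
        write(fds.fd, trigger, strlen(trigger) + 1);
        while (poll(&fds, 1, -1) >= 0) {
                ...
        }
        close(fds.fd);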

When the monitored stall state is entered, psi adapts its aggregation
frequency according to what the configured time window requires in order
to emit event signals in a timely fashion.  Once the stalling subsides,
aggregation reverts back to normal.

The trigger is associated with the open file descriptor.  To stop
monitoring, the user only needs to close the file descriptor and the
trigger is discarded.

Patches 1-4 prepare the psi code for polling support.  Patch 5 implements
the adaptive polling logic, the pressure growth detection optimized for
short intervals, and hooks up write() and poll() on the pressure files.

The patches were developed in collaboration with Johannes Weiner.

This patch (of 5):

Kernfs has a standardized poll/notification mechanism for waking all
pollers on all fds when a filesystem node changes.  To allow polling for
custom events, add a .poll callback that can override the default.

This is in preparation for pollable cgroup pressure files which have
per-fd trigger configurations.

Link: http://lkml.kernel.org/r/20190124211518.244221-2-surenb@google.com
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
(cherry picked from commit: 147e1a97c4a0bdd43f55a582a9416bb9092563a9)

Conflicts:
        fs/kernfs/file.c
        include/linux/kernfs.h

1. replaced __poll_t with unsigned int.
2. replaced EPOLLERR/EPOLLPRI with POLLERR/POLLPRI (values are the same)

Bug: 127712811
Test: lmkd in PSI mode
Change-Id: Ic2bed334d05aec62f4e695f263893c3057921c55
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
3 years agoUPSTREAM: psi: avoid divide-by-zero crash inside virtual machines
Johannes Weiner [Thu, 21 Feb 2019 06:19:59 +0000 (22:19 -0800)]
UPSTREAM: psi: avoid divide-by-zero crash inside virtual machines

We've been seeing hard-to-trigger psi crashes when running inside VM
instances:

    divide error: 0000 [#1] SMP PTI
    Modules linked in: [...]
    CPU: 0 PID: 212 Comm: kworker/0:2 Not tainted 4.16.18-119_fbk9_3817_gfe944c98d695 #119
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
    Workqueue: events psi_clock
    RIP: 0010:psi_update_stats+0x270/0x490
    RSP: 0018:ffffc90001117e10 EFLAGS: 00010246
    RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff8800a35a13f8
    RDX: 0000000000000000 RSI: ffff8800a35a1340 RDI: 0000000000000000
    RBP: 0000000000000658 R08: ffff8800a35a1470 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
    R13: 0000000000000000 R14: 0000000000000000 R15: 00000000000f8502
    FS:  0000000000000000(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fbe370fa000 CR3: 00000000b1e3a000 CR4: 00000000000006f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
     psi_clock+0x12/0x50
     process_one_work+0x1e0/0x390
     worker_thread+0x2b/0x3c0
     ? rescuer_thread+0x330/0x330
     kthread+0x113/0x130
     ? kthread_create_worker_on_cpu+0x40/0x40
     ? SyS_exit_group+0x10/0x10
     ret_from_fork+0x35/0x40
    Code: 48 0f 47 c7 48 01 c2 45 85 e4 48 89 16 0f 85 e6 00 00 00 4c 8b 49 10 4c 8b 51 08 49 69 d9 f2 07 00 00 48 6b c0 64 4c 8b 29 31 d2 <48> f7 f7 49 69 d5 8d 06 00 00 48 89 c5 4c 69 f0 00 98 0b 00 48

The Code-line points to `period` being 0 inside update_stats(), and we
divide by that when calculating that period's pressure percentage.

The elapsed period should never be 0.  The reason this can happen is due
to an off-by-one in the idle time / missing period calculation combined
with a coarse sched_clock() in the virtual machine.

The target time for aggregation is advanced into the future on a fixed
grid to prevent clock drift.  So when an aggregation runs after some idle
period, we can not just set it to "now + psi_period", but have to
calculate the downtime and advance the target time relative to itself.

However, if the aggregator was disabled for exactly one psi_period (ns), we
drop one idle period in the calculation due to a > when we should do >=.
In that case, next_update will be advanced from 'now - psi_period' to
'now' when it should be moved to 'now + psi_period'.  The run finishes
with last_update == next_update == sched_clock().

With hardware clocks, this exact nanosecond match isn't likely in the
first place; but if it does happen, the clock will still have moved on and
the period non-zero by the time the worker runs.  A pointlessly short
period, but besides the extra work, no harm no foul.  However, a slow
sched_clock() like we have on VMs might not have advanced either by the
time the worker runs again.  And when we calculate the elapsed period, the
result, our pressure divisor, will be 0.  Ouch.

Fix this by correctly handling the situation when the elapsed time between
aggregation runs is precisely two periods, and advance the expiration
timestamp correctly to one period into the future.
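
The off-by-one can be reproduced with plain integer arithmetic. The sketch below is a userspace model of the deadline calculation described above, not the kernel function; advance_next_update() and its 'inclusive' switch are made up for the illustration.

```c
#include <stdint.h>

/* Model of advancing the aggregation deadline on a fixed grid.
 * 'inclusive' selects the fixed comparison (>=); passing 0 reproduces
 * the off-by-one (>). The caller guarantees now >= expires. */
static uint64_t advance_next_update(uint64_t expires, uint64_t now,
                                    uint64_t period, int inclusive)
{
    uint64_t late = now - expires;
    uint64_t missed = 0;

    if (inclusive ? late >= period : late > period)
        missed = late / period;

    /* Skip the missed periods and land one period past the last one. */
    return expires + (1 + missed) * period;
}
```

With expires = 1000, now = 3000 and period = 2000, the old comparison yields 3000, which equals now, so a clock that has not advanced by the next run produces an elapsed period of 0. The inclusive comparison yields 5000, one full period ahead.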

Link: http://lkml.kernel.org/r/20190214193157.15788-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Łukasz Siudut <lsiudut@fb.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 4e37504d1c49eec6434d0cc97278d2b51c9e8763)

Bug: 127712811
Test: lmkd in PSI mode
Change-Id: I40917c84354f9f32259c6703f00b6b1d21f45f02
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
3 years agoUPSTREAM: psi: clarify the Kconfig text for the default-disable option
Johannes Weiner [Fri, 1 Feb 2019 22:21:15 +0000 (14:21 -0800)]
UPSTREAM: psi: clarify the Kconfig text for the default-disable option

The current help text caused some confusion in online forums about
whether or not to default-enable or default-disable psi in vendor
kernels.  This is because it doesn't communicate the reason for why we
made this setting configurable in the first place: that the overhead is
non-zero in an artificial scheduler stress test.

Since this isn't representative of real workloads, and the effect was
not measurable in scheduler-heavy real world applications such as the
webservers and memcache installations at Facebook, it's fair to point
out that this is a pretty cautious option to select.

Link: http://lkml.kernel.org/r/20190129233617.16767-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 7b2489d37e1e355228f7c55724f77580e1dec22a)

Bug: 127712811
Test: lmkd in PSI mode
Change-Id: I5d0cb901562fd74c82d9d211544745b802776d8a
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
3 years agoUPSTREAM: psi: fix aggregation idle shut-off
Johannes Weiner [Fri, 1 Feb 2019 22:20:42 +0000 (14:20 -0800)]
UPSTREAM: psi: fix aggregation idle shut-off

psi has provisions to shut off the periodic aggregation worker when
there is a period of no task activity - and thus no data that needs
aggregating.  However, while developing psi monitoring, Suren noticed
that the aggregation clock currently won't stay shut off for good.

Debugging this revealed a flaw in the idle design: an aggregation run
will see no task activity and decide to go to sleep; shortly thereafter,
the kworker thread that executed the aggregation will go idle and cause
a scheduling change, during which the psi callback will kick the
!pending worker again.  This will ping-pong forever, and is equivalent
to having no shut-off logic at all (but with more code!)

Fix this by exempting aggregation workers from psi's clock waking logic
when the state change is them going to sleep.  To do this, tag workers
with the last work function they executed, and if in psi we see a worker
going to sleep after aggregating psi data, we will not reschedule the
aggregation work item.
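
A rough userspace model of that check is sketched below. The structure and function names (struct worker_task, run_work(), should_wake_clock()) are hypothetical stand-ins; only the idea of comparing the worker's last work function against the aggregation function comes from the description above.

```c
#include <stdbool.h>
#include <stddef.h>

typedef void (*work_func_t)(void);

struct worker_task {
    bool is_worker;         /* stands in for a "this is a kworker" flag */
    work_func_t last_func;  /* last work function this worker executed */
};

static void aggregate_work(void) { }
static void other_work(void) { }

/* Run a work item and tag the worker with the function it executed. */
static void run_work(struct worker_task *t, work_func_t fn)
{
    fn();
    t->last_func = fn;
}

/* Decide whether a task state change should kick the aggregation clock.
 * A worker going to sleep right after aggregating must not restart it. */
static bool should_wake_clock(const struct worker_task *t,
                              bool going_to_sleep)
{
    if (going_to_sleep && t->is_worker && t->last_func == aggregate_work)
        return false;
    return true;
}
```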

What if the worker is also executing other items before or after?

Any psi state times that were incurred by work items preceding the
aggregation work will have been collected from the per-cpu buckets
during the aggregation itself.  If there are work items following the
aggregation work, the worker's last_func tag will be overwritten and the
aggregator will be kept alive to process this genuine new activity.

If the aggregation work is the last thing the worker does, and we decide
to go idle, the brief period of non-idle time incurred between the
aggregation run and the kworker's dequeue will be stranded in the
per-cpu buckets until the clock is woken by later activity.  But that
should not be a problem.  The buckets can hold 4s worth of time, and
future activity will wake the clock with a 2s delay, giving us 2s worth
of data we can leave behind when disabling aggregation.  If it takes a
worker more than two seconds to go idle after it finishes its last work
item, we likely have bigger problems in the system, and won't notice one
sample that was averaged with a bogus per-CPU weight.

Link: http://lkml.kernel.org/r/20190116193501.1910-1-hannes@cmpxchg.org
Fixes: eb414681d5a0 ("psi: pressure stall information for CPU, memory, and IO")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 1b69ac6b40ebd85eed73e4dbccde2a36961ab990)

Bug: 127712811
Test: lmkd in PSI mode
Change-Id: I2877fec3d381b1006b8bd1261895fdfd68bd21db
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
3 years agoUPSTREAM: psi: fix reference to kernel commandline enable
Baruch Siach [Fri, 14 Dec 2018 22:17:03 +0000 (14:17 -0800)]
UPSTREAM: psi: fix reference to kernel commandline enable

The kernel commandline parameter named in CONFIG_PSI_DEFAULT_DISABLED
help text contradicts the documentation in kernel-parameters.txt, and
the code.  Fix that.

Link: http://lkml.kernel.org/r/20181203213416.GA12627@cmpxchg.org
Fixes: e0c274472d ("psi: make disabling/enabling easier for vendor kernels")
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 428a1cb4baeb9e5c7feda93af7372ba6d2491558)

Bug: 127712811
Test: lmkd in PSI mode
Change-Id: I592b66d6542f4fa7c2b6eb9f60a5dd43bcfbabf3
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
3 years agoUPSTREAM: psi: make disabling/enabling easier for vendor kernels
Johannes Weiner [Fri, 30 Nov 2018 22:09:58 +0000 (14:09 -0800)]
UPSTREAM: psi: make disabling/enabling easier for vendor kernels

Mel Gorman reports a hackbench regression with psi that would prohibit
shipping the suse kernel with it default-enabled, but he'd still like
users to be able to opt in at little to no cost to others.

With the current combination of CONFIG_PSI and the psi_disabled bool set
from the commandline, this is a challenge.  Do the following things to
make it easier:

1. Add a config option CONFIG_PSI_DEFAULT_DISABLED that allows distros
   to enable CONFIG_PSI in their kernel but leave the feature disabled
   unless a user requests it at boot-time.

   To avoid double negatives, rename psi_disabled= to psi=.

2. Make psi_disabled a static branch to eliminate any branch costs
   when the feature is disabled.
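
The interaction between the build-time default and the psi= boot parameter can be sketched as follows. This is a simplified userspace model: the PSI_DEFAULT_DISABLED macro stands in for the Kconfig option, psi_enabled() is a hypothetical name, and the accepted values (1/y/0/n) are a reduced subset chosen for the example. The static-branch part of the change is not modelled.

```c
#include <stdbool.h>
#include <string.h>

/* Stand-in for CONFIG_PSI_DEFAULT_DISABLED: 1 models a distro kernel
 * that builds psi in but leaves it off unless requested at boot. */
#ifndef PSI_DEFAULT_DISABLED
#define PSI_DEFAULT_DISABLED 1
#endif

/* Hypothetical helper: scan a command line for "psi=<bool>". */
static bool psi_enabled(const char *cmdline)
{
    bool enabled = !PSI_DEFAULT_DISABLED;
    const char *p;

    for (p = cmdline; *p; p++) {
        /* Only match "psi=" at the start of a token. */
        if ((p == cmdline || p[-1] == ' ') &&
            strncmp(p, "psi=", 4) == 0) {
            char c = p[4];

            if (c == '1' || c == 'y' || c == 'Y')
                enabled = true;
            else if (c == '0' || c == 'n' || c == 'N')
                enabled = false;
        }
    }
    return enabled;
}
```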

In terms of numbers before and after this patch, Mel says:

: The following is a comparison using CONFIG_PSI=n as a baseline against
: your patch and a vanilla kernel
:
:                          4.20.0-rc4             4.20.0-rc4             4.20.0-rc4
:                 kconfigdisable-v1r1                vanilla        psidisable-v1r1
: Amean     1       1.3100 (   0.00%)      1.3923 (  -6.28%)      1.3427 (  -2.49%)
: Amean     3       3.8860 (   0.00%)      4.1230 *  -6.10%*      3.8860 (  -0.00%)
: Amean     5       6.8847 (   0.00%)      8.0390 * -16.77%*      6.7727 (   1.63%)
: Amean     7       9.9310 (   0.00%)     10.8367 *  -9.12%*      9.9910 (  -0.60%)
: Amean     12     16.6577 (   0.00%)     18.2363 *  -9.48%*     17.1083 (  -2.71%)
: Amean     18     26.5133 (   0.00%)     27.8833 *  -5.17%*     25.7663 (   2.82%)
: Amean     24     34.3003 (   0.00%)     34.6830 (  -1.12%)     32.0450 (   6.58%)
: Amean     30     40.0063 (   0.00%)     40.5800 (  -1.43%)     41.5087 (  -3.76%)
: Amean     32     40.1407 (   0.00%)     41.2273 (  -2.71%)     39.9417 (   0.50%)
:
: It's showing that the vanilla kernel takes a hit (as the bisection
: indicated it would) and that disabling PSI by default is reasonably
: close in terms of performance for this particular workload on this
: particular machine so;

Link: http://lkml.kernel.org/r/20181127165329.GA29728@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Tested-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit e0c274472d5d27f277af722e017525e0b33784cd)

Bug: 127712811
Test: lmkd in PSI mode
Change-Id: I6cb666fa351e8901df82e4d6931bfec0c5ce230d
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
3 years agoUPSTREAM: kernel/sched/psi.c: simplify cgroup_move_task()
Olof Johansson [Fri, 16 Nov 2018 23:08:00 +0000 (15:08 -0800)]
UPSTREAM: kernel/sched/psi.c: simplify cgroup_move_task()

The existing code triggered an invalid warning about 'rq' possibly being
used uninitialized.  Instead of doing the silly warning suppression by
initializing it to NULL, refactor the code to bail out early instead.

Warning was:

  kernel/sched/psi.c: In function `cgroup_move_task':
  kernel/sched/psi.c:639:13: warning: `rq' may be used uninitialized in this function [-Wmaybe-uninitialized]

Link: http://lkml.kernel.org/r/20181103183339.8669-1-olof@lixom.net
Fixes: 2ce7135adc9ad ("psi: cgroup support")
Signed-off-by: Olof Johansson <olof@lixom.net>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 8fcb2312d1e3300e81aa871aad00d4c038cfc184)

Bug: 127712811
Test: lmkd in PSI mode
Change-Id: Id989da224a726082e0cfa5d5d9460bf63d448a93
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
3 years agoBACKPORT: psi: cgroup support
Johannes Weiner [Fri, 26 Oct 2018 22:06:31 +0000 (15:06 -0700)]
BACKPORT: psi: cgroup support

On a system that executes multiple cgrouped jobs and independent
workloads, we don't just care about the health of the overall system, but
also that of individual jobs, so that we can ensure individual job health,
fairness between jobs, or prioritize some jobs over others.

This patch implements pressure stall tracking for cgroups.  In kernels
with CONFIG_PSI=y, cgroup2 groups will have cpu.pressure, memory.pressure,
and io.pressure files that track aggregate pressure stall times for only
the tasks inside the cgroup.

Link: http://lkml.kernel.org/r/20180828172258.3185-10-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Daniel Drake <drake@endlessm.com>
Tested-by: Suren Baghdasaryan <surenb@google.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Weiner <jweiner@fb.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Enderborg <peter.enderborg@sony.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 2ce7135adc9ad081aa3c49744144376ac74fea60)

Conflicts:
        Documentation/cgroup-v2.txt
        include/linux/psi.h
        kernel/cgroup/cgroup.c

(1. manual merge from Documentation/admin-guide/cgroup-v2.rst
2. include <linux/cgroup-defs.h> into include/linux/psi.h
3. manual merge in css_free_work_fn to allow psi support only for cgroup v2
4. manual merge in cgroup_create to allow psi support only for cgroup v2)

Bug: 127712811
Test: lmkd in PSI mode
Change-Id: I163e6657aaa60aa5aab9372616a3bce2a65e90ec
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
3 years agoUPSTREAM: psi: pressure stall information for CPU, memory, and IO
Johannes Weiner [Fri, 26 Oct 2018 22:06:27 +0000 (15:06 -0700)]
UPSTREAM: psi: pressure stall information for CPU, memory, and IO

When systems are overcommitted and resources become contended, it's hard
to tell exactly the impact this has on workload productivity, or how close
the system is to lockups and OOM kills.  In particular, when machines work
multiple jobs concurrently, the impact of overcommit in terms of latency
and throughput on the individual job can be enormous.

In order to maximize hardware utilization without sacrificing individual
job health or risk complete machine lockups, this patch implements a way
to quantify resource pressure in the system.

A kernel built with CONFIG_PSI=y creates files in /proc/pressure/ that
expose the percentage of time the system is stalled on CPU, memory, or IO,
respectively.  Stall states are aggregate versions of the per-task delay
accounting delays:

       cpu: some tasks are runnable but not executing on a CPU
       memory: tasks are reclaiming, or waiting for swapin or thrashing cache
       io: tasks are waiting for io completions

These percentages of walltime can be thought of as pressure percentages,
and they give a general sense of system health and productivity loss
incurred by resource overcommit.  They can also indicate when the system
is approaching lockup scenarios and OOMs.

To do this, psi keeps track of the task states associated with each CPU
and samples the time they spend in stall states.  Every 2 seconds, the
samples are averaged across CPUs - weighted by the CPUs' non-idle time to
eliminate artifacts from unused CPUs - and translated into percentages of
walltime.  A running average of those percentages is maintained over 10s,
1m, and 5m periods (similar to the loadaverage).
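
The weighting step can be illustrated with a small calculation. The sketch below uses floating point and seconds for readability, which is an assumption of the example rather than how the kernel stores these values; pressure_percent() is a hypothetical name.

```c
#include <stddef.h>

/* Average per-CPU stall times, weighting each CPU by its non-idle time,
 * then express the result as a percentage of the sampling period. */
static double pressure_percent(const double *stall_s,
                               const double *nonidle_s,
                               size_t ncpus, double period_s)
{
    double weighted = 0.0, weight = 0.0;
    size_t i;

    for (i = 0; i < ncpus; i++) {
        weighted += stall_s[i] * nonidle_s[i];
        weight += nonidle_s[i];
    }
    if (weight <= 0.0 || period_s <= 0.0)
        return 0.0;

    return (weighted / weight) * 100.0 / period_s;
}
```

For a 2 second period where CPU 0 was busy the whole time and stalled for 1 second while CPU 1 sat idle, the weighted result is 50%; a plain average over both CPUs would report 25% and understate the pressure seen by the only CPU doing work.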

[hannes@cmpxchg.org: doc fixlet, per Randy]
Link: http://lkml.kernel.org/r/20180828205625.GA14030@cmpxchg.org
[hannes@cmpxchg.org: code optimization]
Link: http://lkml.kernel.org/r/20180907175015.GA8479@cmpxchg.org
[hannes@cmpxchg.org: rename psi_clock() to psi_update_work(), per Peter]
Link: http://lkml.kernel.org/r/20180907145404.GB11088@cmpxchg.org
[hannes@cmpxchg.org: fix build]
Link: http://lkml.kernel.org/r/20180913014222.GA2370@cmpxchg.org
Link: http://lkml.kernel.org/r/20180828172258.3185-9-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Daniel Drake <drake@endlessm.com>
Tested-by: Suren Baghdasaryan <surenb@google.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Weiner <jweiner@fb.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Enderborg <peter.enderborg@sony.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit eb414681d5a07d28d2ff90dc05f69ec6b232ebd2)

Bug: 127712811
Test: lmkd in PSI mode
Change-Id: Id00d23c977169b0c4636d92016fc1fee0274be05
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Conflicts:
kernel/sched/Makefile

3 years agoUPSTREAM: sched: introduce this_rq_lock_irq()
Johannes Weiner [Fri, 26 Oct 2018 22:06:23 +0000 (15:06 -0700)]
UPSTREAM: sched: introduce this_rq_lock_irq()

do_sched_yield() disables IRQs, looks up this_rq() and locks it.  The next
patch is adding another site with the same pattern, so provide a
convenience function for it.

Link: http://lkml.kernel.org/r/20180828172258.3185-8-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Suren Baghdasaryan <surenb@google.com>
Tested-by: Daniel Drake <drake@endlessm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Weiner <jweiner@fb.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Enderborg <peter.enderborg@sony.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 246b3b3342c9b0a2e24cda2178be87bc36e1c874)

Bug: 127712811
Test: lmkd in PSI mode
Change-Id: I24b42cff1624c80633f116b7cb485564f53a30a7
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
3 years agoUPSTREAM: sched: sched.h: make rq locking and clock functions available in stats.h
Johannes Weiner [Fri, 26 Oct 2018 22:06:19 +0000 (15:06 -0700)]
UPSTREAM: sched: sched.h: make rq locking and clock functions available in stats.h

kernel/sched/sched.h includes "stats.h" half-way through the file.  The
next patch introduces users of sched.h's rq locking functions and
update_rq_clock() in kernel/sched/stats.h.  Move those definitions up in
the file so they are available in stats.h.

Link: http://lkml.kernel.org/r/20180828172258.3185-7-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Suren Baghdasaryan <surenb@google.com>
Tested-by: Daniel Drake <drake@endlessm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Weiner <jweiner@fb.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Enderborg <peter.enderborg@sony.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 1f351d7f7590857ea281579c26e6045b4c548ef4)

Bug: 127712811
Test: lmkd in PSI mode
Change-Id: Id342e0ba9a62b49e64f2ce8b87f883ea70230b2f
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
3 years agoUPSTREAM: sched: loadavg: make calc_load_n() public
Johannes Weiner [Fri, 26 Oct 2018 22:06:16 +0000 (15:06 -0700)]
UPSTREAM: sched: loadavg: make calc_load_n() public

It's going to be used in a later patch. Keep the churn separate.

Link: http://lkml.kernel.org/r/20180828172258.3185-6-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Suren Baghdasaryan <surenb@google.com>
Tested-by: Daniel Drake <drake@endlessm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Weiner <jweiner@fb.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Enderborg <peter.enderborg@sony.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 5c54f5b9edb1aa2eabbb1091c458f1b6776a1896)

Bug: 127712811
Test: lmkd in PSI mode
Change-Id: I50e0cb0dbf20ced329a484493f82ff69ca1ae97a
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
3 years agoBACKPORT: sched: loadavg: consolidate LOAD_INT, LOAD_FRAC, CALC_LOAD
Johannes Weiner [Fri, 26 Oct 2018 22:06:11 +0000 (15:06 -0700)]
BACKPORT: sched: loadavg: consolidate LOAD_INT, LOAD_FRAC, CALC_LOAD

There are several definitions of those functions/macros in places that
mess with fixed-point load averages.  Provide an official version.
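
For readers unfamiliar with the fixed-point format, the sketch below shows how such macros split a value into integer and fractional parts. The definitions are modelled on the kernel's 11-bit fixed-point load average macros; treat the exact spelling as an assumption of this example, and format_load() as a hypothetical helper.

```c
#include <stdio.h>

#define FSHIFT   11                /* bits of fractional precision */
#define FIXED_1  (1UL << FSHIFT)   /* 1.0 in fixed-point */

/* Integer part, and two decimal digits of the fractional part. */
#define LOAD_INT(x)  ((x) >> FSHIFT)
#define LOAD_FRAC(x) LOAD_INT(((x) & (FIXED_1 - 1)) * 100)

/* Hypothetical helper: render a fixed-point average as "I.FF". */
static int format_load(char *buf, size_t len, unsigned long avg)
{
    return snprintf(buf, len, "%lu.%02lu", LOAD_INT(avg), LOAD_FRAC(avg));
}
```

For instance, the fixed-point value 3 * FIXED_1 + FIXED_1 / 2 (7168) is rendered as "3.50".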

[akpm@linux-foundation.org: fix missed conversion in block/blk-iolatency.c]
Link: http://lkml.kernel.org/r/20180828172258.3185-5-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Suren Baghdasaryan <surenb@google.com>
Tested-by: Daniel Drake <drake@endlessm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Weiner <jweiner@fb.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Enderborg <peter.enderborg@sony.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 8508cf3ffad4defa202b303e5b6379efc4cd9054)

Conflicts:
        block/blk-iolatency.c

(1. skipped changes in block/blk-iolatency.c as file does not exist in 4.14)

Bug: 127712811
Test: lmkd in PSI mode
Change-Id: Ifb7e12280b2aa4d379df29e24bbeab3e82a0bff8
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
3 years agoUPSTREAM: delayacct: track delays from thrashing cache pages
Johannes Weiner [Fri, 26 Oct 2018 22:06:08 +0000 (15:06 -0700)]
UPSTREAM: delayacct: track delays from thrashing cache pages

Delay accounting already measures the time a task spends in direct reclaim
and waiting for swapin, but in low memory situations tasks can spend
a significant amount of their time waiting on thrashing page cache.  This
isn't tracked right now.

To know the full impact of memory contention on an individual task,
measure the delay when waiting for a recently evicted active cache page to
read back into memory.

Also update tools/accounting/getdelays.c:

     [hannes@computer accounting]$ sudo ./getdelays -d -p 1
     print delayacct stats ON
     PID     1

     CPU             count     real total  virtual total    delay total  delay average
                     50318      745000000      847346785      400533713          0.008ms
     IO              count    delay total  delay average
                       435      122601218              0ms
     SWAP            count    delay total  delay average
                         0              0              0ms
     RECLAIM         count    delay total  delay average
                         0              0              0ms
     THRASHING       count    delay total  delay average
                        19       12621439              0ms
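
The "delay average" column can be cross-checked by hand, assuming the totals are in nanoseconds. The helper below is a hypothetical illustration, not code from getdelays.c.

```c
#include <stdint.h>

/* Average delay per event in nanoseconds; 0 when nothing was counted. */
static uint64_t avg_delay_ns(uint64_t delay_total_ns, uint64_t count)
{
    return count ? delay_total_ns / count : 0;
}
```

Under that assumption the CPU row works out to 400533713 / 50318 = 7960 ns, which matches the printed 0.008ms. The THRASHING row works out to 12621439 / 19 = 664286 ns, roughly 0.66ms, which the "0ms" column hides, presumably because those rows are printed as whole milliseconds.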

Link: http://lkml.kernel.org/r/20180828172258.3185-4-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Daniel Drake <drake@endlessm.com>
Tested-by: Suren Baghdasaryan <surenb@google.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Weiner <jweiner@fb.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Enderborg <peter.enderborg@sony.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit b1d29ba82cf2bc784f4c963ddd6a2cf29e229b33)

Bug: 127712811
Test: lmkd in PSI mode
Change-Id: I259f693987cf04e6a52ee7e8accf55a17e0de005
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
3 years agoUPSTREAM: mm: workingset: tell cache transitions from workingset thrashing
Johannes Weiner [Fri, 26 Oct 2018 22:06:04 +0000 (15:06 -0700)]
UPSTREAM: mm: workingset: tell cache transitions from workingset thrashing

Refaults happen during transitions between workingsets as well as in-place
thrashing.  Knowing the difference between the two has a range of
applications, including measuring the impact of memory shortage on the
system performance, as well as the ability to balance pressure more
intelligently between the filesystem cache and the swap-backed workingset.

During workingset transitions, inactive cache refaults and pushes out
established active cache.  When that active cache isn't stale, however,
and also ends up refaulting, that's bonafide thrashing.

Introduce a new page flag that tells on eviction whether the page has been
active or not in its lifetime.  This bit is then stored in the shadow
entry, to classify refaults as transitioning or thrashing.
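
Storing one extra bit next to the eviction timestamp can be sketched as below. This is a deliberately reduced model: the real shadow entry also encodes other fields, and the function names, the layout, and the "bit set means thrashing" classification are assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

enum refault_kind { REFAULT_TRANSITION, REFAULT_THRASHING };

/* Pack an eviction counter and the "was active" bit into one word. */
static uintptr_t pack_shadow(uintptr_t eviction, bool workingset)
{
    return (eviction << 1) | (workingset ? 1u : 0u);
}

/* Recover both fields from a packed shadow entry. */
static void unpack_shadow(uintptr_t shadow, uintptr_t *eviction,
                          bool *workingset)
{
    *workingset = (shadow & 1u) != 0;
    *eviction = shadow >> 1;
}

/* A page that had been active before eviction refaults as thrashing;
 * one that never was refaults as part of a workingset transition. */
static enum refault_kind classify_refault(uintptr_t shadow)
{
    return (shadow & 1u) ? REFAULT_THRASHING : REFAULT_TRANSITION;
}
```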

How many page->flags does this leave us with on 32-bit?

20 bits are always page flags

21 if you have an MMU

23 with the zone bits for DMA, Normal, HighMem, Movable

29 with the sparsemem section bits

30 if PAE is enabled

31 with this patch.

So on 32-bit PAE, that leaves 1 bit for distinguishing two NUMA nodes.  If
that's not enough, the system can switch to discontigmem and re-gain the 6
or 7 sparsemem section bits.

Link: http://lkml.kernel.org/r/20180828172258.3185-3-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Daniel Drake <drake@endlessm.com>
Tested-by: Suren Baghdasaryan <surenb@google.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Weiner <jweiner@fb.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Enderborg <peter.enderborg@sony.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 1899ad18c6072d689896badafb81267b0a1092a4)

Bug: 127712811
Test: lmkd in PSI mode
Change-Id: I71df060dce5590a3c654f9a0e8e54deeb74b64c2
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
3 years agomfd: cs47l35: Update codec reg value.
Wen Xie [Mon, 11 Nov 2019 09:58:11 +0000 (17:58 +0800)]
mfd: cs47l35: Update codec reg value.

Cirrus vendor patch:
When the reg value in the cache is detected to be inconsistent with
the value in the hardware, update the hardware reg.

Change-Id: I0aea0c59665f470a8625601ac3abbbd915f8dbee
Signed-off-by: Wen Xie <xiewen3@motorola.com>
Reviewed-on: https://gerrit.mot.com/1452347
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Tested-by: Jira Key
Reviewed-by: Zhengming Yao <yaozm1@mt.com>
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key

3 years agousb:Balance the enable/disable in secure mode
a17671 [Fri, 8 Nov 2019 02:48:21 +0000 (10:48 +0800)]
usb:Balance the enable/disable in secure mode

Enable/disable must be balanced when USB is in secure mode.
Otherwise linked_func and func_list can become inconsistent,
which causes unbinding to release wild (invalid) memory.
This is a Samsung-platform-only issue that ends in a kernel panic
with the following signature:

configfs-gadget gadget:unbind function 'mtp'
configfs-gadget gadget:unbind function 'ptp'

This should never happen, since the user cannot select
both mtp and ptp at the same time.

Change-Id: I4aba691a0c4180f828c55aad5d63b9162c3f881a
Signed-off-by: a17671 <a17671@motorola.com>
Reviewed-on: https://gerrit.mot.com/1451197
SME-Granted: SME Approvals Granted
SLTApproved: Slta Waiver
Tested-by: Jira Key
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key

3 years agoRevert "Revert "printk: add cpu info into kernel log""
Yue Sun [Wed, 6 Nov 2019 02:13:54 +0000 (21:13 -0500)]
Revert "Revert "printk: add cpu info into kernel log""

Revert this change since we ultimately decided not to enable Samsung's
CONFIG_PRINTK_PROCESS;
https://gerrit.mot.com/#/c/1435442/ has been abandoned.

Change-Id: Ic60281e58b15656199666da976721340cd692dcd
Reviewed-on: https://gerrit.mot.com/1449799
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Tested-by: Jira Key
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key

3 years agofix build error
xiest1 [Thu, 7 Nov 2019 02:18:09 +0000 (10:18 +0800)]
fix build error

Change-Id: Icbe77ce94e1ac234eb13753750a8ac7a17c77103

3 years agoinput: update touch usb cable detect report function
dengwei1 [Tue, 5 Nov 2019 07:51:10 +0000 (15:51 +0800)]
input: update touch usb cable detect report function

As a vendor patch, change the report function
used in the callback function.

Change-Id: Id0063704802c0841e14cbd5fbd2dd75a8a71c28e
Signed-off-by: dengwei1 <dengwei1@motorola.com>
Reviewed-on: https://gerrit.mot.com/1449028
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Tested-by: Jira Key
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key

3 years agoUPSTREAM: xfrm: Make set-mark default behavior backward compatible
Benedict Wong [Mon, 14 Jan 2019 19:24:38 +0000 (11:24 -0800)]
UPSTREAM: xfrm: Make set-mark default behavior backward compatible

Fixes 9b42c1f, which changed the default route lookup behavior for
tunnel mode SAs in the outbound direction to use the skb mark, whereas
previously mark=0 was used if the output mark was unspecified. In
mark-based routing schemes such as Android’s, this change in default
behavior causes routing loops or lookup failures.

This patch restores the default behavior of using a 0 mark while still
incorporating the skb mark if the SET_MARK (and SET_MARK_MASK) is
specified.
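
The restored default can be pictured with a small userspace sketch. It is
not the kernel code: struct set_mark and route_lookup_mark() are
hypothetical names, and treating "value and mask both zero" as "unspecified"
is an assumption made for illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for the set-mark value/mask pair kept on an SA. */
struct set_mark {
	uint32_t v; /* value taken from SET_MARK */
	uint32_t m; /* mask taken from SET_MARK_MASK */
};

/*
 * Pick the mark used for the outbound route lookup.
 * Assumption: v == 0 and m == 0 means no set-mark was specified, so the
 * lookup falls back to mark 0. Otherwise the configured bits are merged
 * into the skb mark.
 */
static uint32_t route_lookup_mark(uint32_t skb_mark, const struct set_mark *sm)
{
	if (sm->v == 0 && sm->m == 0)
		return 0;
	return (sm->v & sm->m) | (skb_mark & ~sm->m);
}
```

With no set-mark configured the lookup mark is 0 regardless of the skb
mark, which is the behavior mark-based routing schemes rely on.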

Tested with additions to Android's kernel unit test suite:
https://android-review.googlesource.com/c/kernel/tests/+/860150

Fixes: 9b42c1f ("xfrm: Extend the output_mark to support input direction and masking")
Signed-off-by: Benedict Wong <benedictwong@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
(cherry picked from commit e2612cd496e7b465711d219ea6118893d7253f52)
Bug: 122236988
Test: Passes kernel tests
Change-Id: I1289b5b7b1eb93c6d99a0ba7d28e24c3eb25883d
Signed-off-by: Benedict Wong <benedictwong@google.com>
3 years agoUPSTREAM: xfrm: fix ptr_ret.cocci warnings
kbuild test robot [Thu, 26 Jul 2018 07:09:52 +0000 (15:09 +0800)]
UPSTREAM: xfrm: fix ptr_ret.cocci warnings

net/xfrm/xfrm_interface.c:692:1-3: WARNING: PTR_ERR_OR_ZERO can be used

 Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR

Generated by: scripts/coccinelle/api/ptr_ret.cocci
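
For readers unfamiliar with the helper, the two forms can be compared in a
userspace sketch. The macros below are simplified stand-ins for the ones in
the kernel's linux/err.h, and status_long_form()/status_short_form() are
hypothetical names, not code from xfrm_interface.c.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified userspace stand-ins for the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}
static inline int PTR_ERR_OR_ZERO(const void *ptr)
{
	return IS_ERR(ptr) ? (int)PTR_ERR(ptr) : 0;
}

/* The pattern the Coccinelle script flags. */
static int status_long_form(const void *p)
{
	if (IS_ERR(p))
		return (int)PTR_ERR(p);
	return 0;
}

/* The equivalent single-call form. */
static int status_short_form(const void *p)
{
	return PTR_ERR_OR_ZERO(p);
}
```

Both functions return 0 for a valid pointer and the negative errno for an
error pointer, so the replacement does not change behavior.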

Fixes: 44e2b838c24d ("xfrm: Return detailed errors from xfrmi_newlink")
CC: Benedict Wong <benedictwong@google.com>
Signed-off-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
(cherry picked from commit c6f5e017df9dfa9f6cbe70da008e7d716d726f1b)
Signed-off-by: Benedict Wong <benedictwong@google.com>
Bug: 113046120
Test: All kernel net-tests run, passing (20x repeated)
Change-Id: I4ec93c0427fded57ff5126dc7b3d97d9b5fd615b

3 years agoUPSTREAM: xfrm: Return detailed errors from xfrmi_newlink
Benedict Wong [Wed, 25 Jul 2018 20:45:29 +0000 (13:45 -0700)]
UPSTREAM: xfrm: Return detailed errors from xfrmi_newlink

Currently all failure modes of xfrm interface creation return EEXIST.
This change improves the granularity of errnos provided by also
returning ENODEV or EINVAL if failures happen in looking up the
underlying interface, or a required parameter is not provided.
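
A rough sketch of the finer-grained error reporting follows. The struct and
function are hypothetical and only illustrate the mapping from failure mode
to errno; the order of the checks is an assumption, not taken from
xfrmi_newlink.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical request summary; the fields are illustrative only. */
struct newlink_req {
	bool has_required_param; /* was the required parameter supplied? */
	bool underlying_found;   /* did the underlying interface lookup succeed? */
	bool already_exists;     /* is a matching interface already present? */
};

/* Map each failure mode to its own errno instead of always EEXIST. */
static int newlink_status(const struct newlink_req *r)
{
	if (!r->has_required_param)
		return -EINVAL;
	if (!r->underlying_found)
		return -ENODEV;
	if (r->already_exists)
		return -EEXIST;
	return 0;
}
```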

This change has been tested against the Android Kernel Networking Tests,
with additional xfrmi_newlink tests here:

https://android-review.googlesource.com/c/kernel/tests/+/715755

Signed-off-by: Benedict Wong <benedictwong@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
(cherry picked from commit 44e2b838c24d883dae8496dc7b6ddac7956ba53c)
Bug: 113046120
Change-Id: Ic680bf1e4a828aaae01b289223d9396a551eefd2

3 years agoUPSTREAM: xfrm: Remove xfrmi interface ID from flowi
Benedict Wong [Thu, 19 Jul 2018 17:50:44 +0000 (10:50 -0700)]
UPSTREAM: xfrm: Remove xfrmi interface ID from flowi

In order to remove the performance impact of having the extra u32 in every
single flowi, this change removes the flowi_xfrm struct, preferring to
take the if_id as a method parameter where needed.

In the inbound direction, if_id is only needed during the
__xfrm_check_policy() function, and the if_id can be determined at that
point based on the skb. As such, xfrmi_decode_session() is only called
with the skb in __xfrm_check_policy().

In the outbound direction, the only place where if_id is needed is the
xfrm_lookup() call in xfrmi_xmit2(). With this change, the if_id is
directly passed into the xfrm_lookup_with_ifid() call. All existing
callers can still call xfrm_lookup(), which uses a default if_id of 0.
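
The shape of the outbound change can be sketched as a wrapper that supplies
the default. The types and functions below are heavily reduced, hypothetical
stand-ins; the real xfrm_lookup() and xfrm_lookup_with_ifid() take more
arguments and return a dst entry.

```c
#include <stdint.h>

/* Reduced flow key: it no longer carries an interface ID field. */
struct flow {
	uint32_t mark;
};

/* Stand-in lookup: returns the if_id that policies would be matched on. */
static uint32_t lookup_with_ifid(const struct flow *fl, uint32_t if_id)
{
	(void)fl;     /* the flow itself does not hold the interface ID */
	return if_id; /* a real lookup would use this as a match key */
}

/* Existing callers keep the old entry point and get if_id 0. */
static uint32_t lookup(const struct flow *fl)
{
	return lookup_with_ifid(fl, 0);
}
```

Only the caller that knows the interface (the xfrmi transmit path in the
text above) needs the explicit variant.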

This change does not change any behavior of XFRMIs except for improving
overall system performance via flowi size reduction.

This change has been tested against the Android Kernel Networking Tests:

https://android.googlesource.com/kernel/tests/+/master/net/test

Signed-off-by: Benedict Wong <benedictwong@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
(cherry picked from commit bc56b33404599edc412b91933d74b36873e8ea25)
Bug: 113046120
Change-Id: Icd3a1ea08427b91c54a64318d9dbb9acfb5d429a

3 years agoUPSTREAM: xfrm: Add virtual xfrm interfaces
Steffen Klassert [Tue, 12 Jun 2018 12:07:12 +0000 (14:07 +0200)]
UPSTREAM: xfrm: Add virtual xfrm interfaces

This patch adds support for virtual xfrm interfaces.
Packets that are routed through such an interface
are guaranteed to be IPsec transformed or dropped.
It is a generic virtual interface that ensures IPsec
transformation, no need to know what happens behind
the interface. This means that we can tunnel IPv4 and
IPv6 through the same interface and support all xfrm
modes (tunnel, transport and beet) on it.

Co-developed-by: Lorenzo Colitti <lorenzo@google.com>
Co-developed-by: Benedict Wong <benedictwong@google.com>
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Benedict Wong <benedictwong@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Benedict Wong <benedictwong@google.com>
Tested-by: Antony Antony <antony@phenome.org>
Reviewed-by: Eyal Birger <eyal.birger@gmail.com>
(cherry picked from commit f203b76d78092faf248db3f851840fbecf80b40e)
Bug: 113046120
Change-Id: I05e8fe1e8a8a4b01886504ce694ddda29e4fbec6

3 years agoUPSTREAM: xfrm: Add a new lookup key to match xfrm interfaces.
Steffen Klassert [Tue, 12 Jun 2018 12:07:07 +0000 (14:07 +0200)]
UPSTREAM: xfrm: Add a new lookup key to match xfrm interfaces.

This patch adds the xfrm interface id as a lookup key
for xfrm states and policies. With this we can assign
states and policies to virtual xfrm interfaces.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
Acked-by: Benedict Wong <benedictwong@google.com>
Tested-by: Benedict Wong <benedictwong@google.com>
Tested-by: Antony Antony <antony@phenome.org>
Reviewed-by: Eyal Birger <eyal.birger@gmail.com>
(cherry picked from commit 7e6526404adedf079279aa7aa11722deaca8fe2e)
Signed-off-by: Benedict Wong <benedictwong@google.com>
Bug: 113046120
Change-Id: I27d7757a374b0bd5f97c3e723773d6c7470a0717

3 years agoUPSTREAM: flow: Extend flow informations with xfrm interface id.
Steffen Klassert [Tue, 12 Jun 2018 12:06:57 +0000 (14:06 +0200)]
UPSTREAM: flow: Extend flow informations with xfrm interface id.

Add a new flowi_xfrm structure with the information needed to do
an xfrm lookup. At the moment it keeps the information about
the new xfrm interface id needed to look up xfrm interfaces
that are introduced with a followup patch. We need this new
lookup key because other possible keys, like the ifindex, are
already part of the xfrm selector and used as a key to
enforce the output device after the transformation in the
policy/state lookup.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
Acked-by: Benedict Wong <benedictwong@google.com>
Tested-by: Benedict Wong <benedictwong@google.com>
Tested-by: Antony Antony <antony@phenome.org>
Reviewed-by: Eyal Birger <eyal.birger@gmail.com>
(cherry picked from commit d159ce7957eec306eacda672e5909e26675ca8ef)
Signed-off-by: Benedict Wong <benedictwong@google.com>
Bug: 113046120
Change-Id: I70b520d3cf67cd663e84868b0e7cc45ffa74d080

3 years agoUPSTREAM: xfrm: Extend the output_mark to support input direction and masking.
Steffen Klassert [Tue, 12 Jun 2018 10:44:26 +0000 (12:44 +0200)]
UPSTREAM: xfrm: Extend the output_mark to support input direction and masking.

We already support setting an output mark at the xfrm_state;
unfortunately, this does not support the input direction or
masking the marks that will be applied to the skb. This change
adds support for applying a masked value in both directions.

The existing XFRMA_OUTPUT_MARK number is reused for this purpose
and as it is now bi-directional, it is renamed to XFRMA_SET_MARK.

An additional XFRMA_SET_MARK_MASK attribute is added for setting the
mask. If the mask attribute is not provided, it is set to 0xffffffff,
keeping the existing 'full mask' semantics of XFRMA_OUTPUT_MARK.
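
The mask defaulting described above can be sketched in userspace as follows.
struct mark_attrs and both functions are hypothetical; only the 0xffffffff
default and the attribute names come from the text.

```c
#include <stdbool.h>
#include <stdint.h>

#define FULL_MASK 0xffffffffu

/* Hypothetical parsed form of the two netlink attributes. */
struct mark_attrs {
	bool has_mask;  /* was XFRMA_SET_MARK_MASK supplied? */
	uint32_t value; /* XFRMA_SET_MARK */
	uint32_t mask;  /* XFRMA_SET_MARK_MASK, when supplied */
};

/* A missing mask attribute means "full mask". */
static uint32_t effective_mask(const struct mark_attrs *a)
{
	return a->has_mask ? a->mask : FULL_MASK;
}

/* Bits inside the mask come from the value, the rest keep the skb mark. */
static uint32_t apply_set_mark(uint32_t skb_mark, const struct mark_attrs *a)
{
	uint32_t m = effective_mask(a);

	return (a->value & m) | (skb_mark & ~m);
}
```

Without a mask attribute the whole mark is overwritten, which matches what
a bare output mark did before the rename.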

Co-developed-by: Tobias Brunner <tobias@strongswan.org>
Co-developed-by: Eyal Birger <eyal.birger@gmail.com>
Co-developed-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Tobias Brunner <tobias@strongswan.org>
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
(cherry picked from commit 9b42c1f179a614e11893ae4619f0304a38f481ae)
Signed-off-by: Benedict Wong <benedictwong@google.com>
Bug: 113046120
Change-Id: I582f0b460dc58f01e0c30afb6167725aa337d054

3 years agoUPSTREAM: xfrm: fix XFRMA_OUTPUT_MARK policy entry
Michal Kubecek [Wed, 29 Nov 2017 17:23:56 +0000 (18:23 +0100)]
UPSTREAM: xfrm: fix XFRMA_OUTPUT_MARK policy entry

This seems to be an obvious typo: NLA_U32 is the type of the attribute, not its
(minimal) length.
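
The effect of the typo is easy to see with a reduced policy struct. The
definitions below are simplified stand-ins for the ones in the kernel's
net/netlink.h, and the two variable names are hypothetical.

```c
#include <stdint.h>

/* Simplified stand-ins for the kernel's attribute types and policy entry. */
enum { NLA_UNSPEC, NLA_U8, NLA_U16, NLA_U32 };

struct nla_policy {
	uint16_t type;
	uint16_t len;
};

/* Typo: NLA_U32 lands in .len, so .type silently stays NLA_UNSPEC. */
static const struct nla_policy output_mark_buggy = { .len = NLA_U32 };

/* Fix: NLA_U32 is the attribute type; no explicit length is needed. */
static const struct nla_policy output_mark_fixed = { .type = NLA_U32 };
```

In the buggy entry the enum constant is interpreted as a byte count, so the
attribute is never validated as a 32-bit value.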

Fixes: 077fbac405bf ("net: xfrm: support setting an output mark.")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
(cherry picked from commit e719135881f00c01ca400abb8a5dadaf297a24f9)
Signed-off-by: Benedict Wong <benedictwong@google.com>
Bug: 113046120
Change-Id: I4c1a8de03febfa246b99c7eb67d77f74a1e3ba93

3 years agoarm64/dts: Set detect headset button twice
Wen Xie [Wed, 23 Oct 2019 09:06:31 +0000 (17:06 +0800)]
arm64/dts: Set detect headset button twice

arm/dts audio:
Detect the headset button twice to avoid erroneous reports.

Change-Id: I6d5ca6f72cfdc7459eb02489edeee432f57dae91
Signed-off-by: Wen Xie <xiewen3@motorola.com>
Reviewed-on: https://gerrit.mot.com/1441697
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Tested-by: Jira Key
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key

3 years agousb:configfs:Set udc_name NULL if attach failed
a17671 [Thu, 17 Oct 2019 09:49:51 +0000 (17:49 +0800)]
usb:configfs:Set udc_name NULL if attach failed

If probing of the UDC controller fails,
udc_name shall be set to NULL to avoid a double unregistration
and the resulting panic.
This can happen in some corner cases.

Change-Id: I2e6e4168a505b86d8f1b57db53be91acc608ee97
Signed-off-by: a17671 <a17671@motorola.com>
Reviewed-on: https://gerrit.mot.com/1438349
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Tested-by: Jira Key
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key

3 years agoRevert "(CR): wlbt: update pmu sequence"
Yue Sun [Wed, 16 Oct 2019 03:31:53 +0000 (22:31 -0500)]
Revert "(CR): wlbt: update pmu sequence"

This reverts commit 4e2e4c5090cd3fad1b24e1fb81e94c8b38867e53.

Change-Id: I062461db80799c48f9119606866b2668a12694fe
Reviewed-on: https://gerrit.mot.com/1437290
SME-Granted: SME Approvals Granted
SLTApproved: Slta Waiver
Tested-by: Jira Key
Reviewed-by: Hua Tan <tanhua1@motorola.com>
Submit-Approved: Jira Key

3 years agowlbt: update pmu sequence
sunyue5 [Tue, 15 Oct 2019 13:35:44 +0000 (21:35 +0800)]
wlbt: update pmu sequence

Change-Id: I161372cee02d25b312968d4c075acccab6ac23eb
Signed-off-by: Youngsoo <youngss.kim@samsung.com>
Signed-off-by: sunyue5 <sunyue5@motorola.com>
Reviewed-on: https://gerrit.mot.com/1436932
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Tested-by: Jira Key
Reviewed-by: Hua Tan <tanhua1@motorola.com>
Submit-Approved: Jira Key

3 years agoRevert "printk: add cpu info into kernel log"
sunyue5 [Fri, 11 Oct 2019 05:55:48 +0000 (13:55 +0800)]
Revert "printk: add cpu info into kernel log"

Change-Id: I20b58073db759906e2892e14373148a51e2aef99
Signed-off-by: sunyue5 <sunyue5@motorola.com>
Reviewed-on: https://gerrit.mot.com/1435441
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Tested-by: Jira Key
Reviewed-by: Zonghua Liu <a17671@motorola.com>
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key

3 years agokernel:optimize cdp thermal charging limitation
xuwei9 [Fri, 11 Oct 2019 02:55:46 +0000 (10:55 +0800)]
kernel:optimize cdp thermal charging limitation

Optimize cdp thermal
charging limitation

Change-Id: I1ce95bd96e8f257ab103720609d93842840868d5
Signed-off-by: xuwei9 <xuwei9@mt.com>
Reviewed-on: https://gerrit.mot.com/1435344
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Tested-by: Jira Key
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key

3 years agofimc-is2: fix ITS scene0 test_read_write
wangdw10 [Wed, 9 Oct 2019 03:02:02 +0000 (11:02 +0800)]
fimc-is2: fix ITS scene0 test_read_write

Update the timing of the exposure metadata update to fix the exposure
mismatch between the raw and jpg cases.

Change-Id: I7bf61b8d89da819ce8974e2c7f887c3ed6ed2e3a
Signed-off-by: wangdw10 <wangdw10@mt.com>
Reviewed-on: https://gerrit.mot.com/1433855
SME-Granted: SME Approvals Granted
SLTApproved: Slta Waiver
Tested-by: Jira Key
Reviewed-by: Biming Li <libm1@motorola.com>
Reviewed-by: Dawei Wang <wangdw10@motorola.com>
Reviewed-by: Zhichao Chen <chenzc2@motorola.com>
Submit-Approved: Jira Key

3 years agoSupport APEX on samsung platform
huangzq2 [Wed, 25 Sep 2019 08:34:13 +0000 (16:34 +0800)]
Support APEX on samsung platform

Change-Id: If15e3cc404b4f6cb6b582877a55aa5779cbac5e7
Signed-off-by: huangzq2 <huangzq2@motorola.com>
Reviewed-on: https://gerrit.mot.com/1427696
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Tested-by: Jira Key
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key

3 years agoRevert "ANDROID: dm verity: add minimum prefetch size"
Sami Tolvanen [Fri, 11 Jan 2019 00:07:19 +0000 (16:07 -0800)]
Revert "ANDROID: dm verity: add minimum prefetch size"

This reverts commit ace74ccf82cfb2b73ce1df2e698d20c2fbc559dd.

Mot-CRs-fixed: (CR)

Bug: 71728490
Change-Id: Iebcb0cd9982f36c4bd2552811f9147325a291db0
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-on: https://gerrit.mot.com/1427695
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Tested-by: Jira Key
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key

3 years agoUPSTREAM: loop: Add LOOP_SET_BLOCK_SIZE in compat ioctl
Evan Green [Mon, 2 Jul 2018 23:03:46 +0000 (16:03 -0700)]
UPSTREAM: loop: Add LOOP_SET_BLOCK_SIZE in compat ioctl

This change adds LOOP_SET_BLOCK_SIZE as one of the supported ioctls
in lo_compat_ioctl. It only takes an unsigned long argument, and
in practice a 32-bit value works fine.

Mot-CRs-fixed: (CR)

Bug: 117823094
Change-Id: I0061a082eb2632c47b7d66f35f2c909d33ff1653
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Evan Green <evgreen@chromium.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 9fea4b395260175de4016b42982f45a3e6e03d0b)
Signed-off-by: Martijn Coenen <maco@android.com>
Reviewed-on: https://gerrit.mot.com/1427694
Tested-by: Jira Key
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key

3 years agoRevert "proc: Convert proc_mount to use mount_ns."
Alistair Strachan [Thu, 12 Sep 2019 06:06:06 +0000 (14:06 +0800)]
Revert "proc: Convert proc_mount to use mount_ns."

This cleanup broke the parsing of procfs mount parameters.

Bug: 79705088
Mot-CRs-Fixed:(CR)

Signed-off-by: Alistair Strachan <astrachan@google.com>
Change-Id: If6159e6501a5f9a77dd2c4ff339d378ac271fdf4
Signed-off-by: a17671 <a17671@motorola.com>
Reviewed-on: https://gerrit.mot.com/1420288
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Tested-by: Jira Key
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key

3 years agoUPSTREAM: zsmalloc: introduce zs_huge_class_size()
Sergey Senozhatsky [Thu, 5 Apr 2018 23:24:43 +0000 (16:24 -0700)]
UPSTREAM: zsmalloc: introduce zs_huge_class_size()

Patch series "zsmalloc/zram: drop zram's max_zpage_size", v3.

ZRAM's max_zpage_size is a bad thing.  It forces zsmalloc to store
normal objects as huge ones, which results in bigger zsmalloc memory
usage.  Drop it and use the actual zsmalloc huge-class value when deciding if
the object is huge or not.

This patch (of 2):

Not every object can share its zspage with other objects, e.g.  when
the object is as big as a zspage or nearly as big as a zspage.  For such
objects zsmalloc has a so called huge class - every object which belongs
to huge class consumes the entire zspage (which consists of a physical
page).  On x86_64, PAGE_SHIFT 12 box, the first non-huge class size is
3264, so starting down from size 3264, objects can share page(-s) and
thus minimize memory wastage.

ZRAM, however, has its own statically defined watermark for huge
objects, namely "3 * PAGE_SIZE / 4 = 3072", and forcibly stores every
object larger than this watermark (3072) as a PAGE_SIZE object, in other
words, to a huge class, while zsmalloc can keep some of those objects in
non-huge classes.  This results in increased memory consumption.

zsmalloc knows better if the object is huge or not.  Introduce
zs_huge_class_size() function which tells if the given object can be
stored in one of the non-huge classes or not.  This will let us drop
ZRAM's huge object watermark and fully rely on zsmalloc when we decide
if the object is huge.
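
The gap between the two thresholds can be made concrete with the numbers
quoted above (PAGE_SHIFT 12, watermark 3072, first non-huge class 3264). The
macro and function names are hypothetical and the code is a userspace
illustration, not zram or zsmalloc code.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u                       /* PAGE_SHIFT 12 */
#define ZRAM_WATERMARK (3 * SKETCH_PAGE_SIZE / 4)     /* 3072 */
#define FIRST_NON_HUGE_CLASS 3264u                    /* x86_64 value from the text */

/* Old zram rule: anything larger than the static watermark is huge. */
static bool zram_treats_as_huge(size_t len)
{
	return len > ZRAM_WATERMARK;
}

/* zsmalloc's view: sizes up to the first non-huge class can share pages. */
static bool zsmalloc_treats_as_huge(size_t len)
{
	return len > FIRST_NON_HUGE_CLASS;
}
```

An object of, say, 3100 bytes is huge under the zram rule but not under
zsmalloc's, so the static watermark spends a full page on it needlessly.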

[sergey.senozhatsky.work@gmail.com: add pool param to zs_huge_class_size()]
Link: http://lkml.kernel.org/r/20180314081833.1096-2-sergey.senozhatsky@gmail.com
Link: http://lkml.kernel.org/r/20180306070639.7389-2-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 010b495e2fa32353d0ef6aa70a8169e5ef617a15)
Signed-off-by: Peter Kalauskas <peskal@google.com>
Bug: 113183619
Change-Id: I842d8234a53f30d2803139107f420f7217d6df6e
Reviewed-on: https://gerrit.mot.com/1416293
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Tested-by: Jira Key
Reviewed-by: Guolin Wang <wanggl3@mt.com>
Submit-Approved: Jira Key

3 years agoEnable zram writeback
huangzq2 [Wed, 4 Sep 2019 04:58:12 +0000 (12:58 +0800)]
Enable zram writeback

Port zram changes from Google and enable zram writeback.

Change-Id: I1bcb545dd4cdeb7f456d2f609fdb43cd9a822816
Signed-off-by: huangzq2 <huangzq2@motorola.com>
Reviewed-on: https://gerrit.mot.com/1416294
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Tested-by: Jira Key
Reviewed-by: Guolin Wang <wanggl3@mt.com>
Submit-Approved: Jira Key

3 years agoconfig: Enable SCSC_WLAN_ABNORMAL_MULTICAST_PKT_FILTER
sunyue5 [Wed, 4 Sep 2019 07:16:53 +0000 (15:16 +0800)]
config: Enable SCSC_WLAN_ABNORMAL_MULTICAST_PKT_FILTER

Drop packets whose MAC address is unicast but whose IP address
is multicast. This can reduce Wi-Fi power consumption.
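
The filter condition can be sketched for the IPv4 case as below. This is
not the SCSC driver code: the function names are hypothetical, the address
is assumed to be in host byte order, and IPv6 is left out for brevity.

```c
#include <stdbool.h>
#include <stdint.h>

/* A group (multicast/broadcast) MAC has the low bit of the first octet set. */
static bool mac_is_multicast(const uint8_t mac[6])
{
	return (mac[0] & 0x01) != 0;
}

/* IPv4 multicast is 224.0.0.0/4; addr is assumed to be in host byte order. */
static bool ipv4_is_multicast(uint32_t addr)
{
	return (addr >> 28) == 0xE;
}

/* Abnormal packet: unicast destination MAC with a multicast destination IP. */
static bool should_drop(const uint8_t dst_mac[6], uint32_t dst_ip)
{
	return !mac_is_multicast(dst_mac) && ipv4_is_multicast(dst_ip);
}
```

Ordinary multicast frames (group MAC, multicast IP) and ordinary unicast
frames both pass; only the mismatched combination is dropped.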

Change-Id: I5fcb63783e76a4dae78b8305d6e0fc0a009c1aa7
Signed-off-by: sunyue5 <sunyue5@motorola.com>
Reviewed-on: https://gerrit.mot.com/1415107
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Tested-by: Jira Key
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Submit-Approved: Jira Key