Thomas Gleixner [Fri, 26 Jan 2018 13:54:32 +0000 (14:54 +0100)]
hrtimer: Reset hrtimer cpu base proper on CPU hotplug
commit
d5421ea43d30701e03cadc56a38854c36a8b4433 upstream.
The hrtimer interrupt code contains a hang detection and mitigation
mechanism, which prevents that a long delayed hrtimer interrupt causes a
continous retriggering of interrupts which prevent the system from making
progress. If a hang is detected then the timer hardware is programmed with
a certain delay into the future and a flag is set in the hrtimer cpu base
which prevents newly enqueued timers from reprogramming the timer hardware
prior to the chosen delay. The subsequent hrtimer interrupt after the delay
clears the flag and resumes normal operation.
If such a hang happens in the last hrtimer interrupt before a CPU is
unplugged then the hang_detected flag is set and stays that way when the
CPU is plugged in again. At that point the timer hardware is not armed and
it cannot be armed because the hang_detected flag is still active, so
nothing clears that flag. As a consequence the CPU does not receive hrtimer
interrupts and no timers expire on that CPU which results in RCU stalls and
other malfunctions.
Clear the flag along with some other less critical members of the hrtimer
cpu base to ensure starting from a clean state when a CPU is plugged in.
Thanks to Paul, Sebastian and Anna-Maria for their help to get down to the
root cause of that hard to reproduce heisenbug. Once understood it's
trivial and certainly justifies a brown paperbag.
Fixes:
41d2e4949377 ("hrtimer: Tune hrtimer_interrupt hang logic")
Change-Id: I707de358bcc70a980dc4bd581c159a30e01cce41
Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Sewior <bigeasy@linutronix.de>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801261447590.2067@nanos
[bigeasy: backport to v3.18, drop ->next_timer it was introduced later]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kevin F. Haggerty <haggertk@lineageos.org>
John Stultz [Fri, 8 May 2015 20:47:23 +0000 (13:47 -0700)]
ktime: Fix ktime_divns to do signed division
[ Upstream commit
f7bcb70ebae0dcdb5a2d859b09e4465784d99029 ]
It was noted that the 32bit implementation of ktime_divns()
was doing unsigned division and didn't properly handle
negative values.
And when a ktime helper was changed to utilize
ktime_divns, it caused a regression on some IR blasters.
See the following bugzilla for details:
https://bugzilla.redhat.com/show_bug.cgi?id=
1200353
This patch fixes the problem in ktime_divns by checking
and preserving the sign bit, and then reapplying it if
appropriate after the division, it also changes the return
type to a s64 to make it more obvious this is expected.
Nicolas also pointed out that negative dividers would
cause infinite loops on 32bit systems, negative dividers
is unlikely for users of this function, but out of caution
this patch adds checks for negative dividers for both
32-bit (BUG_ON) and 64-bit(WARN_ON) versions to make sure
no such use cases creep in.
[ tglx: Hand an u64 to do_div() to avoid the compiler warning ]
Fixes:
166afb64511e 'ktime: Sanitize ktime_to_us/ms conversion'
Change-Id: I8af36d5e6bd3a3f049f564dd3b65cc1b3e929561
Reported-and-tested-by: Trevor Cordes <trevor@tecnopolis.ca>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/1431118043-23452-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Kevin F. Haggerty <haggertk@lineageos.org>
Nicolas Pitre [Wed, 3 Dec 2014 19:43:06 +0000 (14:43 -0500)]
ktime: Optimize ktime_divns for constant divisors
[ Upstream commit
8b618628b2bf83512fc8df5e8672619d65adfdfb ]
At least on ARM, do_div() is optimized to turn constant divisors into
an inline multiplication by the reciprocal value at compile time.
However this optimization is missed entirely whenever ktime_divns() is
used and the slow out-of-line division code is used all the time.
Let ktime_divns() use do_div() inline whenever the divisor is constant
and small enough. This will make things like ktime_to_us() and
ktime_to_ms() much faster.
Change-Id: I6a672b10d6d6b373b327f9548135ededfd1e1835
Cc: Arnd Bergmann <arnd.bergmann@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Nicolas Pitre <nico@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Kevin F. Haggerty <haggertk@lineageos.org>
Thomas Gleixner [Wed, 16 Jul 2014 21:03:55 +0000 (21:03 +0000)]
ktime: Sanitize ktime_to_us/ms conversion
With the plain nanoseconds based ktime_t we can simply use
ktime_divns() instead of going through loops and hoops of
timespec/timeval conversion.
Change-Id: Ib33ea80bc3fdd26fcada1072b1f8ffc26081c0ba
Reported-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Kevin F. Haggerty <haggertk@lineageos.org>
Jason Yan [Tue, 16 Jun 2020 12:16:55 +0000 (20:16 +0800)]
block: Fix use-after-free in blkdev_get()
[ Upstream commit
2d3a8e2deddea6c89961c422ec0c5b851e648c14 ]
In blkdev_get() we call __blkdev_get() to do some internal jobs and if
there is some errors in __blkdev_get(), the bdput() is called which
means we have released the refcount of the bdev (actually the refcount of
the bdev inode). This means we cannot access bdev after that point. But
acctually bdev is still accessed in blkdev_get() after calling
__blkdev_get(). This results in use-after-free if the refcount is the
last one we released in __blkdev_get(). Let's take a look at the
following scenerio:
CPU0 CPU1 CPU2
blkdev_open blkdev_open Remove disk
bd_acquire
blkdev_get
__blkdev_get del_gendisk
bdev_unhash_inode
bd_acquire bdev_get_gendisk
bd_forget failed because of unhashed
bdput
bdput (the last one)
bdev_evict_inode
access bdev => use after free
[ 459.350216] BUG: KASAN: use-after-free in __lock_acquire+0x24c1/0x31b0
[ 459.351190] Read of size 8 at addr
ffff88806c815a80 by task syz-executor.0/20132
[ 459.352347]
[ 459.352594] CPU: 0 PID: 20132 Comm: syz-executor.0 Not tainted 4.19.90 #2
[ 459.353628] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[ 459.354947] Call Trace:
[ 459.355337] dump_stack+0x111/0x19e
[ 459.355879] ? __lock_acquire+0x24c1/0x31b0
[ 459.356523] print_address_description+0x60/0x223
[ 459.357248] ? __lock_acquire+0x24c1/0x31b0
[ 459.357887] kasan_report.cold+0xae/0x2d8
[ 459.358503] __lock_acquire+0x24c1/0x31b0
[ 459.359120] ? _raw_spin_unlock_irq+0x24/0x40
[ 459.359784] ? lockdep_hardirqs_on+0x37b/0x580
[ 459.360465] ? _raw_spin_unlock_irq+0x24/0x40
[ 459.361123] ? finish_task_switch+0x125/0x600
[ 459.361812] ? finish_task_switch+0xee/0x600
[ 459.362471] ? mark_held_locks+0xf0/0xf0
[ 459.363108] ? __schedule+0x96f/0x21d0
[ 459.363716] lock_acquire+0x111/0x320
[ 459.364285] ? blkdev_get+0xce/0xbe0
[ 459.364846] ? blkdev_get+0xce/0xbe0
[ 459.365390] __mutex_lock+0xf9/0x12a0
[ 459.365948] ? blkdev_get+0xce/0xbe0
[ 459.366493] ? bdev_evict_inode+0x1f0/0x1f0
[ 459.367130] ? blkdev_get+0xce/0xbe0
[ 459.367678] ? destroy_inode+0xbc/0x110
[ 459.368261] ? mutex_trylock+0x1a0/0x1a0
[ 459.368867] ? __blkdev_get+0x3e6/0x1280
[ 459.369463] ? bdev_disk_changed+0x1d0/0x1d0
[ 459.370114] ? blkdev_get+0xce/0xbe0
[ 459.370656] blkdev_get+0xce/0xbe0
[ 459.371178] ? find_held_lock+0x2c/0x110
[ 459.371774] ? __blkdev_get+0x1280/0x1280
[ 459.372383] ? lock_downgrade+0x680/0x680
[ 459.373002] ? lock_acquire+0x111/0x320
[ 459.373587] ? bd_acquire+0x21/0x2c0
[ 459.374134] ? do_raw_spin_unlock+0x4f/0x250
[ 459.374780] blkdev_open+0x202/0x290
[ 459.375325] do_dentry_open+0x49e/0x1050
[ 459.375924] ? blkdev_get_by_dev+0x70/0x70
[ 459.376543] ? __x64_sys_fchdir+0x1f0/0x1f0
[ 459.377192] ? inode_permission+0xbe/0x3a0
[ 459.377818] path_openat+0x148c/0x3f50
[ 459.378392] ? kmem_cache_alloc+0xd5/0x280
[ 459.379016] ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 459.379802] ? path_lookupat.isra.0+0x900/0x900
[ 459.380489] ? __lock_is_held+0xad/0x140
[ 459.381093] do_filp_open+0x1a1/0x280
[ 459.381654] ? may_open_dev+0xf0/0xf0
[ 459.382214] ? find_held_lock+0x2c/0x110
[ 459.382816] ? lock_downgrade+0x680/0x680
[ 459.383425] ? __lock_is_held+0xad/0x140
[ 459.384024] ? do_raw_spin_unlock+0x4f/0x250
[ 459.384668] ? _raw_spin_unlock+0x1f/0x30
[ 459.385280] ? __alloc_fd+0x448/0x560
[ 459.385841] do_sys_open+0x3c3/0x500
[ 459.386386] ? filp_open+0x70/0x70
[ 459.386911] ? trace_hardirqs_on_thunk+0x1a/0x1c
[ 459.387610] ? trace_hardirqs_off_caller+0x55/0x1c0
[ 459.388342] ? do_syscall_64+0x1a/0x520
[ 459.388930] do_syscall_64+0xc3/0x520
[ 459.389490] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 459.390248] RIP: 0033:0x416211
[ 459.390720] Code: 75 14 b8 02 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83
04 19 00 00 c3 48 83 ec 08 e8 0a fa ff ff 48 89 04 24 b8 02 00 00 00 0f
05 <48> 8b 3c 24 48 89 c2 e8 53 fa ff ff 48 89 d0 48 83 c4 08 48 3d
01
[ 459.393483] RSP: 002b:
00007fe45dfe9a60 EFLAGS:
00000293 ORIG_RAX:
0000000000000002
[ 459.394610] RAX:
ffffffffffffffda RBX:
00007fe45dfea6d4 RCX:
0000000000416211
[ 459.395678] RDX:
00007fe45dfe9b0a RSI:
0000000000000002 RDI:
00007fe45dfe9b00
[ 459.396758] RBP:
000000000076bf20 R08:
0000000000000000 R09:
000000000000000a
[ 459.397930] R10:
0000000000000075 R11:
0000000000000293 R12:
00000000ffffffff
[ 459.399022] R13:
0000000000000bd9 R14:
00000000004cdb80 R15:
000000000076bf2c
[ 459.400168]
[ 459.400430] Allocated by task 20132:
[ 459.401038] kasan_kmalloc+0xbf/0xe0
[ 459.401652] kmem_cache_alloc+0xd5/0x280
[ 459.402330] bdev_alloc_inode+0x18/0x40
[ 459.402970] alloc_inode+0x5f/0x180
[ 459.403510] iget5_locked+0x57/0xd0
[ 459.404095] bdget+0x94/0x4e0
[ 459.404607] bd_acquire+0xfa/0x2c0
[ 459.405113] blkdev_open+0x110/0x290
[ 459.405702] do_dentry_open+0x49e/0x1050
[ 459.406340] path_openat+0x148c/0x3f50
[ 459.406926] do_filp_open+0x1a1/0x280
[ 459.407471] do_sys_open+0x3c3/0x500
[ 459.408010] do_syscall_64+0xc3/0x520
[ 459.408572] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 459.409415]
[ 459.409679] Freed by task 1262:
[ 459.410212] __kasan_slab_free+0x129/0x170
[ 459.410919] kmem_cache_free+0xb2/0x2a0
[ 459.411564] rcu_process_callbacks+0xbb2/0x2320
[ 459.412318] __do_softirq+0x225/0x8ac
Fix this by delaying bdput() to the end of blkdev_get() which means we
have finished accessing bdev.
Fixes:
77ea887e433a ("implement in-kernel gendisk events handling")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: I642bd3c3224cc174ba4bdd57133b1a17fb35dccd
CVE-2020-15436
Signed-off-by: Kevin F. Haggerty <haggertk@lineageos.org>
Kevin F. Haggerty [Fri, 4 Jun 2021 02:33:21 +0000 (20:33 -0600)]
fs: sdfat: __sdfat_submit_bio_write() should always submit a WRITE
sdfat version 2.4.5 introduced wbc_to_write_flags() as a compat
function for legacy kernels. Unfortunately, as the legacy
__sdfat_submit_bio_write() path was adapted to take advantage of
wbc_to_write_flags(), it used the return value direcly as the
rw paramater to submit_bio(), and the default return value is 0
(READ). This becomes a problem later when write paths assert that
bio_data_dir() is WRITE (i.e., write bit of bio->bi_rw is set).
Pass a bitwise-or of WRITE and the return value of wbc_to_write_flags()
to submit_bio() on this legacy path. This logically unifies the
approach with the newer path here and directly with the legacy path of
__sdfat_submit_bio_write2() in mpage.c.
Fixes:
afd489c6a8c8 ("fs: sdfat: Update to version 2.4.5")
Signed-off-by: Kevin F. Haggerty <haggertk@lineageos.org>
Change-Id: Ic7b937a467e9e116c3390c6ac0e009a59229b93c
Sergey Senozhatsky [Fri, 20 May 2016 23:59:51 +0000 (16:59 -0700)]
zram: user per-cpu compression streams
Remove idle streams list and keep compression streams in per-cpu data.
This removes two contented spin_lock()/spin_unlock() calls from write
path and also prevent write OP from being preempted while holding the
compression stream, which can cause slow downs.
For instance, let's assume that we have N cpus and N-2
max_comp_streams.TASK1 owns the last idle stream, TASK2-TASK3 come in
with the write requests:
TASK1 TASK2 TASK3
zram_bvec_write()
spin_lock
find stream
spin_unlock
compress
<<preempted>> zram_bvec_write()
spin_lock
find stream
spin_unlock
no_stream
schedule
zram_bvec_write()
spin_lock
find_stream
spin_unlock
no_stream
schedule
spin_lock
release stream
spin_unlock
wake up TASK2
not only TASK2 and TASK3 will not get the stream, TASK1 will be
preempted in the middle of its operation; while we would prefer it to
finish compression and release the stream.
Test environment: x86_64, 4 CPU box, 3G zram, lzo
The following fio tests were executed:
read, randread, write, randwrite, rw, randrw
with the increasing number of jobs from 1 to 10.
4 streams 8 streams per-cpu
===========================================================
jobs1
READ: 2520.1MB/s 2566.5MB/s 2491.5MB/s
READ: 2102.7MB/s 2104.2MB/s 2091.3MB/s
WRITE: 1355.1MB/s 1320.2MB/s 1378.9MB/s
WRITE: 1103.5MB/s 1097.2MB/s 1122.5MB/s
READ: 434013KB/s 435153KB/s 439961KB/s
WRITE: 433969KB/s 435109KB/s 439917KB/s
READ: 403166KB/s 405139KB/s 403373KB/s
WRITE: 403223KB/s 405197KB/s 403430KB/s
jobs2
READ: 7958.6MB/s 8105.6MB/s 8073.7MB/s
READ: 6864.9MB/s 6989.8MB/s 7021.8MB/s
WRITE: 2438.1MB/s 2346.9MB/s 3400.2MB/s
WRITE: 1994.2MB/s 1990.3MB/s 2941.2MB/s
READ: 981504KB/s 973906KB/s 1018.8MB/s
WRITE: 981659KB/s 974060KB/s 1018.1MB/s
READ: 937021KB/s 938976KB/s 987250KB/s
WRITE: 934878KB/s 936830KB/s 984993KB/s
jobs3
READ: 13280MB/s 13553MB/s 13553MB/s
READ: 11534MB/s 11785MB/s 11755MB/s
WRITE: 3456.9MB/s 3469.9MB/s 4810.3MB/s
WRITE: 3029.6MB/s 3031.6MB/s 4264.8MB/s
READ: 1363.8MB/s 1362.6MB/s 1448.9MB/s
WRITE: 1361.9MB/s 1360.7MB/s 1446.9MB/s
READ: 1309.4MB/s 1310.6MB/s 1397.5MB/s
WRITE: 1307.4MB/s 1308.5MB/s 1395.3MB/s
jobs4
READ: 20244MB/s 20177MB/s 20344MB/s
READ: 17886MB/s 17913MB/s 17835MB/s
WRITE: 4071.6MB/s 4046.1MB/s 6370.2MB/s
WRITE: 3608.9MB/s 3576.3MB/s 5785.4MB/s
READ: 1824.3MB/s 1821.6MB/s 1997.5MB/s
WRITE: 1819.8MB/s 1817.4MB/s 1992.5MB/s
READ: 1765.7MB/s 1768.3MB/s 1937.3MB/s
WRITE: 1767.5MB/s 1769.1MB/s 1939.2MB/s
jobs5
READ: 18663MB/s 18986MB/s 18823MB/s
READ: 16659MB/s 16605MB/s 16954MB/s
WRITE: 3912.4MB/s 3888.7MB/s 6126.9MB/s
WRITE: 3506.4MB/s 3442.5MB/s 5519.3MB/s
READ: 1798.2MB/s 1746.5MB/s 1935.8MB/s
WRITE: 1792.7MB/s 1740.7MB/s 1929.1MB/s
READ: 1727.6MB/s 1658.2MB/s 1917.3MB/s
WRITE: 1726.5MB/s 1657.2MB/s 1916.6MB/s
jobs6
READ: 21017MB/s 20922MB/s 21162MB/s
READ: 19022MB/s 19140MB/s 18770MB/s
WRITE: 3968.2MB/s 4037.7MB/s 6620.8MB/s
WRITE: 3643.5MB/s 3590.2MB/s 6027.5MB/s
READ: 1871.8MB/s 1880.5MB/s 2049.9MB/s
WRITE: 1867.8MB/s 1877.2MB/s 2046.2MB/s
READ: 1755.8MB/s 1710.3MB/s 1964.7MB/s
WRITE: 1750.5MB/s 1705.9MB/s 1958.8MB/s
jobs7
READ: 21103MB/s 20677MB/s 21482MB/s
READ: 18522MB/s 18379MB/s 19443MB/s
WRITE: 4022.5MB/s 4067.4MB/s 6755.9MB/s
WRITE: 3691.7MB/s 3695.5MB/s 5925.6MB/s
READ: 1841.5MB/s 1933.9MB/s 2090.5MB/s
WRITE: 1842.7MB/s 1935.3MB/s 2091.9MB/s
READ: 1832.4MB/s 1856.4MB/s 1971.5MB/s
WRITE: 1822.3MB/s 1846.2MB/s 1960.6MB/s
jobs8
READ: 20463MB/s 20194MB/s 20862MB/s
READ: 18178MB/s 17978MB/s 18299MB/s
WRITE: 4085.9MB/s 4060.2MB/s 7023.8MB/s
WRITE: 3776.3MB/s 3737.9MB/s 6278.2MB/s
READ: 1957.6MB/s 1944.4MB/s 2109.5MB/s
WRITE: 1959.2MB/s 1946.2MB/s 2111.4MB/s
READ: 1900.6MB/s 1885.7MB/s 2082.1MB/s
WRITE: 1896.2MB/s 1881.4MB/s 2078.3MB/s
jobs9
READ: 19692MB/s 19734MB/s 19334MB/s
READ: 17678MB/s 18249MB/s 17666MB/s
WRITE: 4004.7MB/s 4064.8MB/s 6990.7MB/s
WRITE: 3724.7MB/s 3772.1MB/s 6193.6MB/s
READ: 1953.7MB/s 1967.3MB/s 2105.6MB/s
WRITE: 1953.4MB/s 1966.7MB/s 2104.1MB/s
READ: 1860.4MB/s 1897.4MB/s 2068.5MB/s
WRITE: 1858.9MB/s 1895.9MB/s 2066.8MB/s
jobs10
READ: 19730MB/s 19579MB/s 19492MB/s
READ: 18028MB/s 18018MB/s 18221MB/s
WRITE: 4027.3MB/s 4090.6MB/s 7020.1MB/s
WRITE: 3810.5MB/s 3846.8MB/s 6426.8MB/s
READ: 1956.1MB/s 1994.6MB/s 2145.2MB/s
WRITE: 1955.9MB/s 1993.5MB/s 2144.8MB/s
READ: 1852.8MB/s 1911.6MB/s 2075.8MB/s
WRITE: 1855.7MB/s 1914.6MB/s 2078.1MB/s
perf stat
4 streams 8 streams per-cpu
====================================================================================================================
jobs1
stalled-cycles-frontend 23,174,811,209 ( 38.21%) 23,220,254,188 ( 38.25%) 23,061,406,918 ( 38.34%)
stalled-cycles-backend 11,514,174,638 ( 18.98%) 11,696,722,657 ( 19.27%) 11,370,852,810 ( 18.90%)
instructions 73,925,005,782 ( 1.22) 73,903,177,632 ( 1.22) 73,507,201,037 ( 1.22)
branches 14,455,124,835 ( 756.063) 14,455,184,779 ( 755.281) 14,378,599,509 ( 758.546)
branch-misses 69,801,336 ( 0.48%) 80,225,529 ( 0.55%) 72,044,726 ( 0.50%)
jobs2
stalled-cycles-frontend 49,912,741,782 ( 46.11%) 50,101,189,290 ( 45.95%) 32,874,195,633 ( 35.11%)
stalled-cycles-backend 27,080,366,230 ( 25.02%) 27,949,970,232 ( 25.63%) 16,461,222,706 ( 17.58%)
instructions 122,831,629,690 ( 1.13) 122,919,846,419 ( 1.13) 121,924,786,775 ( 1.30)
branches 23,725,889,239 ( 692.663) 23,733,547,140 ( 688.062) 23,553,950,311 ( 794.794)
branch-misses 90,733,041 ( 0.38%) 96,320,895 ( 0.41%) 84,561,092 ( 0.36%)
jobs3
stalled-cycles-frontend 66,437,834,608 ( 45.58%) 63,534,923,344 ( 43.69%) 42,101,478,505 ( 33.19%)
stalled-cycles-backend 34,940,799,661 ( 23.97%) 34,774,043,148 ( 23.91%) 21,163,324,388 ( 16.68%)
instructions 171,692,121,862 ( 1.18) 171,775,373,044 ( 1.18) 170,353,542,261 ( 1.34)
branches 32,968,962,622 ( 628.723) 32,987,739,894 ( 630.512) 32,729,463,918 ( 717.027)
branch-misses 111,522,732 ( 0.34%) 110,472,894 ( 0.33%) 99,791,291 ( 0.30%)
jobs4
stalled-cycles-frontend 98,741,701,675 ( 49.72%) 94,797,349,965 ( 47.59%) 54,535,655,381 ( 33.53%)
stalled-cycles-backend 54,642,609,615 ( 27.51%) 55,233,554,408 ( 27.73%) 27,882,323,541 ( 17.14%)
instructions 220,884,807,851 ( 1.11) 220,930,887,273 ( 1.11) 218,926,845,851 ( 1.35)
branches 42,354,518,180 ( 592.105) 42,362,770,587 ( 590.452) 41,955,552,870 ( 716.154)
branch-misses 138,093,449 ( 0.33%) 131,295,286 ( 0.31%) 121,794,771 ( 0.29%)
jobs5
stalled-cycles-frontend 116,219,747,212 ( 48.14%) 110,310,397,012 ( 46.29%) 66,373,082,723 ( 33.70%)
stalled-cycles-backend 66,325,434,776 ( 27.48%) 64,157,087,914 ( 26.92%) 32,999,097,299 ( 16.76%)
instructions 270,615,008,466 ( 1.12) 270,546,409,525 ( 1.14) 268,439,910,948 ( 1.36)
branches 51,834,046,557 ( 599.108) 51,811,867,722 ( 608.883) 51,412,576,077 ( 729.213)
branch-misses 158,197,086 ( 0.31%) 142,639,805 ( 0.28%) 133,425,455 ( 0.26%)
jobs6
stalled-cycles-frontend 138,009,414,492 ( 48.23%) 139,063,571,254 ( 48.80%) 75,278,568,278 ( 32.80%)
stalled-cycles-backend 79,211,949,650 ( 27.68%) 79,077,241,028 ( 27.75%) 37,735,797,899 ( 16.44%)
instructions 319,763,993,731 ( 1.12) 319,937,782,834 ( 1.12) 316,663,600,784 ( 1.38)
branches 61,219,433,294 ( 595.056) 61,250,355,540 ( 598.215) 60,523,446,617 ( 733.706)
branch-misses 169,257,123 ( 0.28%) 154,898,028 ( 0.25%) 141,180,587 ( 0.23%)
jobs7
stalled-cycles-frontend 162,974,812,119 ( 49.20%) 159,290,061,987 ( 48.43%) 88,046,641,169 ( 33.21%)
stalled-cycles-backend 92,223,151,661 ( 27.84%) 91,667,904,406 ( 27.87%) 44,068,454,971 ( 16.62%)
instructions 369,516,432,430 ( 1.12) 369,361,799,063 ( 1.12) 365,290,380,661 ( 1.38)
branches 70,795,673,950 ( 594.220) 70,743,136,124 ( 597.876) 69,803,996,038 ( 732.822)
branch-misses 181,708,327 ( 0.26%) 165,767,821 ( 0.23%) 150,109,797 ( 0.22%)
jobs8
stalled-cycles-frontend 185,000,017,027 ( 49.30%) 182,334,345,473 ( 48.37%) 99,980,147,041 ( 33.26%)
stalled-cycles-backend 105,753,516,186 ( 28.18%) 107,937,830,322 ( 28.63%) 51,404,177,181 ( 17.10%)
instructions 418,153,161,055 ( 1.11) 418,308,565,828 ( 1.11) 413,653,475,581 ( 1.38)
branches 80,035,882,398 ( 592.296) 80,063,204,510 ( 589.843) 79,024,105,589 ( 730.530)
branch-misses 199,764,528 ( 0.25%) 177,936,926 ( 0.22%) 160,525,449 ( 0.20%)
jobs9
stalled-cycles-frontend 210,941,799,094 ( 49.63%) 204,714,679,254 ( 48.55%) 114,251,113,756 ( 33.96%)
stalled-cycles-backend 122,640,849,067 ( 28.85%) 122,188,553,256 ( 28.98%) 58,360,041,127 ( 17.35%)
instructions 468,151,025,415 ( 1.10) 467,354,869,323 ( 1.11) 462,665,165,216 ( 1.38)
branches 89,657,067,510 ( 585.628) 89,411,550,407 ( 588.990) 88,360,523,943 ( 730.151)
branch-misses 218,292,301 ( 0.24%) 191,701,247 ( 0.21%) 178,535,678 ( 0.20%)
jobs10
stalled-cycles-frontend 233,595,958,008 ( 49.81%) 227,540,615,689 ( 49.11%) 160,341,979,938 ( 43.07%)
stalled-cycles-backend 136,153,676,021 ( 29.03%) 133,635,240,742 ( 28.84%) 65,909,135,465 ( 17.70%)
instructions 517,001,168,497 ( 1.10) 516,210,976,158 ( 1.11) 511,374,038,613 ( 1.37)
branches 98,911,641,329 ( 585.796) 98,700,069,712 ( 591.583) 97,646,761,028 ( 728.712)
branch-misses 232,341,823 ( 0.23%) 199,256,308 ( 0.20%) 183,135,268 ( 0.19%)
per-cpu streams tend to cause significantly less stalled cycles; execute
less branches and hit less branch-misses.
perf stat reported execution time
4 streams 8 streams per-cpu
====================================================================
jobs1
seconds elapsed 20.
909073870 20.
875670495 20.
817838540
jobs2
seconds elapsed 18.
529488399 18.
720566469 16.
356103108
jobs3
seconds elapsed 18.
991159531 18.
991340812 16.
766216066
jobs4
seconds elapsed 19.
560643828 19.
551323547 16.
246621715
jobs5
seconds elapsed 24.
746498464 25.
221646740 20.
696112444
jobs6
seconds elapsed 28.
258181828 28.
289765505 22.
885688857
jobs7
seconds elapsed 32.
632490241 31.
909125381 26.
272753738
jobs8
seconds elapsed 35.
651403851 36.
027596308 29.
108024711
jobs9
seconds elapsed 40.
569362365 40.
024227989 32.
898204012
jobs10
seconds elapsed 44.
673112304 43.
874898137 35.
632952191
Please see
Link: http://marc.info/?l=linux-kernel&m=146166970727530
Link: http://marc.info/?l=linux-kernel&m=146174716719650
for more test results (under low memory conditions).
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Suggested-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: I433a03617eda51d3ad3b3ffe93c5e1794d477893
Sergey Senozhatsky [Fri, 20 May 2016 23:59:48 +0000 (16:59 -0700)]
zsmalloc: require GFP in zs_malloc()
Pass GFP flags to zs_malloc() instead of using a fixed mask supplied to
zs_create_pool(), so we can be more flexible, but, more importantly, we
need this to switch zram to per-cpu compression streams -- zram will try
to allocate handle with preemption disabled in a fast path and switch to
a slow path (using different gfp mask) if the fast one has failed.
Apart from that, this also align zs_malloc() interface with zspool/zbud.
[sergey.senozhatsky@gmail.com: pass GFP flags to zs_malloc() instead of using a fixed mask]
Link: http://lkml.kernel.org/r/20160429150942.GA637@swordfish
Link: http://lkml.kernel.org/r/20160429150942.GA637@swordfish
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: I58cd72a0b1925203ec7d9a3962cc6c3422e8ac96
Sergey Senozhatsky [Thu, 14 Jan 2016 23:22:35 +0000 (15:22 -0800)]
zram/zcomp: do not zero out zcomp private pages
Do not __GFP_ZERO allocated zcomp ->private pages. We keep allocated
streams around and use them for read/write requests, so we supply a
zeroed out ->private to compression algorithm as a scratch buffer only
once -- the first time we use that stream. For the rest of IO requests
served by this stream ->private usually contains some temporarily data
from the previous requests.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: Icf20e8a707f7c925952034bc691e6ae593ef6f22
Minchan Kim [Thu, 14 Jan 2016 23:22:32 +0000 (15:22 -0800)]
zram: pass gfp from zcomp frontend to backend
Each zcomp backend uses own gfp flag but it's pointless because the
context they could be called is driven by upper layer(ie, zcomp
frontend). As well, zcomp frondend could call them in different
context. One context(ie, zram init part) is it should be better to make
sure successful allocation other context(ie, further stream allocation
part for accelarating I/O speed) is just optional so let's pass gfp down
from driver (ie, zcomp frontend) like normal MM convention.
[sergey.senozhatsky@gmail.com: add missing __vmalloc zero and highmem gfps]
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: Ie60f5399a4ec8ad54a0cd7575977d79e6eda6e43
Sergey Senozhatsky [Fri, 14 Aug 2015 22:35:19 +0000 (15:35 -0700)]
zram: fix pool name truncation
zram_meta_alloc() constructs a pool name for zs_create_pool() call as
snprintf(pool_name, sizeof(pool_name), "zram%d", device_id);
However, it defines pool name buffer to be only 8 bytes long (minus
trailing zero), which means that we can have only 1000 pool names: zram0
-- zram999.
With CONFIG_ZSMALLOC_STAT enabled an attempt to create a device zram1000
can fail if device zram100 already exists, because snprintf() will
truncate new pool name to zram100 and pass it debugfs_create_dir(),
causing:
debugfs dir <zram100> creation failed
zram: Error creating memory pool
... and so on.
Fix it by passing zram->disk->disk_name to zram_meta_alloc() instead of
divice_id. We construct zram%d name earlier and keep it as a ->disk_name,
no need to snprintf() it again.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: I4c42cc1134b09f93a824d15e589e66fee28af539
Sergey Senozhatsky [Thu, 25 Jun 2015 22:00:32 +0000 (15:00 -0700)]
zram: check comp algorithm availability earlier
Improvement idea by Marcin Jabrzyk.
comp_algorithm_store() silently accepts any supplied algorithm name,
because zram performs algorithm availability check later, during the
device configuration phase in disksize_store() and emits the following
error:
"zram: Cannot initialise %s compressing backend"
this error line is somewhat generic and, besides, can indicate a failed
attempt to allocate compression backend's working buffers.
add algorithm availability check to comp_algorithm_store():
echo lzz > /sys/block/zram0/comp_algorithm
-bash: echo: write error: Invalid argument
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Marcin Jabrzyk <m.jabrzyk@samsung.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: Iece40446b18e7e37fa4ced3459d40ddbb9984297
Sergey Senozhatsky [Thu, 25 Jun 2015 22:00:29 +0000 (15:00 -0700)]
zram: cut trailing newline in algorithm name
Supplied sysfs values sometimes contain new-line symbols (echo vs. echo
-n), which we also copy as a compression algorithm name. it works fine
when we lookup for compression algorithm, because we use sysfs_streq()
which takes care of new line symbols. however, it doesn't look nice when
we print compression algorithm name if zcomp_create() failed:
zram: Cannot initialise LXZ
compressing backend
cut trailing new-line, so the error string will look like
zram: Cannot initialise LXZ compressing backend
we also now can replace sysfs_streq() in zcomp_available_show() with
strcmp().
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: If00faf29cca63fc20c8caea57ababe97a16c4a7a
Sergey Senozhatsky [Thu, 25 Jun 2015 22:00:27 +0000 (15:00 -0700)]
zram: cosmetic zram_bvec_write() cleanup
`bool locked' local variable tells us if we should perform
zcomp_strm_release() or not (jumped to `out' label before
zcomp_strm_find() occurred), which is equivalent to `zstrm' being or not
being NULL. remove `locked' and check `zstrm' instead.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: Id55b0c47aa6b508ac5138f90fafd92a82d2fe9d8
Danny Wood [Tue, 9 Feb 2021 15:43:46 +0000 (15:43 +0000)]
universal7580: abov_touchkey: Fix firmware loading of ft1604 devices
Also fix ft1604 firmware versions of other A510x variants
Change-Id: I33004e2f62b5954bf7b4f2ab6337c0ce39907287
Danny Wood [Fri, 10 Sep 2021 08:48:34 +0000 (09:48 +0100)]
universal7580: Enable FB_WINDOW_UPDATE on all devices
Change-Id: I66f2a7fb7e0128518e3df75f0b9653500169334d
Danny Wood [Fri, 27 Aug 2021 15:42:04 +0000 (16:42 +0100)]
universal7580: Backport LZ4 compression from Linux 4.1.52
* Enable it in our configs
Change-Id: Ideb285d3678e479ede33b56e30e99062ec4884d6
Danny Wood [Tue, 3 Aug 2021 13:48:01 +0000 (14:48 +0100)]
universal7580: enable ZRAM
Change-Id: Icc376054fee0160ceb35edd09e5e733dff5eb52a
Ganesh Mahendran [Thu, 12 Feb 2015 23:00:51 +0000 (15:00 -0800)]
mm/zpool: add name argument to create zpool
Currently the underlay of zpool: zsmalloc/zbud, do not know who creates
them. There is not a method to let zsmalloc/zbud find which caller they
belong to.
Now we want to add statistics collection in zsmalloc. We need to name the
debugfs dir for each pool created. The way suggested by Minchan Kim is to
use a name passed by caller(such as zram) to create the zsmalloc pool.
/sys/kernel/debug/zsmalloc/zram0
This patch adds an argument `name' to zs_create_pool() and other related
functions.
Change-Id: Ic197610141c0dc2711c239db01f735a938bf3808
Signed-off-by: Ganesh Mahendran <opensource.ganesh@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Seth Jennings <sjennings@variantweb.net>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sultan Qasim Khan [Sun, 6 Sep 2015 21:59:58 +0000 (17:59 -0400)]
block: zram: Backport from Linux 4.1.52
Change-Id: I23f6f75979077992298d848efd79a6efc0d776bd
Sultan Qasim Khan [Sun, 6 Sep 2015 22:12:58 +0000 (18:12 -0400)]
mm: zsmalloc: backport from Linux 4.1.52
Change-Id: I3960e31f889d643e87b99fe7a88a1e0ca402d6cd
Danny Wood [Mon, 10 May 2021 10:16:01 +0000 (11:16 +0100)]
Add initial support for the Samsung On8 (J710FNDDU1BRJ1)
Change-Id: Ie7ca517efb81c758499d08a4329d46fb089c960c
Matt Wagantall [Thu, 2 Jul 2015 02:43:51 +0000 (19:43 -0700)]
net: PPPoPNS: use updated data_ready API definition
data_ready(struct sock *) no longer takes a length argument.
Remove length argument in the implementation of this function
in PPPoPNS. It was never used anyway.
Change-Id: I457cedd375a490dd85e60f172b4122b4c7ba36f0
Signed-off-by: Matt Wagantall <mattw@codeaurora.org>
Eric Dumazet [Fri, 22 May 2015 04:51:19 +0000 (21:51 -0700)]
tcp: fix a potential deadlock in tcp_get_info()
Taking socket spinlock in tcp_get_info() can deadlock, as
inet_diag_dump_icsk() holds the &hashinfo->ehash_locks[i],
while packet processing can use the reverse locking order.
We could avoid this locking for TCP_LISTEN states, but lockdep would
certainly get confused as all TCP sockets share same lockdep classes.
[ 523.722504] ======================================================
[ 523.728706] [ INFO: possible circular locking dependency detected ]
[ 523.734990] 4.1.0-dbg-DEV #1676 Not tainted
[ 523.739202] -------------------------------------------------------
[ 523.745474] ss/18032 is trying to acquire lock:
[ 523.750002] (slock-AF_INET){+.-...}, at: [<
ffffffff81669d44>] tcp_get_info+0x2c4/0x360
[ 523.758129]
[ 523.758129] but task is already holding lock:
[ 523.763968] (&(&hashinfo->ehash_locks[i])->rlock){+.-...}, at: [<
ffffffff816bcb75>] inet_diag_dump_icsk+0x1d5/0x6c0
[ 523.774661]
[ 523.774661] which lock already depends on the new lock.
[ 523.774661]
[ 523.782850]
[ 523.782850] the existing dependency chain (in reverse order) is:
[ 523.790326]
-> #1 (&(&hashinfo->ehash_locks[i])->rlock){+.-...}:
[ 523.796599] [<
ffffffff811126bb>] lock_acquire+0xbb/0x270
[ 523.802565] [<
ffffffff816f5868>] _raw_spin_lock+0x38/0x50
[ 523.808628] [<
ffffffff81665af8>] __inet_hash_nolisten+0x78/0x110
[ 523.815273] [<
ffffffff816819db>] tcp_v4_syn_recv_sock+0x24b/0x350
[ 523.822067] [<
ffffffff81684d41>] tcp_check_req+0x3c1/0x500
[ 523.828199] [<
ffffffff81682d09>] tcp_v4_do_rcv+0x239/0x3d0
[ 523.834331] [<
ffffffff816842fe>] tcp_v4_rcv+0xa8e/0xc10
[ 523.840202] [<
ffffffff81658fa3>] ip_local_deliver_finish+0x133/0x3e0
[ 523.847214] [<
ffffffff81659a9a>] ip_local_deliver+0xaa/0xc0
[ 523.853440] [<
ffffffff816593b8>] ip_rcv_finish+0x168/0x5c0
[ 523.859624] [<
ffffffff81659db7>] ip_rcv+0x307/0x420
Lets use u64_sync infrastructure instead. As a bonus, 64bit
arches get optimized, as these are nop for them.
Fixes:
0df48c26d841 ("tcp: add tcpi_bytes_acked to tcp_info")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: Ic33ba5a9c4ca5dfd3d2224b7c0ed9fbe9eccd0ca
Marcelo Ricardo Leitner [Wed, 20 May 2015 23:35:41 +0000 (16:35 -0700)]
tcp: add tcpi_segs_in and tcpi_segs_out to tcp_info
This patch tracks the total number of inbound and outbound segments on a
TCP socket. One may use this number to have an idea on connection
quality when compared against the retransmissions.
RFC4898 named these : tcpEStatsPerfSegsIn and tcpEStatsPerfSegsOut
These are a 32bit field each and can be fetched both from TCP_INFO
getsockopt() if one has a handle on a TCP socket, or from inet_diag
netlink facility (iproute2/ss patch will follow)
Note that tp->segs_out was placed near tp->snd_nxt for good data
locality and minimal performance impact, while tp->segs_in was placed
near tp->bytes_received for the same reason.
Join work with Eric Dumazet.
Note that received SYN are accounted on the listener, but sent SYNACK
are not accounted.
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: I7bf746ad3f9fcc01f672f7dc110aa0c43d9ee042
Eric Dumazet [Tue, 28 Apr 2015 22:28:18 +0000 (15:28 -0700)]
tcp: add tcpi_bytes_received to tcp_info
This patch tracks total number of payload bytes received on a TCP socket.
This is the sum of all changes done to tp->rcv_nxt
RFC4898 named this : tcpEStatsAppHCThruOctetsReceived
This is a 64bit field, and can be fetched both from TCP_INFO
getsockopt() if one has a handle on a TCP socket, or from inet_diag
netlink facility (iproute2/ss patch will follow)
Note that tp->bytes_received was placed near tp->rcv_nxt for
best data locality and minimal performance impact.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Eric Salo <salo@google.com>
Cc: Martin Lau <kafai@fb.com>
Cc: Chris Rapier <rapier@psc.edu>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: Ieaf3d4c7d1bea0ac8906cc0702bc22c73314c889
David S. Miller [Fri, 11 Apr 2014 20:15:36 +0000 (16:15 -0400)]
BACKPORT net: Fix use after free by removing length arg from sk_data_ready callbacks.
Several spots in the kernel perform a sequence like:
skb_queue_tail(&sk->s_receive_queue, skb);
sk->sk_data_ready(sk, skb->len);
But at the moment we place the SKB onto the socket receive queue it
can be consumed and freed up. So this skb->len access is potentially
to freed up memory.
Furthermore, the skb->len can be modified by the consumer so it is
possible that the value isn't accurate.
And finally, no actual implementation of this callback actually uses
the length argument. And since nobody actually cared about it's
value, lots of call sites pass arbitrary values in such as '0' and
even '1'.
So just remove the length argument from the callback, that way there
is no confusion whatsoever and all of these use-after-free cases get
fixed as a side effect.
Based upon a patch by Eric Dumazet and his suggestion to audit this
issue tree-wide.
Change-Id: Ia7443b38da20849b684957049812805ff25ab1f0
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: elektroschmock <elektroschmock78@googlemail.com>
Tariq Saeed [Thu, 3 Apr 2014 21:47:11 +0000 (14:47 -0700)]
ocfs2/o2net: o2net_listen_data_ready should do nothing if socket state is not TCP_LISTEN
Orabug:
17330860
When accepting an incomming connection o2net_accept_one clones a child
data socket from the parent listening socket. It then proceeds to setup
the child with callback o2net_data_ready() and sk_user_data to NULL. If
data arrives in this window, o2net_listen_data_ready will be called with
some non-deterministic value in sk_user_data (not inherited). We panic
when we page fault on sk_user_data -- in parent it is
sock_def_readable().
The fix is to recognize that this is a data socket being set up by
looking at the socket state and do nothing.
Change-Id: Iffbed59cbec1499e67e6b41ddfdcdc663957c6f5
Signed-off-by: Tariq Saseed <tariq.x.saeed@oracle.com>
Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Dumazet [Tue, 28 Apr 2015 22:28:17 +0000 (15:28 -0700)]
tcp: add tcpi_bytes_acked to tcp_info
This patch tracks total number of bytes acked for a TCP socket.
This is the sum of all changes done to tp->snd_una, and allows
for precise tracking of delivered data.
RFC4898 named this : tcpEStatsAppHCThruOctetsAcked
This is a 64bit field, and can be fetched both from TCP_INFO
getsockopt() if one has a handle on a TCP socket, or from inet_diag
netlink facility (iproute2/ss patch will follow)
Note that tp->bytes_acked was placed near tp->snd_una for
best data locality and minimal performance impact.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Eric Salo <salo@google.com>
Cc: Martin Lau <kafai@fb.com>
Cc: Chris Rapier <rapier@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: I4c65cc5034eb486ebd6c36783b35cab3d311be34
Eric Dumazet [Thu, 13 Feb 2014 22:27:40 +0000 (14:27 -0800)]
tcp: add pacing_rate information into tcp_info
Add two new fields to struct tcp_info, to report sk_pacing_rate
and sk_max_pacing_rate to monitoring applications, as ss from iproute2.
User exported fields are 64bit, even if kernel is currently using 32bit
fields.
lpaa5:~# ss -i
..
skmem:(r0,rb357120,t0,tb2097152,f1584,w1980880,o0,bl0) ts sack cubic
wscale:6,6 rto:400 rtt:0.875/0.75 mss:1448 cwnd:1 ssthresh:12 send
13.2Mbps pacing_rate 3336.2Mbps unacked:15 retrans:1/5448 lost:15
rcv_space:29200
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: Ib4f3ace4059fa799fe1af94713c31efacd593aac
Eric Dumazet [Sun, 29 Sep 2013 08:12:40 +0000 (01:12 -0700)]
net: add missing sk_max_pacing_rate doc
Warning(include/net/sock.h:411): No description found for parameter
'sk_max_pacing_rate'
Lets please "make htmldocs" and kbuild bot.
Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: Ie72d80b73c2c2cdc57d9866cd932843f086f9d05
Eric Dumazet [Tue, 24 Sep 2013 15:20:52 +0000 (08:20 -0700)]
net: introduce SO_MAX_PACING_RATE
As mentioned in commit
afe4fd062416b ("pkt_sched: fq: Fair Queue packet
scheduler"), this patch adds a new socket option.
SO_MAX_PACING_RATE offers the application the ability to cap the
rate computed by transport layer. Value is in bytes per second.
u32 val =
1000000;
setsockopt(sockfd, SOL_SOCKET, SO_MAX_PACING_RATE, &val, sizeof(val));
To be effectively paced, a flow must use FQ packet scheduler.
Note that a packet scheduler takes into account the headers for its
computations. The effective payload rate depends on MSS and retransmits
if any.
I chose to make this pacing rate a SOL_SOCKET option instead of a
TCP one because this can be used by other protocols.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: I5fda6798068d171868fd3d52873778c993f20a9f
Eric W. Biederman [Fri, 14 Mar 2014 04:26:42 +0000 (21:26 -0700)]
net: Replace u64_stats_fetch_begin_bh to u64_stats_fetch_begin_irq
Replace the bh safe variant with the hard irq safe variant.
We need a hard irq safe variant to deal with netpoll transmitting
packets from hard irq context, and we need it in most if not all of
the places using the bh safe variant.
Except on 32bit uni-processor the code is exactly the same so don't
bother with a bh variant, just have a hard irq safe variant that
everyone can use.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: I53e4597523d89796b22b0112f02a6ee1a2f6d256
Chris Metcalf [Thu, 25 Jul 2013 16:41:15 +0000 (12:41 -0400)]
tile: handle 64-bit statistics in tilepro network driver
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: I5b22e93ad81dd8f57637cc38b8fcb5f0dd3e2372
WANG Cong [Fri, 14 Feb 2014 23:10:46 +0000 (15:10 -0800)]
openvswitch: rename ->sync to ->syncp
Openvswitch defines u64_stats_sync as ->sync rather than ->syncp,
so fails to compile with netdev_alloc_pcpu_stats(). So just rename it to ->syncp.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes:
1c213bd24ad04f4430031 (net: introduce netdev_alloc_pcpu_stats() for drivers)
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: I12536247d88850e64d4ef9b71a3ec4332aca32c2
Li RongQing [Thu, 2 Jan 2014 00:49:36 +0000 (08:49 +0800)]
ipv6: fix the use of pcpu_tstats in sit
when read/write the 64bit data, the correct lock should be hold.
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: Ie289d00d1761200ae14336c38324c2b8ba179ec7
John Stultz [Mon, 7 Oct 2013 22:51:58 +0000 (15:51 -0700)]
net: Explicitly initialize u64_stats_sync structures for lockdep
In order to enable lockdep on seqcount/seqlock structures, we
must explicitly initialize any locks.
The u64_stats_sync structure, uses a seqcount, and thus we need
to introduce a u64_stats_init() function and use it to initialize
the structure.
This unfortunately adds a lot of fairly trivial initialization code
to a number of drivers. But the benefit of ensuring correctness makes
this worth while.
Because these changes are required for lockdep to be enabled, and the
changes are quite trivial, I've not yet split this patch out into 30-some
separate patches, as I figured it would be better to get the various
maintainers thoughts on how to best merge this change along with
the seqcount lockdep enablement.
Feedback would be appreciated!
Signed-off-by: John Stultz <john.stultz@linaro.org>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: James Morris <jmorris@namei.org>
Cc: Jesse Gross <jesse@nicira.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Mirko Lindner <mlindner@marvell.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Roger Luethi <rl@hellgate.ch>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Simon Horman <horms@verge.net.au>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: Wensong Zhang <wensong@linux-vs.org>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/1381186321-4906-2-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Change-Id: If41808d130a6c0235faf7c64f17c7beaf33d40ae
Hannes Frederic Sowa [Sat, 11 Jun 2016 18:32:06 +0000 (20:32 +0200)]
ipv6: fix endianness error in icmpv6_err
IPv6 ping socket error handler doesn't correctly convert the new 32 bit
mtu to host endianness before using.
[Cherry-pick of net
dcb94b88c09ce82a80e188d49bcffdc83ba215a6]
Bug:
29370996
Change-Id: Idf475e2555252d91e1d3fa92071a661242780074
Cc: Lorenzo Colitti <lorenzo@google.com>
Fixes:
6d0bfe22611602f ("net: ipv6: Add IPv6 support to the ping socket.")
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kevin F. Haggerty [Fri, 11 Dec 2020 14:27:25 +0000 (07:27 -0700)]
fs: sdfat: Update to version 2.4.5
* Samsung source G981USQU1CTKH
Signed-off-by: Kevin F. Haggerty <haggertk@lineageos.org>
Change-Id: I79b75d2e47e9be33b311b8d72ac92c66b45a7df1
Kevin F. Haggerty [Fri, 13 Dec 2019 23:38:33 +0000 (16:38 -0700)]
fs: sdfat: Update to version 2.3.0
* Samsung version G975FXXU3BSKO
Change-Id: I11a2c361ba70441d2a75188a4f91d3cd324d1a9e
Signed-off-by: Kevin F. Haggerty <haggertk@lineageos.org>
Kevin F. Haggerty [Sat, 12 Jan 2019 16:10:18 +0000 (09:10 -0700)]
fs: sdfat: Update to version 2.1.8
* Samsung version G960FXXU2CRLI
Change-Id: Ib935f8a5eae8d6145e7b585cc9239caef1d7216b
Signed-off-by: Kevin F. Haggerty <haggertk@lineageos.org>
Paul Keith [Wed, 28 Mar 2018 17:52:29 +0000 (19:52 +0200)]
fs: sdfat: Add MODULE_ALIAS_FS for supported filesystems
* This is the proper thing to do for filesystem drivers
Change-Id: I109b201d85e324cc0a72c3fcd09df4a3e1703042
Signed-off-by: Paul Keith <javelinanddart@gmail.com>
Matthias Kaehlcke [Thu, 16 Mar 2017 22:26:52 +0000 (15:26 -0700)]
UPSTREAM: selinux: Remove unnecessary check of array base in selinux_set_mapping()
'perms' will never be NULL since it isn't a plain pointer but an array
of u32 values.
This fixes the following warning when building with clang:
security/selinux/ss/services.c:158:16: error: address of array
'p_in->perms' will always evaluate to 'true'
[-Werror,-Wpointer-bool-conversion]
while (p_in->perms && p_in->perms[k]) {
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Link: https://git.kernel.org/linus/342e91578eb6909529bc7095964cd44b9c057c4e
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Change-Id: Iacc441a51a908c5fc3fcbd7874802b8eb889f828
Todd Kjos [Tue, 21 Jul 2020 04:14:33 +0000 (21:14 -0700)]
binder: fix UAF when releasing todo list
When releasing a thread todo list when tearing down
a binder_proc, the following race was possible which
could result in a use-after-free:
1. Thread 1: enter binder_release_work from binder_thread_release
2. Thread 2: binder_update_ref_for_handle() calls binder_dec_node_ilocked()
3. Thread 2: dec nodeA --> 0 (will free node)
4. Thread 1: ACQ inner_proc_lock
5. Thread 2: block on inner_proc_lock
6. Thread 1: dequeue work (BINDER_WORK_NODE, part of nodeA)
7. Thread 1: REL inner_proc_lock
8. Thread 2: ACQ inner_proc_lock
9. Thread 2: todo list cleanup, but work was already dequeued
10. Thread 2: free node
11. Thread 2: REL inner_proc_lock
12. Thread 1: deref w->type (UAF)
The problem was that for a BINDER_WORK_NODE, the binder_work element
must not be accessed after releasing the inner_proc_lock while
processing the todo list elements since another thread might be
handling a deref on the node containing the binder_work element
leading to the node being freed.
Bug:
161151868
Signed-off-by: Todd Kjos <tkjos@google.com>
Change-Id: I4ae752abfe1aa38872be6f266ddd271802952625
Git-repo: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Git-commit:
cebe72f19bebdee8fc9f1b31dd21a88a259ff419
Signed-off-by: Alam Md Danish <amddan@codeaurora.org>
Signed-off-by: Rahul Shahare <rshaha@codeaurora.org>
Todd Kjos [Tue, 8 Aug 2017 22:48:36 +0000 (15:48 -0700)]
UPSTREAM: binder: fix incorrect cmd to binder_stat_br
commit
26549d177410 ("binder: guarantee txn complete / errors delivered
in-order") passed the locally declared and undefined cmd
to binder_stat_br() which results in a bogus cmd field in a trace
event and BR stats are incremented incorrectly.
Change to use e->cmd which has been initialized.
Signed-off-by: Todd Kjos <tkjos@google.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes:
26549d177410 ("binder: guarantee txn complete / errors delivered in-order")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit
4f9adc8f91ba996374cd9487ecd1180fa99b9438)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Id8b0eefbee754408eb97ffb7050389aeeecb2214
Amir Goldstein [Sun, 22 Dec 2019 18:45:28 +0000 (20:45 +0200)]
locks: print unsigned ino in /proc/locks
commit
98ca480a8f22fdbd768e3dad07024c8d4856576c upstream.
An ino is unsigned, so display it as such in /proc/locks.
Cc: stable@vger.kernel.org
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: I250a495fe3fc809e880535347f462fe552644edf
Signed-off-by: Kevin F. Haggerty <haggertk@lineageos.org>
Jeff Layton [Tue, 22 Apr 2014 12:24:32 +0000 (08:24 -0400)]
locks: rename FL_FILE_PVT and IS_FILE_PVT to use "*_OFDLCK" instead
File-private locks have been re-christened as "open file description"
locks. Finish the symbol name cleanup in the internal implementation.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Change-Id: Iee48047540a7d8fefb5078cc005ae9ea8994f521
Signed-off-by: Kevin F. Haggerty <haggertk@lineageos.org>
Jeff Layton [Tue, 22 Apr 2014 12:23:58 +0000 (08:23 -0400)]
locks: rename file-private locks to "open file description locks"
File-private locks have been merged into Linux for v3.15, and *now*
people are commenting that the name and macro definitions for the new
file-private locks suck.
...and I can't even disagree. The names and command macros do suck.
We're going to have to live with these for a long time, so it's
important that we be happy with the names before we're stuck with them.
The consensus on the lists so far is that they should be rechristened as
"open file description locks".
The name isn't a big deal for the kernel, but the command macros are not
visually distinct enough from the traditional POSIX lock macros. The
glibc and documentation folks are recommending that we change them to
look like F_OFD_{GETLK|SETLK|SETLKW}. That lessens the chance that a
programmer will typo one of the commands wrong, and also makes it easier
to spot this difference when reading code.
This patch makes the following changes that I think are necessary before
v3.15 ships:
1) rename the command macros to their new names. These end up in the uapi
headers and so are part of the external-facing API. It turns out that
glibc doesn't actually use the fcntl.h uapi header, but it's hard to
be sure that something else won't. Changing it now is safest.
2) make the the /proc/locks output display these as type "OFDLCK"
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Carlos O'Donell <carlos@redhat.com>
Cc: Stefan Metzmacher <metze@samba.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Frank Filz <ffilzlnx@mindspring.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Change-Id: Ia975197281d4c80a4ad420d7621896d2f369cef6
Signed-off-by: Kevin F. Haggerty <haggertk@lineageos.org>
Jeff Layton [Mon, 3 Feb 2014 17:13:10 +0000 (12:13 -0500)]
locks: add new fcntl cmd values for handling file private locks
Due to some unfortunate history, POSIX locks have very strange and
unhelpful semantics. The thing that usually catches people by surprise
is that they are dropped whenever the process closes any file descriptor
associated with the inode.
This is extremely problematic for people developing file servers that
need to implement byte-range locks. Developers often need a "lock
management" facility to ensure that file descriptors are not closed
until all of the locks associated with the inode are finished.
Additionally, "classic" POSIX locks are owned by the process. Locks
taken between threads within the same process won't conflict with one
another, which renders them useless for synchronization between threads.
This patchset adds a new type of lock that attempts to address these
issues. These locks conflict with classic POSIX read/write locks, but
have semantics that are more like BSD locks with respect to inheritance
and behavior on close.
This is implemented primarily by changing how fl_owner field is set for
these locks. Instead of having them owned by the files_struct of the
process, they are instead owned by the filp on which they were acquired.
Thus, they are inherited across fork() and are only released when the
last reference to a filp is put.
These new semantics prevent them from being merged with classic POSIX
locks, even if they are acquired by the same process. These locks will
also conflict with classic POSIX locks even if they are acquired by
the same process or on the same file descriptor.
The new locks are managed using a new set of cmd values to the fcntl()
syscall. The initial implementation of this converts these values to
"classic" cmd values at a fairly high level, and the details are not
exposed to the underlying filesystem. We may eventually want to push
this handing out to the lower filesystem code but for now I don't
see any need for it.
Also, note that with this implementation the new cmd values are only
available via fcntl64() on 32-bit arches. There's little need to
add support for legacy apps on a new interface like this.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Change-Id: I35691bdfed9cadcbbcb6ff6804d9eea1db661ddc
Signed-off-by: Kevin F. Haggerty <haggertk@lineageos.org>
Jeff Layton [Mon, 3 Feb 2014 17:13:09 +0000 (12:13 -0500)]
locks: pass the cmd value to fcntl_getlk/getlk64
Once we introduce file private locks, we'll need to know what cmd value
was used, as that affects the ownership and whether a conflict would
arise.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Change-Id: Iaeb8233ae25bde5ef0049118ff94e4a9e0f02214
Signed-off-by: Kevin F. Haggerty <haggertk@lineageos.org>
Jeff Layton [Mon, 3 Feb 2014 17:13:09 +0000 (12:13 -0500)]
locks: report l_pid as -1 for FL_FILE_PVT locks
FL_FILE_PVT locks are no longer tied to a particular pid, and are
instead inheritable by child processes. Report a l_pid of '-1' for
these sorts of locks since the pid is somewhat meaningless for them.
This precedent comes from FreeBSD. There, POSIX and flock() locks can
conflict with one another. If fcntl(F_GETLK, ...) returns a lock set
with flock() then the l_pid member cannot be a process ID because the
lock is not held by a process as such.
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Change-Id: I7d702fcaaaf8592356926d51b60e53ee217ca747
Signed-off-by: Kevin F. Haggerty <haggertk@lineageos.org>
Jeff Layton [Mon, 3 Feb 2014 17:13:09 +0000 (12:13 -0500)]
locks: make /proc/locks show IS_FILE_PVT locks as type "FLPVT"
In a later patch, we'll be adding a new type of lock that's owned by
the struct file instead of the files_struct. Those sorts of locks
will be flagged with a new FL_FILE_PVT flag.
Report these types of locks as "FLPVT" in /proc/locks to distinguish
them from "classic" POSIX locks.
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Change-Id: Id0b6d9c7a947b512e5683ad3b6188d73582c2de9
Signed-off-by: Kevin F. Haggerty <haggertk@lineageos.org>
Jeff Layton [Mon, 3 Feb 2014 17:13:08 +0000 (12:13 -0500)]
locks: rename locks_remove_flock to locks_remove_file
This function currently removes leases in addition to flock locks and in
a later patch we'll have it deal with file-private locks too. Rename it
to locks_remove_file to indicate that it removes locks that are
associated with a particular struct file, and not just flock locks.
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Change-Id: I1289cfbc02eb778532e984a29adffb02a9370cc1
Signed-off-by: Kevin F. Haggerty <haggertk@lineageos.org>
Yunlian Jiang [Thu, 11 Apr 2013 18:02:50 +0000 (11:02 -0700)]
dtc: remove extra parentheses to pass clang check
BUG=chromium:230457
TEST=the clang warning is gone
Change-Id: If9536c181d564e6ee3c1b5777dd78ad3a57a16c7
Reviewed-on: https://gerrit.chromium.org/gerrit/47879
Reviewed-by: Han Shen <shenhan@chromium.org>
Commit-Queue: Yunlian Jiang <yunlian@chromium.org>
Tested-by: Yunlian Jiang <yunlian@chromium.org>
Masahiro Yamada [Thu, 22 Nov 2018 04:28:42 +0000 (13:28 +0900)]
modpost: file2alias: check prototype of handler
[ Upstream commit
f880eea68fe593342fa6e09be9bb661f3c297aec ]
Use specific prototype instead of an opaque pointer so that the
compiler can catch function prototype mismatch.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Change-Id: I522d2ef030c5a98fc06c3b9c93c7be34b750d037
Masahiro Yamada [Thu, 22 Nov 2018 04:28:41 +0000 (13:28 +0900)]
modpost: file2alias: go back to simple devtable lookup
[ Upstream commit
ec91e78d378cc5d4b43805a1227d8e04e5dfa17d ]
Commit
e49ce14150c6 ("modpost: use linker section to generate table.")
was not so cool as we had expected first; it ended up with ugly section
hacks when commit
dd2a3acaecd7 ("mod/file2alias: make modpost compile
on darwin again") came in.
Given a certain degree of unknowledge about the link stage of host
programs, I really want to see simple, stupid table lookup so that
this works in the same way regardless of the underlying executable
format.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Change-Id: If4290e58a2c34a7f69e2aa8e9ec0b07f15792d21
Joel Fernandes [Wed, 19 Dec 2018 17:54:40 +0000 (09:54 -0800)]
BACKPORT: mm: Add an F_SEAL_FUTURE_WRITE seal to memfd
Android uses ashmem for sharing memory regions. We are looking forward
to migrating all usecases of ashmem to memfd so that we can possibly
remove the ashmem driver in the future from staging while also
benefiting from using memfd and contributing to it. Note staging drivers
are also not ABI and generally can be removed at anytime.
One of the main usecases Android has is the ability to create a region
and mmap it as writeable, then add protection against making any
"future" writes while keeping the existing already mmap'ed
writeable-region active. This allows us to implement a usecase where
receivers of the shared memory buffer can get a read-only view, while
the sender continues to write to the buffer.
See CursorWindow documentation in Android for more details:
https://developer.android.com/reference/android/database/CursorWindow
This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE seal
which prevents any future mmap and write syscalls from succeeding while
keeping the existing mmap active.
Verified with test program at: https://lore.kernel.org/patchwork/patch/
1008117/
link: https://lore.kernel.org/patchwork/patch/1014892/
Bug:
113362644
Change-Id: If7424db3b64372932d455f0219cd9df613fec1d4
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Joel Fernandes <joelaf@google.com>
Steven Rostedt (VMware) [Fri, 24 Feb 2017 22:59:10 +0000 (14:59 -0800)]
mm/shmem.c: fix unlikely() test of info->seals to test only for WRITE and GROW
Running my likely/unlikely profiler, I discovered that the test in
shmem_write_begin() that tests for info->seals as unlikely, is always
incorrect. This is because shmem_get_inode() sets info->seals to have
F_SEAL_SEAL set by default, and it is unlikely to be cleared when
shmem_write_begin() is called. Thus, the if statement is very likely.
But as the if statement block only cares about F_SEAL_WRITE and
F_SEAL_GROW, change the test to only test those two bits.
Link: http://lkml.kernel.org/r/20170203105656.7aec6237@gandalf.local.home
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: I83b8fc6ebae581486df16842713ba83a37e3b858
Kyle Harrison [Mon, 7 Oct 2019 11:25:34 +0000 (12:25 +0100)]
ARM: Fix build after memfd_create syscall
Error: __NR_syscalls is not equal to the size of the syscall table
Change-Id: I26519fb6be3882893ca4e82d8a011a6abe1a6f53
Russell King [Mon, 1 Jul 2019 23:57:25 +0000 (02:57 +0300)]
ARM: wire up memfd_create syscall
Add the memfd_create syscall to ARM.
Change-Id: I0cb81d70e5a224fde6a5d33c9a04c40c4c184a9e
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Angelo G. Del Regno [Tue, 14 Mar 2017 23:56:03 +0000 (00:56 +0100)]
mm: shmem: Reschedule by unlocking and relocking RCU because of missing API
The commit introducing the call to cond_resched_rcu() is backported
from a recent kernel version, which has got some very good updates
to the RCU, including a new function cond_resched_rcu which is
doing not-so-complicated rescheduling stuff.
Kernel 3.10 hasn't got any of these and porting would be overkill.
On our current code base, the RCU management is pretty stupid
compared to newer kernels, so it's just ok to reschedule by just
unlocking the RCU and relocking it: this will allow to update its
status and the drivers will be happy.
Change-Id: Iadf407ccaccee64ffeed5e292d17f6b2f7e6ead4
David Herrmann [Fri, 8 Aug 2014 21:25:36 +0000 (14:25 -0700)]
shm: wait for pins to be released when sealing
If we set SEAL_WRITE on a file, we must make sure there cannot be any
ongoing write-operations on the file. For write() calls, we simply lock
the inode mutex, for mmap() we simply verify there're no writable
mappings. However, there might be pages pinned by AIO, Direct-IO and
similar operations via GUP. We must make sure those do not write to the
memfd file after we set SEAL_WRITE.
As there is no way to notify GUP users to drop pages or to wait for them
to be done, we implement the wait ourself: When setting SEAL_WRITE, we
check all pages for their ref-count. If it's bigger than 1, we know
there's some user of the page. We then mark the page and wait for up to
150ms for those ref-counts to be dropped. If the ref-counts are not
dropped in time, we refuse the seal operation.
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Ryan Lortie <desrt@desrt.ca>
Cc: Lennart Poettering <lennart@poettering.net>
Cc: Daniel Mack <zonque@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: I44afbd3f0af72777702c317737f8d16c566bd240
David Herrmann [Fri, 8 Aug 2014 21:25:25 +0000 (14:25 -0700)]
mm: allow drivers to prevent new writable mappings
This patch (of 6):
The i_mmap_writable field counts existing writable mappings of an
address_space. To allow drivers to prevent new writable mappings, make
this counter signed and prevent new writable mappings if it is negative.
This is modelled after i_writecount and DENYWRITE.
This will be required by the shmem-sealing infrastructure to prevent any
new writable mappings after the WRITE seal has been set. In case there
exists a writable mapping, this operation will fail with EBUSY.
Note that we rely on the fact that iff you already own a writable mapping,
you can increase the counter without using the helpers. This is the same
that we do for i_writecount.
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Ryan Lortie <desrt@desrt.ca>
Cc: Lennart Poettering <lennart@poettering.net>
Cc: Daniel Mack <zonque@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: Ic852afffd43f8e75333d8182ad2ab045c78996f4
Oleg Nesterov [Wed, 11 Sep 2013 21:20:20 +0000 (14:20 -0700)]
mm: mmap_region: kill correct_wcount/inode, use allow_write_access()
correct_wcount and inode in mmap_region() just complicate the code. This
boolean was needed previously, when deny_write_access() was called before
vma_merge(), now we can simply check VM_DENYWRITE and do
allow_write_access() if it is set.
allow_write_access() checks file != NULL, so this is safe even if it was
possible to use VM_DENYWRITE && !file. Just we need to ensure we use the
same file which was deny_write_access()'ed, so the patch also moves "file
= vma->vm_file" down after allow_write_access().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Colin Cross <ccross@android.com>
Cc: David Rientjes <rientjes@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: I4274f2f7e24b51ce8e2eafd5a28bdafb106fbe5e
Oleg Nesterov [Wed, 11 Sep 2013 21:20:19 +0000 (14:20 -0700)]
mm: do_mmap_pgoff: cleanup the usage of file_inode()
Simple cleanup. Move "struct inode *inode" variable into "if (file)"
block to simplify the code and avoid the unnecessary check.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Colin Cross <ccross@android.com>
Cc: David Rientjes <rientjes@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: I1bf739a9e3175cf5d3c1d3b421bf5daa48d5f1b5
Oleg Nesterov [Wed, 11 Sep 2013 21:20:18 +0000 (14:20 -0700)]
mm: shift VM_GROWS* check from mmap_region() to do_mmap_pgoff()
mmap() doesn't allow the non-anonymous mappings with VM_GROWS* bit set.
In particular this means that mmap_region()->vma_merge(file, vm_flags)
must always fail if "vm_flags & VM_GROWS" is set incorrectly.
So it does not make sense to check VM_GROWS* after we already allocated
the new vma, the only caller, do_mmap_pgoff(), which can pass this flag
can do the check itself.
And this looks a bit more correct, mmap_region() already unmapped the
old mapping at this stage. But if mmap() is going to fail, it should
avoid do_munmap() if possible.
Note: we check VM_GROWS at the end to ensure that do_mmap_pgoff() won't
return EINVAL in the case when it currently returns another error code.
Many thanks to Hugh who nacked the buggy v1.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: Ic81c1919adf051b9308125fcb87ae6a46e71b580
Oleg Nesterov [Wed, 11 Sep 2013 21:20:14 +0000 (14:20 -0700)]
mm: mempolicy: turn vma_set_policy() into vma_dup_policy()
Simple cleanup. Every user of vma_set_policy() does the same work, this
looks a bit annoying imho. And the new trivial helper which does
mpol_dup() + vma_set_policy() to simplify the callers.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: Ice9406b849ca330fc0b6bc2436a685fa6fd50217
David Herrmann [Fri, 8 Aug 2014 21:25:27 +0000 (14:25 -0700)]
shm: add sealing API
If two processes share a common memory region, they usually want some
guarantees to allow safe access. This often includes:
- one side cannot overwrite data while the other reads it
- one side cannot shrink the buffer while the other accesses it
- one side cannot grow the buffer beyond previously set boundaries
If there is a trust-relationship between both parties, there is no need
for policy enforcement. However, if there's no trust relationship (eg.,
for general-purpose IPC) sharing memory-regions is highly fragile and
often not possible without local copies. Look at the following two
use-cases:
1) A graphics client wants to share its rendering-buffer with a
graphics-server. The memory-region is allocated by the client for
read/write access and a second FD is passed to the server. While
scanning out from the memory region, the server has no guarantee that
the client doesn't shrink the buffer at any time, requiring rather
cumbersome SIGBUS handling.
2) A process wants to perform an RPC on another process. To avoid huge
bandwidth consumption, zero-copy is preferred. After a message is
assembled in-memory and a FD is passed to the remote side, both sides
want to be sure that neither modifies this shared copy, anymore. The
source may have put sensible data into the message without a separate
copy and the target may want to parse the message inline, to avoid a
local copy.
While SIGBUS handling, POSIX mandatory locking and MAP_DENYWRITE provide
ways to achieve most of this, the first one is unproportionally ugly to
use in libraries and the latter two are broken/racy or even disabled due
to denial of service attacks.
This patch introduces the concept of SEALING. If you seal a file, a
specific set of operations is blocked on that file forever. Unlike locks,
seals can only be set, never removed. Hence, once you verified a specific
set of seals is set, you're guaranteed that no-one can perform the blocked
operations on this file, anymore.
An initial set of SEALS is introduced by this patch:
- SHRINK: If SEAL_SHRINK is set, the file in question cannot be reduced
in size. This affects ftruncate() and open(O_TRUNC).
- GROW: If SEAL_GROW is set, the file in question cannot be increased
in size. This affects ftruncate(), fallocate() and write().
- WRITE: If SEAL_WRITE is set, no write operations (besides resizing)
are possible. This affects fallocate(PUNCH_HOLE), mmap() and
write().
- SEAL: If SEAL_SEAL is set, no further seals can be added to a file.
This basically prevents the F_ADD_SEAL operation on a file and
can be set to prevent others from adding further seals that you
don't want.
The described use-cases can easily use these seals to provide safe use
without any trust-relationship:
1) The graphics server can verify that a passed file-descriptor has
SEAL_SHRINK set. This allows safe scanout, while the client is
allowed to increase buffer size for window-resizing on-the-fly.
Concurrent writes are explicitly allowed.
2) For general-purpose IPC, both processes can verify that SEAL_SHRINK,
SEAL_GROW and SEAL_WRITE are set. This guarantees that neither
process can modify the data while the other side parses it.
Furthermore, it guarantees that even with writable FDs passed to the
peer, it cannot increase the size to hit memory-limits of the source
process (in case the file-storage is accounted to the source).
The new API is an extension to fcntl(), adding two new commands:
F_GET_SEALS: Return a bitset describing the seals on the file. This
can be called on any FD if the underlying file supports
sealing.
F_ADD_SEALS: Change the seals of a given file. This requires WRITE
access to the file and F_SEAL_SEAL may not already be set.
Furthermore, the underlying file must support sealing and
there may not be any existing shared mapping of that file.
Otherwise, EBADF/EPERM is returned.
The given seals are _added_ to the existing set of seals
on the file. You cannot remove seals again.
The fcntl() handler is currently specific to shmem and disabled on all
files. A file needs to explicitly support sealing for this interface to
work. A separate syscall is added in a follow-up, which creates files that
support sealing. There is no intention to support this on other
file-systems. Semantics are unclear for non-volatile files and we lack any
use-case right now. Therefore, the implementation is specific to shmem.
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Ryan Lortie <desrt@desrt.ca>
Cc: Lennart Poettering <lennart@poettering.net>
Cc: Daniel Mack <zonque@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Angelo G. Del Regno <kholk11@gmail.com>
Change-Id: Ib71a640ebcc010c1ac2ec384bc292dd9dc7a5a26
Roman Birg [Mon, 18 Aug 2014 21:04:44 +0000 (14:04 -0700)]
input: gpio_keys: report SW_LID instead of SW_FLIP
* Android expects SW_LID for lid events. It also expects a different
sequence of lid state since windowed covers are not yet supported.
Change-Id: Iebffbabdbb3748eec4f887ebd227c67adf01d8ef
Signed-off-by: Roman Birg <roman@cyngn.com>
Eric Dumazet [Mon, 16 Mar 2015 04:12:12 +0000 (21:12 -0700)]
net: add sk_fullsock() helper
We have many places where we want to check if a socket is
not a timewait or request socket. Use a helper to avoid
hard coding this.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[backported from net-next
1d0ab253872cdd3d8e7913f59c266c7fd01771d0]
[lorenzo@google.com: removed TCPF_NEW_SYN_RECV, and added a comment to add it back.]
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Bug:
24163529
Change-Id: Ibf09017e1ab00af5e6925273117c335d7f515d73
Harout Hedeshian [Mon, 2 Feb 2015 20:30:42 +0000 (13:30 -0700)]
net: tcp: Scale the TCP backlog queue to absorb packet bursts
A large momentary influx of packets flooding the TCP layer may cause
packets to get dropped at the socket's backlog queue. Bump this up to
prevent these drops. Note that this change may cause the socket memory
accounting to allow the total backlog queue length to exceed the user
space configured values, sometimes by a substantial amount, which can
lead to out of order packets to be dropped instead of being queued. To
avoid these ofo drops, the condition to drop an out of order packet is
modified to allow out of order queuing to continue as long as it falls
within the now increased backlog queue limit.
Change-Id: I447ffc8560cb149fe84193c72bf693862f7ec740
Signed-off-by: Harout Hedeshian <harouth@codeaurora.org>
Eric Dumazet [Mon, 5 Oct 2015 04:08:09 +0000 (21:08 -0700)]
ipv6: inet6_sk() should use sk_fullsock()
SYN_RECV & TIMEWAIT sockets are not full blown, they do not have a pinet6
pointer.
Bug:
24163529
Change-Id: I6ce67a190d67d200c6ebeb81d2daeb9c86cd7581
Fixes:
ca6fb0651883 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Eric Dumazet [Tue, 19 Sep 2017 17:05:57 +0000 (10:05 -0700)]
tcp: fastopen: fix on syn-data transmit failure
[ Upstream commit
b5b7db8d680464b1d631fd016f5e093419f0bfd9 ]
Our recent change exposed a bug in TCP Fastopen Client that syzkaller
found right away [1]
When we prepare skb with SYN+DATA, we attempt to transmit it,
and we update socket state as if the transmit was a success.
In socket RTX queue we have two skbs, one with the SYN alone,
and a second one containing the DATA.
When (malicious) ACK comes in, we now complain that second one had no
skb_mstamp.
The proper fix is to make sure that if the transmit failed, we do not
pretend we sent the DATA skb, and make it our send_head.
When 3WHS completes, we can now send the DATA right away, without having
to wait for a timeout.
[1]
WARNING: CPU: 0 PID: 100189 at net/ipv4/tcp_input.c:3117 tcp_clean_rtx_queue+0x2057/0x2ab0 net/ipv4/tcp_input.c:3117()
WARN_ON_ONCE(last_ackt == 0);
Modules linked in:
CPU: 0 PID: 100189 Comm: syz-executor1 Not tainted
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
0000000000000000 ffff8800b35cb1d8 ffffffff81cad00d 0000000000000000
ffffffff828a4347 ffff88009f86c080 ffffffff8316eb20 0000000000000d7f
ffff8800b35cb220 ffffffff812c33c2 ffff8800baad2440 00000009d46575c0
Call Trace:
[<
ffffffff81cad00d>] __dump_stack
[<
ffffffff81cad00d>] dump_stack+0xc1/0x124
[<
ffffffff812c33c2>] warn_slowpath_common+0xe2/0x150
[<
ffffffff812c361e>] warn_slowpath_null+0x2e/0x40
[<
ffffffff828a4347>] tcp_clean_rtx_queue+0x2057/0x2ab0 n
[<
ffffffff828ae6fd>] tcp_ack+0x151d/0x3930
[<
ffffffff828baa09>] tcp_rcv_state_process+0x1c69/0x4fd0
[<
ffffffff828efb7f>] tcp_v4_do_rcv+0x54f/0x7c0
[<
ffffffff8258aacb>] sk_backlog_rcv
[<
ffffffff8258aacb>] __release_sock+0x12b/0x3a0
[<
ffffffff8258ad9e>] release_sock+0x5e/0x1c0
[<
ffffffff8294a785>] inet_wait_for_connect
[<
ffffffff8294a785>] __inet_stream_connect+0x545/0xc50
[<
ffffffff82886f08>] tcp_sendmsg_fastopen
[<
ffffffff82886f08>] tcp_sendmsg+0x2298/0x35a0
[<
ffffffff82952515>] inet_sendmsg+0xe5/0x520
[<
ffffffff8257152f>] sock_sendmsg_nosec
[<
ffffffff8257152f>] sock_sendmsg+0xcf/0x110
Fixes:
8c72c65b426b ("tcp: update skb->skb_mstamp more carefully")
Fixes:
783237e8daf1 ("net-tcp: Fast Open client - sending SYN-data")
Change-Id: I1ee49ef4b2ab363fd9f10a518c1ce8bfa71ad7d1
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Lorenzo Colitti [Tue, 29 Nov 2016 17:56:47 +0000 (02:56 +0900)]
net: ipv4: Don't crash if passing a null sk to ip_rt_update_pmtu.
Commit
e2d118a1cb5e ("net: inet: Support UID-based routing in IP
protocols.") made __build_flow_key call sock_net(sk) to determine
the network namespace of the passed-in socket. This crashes if sk
is NULL.
Fix this by getting the network namespace from the skb instead.
[Backport of net-next
d109e61bfe7a468fd8df4a7ceb65635e7aa909a0]
Bug:
16355602
Change-Id: I23b43db5adb8546833e013c268f31111d0e53c69
Fixes:
e2d118a1cb5e ("net: inet: Support UID-based routing in IP protocols.")
Reported-by: Erez Shitrit <erezsh@dev.mellanox.co.il>
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lorenzo Colitti [Thu, 3 Nov 2016 17:23:43 +0000 (02:23 +0900)]
net: inet: Support UID-based routing in IP protocols.
- Use the UID in routing lookups made by protocol connect() and
sendmsg() functions.
- Make sure that routing lookups triggered by incoming packets
(e.g., Path MTU discovery) take the UID of the socket into
account.
- For packets not associated with a userspace socket, (e.g., ping
replies) use UID 0 inside the user namespace corresponding to
the network namespace the socket belongs to. This allows
all namespaces to apply routing and iptables rules to
kernel-originated traffic in that namespaces by matching UID 0.
This is better than using the UID of the kernel socket that is
sending the traffic, because the UID of kernel sockets created
at namespace creation time (e.g., the per-processor ICMP and
TCP sockets) is the UID of the user that created the socket,
which might not be mapped in the namespace.
[Backport of net-next
e2d118a1cb5e60d077131a09db1d81b90a5295fe]
Bug:
16355602
Change-Id: I126f8359887b5b5bbac68daf0ded89e899cb7cb0
Tested: compiles allnoconfig, allyesconfig, allmodconfig
Tested: https://android-review.googlesource.com/253302
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lorenzo Colitti [Thu, 15 May 2014 23:38:41 +0000 (16:38 -0700)]
net: ipv6: make "ip -6 route get mark xyz" work.
Currently, "ip -6 route get mark xyz" ignores the mark passed in
by userspace. Make it honour the mark, just like IPv4 does.
[net-next commit
2e47b291953c35afa4e20a65475954c1a1b9afe1]
Change-Id: Idaae7338506d1785a80159bfe4f0cc3c2a9b6827
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lorenzo Colitti [Thu, 3 Nov 2016 17:23:41 +0000 (02:23 +0900)]
net: core: Add a UID field to struct sock.
Protocol sockets (struct sock) don't have UIDs, but most of the
time, they map 1:1 to userspace sockets (struct socket) which do.
Various operations such as the iptables xt_owner match need
access to the "UID of a socket", and do so by following the
backpointer to the struct socket. This involves taking
sk_callback_lock and doesn't work when there is no socket
because userspace has already called close().
Simplify this by adding a sk_uid field to struct sock whose value
matches the UID of the corresponding struct socket. The semantics
are as follows:
1. Whenever sk_socket is non-null: sk_uid is the same as the UID
in sk_socket, i.e., matches the return value of sock_i_uid.
Specifically, the UID is set when userspace calls socket(),
fchown(), or accept().
2. When sk_socket is NULL, sk_uid is defined as follows:
- For a socket that no longer has a sk_socket because
userspace has called close(): the previous UID.
- For a cloned socket (e.g., an incoming connection that is
established but on which userspace has not yet called
accept): the UID of the socket it was cloned from.
- For a socket that has never had an sk_socket: UID 0 inside
the user namespace corresponding to the network namespace
the socket belongs to.
Kernel sockets created by sock_create_kern are a special case
of #1 and sk_uid is the user that created them. For kernel
sockets created at network namespace creation time, such as the
per-processor ICMP and TCP sockets, this is the user that created
the network namespace.
[Backport of net-next
86741ec25462e4c8cdce6df2f41ead05568c7d5e]
Bug:
16355602
Change-Id: I73e1a57dfeedf672f4c2dfc9ce6867838b55974b
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 9 Dec 2014 17:56:08 +0000 (09:56 -0800)]
tcp: fix more NULL deref after prequeue changes
When I cooked commit
c3658e8d0f1 ("tcp: fix possible NULL dereference in
tcp_vX_send_reset()") I missed other spots we could deref a NULL
skb_dst(skb)
Again, if a socket is provided, we do not need skb_dst() to get a
pointer to network namespace : sock_net(sk) is good enough.
[Backport of net-next
0f85feae6b710ced3abad5b2b47d31dfcb956b62]
Bug:
16355602
Change-Id: I72c9f7dae8da4451112a20ea36183365303bd389
Reported-by: Dann Frazier <dann.frazier@canonical.com>
Bisected-by: Dann Frazier <dann.frazier@canonical.com>
Tested-by: Dann Frazier <dann.frazier@canonical.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes:
ca777eff51f7 ("tcp: remove dst refcount false sharing for prequeue mode")
Signed-off-by: David S. Miller <davem@davemloft.net>
Lorenzo Colitti [Fri, 12 Aug 2016 16:13:38 +0000 (01:13 +0900)]
net: ipv6: Fix ping to link-local addresses.
ping_v6_sendmsg does not set flowi6_oif in response to
sin6_scope_id or sk_bound_dev_if, so it is not possible to use
these APIs to ping an IPv6 address on a different interface.
Instead, it sets flowi6_iif, which is incorrect but harmless.
Stop setting flowi6_iif, and support various ways of setting oif
in the same priority order used by udpv6_sendmsg.
[Backport of net
5e457896986e16c440c97bb94b9ccd95dd157292]
Bug:
29370996
Change-Id: I2c8bc213c417a4427f64439e0954138cb30416c2
Tested: https://android-review.googlesource.com/#/c/254470/
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Corinna Vinschen <xda@vinschen.de>
Chris Clark [Tue, 27 Aug 2013 18:02:15 +0000 (12:02 -0600)]
ipv4: sendto/hdrincl: don't use destination address found in header
ipv4: raw_sendmsg: don't use header's destination address
A sendto() regression was bisected and found to start with commit
f8126f1d5136be1 (ipv4: Adjust semantics of rt->rt_gateway.)
The problem is that it tries to ARP-lookup the constructed packet's
destination address rather than the explicitly provided address.
Fix this using FLOWI_FLAG_KNOWN_NH so that given nexthop is used.
cf. commit
2ad5b9e4bd314fc685086b99e90e5de3bc59e26b
Reported-by: Chris Clark <chris.clark@alcatel-lucent.com>
Bisected-by: Chris Clark <chris.clark@alcatel-lucent.com>
Tested-by: Chris Clark <chris.clark@alcatel-lucent.com>
Suggested-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Chris Clark <chris.clark@alcatel-lucent.com>
Change-Id: I06c9f3a0bca97a4b190e31543345e5accbf73a6d
Danny Wood [Mon, 6 Jul 2020 14:06:03 +0000 (15:06 +0100)]
universal7580: a7xelte-dts: remove second touchkey entry
* fixes touchkeys using our abov_touchkey_ft1804 combined driver
Change-Id: Ie82812adf7a13b2c4d05ea70d319cae83e2e566f
Sourajit Karmakar [Tue, 21 Apr 2020 13:48:24 +0000 (09:48 -0400)]
defconfig: Import a7xelte defconfig.
Thanks @danwood76.
Change-Id: I3e68f291b872d1e493d662d9cab699fb0f472a2c
Dario Trombello [Thu, 20 Feb 2020 18:41:57 +0000 (18:41 +0000)]
sensors: k2hh: Fix accelerometer
Using the STMicroelectronics K2HH driver from SM-J700F Android 6.0 kernel source (J700FXXU4BQE3) makes the sensor work.
Change-Id: I9f50ea5096b56617b171d5cc64c2ed1b01a3e205
Dario Trombello [Thu, 20 Feb 2020 18:34:39 +0000 (18:34 +0000)]
arm64: Add lineageos_j7elte_defconfig
Change-Id: I85146aabf467c93cc6713b63445dfac667186212
Will Deacon [Mon, 11 Aug 2014 13:24:47 +0000 (14:24 +0100)]
asm-generic: add memfd_create system call to unistd.h
Commit
9183df25fe7b ("shm: add memfd_create() syscall") added a new
system call (memfd_create) but didn't update the asm-generic unistd
header.
This patch adds the new system call to the asm-generic version of
unistd.h so that it can be used by architectures such as arm64.
Change-Id: I173b1e5b6087fcea7d226a9f55f792432515897d
Cc: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
David Herrmann [Fri, 8 Aug 2014 21:25:29 +0000 (14:25 -0700)]
shm: add memfd_create() syscall
memfd_create() is similar to mmap(MAP_ANON), but returns a file-descriptor
that you can pass to mmap(). It can support sealing and avoids any
connection to user-visible mount-points. Thus, it's not subject to quotas
on mounted file-systems, but can be used like malloc()'ed memory, but with
a file-descriptor to it.
memfd_create() returns the raw shmem file, so calls like ftruncate() can
be used to modify the underlying inode. Also calls like fstat() will
return proper information and mark the file as regular file. If you want
sealing, you can specify MFD_ALLOW_SEALING. Otherwise, sealing is not
supported (like on all other regular files).
Compared to O_TMPFILE, it does not require a tmpfs mount-point and is not
subject to a filesystem size limit. It is still properly accounted to
memcg limits, though, and to the same overcommit or no-overcommit
accounting as all user memory.
Change-Id: Iaf959293e2c490523aeb46d56cc45b0e7bbe7bf5
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Ryan Lortie <desrt@desrt.ca>
Cc: Lennart Poettering <lennart@poettering.net>
Cc: Daniel Mack <zonque@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Angelo G. Del Regno <kholk11@gmail.com>
Corinna Vinschen [Sun, 18 Nov 2018 18:21:35 +0000 (19:21 +0100)]
universal7580: fix commit "ANDROID: sdcardfs: Hold i_mutex for
i_size_write"
I accidentally merged the 3.18 patch, using a different way to access
the lower file's inode. Use the 3.10 technique instead.
Change-Id: Iea18abcb24cce9afa23e870af8beb31767d67250
Signed-off-by: Corinna Vinschen <xda@vinschen.de>
Daniel Rosenberg [Thu, 25 Oct 2018 23:25:15 +0000 (16:25 -0700)]
ANDROID: sdcardfs: Add option to not link obb
Add mount option unshared_obb to not link the obb
folders of multiple users together.
Bug:
27915347
Test: mount with option. Check if altering one obb
alters the other
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Change-Id: I3956e06bd0a222b0bbb2768c9a8a8372ada85e1e
Signed-off-by: Paul Keith <javelinanddart@gmail.com>
Daniel Rosenberg [Thu, 25 Oct 2018 23:22:50 +0000 (16:22 -0700)]
ANDROID: sdcardfs: Add sandbox
Android/sandbox is treated the same as Android/data
Bug:
27915347
Test: ls -l /sdcard/Android/sandbox/*somepackage* after
creating the folder.
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Change-Id: I7ef440a88df72198303c419e1f2f7c4657f9c170
Signed-off-by: Paul Keith <javelinanddart@gmail.com>
Daniel Rosenberg [Fri, 6 Jul 2018 23:24:27 +0000 (16:24 -0700)]
ANDROID: sdcardfs: Add option to drop unused dentries
This adds the nocache mount option, which will cause sdcardfs to always
drop dentries that are not in use, preventing cached entries from
holding on to lower dentries, which could cause strange behavior when
bypassing the sdcardfs layer and directly changing the lower fs.
Change-Id: I70268584a20b989ae8cfdd278a2e4fa1605217fb
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Paul Keith <javelinanddart@gmail.com>
Daniel Rosenberg [Fri, 20 Jul 2018 23:11:40 +0000 (16:11 -0700)]
ANDROID: sdcardfs: Change current->fs under lock
bug:
111641492
Change-Id: I79e9894f94880048edaf0f7cfa2d180f65cbcf3b
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Paul Keith <javelinanddart@gmail.com>
Daniel Rosenberg [Fri, 20 Jul 2018 01:08:35 +0000 (18:08 -0700)]
ANDROID: sdcardfs: Don't use OVERRIDE_CRED macro
The macro hides some control flow, making it easier
to run into bugs.
bug:
111642636
Change-Id: I37ec207c277d97c4e7f1e8381bc9ae743ad78435
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Paul Keith <javelinanddart@gmail.com>
Danny Wood [Thu, 31 Oct 2019 14:35:27 +0000 (14:35 +0000)]
Revert "FROMLIST: android: binder: Move buffer out of area shared with user space"
This commit causes the Samsung a5xelte fingerprint blobs to stop working
This reverts commit
35852b611c5af888e0ac979391099fe2035a06be.
Change-Id: I4c9e3c551deb98b793cb6a7de9ef2a14f3a46067
Todd Kjos [Wed, 12 Jun 2019 20:29:27 +0000 (13:29 -0700)]
binder: fix possible UAF when freeing buffer
commit
a370003cc301d4361bae20c9ef615f89bf8d1e8a upstream
There is a race between the binder driver cleaning
up a completed transaction via binder_free_transaction()
and a user calling binder_ioctl(BC_FREE_BUFFER) to
release a buffer. It doesn't matter which is first but
they need to be protected against running concurrently
which can result in a UAF.
Signed-off-by: Todd Kjos <tkjos@google.com>
Cc: stable <stable@vger.kernel.org> # 4.14 4.19
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: I1b9c9bdc52df8ddbc5fe7c6d8308f1068265f8ae
Martijn Coenen [Tue, 9 Jul 2019 11:09:23 +0000 (13:09 +0200)]
BACKPORT: binder: Set end of SG buffer area properly.
In case the target node requests a security context, the
extra_buffers_size is increased with the size of the security context.
But, that size is not available for use by regular scatter-gather
buffers; make sure the ending of that buffer is marked correctly.
Bug:
136210786
Acked-by: Todd Kjos <tkjos@google.com>
Fixes:
ec74136ded79 ("binder: create node flag to request sender's security context")
Signed-off-by: Martijn Coenen <maco@android.com>
Cc: stable@vger.kernel.org # 5.1+
Link: https://lore.kernel.org/r/20190709110923.220736-1-maco@android.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit
a56587065094fd96eb4c2b5ad65571daad32156d)
Change-Id: Ib4d3a99e7a881992c1313169f902cfad02a508a6
Todd Kjos [Wed, 24 Apr 2019 19:31:18 +0000 (12:31 -0700)]
UPSTREAM: binder: check for overflow when alloc for security context
commit
0b0509508beff65c1d50541861bc0d4973487dc5 upstream.
When allocating space in the target buffer for the security context,
make sure the extra_buffers_size doesn't overflow. This can only
happen if the given size is invalid, but an overflow can turn it
into a valid size. Fail the transaction if an overflow is detected.
Bug:
130571081
Change-Id: Ibaec652d2073491cc426a4a24004a848348316bf
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Todd Kjos [Mon, 14 Jan 2019 17:10:21 +0000 (09:10 -0800)]
FROMGIT: binder: create node flag to request sender's security context
To allow servers to verify client identity, allow a node
flag to be set that causes the sender's security context
to be delivered with the transaction. The BR_TRANSACTION
command is extended in BR_TRANSACTION_SEC_CTX to
contain a pointer to the security context string.
Signed-off-by: Todd Kjos <tkjos@google.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit
ec74136ded792deed80780a2f8baf3521eeb72f9
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
master)
Change-Id: I44496546e2d0dc0022f818a45cd52feb1c1a92cb
Signed-off-by: Todd Kjos <tkjos@google.com>
Todd Kjos [Tue, 6 Nov 2018 23:55:32 +0000 (15:55 -0800)]
UPSTREAM: binder: fix race that allows malicious free of live buffer
commit
7bada55ab50697861eee6bb7d60b41e68a961a9c upstream
Malicious code can attempt to free buffers using the BC_FREE_BUFFER
ioctl to binder. There are protections against a user freeing a buffer
while in use by the kernel, however there was a window where
BC_FREE_BUFFER could be used to free a recently allocated buffer that
was not completely initialized. This resulted in a use-after-free
detected by KASAN with a malicious test program.
This window is closed by setting the buffer's allow_user_free attribute
to 0 when the buffer is allocated or when the user has previously freed
it instead of waiting for the caller to set it. The problem was that
when the struct buffer was recycled, allow_user_free was stale and set
to 1 allowing a free to go through.
Bug:
116855682
Change-Id: I0b38089f6fdb1adbf7e1102747e4119c9a05b191
Signed-off-by: Todd Kjos <tkjos@google.com>
Acked-by: Arve Hjønnevåg <arve@android.com>
Cc: stable <stable@vger.kernel.org> # 4.14
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Todd Kjos [Mon, 27 Nov 2017 17:32:33 +0000 (09:32 -0800)]
UPSTREAM: binder: fix proc->files use-after-free
proc->files cleanup is initiated by binder_vma_close. Therefore
a reference on the binder_proc is not enough to prevent the
files_struct from being released while the binder_proc still has
a reference. This can lead to an attempt to dereference the
stale pointer obtained from proc->files prior to proc->files
cleanup. This has been seen once in task_get_unused_fd_flags()
when __alloc_fd() is called with a stale "files".
The fix is to protect proc->files with a mutex to prevent cleanup
while in use.
Signed-off-by: Todd Kjos <tkjos@google.com>
Cc: stable <stable@vger.kernel.org> # 4.14
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit
7f3dc0088b98533f17128058fac73cd8b2752ef1)
Change-Id: I40982bb0b4615bda5459538c20eb2a913964042c
Martijn Coenen [Sat, 25 Aug 2018 20:50:56 +0000 (13:50 -0700)]
FROMLIST: ANDROID: binder: Add BINDER_GET_NODE_INFO_FOR_REF ioctl.
This allows the context manager to retrieve information about nodes
that it holds a reference to, such as the current number of
references to those nodes.
Such information can for example be used to determine whether the
servicemanager is the only process holding a reference to a node.
This information can then be passed on to the process holding the
node, which can in turn decide whether it wants to shut down to
reduce resource usage.
Signed-off-by: Martijn Coenen <maco@android.com>
Change-Id: I2fa9b6e2b1d1d6c84fca954125c3ec776dc2c04f