GitHub/LineageOS/android_kernel_samsung_universal7580.git
3 years ago  ktime: Sanitize ktime_to_us/ms conversion
Thomas Gleixner [Wed, 16 Jul 2014 21:03:55 +0000 (21:03 +0000)]
ktime: Sanitize ktime_to_us/ms conversion

With the plain nanoseconds based ktime_t we can simply use
ktime_divns() instead of going through loops and hoops of
timespec/timeval conversion.
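
With plain-nanosecond ktime_t the helpers collapse to a single division;
a minimal sketch of what the conversions look like after this change:

  static inline s64 ktime_to_us(const ktime_t kt)
  {
          return ktime_divns(kt, NSEC_PER_USEC);
  }

  static inline s64 ktime_to_ms(const ktime_t kt)
  {
          return ktime_divns(kt, NSEC_PER_MSEC);
  }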

Change-Id: Ib33ea80bc3fdd26fcada1072b1f8ffc26081c0ba
Reported-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Kevin F. Haggerty <haggertk@lineageos.org>
3 years ago  block: Fix use-after-free in blkdev_get()
Jason Yan [Tue, 16 Jun 2020 12:16:55 +0000 (20:16 +0800)]
block: Fix use-after-free in blkdev_get()

[ Upstream commit 2d3a8e2deddea6c89961c422ec0c5b851e648c14 ]

In blkdev_get() we call __blkdev_get() to do some internal jobs, and if
there is an error in __blkdev_get(), bdput() is called, which means we
have released the refcount of the bdev (actually the refcount of the
bdev inode). This means we cannot access bdev after that point, but
bdev is actually still accessed in blkdev_get() after calling
__blkdev_get(). This results in a use-after-free if the refcount we
released in __blkdev_get() was the last one. Let's take a look at the
following scenario:

  CPU0            CPU1                    CPU2
blkdev_open     blkdev_open           Remove disk
                  bd_acquire
  blkdev_get
    __blkdev_get      del_gendisk
bdev_unhash_inode
  bd_acquire          bdev_get_gendisk
    bd_forget           failed because of unhashed
  bdput
              bdput (the last one)
        bdev_evict_inode

       access bdev => use after free

[  459.350216] BUG: KASAN: use-after-free in __lock_acquire+0x24c1/0x31b0
[  459.351190] Read of size 8 at addr ffff88806c815a80 by task syz-executor.0/20132
[  459.352347]
[  459.352594] CPU: 0 PID: 20132 Comm: syz-executor.0 Not tainted 4.19.90 #2
[  459.353628] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[  459.354947] Call Trace:
[  459.355337]  dump_stack+0x111/0x19e
[  459.355879]  ? __lock_acquire+0x24c1/0x31b0
[  459.356523]  print_address_description+0x60/0x223
[  459.357248]  ? __lock_acquire+0x24c1/0x31b0
[  459.357887]  kasan_report.cold+0xae/0x2d8
[  459.358503]  __lock_acquire+0x24c1/0x31b0
[  459.359120]  ? _raw_spin_unlock_irq+0x24/0x40
[  459.359784]  ? lockdep_hardirqs_on+0x37b/0x580
[  459.360465]  ? _raw_spin_unlock_irq+0x24/0x40
[  459.361123]  ? finish_task_switch+0x125/0x600
[  459.361812]  ? finish_task_switch+0xee/0x600
[  459.362471]  ? mark_held_locks+0xf0/0xf0
[  459.363108]  ? __schedule+0x96f/0x21d0
[  459.363716]  lock_acquire+0x111/0x320
[  459.364285]  ? blkdev_get+0xce/0xbe0
[  459.364846]  ? blkdev_get+0xce/0xbe0
[  459.365390]  __mutex_lock+0xf9/0x12a0
[  459.365948]  ? blkdev_get+0xce/0xbe0
[  459.366493]  ? bdev_evict_inode+0x1f0/0x1f0
[  459.367130]  ? blkdev_get+0xce/0xbe0
[  459.367678]  ? destroy_inode+0xbc/0x110
[  459.368261]  ? mutex_trylock+0x1a0/0x1a0
[  459.368867]  ? __blkdev_get+0x3e6/0x1280
[  459.369463]  ? bdev_disk_changed+0x1d0/0x1d0
[  459.370114]  ? blkdev_get+0xce/0xbe0
[  459.370656]  blkdev_get+0xce/0xbe0
[  459.371178]  ? find_held_lock+0x2c/0x110
[  459.371774]  ? __blkdev_get+0x1280/0x1280
[  459.372383]  ? lock_downgrade+0x680/0x680
[  459.373002]  ? lock_acquire+0x111/0x320
[  459.373587]  ? bd_acquire+0x21/0x2c0
[  459.374134]  ? do_raw_spin_unlock+0x4f/0x250
[  459.374780]  blkdev_open+0x202/0x290
[  459.375325]  do_dentry_open+0x49e/0x1050
[  459.375924]  ? blkdev_get_by_dev+0x70/0x70
[  459.376543]  ? __x64_sys_fchdir+0x1f0/0x1f0
[  459.377192]  ? inode_permission+0xbe/0x3a0
[  459.377818]  path_openat+0x148c/0x3f50
[  459.378392]  ? kmem_cache_alloc+0xd5/0x280
[  459.379016]  ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  459.379802]  ? path_lookupat.isra.0+0x900/0x900
[  459.380489]  ? __lock_is_held+0xad/0x140
[  459.381093]  do_filp_open+0x1a1/0x280
[  459.381654]  ? may_open_dev+0xf0/0xf0
[  459.382214]  ? find_held_lock+0x2c/0x110
[  459.382816]  ? lock_downgrade+0x680/0x680
[  459.383425]  ? __lock_is_held+0xad/0x140
[  459.384024]  ? do_raw_spin_unlock+0x4f/0x250
[  459.384668]  ? _raw_spin_unlock+0x1f/0x30
[  459.385280]  ? __alloc_fd+0x448/0x560
[  459.385841]  do_sys_open+0x3c3/0x500
[  459.386386]  ? filp_open+0x70/0x70
[  459.386911]  ? trace_hardirqs_on_thunk+0x1a/0x1c
[  459.387610]  ? trace_hardirqs_off_caller+0x55/0x1c0
[  459.388342]  ? do_syscall_64+0x1a/0x520
[  459.388930]  do_syscall_64+0xc3/0x520
[  459.389490]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  459.390248] RIP: 0033:0x416211
[  459.390720] Code: 75 14 b8 02 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 04 19 00 00 c3 48 83 ec 08 e8 0a fa ff ff 48 89 04 24 b8 02 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 53 fa ff ff 48 89 d0 48 83 c4 08 48 3d 01
[  459.393483] RSP: 002b:00007fe45dfe9a60 EFLAGS: 00000293 ORIG_RAX: 0000000000000002
[  459.394610] RAX: ffffffffffffffda RBX: 00007fe45dfea6d4 RCX: 0000000000416211
[  459.395678] RDX: 00007fe45dfe9b0a RSI: 0000000000000002 RDI: 00007fe45dfe9b00
[  459.396758] RBP: 000000000076bf20 R08: 0000000000000000 R09: 000000000000000a
[  459.397930] R10: 0000000000000075 R11: 0000000000000293 R12: 00000000ffffffff
[  459.399022] R13: 0000000000000bd9 R14: 00000000004cdb80 R15: 000000000076bf2c
[  459.400168]
[  459.400430] Allocated by task 20132:
[  459.401038]  kasan_kmalloc+0xbf/0xe0
[  459.401652]  kmem_cache_alloc+0xd5/0x280
[  459.402330]  bdev_alloc_inode+0x18/0x40
[  459.402970]  alloc_inode+0x5f/0x180
[  459.403510]  iget5_locked+0x57/0xd0
[  459.404095]  bdget+0x94/0x4e0
[  459.404607]  bd_acquire+0xfa/0x2c0
[  459.405113]  blkdev_open+0x110/0x290
[  459.405702]  do_dentry_open+0x49e/0x1050
[  459.406340]  path_openat+0x148c/0x3f50
[  459.406926]  do_filp_open+0x1a1/0x280
[  459.407471]  do_sys_open+0x3c3/0x500
[  459.408010]  do_syscall_64+0xc3/0x520
[  459.408572]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  459.409415]
[  459.409679] Freed by task 1262:
[  459.410212]  __kasan_slab_free+0x129/0x170
[  459.410919]  kmem_cache_free+0xb2/0x2a0
[  459.411564]  rcu_process_callbacks+0xbb2/0x2320
[  459.412318]  __do_softirq+0x225/0x8ac

Fix this by delaying the bdput() to the end of blkdev_get(), by which
point we have finished accessing bdev.
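
A minimal sketch of the ordering rule behind the fix, with hypothetical
helper names (do_internal_open() stands in for __blkdev_get(), my_put_ref()
for bdput()); the reference is dropped only after the last access:

  int open_device(struct my_bdev *bdev)
  {
          int ret = do_internal_open(bdev);       /* may fail */

          if (ret) {
                  /* the reference is still held, so bdev is safe to touch */
                  mutex_lock(&bdev->bd_mutex);
                  cleanup_claiming(bdev);
                  mutex_unlock(&bdev->bd_mutex);
                  /* drop the (possibly last) reference at the very end */
                  my_put_ref(bdev);
          }
          return ret;
  }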

Fixes: 77ea887e433a ("implement in-kernel gendisk events handling")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: I642bd3c3224cc174ba4bdd57133b1a17fb35dccd
CVE-2020-15436
Signed-off-by: Kevin F. Haggerty <haggertk@lineageos.org>
3 years ago  fs: sdfat: __sdfat_submit_bio_write() should always submit a WRITE
Kevin F. Haggerty [Fri, 4 Jun 2021 02:33:21 +0000 (20:33 -0600)]
fs: sdfat: __sdfat_submit_bio_write() should always submit a WRITE

sdfat version 2.4.5 introduced wbc_to_write_flags() as a compat
function for legacy kernels. Unfortunately, as the legacy
__sdfat_submit_bio_write() path was adapted to take advantage of
wbc_to_write_flags(), it used the return value directly as the
rw parameter to submit_bio(), and the default return value is 0
(READ). This becomes a problem later when write paths assert that
bio_data_dir() is WRITE (i.e., the write bit of bio->bi_rw is set).

Pass a bitwise-or of WRITE and the return value of wbc_to_write_flags()
to submit_bio() on this legacy path. This logically unifies the
approach with the newer path here and directly with the legacy path of
__sdfat_submit_bio_write2() in mpage.c.
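
A hedged sketch of the legacy path after this change (names and signatures
simplified; on these kernels submit_bio() still takes the old
"int rw, struct bio *" form):

  static void sdfat_write_one_bio(struct writeback_control *wbc,
                                  struct bio *bio)
  {
          int write_flags = wbc_to_write_flags(wbc);

          /* always keep the WRITE bit set; wbc only adds flags on top */
          submit_bio(WRITE | write_flags, bio);
  }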

Fixes: afd489c6a8c8 ("fs: sdfat: Update to version 2.4.5")
Signed-off-by: Kevin F. Haggerty <haggertk@lineageos.org>
Change-Id: Ic7b937a467e9e116c3390c6ac0e009a59229b93c

3 years ago  zram: user per-cpu compression streams
Sergey Senozhatsky [Fri, 20 May 2016 23:59:51 +0000 (16:59 -0700)]
zram: user per-cpu compression streams

Remove the idle streams list and keep compression streams in per-cpu data.
This removes two contended spin_lock()/spin_unlock() calls from the write
path and also prevents the write OP from being preempted while holding the
compression stream, which can cause slowdowns.

For instance, let's assume that we have N CPUs and N-2
max_comp_streams. TASK1 owns the last idle stream, and TASK2-TASK3 come in
with write requests:

  TASK1            TASK2              TASK3
 zram_bvec_write()
  spin_lock
  find stream
  spin_unlock

  compress

  <<preempted>>   zram_bvec_write()
                   spin_lock
                   find stream
                   spin_unlock
                     no_stream
                       schedule
                                     zram_bvec_write()
                                      spin_lock
                                      find_stream
                                      spin_unlock
                                        no_stream
                                          schedule
   spin_lock
   release stream
   spin_unlock
     wake up TASK2

Not only will TASK2 and TASK3 fail to get a stream, TASK1 will also be
preempted in the middle of its operation, while we would prefer it to
finish compression and release the stream.
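
A hedged sketch of the per-cpu scheme this patch moves to: the write path
grabs the local CPU's stream with preemption disabled instead of hunting
for an idle stream under a spinlock (helpers as in the upstream zcomp code):

  struct zcomp_strm *zcomp_stream_get(struct zcomp *comp)
  {
          /* disables preemption and returns this CPU's private stream */
          return *get_cpu_ptr(comp->stream);
  }

  void zcomp_stream_put(struct zcomp *comp)
  {
          put_cpu_ptr(comp->stream);
  }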

Test environment: x86_64, 4 CPU box, 3G zram, lzo

The following fio tests were executed:
      read, randread, write, randwrite, rw, randrw
with the increasing number of jobs from 1 to 10.

                  4 streams        8 streams       per-cpu
  ===========================================================
  jobs1
  READ:           2520.1MB/s       2566.5MB/s      2491.5MB/s
  READ:           2102.7MB/s       2104.2MB/s      2091.3MB/s
  WRITE:          1355.1MB/s       1320.2MB/s      1378.9MB/s
  WRITE:          1103.5MB/s       1097.2MB/s      1122.5MB/s
  READ:           434013KB/s       435153KB/s      439961KB/s
  WRITE:          433969KB/s       435109KB/s      439917KB/s
  READ:           403166KB/s       405139KB/s      403373KB/s
  WRITE:          403223KB/s       405197KB/s      403430KB/s
  jobs2
  READ:           7958.6MB/s       8105.6MB/s      8073.7MB/s
  READ:           6864.9MB/s       6989.8MB/s      7021.8MB/s
  WRITE:          2438.1MB/s       2346.9MB/s      3400.2MB/s
  WRITE:          1994.2MB/s       1990.3MB/s      2941.2MB/s
  READ:           981504KB/s       973906KB/s      1018.8MB/s
  WRITE:          981659KB/s       974060KB/s      1018.1MB/s
  READ:           937021KB/s       938976KB/s      987250KB/s
  WRITE:          934878KB/s       936830KB/s      984993KB/s
  jobs3
  READ:           13280MB/s        13553MB/s       13553MB/s
  READ:           11534MB/s        11785MB/s       11755MB/s
  WRITE:          3456.9MB/s       3469.9MB/s      4810.3MB/s
  WRITE:          3029.6MB/s       3031.6MB/s      4264.8MB/s
  READ:           1363.8MB/s       1362.6MB/s      1448.9MB/s
  WRITE:          1361.9MB/s       1360.7MB/s      1446.9MB/s
  READ:           1309.4MB/s       1310.6MB/s      1397.5MB/s
  WRITE:          1307.4MB/s       1308.5MB/s      1395.3MB/s
  jobs4
  READ:           20244MB/s        20177MB/s       20344MB/s
  READ:           17886MB/s        17913MB/s       17835MB/s
  WRITE:          4071.6MB/s       4046.1MB/s      6370.2MB/s
  WRITE:          3608.9MB/s       3576.3MB/s      5785.4MB/s
  READ:           1824.3MB/s       1821.6MB/s      1997.5MB/s
  WRITE:          1819.8MB/s       1817.4MB/s      1992.5MB/s
  READ:           1765.7MB/s       1768.3MB/s      1937.3MB/s
  WRITE:          1767.5MB/s       1769.1MB/s      1939.2MB/s
  jobs5
  READ:           18663MB/s        18986MB/s       18823MB/s
  READ:           16659MB/s        16605MB/s       16954MB/s
  WRITE:          3912.4MB/s       3888.7MB/s      6126.9MB/s
  WRITE:          3506.4MB/s       3442.5MB/s      5519.3MB/s
  READ:           1798.2MB/s       1746.5MB/s      1935.8MB/s
  WRITE:          1792.7MB/s       1740.7MB/s      1929.1MB/s
  READ:           1727.6MB/s       1658.2MB/s      1917.3MB/s
  WRITE:          1726.5MB/s       1657.2MB/s      1916.6MB/s
  jobs6
  READ:           21017MB/s        20922MB/s       21162MB/s
  READ:           19022MB/s        19140MB/s       18770MB/s
  WRITE:          3968.2MB/s       4037.7MB/s      6620.8MB/s
  WRITE:          3643.5MB/s       3590.2MB/s      6027.5MB/s
  READ:           1871.8MB/s       1880.5MB/s      2049.9MB/s
  WRITE:          1867.8MB/s       1877.2MB/s      2046.2MB/s
  READ:           1755.8MB/s       1710.3MB/s      1964.7MB/s
  WRITE:          1750.5MB/s       1705.9MB/s      1958.8MB/s
  jobs7
  READ:           21103MB/s        20677MB/s       21482MB/s
  READ:           18522MB/s        18379MB/s       19443MB/s
  WRITE:          4022.5MB/s       4067.4MB/s      6755.9MB/s
  WRITE:          3691.7MB/s       3695.5MB/s      5925.6MB/s
  READ:           1841.5MB/s       1933.9MB/s      2090.5MB/s
  WRITE:          1842.7MB/s       1935.3MB/s      2091.9MB/s
  READ:           1832.4MB/s       1856.4MB/s      1971.5MB/s
  WRITE:          1822.3MB/s       1846.2MB/s      1960.6MB/s
  jobs8
  READ:           20463MB/s        20194MB/s       20862MB/s
  READ:           18178MB/s        17978MB/s       18299MB/s
  WRITE:          4085.9MB/s       4060.2MB/s      7023.8MB/s
  WRITE:          3776.3MB/s       3737.9MB/s      6278.2MB/s
  READ:           1957.6MB/s       1944.4MB/s      2109.5MB/s
  WRITE:          1959.2MB/s       1946.2MB/s      2111.4MB/s
  READ:           1900.6MB/s       1885.7MB/s      2082.1MB/s
  WRITE:          1896.2MB/s       1881.4MB/s      2078.3MB/s
  jobs9
  READ:           19692MB/s        19734MB/s       19334MB/s
  READ:           17678MB/s        18249MB/s       17666MB/s
  WRITE:          4004.7MB/s       4064.8MB/s      6990.7MB/s
  WRITE:          3724.7MB/s       3772.1MB/s      6193.6MB/s
  READ:           1953.7MB/s       1967.3MB/s      2105.6MB/s
  WRITE:          1953.4MB/s       1966.7MB/s      2104.1MB/s
  READ:           1860.4MB/s       1897.4MB/s      2068.5MB/s
  WRITE:          1858.9MB/s       1895.9MB/s      2066.8MB/s
  jobs10
  READ:           19730MB/s        19579MB/s       19492MB/s
  READ:           18028MB/s        18018MB/s       18221MB/s
  WRITE:          4027.3MB/s       4090.6MB/s      7020.1MB/s
  WRITE:          3810.5MB/s       3846.8MB/s      6426.8MB/s
  READ:           1956.1MB/s       1994.6MB/s      2145.2MB/s
  WRITE:          1955.9MB/s       1993.5MB/s      2144.8MB/s
  READ:           1852.8MB/s       1911.6MB/s      2075.8MB/s
  WRITE:          1855.7MB/s       1914.6MB/s      2078.1MB/s

perf stat

                                  4 streams                       8 streams                       per-cpu
  ====================================================================================================================
  jobs1
  stalled-cycles-frontend      23,174,811,209 (  38.21%)     23,220,254,188 (  38.25%)       23,061,406,918 (  38.34%)
  stalled-cycles-backend       11,514,174,638 (  18.98%)     11,696,722,657 (  19.27%)       11,370,852,810 (  18.90%)
  instructions                 73,925,005,782 (    1.22)     73,903,177,632 (    1.22)       73,507,201,037 (    1.22)
  branches                     14,455,124,835 ( 756.063)     14,455,184,779 ( 755.281)       14,378,599,509 ( 758.546)
  branch-misses                    69,801,336 (   0.48%)         80,225,529 (   0.55%)           72,044,726 (   0.50%)
  jobs2
  stalled-cycles-frontend      49,912,741,782 (  46.11%)     50,101,189,290 (  45.95%)       32,874,195,633 (  35.11%)
  stalled-cycles-backend       27,080,366,230 (  25.02%)     27,949,970,232 (  25.63%)       16,461,222,706 (  17.58%)
  instructions                122,831,629,690 (    1.13)    122,919,846,419 (    1.13)      121,924,786,775 (    1.30)
  branches                     23,725,889,239 ( 692.663)     23,733,547,140 ( 688.062)       23,553,950,311 ( 794.794)
  branch-misses                    90,733,041 (   0.38%)         96,320,895 (   0.41%)           84,561,092 (   0.36%)
  jobs3
  stalled-cycles-frontend      66,437,834,608 (  45.58%)     63,534,923,344 (  43.69%)       42,101,478,505 (  33.19%)
  stalled-cycles-backend       34,940,799,661 (  23.97%)     34,774,043,148 (  23.91%)       21,163,324,388 (  16.68%)
  instructions                171,692,121,862 (    1.18)    171,775,373,044 (    1.18)      170,353,542,261 (    1.34)
  branches                     32,968,962,622 ( 628.723)     32,987,739,894 ( 630.512)       32,729,463,918 ( 717.027)
  branch-misses                   111,522,732 (   0.34%)        110,472,894 (   0.33%)           99,791,291 (   0.30%)
  jobs4
  stalled-cycles-frontend      98,741,701,675 (  49.72%)     94,797,349,965 (  47.59%)       54,535,655,381 (  33.53%)
  stalled-cycles-backend       54,642,609,615 (  27.51%)     55,233,554,408 (  27.73%)       27,882,323,541 (  17.14%)
  instructions                220,884,807,851 (    1.11)    220,930,887,273 (    1.11)      218,926,845,851 (    1.35)
  branches                     42,354,518,180 ( 592.105)     42,362,770,587 ( 590.452)       41,955,552,870 ( 716.154)
  branch-misses                   138,093,449 (   0.33%)        131,295,286 (   0.31%)          121,794,771 (   0.29%)
  jobs5
  stalled-cycles-frontend     116,219,747,212 (  48.14%)    110,310,397,012 (  46.29%)       66,373,082,723 (  33.70%)
  stalled-cycles-backend       66,325,434,776 (  27.48%)     64,157,087,914 (  26.92%)       32,999,097,299 (  16.76%)
  instructions                270,615,008,466 (    1.12)    270,546,409,525 (    1.14)      268,439,910,948 (    1.36)
  branches                     51,834,046,557 ( 599.108)     51,811,867,722 ( 608.883)       51,412,576,077 ( 729.213)
  branch-misses                   158,197,086 (   0.31%)        142,639,805 (   0.28%)          133,425,455 (   0.26%)
  jobs6
  stalled-cycles-frontend     138,009,414,492 (  48.23%)    139,063,571,254 (  48.80%)       75,278,568,278 (  32.80%)
  stalled-cycles-backend       79,211,949,650 (  27.68%)     79,077,241,028 (  27.75%)       37,735,797,899 (  16.44%)
  instructions                319,763,993,731 (    1.12)    319,937,782,834 (    1.12)      316,663,600,784 (    1.38)
  branches                     61,219,433,294 ( 595.056)     61,250,355,540 ( 598.215)       60,523,446,617 ( 733.706)
  branch-misses                   169,257,123 (   0.28%)        154,898,028 (   0.25%)          141,180,587 (   0.23%)
  jobs7
  stalled-cycles-frontend     162,974,812,119 (  49.20%)    159,290,061,987 (  48.43%)       88,046,641,169 (  33.21%)
  stalled-cycles-backend       92,223,151,661 (  27.84%)     91,667,904,406 (  27.87%)       44,068,454,971 (  16.62%)
  instructions                369,516,432,430 (    1.12)    369,361,799,063 (    1.12)      365,290,380,661 (    1.38)
  branches                     70,795,673,950 ( 594.220)     70,743,136,124 ( 597.876)       69,803,996,038 ( 732.822)
  branch-misses                   181,708,327 (   0.26%)        165,767,821 (   0.23%)          150,109,797 (   0.22%)
  jobs8
  stalled-cycles-frontend     185,000,017,027 (  49.30%)    182,334,345,473 (  48.37%)       99,980,147,041 (  33.26%)
  stalled-cycles-backend      105,753,516,186 (  28.18%)    107,937,830,322 (  28.63%)       51,404,177,181 (  17.10%)
  instructions                418,153,161,055 (    1.11)    418,308,565,828 (    1.11)      413,653,475,581 (    1.38)
  branches                     80,035,882,398 ( 592.296)     80,063,204,510 ( 589.843)       79,024,105,589 ( 730.530)
  branch-misses                   199,764,528 (   0.25%)        177,936,926 (   0.22%)          160,525,449 (   0.20%)
  jobs9
  stalled-cycles-frontend     210,941,799,094 (  49.63%)    204,714,679,254 (  48.55%)      114,251,113,756 (  33.96%)
  stalled-cycles-backend      122,640,849,067 (  28.85%)    122,188,553,256 (  28.98%)       58,360,041,127 (  17.35%)
  instructions                468,151,025,415 (    1.10)    467,354,869,323 (    1.11)      462,665,165,216 (    1.38)
  branches                     89,657,067,510 ( 585.628)     89,411,550,407 ( 588.990)       88,360,523,943 ( 730.151)
  branch-misses                   218,292,301 (   0.24%)        191,701,247 (   0.21%)          178,535,678 (   0.20%)
  jobs10
  stalled-cycles-frontend     233,595,958,008 (  49.81%)    227,540,615,689 (  49.11%)      160,341,979,938 (  43.07%)
  stalled-cycles-backend      136,153,676,021 (  29.03%)    133,635,240,742 (  28.84%)       65,909,135,465 (  17.70%)
  instructions                517,001,168,497 (    1.10)    516,210,976,158 (    1.11)      511,374,038,613 (    1.37)
  branches                     98,911,641,329 ( 585.796)     98,700,069,712 ( 591.583)       97,646,761,028 ( 728.712)
  branch-misses                   232,341,823 (   0.23%)        199,256,308 (   0.20%)          183,135,268 (   0.19%)

per-cpu streams tend to cause significantly fewer stalled cycles, execute
fewer branches and hit fewer branch misses.

perf stat reported execution time

                          4 streams        8 streams       per-cpu
  ====================================================================
  jobs1
  seconds elapsed        20.909073870     20.875670495    20.817838540
  jobs2
  seconds elapsed        18.529488399     18.720566469    16.356103108
  jobs3
  seconds elapsed        18.991159531     18.991340812    16.766216066
  jobs4
  seconds elapsed        19.560643828     19.551323547    16.246621715
  jobs5
  seconds elapsed        24.746498464     25.221646740    20.696112444
  jobs6
  seconds elapsed        28.258181828     28.289765505    22.885688857
  jobs7
  seconds elapsed        32.632490241     31.909125381    26.272753738
  jobs8
  seconds elapsed        35.651403851     36.027596308    29.108024711
  jobs9
  seconds elapsed        40.569362365     40.024227989    32.898204012
  jobs10
  seconds elapsed        44.673112304     43.874898137    35.632952191

Please see
Link: http://marc.info/?l=linux-kernel&m=146166970727530
Link: http://marc.info/?l=linux-kernel&m=146174716719650
for more test results (under low memory conditions).

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Suggested-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: I433a03617eda51d3ad3b3ffe93c5e1794d477893

3 years ago  zsmalloc: require GFP in zs_malloc()
Sergey Senozhatsky [Fri, 20 May 2016 23:59:48 +0000 (16:59 -0700)]
zsmalloc: require GFP in zs_malloc()

Pass GFP flags to zs_malloc() instead of using a fixed mask supplied to
zs_create_pool(), so we can be more flexible, but, more importantly, we
need this to switch zram to per-cpu compression streams -- zram will try
to allocate a handle with preemption disabled in a fast path and switch to
a slow path (using a different gfp mask) if the fast one fails.

Apart from that, this also aligns the zs_malloc() interface with zpool/zbud.
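
A hedged sketch of how a caller such as zram can use the new argument
(the masks are illustrative only): a non-sleeping attempt first, then a
retry with a heavier mask on the slow path:

  unsigned long handle;

  /* fast path: may run with preemption disabled, so no sleeping, no warning */
  handle = zs_malloc(pool, clen, GFP_NOWAIT | __GFP_NOWARN);
  if (!handle) {
          /* slow path: sleeping is allowed now */
          handle = zs_malloc(pool, clen, GFP_NOIO | __GFP_HIGHMEM);
  }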

[sergey.senozhatsky@gmail.com: pass GFP flags to zs_malloc() instead of using a fixed mask]
Link: http://lkml.kernel.org/r/20160429150942.GA637@swordfish
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: I58cd72a0b1925203ec7d9a3962cc6c3422e8ac96

3 years ago  zram/zcomp: do not zero out zcomp private pages
Sergey Senozhatsky [Thu, 14 Jan 2016 23:22:35 +0000 (15:22 -0800)]
zram/zcomp: do not zero out zcomp private pages

Do not __GFP_ZERO allocated zcomp ->private pages.  We keep allocated
streams around and use them for read/write requests, so we supply a
zeroed out ->private to the compression algorithm as a scratch buffer only
once -- the first time we use that stream.  For the rest of the IO requests
served by this stream, ->private usually contains temporary data from
previous requests.
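
In a backend such as lzo this boils down to one line (hedged sketch, shown
in isolation):

  /* before: ret = kzalloc(LZO1X_MEM_COMPRESS, flags); */
  ret = kmalloc(LZO1X_MEM_COMPRESS, flags);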

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: Icf20e8a707f7c925952034bc691e6ae593ef6f22

3 years ago  zram: pass gfp from zcomp frontend to backend
Minchan Kim [Thu, 14 Jan 2016 23:22:32 +0000 (15:22 -0800)]
zram: pass gfp from zcomp frontend to backend

Each zcomp backend uses its own gfp flags, but this is pointless because
the context in which they are called is driven by the upper layer (i.e.,
the zcomp frontend).  Moreover, the zcomp frontend can call them in
different contexts.  In one context (the zram init path) it is better to
make sure the allocation succeeds, while in the other (further stream
allocation to accelerate I/O) the allocation is just optional.  So let's
pass gfp down from the driver (i.e., the zcomp frontend), following the
normal MM convention.
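
A hedged sketch of the backend side after the change, lzo shown (the
__vmalloc fallback carries the zero and highmem gfps mentioned in the
note below):

  static void *lzo_create(gfp_t flags)
  {
          void *ret;

          ret = kzalloc(LZO1X_MEM_COMPRESS, flags);
          if (!ret)
                  ret = __vmalloc(LZO1X_MEM_COMPRESS,
                                  flags | __GFP_ZERO | __GFP_HIGHMEM,
                                  PAGE_KERNEL);
          return ret;
  }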

[sergey.senozhatsky@gmail.com: add missing __vmalloc zero and highmem gfps]
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: Ie60f5399a4ec8ad54a0cd7575977d79e6eda6e43

3 years ago  zram: fix pool name truncation
Sergey Senozhatsky [Fri, 14 Aug 2015 22:35:19 +0000 (15:35 -0700)]
zram: fix pool name truncation

zram_meta_alloc() constructs a pool name for zs_create_pool() call as

    snprintf(pool_name, sizeof(pool_name), "zram%d", device_id);

However, it defines the pool name buffer to be only 8 bytes long (7
characters plus the trailing zero), which means that we can have only
1000 pool names: zram0 -- zram999.

With CONFIG_ZSMALLOC_STAT enabled an attempt to create a device zram1000
can fail if device zram100 already exists, because snprintf() will
truncate the new pool name to zram100 and pass it to debugfs_create_dir(),
causing:

  debugfs dir <zram100> creation failed
  zram: Error creating memory pool

... and so on.

Fix it by passing zram->disk->disk_name to zram_meta_alloc() instead of
device_id.  We construct the zram%d name earlier and keep it as ->disk_name,
so there is no need to snprintf() it again.
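
The truncation is easy to reproduce in userspace; 8 bytes hold "zram" plus
at most three digits and the trailing zero:

  #include <stdio.h>

  int main(void)
  {
          char pool_name[8];

          snprintf(pool_name, sizeof(pool_name), "zram%d", 1000);
          printf("%s\n", pool_name);      /* prints "zram100", not "zram1000" */
          return 0;
  }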

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: I4c42cc1134b09f93a824d15e589e66fee28af539

3 years ago  zram: check comp algorithm availability earlier
Sergey Senozhatsky [Thu, 25 Jun 2015 22:00:32 +0000 (15:00 -0700)]
zram: check comp algorithm availability earlier

Improvement idea by Marcin Jabrzyk.

comp_algorithm_store() silently accepts any supplied algorithm name,
because zram performs the algorithm availability check later, during the
device configuration phase in disksize_store(), and emits the following
error:

  "zram: Cannot initialise %s compressing backend"

This error line is somewhat generic and, besides, can also indicate a
failed attempt to allocate the compression backend's working buffers.

Add an algorithm availability check to comp_algorithm_store():

  echo lzz > /sys/block/zram0/comp_algorithm
  -bash: echo: write error: Invalid argument
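
A hedged sketch of the store handler with the early check
(zcomp_available_algorithm() is assumed to be the lookup helper this
change introduces):

  static ssize_t comp_algorithm_store(struct device *dev,
                  struct device_attribute *attr, const char *buf, size_t len)
  {
          struct zram *zram = dev_to_zram(dev);

          /* reject unknown names at sysfs-write time */
          if (!zcomp_available_algorithm(buf))
                  return -EINVAL;

          down_write(&zram->init_lock);
          strlcpy(zram->compressor, buf, sizeof(zram->compressor));
          up_write(&zram->init_lock);
          return len;
  }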

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Marcin Jabrzyk <m.jabrzyk@samsung.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: Iece40446b18e7e37fa4ced3459d40ddbb9984297

3 years ago  zram: cut trailing newline in algorithm name
Sergey Senozhatsky [Thu, 25 Jun 2015 22:00:29 +0000 (15:00 -0700)]
zram: cut trailing newline in algorithm name

Supplied sysfs values sometimes contain new-line symbols (echo vs.  echo
-n), which we also copy as part of the compression algorithm name.  It
works fine when we look up the compression algorithm, because we use
sysfs_streq(), which takes care of new-line symbols.  However, it doesn't
look nice when we print the compression algorithm name if zcomp_create()
failed:

 zram: Cannot initialise LXZ
            compressing backend

Cut the trailing new-line, so the error string will look like:

  zram: Cannot initialise LXZ compressing backend

We can also now replace sysfs_streq() in zcomp_available_show() with
strcmp().
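
A hedged sketch of the trim, pulled out into a small helper here for
clarity (the real change is assumed to do this inline in
comp_algorithm_store()):

  static void zram_trim_algorithm_name(char *name)
  {
          size_t sz = strlen(name);

          if (sz > 0 && name[sz - 1] == '\n')
                  name[sz - 1] = 0x00;
  }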

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: If00faf29cca63fc20c8caea57ababe97a16c4a7a

3 years ago  zram: cosmetic zram_bvec_write() cleanup
Sergey Senozhatsky [Thu, 25 Jun 2015 22:00:27 +0000 (15:00 -0700)]
zram: cosmetic zram_bvec_write() cleanup

The `bool locked' local variable tells us whether we should perform
zcomp_strm_release() (i.e. whether we jumped to the `out' label before
zcomp_strm_find() occurred), which is equivalent to `zstrm' being NULL
or not.  Remove `locked' and check `zstrm' instead.
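
In other words, the cleanup path becomes (sketch):

  out:
          if (zstrm)
                  zcomp_strm_release(zram->comp, zstrm);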

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: Id55b0c47aa6b508ac5138f90fafd92a82d2fe9d8

3 years ago  universal7580: abov_touchkey: Fix firmware loading of ft1604 devices
Danny Wood [Tue, 9 Feb 2021 15:43:46 +0000 (15:43 +0000)]
universal7580: abov_touchkey: Fix firmware loading of ft1604 devices

Also fix ft1604 firmware versions of other A510x variants

Change-Id: I33004e2f62b5954bf7b4f2ab6337c0ce39907287

3 years ago  universal7580: Enable FB_WINDOW_UPDATE on all devices
Danny Wood [Fri, 10 Sep 2021 08:48:34 +0000 (09:48 +0100)]
universal7580: Enable FB_WINDOW_UPDATE on all devices

Change-Id: I66f2a7fb7e0128518e3df75f0b9653500169334d

3 years ago  universal7580: Backport LZ4 compression from Linux 4.1.52
Danny Wood [Fri, 27 Aug 2021 15:42:04 +0000 (16:42 +0100)]
universal7580: Backport LZ4 compression from Linux 4.1.52

* Enable it in our configs

Change-Id: Ideb285d3678e479ede33b56e30e99062ec4884d6

3 years ago  universal7580: enable ZRAM
Danny Wood [Tue, 3 Aug 2021 13:48:01 +0000 (14:48 +0100)]
universal7580: enable ZRAM

Change-Id: Icc376054fee0160ceb35edd09e5e733dff5eb52a

3 years ago  mm/zpool: add name argument to create zpool
Ganesh Mahendran [Thu, 12 Feb 2015 23:00:51 +0000 (15:00 -0800)]
mm/zpool: add name argument to create zpool

Currently the backends underlying zpool (zsmalloc/zbud) do not know who
creates them; there is no way for zsmalloc/zbud to find out which caller
they belong to.

Now we want to add statistics collection in zsmalloc, so we need to name
the debugfs dir for each pool created.  The way suggested by Minchan Kim
is to use a name passed by the caller (such as zram) to create the
zsmalloc pool.

    /sys/kernel/debug/zsmalloc/zram0

This patch adds an argument `name' to zs_create_pool() and other related
functions.
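
A hedged sketch of the resulting call from a user such as zram (the gfp
mask is illustrative):

  struct zs_pool *pool;

  pool = zs_create_pool("zram0", GFP_NOIO | __GFP_HIGHMEM);
  if (!pool)
          return -ENOMEM;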

Change-Id: Ic197610141c0dc2711c239db01f735a938bf3808
Signed-off-by: Ganesh Mahendran <opensource.ganesh@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Seth Jennings <sjennings@variantweb.net>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years ago  block: zram: Backport from Linux 4.1.52
Sultan Qasim Khan [Sun, 6 Sep 2015 21:59:58 +0000 (17:59 -0400)]
block: zram: Backport from Linux 4.1.52

Change-Id: I23f6f75979077992298d848efd79a6efc0d776bd

3 years ago  mm: zsmalloc: backport from Linux 4.1.52
Sultan Qasim Khan [Sun, 6 Sep 2015 22:12:58 +0000 (18:12 -0400)]
mm: zsmalloc: backport from Linux 4.1.52

Change-Id: I3960e31f889d643e87b99fe7a88a1e0ca402d6cd

3 years ago  Add initial support for the Samsung On8 (J710FNDDU1BRJ1)
Danny Wood [Mon, 10 May 2021 10:16:01 +0000 (11:16 +0100)]
Add initial support for the Samsung On8 (J710FNDDU1BRJ1)

Change-Id: Ie7ca517efb81c758499d08a4329d46fb089c960c

3 years ago  net: PPPoPNS: use updated data_ready API definition
Matt Wagantall [Thu, 2 Jul 2015 02:43:51 +0000 (19:43 -0700)]
net: PPPoPNS: use updated data_ready API definition

data_ready(struct sock *) no longer takes a length argument.
Remove the length argument from the implementation of this function
in PPPoPNS; it was never used anyway.

Change-Id: I457cedd375a490dd85e60f172b4122b4c7ba36f0
Signed-off-by: Matt Wagantall <mattw@codeaurora.org>
3 years ago  tcp: fix a potential deadlock in tcp_get_info()
Eric Dumazet [Fri, 22 May 2015 04:51:19 +0000 (21:51 -0700)]
tcp: fix a potential deadlock in tcp_get_info()

Taking socket spinlock in tcp_get_info() can deadlock, as
inet_diag_dump_icsk() holds the &hashinfo->ehash_locks[i],
while packet processing can use the reverse locking order.

We could avoid this locking for TCP_LISTEN states, but lockdep would
certainly get confused as all TCP sockets share same lockdep classes.

[  523.722504] ======================================================
[  523.728706] [ INFO: possible circular locking dependency detected ]
[  523.734990] 4.1.0-dbg-DEV #1676 Not tainted
[  523.739202] -------------------------------------------------------
[  523.745474] ss/18032 is trying to acquire lock:
[  523.750002]  (slock-AF_INET){+.-...}, at: [<ffffffff81669d44>] tcp_get_info+0x2c4/0x360
[  523.758129]
[  523.758129] but task is already holding lock:
[  523.763968]  (&(&hashinfo->ehash_locks[i])->rlock){+.-...}, at: [<ffffffff816bcb75>] inet_diag_dump_icsk+0x1d5/0x6c0
[  523.774661]
[  523.774661] which lock already depends on the new lock.
[  523.774661]
[  523.782850]
[  523.782850] the existing dependency chain (in reverse order) is:
[  523.790326]
-> #1 (&(&hashinfo->ehash_locks[i])->rlock){+.-...}:
[  523.796599]        [<ffffffff811126bb>] lock_acquire+0xbb/0x270
[  523.802565]        [<ffffffff816f5868>] _raw_spin_lock+0x38/0x50
[  523.808628]        [<ffffffff81665af8>] __inet_hash_nolisten+0x78/0x110
[  523.815273]        [<ffffffff816819db>] tcp_v4_syn_recv_sock+0x24b/0x350
[  523.822067]        [<ffffffff81684d41>] tcp_check_req+0x3c1/0x500
[  523.828199]        [<ffffffff81682d09>] tcp_v4_do_rcv+0x239/0x3d0
[  523.834331]        [<ffffffff816842fe>] tcp_v4_rcv+0xa8e/0xc10
[  523.840202]        [<ffffffff81658fa3>] ip_local_deliver_finish+0x133/0x3e0
[  523.847214]        [<ffffffff81659a9a>] ip_local_deliver+0xaa/0xc0
[  523.853440]        [<ffffffff816593b8>] ip_rcv_finish+0x168/0x5c0
[  523.859624]        [<ffffffff81659db7>] ip_rcv+0x307/0x420

Let's use the u64_stats_sync infrastructure instead. As a bonus, 64-bit
arches get optimized, as these helpers are a nop for them.
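
A hedged sketch of the u64_stats_sync pattern this switches to: the writer
brackets the 64-bit update, and tcp_get_info() retries a lockless read
instead of taking the socket spinlock:

  /* writer, in the packet-processing path */
  u64_stats_update_begin(&tp->syncp);
  tp->bytes_acked += delta;
  u64_stats_update_end(&tp->syncp);

  /* reader, in tcp_get_info() */
  do {
          start = u64_stats_fetch_begin_irq(&tp->syncp);
          info->tcpi_bytes_acked = tp->bytes_acked;
          info->tcpi_bytes_received = tp->bytes_received;
  } while (u64_stats_fetch_retry_irq(&tp->syncp, start));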

Fixes: 0df48c26d841 ("tcp: add tcpi_bytes_acked to tcp_info")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: Ic33ba5a9c4ca5dfd3d2224b7c0ed9fbe9eccd0ca

3 years ago  tcp: add tcpi_segs_in and tcpi_segs_out to tcp_info
Marcelo Ricardo Leitner [Wed, 20 May 2015 23:35:41 +0000 (16:35 -0700)]
tcp: add tcpi_segs_in and tcpi_segs_out to tcp_info

This patch tracks the total number of inbound and outbound segments on a
TCP socket. One may use this number to have an idea on connection
quality when compared against the retransmissions.

RFC4898 named these : tcpEStatsPerfSegsIn and tcpEStatsPerfSegsOut

These are a 32bit field each and can be fetched both from TCP_INFO
getsockopt() if one has a handle on a TCP socket, or from inet_diag
netlink facility (iproute2/ss patch will follow)

Note that tp->segs_out was placed near tp->snd_nxt for good data
locality and minimal performance impact, while tp->segs_in was placed
near tp->bytes_received for the same reason.

Join work with Eric Dumazet.

Note that received SYN are accounted on the listener, but sent SYNACK
are not accounted.
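
A hedged userspace sketch of reading the new counters once this is in
place (fd is an already-connected TCP socket):

  #include <stdio.h>
  #include <string.h>
  #include <sys/socket.h>
  #include <netinet/in.h>
  #include <linux/tcp.h>          /* struct tcp_info with tcpi_segs_in/out */

  static void print_seg_counters(int fd)
  {
          struct tcp_info ti;
          socklen_t len = sizeof(ti);

          memset(&ti, 0, sizeof(ti));
          if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &ti, &len) == 0)
                  printf("segs_in=%u segs_out=%u\n",
                         ti.tcpi_segs_in, ti.tcpi_segs_out);
  }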

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: I7bf746ad3f9fcc01f672f7dc110aa0c43d9ee042

3 years ago  tcp: add tcpi_bytes_received to tcp_info
Eric Dumazet [Tue, 28 Apr 2015 22:28:18 +0000 (15:28 -0700)]
tcp: add tcpi_bytes_received to tcp_info

This patch tracks total number of payload bytes received on a TCP socket.
This is the sum of all changes done to tp->rcv_nxt

RFC4898 named this : tcpEStatsAppHCThruOctetsReceived

This is a 64bit field, and can be fetched both from TCP_INFO
getsockopt() if one has a handle on a TCP socket, or from inet_diag
netlink facility (iproute2/ss patch will follow)

Note that tp->bytes_received was placed near tp->rcv_nxt for
best data locality and minimal performance impact.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Eric Salo <salo@google.com>
Cc: Martin Lau <kafai@fb.com>
Cc: Chris Rapier <rapier@psc.edu>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: Ieaf3d4c7d1bea0ac8906cc0702bc22c73314c889

3 years ago  BACKPORT net: Fix use after free by removing length arg from sk_data_ready callbacks.
David S. Miller [Fri, 11 Apr 2014 20:15:36 +0000 (16:15 -0400)]
BACKPORT net: Fix use after free by removing length arg from sk_data_ready callbacks.

Several spots in the kernel perform a sequence like:

skb_queue_tail(&sk->s_receive_queue, skb);
sk->sk_data_ready(sk, skb->len);

But at the moment we place the SKB onto the socket receive queue it
can be consumed and freed up.  So this skb->len access is potentially
to freed up memory.

Furthermore, the skb->len can be modified by the consumer so it is
possible that the value isn't accurate.

And finally, no actual implementation of this callback actually uses
the length argument.  And since nobody actually cared about its
value, lots of call sites pass in arbitrary values such as '0' and
even '1'.

So just remove the length argument from the callback, that way there
is no confusion whatsoever and all of these use-after-free cases get
fixed as a side effect.
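
The callback change itself, sketched:

  /* before */
  sk->sk_data_ready(sk, skb->len);
  /* after: no length, so nothing reads a possibly-freed skb */
  sk->sk_data_ready(sk);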

Based upon a patch by Eric Dumazet and his suggestion to audit this
issue tree-wide.

Change-Id: Ia7443b38da20849b684957049812805ff25ab1f0
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: elektroschmock <elektroschmock78@googlemail.com>
3 years ago  ocfs2/o2net: o2net_listen_data_ready should do nothing if socket state is not TCP_LISTEN
Tariq Saeed [Thu, 3 Apr 2014 21:47:11 +0000 (14:47 -0700)]
ocfs2/o2net: o2net_listen_data_ready should do nothing if socket state is not TCP_LISTEN

Orabug: 17330860

When accepting an incoming connection o2net_accept_one clones a child
data socket from the parent listening socket.  It then proceeds to set up
the child with callback o2net_data_ready() and sk_user_data to NULL.  If
data arrives in this window, o2net_listen_data_ready will be called with
some non-deterministic value in sk_user_data (not inherited).  We panic
when we page fault on sk_user_data -- in parent it is
sock_def_readable().

The fix is to recognize that this is a data socket being set up by
looking at the socket state, and to do nothing in that case.
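
A hedged sketch of the check at the top of o2net_listen_data_ready():

  /* a child data socket still being set up is not in TCP_LISTEN state;
   * do nothing */
  if (sk->sk_state != TCP_LISTEN)
          goto out;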

Change-Id: Iffbed59cbec1499e67e6b41ddfdcdc663957c6f5
Signed-off-by: Tariq Saseed <tariq.x.saeed@oracle.com>
Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years ago  tcp: add tcpi_bytes_acked to tcp_info
Eric Dumazet [Tue, 28 Apr 2015 22:28:17 +0000 (15:28 -0700)]
tcp: add tcpi_bytes_acked to tcp_info

This patch tracks total number of bytes acked for a TCP socket.
This is the sum of all changes done to tp->snd_una, and allows
for precise tracking of delivered data.

RFC4898 named this : tcpEStatsAppHCThruOctetsAcked

This is a 64bit field, and can be fetched both from TCP_INFO
getsockopt() if one has a handle on a TCP socket, or from inet_diag
netlink facility (iproute2/ss patch will follow)

Note that tp->bytes_acked was placed near tp->snd_una for
best data locality and minimal performance impact.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Eric Salo <salo@google.com>
Cc: Martin Lau <kafai@fb.com>
Cc: Chris Rapier <rapier@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: I4c65cc5034eb486ebd6c36783b35cab3d311be34

3 years ago  tcp: add pacing_rate information into tcp_info
Eric Dumazet [Thu, 13 Feb 2014 22:27:40 +0000 (14:27 -0800)]
tcp: add pacing_rate information into tcp_info

Add two new fields to struct tcp_info, to report sk_pacing_rate
and sk_max_pacing_rate to monitoring applications, such as ss from iproute2.

User-exported fields are 64-bit, even though the kernel is currently using
32-bit fields.

lpaa5:~# ss -i
..
 skmem:(r0,rb357120,t0,tb2097152,f1584,w1980880,o0,bl0) ts sack cubic
wscale:6,6 rto:400 rtt:0.875/0.75 mss:1448 cwnd:1 ssthresh:12 send
13.2Mbps pacing_rate 3336.2Mbps unacked:15 retrans:1/5448 lost:15
rcv_space:29200

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: Ib4f3ace4059fa799fe1af94713c31efacd593aac

3 years ago  net: add missing sk_max_pacing_rate doc
Eric Dumazet [Sun, 29 Sep 2013 08:12:40 +0000 (01:12 -0700)]
net: add missing sk_max_pacing_rate doc

Warning(include/net/sock.h:411): No description found for parameter
'sk_max_pacing_rate'

Let's please "make htmldocs" and the kbuild bot.

Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: Ie72d80b73c2c2cdc57d9866cd932843f086f9d05

3 years ago  net: introduce SO_MAX_PACING_RATE
Eric Dumazet [Tue, 24 Sep 2013 15:20:52 +0000 (08:20 -0700)]
net: introduce SO_MAX_PACING_RATE

As mentioned in commit afe4fd062416b ("pkt_sched: fq: Fair Queue packet
scheduler"), this patch adds a new socket option.

SO_MAX_PACING_RATE offers the application the ability to cap the
rate computed by transport layer. Value is in bytes per second.

u32 val = 1000000;
setsockopt(sockfd, SOL_SOCKET, SO_MAX_PACING_RATE, &val, sizeof(val));

To be effectively paced, a flow must use FQ packet scheduler.

Note that a packet scheduler takes into account the headers for its
computations. The effective payload rate depends on MSS and retransmits
if any.

I chose to make this pacing rate a SOL_SOCKET option instead of a
TCP one because this can be used by other protocols.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: I5fda6798068d171868fd3d52873778c993f20a9f

3 years ago  net: Replace u64_stats_fetch_begin_bh to u64_stats_fetch_begin_irq
Eric W. Biederman [Fri, 14 Mar 2014 04:26:42 +0000 (21:26 -0700)]
net: Replace u64_stats_fetch_begin_bh to u64_stats_fetch_begin_irq

Replace the bh safe variant with the hard irq safe variant.

We need a hard irq safe variant to deal with netpoll transmitting
packets from hard irq context, and we need it in most if not all of
the places using the bh safe variant.

Except on 32bit uni-processor the code is exactly the same so don't
bother with a bh variant, just have a hard irq safe variant that
everyone can use.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: I53e4597523d89796b22b0112f02a6ee1a2f6d256

3 years ago  tile: handle 64-bit statistics in tilepro network driver
Chris Metcalf [Thu, 25 Jul 2013 16:41:15 +0000 (12:41 -0400)]
tile: handle 64-bit statistics in tilepro network driver

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: I5b22e93ad81dd8f57637cc38b8fcb5f0dd3e2372

3 years ago  openvswitch: rename ->sync to ->syncp
WANG Cong [Fri, 14 Feb 2014 23:10:46 +0000 (15:10 -0800)]
openvswitch: rename ->sync to ->syncp

Openvswitch defines u64_stats_sync as ->sync rather than ->syncp, so it
fails to compile with netdev_alloc_pcpu_stats(). So just rename it to ->syncp.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: 1c213bd24ad04f4430031 (net: introduce netdev_alloc_pcpu_stats() for drivers)
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: I12536247d88850e64d4ef9b71a3ec4332aca32c2

3 years ago  ipv6: fix the use of pcpu_tstats in sit
Li RongQing [Thu, 2 Jan 2014 00:49:36 +0000 (08:49 +0800)]
ipv6: fix the use of pcpu_tstats in sit

When reading/writing the 64-bit data, the correct lock should be held.

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: Ie289d00d1761200ae14336c38324c2b8ba179ec7

3 years ago  net: Explicitly initialize u64_stats_sync structures for lockdep
John Stultz [Mon, 7 Oct 2013 22:51:58 +0000 (15:51 -0700)]
net: Explicitly initialize u64_stats_sync structures for lockdep

In order to enable lockdep on seqcount/seqlock structures, we
must explicitly initialize any locks.

The u64_stats_sync structure, uses a seqcount, and thus we need
to introduce a u64_stats_init() function and use it to initialize
the structure.
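
A hedged sketch of the per-driver boilerplate this adds, with a
hypothetical stats structure (the struct and field names are illustrative):

  struct my_pcpu_stats {
          u64                     rx_bytes;
          u64                     rx_packets;
          struct u64_stats_sync   syncp;
  };

  /* at device init time, once per possible CPU */
  for_each_possible_cpu(cpu) {
          struct my_pcpu_stats *stats = per_cpu_ptr(dev->pcpu_stats, cpu);

          u64_stats_init(&stats->syncp);
  }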

This unfortunately adds a lot of fairly trivial initialization code
to a number of drivers. But the benefit of ensuring correctness makes
this worth while.

Because these changes are required for lockdep to be enabled, and the
changes are quite trivial, I've not yet split this patch out into 30-some
separate patches, as I figured it would be better to get the various
maintainers thoughts on how to best merge this change along with
the seqcount lockdep enablement.

Feedback would be appreciated!

Signed-off-by: John Stultz <john.stultz@linaro.org>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: James Morris <jmorris@namei.org>
Cc: Jesse Gross <jesse@nicira.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Mirko Lindner <mlindner@marvell.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Roger Luethi <rl@hellgate.ch>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Simon Horman <horms@verge.net.au>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: Wensong Zhang <wensong@linux-vs.org>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/1381186321-4906-2-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Change-Id: If41808d130a6c0235faf7c64f17c7beaf33d40ae

3 years ago  ipv6: fix endianness error in icmpv6_err
Hannes Frederic Sowa [Sat, 11 Jun 2016 18:32:06 +0000 (20:32 +0200)]
ipv6: fix endianness error in icmpv6_err

The IPv6 ping socket error handler doesn't correctly convert the new 32-bit
mtu to host endianness before using it.
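
A hedged sketch of the fix in icmpv6_err(): convert before handing the
value to the ping error handler:

  /* info carries the MTU in network byte order */
  ping_err(skb, offset, ntohl(info));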

[Cherry-pick of net dcb94b88c09ce82a80e188d49bcffdc83ba215a6]

Bug: 29370996
Change-Id: Idf475e2555252d91e1d3fa92071a661242780074
Cc: Lorenzo Colitti <lorenzo@google.com>
Fixes: 6d0bfe22611602f ("net: ipv6: Add IPv6 support to the ping socket.")
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago  fs: sdfat: Update to version 2.4.5
Kevin F. Haggerty [Fri, 11 Dec 2020 14:27:25 +0000 (07:27 -0700)]
fs: sdfat: Update to version 2.4.5

* Samsung source G981USQU1CTKH

Signed-off-by: Kevin F. Haggerty <haggertk@lineageos.org>
Change-Id: I79b75d2e47e9be33b311b8d72ac92c66b45a7df1

3 years ago  fs: sdfat: Update to version 2.3.0
Kevin F. Haggerty [Fri, 13 Dec 2019 23:38:33 +0000 (16:38 -0700)]
fs: sdfat: Update to version 2.3.0

* Samsung version G975FXXU3BSKO

Change-Id: I11a2c361ba70441d2a75188a4f91d3cd324d1a9e
Signed-off-by: Kevin F. Haggerty <haggertk@lineageos.org>
3 years ago  fs: sdfat: Update to version 2.1.8
Kevin F. Haggerty [Sat, 12 Jan 2019 16:10:18 +0000 (09:10 -0700)]
fs: sdfat: Update to version 2.1.8

* Samsung version G960FXXU2CRLI

Change-Id: Ib935f8a5eae8d6145e7b585cc9239caef1d7216b
Signed-off-by: Kevin F. Haggerty <haggertk@lineageos.org>
3 years ago  fs: sdfat: Add MODULE_ALIAS_FS for supported filesystems
Paul Keith [Wed, 28 Mar 2018 17:52:29 +0000 (19:52 +0200)]
fs: sdfat: Add MODULE_ALIAS_FS for supported filesystems

* This is the proper thing to do for filesystem drivers
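
For illustration, the macro records the filesystem names the module can be
auto-loaded for (the alias names below are examples, not necessarily the
exact set sdfat registers):

  MODULE_ALIAS_FS("sdfat");
  MODULE_ALIAS_FS("exfat");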

Change-Id: I109b201d85e324cc0a72c3fcd09df4a3e1703042
Signed-off-by: Paul Keith <javelinanddart@gmail.com>
3 years ago  UPSTREAM: selinux: Remove unnecessary check of array base in selinux_set_mapping()
Matthias Kaehlcke [Thu, 16 Mar 2017 22:26:52 +0000 (15:26 -0700)]
UPSTREAM: selinux: Remove unnecessary check of array base in selinux_set_mapping()

'perms' will never be NULL since it isn't a plain pointer but an array
of u32 values.

This fixes the following warning when building with clang:

security/selinux/ss/services.c:158:16: error: address of array
'p_in->perms' will always evaluate to 'true'
[-Werror,-Wpointer-bool-conversion]
                while (p_in->perms && p_in->perms[k]) {
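
A hedged sketch of one way to address it: drop the always-true array-base
test and keep only the element check:

  /* before */
  while (p_in->perms && p_in->perms[k]) {
  /* after */
  while (p_in->perms[k]) {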

Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Link: https://git.kernel.org/linus/342e91578eb6909529bc7095964cd44b9c057c4e
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Change-Id: Iacc441a51a908c5fc3fcbd7874802b8eb889f828

3 years ago  binder: fix UAF when releasing todo list
Todd Kjos [Tue, 21 Jul 2020 04:14:33 +0000 (21:14 -0700)]
binder: fix UAF when releasing todo list

When releasing a thread's todo list while tearing down
a binder_proc, the following race was possible, which
could result in a use-after-free:

1.  Thread 1: enter binder_release_work from binder_thread_release
2.  Thread 2: binder_update_ref_for_handle() calls binder_dec_node_ilocked()
3.  Thread 2: dec nodeA --> 0 (will free node)
4.  Thread 1: ACQ inner_proc_lock
5.  Thread 2: block on inner_proc_lock
6.  Thread 1: dequeue work (BINDER_WORK_NODE, part of nodeA)
7.  Thread 1: REL inner_proc_lock
8.  Thread 2: ACQ inner_proc_lock
9.  Thread 2: todo list cleanup, but work was already dequeued
10. Thread 2: free node
11. Thread 2: REL inner_proc_lock
12. Thread 1: deref w->type (UAF)

The problem was that for a BINDER_WORK_NODE, the binder_work element
must not be accessed after releasing the inner_proc_lock while
processing the todo list elements since another thread might be
handling a deref on the node containing the binder_work element
leading to the node being freed.
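
A hedged sketch of the resulting pattern in the todo-list cleanup:
everything needed from the work item is captured while the lock protecting
it is still held (helper names follow the commit's description):

  binder_inner_proc_lock(proc);
  w = binder_dequeue_work_head_ilocked(list);
  /* read the type under the lock; the node may be freed once we drop it */
  wtype = w ? w->type : 0;
  binder_inner_proc_unlock(proc);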

Bug: 161151868
Signed-off-by: Todd Kjos <tkjos@google.com>
Change-Id: I4ae752abfe1aa38872be6f266ddd271802952625
Git-repo: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Git-commit: cebe72f19bebdee8fc9f1b31dd21a88a259ff419
Signed-off-by: Alam Md Danish <amddan@codeaurora.org>
Signed-off-by: Rahul Shahare <rshaha@codeaurora.org>
3 years ago  UPSTREAM: binder: fix incorrect cmd to binder_stat_br
Todd Kjos [Tue, 8 Aug 2017 22:48:36 +0000 (15:48 -0700)]
UPSTREAM: binder: fix incorrect cmd to binder_stat_br

commit 26549d177410 ("binder: guarantee txn complete / errors delivered
in-order") passed the locally declared and uninitialized cmd
to binder_stat_br(), which results in a bogus cmd field in a trace
event and BR stats being incremented incorrectly.

Change to use e->cmd which has been initialized.
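
In other words (sketch of the one-liner):

  /* before: binder_stat_br(proc, thread, cmd);  cmd was never set */
  binder_stat_br(proc, thread, e->cmd);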

Signed-off-by: Todd Kjos <tkjos@google.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 26549d177410 ("binder: guarantee txn complete / errors delivered in-order")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 4f9adc8f91ba996374cd9487ecd1180fa99b9438)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Id8b0eefbee754408eb97ffb7050389aeeecb2214

3 years ago  locks: print unsigned ino in /proc/locks
Amir Goldstein [Sun, 22 Dec 2019 18:45:28 +0000 (20:45 +0200)]
locks: print unsigned ino in /proc/locks

commit 98ca480a8f22fdbd768e3dad07024c8d4856576c upstream.

An ino is unsigned, so display it as such in /proc/locks.

Cc: stable@vger.kernel.org
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: I250a495fe3fc809e880535347f462fe552644edf
Signed-off-by: Kevin F. Haggerty <haggertk@lineageos.org>
3 years ago  locks: rename FL_FILE_PVT and IS_FILE_PVT to use "*_OFDLCK" instead
Jeff Layton [Tue, 22 Apr 2014 12:24:32 +0000 (08:24 -0400)]
locks: rename FL_FILE_PVT and IS_FILE_PVT to use "*_OFDLCK" instead

File-private locks have been re-christened as "open file description"
locks.  Finish the symbol name cleanup in the internal implementation.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Change-Id: Iee48047540a7d8fefb5078cc005ae9ea8994f521
Signed-off-by: Kevin F. Haggerty <haggertk@lineageos.org>
3 years ago  locks: rename file-private locks to "open file description locks"
Jeff Layton [Tue, 22 Apr 2014 12:23:58 +0000 (08:23 -0400)]
locks: rename file-private locks to "open file description locks"

File-private locks have been merged into Linux for v3.15, and *now*
people are commenting that the name and macro definitions for the new
file-private locks suck.

...and I can't even disagree. The names and command macros do suck.

We're going to have to live with these for a long time, so it's
important that we be happy with the names before we're stuck with them.
The consensus on the lists so far is that they should be rechristened as
"open file description locks".

The name isn't a big deal for the kernel, but the command macros are not
visually distinct enough from the traditional POSIX lock macros. The
glibc and documentation folks are recommending that we change them to
look like F_OFD_{GETLK|SETLK|SETLKW}. That lessens the chance that a
programmer will typo one of the commands wrong, and also makes it easier
to spot this difference when reading code.

This patch makes the following changes that I think are necessary before
v3.15 ships:

1) rename the command macros to their new names. These end up in the uapi
   headers and so are part of the external-facing API. It turns out that
   glibc doesn't actually use the fcntl.h uapi header, but it's hard to
   be sure that something else won't. Changing it now is safest.

2) make the /proc/locks output display these as type "OFDLCK"
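
A hedged userspace sketch using the renamed commands (needs _GNU_SOURCE
with glibc; l_pid must be 0 for OFD locks):

  #define _GNU_SOURCE
  #include <fcntl.h>
  #include <stdio.h>

  static int take_ofd_write_lock(int fd)
  {
          struct flock fl = {
                  .l_type   = F_WRLCK,
                  .l_whence = SEEK_SET,
                  .l_start  = 0,
                  .l_len    = 0,          /* whole file */
                  .l_pid    = 0,          /* required for OFD locks */
          };

          if (fcntl(fd, F_OFD_SETLK, &fl) == -1) {
                  perror("F_OFD_SETLK");
                  return -1;
          }
          return 0;
  }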

Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Carlos O'Donell <carlos@redhat.com>
Cc: Stefan Metzmacher <metze@samba.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Frank Filz <ffilzlnx@mindspring.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Change-Id: Ia975197281d4c80a4ad420d7621896d2f369cef6
Signed-off-by: Kevin F. Haggerty <haggertk@lineageos.org>
3 years ago  locks: add new fcntl cmd values for handling file private locks
Jeff Layton [Mon, 3 Feb 2014 17:13:10 +0000 (12:13 -0500)]
locks: add new fcntl cmd values for handling file private locks

Due to some unfortunate history, POSIX locks have very strange and
unhelpful semantics. The thing that usually catches people by surprise
is that they are dropped whenever the process closes any file descriptor
associated with the inode.

This is extremely problematic for people developing file servers that
need to implement byte-range locks. Developers often need a "lock
management" facility to ensure that file descriptors are not closed
until all of the locks associated with the inode are finished.

Additionally, "classic" POSIX locks are owned by the process. Locks
taken between threads within the same process won't conflict with one
another, which renders them useless for synchronization between threads.

This patchset adds a new type of lock that attempts to address these
issues. These locks conflict with classic POSIX read/write locks, but
have semantics that are more like BSD locks with respect to inheritance
and behavior on close.

This is implemented primarily by changing how the fl_owner field is set
for these locks. Instead of having them owned by the files_struct of the
process, they are owned by the filp on which they were acquired.
Thus, they are inherited across fork() and are only released when the
last reference to a filp is put.

These new semantics prevent them from being merged with classic POSIX
locks, even if they are acquired by the same process. These locks will
also conflict with classic POSIX locks even if they are acquired by
the same process or on the same file descriptor.

The new locks are managed using a new set of cmd values to the fcntl()
syscall. The initial implementation of this converts these values to
"classic" cmd values at a fairly high level, and the details are not
exposed to the underlying filesystem. We may eventually want to push
this handling out to the lower filesystem code, but for now I don't
see any need for it.

Also, note that with this implementation the new cmd values are only
available via fcntl64() on 32-bit arches. There's little need to
add support for legacy apps on a new interface like this.
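
A hedged sketch of the ownership semantics described above (an
illustration, not from the patch; the path is arbitrary): two
independently opened descriptors in one process have distinct open file
descriptions, so their locks conflict, unlike classic F_SETLK locks
taken by the same process, which would simply merge:

  #define _GNU_SOURCE
  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  int main(void)
  {
          struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
          int fd1 = open("/tmp/ofd-demo", O_RDWR | O_CREAT, 0600);
          int fd2 = open("/tmp/ofd-demo", O_RDWR);

          if (fd1 < 0 || fd2 < 0)
                  return 1;
          /* First request succeeds; owned by fd1's open file description. */
          if (fcntl(fd1, F_OFD_SETLK, &fl) == -1)
                  perror("first F_OFD_SETLK");
          /* Same process, different open file description: this request
           * conflicts with the first one and fails with EAGAIN. */
          if (fcntl(fd2, F_OFD_SETLK, &fl) == -1)
                  perror("second F_OFD_SETLK");
          close(fd2);
          close(fd1);
          return 0;
  }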

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Change-Id: I35691bdfed9cadcbbcb6ff6804d9eea1db661ddc
Signed-off-by: Kevin F. Haggerty <haggertk@lineageos.org>
3 years agolocks: pass the cmd value to fcntl_getlk/getlk64
Jeff Layton [Mon, 3 Feb 2014 17:13:09 +0000 (12:13 -0500)]
locks: pass the cmd value to fcntl_getlk/getlk64

Once we introduce file private locks, we'll need to know what cmd value
was used, as that affects the ownership and whether a conflict would
arise.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Change-Id: Iaeb8233ae25bde5ef0049118ff94e4a9e0f02214
Signed-off-by: Kevin F. Haggerty <haggertk@lineageos.org>
3 years agolocks: report l_pid as -1 for FL_FILE_PVT locks
Jeff Layton [Mon, 3 Feb 2014 17:13:09 +0000 (12:13 -0500)]
locks: report l_pid as -1 for FL_FILE_PVT locks

FL_FILE_PVT locks are no longer tied to a particular pid, and are
instead inheritable by child processes. Report an l_pid of '-1' for
these sorts of locks since the pid is somewhat meaningless for them.

This precedent comes from FreeBSD. There, POSIX and flock() locks can
conflict with one another. If fcntl(F_GETLK, ...) returns a lock set
with flock() then the l_pid member cannot be a process ID because the
lock is not held by a process as such.

Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Change-Id: I7d702fcaaaf8592356926d51b60e53ee217ca747
Signed-off-by: Kevin F. Haggerty <haggertk@lineageos.org>
3 years agolocks: make /proc/locks show IS_FILE_PVT locks as type "FLPVT"
Jeff Layton [Mon, 3 Feb 2014 17:13:09 +0000 (12:13 -0500)]
locks: make /proc/locks show IS_FILE_PVT locks as type "FLPVT"

In a later patch, we'll be adding a new type of lock that's owned by
the struct file instead of the files_struct. Those sorts of locks
will be flagged with a new FL_FILE_PVT flag.

Report these types of locks as "FLPVT" in /proc/locks to distinguish
them from "classic" POSIX locks.

Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Change-Id: Id0b6d9c7a947b512e5683ad3b6188d73582c2de9
Signed-off-by: Kevin F. Haggerty <haggertk@lineageos.org>
3 years agolocks: rename locks_remove_flock to locks_remove_file
Jeff Layton [Mon, 3 Feb 2014 17:13:08 +0000 (12:13 -0500)]
locks: rename locks_remove_flock to locks_remove_file

This function currently removes leases in addition to flock locks and in
a later patch we'll have it deal with file-private locks too. Rename it
to locks_remove_file to indicate that it removes locks that are
associated with a particular struct file, and not just flock locks.

Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Change-Id: I1289cfbc02eb778532e984a29adffb02a9370cc1
Signed-off-by: Kevin F. Haggerty <haggertk@lineageos.org>
3 years agodtc: remove extra parentheses to pass clang check
Yunlian Jiang [Thu, 11 Apr 2013 18:02:50 +0000 (11:02 -0700)]
dtc: remove extra parentheses to pass clang check

BUG=chromium:230457
TEST=the clang warning is gone

Change-Id: If9536c181d564e6ee3c1b5777dd78ad3a57a16c7
Reviewed-on: https://gerrit.chromium.org/gerrit/47879
Reviewed-by: Han Shen <shenhan@chromium.org>
Commit-Queue: Yunlian Jiang <yunlian@chromium.org>
Tested-by: Yunlian Jiang <yunlian@chromium.org>
3 years agomodpost: file2alias: check prototype of handler
Masahiro Yamada [Thu, 22 Nov 2018 04:28:42 +0000 (13:28 +0900)]
modpost: file2alias: check prototype of handler

[ Upstream commit f880eea68fe593342fa6e09be9bb661f3c297aec ]

Use a specific prototype instead of an opaque pointer so that the
compiler can catch function prototype mismatches.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Change-Id: I522d2ef030c5a98fc06c3b9c93c7be34b750d037

3 years agomodpost: file2alias: go back to simple devtable lookup
Masahiro Yamada [Thu, 22 Nov 2018 04:28:41 +0000 (13:28 +0900)]
modpost: file2alias: go back to simple devtable lookup

[ Upstream commit ec91e78d378cc5d4b43805a1227d8e04e5dfa17d ]

Commit e49ce14150c6 ("modpost: use linker section to generate table.")
was not as cool as we had first expected; it ended up with ugly section
hacks when commit dd2a3acaecd7 ("mod/file2alias: make modpost compile
on darwin again") came in.

Given a certain degree of uncertainty about the link stage of host
programs, I really want to see a simple, stupid table lookup so that
this works in the same way regardless of the underlying executable
format.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Change-Id: If4290e58a2c34a7f69e2aa8e9ec0b07f15792d21

3 years agoBACKPORT: mm: Add an F_SEAL_FUTURE_WRITE seal to memfd
Joel Fernandes [Wed, 19 Dec 2018 17:54:40 +0000 (09:54 -0800)]
BACKPORT: mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

Android uses ashmem for sharing memory regions. We are looking forward
to migrating all usecases of ashmem to memfd so that we can possibly
remove the ashmem driver in the future from staging while also
benefiting from using memfd and contributing to it. Note that staging
drivers are also not ABI and can generally be removed at any time.

One of the main usecases Android has is the ability to create a region
and mmap it as writeable, then add protection against making any
"future" writes while keeping the existing already mmap'ed
writeable-region active.  This allows us to implement a usecase where
receivers of the shared memory buffer can get a read-only view, while
the sender continues to write to the buffer.
See CursorWindow documentation in Android for more details:
https://developer.android.com/reference/android/database/CursorWindow

This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE seal
which prevents any future mmap and write syscalls from succeeding while
keeping the existing mmap active.
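
A minimal sketch of the intended flow (illustrative only; it assumes a
libc that exposes memfd_create(), F_ADD_SEALS and F_SEAL_FUTURE_WRITE,
and the buffer name and size are arbitrary):

  #define _GNU_SOURCE
  #include <fcntl.h>
  #include <sys/mman.h>
  #include <unistd.h>

  int main(void)
  {
          int fd = memfd_create("cursor-window", MFD_ALLOW_SEALING);
          char *buf;

          if (fd < 0 || ftruncate(fd, 4096) < 0)
                  return 1;
          /* Sender keeps a writable mapping... */
          buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
          if (buf == MAP_FAILED)
                  return 1;
          /* ...then seals the fd against any future writable mmap()/write().
           * The mapping above stays writable; a receiver of this fd can now
           * only map it read-only. */
          if (fcntl(fd, F_ADD_SEALS, F_SEAL_FUTURE_WRITE) == -1)
                  return 1;
          buf[0] = 'x';   /* still allowed through the pre-existing mapping */
          return 0;
  }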

Verified with test program at: https://lore.kernel.org/patchwork/patch/1008117/
link: https://lore.kernel.org/patchwork/patch/1014892/
Bug: 113362644
Change-Id: If7424db3b64372932d455f0219cd9df613fec1d4
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Joel Fernandes <joelaf@google.com>
3 years agomm/shmem.c: fix unlikely() test of info->seals to test only for WRITE and GROW
Steven Rostedt (VMware) [Fri, 24 Feb 2017 22:59:10 +0000 (14:59 -0800)]
mm/shmem.c: fix unlikely() test of info->seals to test only for WRITE and GROW

Running my likely/unlikely profiler, I discovered that the test in
shmem_write_begin() that tests for info->seals as unlikely, is always
incorrect.  This is because shmem_get_inode() sets info->seals to have
F_SEAL_SEAL set by default, and it is unlikely to be cleared when
shmem_write_begin() is called.  Thus, the if statement is very likely.

But as the if statement block only cares about F_SEAL_WRITE and
F_SEAL_GROW, change the test to only test those two bits.
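
As a rough sketch of the shape of the change (not a verbatim diff; the
helper name is illustrative and the seal bits come from the uapi
definitions), the write path only needs to branch on the two seals it
actually handles:

  #include <linux/fcntl.h>   /* F_SEAL_GROW, F_SEAL_WRITE (uapi) */
  #include <stdbool.h>

  /* shmem_get_inode() sets F_SEAL_SEAL by default, so testing the whole
   * seal mask is almost always true; only GROW and WRITE matter here. */
  static bool seals_block_write(unsigned int seals)
  {
          return seals & (F_SEAL_GROW | F_SEAL_WRITE);
  }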

Link: http://lkml.kernel.org/r/20170203105656.7aec6237@gandalf.local.home
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: I83b8fc6ebae581486df16842713ba83a37e3b858

3 years agoARM: Fix build after memfd_create syscall
Kyle Harrison [Mon, 7 Oct 2019 11:25:34 +0000 (12:25 +0100)]
ARM: Fix build after memfd_create syscall

Error: __NR_syscalls is not equal to the size of the syscall table
Change-Id: I26519fb6be3882893ca4e82d8a011a6abe1a6f53

3 years agoARM: wire up memfd_create syscall
Russell King [Mon, 1 Jul 2019 23:57:25 +0000 (02:57 +0300)]
ARM: wire up memfd_create syscall

Add the memfd_create syscall to ARM.

Change-Id: I0cb81d70e5a224fde6a5d33c9a04c40c4c184a9e
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
3 years agomm: shmem: Reschedule by unlocking and relocking RCU because of missing API
Angelo G. Del Regno [Tue, 14 Mar 2017 23:56:03 +0000 (00:56 +0100)]
mm: shmem: Reschedule by unlocking and relocking RCU because of missing API

The commit introducing the call to cond_resched_rcu() is backported
from a recent kernel version, which has received some substantial RCU
updates, including a new helper, cond_resched_rcu(), that takes care
of the rescheduling.
Kernel 3.10 has none of these, and porting them would be overkill.

On our current code base, the RCU management is pretty basic compared
to newer kernels, so it is fine to reschedule by simply unlocking the
RCU read section and relocking it: this lets the RCU state be updated
and keeps the drivers happy.
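
Concretely, the substitution at the call site is just the open-coded
body of what cond_resched_rcu() does upstream (kernel-context fragment,
shown for illustration; it assumes no other locks are held across the
break):

          /* was: cond_resched_rcu(); */
          rcu_read_unlock();
          cond_resched();
          rcu_read_lock();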

Change-Id: Iadf407ccaccee64ffeed5e292d17f6b2f7e6ead4

3 years agoshm: wait for pins to be released when sealing
David Herrmann [Fri, 8 Aug 2014 21:25:36 +0000 (14:25 -0700)]
shm: wait for pins to be released when sealing

If we set SEAL_WRITE on a file, we must make sure there cannot be any
ongoing write-operations on the file.  For write() calls, we simply lock
the inode mutex; for mmap() we verify there are no writable
mappings.  However, there might be pages pinned by AIO, Direct-IO and
similar operations via GUP.  We must make sure those do not write to the
memfd file after we set SEAL_WRITE.

As there is no way to notify GUP users to drop pages or to wait for them
to be done, we implement the wait ourselves: when setting SEAL_WRITE, we
check all pages for their ref-count.  If it's bigger than 1, we know
there's some user of the page.  We then mark the page and wait for up to
150ms for those ref-counts to be dropped.  If the ref-counts are not
dropped in time, we refuse the seal operation.

Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Ryan Lortie <desrt@desrt.ca>
Cc: Lennart Poettering <lennart@poettering.net>
Cc: Daniel Mack <zonque@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: I44afbd3f0af72777702c317737f8d16c566bd240

3 years agomm: allow drivers to prevent new writable mappings
David Herrmann [Fri, 8 Aug 2014 21:25:25 +0000 (14:25 -0700)]
mm: allow drivers to prevent new writable mappings

This patch (of 6):

The i_mmap_writable field counts existing writable mappings of an
address_space.  To allow drivers to prevent new writable mappings, make
this counter signed and prevent new writable mappings if it is negative.
This is modelled after i_writecount and DENYWRITE.

This will be required by the shmem-sealing infrastructure to prevent any
new writable mappings after the WRITE seal has been set.  In case there
exists a writable mapping, this operation will fail with EBUSY.

Note that we rely on the fact that iff you already own a writable mapping,
you can increase the counter without using the helpers.  This is the same
as what we do for i_writecount.
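
A hedged sketch of how such a signed counter can be used through
helpers (the names follow the mainline version of this patch, but
treat them as illustrative for this tree):

  /* i_mmap_writable > 0: writable mappings exist
   * i_mmap_writable < 0: new writable mappings are blocked */
  static inline int mapping_map_writable(struct address_space *mapping)
  {
          return atomic_inc_unless_negative(&mapping->i_mmap_writable) ?
                  0 : -EPERM;
  }

  static inline int mapping_deny_writable(struct address_space *mapping)
  {
          return atomic_dec_unless_positive(&mapping->i_mmap_writable) ?
                  0 : -EBUSY;
  }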

Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Ryan Lortie <desrt@desrt.ca>
Cc: Lennart Poettering <lennart@poettering.net>
Cc: Daniel Mack <zonque@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: Ic852afffd43f8e75333d8182ad2ab045c78996f4

3 years agomm: mmap_region: kill correct_wcount/inode, use allow_write_access()
Oleg Nesterov [Wed, 11 Sep 2013 21:20:20 +0000 (14:20 -0700)]
mm: mmap_region: kill correct_wcount/inode, use allow_write_access()

correct_wcount and inode in mmap_region() just complicate the code.  This
boolean was needed previously, when deny_write_access() was called before
vma_merge(); now we can simply check VM_DENYWRITE and do
allow_write_access() if it is set.

allow_write_access() checks file != NULL, so this is safe even if it was
possible to use VM_DENYWRITE && !file.  Just we need to ensure we use the
same file which was deny_write_access()'ed, so the patch also moves "file
= vma->vm_file" down after allow_write_access().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Colin Cross <ccross@android.com>
Cc: David Rientjes <rientjes@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: I4274f2f7e24b51ce8e2eafd5a28bdafb106fbe5e

3 years agomm: do_mmap_pgoff: cleanup the usage of file_inode()
Oleg Nesterov [Wed, 11 Sep 2013 21:20:19 +0000 (14:20 -0700)]
mm: do_mmap_pgoff: cleanup the usage of file_inode()

Simple cleanup.  Move "struct inode *inode" variable into "if (file)"
block to simplify the code and avoid the unnecessary check.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Colin Cross <ccross@android.com>
Cc: David Rientjes <rientjes@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: I1bf739a9e3175cf5d3c1d3b421bf5daa48d5f1b5

3 years agomm: shift VM_GROWS* check from mmap_region() to do_mmap_pgoff()
Oleg Nesterov [Wed, 11 Sep 2013 21:20:18 +0000 (14:20 -0700)]
mm: shift VM_GROWS* check from mmap_region() to do_mmap_pgoff()

mmap() doesn't allow non-anonymous mappings with the VM_GROWS* bit set.
In particular this means that mmap_region()->vma_merge(file, vm_flags)
must always fail if "vm_flags & VM_GROWS" is set incorrectly.

So it does not make sense to check VM_GROWS* after we have already
allocated the new vma; the only caller that can pass this flag,
do_mmap_pgoff(), can do the check itself.

And this looks a bit more correct, mmap_region() already unmapped the
old mapping at this stage. But if mmap() is going to fail, it should
avoid do_munmap() if possible.

Note: we check VM_GROWS at the end to ensure that do_mmap_pgoff() won't
return EINVAL in the case when it currently returns another error code.

Many thanks to Hugh who nacked the buggy v1.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: Ic81c1919adf051b9308125fcb87ae6a46e71b580

3 years agomm: mempolicy: turn vma_set_policy() into vma_dup_policy()
Oleg Nesterov [Wed, 11 Sep 2013 21:20:14 +0000 (14:20 -0700)]
mm: mempolicy: turn vma_set_policy() into vma_dup_policy()

Simple cleanup.  Every user of vma_set_policy() does the same work, which
looks a bit annoying imho.  Add a new trivial helper that does
mpol_dup() + vma_set_policy() to simplify the callers.
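
Roughly, the new helper looks like the following (a sketch based on the
description above; error handling assumes mpol_dup() returns an
ERR_PTR on failure, as it does in mainline):

  int vma_dup_policy(struct vm_area_struct *src, struct vm_area_struct *dst)
  {
          struct mempolicy *pol = mpol_dup(vma_policy(src));

          if (IS_ERR(pol))
                  return PTR_ERR(pol);
          dst->vm_policy = pol;
          return 0;
  }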

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: Ice9406b849ca330fc0b6bc2436a685fa6fd50217

3 years agoshm: add sealing API
David Herrmann [Fri, 8 Aug 2014 21:25:27 +0000 (14:25 -0700)]
shm: add sealing API

If two processes share a common memory region, they usually want some
guarantees to allow safe access. This often includes:
  - one side cannot overwrite data while the other reads it
  - one side cannot shrink the buffer while the other accesses it
  - one side cannot grow the buffer beyond previously set boundaries

If there is a trust-relationship between both parties, there is no need
for policy enforcement.  However, if there's no trust relationship (e.g.,
for general-purpose IPC) sharing memory-regions is highly fragile and
often not possible without local copies.  Look at the following two
use-cases:

  1) A graphics client wants to share its rendering-buffer with a
     graphics-server. The memory-region is allocated by the client for
     read/write access and a second FD is passed to the server. While
     scanning out from the memory region, the server has no guarantee that
     the client doesn't shrink the buffer at any time, requiring rather
     cumbersome SIGBUS handling.
  2) A process wants to perform an RPC on another process. To avoid huge
     bandwidth consumption, zero-copy is preferred. After a message is
     assembled in-memory and a FD is passed to the remote side, both sides
     want to be sure that neither modifies this shared copy, anymore. The
source may have put sensitive data into the message without a separate
     copy and the target may want to parse the message inline, to avoid a
     local copy.

While SIGBUS handling, POSIX mandatory locking and MAP_DENYWRITE provide
ways to achieve most of this, the first one is disproportionately ugly to
use in libraries and the latter two are broken/racy or even disabled due
to denial of service attacks.

This patch introduces the concept of SEALING.  If you seal a file, a
specific set of operations is blocked on that file forever.  Unlike locks,
seals can only be set, never removed.  Hence, once you verified a specific
set of seals is set, you're guaranteed that no-one can perform the blocked
operations on this file, anymore.

An initial set of SEALS is introduced by this patch:
  - SHRINK: If SEAL_SHRINK is set, the file in question cannot be reduced
            in size. This affects ftruncate() and open(O_TRUNC).
  - GROW: If SEAL_GROW is set, the file in question cannot be increased
          in size. This affects ftruncate(), fallocate() and write().
  - WRITE: If SEAL_WRITE is set, no write operations (besides resizing)
           are possible. This affects fallocate(PUNCH_HOLE), mmap() and
           write().
  - SEAL: If SEAL_SEAL is set, no further seals can be added to a file.
          This basically prevents the F_ADD_SEAL operation on a file and
          can be set to prevent others from adding further seals that you
          don't want.

The described use-cases can easily use these seals to provide safe use
without any trust-relationship:

  1) The graphics server can verify that a passed file-descriptor has
     SEAL_SHRINK set. This allows safe scanout, while the client is
     allowed to increase buffer size for window-resizing on-the-fly.
     Concurrent writes are explicitly allowed.
  2) For general-purpose IPC, both processes can verify that SEAL_SHRINK,
     SEAL_GROW and SEAL_WRITE are set. This guarantees that neither
     process can modify the data while the other side parses it.
     Furthermore, it guarantees that even with writable FDs passed to the
     peer, it cannot increase the size to hit memory-limits of the source
     process (in case the file-storage is accounted to the source).

The new API is an extension to fcntl(), adding two new commands:
  F_GET_SEALS: Return a bitset describing the seals on the file. This
               can be called on any FD if the underlying file supports
               sealing.
  F_ADD_SEALS: Change the seals of a given file. This requires WRITE
               access to the file and F_SEAL_SEAL may not already be set.
               Furthermore, the underlying file must support sealing and
               there may not be any existing shared mapping of that file.
               Otherwise, EBADF/EPERM is returned.
               The given seals are _added_ to the existing set of seals
               on the file. You cannot remove seals again.

The fcntl() handler is currently specific to shmem and disabled on all
files. A file needs to explicitly support sealing for this interface to
work. A separate syscall is added in a follow-up, which creates files that
support sealing. There is no intention to support this on other
file-systems. Semantics are unclear for non-volatile files and we lack any
use-case right now. Therefore, the implementation is specific to shmem.
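
A minimal userspace sketch of the new commands for the zero-copy IPC
case (illustrative only; it assumes the sealing backport plus a libc
exposing memfd_create() and the F_*SEALS macros, and the helper name is
made up for this example):

  #define _GNU_SOURCE
  #include <fcntl.h>
  #include <sys/mman.h>
  #include <unistd.h>

  /* Returns 0 if fd is sealed well enough for safe zero-copy parsing. */
  static int check_sealed_for_ipc(int fd)
  {
          int wanted = F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE;
          int seals = fcntl(fd, F_GET_SEALS);

          return (seals != -1 && (seals & wanted) == wanted) ? 0 : -1;
  }

  int main(void)
  {
          int fd = memfd_create("ipc-msg", MFD_ALLOW_SEALING);

          if (fd < 0 || ftruncate(fd, 4096) < 0)
                  return 1;
          /* ...assemble the message via an mmap() of fd, then munmap() it;
           * F_SEAL_WRITE refuses to apply while writable mappings exist. */
          if (fcntl(fd, F_ADD_SEALS,
                    F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE) == -1)
                  return 1;
          return check_sealed_for_ipc(fd) ? 1 : 0;
  }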

Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Ryan Lortie <desrt@desrt.ca>
Cc: Lennart Poettering <lennart@poettering.net>
Cc: Daniel Mack <zonque@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Angelo G. Del Regno <kholk11@gmail.com>
Change-Id: Ib71a640ebcc010c1ac2ec384bc292dd9dc7a5a26

3 years agoinput: gpio_keys: report SW_LID instead of SW_FLIP
Roman Birg [Mon, 18 Aug 2014 21:04:44 +0000 (14:04 -0700)]
input: gpio_keys: report SW_LID instead of SW_FLIP

* Android expects SW_LID for lid events. It also expects a different
  sequence of lid state since windowed covers are not yet supported.

Change-Id: Iebffbabdbb3748eec4f887ebd227c67adf01d8ef
Signed-off-by: Roman Birg <roman@cyngn.com>
4 years agonet: add sk_fullsock() helper lineage-18.0
Eric Dumazet [Mon, 16 Mar 2015 04:12:12 +0000 (21:12 -0700)]
net: add sk_fullsock() helper

We have many places where we want to check if a socket is
not a timewait or request socket. Use a helper to avoid
hard coding this.
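
A sketch of what the helper boils down to on this backport (kernel
context, illustrative; mainline also masks out TCPF_NEW_SYN_RECV, which
does not exist on this kernel yet, per the backport note below):

  static inline bool sk_fullsock(const struct sock *sk)
  {
          /* True for everything except timewait (and, upstream, request)
           * sockets, which are not full struct sock objects. */
          return (1 << sk->sk_state) & ~(TCPF_TIME_WAIT);
  }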

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[backported from net-next 1d0ab253872cdd3d8e7913f59c266c7fd01771d0]
[lorenzo@google.com: removed TCPF_NEW_SYN_RECV, and added a comment to add it back.]

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Bug: 24163529
Change-Id: Ibf09017e1ab00af5e6925273117c335d7f515d73

4 years agonet: tcp: Scale the TCP backlog queue to absorb packet bursts
Harout Hedeshian [Mon, 2 Feb 2015 20:30:42 +0000 (13:30 -0700)]
net: tcp: Scale the TCP backlog queue to absorb packet bursts

A large momentary influx of packets flooding the TCP layer may cause
packets to get dropped at the socket's backlog queue. Bump this up to
prevent these drops. Note that this change may cause the socket memory
accounting to allow the total backlog queue length to exceed the user
space configured values, sometimes by a substantial amount, which can
lead to out-of-order packets being dropped instead of being queued. To
avoid these ofo drops, the condition to drop an out of order packet is
modified to allow out of order queuing to continue as long as it falls
within the now increased backlog queue limit.

Change-Id: I447ffc8560cb149fe84193c72bf693862f7ec740
Signed-off-by: Harout Hedeshian <harouth@codeaurora.org>
4 years agoipv6: inet6_sk() should use sk_fullsock()
Eric Dumazet [Mon, 5 Oct 2015 04:08:09 +0000 (21:08 -0700)]
ipv6: inet6_sk() should use sk_fullsock()

SYN_RECV & TIMEWAIT sockets are not full blown, they do not have a pinet6
pointer.
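
The change amounts to guarding the pinet6 access with the new helper;
roughly (a sketch, not a verbatim diff):

  static inline struct ipv6_pinfo *inet6_sk(const struct sock *sk)
  {
          /* Request and timewait sockets have no pinet6; return NULL
           * instead of reading past their smaller structures. */
          return sk_fullsock(sk) ? inet_sk(sk)->pinet6 : NULL;
  }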

Bug: 24163529
Change-Id: I6ce67a190d67d200c6ebeb81d2daeb9c86cd7581
Fixes: ca6fb0651883 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
4 years agotcp: fastopen: fix on syn-data transmit failure
Eric Dumazet [Tue, 19 Sep 2017 17:05:57 +0000 (10:05 -0700)]
tcp: fastopen: fix on syn-data transmit failure

[ Upstream commit b5b7db8d680464b1d631fd016f5e093419f0bfd9 ]

Our recent change exposed a bug in TCP Fastopen Client that syzkaller
found right away [1]

When we prepare skb with SYN+DATA, we attempt to transmit it,
and we update socket state as if the transmit was a success.

In socket RTX queue we have two skbs, one with the SYN alone,
and a second one containing the DATA.

When (malicious) ACK comes in, we now complain that second one had no
skb_mstamp.

The proper fix is to make sure that if the transmit failed, we do not
pretend we sent the DATA skb, and make it our send_head.

When 3WHS completes, we can now send the DATA right away, without having
to wait for a timeout.

[1]
WARNING: CPU: 0 PID: 100189 at net/ipv4/tcp_input.c:3117 tcp_clean_rtx_queue+0x2057/0x2ab0 net/ipv4/tcp_input.c:3117()

 WARN_ON_ONCE(last_ackt == 0);

Modules linked in:
CPU: 0 PID: 100189 Comm: syz-executor1 Not tainted
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
 0000000000000000 ffff8800b35cb1d8 ffffffff81cad00d 0000000000000000
 ffffffff828a4347 ffff88009f86c080 ffffffff8316eb20 0000000000000d7f
 ffff8800b35cb220 ffffffff812c33c2 ffff8800baad2440 00000009d46575c0
Call Trace:
 [<ffffffff81cad00d>] __dump_stack
 [<ffffffff81cad00d>] dump_stack+0xc1/0x124
 [<ffffffff812c33c2>] warn_slowpath_common+0xe2/0x150
 [<ffffffff812c361e>] warn_slowpath_null+0x2e/0x40
 [<ffffffff828a4347>] tcp_clean_rtx_queue+0x2057/0x2ab0 n
 [<ffffffff828ae6fd>] tcp_ack+0x151d/0x3930
 [<ffffffff828baa09>] tcp_rcv_state_process+0x1c69/0x4fd0
 [<ffffffff828efb7f>] tcp_v4_do_rcv+0x54f/0x7c0
 [<ffffffff8258aacb>] sk_backlog_rcv
 [<ffffffff8258aacb>] __release_sock+0x12b/0x3a0
 [<ffffffff8258ad9e>] release_sock+0x5e/0x1c0
 [<ffffffff8294a785>] inet_wait_for_connect
 [<ffffffff8294a785>] __inet_stream_connect+0x545/0xc50
 [<ffffffff82886f08>] tcp_sendmsg_fastopen
 [<ffffffff82886f08>] tcp_sendmsg+0x2298/0x35a0
 [<ffffffff82952515>] inet_sendmsg+0xe5/0x520
 [<ffffffff8257152f>] sock_sendmsg_nosec
 [<ffffffff8257152f>] sock_sendmsg+0xcf/0x110

Fixes: 8c72c65b426b ("tcp: update skb->skb_mstamp more carefully")
Fixes: 783237e8daf1 ("net-tcp: Fast Open client - sending SYN-data")
Change-Id: I1ee49ef4b2ab363fd9f10a518c1ce8bfa71ad7d1
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agonet: ipv4: Don't crash if passing a null sk to ip_rt_update_pmtu.
Lorenzo Colitti [Tue, 29 Nov 2016 17:56:47 +0000 (02:56 +0900)]
net: ipv4: Don't crash if passing a null sk to ip_rt_update_pmtu.

Commit e2d118a1cb5e ("net: inet: Support UID-based routing in IP
protocols.") made __build_flow_key call sock_net(sk) to determine
the network namespace of the passed-in socket. This crashes if sk
is NULL.

Fix this by getting the network namespace from the skb instead.
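
Sketch of the idea (illustrative fragment; the exact helper signature in
this tree may differ): derive the namespace from the packet's device
instead of the possibly-NULL socket:

          /* before: crashes when sk == NULL */
          /* struct net *net = sock_net(sk); */

          /* after: hang the namespace off the skb's device, which is
           * present even for packets with no associated socket. */
          struct net *net = dev_net(skb->dev);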

[Backport of net-next d109e61bfe7a468fd8df4a7ceb65635e7aa909a0]

Bug: 16355602
Change-Id: I23b43db5adb8546833e013c268f31111d0e53c69
Fixes: e2d118a1cb5e ("net: inet: Support UID-based routing in IP protocols.")
Reported-by: Erez Shitrit <erezsh@dev.mellanox.co.il>
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: inet: Support UID-based routing in IP protocols.
Lorenzo Colitti [Thu, 3 Nov 2016 17:23:43 +0000 (02:23 +0900)]
net: inet: Support UID-based routing in IP protocols.

- Use the UID in routing lookups made by protocol connect() and
  sendmsg() functions.
- Make sure that routing lookups triggered by incoming packets
  (e.g., Path MTU discovery) take the UID of the socket into
  account.
- For packets not associated with a userspace socket, (e.g., ping
  replies) use UID 0 inside the user namespace corresponding to
  the network namespace the socket belongs to. This allows
  all namespaces to apply routing and iptables rules to
  kernel-originated traffic in that namespace by matching UID 0.
  This is better than using the UID of the kernel socket that is
  sending the traffic, because the UID of kernel sockets created
  at namespace creation time (e.g., the per-processor ICMP and
  TCP sockets) is the UID of the user that created the socket,
  which might not be mapped in the namespace.

[Backport of net-next e2d118a1cb5e60d077131a09db1d81b90a5295fe]

Bug: 16355602
Change-Id: I126f8359887b5b5bbac68daf0ded89e899cb7cb0
Tested: compiles allnoconfig, allyesconfig, allmodconfig
Tested: https://android-review.googlesource.com/253302
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: ipv6: make "ip -6 route get mark xyz" work.
Lorenzo Colitti [Thu, 15 May 2014 23:38:41 +0000 (16:38 -0700)]
net: ipv6: make "ip -6 route get mark xyz" work.

Currently, "ip -6 route get mark xyz" ignores the mark passed in
by userspace. Make it honour the mark, just like IPv4 does.

[net-next commit 2e47b291953c35afa4e20a65475954c1a1b9afe1]

Change-Id: Idaae7338506d1785a80159bfe4f0cc3c2a9b6827
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: core: Add a UID field to struct sock.
Lorenzo Colitti [Thu, 3 Nov 2016 17:23:41 +0000 (02:23 +0900)]
net: core: Add a UID field to struct sock.

Protocol sockets (struct sock) don't have UIDs, but most of the
time, they map 1:1 to userspace sockets (struct socket) which do.

Various operations such as the iptables xt_owner match need
access to the "UID of a socket", and do so by following the
backpointer to the struct socket. This involves taking
sk_callback_lock and doesn't work when there is no socket
because userspace has already called close().

Simplify this by adding a sk_uid field to struct sock whose value
matches the UID of the corresponding struct socket. The semantics
are as follows:

1. Whenever sk_socket is non-null: sk_uid is the same as the UID
   in sk_socket, i.e., matches the return value of sock_i_uid.
   Specifically, the UID is set when userspace calls socket(),
   fchown(), or accept().
2. When sk_socket is NULL, sk_uid is defined as follows:
   - For a socket that no longer has a sk_socket because
     userspace has called close(): the previous UID.
   - For a cloned socket (e.g., an incoming connection that is
     established but on which userspace has not yet called
     accept): the UID of the socket it was cloned from.
   - For a socket that has never had an sk_socket: UID 0 inside
     the user namespace corresponding to the network namespace
     the socket belongs to.

Kernel sockets created by sock_create_kern are a special case
of #1 and sk_uid is the user that created them. For kernel
sockets created at network namespace creation time, such as the
per-processor ICMP and TCP sockets, this is the user that created
the network namespace.
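
Callers that need "the UID of this traffic" can then use a small helper
along these lines (a sketch mirroring the semantics above; the name
follows the mainline series and is illustrative here):

  static inline kuid_t sock_net_uid(const struct net *net, const struct sock *sk)
  {
          /* No socket (e.g. RSTs, ICMP errors): fall back to UID 0 in the
           * user namespace owning this network namespace. */
          return sk ? sk->sk_uid : make_kuid(net->user_ns, 0);
  }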

[Backport of net-next 86741ec25462e4c8cdce6df2f41ead05568c7d5e]

Bug: 16355602
Change-Id: I73e1a57dfeedf672f4c2dfc9ce6867838b55974b
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agotcp: fix more NULL deref after prequeue changes
Eric Dumazet [Tue, 9 Dec 2014 17:56:08 +0000 (09:56 -0800)]
tcp: fix more NULL deref after prequeue changes

When I cooked commit c3658e8d0f1 ("tcp: fix possible NULL dereference in
tcp_vX_send_reset()") I missed other spots where we could deref a NULL
skb_dst(skb).

Again, if a socket is provided, we do not need skb_dst() to get a
pointer to the network namespace: sock_net(sk) is good enough.

[Backport of net-next 0f85feae6b710ced3abad5b2b47d31dfcb956b62]

Bug: 16355602
Change-Id: I72c9f7dae8da4451112a20ea36183365303bd389
Reported-by: Dann Frazier <dann.frazier@canonical.com>
Bisected-by: Dann Frazier <dann.frazier@canonical.com>
Tested-by: Dann Frazier <dann.frazier@canonical.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: ca777eff51f7 ("tcp: remove dst refcount false sharing for prequeue mode")
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: ipv6: Fix ping to link-local addresses.
Lorenzo Colitti [Fri, 12 Aug 2016 16:13:38 +0000 (01:13 +0900)]
net: ipv6: Fix ping to link-local addresses.

ping_v6_sendmsg does not set flowi6_oif in response to
sin6_scope_id or sk_bound_dev_if, so it is not possible to use
these APIs to ping an IPv6 address on a different interface.
Instead, it sets flowi6_iif, which is incorrect but harmless.

Stop setting flowi6_iif, and support various ways of setting oif
in the same priority order used by udpv6_sendmsg.

[Backport of net 5e457896986e16c440c97bb94b9ccd95dd157292]

Bug: 29370996
Change-Id: I2c8bc213c417a4427f64439e0954138cb30416c2
Tested: https://android-review.googlesource.com/#/c/254470/
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Corinna Vinschen <xda@vinschen.de>
4 years agoipv4: sendto/hdrincl: don't use destination address found in header
Chris Clark [Tue, 27 Aug 2013 18:02:15 +0000 (12:02 -0600)]
ipv4: sendto/hdrincl: don't use destination address found in header

ipv4: raw_sendmsg: don't use header's destination address

A sendto() regression was bisected and found to start with commit
f8126f1d5136be1 (ipv4: Adjust semantics of rt->rt_gateway.)

The problem is that it tries to ARP-lookup the constructed packet's
destination address rather than the explicitly provided address.

Fix this using FLOWI_FLAG_KNOWN_NH so that the given nexthop is used.

cf. commit 2ad5b9e4bd314fc685086b99e90e5de3bc59e26b

Reported-by: Chris Clark <chris.clark@alcatel-lucent.com>
Bisected-by: Chris Clark <chris.clark@alcatel-lucent.com>
Tested-by: Chris Clark <chris.clark@alcatel-lucent.com>
Suggested-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Chris Clark <chris.clark@alcatel-lucent.com>
Change-Id: I06c9f3a0bca97a4b190e31543345e5accbf73a6d

4 years agouniversal7580: a7xelte-dts: remove second touchkey entry
Danny Wood [Mon, 6 Jul 2020 14:06:03 +0000 (15:06 +0100)]
universal7580: a7xelte-dts: remove second touchkey entry

* fixes touchkeys using our abov_touchkey_ft1804 combined driver

Change-Id: Ie82812adf7a13b2c4d05ea70d319cae83e2e566f

4 years agodefconfig: Import a7xelte defconfig.
Sourajit Karmakar [Tue, 21 Apr 2020 13:48:24 +0000 (09:48 -0400)]
defconfig: Import a7xelte defconfig.

Thanks @danwood76.

Change-Id: I3e68f291b872d1e493d662d9cab699fb0f472a2c

4 years agosensors: k2hh: Fix accelerometer
Dario Trombello [Thu, 20 Feb 2020 18:41:57 +0000 (18:41 +0000)]
sensors: k2hh: Fix accelerometer

Using the STMicroelectronics K2HH driver from SM-J700F Android 6.0 kernel source (J700FXXU4BQE3) makes the sensor work.

Change-Id: I9f50ea5096b56617b171d5cc64c2ed1b01a3e205

4 years agoarm64: Add lineageos_j7elte_defconfig
Dario Trombello [Thu, 20 Feb 2020 18:34:39 +0000 (18:34 +0000)]
arm64: Add lineageos_j7elte_defconfig

Change-Id: I85146aabf467c93cc6713b63445dfac667186212

5 years agoasm-generic: add memfd_create system call to unistd.h lineage-17.0
Will Deacon [Mon, 11 Aug 2014 13:24:47 +0000 (14:24 +0100)]
asm-generic: add memfd_create system call to unistd.h

Commit 9183df25fe7b ("shm: add memfd_create() syscall") added a new
system call (memfd_create) but didn't update the asm-generic unistd
header.

This patch adds the new system call to the asm-generic version of
unistd.h so that it can be used by architectures such as arm64.
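
The hunk is essentially one new entry in the generic table (a sketch;
279 is the asm-generic number from the 3.17-era table and may differ on
other trees):

  /* include/uapi/asm-generic/unistd.h (sketch) */
  #define __NR_memfd_create 279
  __SYSCALL(__NR_memfd_create, sys_memfd_create)
  /* ...and __NR_syscalls must grow by one to keep the table size in sync. */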

Change-Id: I173b1e5b6087fcea7d226a9f55f792432515897d
Cc: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
5 years agoshm: add memfd_create() syscall
David Herrmann [Fri, 8 Aug 2014 21:25:29 +0000 (14:25 -0700)]
shm: add memfd_create() syscall

memfd_create() is similar to mmap(MAP_ANON), but returns a file-descriptor
that you can pass to mmap().  It can support sealing and avoids any
connection to user-visible mount-points.  Thus, it's not subject to quotas
on mounted file-systems, but can be used like malloc()'ed memory, but with
a file-descriptor to it.

memfd_create() returns the raw shmem file, so calls like ftruncate() can
be used to modify the underlying inode.  Also calls like fstat() will
return proper information and mark the file as regular file.  If you want
sealing, you can specify MFD_ALLOW_SEALING.  Otherwise, sealing is not
supported (like on all other regular files).

Compared to O_TMPFILE, it does not require a tmpfs mount-point and is not
subject to a filesystem size limit.  It is still properly accounted to
memcg limits, though, and to the same overcommit or no-overcommit
accounting as all user memory.
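
A short usage sketch (illustrative; it assumes a libc wrapper for
memfd_create(), and the name and size are arbitrary):

  #define _GNU_SOURCE
  #include <fcntl.h>
  #include <stdio.h>
  #include <sys/mman.h>
  #include <unistd.h>

  int main(void)
  {
          /* Anonymous, mmap-able file with sealing enabled; no mount point,
           * no filesystem quota, accounted like regular user memory. */
          int fd = memfd_create("scratch", MFD_ALLOW_SEALING);

          if (fd < 0 || ftruncate(fd, 1 << 20) < 0)
                  return 1;
          /* Behaves like a regular file for fstat()/ftruncate()/mmap(). */
          printf("memfd ready: fd=%d\n", fd);
          close(fd);
          return 0;
  }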

Change-Id: Iaf959293e2c490523aeb46d56cc45b0e7bbe7bf5
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Ryan Lortie <desrt@desrt.ca>
Cc: Lennart Poettering <lennart@poettering.net>
Cc: Daniel Mack <zonque@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Angelo G. Del Regno <kholk11@gmail.com>
5 years agouniversal7580: fix commit "ANDROID: sdcardfs: Hold i_mutex for
Corinna Vinschen [Sun, 18 Nov 2018 18:21:35 +0000 (19:21 +0100)]
universal7580: fix commit "ANDROID: sdcardfs: Hold i_mutex for
 i_size_write"

I accidentally merged the 3.18 patch, using a different way to access
the lower file's inode.  Use the 3.10 technique instead.

Change-Id: Iea18abcb24cce9afa23e870af8beb31767d67250
Signed-off-by: Corinna Vinschen <xda@vinschen.de>
5 years agoANDROID: sdcardfs: Add option to not link obb
Daniel Rosenberg [Thu, 25 Oct 2018 23:25:15 +0000 (16:25 -0700)]
ANDROID: sdcardfs: Add option to not link obb

Add mount option unshared_obb to not link the obb
folders of multiple users together.

Bug: 27915347
Test: mount with option. Check if altering one obb
      alters the other
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Change-Id: I3956e06bd0a222b0bbb2768c9a8a8372ada85e1e
Signed-off-by: Paul Keith <javelinanddart@gmail.com>
5 years agoANDROID: sdcardfs: Add sandbox
Daniel Rosenberg [Thu, 25 Oct 2018 23:22:50 +0000 (16:22 -0700)]
ANDROID: sdcardfs: Add sandbox

Android/sandbox is treated the same as Android/data

Bug: 27915347
Test: ls -l /sdcard/Android/sandbox/*somepackage* after
      creating the folder.
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Change-Id: I7ef440a88df72198303c419e1f2f7c4657f9c170
Signed-off-by: Paul Keith <javelinanddart@gmail.com>
5 years agoANDROID: sdcardfs: Add option to drop unused dentries
Daniel Rosenberg [Fri, 6 Jul 2018 23:24:27 +0000 (16:24 -0700)]
ANDROID: sdcardfs: Add option to drop unused dentries

This adds the nocache mount option, which will cause sdcardfs to always
drop dentries that are not in use, preventing cached entries from
holding on to lower dentries, which could  cause strange behavior when
bypassing the sdcardfs layer and directly changing the lower fs.

Change-Id: I70268584a20b989ae8cfdd278a2e4fa1605217fb
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Paul Keith <javelinanddart@gmail.com>
5 years agoANDROID: sdcardfs: Change current->fs under lock
Daniel Rosenberg [Fri, 20 Jul 2018 23:11:40 +0000 (16:11 -0700)]
ANDROID: sdcardfs: Change current->fs under lock

bug: 111641492

Change-Id: I79e9894f94880048edaf0f7cfa2d180f65cbcf3b
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Paul Keith <javelinanddart@gmail.com>
5 years agoANDROID: sdcardfs: Don't use OVERRIDE_CRED macro
Daniel Rosenberg [Fri, 20 Jul 2018 01:08:35 +0000 (18:08 -0700)]
ANDROID: sdcardfs: Don't use OVERRIDE_CRED macro

The macro hides some control flow, making it easier
to run into bugs.

bug: 111642636

Change-Id: I37ec207c277d97c4e7f1e8381bc9ae743ad78435
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Paul Keith <javelinanddart@gmail.com>
5 years agoRevert "FROMLIST: android: binder: Move buffer out of area shared with user space"
Danny Wood [Thu, 31 Oct 2019 14:35:27 +0000 (14:35 +0000)]
Revert "FROMLIST: android: binder: Move buffer out of area shared with user space"

This commit causes the Samsung a5xelte fingerprint blobs to stop working

This reverts commit 35852b611c5af888e0ac979391099fe2035a06be.

Change-Id: I4c9e3c551deb98b793cb6a7de9ef2a14f3a46067

5 years agobinder: fix possible UAF when freeing buffer
Todd Kjos [Wed, 12 Jun 2019 20:29:27 +0000 (13:29 -0700)]
binder: fix possible UAF when freeing buffer

commit a370003cc301d4361bae20c9ef615f89bf8d1e8a upstream

There is a race between the binder driver cleaning
up a completed transaction via binder_free_transaction()
and a user calling binder_ioctl(BC_FREE_BUFFER) to
release a buffer. It doesn't matter which comes first, but
they need to be protected against running concurrently,
which can result in a UAF.

Signed-off-by: Todd Kjos <tkjos@google.com>
Cc: stable <stable@vger.kernel.org> # 4.14 4.19
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: I1b9c9bdc52df8ddbc5fe7c6d8308f1068265f8ae

5 years agoBACKPORT: binder: Set end of SG buffer area properly.
Martijn Coenen [Tue, 9 Jul 2019 11:09:23 +0000 (13:09 +0200)]
BACKPORT: binder: Set end of SG buffer area properly.

In case the target node requests a security context, the
extra_buffers_size is increased with the size of the security context.
But, that size is not available for use by regular scatter-gather
buffers; make sure the ending of that buffer is marked correctly.

Bug: 136210786
Acked-by: Todd Kjos <tkjos@google.com>
Fixes: ec74136ded79 ("binder: create node flag to request sender's security context")
Signed-off-by: Martijn Coenen <maco@android.com>
Cc: stable@vger.kernel.org # 5.1+
Link: https://lore.kernel.org/r/20190709110923.220736-1-maco@android.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit a56587065094fd96eb4c2b5ad65571daad32156d)
Change-Id: Ib4d3a99e7a881992c1313169f902cfad02a508a6

5 years agoUPSTREAM: binder: check for overflow when alloc for security context
Todd Kjos [Wed, 24 Apr 2019 19:31:18 +0000 (12:31 -0700)]
UPSTREAM: binder: check for overflow when alloc for security context

commit 0b0509508beff65c1d50541861bc0d4973487dc5 upstream.

When allocating space in the target buffer for the security context,
make sure the extra_buffers_size doesn't overflow. This can only
happen if the given size is invalid, but an overflow can turn it
into a valid size. Fail the transaction if an overflow is detected.

Bug: 130571081
Change-Id: Ibaec652d2073491cc426a4a24004a848348316bf
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoFROMGIT: binder: create node flag to request sender's security context
Todd Kjos [Mon, 14 Jan 2019 17:10:21 +0000 (09:10 -0800)]
FROMGIT: binder: create node flag to request sender's security context

To allow servers to verify client identity, allow a node
flag to be set that causes the sender's security context
to be delivered with the transaction. The BR_TRANSACTION
command is extended in BR_TRANSACTION_SEC_CTX to
contain a pointer to the security context string.

Signed-off-by: Todd Kjos <tkjos@google.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit ec74136ded792deed80780a2f8baf3521eeb72f9
 https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
 master)
Change-Id: I44496546e2d0dc0022f818a45cd52feb1c1a92cb
Signed-off-by: Todd Kjos <tkjos@google.com>
5 years agoUPSTREAM: binder: fix race that allows malicious free of live buffer
Todd Kjos [Tue, 6 Nov 2018 23:55:32 +0000 (15:55 -0800)]
UPSTREAM: binder: fix race that allows malicious free of live buffer

commit 7bada55ab50697861eee6bb7d60b41e68a961a9c upstream

Malicious code can attempt to free buffers using the BC_FREE_BUFFER
ioctl to binder. There are protections against a user freeing a buffer
while in use by the kernel, however there was a window where
BC_FREE_BUFFER could be used to free a recently allocated buffer that
was not completely initialized. This resulted in a use-after-free
detected by KASAN with a malicious test program.

This window is closed by setting the buffer's allow_user_free attribute
to 0 when the buffer is allocated or when the user has previously freed
it instead of waiting for the caller to set it. The problem was that
when the struct buffer was recycled, allow_user_free was stale and set
to 1, allowing a free to go through.

Bug: 116855682
Change-Id: I0b38089f6fdb1adbf7e1102747e4119c9a05b191
Signed-off-by: Todd Kjos <tkjos@google.com>
Acked-by: Arve Hjønnevåg <arve@android.com>
Cc: stable <stable@vger.kernel.org> # 4.14
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoUPSTREAM: binder: fix proc->files use-after-free
Todd Kjos [Mon, 27 Nov 2017 17:32:33 +0000 (09:32 -0800)]
UPSTREAM: binder: fix proc->files use-after-free

proc->files cleanup is initiated by binder_vma_close. Therefore
a reference on the binder_proc is not enough to prevent the
files_struct from being released while the binder_proc still has
a reference. This can lead to an attempt to dereference the
stale pointer obtained from proc->files prior to proc->files
cleanup. This has been seen once in task_get_unused_fd_flags()
when __alloc_fd() is called with a stale "files".

The fix is to protect proc->files with a mutex to prevent cleanup
while in use.

Signed-off-by: Todd Kjos <tkjos@google.com>
Cc: stable <stable@vger.kernel.org> # 4.14
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 7f3dc0088b98533f17128058fac73cd8b2752ef1)

Change-Id: I40982bb0b4615bda5459538c20eb2a913964042c

5 years agoFROMLIST: ANDROID: binder: Add BINDER_GET_NODE_INFO_FOR_REF ioctl.
Martijn Coenen [Sat, 25 Aug 2018 20:50:56 +0000 (13:50 -0700)]
FROMLIST: ANDROID: binder: Add BINDER_GET_NODE_INFO_FOR_REF ioctl.

This allows the context manager to retrieve information about nodes
that it holds a reference to, such as the current number of
references to those nodes.

Such information can for example be used to determine whether the
servicemanager is the only process holding a reference to a node.
This information can then be passed on to the process holding the
node, which can in turn decide whether it wants to shut down to
reduce resource usage.

Signed-off-by: Martijn Coenen <maco@android.com>
Change-Id: I2fa9b6e2b1d1d6c84fca954125c3ec776dc2c04f

5 years agoUPSTREAM: ANDROID: binder: prevent transactions into own process.
Martijn Coenen [Wed, 28 Mar 2018 09:14:50 +0000 (11:14 +0200)]
UPSTREAM: ANDROID: binder: prevent transactions into own process.

This can't happen with normal nodes (because you can't get a ref
to a node you own), but it could happen with the context manager;
to make the behavior consistent with regular nodes, reject
transactions into the context manager by the process owning it.

Reported-by: syzbot+09e05aba06723a94d43d@syzkaller.appspotmail.com
Signed-off-by: Martijn Coenen <maco@android.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 7aa135fcf26377f92dc0680a57566b4c7f3e281b)
Change-Id: I3f6c0528fb2d3f8b835255b2a0ec603cab94626a

5 years agoUPSTREAM: ANDROID: binder: synchronize_rcu() when using POLLFREE.
Martijn Coenen [Fri, 16 Feb 2018 08:47:15 +0000 (09:47 +0100)]
UPSTREAM: ANDROID: binder: synchronize_rcu() when using POLLFREE.

To prevent races with ep_remove_waitqueue() removing the
waitqueue at the same time.

Reported-by: syzbot+a2a3c4909716e271487e@syzkaller.appspotmail.com
Signed-off-by: Martijn Coenen <maco@android.com>
Cc: stable <stable@vger.kernel.org> # 4.14+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 5eeb2ca02a2f6084fc57ae5c244a38baab07033a)

Change-Id: Ia0089448079c78d0ab0b57303faf838e9e5ee797

5 years agoUPSTREAM: ANDROID: binder: remove waitqueue when thread exits.
Martijn Coenen [Fri, 5 Jan 2018 10:27:07 +0000 (11:27 +0100)]
UPSTREAM: ANDROID: binder: remove waitqueue when thread exits.

binder_poll() passes the thread->wait waitqueue that
can be slept on for work. When a thread that uses
epoll explicitly exits using BINDER_THREAD_EXIT,
the waitqueue is freed, but it is never removed
from the corresponding epoll data structure. When
the process subsequently exits, the epoll cleanup
code tries to access the waitlist, which results in
a use-after-free.

Prevent this by using POLLFREE when the thread exits.

(cherry picked from commit f5cb779ba16334b45ba8946d6bfa6d9834d1527f)

Change-Id: Ib34b1cbb8ab2192d78c3d9956b2f963a66ecad2e
Signed-off-by: Martijn Coenen <maco@android.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: stable <stable@vger.kernel.org> # 4.14
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>