GitHub/exynos8895/android_kernel_samsung_universal8895.git
6 years agof2fs: fix a problem of using memory after free
Yunlei He [Mon, 19 Dec 2016 12:10:48 +0000 (20:10 +0800)]
f2fs: fix a problem of using memory after free

commit 7855eba4d6102f811b6dd142d6c749f53b591fa3 upstream.

This patch fix a problem of using memory after free
in function __try_merge_extent_node.

Fixes: 0f825ee6e873 ("f2fs: add new interfaces for extent tree")
Cc: <stable@vger.kernel.org>
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: remove unneeded condition
Dan Carpenter [Fri, 16 Dec 2016 08:18:15 +0000 (11:18 +0300)]
f2fs: remove unneeded condition

commit 07fe8d44409f88be8f4a4e8f22b47ee709a22657 upstream.

We checked that "inode" is not an error pointer earlier so there is
no need to check again here.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: don't cache nat entry if out of memory
Chao Yu [Tue, 13 Dec 2016 10:54:59 +0000 (18:54 +0800)]
f2fs: don't cache nat entry if out of memory

commit 5c9e418436f3445d7cc4f3ba2964f231a4b33f17 upstream.

If we run out of memory, in cache_nat_entry, it's better to avoid loop
for allocating memory to cache nat entry, so in low memory scenario, for
read path of node block, I expect this can avoid unneeded latency.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: remove unused values in recover_fsync_data
Yunlei He [Tue, 13 Dec 2016 09:23:37 +0000 (17:23 +0800)]
f2fs: remove unused values in recover_fsync_data

commit fed24668482e07421b8e746a4886e7725434050a upstream.

This patch remove unused values in function recover_fsync_data

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: support async discard based on v4.9
Jaegeuk Kim [Thu, 12 Jan 2017 00:41:25 +0000 (16:41 -0800)]
f2fs: support async discard based on v4.9

commit 275b66b09e85cf0520dc610dd89706952751a473 upstream.

This patch is based on commit 275b66b09e85 (f2fs: support async discard).

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: resolve op and op_flags confilcts
Jaegeuk Kim [Thu, 12 Jan 2017 02:24:54 +0000 (18:24 -0800)]
f2fs: resolve op and op_flags confilcts

commit 70fd76140a6cb63262bd47b68d57b42e889c10ee upstream.

This patch backported ("block,fs: use REQ_* flags directly")

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: remove wrong backported codes
Jaegeuk Kim [Wed, 11 Jan 2017 17:55:38 +0000 (09:55 -0800)]
f2fs: remove wrong backported codes

Kconfig and dentry RCU mode fixes.

Fixes: c1286ff41c2f610df ("f2fs: backport from 'for-f2fs-4.9'")
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agoFROMLIST: binder: fix use-after-free in binder_transaction()
Todd Kjos [Mon, 25 Sep 2017 15:55:09 +0000 (08:55 -0700)]
FROMLIST: binder: fix use-after-free in binder_transaction()

(from https://patchwork.kernel.org/patch/9978801/)

User-space normally keeps the node alive when creating a transaction
since it has a reference to the target. The local strong ref keeps it
alive if the sending process dies before the target process processes
the transaction. If the source process is malicious or has a reference
counting bug, this can fail.

In this case, when we attempt to decrement the node in the failure
path, the node has already been freed.

This is fixed by taking a tmpref on the node while constructing
the transaction. To avoid re-acquiring the node lock and inner
proc lock to increment the proc's tmpref, a helper is used that
does the ref increments on both the node and proc.

Bug: 66899329
Change-Id: Iad40e1e0bccee88234900494fb52a510a37fe8d7
Signed-off-by: Todd Kjos <tkjos@google.com>
6 years agoUPSTREAM: ipv6: fib: Unlink replaced routes from their nodes
Ido Schimmel [Thu, 3 Aug 2017 11:28:22 +0000 (13:28 +0200)]
UPSTREAM: ipv6: fib: Unlink replaced routes from their nodes

When a route is deleted its node pointer is set to NULL to indicate it's
no longer linked to its node. Do the same for routes that are replaced.

This will later allow us to test if a route is still in the FIB by
checking its node pointer instead of its reference count.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cherry-pick from: 7483cea79957312e9f8e9cf760a1bc5d6c507113
Bug: 64978549

Change-Id: Ibfa54cf918084138b6b19437e9ef86bfaea5deae

6 years agoMerge 4.4.89 into android-4.4
Greg Kroah-Hartman [Wed, 27 Sep 2017 09:52:16 +0000 (11:52 +0200)]
Merge 4.4.89 into android-4.4

Changes in 4.4.89
ipv6: accept 64k - 1 packet length in ip6_find_1stfragopt()
ipv6: add rcu grace period before freeing fib6_node
ipv6: fix sparse warning on rt6i_node
qlge: avoid memcpy buffer overflow
Revert "net: phy: Correctly process PHY_HALTED in phy_stop_machine()"
tcp: initialize rcv_mss to TCP_MIN_MSS instead of 0
Revert "net: use lib/percpu_counter API for fragmentation mem accounting"
Revert "net: fix percpu memory leaks"
gianfar: Fix Tx flow control deactivation
ipv6: fix memory leak with multiple tables during netns destruction
ipv6: fix typo in fib6_net_exit()
f2fs: check hot_data for roll-forward recovery
x86/fsgsbase/64: Report FSBASE and GSBASE correctly in core dumps
md/raid5: release/flush io in raid5_do_work()
nfsd: Fix general protection fault in release_lock_stateid()
mm: prevent double decrease of nr_reserved_highatomic
tty: improve tty_insert_flip_char() fast path
tty: improve tty_insert_flip_char() slow path
tty: fix __tty_insert_flip_char regression
Input: i8042 - add Gigabyte P57 to the keyboard reset table
MIPS: math-emu: <MAX|MAXA|MIN|MINA>.<D|S>: Fix quiet NaN propagation
MIPS: math-emu: <MAX|MAXA|MIN|MINA>.<D|S>: Fix cases of both inputs zero
MIPS: math-emu: <MAX|MIN>.<D|S>: Fix cases of both inputs negative
MIPS: math-emu: <MAXA|MINA>.<D|S>: Fix cases of input values with opposite signs
MIPS: math-emu: <MAXA|MINA>.<D|S>: Fix cases of both infinite inputs
MIPS: math-emu: MINA.<D|S>: Fix some cases of infinity and zero inputs
crypto: AF_ALG - remove SGL terminator indicator when chaining
ext4: fix incorrect quotaoff if the quota feature is enabled
ext4: fix quota inconsistency during orphan cleanup for read-only mounts
powerpc: Fix DAR reporting when alignment handler faults
block: Relax a check in blk_start_queue()
md/bitmap: disable bitmap_resize for file-backed bitmaps.
skd: Avoid that module unloading triggers a use-after-free
skd: Submit requests to firmware before triggering the doorbell
scsi: zfcp: fix queuecommand for scsi_eh commands when DIX enabled
scsi: zfcp: add handling for FCP_RESID_OVER to the fcp ingress path
scsi: zfcp: fix capping of unsuccessful GPN_FT SAN response trace records
scsi: zfcp: fix passing fsf_req to SCSI trace on TMF to correlate with HBA
scsi: zfcp: fix missing trace records for early returns in TMF eh handlers
scsi: zfcp: fix payload with full FCP_RSP IU in SCSI trace records
scsi: zfcp: trace HBA FSF response by default on dismiss or timedout late response
scsi: zfcp: trace high part of "new" 64 bit SCSI LUN
scsi: megaraid_sas: Check valid aen class range to avoid kernel panic
scsi: megaraid_sas: Return pended IOCTLs with cmd_status MFI_STAT_WRONG_STATE in case adapter is dead
scsi: storvsc: fix memory leak on ring buffer busy
scsi: sg: remove 'save_scat_len'
scsi: sg: use standard lists for sg_requests
scsi: sg: off by one in sg_ioctl()
scsi: sg: factor out sg_fill_request_table()
scsi: sg: fixup infoleak when using SG_GET_REQUEST_TABLE
scsi: qla2xxx: Fix an integer overflow in sysfs code
ftrace: Fix selftest goto location on error
tracing: Apply trace_clock changes to instance max buffer
ARC: Re-enable MMU upon Machine Check exception
PCI: shpchp: Enable bridge bus mastering if MSI is enabled
media: v4l2-compat-ioctl32: Fix timespec conversion
media: uvcvideo: Prevent heap overflow when accessing mapped controls
bcache: initialize dirty stripes in flash_dev_run()
bcache: Fix leak of bdev reference
bcache: do not subtract sectors_to_gc for bypassed IO
bcache: correct cache_dirty_target in __update_writeback_rate()
bcache: Correct return value for sysfs attach errors
bcache: fix for gc and write-back race
bcache: fix bch_hprint crash and improve output
ftrace: Fix memleak when unregistering dynamic ops when tracing disabled
Linux 4.4.89

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
6 years agoLinux 4.4.89
Greg Kroah-Hartman [Wed, 27 Sep 2017 09:00:37 +0000 (11:00 +0200)]
Linux 4.4.89

6 years agoftrace: Fix memleak when unregistering dynamic ops when tracing disabled
Steven Rostedt (VMware) [Fri, 1 Sep 2017 16:18:28 +0000 (12:18 -0400)]
ftrace: Fix memleak when unregistering dynamic ops when tracing disabled

commit edb096e00724f02db5f6ec7900f3bbd465c6c76f upstream.

If function tracing is disabled by the user via the function-trace option or
the proc sysctl file, and a ftrace_ops that was allocated on the heap is
unregistered, then the shutdown code exits out without doing the proper
clean up. This was found via kmemleak and running the ftrace selftests, as
one of the tests unregisters with function tracing disabled.

 # cat kmemleak
unreferenced object 0xffffffffa0020000 (size 4096):
  comm "swapper/0", pid 1, jiffies 4294668889 (age 569.209s)
  hex dump (first 32 bytes):
    55 ff 74 24 10 55 48 89 e5 ff 74 24 18 55 48 89  U.t$.UH...t$.UH.
    e5 48 81 ec a8 00 00 00 48 89 44 24 50 48 89 4c  .H......H.D$PH.L
  backtrace:
    [<ffffffff81d64665>] kmemleak_vmalloc+0x85/0xf0
    [<ffffffff81355631>] __vmalloc_node_range+0x281/0x3e0
    [<ffffffff8109697f>] module_alloc+0x4f/0x90
    [<ffffffff81091170>] arch_ftrace_update_trampoline+0x160/0x420
    [<ffffffff81249947>] ftrace_startup+0xe7/0x300
    [<ffffffff81249bd2>] register_ftrace_function+0x72/0x90
    [<ffffffff81263786>] trace_selftest_ops+0x204/0x397
    [<ffffffff82bb8971>] trace_selftest_startup_function+0x394/0x624
    [<ffffffff81263a75>] run_tracer_selftest+0x15c/0x1d7
    [<ffffffff82bb83f1>] init_trace_selftests+0x75/0x192
    [<ffffffff81002230>] do_one_initcall+0x90/0x1e2
    [<ffffffff82b7d620>] kernel_init_freeable+0x350/0x3fe
    [<ffffffff81d61ec3>] kernel_init+0x13/0x122
    [<ffffffff81d72c6a>] ret_from_fork+0x2a/0x40
    [<ffffffffffffffff>] 0xffffffffffffffff

Fixes: 12cce594fa ("ftrace/x86: Allow !CONFIG_PREEMPT dynamic ops to use allocated trampolines")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agobcache: fix bch_hprint crash and improve output
Michael Lyle [Wed, 6 Sep 2017 06:26:02 +0000 (14:26 +0800)]
bcache: fix bch_hprint crash and improve output

commit 9276717b9e297a62d1151a43d1cd286213f68eb7 upstream.

Most importantly, solve a crash where %llu was used to format signed
numbers.  This would cause a buffer overflow when reading sysfs
writeback_rate_debug, as only 20 bytes were allocated for this and
%llu writes 20 characters plus a null.

Always use the units mechanism rather than having different output
paths for simplicity.

Also, correct problems with display output where 1.10 was a larger
number than 1.09, by multiplying by 10 and then dividing by 1024 instead
of dividing by 100.  (Remainders of >= 1000 would print as .10).

Minor changes: Always display the decimal point instead of trying to
omit it based on number of digits shown.  Decide what units to use
based on 1000 as a threshold, not 1024 (in other words, always print
at most 3 digits before the decimal point).

Signed-off-by: Michael Lyle <mlyle@lyle.org>
Reported-by: Dmitry Yu Okunev <dyokunev@ut.mephi.ru>
Acked-by: Kent Overstreet <kent.overstreet@gmail.com>
Reviewed-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agobcache: fix for gc and write-back race
Tang Junhui [Wed, 6 Sep 2017 06:25:59 +0000 (14:25 +0800)]
bcache: fix for gc and write-back race

commit 9baf30972b5568d8b5bc8b3c46a6ec5b58100463 upstream.

gc and write-back get raced (see the email "bcache get stucked" I sended
before):
gc thread                               write-back thread
|                                       |bch_writeback_thread()
|bch_gc_thread()                        |
|                                       |==>read_dirty()
|==>bch_btree_gc()                      |
|==>btree_root() //get btree root       |
|                //node write locker    |
|==>bch_btree_gc_root()                 |
|                                       |==>read_dirty_submit()
|                                       |==>write_dirty()
|                                       |==>continue_at(cl,
|                                       |               write_dirty_finish,
|                                       |               system_wq);
|                                       |==>write_dirty_finish()//excute
|                                       |               //in system_wq
|                                       |==>bch_btree_insert()
|                                       |==>bch_btree_map_leaf_nodes()
|                                       |==>__bch_btree_map_nodes()
|                                       |==>btree_root //try to get btree
|                                       |              //root node read
|                                       |              //lock
|                                       |-----stuck here
|==>bch_btree_set_root()
|==>bch_journal_meta()
|==>bch_journal()
|==>journal_try_write()
|==>journal_write_unlocked() //journal_full(&c->journal)
|                            //condition satisfied
|==>continue_at(cl, journal_write, system_wq); //try to excute
|                               //journal_write in system_wq
|                               //but work queue is excuting
|                               //write_dirty_finish()
|==>closure_sync(); //wait journal_write execute
|                   //over and wake up gc,
|-------------stuck here
|==>release root node write locker

This patch alloc a separate work-queue for write-back thread to avoid such
race.

(Commit log re-organized by Coly Li to pass checkpatch.pl checking)

Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
Acked-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agobcache: Correct return value for sysfs attach errors
Tony Asleson [Wed, 6 Sep 2017 06:25:57 +0000 (14:25 +0800)]
bcache: Correct return value for sysfs attach errors

commit 77fa100f27475d08a569b9d51c17722130f089e7 upstream.

If you encounter any errors in bch_cached_dev_attach it will return
a negative error code.  The variable 'v' which stores the result is
unsigned, thus user space sees a very large value returned for bytes
written which can cause incorrect user space behavior.  Utilize 1
signed variable to use throughout the function to preserve error return
capability.

Signed-off-by: Tony Asleson <tasleson@redhat.com>
Acked-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agobcache: correct cache_dirty_target in __update_writeback_rate()
Tang Junhui [Wed, 6 Sep 2017 06:25:56 +0000 (14:25 +0800)]
bcache: correct cache_dirty_target in __update_writeback_rate()

commit a8394090a9129b40f9d90dcb7f4a49d60c727ca6 upstream.

__update_write_rate() uses a Proportion-Differentiation Controller
algorithm to control writeback rate. A dirty target number is used in
this PD controller to control writeback rate. A larger target number
will make the writeback rate smaller, on the versus, a smaller target
number will make the writeback rate larger.

bcache uses the following steps to calculate the target number,
1) cache_sectors = all-buckets-of-cache-set * buckets-size
2) cache_dirty_target = cache_sectors * cached-device-writeback_percent
3) target = cache_dirty_target *
(sectors-of-cached-device/sectors-of-all-cached-devices-of-this-cache-set)

The calculation at step 1) for cache_sectors is incorrect, which does
not consider dirty blocks occupied by flash only volume.

A flash only volume can be took as a bcache device without cached
device. All data sectors allocated for it are persistent on cache device
and marked dirty, they are not touched by bcache writeback and garbage
collection code. So data blocks of flash only volume should be ignore
when calculating cache_sectors of cache set.

Current code does not subtract dirty sectors of flash only volume, which
results a larger target number from the above 3 steps. And in sequence
the cache device's writeback rate is smaller then a correct value,
writeback speed is slower on all cached devices.

This patch fixes the incorrect slower writeback rate by subtracting
dirty sectors of flash only volumes in __update_writeback_rate().

(Commit log composed by Coly Li to pass checkpatch.pl checking)

Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
Reviewed-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agobcache: do not subtract sectors_to_gc for bypassed IO
Tang Junhui [Wed, 6 Sep 2017 06:25:53 +0000 (14:25 +0800)]
bcache: do not subtract sectors_to_gc for bypassed IO

commit 69daf03adef5f7bc13e0ac86b4b8007df1767aab upstream.

Since bypassed IOs use no bucket, so do not subtract sectors_to_gc to
trigger gc thread.

Signed-off-by: tang.junhui <tang.junhui@zte.com.cn>
Acked-by: Coly Li <colyli@suse.de>
Reviewed-by: Eric Wheeler <bcache@linux.ewheeler.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agobcache: Fix leak of bdev reference
Jan Kara [Wed, 6 Sep 2017 06:25:51 +0000 (14:25 +0800)]
bcache: Fix leak of bdev reference

commit 4b758df21ee7081ab41448d21d60367efaa625b3 upstream.

If blkdev_get_by_path() in register_bcache() fails, we try to lookup the
block device using lookup_bdev() to detect which situation we are in to
properly report error. However we never drop the reference returned to
us from lookup_bdev(). Fix that.

Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agobcache: initialize dirty stripes in flash_dev_run()
Tang Junhui [Wed, 6 Sep 2017 17:28:53 +0000 (01:28 +0800)]
bcache: initialize dirty stripes in flash_dev_run()

commit 175206cf9ab63161dec74d9cd7f9992e062491f5 upstream.

bcache uses a Proportion-Differentiation Controller algorithm to control
writeback rate to cached devices. In the PD controller algorithm, dirty
stripes of thin flash device should not be counted in, because flash only
volumes never write back dirty data.

Currently dirty stripe counter for thin flash device is not initialized
when the thin flash device starts. Which means the following calculation
in PD controller will reference an undefined dirty stripes number, and
all cached devices attached to the same cache set where the thin flash
device lies on may have an inaccurate writeback rate.

This patch calles bch_sectors_dirty_init() in flash_dev_run(), to
correctly initialize dirty stripe counter when the thin flash device
starts to run. This patch also does following parameter data type change,
 -void bch_sectors_dirty_init(struct cached_dev *dc);
 +void bch_sectors_dirty_init(struct bcache_device *);
to call this function conveniently in flash_dev_run().

(Commit log is composed by Coly Li)

Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
Reviewed-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agomedia: uvcvideo: Prevent heap overflow when accessing mapped controls
Guenter Roeck [Tue, 8 Aug 2017 12:56:21 +0000 (08:56 -0400)]
media: uvcvideo: Prevent heap overflow when accessing mapped controls

commit 7e09f7d5c790278ab98e5f2c22307ebe8ad6e8ba upstream.

The size of uvc_control_mapping is user controlled leading to a
potential heap overflow in the uvc driver. This adds a check to verify
the user provided size fits within the bounds of the defined buffer
size.

Originally-from: Richard Simmons <rssimmo@amazon.com>

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agomedia: v4l2-compat-ioctl32: Fix timespec conversion
Daniel Mentz [Thu, 3 Aug 2017 03:42:17 +0000 (23:42 -0400)]
media: v4l2-compat-ioctl32: Fix timespec conversion

commit 9c7ba1d7634cef490b85bc64c4091ff004821bfd upstream.

Certain syscalls like recvmmsg support 64 bit timespec values for the
X32 ABI. The helper function compat_put_timespec converts a timespec
value to a 32 bit or 64 bit value depending on what ABI is used. The
v4l2 compat layer, however, is not designed to support 64 bit timespec
values and always uses 32 bit values. Hence, compat_put_timespec must
not be used.

Without this patch, user space will be provided with bad timestamp
values from the VIDIOC_DQEVENT ioctl. Also, fields of the struct
v4l2_event32 that come immediately after timestamp get overwritten,
namely the field named id.

Fixes: 81993e81a994 ("compat: Get rid of (get|put)_compat_time(val|spec)")
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Cc: Tiffany Lin <tiffany.lin@mediatek.com>
Cc: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
Cc: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Daniel Mentz <danielmentz@google.com>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agoPCI: shpchp: Enable bridge bus mastering if MSI is enabled
Aleksandr Bezzubikov [Tue, 18 Jul 2017 14:12:25 +0000 (17:12 +0300)]
PCI: shpchp: Enable bridge bus mastering if MSI is enabled

commit 48b79a14505349a29b3e20f03619ada9b33c4b17 upstream.

An SHPC may generate MSIs to notify software about slot or controller
events (SHPC spec r1.0, sec 4.7).  A PCI device can only generate an MSI if
it has bus mastering enabled.

Enable bus mastering if the bridge contains an SHPC that uses MSI for event
notifications.

Signed-off-by: Aleksandr Bezzubikov <zuban32s@gmail.com>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agoARC: Re-enable MMU upon Machine Check exception
Jose Abreu [Fri, 1 Sep 2017 16:00:23 +0000 (17:00 +0100)]
ARC: Re-enable MMU upon Machine Check exception

commit 1ee55a8f7f6b7ca4c0c59e0b4b4e3584a085c2d3 upstream.

I recently came upon a scenario where I would get a double fault
machine check exception tiriggered by a kernel module.
However the ensuing crash stacktrace (ksym lookup) was not working
correctly.

Turns out that machine check auto-disables MMU while modules are allocated
in kernel vaddr spapce.

This patch re-enables the MMU before start printing the stacktrace
making stacktracing of modules work upon a fatal exception.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Reviewed-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
[vgupta: moved code into low level handler to avoid in 2 places]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agotracing: Apply trace_clock changes to instance max buffer
Baohong Liu [Tue, 5 Sep 2017 21:57:19 +0000 (16:57 -0500)]
tracing: Apply trace_clock changes to instance max buffer

commit 170b3b1050e28d1ba0700e262f0899ffa4fccc52 upstream.

Currently trace_clock timestamps are applied to both regular and max
buffers only for global trace. For instance trace, trace_clock
timestamps are applied only to regular buffer. But, regular and max
buffers can be swapped, for example, following a snapshot. So, for
instance trace, bad timestamps can be seen following a snapshot.
Let's apply trace_clock timestamps to instance max buffer as well.

Link: http://lkml.kernel.org/r/ebdb168d0be042dcdf51f81e696b17fabe3609c1.1504642143.git.tom.zanussi@linux.intel.com
Fixes: 277ba0446 ("tracing: Add interface to allow multiple trace buffers")
Signed-off-by: Baohong Liu <baohong.liu@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agoftrace: Fix selftest goto location on error
Steven Rostedt (VMware) [Fri, 1 Sep 2017 16:04:09 +0000 (12:04 -0400)]
ftrace: Fix selftest goto location on error

commit 46320a6acc4fb58f04bcf78c4c942cc43b20f986 upstream.

In the second iteration of trace_selftest_ops(), the error goto label is
wrong in the case where trace_selftest_test_global_cnt is off. In the
case of error, it leaks the dynamic ops that was allocated.

Fixes: 95950c2e ("ftrace: Add self-tests for multiple function trace users")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agoscsi: qla2xxx: Fix an integer overflow in sysfs code
Dan Carpenter [Wed, 30 Aug 2017 13:30:35 +0000 (16:30 +0300)]
scsi: qla2xxx: Fix an integer overflow in sysfs code

commit e6f77540c067b48dee10f1e33678415bfcc89017 upstream.

The value of "size" comes from the user.  When we add "start + size" it
could lead to an integer overflow bug.

It means we vmalloc() a lot more memory than we had intended.  I believe
that on 64 bit systems vmalloc() can succeed even if we ask it to
allocate huge 4GB buffers.  So we would get memory corruption and likely
a crash when we call ha->isp_ops->write_optrom() and ->read_optrom().

Only root can trigger this bug.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=194061
Fixes: b7cc176c9eb3 ("[SCSI] qla2xxx: Allow region-based flash-part accesses.")
Reported-by: shqking <shqking@gmail.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agoscsi: sg: fixup infoleak when using SG_GET_REQUEST_TABLE
Hannes Reinecke [Fri, 15 Sep 2017 12:05:16 +0000 (14:05 +0200)]
scsi: sg: fixup infoleak when using SG_GET_REQUEST_TABLE

commit 3e0097499839e0fe3af380410eababe5a47c4cf9 upstream.

When calling SG_GET_REQUEST_TABLE ioctl only a half-filled table is
returned; the remaining part will then contain stale kernel memory
information.  This patch zeroes out the entire table to avoid this
issue.

Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agoscsi: sg: factor out sg_fill_request_table()
Hannes Reinecke [Fri, 15 Sep 2017 12:05:15 +0000 (14:05 +0200)]
scsi: sg: factor out sg_fill_request_table()

commit 4759df905a474d245752c9dc94288e779b8734dd upstream.

Factor out sg_fill_request_table() for better readability.

[mkp: typos, applied by hand]

Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agoscsi: sg: off by one in sg_ioctl()
Dan Carpenter [Thu, 17 Aug 2017 07:09:54 +0000 (10:09 +0300)]
scsi: sg: off by one in sg_ioctl()

commit bd46fc406b30d1db1aff8dabaff8d18bb423fdcf upstream.

If "val" is SG_MAX_QUEUE then we are one element beyond the end of the
"rinfo" array so the > should be >=.

Fixes: 109bade9c625 ("scsi: sg: use standard lists for sg_requests")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agoscsi: sg: use standard lists for sg_requests
Hannes Reinecke [Fri, 7 Apr 2017 07:34:16 +0000 (09:34 +0200)]
scsi: sg: use standard lists for sg_requests

commit 109bade9c625c89bb5ea753aaa1a0a97e6fbb548 upstream.

'Sg_request' is using a private list implementation; convert it to
standard lists.

Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Tested-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agoscsi: sg: remove 'save_scat_len'
Hannes Reinecke [Fri, 7 Apr 2017 07:34:13 +0000 (09:34 +0200)]
scsi: sg: remove 'save_scat_len'

commit 136e57bf43dc4babbfb8783abbf707d483cacbe3 upstream.

Unused.

Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Tested-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agoscsi: storvsc: fix memory leak on ring buffer busy
Long Li [Tue, 29 Aug 2017 00:43:59 +0000 (17:43 -0700)]
scsi: storvsc: fix memory leak on ring buffer busy

commit 0208eeaa650c5c866a3242201678a19e6dc4a14e upstream.

When storvsc is sending I/O to Hyper-v, it may allocate a bigger buffer
descriptor for large data payload that can't fit into a pre-allocated
buffer descriptor. This bigger buffer is freed on return path.

If I/O request to Hyper-v fails due to ring buffer busy, the storvsc
allocated buffer descriptor should also be freed.

[mkp: applied by hand]

Fixes: be0cf6ca301c ("scsi: storvsc: Set the tablesize based on the information given by the host")
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agoscsi: megaraid_sas: Return pended IOCTLs with cmd_status MFI_STAT_WRONG_STATE in...
Shivasharan S [Wed, 23 Aug 2017 11:47:04 +0000 (04:47 -0700)]
scsi: megaraid_sas: Return pended IOCTLs with cmd_status MFI_STAT_WRONG_STATE in case adapter is dead

commit eb3fe263a48b0d27b229c213929c4cb3b1b39a0f upstream.

After a kill adapter, since the cmd_status is not set, the IOCTLs will
be hung in driver resulting in application hang.  Set cmd_status
MFI_STAT_WRONG_STATE when completing pended IOCTLs.

Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agoscsi: megaraid_sas: Check valid aen class range to avoid kernel panic
Shivasharan S [Wed, 23 Aug 2017 11:47:01 +0000 (04:47 -0700)]
scsi: megaraid_sas: Check valid aen class range to avoid kernel panic

commit 91b3d9f0069c8307d0b3a4c6843b65a439183318 upstream.

Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agoscsi: zfcp: trace high part of "new" 64 bit SCSI LUN
Steffen Maier [Fri, 28 Jul 2017 10:30:58 +0000 (12:30 +0200)]
scsi: zfcp: trace high part of "new" 64 bit SCSI LUN

commit 5d4a3d0a2ff23799b956e5962b886287614e7fad upstream.

Complements debugging aspects of the otherwise functionally complete
v3.17 commit 9cb78c16f5da ("scsi: use 64-bit LUNs").

While I don't have access to a target exporting 3 or 4 level LUNs,
I did test it by explicitly attaching a non-existent fake 4 level LUN
by means of zfcp sysfs attribute "unit_add".
In order to see corresponding trace records of otherwise successful
events, we had to increase the trace level of area SCSI and HBA to 6.

$ echo 6 > /sys/kernel/debug/s390dbf/zfcp_0.0.1880_scsi/level
$ echo 6 > /sys/kernel/debug/s390dbf/zfcp_0.0.1880_hba/level

$ echo 0x4011402240334044 > \
  /sys/bus/ccw/drivers/zfcp/0.0.1880/0x50050763031bd327/unit_add

Example output formatted by an updated zfcpdbf from the s390-tools
package interspersed with kernel messages at scsi_logging_level=4605:

Timestamp      : ...
Area           : REC
Subarea        : 00
Level          : 1
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 1
Tag            : scsla_1
LUN            : 0x4011402240334044
WWPN           : 0x50050763031bd327
D_ID           : 0x00......
Adapter status : 0x5400050b
Port status    : 0x54000001
LUN status     : 0x41000000
Ready count    : 0x00000001
Running count  : 0x00000000
ERP want       : 0x01
ERP need       : 0x01

scsi 2:0:0:4630896905707208721: scsi scan: INQUIRY pass 1 length 36
scsi 2:0:0:4630896905707208721: scsi scan: INQUIRY successful with code 0x0

Timestamp      : ...
Area           : HBA
Subarea        : 00
Level          : 6
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 1
Tag            : fs_norm
Request ID     : 0x<inquiry2-req-id>
Request status : 0x00000010
FSF cmnd       : 0x00000001
FSF sequence no: 0x...
FSF issued     : ...
FSF stat       : 0x00000000
FSF stat qual  : 00000000 00000000 00000000 00000000
Prot stat      : 0x00000001
Prot stat qual : ........ ........ 00000000 00000000
Port handle    : 0x...
LUN handle     : 0x...
|
Timestamp      : ...
Area           : SCSI
Subarea        : 00
Level          : 6
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 1
Tag            : rsl_nor
Request ID     : 0x<inquiry2-req-id>
SCSI ID        : 0x00000000
SCSI LUN       : 0x40224011
SCSI LUN high  : 0x40444033 <=======================
SCSI result    : 0x00000000
SCSI retries   : 0x00
SCSI allowed   : 0x03
SCSI scribble  : 0x<inquiry2-req-id>
SCSI opcode    : 12000000 a4000000 00000000 00000000
FCP rsp inf cod: 0x00
FCP rsp IU     : 00000000 00000000 00000000 00000000
                 00000000 00000000

scsi 2:0:0:4630896905707208721: scsi scan: INQUIRY pass 2 length 164
scsi 2:0:0:4630896905707208721: scsi scan: INQUIRY successful with code 0x0
scsi 2:0:0:4630896905707208721: scsi scan: peripheral device type of 31, \
no device added

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: 9cb78c16f5da ("scsi: use 64-bit LUNs")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agoscsi: zfcp: trace HBA FSF response by default on dismiss or timedout late response
Steffen Maier [Fri, 28 Jul 2017 10:30:57 +0000 (12:30 +0200)]
scsi: zfcp: trace HBA FSF response by default on dismiss or timedout late response

commit fdb7cee3b9e3c561502e58137a837341f10cbf8b upstream.

At the default trace level, we only trace unsuccessful events including
FSF responses.

zfcp_dbf_hba_fsf_response() only used protocol status and FSF status to
decide on an unsuccessful response. However, this is only one of multiple
possible sources determining a failed struct zfcp_fsf_req.

An FSF request can also "fail" if its response runs into an ERP timeout
or if it gets dismissed because a higher level recovery was triggered
[trace tags "erscf_1" or "erscf_2" in zfcp_erp_strategy_check_fsfreq()].
FSF requests with ERP timeout are:
FSF_QTCB_EXCHANGE_CONFIG_DATA, FSF_QTCB_EXCHANGE_PORT_DATA,
FSF_QTCB_OPEN_PORT_WITH_DID or FSF_QTCB_CLOSE_PORT or
FSF_QTCB_CLOSE_PHYSICAL_PORT for target ports,
FSF_QTCB_OPEN_LUN, FSF_QTCB_CLOSE_LUN.
One example is slow queue processing which can cause follow-on errors,
e.g. FSF_PORT_ALREADY_OPEN after FSF_QTCB_OPEN_PORT_WITH_DID timed out.
In order to see the root cause, we need to see late responses even if the
channel presented them successfully with FSF_PROT_GOOD and FSF_GOOD.
Example trace records formatted with zfcpdbf from the s390-tools package:

Timestamp      : ...
Area           : REC
Subarea        : 00
Level          : 1
Exception      : -
CPU ID         : ..
Caller         : ...
Record ID      : 1
Tag            : fcegpf1
LUN            : 0xffffffffffffffff
WWPN           : 0x<WWPN>
D_ID           : 0x00<D_ID>
Adapter status : 0x5400050b
Port status    : 0x41200000
LUN status     : 0x00000000
Ready count    : 0x00000001
Running count  : 0x...
ERP want       : 0x02 ZFCP_ERP_ACTION_REOPEN_PORT
ERP need       : 0x02 ZFCP_ERP_ACTION_REOPEN_PORT
|
Timestamp      : ... 30 seconds later
Area           : REC
Subarea        : 00
Level          : 1
Exception      : -
CPU ID         : ..
Caller         : ...
Record ID      : 2
Tag            : erscf_2
LUN            : 0xffffffffffffffff
WWPN           : 0x<WWPN>
D_ID           : 0x00<D_ID>
Adapter status : 0x5400050b
Port status    : 0x41200000
LUN status     : 0x00000000
Request ID     : 0x<request_ID>
ERP status     : 0x10000000 ZFCP_STATUS_ERP_TIMEDOUT
ERP step       : 0x0800 ZFCP_ERP_STEP_PORT_OPENING
ERP action     : 0x02 ZFCP_ERP_ACTION_REOPEN_PORT
ERP count      : 0x00
|
Timestamp      : ... later than previous record
Area           : HBA
Subarea        : 00
Level          : 5 > default level => 3 <= default level
Exception      : -
CPU ID         : 00
Caller         : ...
Record ID      : 1
Tag            : fs_qtcb => fs_rerr
Request ID     : 0x<request_ID>
Request status : 0x00001010 ZFCP_STATUS_FSFREQ_DISMISSED
| ZFCP_STATUS_FSFREQ_CLEANUP
FSF cmnd       : 0x00000005
FSF sequence no: 0x...
FSF issued     : ... > 30 seconds ago
FSF stat       : 0x00000000 FSF_GOOD
FSF stat qual  : 00000000 00000000 00000000 00000000
Prot stat      : 0x00000001 FSF_PROT_GOOD
Prot stat qual : 00000000 00000000 00000000 00000000
Port handle    : 0x...
LUN handle     : 0x00000000
QTCB log length: ...
QTCB log info  : ...

In case of problems detecting that new responses are waiting on the input
queue, we sooner or later trigger adapter recovery due to an FSF request
timeout (trace tag "fsrth_1").
FSF requests with FSF request timeout are:
typically FSF_QTCB_ABORT_FCP_CMND; but theoretically also
FSF_QTCB_EXCHANGE_CONFIG_DATA or FSF_QTCB_EXCHANGE_PORT_DATA via sysfs,
FSF_QTCB_OPEN_PORT_WITH_DID or FSF_QTCB_CLOSE_PORT for WKA ports,
FSF_QTCB_FCP_CMND for task management function (LUN / target reset).
One or more pending requests can meanwhile have FSF_PROT_GOOD and FSF_GOOD
because the channel filled in the response via DMA into the request's QTCB.

In a theroretical case, inject code can create an erroneous FSF request
on purpose. If data router is enabled, it uses deferred error reporting.
A READ SCSI command can succeed with FSF_PROT_GOOD, FSF_GOOD, and
SAM_STAT_GOOD. But on writing the read data to host memory via DMA,
it can still fail, e.g. if an intentionally wrong scatter list does not
provide enough space. Rather than getting an unsuccessful response,
we get a QDIO activate check which in turn triggers adapter recovery.
One or more pending requests can meanwhile have FSF_PROT_GOOD and FSF_GOOD
because the channel filled in the response via DMA into the request's QTCB.
Example trace records formatted with zfcpdbf from the s390-tools package:

Timestamp      : ...
Area           : HBA
Subarea        : 00
Level          : 6 > default level => 3 <= default level
Exception      : -
CPU ID         : ..
Caller         : ...
Record ID      : 1
Tag            : fs_norm => fs_rerr
Request ID     : 0x<request_ID2>
Request status : 0x00001010 ZFCP_STATUS_FSFREQ_DISMISSED
| ZFCP_STATUS_FSFREQ_CLEANUP
FSF cmnd       : 0x00000001
FSF sequence no: 0x...
FSF issued     : ...
FSF stat       : 0x00000000 FSF_GOOD
FSF stat qual  : 00000000 00000000 00000000 00000000
Prot stat      : 0x00000001 FSF_PROT_GOOD
Prot stat qual : ........ ........ 00000000 00000000
Port handle    : 0x...
LUN handle     : 0x...
|
Timestamp      : ...
Area           : SCSI
Subarea        : 00
Level          : 3
Exception      : -
CPU ID         : ..
Caller         : ...
Record ID      : 1
Tag            : rsl_err
Request ID     : 0x<request_ID2>
SCSI ID        : 0x...
SCSI LUN       : 0x...
SCSI result    : 0x000e0000 DID_TRANSPORT_DISRUPTED
SCSI retries   : 0x00
SCSI allowed   : 0x05
SCSI scribble  : 0x<request_ID2>
SCSI opcode    : 28... Read(10)
FCP rsp inf cod: 0x00
FCP rsp IU     : 00000000 00000000 00000000 00000000
                                         ^^ SAM_STAT_GOOD
                 00000000 00000000

Only with luck in both above cases, we could see a follow-on trace record
of an unsuccesful event following a successful but late FSF response with
FSF_PROT_GOOD and FSF_GOOD. Typically this was the case for I/O requests
resulting in a SCSI trace record "rsl_err" with DID_TRANSPORT_DISRUPTED
[On ZFCP_STATUS_FSFREQ_DISMISSED, zfcp_fsf_protstatus_eval() sets
ZFCP_STATUS_FSFREQ_ERROR seen by the request handler functions as failure].
However, the reason for this follow-on trace was invisible because the
corresponding HBA trace record was missing at the default trace level
(by default hidden records with tags "fs_norm", "fs_qtcb", or "fs_open").

On adapter recovery, after we had shut down the QDIO queues, we perform
unsuccessful pseudo completions with flag ZFCP_STATUS_FSFREQ_DISMISSED
for each pending FSF request in zfcp_fsf_req_dismiss_all().
In order to find the root cause, we need to see all pseudo responses even
if the channel presented them successfully with FSF_PROT_GOOD and FSF_GOOD.

Therefore, check zfcp_fsf_req.status for ZFCP_STATUS_FSFREQ_DISMISSED
or ZFCP_STATUS_FSFREQ_ERROR and trace with a new tag "fs_rerr".

It does not matter that there are numerous places which set
ZFCP_STATUS_FSFREQ_ERROR after the location where we trace an FSF response
early. These cases are based on protocol status != FSF_PROT_GOOD or
== FSF_PROT_FSF_STATUS_PRESENTED and are thus already traced by default
as trace tag "fs_perr" or "fs_ferr" respectively.

NB: The trace record with tag "fssrh_1" for status read buffers on dismiss
all remains. zfcp_fsf_req_complete() handles this and returns early.
All other FSF request types are handled separately and as described above.

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: 8a36e4532ea1 ("[SCSI] zfcp: enhancement of zfcp debug features")
Fixes: 2e261af84cdb ("[SCSI] zfcp: Only collect FSF/HBA debug data for matching trace levels")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agoscsi: zfcp: fix payload with full FCP_RSP IU in SCSI trace records
Steffen Maier [Fri, 28 Jul 2017 10:30:56 +0000 (12:30 +0200)]
scsi: zfcp: fix payload with full FCP_RSP IU in SCSI trace records

commit 12c3e5754c8022a4f2fd1e9f00d19e99ee0d3cc1 upstream.

If the FCP_RSP UI has optional parts (FCP_SNS_INFO or FCP_RSP_INFO) and
thus does not fit into the fsp_rsp field built into a SCSI trace record,
trace the full FCP_RSP UI with all optional parts as payload record
instead of just FCP_SNS_INFO as payload and
a 1 byte RSP_INFO_CODE part of FCP_RSP_INFO built into the SCSI record.

That way we would also get the full FCP_SNS_INFO in case a
target would ever send more than
min(SCSI_SENSE_BUFFERSIZE==96, ZFCP_DBF_PAY_MAX_REC==256)==96.

The mandatory part of FCP_RSP IU is only 24 bytes.
PAYload costs at least one full PAY record of 256 bytes anyway.
We cap to the hardware response size which is only FSF_FCP_RSP_SIZE==128.
So we can just put the whole FCP_RSP IU with any optional parts into
PAYload similarly as we do for SAN PAY since v4.9 commit aceeffbb59bb
("zfcp: trace full payload of all SAN records (req,resp,iels)").
This does not cause any additional trace records wasting memory.

Decoded trace records were confusing because they showed a hard-coded
sense data length of 96 even if the FCP_RSP_IU field FCP_SNS_LEN showed
actually less.

Since the same commit, we set pl_len for SAN traces to the full length of a
request/response even if we cap the corresponding trace.
In contrast, here for SCSI traces we set pl_len to the pre-computed
length of FCP_RSP IU considering SNS_LEN or RSP_LEN if valid.
Nonetheless we trace a hardcoded payload of length FSF_FCP_RSP_SIZE==128
if there were optional parts.
This makes it easier for the zfcpdbf tool to format only the relevant
part of the long FCP_RSP UI buffer. And any trailing information is still
available in the payload trace record just in case.

Rename the payload record tag from "fcp_sns" to "fcp_riu" to make the new
content explicit to zfcpdbf which can then pick a suitable field name such
as "FCP rsp IU all:" instead of "Sense info :"
Also, the same zfcpdbf can still be backwards compatible with "fcp_sns".

Old example trace record before this fix, formatted with the tool zfcpdbf
from s390-tools:

Timestamp      : ...
Area           : SCSI
Subarea        : 00
Level          : 3
Exception      : -
CPU id         : ..
Caller         : 0x...
Record id      : 1
Tag            : rsl_err
Request id     : 0x<request_id>
SCSI ID        : 0x...
SCSI LUN       : 0x...
SCSI result    : 0x00000002
SCSI retries   : 0x00
SCSI allowed   : 0x05
SCSI scribble  : 0x<request_id>
SCSI opcode    : 00000000 00000000 00000000 00000000
FCP rsp inf cod: 0x00
FCP rsp IU     : 00000000 00000000 00000202 00000000
                                       ^^==FCP_SNS_LEN_VALID
                 00000020 00000000
                 ^^^^^^^^==FCP_SNS_LEN==32
Sense len      : 96 <==min(SCSI_SENSE_BUFFERSIZE,ZFCP_DBF_PAY_MAX_REC)
Sense info     : 70000600 00000018 00000000 29000000
                 00000400 00000000 00000000 00000000
                 00000000 00000000 00000000 00000000<==superfluous
                 00000000 00000000 00000000 00000000<==superfluous
                 00000000 00000000 00000000 00000000<==superfluous
                 00000000 00000000 00000000 00000000<==superfluous

New example trace records with this fix:

Timestamp      : ...
Area           : SCSI
Subarea        : 00
Level          : 3
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 1
Tag            : rsl_err
Request ID     : 0x<request_id>
SCSI ID        : 0x...
SCSI LUN       : 0x...
SCSI result    : 0x00000002
SCSI retries   : 0x00
SCSI allowed   : 0x03
SCSI scribble  : 0x<request_id>
SCSI opcode    : a30c0112 00000000 02000000 00000000
FCP rsp inf cod: 0x00
FCP rsp IU     : 00000000 00000000 00000a02 00000200
                 00000020 00000000
FCP rsp IU len : 56
FCP rsp IU all : 00000000 00000000 00000a02 00000200
                                       ^^=FCP_RESID_UNDER|FCP_SNS_LEN_VALID
                 00000020 00000000 70000500 00000018
                 ^^^^^^^^==FCP_SNS_LEN
                                   ^^^^^^^^^^^^^^^^^
                 00000000 240000cb 00011100 00000000
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                 00000000 00000000
                 ^^^^^^^^^^^^^^^^^==FCP_SNS_INFO

Timestamp      : ...
Area           : SCSI
Subarea        : 00
Level          : 1
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 1
Tag            : lr_okay
Request ID     : 0x<request_id>
SCSI ID        : 0x...
SCSI LUN       : 0x...
SCSI result    : 0x00000000
SCSI retries   : 0x00
SCSI allowed   : 0x05
SCSI scribble  : 0x<request_id>
SCSI opcode    : <CDB of unrelated SCSI command passed to eh handler>
FCP rsp inf cod: 0x00
FCP rsp IU     : 00000000 00000000 00000100 00000000
                 00000000 00000008
FCP rsp IU len : 32
FCP rsp IU all : 00000000 00000000 00000100 00000000
                                       ^^==FCP_RSP_LEN_VALID
                 00000000 00000008 00000000 00000000
                          ^^^^^^^^==FCP_RSP_LEN
                                   ^^^^^^^^^^^^^^^^^==FCP_RSP_INFO

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: 250a1352b95e ("[SCSI] zfcp: Redesign of the debug tracing for SCSI records.")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agoscsi: zfcp: fix missing trace records for early returns in TMF eh handlers
Steffen Maier [Fri, 28 Jul 2017 10:30:55 +0000 (12:30 +0200)]
scsi: zfcp: fix missing trace records for early returns in TMF eh handlers

commit 1a5d999ebfc7bfe28deb48931bb57faa8e4102b6 upstream.

For problem determination we need to see that we were in scsi_eh
as well as whether and why we were successful or not.

The following commits introduced new early returns without adding
a trace record:

v2.6.35 commit a1dbfddd02d2
("[SCSI] zfcp: Pass return code from fc_block_scsi_eh to scsi eh")
on fc_block_scsi_eh() returning != 0 which is FAST_IO_FAIL,

v2.6.30 commit 63caf367e1c9
("[SCSI] zfcp: Improve reliability of SCSI eh handlers in zfcp")
on not having gotten an FSF request after the maximum number of retry
attempts and thus could not issue a TMF and has to return FAILED.

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: a1dbfddd02d2 ("[SCSI] zfcp: Pass return code from fc_block_scsi_eh to scsi eh")
Fixes: 63caf367e1c9 ("[SCSI] zfcp: Improve reliability of SCSI eh handlers in zfcp")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agoscsi: zfcp: fix passing fsf_req to SCSI trace on TMF to correlate with HBA
Steffen Maier [Fri, 28 Jul 2017 10:30:54 +0000 (12:30 +0200)]
scsi: zfcp: fix passing fsf_req to SCSI trace on TMF to correlate with HBA

commit 9fe5d2b2fd30aa8c7827ec62cbbe6d30df4fe3e3 upstream.

Without this fix we get SCSI trace records on task management functions
which cannot be correlated to HBA trace records because all fields
related to the FSF request are empty (zero).
Also, the FCP_RSP_IU is missing as well as any sense data if available.

This was caused by v2.6.14 commit 8a36e4532ea1 ("[SCSI] zfcp: enhancement
of zfcp debug features") introducing trace records for TMFs but
hard coding NULL for a possibly existing TMF FSF request.
The scsi_cmnd scribble is also zero or unrelated for the TMF request
so it also could not lookup a suitable FSF request from there.

A broken example trace record formatted with zfcpdbf from the s390-tools
package:

Timestamp      : ...
Area           : SCSI
Subarea        : 00
Level          : 1
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 1
Tag            : lr_fail
Request ID     : 0x0000000000000000
                   ^^^^^^^^^^^^^^^^ no correlation to HBA record
SCSI ID        : 0x<scsitarget>
SCSI LUN       : 0x<scsilun>
SCSI result    : 0x000e0000
SCSI retries   : 0x00
SCSI allowed   : 0x05
SCSI scribble  : 0x0000000000000000
SCSI opcode    : 2a000017 3bb80000 08000000 00000000
FCP rsp inf cod: 0x00
                   ^^ no TMF response
FCP rsp IU     : 00000000 00000000 00000000 00000000
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                 00000000 00000000
                 ^^^^^^^^^^^^^^^^^ no interesting FCP_RSP_IU
Sense len      : ...
^^^^^^^^^^^^^^^^^^^^ no sense data length
Sense info     : ...
^^^^^^^^^^^^^^^^^^^^ no sense data content, even if present

There are some true cases where we really do not have an FSF request:
"rsl_fai" from zfcp_dbf_scsi_fail_send() called for early
returns / completions in zfcp_scsi_queuecommand(),
"abrt_or", "abrt_bl", "abrt_ru", "abrt_ar" from
zfcp_scsi_eh_abort_handler() where we did not get as far,
"lr_nres", "tr_nres" from zfcp_task_mgmt_function() where we're
successful and do not need to do anything because adapter stopped.
For these cases it's correct to pass NULL for fsf_req to _zfcp_dbf_scsi().

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: 8a36e4532ea1 ("[SCSI] zfcp: enhancement of zfcp debug features")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agoscsi: zfcp: fix capping of unsuccessful GPN_FT SAN response trace records
Steffen Maier [Fri, 28 Jul 2017 10:30:53 +0000 (12:30 +0200)]
scsi: zfcp: fix capping of unsuccessful GPN_FT SAN response trace records

commit 975171b4461be296a35e83ebd748946b81cf0635 upstream.

v4.9 commit aceeffbb59bb ("zfcp: trace full payload of all SAN records
(req,resp,iels)") fixed trace data loss of 2.6.38 commit 2c55b750a884
("[SCSI] zfcp: Redesign of the debug tracing for SAN records.")
necessary for problem determination, e.g. to see the
currently active zone set during automatic port scan.

While it already saves space by not dumping any empty residual entries
of the large successful GPN_FT response (4 pages), there are seldom cases
where the GPN_FT response is unsuccessful and likely does not have
FC_NS_FID_LAST set in fp_flags so we did not cap the trace record.
We typically see such case for an initiator WWPN, which is not in any zone.

Cap unsuccessful responses to at least the actual basic CT_IU response
plus whatever fits the SAN trace record built-in "payload" buffer
just in case there's trailing information
of which we would at least see the existence and its beginning.

In order not to erroneously cap successful responses, we need to swap
calling the trace function and setting the CT / ELS status to success (0).

Example trace record pair formatted with zfcpdbf:

Timestamp      : ...
Area           : SAN
Subarea        : 00
Level          : 1
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 1
Tag            : fssct_1
Request ID     : 0x<request_id>
Destination ID : 0x00fffffc
SAN req short  : 01000000 fc020000 01720ffc 00000000
                 00000008
SAN req length : 20
|
Timestamp      : ...
Area           : SAN
Subarea        : 00
Level          : 1
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 2
Tag            : fsscth2
Request ID     : 0x<request_id>
Destination ID : 0x00fffffc
SAN resp short : 01000000 fc020000 80010000 00090700
                 00000000 00000000 00000000 00000000 [trailing info]
                 00000000 00000000 00000000 00000000 [trailing info]
SAN resp length: 16384
San resp info  : 01000000 fc020000 80010000 00090700
                 00000000 00000000 00000000 00000000 [trailing info]
                 00000000 00000000 00000000 00000000 [trailing info]
                 00000000 00000000 00000000 00000000 [trailing info]
                 00000000 00000000 00000000 00000000 [trailing info]
                 00000000 00000000 00000000 00000000 [trailing info]
                 00000000 00000000 00000000 00000000 [trailing info]
                 00000000 00000000 00000000 00000000 [trailing info]
                 00000000 00000000 00000000 00000000 [trailing info]
                 00000000 00000000 00000000 00000000 [trailing info]
                 00000000 00000000 00000000 00000000 [trailing info]
                 00000000 00000000 00000000 00000000 [trailing info]
                 00000000 00000000 00000000 00000000 [trailing info]
                 00000000 00000000 00000000 00000000 [trailing info]
                 00000000 00000000 00000000 00000000 [trailing info]
                 00000000 00000000 00000000 00000000 [trailing info]

The fix saves all but one of the previously associated 64 PAYload trace
record chunks of size 256 bytes each.

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: aceeffbb59bb ("zfcp: trace full payload of all SAN records (req,resp,iels)")
Fixes: 2c55b750a884 ("[SCSI] zfcp: Redesign of the debug tracing for SAN records.")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agoscsi: zfcp: add handling for FCP_RESID_OVER to the fcp ingress path
Benjamin Block [Fri, 28 Jul 2017 10:30:52 +0000 (12:30 +0200)]
scsi: zfcp: add handling for FCP_RESID_OVER to the fcp ingress path

commit a099b7b1fc1f0418ab8d79ecf98153e1e134656e upstream.

Up until now zfcp would just ignore the FCP_RESID_OVER flag in the FCP
response IU. When this flag is set, it is possible, in regards to the
FCP standard, that the storage-server processes the command normally, up
to the point where data is missing and simply ignores those.

In this case no CHECK CONDITION would be set, and because we ignored the
FCP_RESID_OVER flag we resulted in at least a data loss or even
-corruption as a follow-up error, depending on how the
applications/layers on top behave. To prevent this, we now set the
host-byte of the corresponding scsi_cmnd to DID_ERROR.

Other storage-behaviors, where the same condition results in a CHECK
CONDITION set in the answer, don't need to be changed as they are
handled in the mid-layer already.

Following is an example trace record decoded with zfcpdbf from the
s390-tools package. We forcefully injected a fc_dl which is one byte too
small:

Timestamp      : ...
Area           : SCSI
Subarea        : 00
Level          : 3
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 1
Tag            : rsl_err
Request ID     : 0x...
SCSI ID        : 0x...
SCSI LUN       : 0x...
SCSI result    : 0x00070000
                     ^^DID_ERROR
SCSI retries   : 0x..
SCSI allowed   : 0x..
SCSI scribble  : 0x...
SCSI opcode    : 2a000000 00000000 08000000 00000000
FCP rsp inf cod: 0x00
FCP rsp IU     : 00000000 00000000 00000400 00000001
                                       ^^fr_flags==FCP_RESID_OVER
                                         ^^fr_status==SAM_STAT_GOOD
                                            ^^^^^^^^fr_resid
                 00000000 00000000

As of now, we don't actively handle to possibility that a response IU
has both flags - FCP_RESID_OVER and FCP_RESID_UNDER - set at once.

Reported-by: Luke M. Hopkins <lmhopkin@us.ibm.com>
Reviewed-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: 553448f6c483 ("[SCSI] zfcp: Message cleanup")
Fixes: ea127f975424 ("[PATCH] s390 (7/7): zfcp host adapter.") (tglx/history.git)
Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agoscsi: zfcp: fix queuecommand for scsi_eh commands when DIX enabled
Steffen Maier [Fri, 28 Jul 2017 10:30:51 +0000 (12:30 +0200)]
scsi: zfcp: fix queuecommand for scsi_eh commands when DIX enabled

commit 71b8e45da51a7b64a23378221c0a5868bd79da4f upstream.

Since commit db007fc5e20c ("[SCSI] Command protection operation"),
scsi_eh_prep_cmnd() saves scmd->prot_op and temporarily resets it to
SCSI_PROT_NORMAL.
Other FCP LLDDs such as qla2xxx and lpfc shield their queuecommand()
to only access any of scsi_prot_sg...() if
(scsi_get_prot_op(cmd) != SCSI_PROT_NORMAL).

Do the same thing for zfcp, which introduced DIX support with
commit ef3eb71d8ba4 ("[SCSI] zfcp: Introduce experimental support for
DIF/DIX").

Otherwise, TUR SCSI commands as part of scsi_eh likely fail in zfcp,
because the regular SCSI command with DIX protection data, that scsi_eh
re-uses in scsi_send_eh_cmnd(), of course still has
(scsi_prot_sg_count() != 0) and so zfcp sends down bogus requests to the
FCP channel hardware.

This causes scsi_eh_test_devices() to have (finish_cmds == 0)
[not SCSI device is online or not scsi_eh_tur() failed]
so regular SCSI commands, that caused / were affected by scsi_eh,
are moved to work_q and scsi_eh_test_devices() itself returns false.
In turn, it unnecessarily escalates in our case in scsi_eh_ready_devs()
beyond host reset to finally scsi_eh_offline_sdevs()
which sets affected SCSI devices offline with the following kernel message:

"kernel: sd H:0:T:L: Device offlined - not ready after error recovery"

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: ef3eb71d8ba4 ("[SCSI] zfcp: Introduce experimental support for DIF/DIX")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agoskd: Submit requests to firmware before triggering the doorbell
Bart Van Assche [Thu, 17 Aug 2017 20:12:46 +0000 (13:12 -0700)]
skd: Submit requests to firmware before triggering the doorbell

commit 5fbd545cd3fd311ea1d6e8be4cedddd0ee5684c7 upstream.

Ensure that the members of struct skd_msg_buf have been transferred
to the PCIe adapter before the doorbell is triggered. This patch
avoids that I/O fails sporadically and that the following error
message is reported:

(skd0:STM000196603:[0000:00:09.0]): Completion mismatch comp_id=0x0000 skreq=0x0400 new=0x0000

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agoskd: Avoid that module unloading triggers a use-after-free
Bart Van Assche [Thu, 17 Aug 2017 20:12:45 +0000 (13:12 -0700)]
skd: Avoid that module unloading triggers a use-after-free

commit 7277cc67b3916eed47558c64f9c9c0de00a35cda upstream.

Since put_disk() triggers a disk_release() call and since that
last function calls blk_put_queue() if disk->queue != NULL, clear
the disk->queue pointer before calling put_disk(). This avoids
that unloading the skd kernel module triggers the following
use-after-free:

WARNING: CPU: 8 PID: 297 at lib/refcount.c:128 refcount_sub_and_test+0x70/0x80
refcount_t: underflow; use-after-free.
CPU: 8 PID: 297 Comm: kworker/8:1 Not tainted 4.11.10-300.fc26.x86_64 #1
Workqueue: events work_for_cpu_fn
Call Trace:
 dump_stack+0x63/0x84
 __warn+0xcb/0xf0
 warn_slowpath_fmt+0x5a/0x80
 refcount_sub_and_test+0x70/0x80
 refcount_dec_and_test+0x11/0x20
 kobject_put+0x1f/0x50
 blk_put_queue+0x15/0x20
 disk_release+0xae/0xf0
 device_release+0x32/0x90
 kobject_release+0x67/0x170
 kobject_put+0x2b/0x50
 put_disk+0x17/0x20
 skd_destruct+0x5c/0x890 [skd]
 skd_pci_probe+0x124d/0x13a0 [skd]
 local_pci_probe+0x42/0xa0
 work_for_cpu_fn+0x14/0x20
 process_one_work+0x19e/0x470
 worker_thread+0x1dc/0x4a0
 kthread+0x125/0x140
 ret_from_fork+0x25/0x30

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agomd/bitmap: disable bitmap_resize for file-backed bitmaps.
NeilBrown [Thu, 31 Aug 2017 00:23:25 +0000 (10:23 +1000)]
md/bitmap: disable bitmap_resize for file-backed bitmaps.

commit e8a27f836f165c26f867ece7f31eb5c811692319 upstream.

bitmap_resize() does not work for file-backed bitmaps.
The buffer_heads are allocated and initialized when
the bitmap is read from the file, but resize doesn't
read from the file, it loads from the internal bitmap.
When it comes time to write the new bitmap, the bh is
non-existent and we crash.

The common case when growing an array involves making the array larger,
and that normally means making the bitmap larger.  Doing
that inside the kernel is possible, but would need more code.
It is probably easier to require people who use file-backed
bitmaps to remove them and re-add after a reshape.

So this patch disables the resizing of arrays which have
file-backed bitmaps.  This is better than crashing.

Reported-by: Zhilong Liu <zlliu@suse.com>
Fixes: d60b479d177a ("md/bitmap: add bitmap_resize function to allow bitmap resizing.")
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agoblock: Relax a check in blk_start_queue()
Bart Van Assche [Thu, 17 Aug 2017 20:12:44 +0000 (13:12 -0700)]
block: Relax a check in blk_start_queue()

commit 4ddd56b003f251091a67c15ae3fe4a5c5c5e390a upstream.

Calling blk_start_queue() from interrupt context with the queue
lock held and without disabling IRQs, as the skd driver does, is
safe. This patch avoids that loading the skd driver triggers the
following warning:

WARNING: CPU: 11 PID: 1348 at block/blk-core.c:283 blk_start_queue+0x84/0xa0
RIP: 0010:blk_start_queue+0x84/0xa0
Call Trace:
 skd_unquiesce_dev+0x12a/0x1d0 [skd]
 skd_complete_internal+0x1e7/0x5a0 [skd]
 skd_complete_other+0xc2/0xd0 [skd]
 skd_isr_completion_posted.isra.30+0x2a5/0x470 [skd]
 skd_isr+0x14f/0x180 [skd]
 irq_forced_thread_fn+0x2a/0x70
 irq_thread+0x144/0x1a0
 kthread+0x125/0x140
 ret_from_fork+0x2a/0x40

Fixes: commit a038e2536472 ("[PATCH] blk_start_queue() must be called with irq disabled - add warning")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Andrew Morton <akpm@osdl.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agopowerpc: Fix DAR reporting when alignment handler faults
Michael Ellerman [Thu, 24 Aug 2017 10:49:57 +0000 (20:49 +1000)]
powerpc: Fix DAR reporting when alignment handler faults

commit f9effe925039cf54489b5c04e0d40073bb3a123d upstream.

Anton noticed that if we fault part way through emulating an unaligned
instruction, we don't update the DAR to reflect that.

The DAR value is eventually reported back to userspace as the address
in the SEGV signal, and if userspace is using that value to demand
fault then it can be confused by us not setting the value correctly.

This patch is ugly as hell, but is intended to be the minimal fix and
back ports easily.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agoext4: fix quota inconsistency during orphan cleanup for read-only mounts
zhangyi (F) [Thu, 24 Aug 2017 19:21:50 +0000 (15:21 -0400)]
ext4: fix quota inconsistency during orphan cleanup for read-only mounts

commit 95f1fda47c9d8738f858c3861add7bf0a36a7c0b upstream.

Quota does not get enabled for read-only mounts if filesystem
has quota feature, so that quotas cannot updated during orphan
cleanup, which will lead to quota inconsistency.

This patch turn on quotas during orphan cleanup for this case,
make sure quotas can be updated correctly.

Reported-by: Jan Kara <jack@suse.cz>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agoext4: fix incorrect quotaoff if the quota feature is enabled
zhangyi (F) [Thu, 24 Aug 2017 19:19:39 +0000 (15:19 -0400)]
ext4: fix incorrect quotaoff if the quota feature is enabled

commit b0a5a9589decd07db755d6a8d9c0910d96ff7992 upstream.

Current ext4 quota should always "usage enabled" if the
quota feautre is enabled. But in ext4_orphan_cleanup(), it
turn quotas off directly (used for the older journaled
quota), so we cannot turn it on again via "quotaon" unless
umount and remount ext4.

Simple reproduce:

  mkfs.ext4 -O project,quota /dev/vdb1
  mount -o prjquota /dev/vdb1 /mnt
  chattr -p 123 /mnt
  chattr +P /mnt
  touch /mnt/aa /mnt/bb
  exec 100<>/mnt/aa
  rm -f /mnt/aa
  sync
  echo c > /proc/sysrq-trigger

  #reboot and mount
  mount -o prjquota /dev/vdb1 /mnt
  #query status
  quotaon -Ppv /dev/vdb1
  #output
  quotaon: Cannot find mountpoint for device /dev/vdb1
  quotaon: No correct mountpoint specified.

This patch add check for journaled quotas to avoid incorrect
quotaoff when ext4 has quota feautre.

Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agocrypto: AF_ALG - remove SGL terminator indicator when chaining
Stephan Mueller [Thu, 21 Sep 2017 08:16:53 +0000 (10:16 +0200)]
crypto: AF_ALG - remove SGL terminator indicator when chaining

Fixed differently upstream as commit 2d97591ef43d ("crypto: af_alg - consolidation of duplicate code")

The SGL is MAX_SGL_ENTS + 1 in size. The last SG entry is used for the
chaining and is properly updated with the sg_chain invocation. During
the filling-in of the initial SG entries, sg_mark_end is called for each
SG entry. This is appropriate as long as no additional SGL is chained
with the current SGL. However, when a new SGL is chained and the last
SG entry is updated with sg_chain, the last but one entry still contains
the end marker from the sg_mark_end. This end marker must be removed as
otherwise a walk of the chained SGLs will cause a NULL pointer
dereference at the last but one SG entry, because sg_next will return
NULL.

The patch only applies to all kernels up to and including 4.13. The
patch 2d97591ef43d0587be22ad1b0d758d6df4999a0b added to 4.14-rc1
introduced a complete new code base which addresses this bug in
a different way. Yet, that patch is too invasive for stable kernels
and was therefore not marked for stable.

Fixes: 8ff590903d5fc ("crypto: algif_skcipher - User-space interface for skcipher operations")
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agoMIPS: math-emu: MINA.<D|S>: Fix some cases of infinity and zero inputs
Aleksandar Markovic [Thu, 27 Jul 2017 16:08:53 +0000 (18:08 +0200)]
MIPS: math-emu: MINA.<D|S>: Fix some cases of infinity and zero inputs

commit 304bfe473e70523e591fb1c9223289d355e0bdcb upstream.

Fix following special cases for MINA>.<D|S>:

  - if one of the inputs is zero, and the other is subnormal, normal,
    or infinity, the  value of the former should be returned (that is,
    a zero).
  - if one of the inputs is infinity, and the other input is normal,
    or subnormal, the value of the latter should be returned.

The previous implementation's logic for such cases was incorrect - it
appears as if it implements MAXA, and not MINA instruction.

A relevant example:

MINA.S fd,fs,ft:
  If fs contains 100.0, and ft contains 0.0, fd is going to contain
  0.0 (without this patch, it used to contain 100.0).

Fixes: a79f5f9ba508 ("MIPS: math-emu: Add support for the MIPS R6 MAX{, A} FPU instruction")
Fixes: 4e9561b20e2f ("MIPS: math-emu: Add support for the MIPS R6 MIN{, A} FPU instruction")

Signed-off-by: Miodrag Dinic <miodrag.dinic@imgtec.com>
Signed-off-by: Goran Ferenc <goran.ferenc@imgtec.com>
Signed-off-by: Aleksandar Markovic <aleksandar.markovic@imgtec.com>
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Cc: Bo Hu <bohu@google.com>
Cc: Douglas Leung <douglas.leung@imgtec.com>
Cc: Jin Qian <jinqian@google.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Petar Jovanovic <petar.jovanovic@imgtec.com>
Cc: Raghu Gandham <raghu.gandham@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/16885/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agoMIPS: math-emu: <MAXA|MINA>.<D|S>: Fix cases of both infinite inputs
Aleksandar Markovic [Thu, 27 Jul 2017 16:08:52 +0000 (18:08 +0200)]
MIPS: math-emu: <MAXA|MINA>.<D|S>: Fix cases of both infinite inputs

commit 3444c4eb534c20e44f0d6670b34263efaf8b531f upstream.

Fix the value returned by <MAXA|MINA>.<D|S> fd,fs,ft, if both inputs
are infinite. The previous implementation returned always the value
contained in ft in such cases. The correct behavior is specified
in Mips instruction set manual and is as follows:

    fs    ft        MAXA     MINA
  ---------------------------------
    inf   inf        inf      inf
    inf  -inf        inf     -inf
   -inf   inf        inf     -inf
   -inf  -inf       -inf     -inf

A relevant example:

MAXA.S fd,fs,ft:
  If fs contains +inf, and ft contains -inf, fd is going to contain
  +inf (without this patch, it used to contain -inf).

Fixes: a79f5f9ba508 ("MIPS: math-emu: Add support for the MIPS R6 MAX{, A} FPU instruction")
Fixes: 4e9561b20e2f ("MIPS: math-emu: Add support for the MIPS R6 MIN{, A} FPU instruction")

Signed-off-by: Miodrag Dinic <miodrag.dinic@imgtec.com>
Signed-off-by: Goran Ferenc <goran.ferenc@imgtec.com>
Signed-off-by: Aleksandar Markovic <aleksandar.markovic@imgtec.com>
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Cc: Bo Hu <bohu@google.com>
Cc: Douglas Leung <douglas.leung@imgtec.com>
Cc: Jin Qian <jinqian@google.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Petar Jovanovic <petar.jovanovic@imgtec.com>
Cc: Raghu Gandham <raghu.gandham@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/16884/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agoMIPS: math-emu: <MAXA|MINA>.<D|S>: Fix cases of input values with opposite signs
Aleksandar Markovic [Thu, 27 Jul 2017 16:08:51 +0000 (18:08 +0200)]
MIPS: math-emu: <MAXA|MINA>.<D|S>: Fix cases of input values with opposite signs

commit 1a41b3b441508ae63b1a9ec699ec94065739eb60 upstream.

Fix the value returned by <MAXA|MINA>.<D|S>, if the inputs are normal
fp numbers of the same absolute value, but opposite signs.

A relevant example:

MAXA.S fd,fs,ft:
  If fs contains -3.0, and ft contains +3.0, fd is going to contain
  +3.0 (without this patch, it used to contain -3.0).

Fixes: a79f5f9ba508 ("MIPS: math-emu: Add support for the MIPS R6 MAX{, A} FPU instruction")
Fixes: 4e9561b20e2f ("MIPS: math-emu: Add support for the MIPS R6 MIN{, A} FPU instruction")

Signed-off-by: Miodrag Dinic <miodrag.dinic@imgtec.com>
Signed-off-by: Goran Ferenc <goran.ferenc@imgtec.com>
Signed-off-by: Aleksandar Markovic <aleksandar.markovic@imgtec.com>
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Cc: Bo Hu <bohu@google.com>
Cc: Douglas Leung <douglas.leung@imgtec.com>
Cc: Jin Qian <jinqian@google.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Petar Jovanovic <petar.jovanovic@imgtec.com>
Cc: Raghu Gandham <raghu.gandham@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/16883/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agoMIPS: math-emu: <MAX|MIN>.<D|S>: Fix cases of both inputs negative
Aleksandar Markovic [Thu, 27 Jul 2017 16:08:50 +0000 (18:08 +0200)]
MIPS: math-emu: <MAX|MIN>.<D|S>: Fix cases of both inputs negative

commit aabf5cf02e22ebc4e541adf835910f388b6c3e65 upstream.

Fix the value returned by <MAX|MIN>.<D|S>, if both inputs are negative
normal fp numbers. The previous logic did not take into account that
if both inputs have the same sign, there should be separate treatment
of the cases when both inputs are negative and when both inputs are
positive.

A relevant example:

MAX.S fd,fs,ft:
  If fs contains -5.0, and ft contains -7.0, fd is going to contain
  -5.0 (without this patch, it used to contain -7.0).

Fixes: a79f5f9ba508 ("MIPS: math-emu: Add support for the MIPS R6 MAX{, A} FPU instruction")
Fixes: 4e9561b20e2f ("MIPS: math-emu: Add support for the MIPS R6 MIN{, A} FPU instruction")

Signed-off-by: Miodrag Dinic <miodrag.dinic@imgtec.com>
Signed-off-by: Goran Ferenc <goran.ferenc@imgtec.com>
Signed-off-by: Aleksandar Markovic <aleksandar.markovic@imgtec.com>
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Cc: Bo Hu <bohu@google.com>
Cc: Douglas Leung <douglas.leung@imgtec.com>
Cc: Jin Qian <jinqian@google.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Petar Jovanovic <petar.jovanovic@imgtec.com>
Cc: Raghu Gandham <raghu.gandham@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/16882/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agoMIPS: math-emu: <MAX|MAXA|MIN|MINA>.<D|S>: Fix cases of both inputs zero
Aleksandar Markovic [Thu, 27 Jul 2017 16:08:49 +0000 (18:08 +0200)]
MIPS: math-emu: <MAX|MAXA|MIN|MINA>.<D|S>: Fix cases of both inputs zero

commit 15560a58bfd4ff82cdd16b2270d4ef9b06d2cc4d upstream.

Fix the value returned by <MAX|MAXA|MIN|MINA>.<D|S>, if both inputs
are zeros. The right behavior in such cases is stated in instruction
reference manual and is as follows:

   fs  ft       MAX     MIN       MAXA    MINA
  ---------------------------------------------
    0   0        0       0         0       0
    0  -0        0      -0         0      -0
   -0   0        0      -0         0      -0
   -0  -0       -0      -0        -0      -0

Prior to this patch, some of the above cases were yielding correct
results. However, for the sake of code consistency, all such cases
are rewritten in this patch.

A relevant example:

MAX.S fd,fs,ft:
  If fs contains +0.0, and ft contains -0.0, fd is going to contain
  +0.0 (without this patch, it used to contain -0.0).

Fixes: a79f5f9ba508 ("MIPS: math-emu: Add support for the MIPS R6 MAX{, A} FPU instruction")
Fixes: 4e9561b20e2f ("MIPS: math-emu: Add support for the MIPS R6 MIN{, A} FPU instruction")

Signed-off-by: Miodrag Dinic <miodrag.dinic@imgtec.com>
Signed-off-by: Goran Ferenc <goran.ferenc@imgtec.com>
Signed-off-by: Aleksandar Markovic <aleksandar.markovic@imgtec.com>
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Cc: Bo Hu <bohu@google.com>
Cc: Douglas Leung <douglas.leung@imgtec.com>
Cc: Jin Qian <jinqian@google.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Petar Jovanovic <petar.jovanovic@imgtec.com>
Cc: Raghu Gandham <raghu.gandham@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/16881/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agoMIPS: math-emu: <MAX|MAXA|MIN|MINA>.<D|S>: Fix quiet NaN propagation
Aleksandar Markovic [Thu, 27 Jul 2017 16:08:48 +0000 (18:08 +0200)]
MIPS: math-emu: <MAX|MAXA|MIN|MINA>.<D|S>: Fix quiet NaN propagation

commit e78bf0dc4789bdea1453595ae89e8db65918e22e upstream.

Fix the value returned by <MAX|MAXA|MIN|MINA>.<D|S> fd,fs,ft, if both
inputs are quiet NaNs. The <MAX|MAXA|MIN|MINA>.<D|S> specifications
state that the returned value in such cases should be the quiet NaN
contained in register fs.

A relevant example:

MAX.S fd,fs,ft:
  If fs contains qNaN1, and ft contains qNaN2, fd is going to contain
  qNaN1 (without this patch, it used to contain qNaN2).

Fixes: a79f5f9ba508 ("MIPS: math-emu: Add support for the MIPS R6 MAX{, A} FPU instruction")
Fixes: 4e9561b20e2f ("MIPS: math-emu: Add support for the MIPS R6 MIN{, A} FPU instruction")

Signed-off-by: Miodrag Dinic <miodrag.dinic@imgtec.com>
Signed-off-by: Goran Ferenc <goran.ferenc@imgtec.com>
Signed-off-by: Aleksandar Markovic <aleksandar.markovic@imgtec.com>
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Cc: Bo Hu <bohu@google.com>
Cc: Douglas Leung <douglas.leung@imgtec.com>
Cc: Jin Qian <jinqian@google.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Petar Jovanovic <petar.jovanovic@imgtec.com>
Cc: Raghu Gandham <raghu.gandham@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/16880/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agoInput: i8042 - add Gigabyte P57 to the keyboard reset table
Kai-Heng Feng [Fri, 15 Sep 2017 16:36:16 +0000 (09:36 -0700)]
Input: i8042 - add Gigabyte P57 to the keyboard reset table

commit 697c5d8a36768b36729533fb44622b35d56d6ad0 upstream.

Similar to other Gigabyte laptops, the touchpad on P57 requires a
keyboard reset to detect Elantech touchpad correctly.

BugLink: https://bugs.launchpad.net/bugs/1594214
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agotty: fix __tty_insert_flip_char regression
Arnd Bergmann [Wed, 2 Aug 2017 11:11:39 +0000 (13:11 +0200)]
tty: fix __tty_insert_flip_char regression

commit 8a5a90a2a477b86a3dc2eaa5a706db9bfdd647ca upstream.

Sergey noticed a small but fatal mistake in __tty_insert_flip_char,
leading to an oops in an interrupt handler when using any serial
port.

The problem is that I accidentally took the tty_buffer pointer
before calling __tty_buffer_request_room(), which replaces the
buffer. This moves the pointer lookup to the right place after
allocating the new buffer space.

Fixes: 979990c62848 ("tty: improve tty_insert_flip_char() fast path")
Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agotty: improve tty_insert_flip_char() slow path
Arnd Bergmann [Tue, 20 Jun 2017 21:10:42 +0000 (23:10 +0200)]
tty: improve tty_insert_flip_char() slow path

commit 065ea0a7afd64d6cf3464bdd1d8cd227527e2045 upstream.

While working on improving the fast path of tty_insert_flip_char(),
I noticed that by calling tty_buffer_request_room(), we needlessly
move to the separate flag buffer mode for the tty, even when all
characters use TTY_NORMAL as the flag.

This changes the code to call __tty_buffer_request_room() with the
correct flag, which will then allocate a regular buffer when it rounds
out of space but no special flags have been used. I'm guessing that
this is the behavior that Peter Hurley intended when he introduced
the compacted flip buffers.

Fixes: acc0f67f307f ("tty: Halve flip buffer GFP_ATOMIC memory consumption")
Cc: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agotty: improve tty_insert_flip_char() fast path
Arnd Bergmann [Tue, 20 Jun 2017 21:10:41 +0000 (23:10 +0200)]
tty: improve tty_insert_flip_char() fast path

commit 979990c6284814617d8f2179d197f72ff62b5d85 upstream.

kernelci.org reports a crazy stack usage for the VT code when CONFIG_KASAN
is enabled:

drivers/tty/vt/keyboard.c: In function 'kbd_keycode':
drivers/tty/vt/keyboard.c:1452:1: error: the frame size of 2240 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]

The problem is that tty_insert_flip_char() gets inlined many times into
kbd_keycode(), and also into other functions, and each copy requires 128
bytes for stack redzone to check for a possible out-of-bounds access on
the 'ch' and 'flags' arguments that are passed into
tty_insert_flip_string_flags as a variable-length string.

This introduces a new __tty_insert_flip_char() function for the slow
path, which receives the two arguments by value. This completely avoids
the problem and the stack usage goes back down to around 100 bytes.

Without KASAN, this is also slightly better, as we don't have to
spill the arguments to the stack but can simply pass 'ch' and 'flag'
in registers, saving a few bytes in .text for each call site.

This should be backported to linux-4.0 or later, which first introduced
the stack sanitizer in the kernel.

Fixes: c420f167db8c ("kasan: enable stack instrumentation")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agomm: prevent double decrease of nr_reserved_highatomic
Minchan Kim [Tue, 13 Dec 2016 00:42:08 +0000 (16:42 -0800)]
mm: prevent double decrease of nr_reserved_highatomic

commit 4855e4a7f29d6d10b0b9c84e189c770c9a94e91e upstream.

There is race between page freeing and unreserved highatomic.

 CPU 0     CPU 1

    free_hot_cold_page
      mt = get_pfnblock_migratetype
      set_pcppage_migratetype(page, mt)
         unreserve_highatomic_pageblock
         spin_lock_irqsave(&zone->lock)
         move_freepages_block
         set_pageblock_migratetype(page)
         spin_unlock_irqrestore(&zone->lock)
      free_pcppages_bulk
        __free_one_page(mt) <- mt is stale

By above race, a page on CPU 0 could go non-highorderatomic free list
since the pageblock's type is changed.  By that, unreserve logic of
highorderatomic can decrease reserved count on a same pageblock severak
times and then it will make mismatch between nr_reserved_highatomic and
the number of reserved pageblock.

So, this patch verifies whether the pageblock is highatomic or not and
decrease the count only if the pageblock is highatomic.

Link: http://lkml.kernel.org/r/1476259429-18279-3-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Sangseok Lee <sangseok.lee@lge.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Miles Chen <miles.chen@mediatek.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agonfsd: Fix general protection fault in release_lock_stateid()
Chuck Lever [Sat, 29 Oct 2016 22:19:03 +0000 (18:19 -0400)]
nfsd: Fix general protection fault in release_lock_stateid()

commit f46c445b79906a9da55c13e0a6f6b6a006b892fe upstream.

When I push NFSv4.1 / RDMA hard, (xfstests generic/089, for example),
I get this crash on the server:

Oct 28 22:04:30 klimt kernel: general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
Oct 28 22:04:30 klimt kernel: Modules linked in: cts rpcsec_gss_krb5 iTCO_wdt iTCO_vendor_support sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm btrfs irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd xor pcspkr raid6_pq i2c_i801 i2c_smbus lpc_ich mfd_core sg mei_me mei ioatdma shpchp wmi ipmi_si ipmi_msghandler rpcrdma ib_ipoib rdma_ucm acpi_power_meter acpi_pad ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c mlx4_ib mlx4_en ib_core sr_mod cdrom sd_mod ast drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel igb ahci libahci ptp mlx4_core pps_core dca libata i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod
Oct 28 22:04:30 klimt kernel: CPU: 7 PID: 1558 Comm: nfsd Not tainted 4.9.0-rc2-00005-g82cd754 #8
Oct 28 22:04:30 klimt kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015
Oct 28 22:04:30 klimt kernel: task: ffff880835c3a100 task.stack: ffff8808420d8000
Oct 28 22:04:30 klimt kernel: RIP: 0010:[<ffffffffa05a759f>]  [<ffffffffa05a759f>] release_lock_stateid+0x1f/0x60 [nfsd]
Oct 28 22:04:30 klimt kernel: RSP: 0018:ffff8808420dbce0  EFLAGS: 00010246
Oct 28 22:04:30 klimt kernel: RAX: ffff88084e6660f0 RBX: ffff88084e667020 RCX: 0000000000000000
Oct 28 22:04:30 klimt kernel: RDX: 0000000000000007 RSI: 0000000000000000 RDI: ffff88084e667020
Oct 28 22:04:30 klimt kernel: RBP: ffff8808420dbcf8 R08: 0000000000000001 R09: 0000000000000000
Oct 28 22:04:30 klimt kernel: R10: ffff880835c3a100 R11: ffff880835c3aca8 R12: 6b6b6b6b6b6b6b6b
Oct 28 22:04:30 klimt kernel: R13: ffff88084e6670d8 R14: ffff880835f546f0 R15: ffff880835f1c548
Oct 28 22:04:30 klimt kernel: FS:  0000000000000000(0000) GS:ffff88087bdc0000(0000) knlGS:0000000000000000
Oct 28 22:04:30 klimt kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 28 22:04:30 klimt kernel: CR2: 00007ff020389000 CR3: 0000000001c06000 CR4: 00000000001406e0
Oct 28 22:04:30 klimt kernel: Stack:
Oct 28 22:04:30 klimt kernel: ffff88084e667020 0000000000000000 ffff88084e6670d8 ffff8808420dbd20
Oct 28 22:04:30 klimt kernel: ffffffffa05ac80d ffff880835f54548 ffff88084e640008 ffff880835f545b0
Oct 28 22:04:30 klimt kernel: ffff8808420dbd70 ffffffffa059803d ffff880835f1c768 0000000000000870
Oct 28 22:04:30 klimt kernel: Call Trace:
Oct 28 22:04:30 klimt kernel: [<ffffffffa05ac80d>] nfsd4_free_stateid+0xfd/0x1b0 [nfsd]
Oct 28 22:04:30 klimt kernel: [<ffffffffa059803d>] nfsd4_proc_compound+0x40d/0x690 [nfsd]
Oct 28 22:04:30 klimt kernel: [<ffffffffa0583114>] nfsd_dispatch+0xd4/0x1d0 [nfsd]
Oct 28 22:04:30 klimt kernel: [<ffffffffa047bbf9>] svc_process_common+0x3d9/0x700 [sunrpc]
Oct 28 22:04:30 klimt kernel: [<ffffffffa047ca64>] svc_process+0xf4/0x330 [sunrpc]
Oct 28 22:04:30 klimt kernel: [<ffffffffa05827ca>] nfsd+0xfa/0x160 [nfsd]
Oct 28 22:04:30 klimt kernel: [<ffffffffa05826d0>] ? nfsd_destroy+0x170/0x170 [nfsd]
Oct 28 22:04:30 klimt kernel: [<ffffffff810b367b>] kthread+0x10b/0x120
Oct 28 22:04:30 klimt kernel: [<ffffffff810b3570>] ? kthread_stop+0x280/0x280
Oct 28 22:04:30 klimt kernel: [<ffffffff8174e8ba>] ret_from_fork+0x2a/0x40
Oct 28 22:04:30 klimt kernel: Code: c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 48 8b 87 b0 00 00 00 48 89 fb 4c 8b a0 98 00 00 00 <49> 8b 44 24 20 48 8d b8 80 03 00 00 e8 10 66 1a e1 48 89 df e8
Oct 28 22:04:30 klimt kernel: RIP  [<ffffffffa05a759f>] release_lock_stateid+0x1f/0x60 [nfsd]
Oct 28 22:04:30 klimt kernel: RSP <ffff8808420dbce0>
Oct 28 22:04:30 klimt kernel: ---[ end trace cf5d0b371973e167 ]---

Jeff Layton says:
> Hm...now that I look though, this is a little suspicious:
>
>    struct nfs4_openowner *oo = openowner(stp->st_openstp->st_stateowner);
>
> I wonder if it's possible for the openstateid to have already been
> destroyed at this point.
>
> We might be better off doing something like this to get the client pointer:
>
>    stp->st_stid.sc_client;
>
> ...which should be more direct and less dependent on other stateids
> staying valid.

With the suggested change, I am no longer able to reproduce the above oops.

v2: Fix unhash_lock_stateid() as well

Fix-suggested-by: Jeff Layton <jlayton@redhat.com>
Fixes: 42691398be08 ('nfsd: Fix race between FREE_STATEID and LOCK')
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Cc: Christian Theune <ct@flyingcircus.io>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agomd/raid5: release/flush io in raid5_do_work()
Song Liu [Thu, 24 Aug 2017 16:53:59 +0000 (09:53 -0700)]
md/raid5: release/flush io in raid5_do_work()

commit 9c72a18e46ebe0f09484cce8ebf847abdab58498 upstream.

In raid5, there are scenarios where some ios are deferred to a later
time, and some IO need a flush to complete. To make sure we make
progress with these IOs, we need to call the following functions:

    flush_deferred_bios(conf);
    r5l_flush_stripe_to_raid(conf->log);

Both of these functions are called in raid5d(), but missing in
raid5_do_work(). As a result, these functions are not called
when multi-threading (group_thread_cnt > 0) is enabled. This patch
adds calls to these function to raid5_do_work().

Note for stable branches:

  r5l_flush_stripe_to_raid(conf->log) is need for 4.4+
  flush_deferred_bios(conf) is only needed for 4.11+

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agox86/fsgsbase/64: Report FSBASE and GSBASE correctly in core dumps
Andy Lutomirski [Tue, 1 Aug 2017 14:11:35 +0000 (07:11 -0700)]
x86/fsgsbase/64: Report FSBASE and GSBASE correctly in core dumps

commit 9584d98bed7a7a904d0702ad06bbcc94703cb5b4 upstream.

In ELF_COPY_CORE_REGS, we're copying from the current task, so
accessing thread.fsbase and thread.gsbase makes no sense.  Just read
the values from the CPU registers.

In practice, the old code would have been correct most of the time
simply because thread.fsbase and thread.gsbase usually matched the
CPU registers.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chang Seok <chang.seok.bae@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agof2fs: check hot_data for roll-forward recovery
Jaegeuk Kim [Sun, 13 Aug 2017 04:33:23 +0000 (21:33 -0700)]
f2fs: check hot_data for roll-forward recovery

commit 125c9fb1ccb53eb2ea9380df40f3c743f3fb2fed upstream.

We need to check HOT_DATA to truncate any previous data block when doing
roll-forward recovery.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agoipv6: fix typo in fib6_net_exit()
Eric Dumazet [Fri, 8 Sep 2017 22:48:47 +0000 (15:48 -0700)]
ipv6: fix typo in fib6_net_exit()

[ Upstream commit 32a805baf0fb70b6dbedefcd7249ac7f580f9e3b ]

IPv6 FIB should use FIB6_TABLE_HASHSZ, not FIB_TABLE_HASHSZ.

Fixes: ba1cc08d9488 ("ipv6: fix memory leak with multiple tables during netns destruction")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agoipv6: fix memory leak with multiple tables during netns destruction
Sabrina Dubroca [Fri, 8 Sep 2017 08:26:19 +0000 (10:26 +0200)]
ipv6: fix memory leak with multiple tables during netns destruction

[ Upstream commit ba1cc08d9488c94cb8d94f545305688b72a2a300 ]

fib6_net_exit only frees the main and local tables. If another table was
created with fib6_alloc_table, we leak it when the netns is destroyed.

Fix this in the same way ip_fib_net_exit cleans up tables, by walking
through the whole hashtable of fib6_table's. We can get rid of the
special cases for local and main, since they're also part of the
hashtable.

Reproducer:
    ip netns add x
    ip -net x -6 rule add from 6003:1::/64 table 100
    ip netns del x

Reported-by: Jianlin Shi <jishi@redhat.com>
Fixes: 58f09b78b730 ("[NETNS][IPV6] ip6_fib - make it per network namespace")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agogianfar: Fix Tx flow control deactivation
Claudiu Manoil [Mon, 4 Sep 2017 07:45:28 +0000 (10:45 +0300)]
gianfar: Fix Tx flow control deactivation

[ Upstream commit 5d621672bc1a1e5090c1ac5432a18c79e0e13e03 ]

The wrong register is checked for the Tx flow control bit,
it should have been maccfg1 not maccfg2.
This went unnoticed for so long probably because the impact is
hardly visible, not to mention the tangled code from adjust_link().
First, link flow control (i.e. handling of Rx/Tx link level pause frames)
is disabled by default (needs to be enabled via 'ethtool -A').
Secondly, maccfg2 always returns 0 for tx_flow_oldval (except for a few
old boards), which results in Tx flow control remaining always on
once activated.

Fixes: 45b679c9a3ccd9e34f28e6ec677b812a860eb8eb ("gianfar: Implement PAUSE frame generation support")
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agoRevert "net: fix percpu memory leaks"
Jesper Dangaard Brouer [Fri, 1 Sep 2017 09:26:13 +0000 (11:26 +0200)]
Revert "net: fix percpu memory leaks"

[ Upstream commit 5a63643e583b6a9789d7a225ae076fb4e603991c ]

This reverts commit 1d6119baf0610f813eb9d9580eb4fd16de5b4ceb.

After reverting commit 6d7b857d541e ("net: use lib/percpu_counter API
for fragmentation mem accounting") then here is no need for this
fix-up patch.  As percpu_counter is no longer used, it cannot
memory leak it any-longer.

Fixes: 6d7b857d541e ("net: use lib/percpu_counter API for fragmentation mem accounting")
Fixes: 1d6119baf061 ("net: fix percpu memory leaks")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agoRevert "net: use lib/percpu_counter API for fragmentation mem accounting"
Jesper Dangaard Brouer [Fri, 1 Sep 2017 09:26:08 +0000 (11:26 +0200)]
Revert "net: use lib/percpu_counter API for fragmentation mem accounting"

[ Upstream commit fb452a1aa3fd4034d7999e309c5466ff2d7005aa ]

This reverts commit 6d7b857d541ecd1d9bd997c97242d4ef94b19de2.

There is a bug in fragmentation codes use of the percpu_counter API,
that can cause issues on systems with many CPUs.

The frag_mem_limit() just reads the global counter (fbc->count),
without considering other CPUs can have upto batch size (130K) that
haven't been subtracted yet.  Due to the 3MBytes lower thresh limit,
this become dangerous at >=24 CPUs (3*1024*1024/130000=24).

The correct API usage would be to use __percpu_counter_compare() which
does the right thing, and takes into account the number of (online)
CPUs and batch size, to account for this and call __percpu_counter_sum()
when needed.

We choose to revert the use of the lib/percpu_counter API for frag
memory accounting for several reasons:

1) On systems with CPUs > 24, the heavier fully locked
   __percpu_counter_sum() is always invoked, which will be more
   expensive than the atomic_t that is reverted to.

Given systems with more than 24 CPUs are becoming common this doesn't
seem like a good option.  To mitigate this, the batch size could be
decreased and thresh be increased.

2) The add_frag_mem_limit+sub_frag_mem_limit pairs happen on the RX
   CPU, before SKBs are pushed into sockets on remote CPUs.  Given
   NICs can only hash on L2 part of the IP-header, the NIC-RXq's will
   likely be limited.  Thus, a fair chance that atomic add+dec happen
   on the same CPU.

Revert note that commit 1d6119baf061 ("net: fix percpu memory leaks")
removed init_frag_mem_limit() and instead use inet_frags_init_net().
After this revert, inet_frags_uninit_net() becomes empty.

Fixes: 6d7b857d541e ("net: use lib/percpu_counter API for fragmentation mem accounting")
Fixes: 1d6119baf061 ("net: fix percpu memory leaks")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agotcp: initialize rcv_mss to TCP_MIN_MSS instead of 0
Wei Wang [Thu, 18 May 2017 18:22:33 +0000 (11:22 -0700)]
tcp: initialize rcv_mss to TCP_MIN_MSS instead of 0

[ Upstream commit 499350a5a6e7512d9ed369ed63a4244b6536f4f8 ]

When tcp_disconnect() is called, inet_csk_delack_init() sets
icsk->icsk_ack.rcv_mss to 0.
This could potentially cause tcp_recvmsg() => tcp_cleanup_rbuf() =>
__tcp_select_window() call path to have division by 0 issue.
So this patch initializes rcv_mss to TCP_MIN_MSS instead of 0.

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agoRevert "net: phy: Correctly process PHY_HALTED in phy_stop_machine()"
Florian Fainelli [Thu, 31 Aug 2017 00:49:29 +0000 (17:49 -0700)]
Revert "net: phy: Correctly process PHY_HALTED in phy_stop_machine()"

[ Upstream commit ebc8254aeae34226d0bc8fda309fd9790d4dccfe ]

This reverts commit 7ad813f208533cebfcc32d3d7474dc1677d1b09a ("net: phy:
Correctly process PHY_HALTED in phy_stop_machine()") because it is
creating the possibility for a NULL pointer dereference.

David Daney provide the following call trace and diagram of events:

When ndo_stop() is called we call:

 phy_disconnect()
    +---> phy_stop_interrupts() implies: phydev->irq = PHY_POLL;
    +---> phy_stop_machine()
    |      +---> phy_state_machine()
    |              +----> queue_delayed_work(): Work queued.
    +--->phy_detach() implies: phydev->attached_dev = NULL;

Now at a later time the queued work does:

 phy_state_machine()
    +---->netif_carrier_off(phydev->attached_dev): Oh no! It is NULL:

 CPU 12 Unable to handle kernel paging request at virtual address
0000000000000048, epc == ffffffff80de37ec, ra == ffffffff80c7c
Oops[#1]:
CPU: 12 PID: 1502 Comm: kworker/12:1 Not tainted 4.9.43-Cavium-Octeon+ #1
Workqueue: events_power_efficient phy_state_machine
task: 80000004021ed100 task.stack: 8000000409d70000
$ 0   : 0000000000000000 ffffffff84720060 0000000000000048 0000000000000004
$ 4   : 0000000000000000 0000000000000001 0000000000000004 0000000000000000
$ 8   : 0000000000000000 0000000000000000 00000000ffff98f3 0000000000000000
$12   : 8000000409d73fe0 0000000000009c00 ffffffff846547c8 000000000000af3b
$16   : 80000004096bab68 80000004096babd0 0000000000000000 80000004096ba800
$20   : 0000000000000000 0000000000000000 ffffffff81090000 0000000000000008
$24   : 0000000000000061 ffffffff808637b0
$28   : 8000000409d70000 8000000409d73cf0 80000000271bd300 ffffffff80c7804c
Hi    : 000000000000002a
Lo    : 000000000000003f
epc   : ffffffff80de37ec netif_carrier_off+0xc/0x58
ra    : ffffffff80c7804c phy_state_machine+0x48c/0x4f8
Status: 14009ce3        KX SX UX KERNEL EXL IE
Cause : 00800008 (ExcCode 02)
BadVA : 0000000000000048
PrId  : 000d9501 (Cavium Octeon III)
Modules linked in:
Process kworker/12:1 (pid: 1502, threadinfo=8000000409d70000,
task=80000004021ed100, tls=0000000000000000)
Stack : 8000000409a54000 80000004096bab68 80000000271bd300 80000000271c1e00
        0000000000000000 ffffffff808a1708 8000000409a54000 80000000271bd300
        80000000271bd320 8000000409a54030 ffffffff80ff0f00 0000000000000001
        ffffffff81090000 ffffffff808a1ac0 8000000402182080 ffffffff84650000
        8000000402182080 ffffffff84650000 ffffffff80ff0000 8000000409a54000
        ffffffff808a1970 0000000000000000 80000004099e8000 8000000402099240
        0000000000000000 ffffffff808a8598 0000000000000000 8000000408eeeb00
        8000000409a54000 00000000810a1d00 0000000000000000 8000000409d73de8
        8000000409d73de8 0000000000000088 000000000c009c00 8000000409d73e08
        8000000409d73e08 8000000402182080 ffffffff808a84d0 8000000402182080
        ...
Call Trace:
[<ffffffff80de37ec>] netif_carrier_off+0xc/0x58
[<ffffffff80c7804c>] phy_state_machine+0x48c/0x4f8
[<ffffffff808a1708>] process_one_work+0x158/0x368
[<ffffffff808a1ac0>] worker_thread+0x150/0x4c0
[<ffffffff808a8598>] kthread+0xc8/0xe0
[<ffffffff808617f0>] ret_from_kernel_thread+0x14/0x1c

The original motivation for this change originated from Marc Gonzales
indicating that his network driver did not have its adjust_link callback
executing with phydev->link = 0 while he was expecting it.

PHYLIB has never made any such guarantees ever because phy_stop() merely just
tells the workqueue to move into PHY_HALTED state which will happen
asynchronously.

Reported-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reported-by: David Daney <ddaney.cavm@gmail.com>
Fixes: 7ad813f20853 ("net: phy: Correctly process PHY_HALTED in phy_stop_machine()")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agoqlge: avoid memcpy buffer overflow
Arnd Bergmann [Wed, 23 Aug 2017 13:59:49 +0000 (15:59 +0200)]
qlge: avoid memcpy buffer overflow

[ Upstream commit e58f95831e7468d25eb6e41f234842ecfe6f014f ]

gcc-8.0.0 (snapshot) points out that we copy a variable-length string
into a fixed length field using memcpy() with the destination length,
and that ends up copying whatever follows the string:

    inlined from 'ql_core_dump' at drivers/net/ethernet/qlogic/qlge/qlge_dbg.c:1106:2:
drivers/net/ethernet/qlogic/qlge/qlge_dbg.c:708:2: error: 'memcpy' reading 15 bytes from a region of size 14 [-Werror=stringop-overflow=]
  memcpy(seg_hdr->description, desc, (sizeof(seg_hdr->description)) - 1);

Changing it to use strncpy() will instead zero-pad the destination,
which seems to be the right thing to do here.

The bug is probably harmless, but it seems like a good idea to address
it in stable kernels as well, if only for the purpose of building with
gcc-8 without warnings.

Fixes: a61f80261306 ("qlge: Add ethtool register dump function.")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agoipv6: fix sparse warning on rt6i_node
Wei Wang [Fri, 25 Aug 2017 22:03:10 +0000 (15:03 -0700)]
ipv6: fix sparse warning on rt6i_node

[ Upstream commit 4e587ea71bf924f7dac621f1351653bd41e446cb ]

Commit c5cff8561d2d adds rcu grace period before freeing fib6_node. This
generates a new sparse warning on rt->rt6i_node related code:
  net/ipv6/route.c:1394:30: error: incompatible types in comparison
  expression (different address spaces)
  ./include/net/ip6_fib.h:187:14: error: incompatible types in comparison
  expression (different address spaces)

This commit adds "__rcu" tag for rt6i_node and makes sure corresponding
rcu API is used for it.
After this fix, sparse no longer generates the above warning.

Fixes: c5cff8561d2d ("ipv6: add rcu grace period before freeing fib6_node")
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agoipv6: add rcu grace period before freeing fib6_node
Wei Wang [Mon, 21 Aug 2017 16:47:10 +0000 (09:47 -0700)]
ipv6: add rcu grace period before freeing fib6_node

[ Upstream commit c5cff8561d2d0006e972bd114afd51f082fee77c ]

We currently keep rt->rt6i_node pointing to the fib6_node for the route.
And some functions make use of this pointer to dereference the fib6_node
from rt structure, e.g. rt6_check(). However, as there is neither
refcount nor rcu taken when dereferencing rt->rt6i_node, it could
potentially cause crashes as rt->rt6i_node could be set to NULL by other
CPUs when doing a route deletion.
This patch introduces an rcu grace period before freeing fib6_node and
makes sure the functions that dereference it takes rcu_read_lock().

Note: there is no "Fixes" tag because this bug was there in a very
early stage.

Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agoipv6: accept 64k - 1 packet length in ip6_find_1stfragopt()
Stefano Brivio [Fri, 18 Aug 2017 12:40:53 +0000 (14:40 +0200)]
ipv6: accept 64k - 1 packet length in ip6_find_1stfragopt()

[ Upstream commit 3de33e1ba0506723ab25734e098cf280ecc34756 ]

A packet length of exactly IPV6_MAXPLEN is allowed, we should
refuse parsing options only if the size is 64KiB or more.

While at it, remove one extra variable and one assignment which
were also introduced by the commit that introduced the size
check. Checking the sum 'offset + len' and only later adding
'len' to 'offset' doesn't provide any advantage over directly
summing to 'offset' and checking it.

Fixes: 6399f1fae4ec ("ipv6: avoid overflow of offset in ip6_find_1stfragopt")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agof2fs: fix a missing size change in f2fs_setattr
Yunlei He [Sun, 11 Dec 2016 07:35:15 +0000 (15:35 +0800)]
f2fs: fix a missing size change in f2fs_setattr

commit c0ed4405a99ec9be2a0f062eaafc002d8d26c99f upstream.

This patch fix a missing size change in f2fs_setattr

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: fix to access nullified flush_cmd_control pointer
Jaegeuk Kim [Thu, 8 Dec 2016 00:23:32 +0000 (16:23 -0800)]
f2fs: fix to access nullified flush_cmd_control pointer

commit 5eba8c5d1fb3af28b2073ba5228d4998196c1bcc upstream.

f2fs_sync_file()             remount_ro
 - f2fs_readonly
                               - destroy_flush_cmd_control
 - f2fs_issue_flush
   - no fcc pointer!

So, this patch doesn't free fcc in this case, but just stop its kernel thread
which sends flush commands.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: free meta pages if sanity check for ckpt is failed
Jaegeuk Kim [Tue, 6 Dec 2016 01:25:32 +0000 (17:25 -0800)]
f2fs: free meta pages if sanity check for ckpt is failed

commit a2125ff7dd1ed3a2a53cdc1f8f9c9cec9cfaa7ab upstream.

This fixes missing freeing meta pages in the error case.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: detect wrong layout
Jaegeuk Kim [Mon, 5 Dec 2016 21:56:04 +0000 (13:56 -0800)]
f2fs: detect wrong layout

commit 2040fce83fe17763b07c97c1f691da2bb85e4135 upstream.

Previous mkfs.f2fs allows small partition inappropriately, so f2fs should detect
that as well.

Refer this in f2fs-tools.

mkfs.f2fs: detect small partition by overprovision ratio and # of segments

Reported-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: call sync_fs when f2fs is idle
Jaegeuk Kim [Mon, 5 Dec 2016 19:37:14 +0000 (11:37 -0800)]
f2fs: call sync_fs when f2fs is idle

commit f455c8a5f0a24090e99249eb7280012376adec2c upstream.

The sync_fs in f2fs_balance_fs_bg must avoid interrupting current user requests.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agoRevert "f2fs: use percpu_counter for # of dirty pages in inode"
Jaegeuk Kim [Fri, 2 Dec 2016 23:11:32 +0000 (15:11 -0800)]
Revert "f2fs: use percpu_counter for # of dirty pages in inode"

commit 204706c7accfabb67b97eef9f9a28361b6201199 upstream.

This reverts commit 1beba1b3a953107c3ff5448ab4e4297db4619c76.

The perpcu_counter doesn't provide atomicity in single core and consume more
DRAM. That incurs fs_mark test failure due to ENOMEM.

Cc: stable@vger.kernel.org # 4.7+
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: return AOP_WRITEPAGE_ACTIVATE for writepage
Chao Yu [Tue, 29 Nov 2016 03:13:43 +0000 (19:13 -0800)]
f2fs: return AOP_WRITEPAGE_ACTIVATE for writepage

commit 0002b61bdaac732bcff364a18f5bd57c95def0a5 upstream.

We should use AOP_WRITEPAGE_ACTIVATE when we bypass writing pages.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: do not activate auto_recovery for fallocated i_size
Jaegeuk Kim [Mon, 28 Nov 2016 23:33:38 +0000 (15:33 -0800)]
f2fs: do not activate auto_recovery for fallocated i_size

commit 26787236b36660baf4d136281d40b5bb33a570ec upstream.

If a file needs to keep its i_size by fallocate, we need to turn off auto
recovery during roll-forward recovery.

This will resolve the below scenario.

1. xfs_io -f /mnt/f2fs/file -c "pwrite 0 4096" -c "fsync"
2. xfs_io -f /mnt/f2fs/file -c "falloc -k 4096 4096" -c "fsync"
3. md5sum /mnt/f2fs/file;
4. godown /mnt/f2fs/
5. umount /mnt/f2fs/
6. mount -t f2fs /dev/sdx /mnt/f2fs
7. md5sum /mnt/f2fs/file

Reported-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: fix 32-bit build
Arnd Bergmann [Tue, 22 Nov 2016 14:20:16 +0000 (15:20 +0100)]
f2fs: fix 32-bit build

commit 19c526515f6b998039d5d71fea879d255f173746 upstream.

The addition of multiple-device support broke CONFIG_BLK_DEV_ZONED
on 32-bit machines because of a 64-bit division:

fs/f2fs/f2fs.o: In function `__issue_discard_async':
extent_cache.c:(.text.__issue_discard_async+0xd4): undefined reference to `__aeabi_uldivmod'

Fortunately, bdev_zone_size() is guaranteed to return a power-of-two
number, so we can replace the % operator with a cheaper bit mask.

Fixes: 792b84b74b54 ("f2fs: support multiple devices")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: fix incorrect free inode count in ->statfs
Chao Yu [Fri, 18 Nov 2016 14:27:41 +0000 (22:27 +0800)]
f2fs: fix incorrect free inode count in ->statfs

commit b08b12d2ddc85b977a0531470cf6a7158289aaaf upstream.

While calculating inode count that we can create at most in the left space,
we should consider space which data/node blocks occupied, since we create
data/node mixly in main area. So fix the wrong calculation in ->statfs.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: drop duplicate header timer.h
Geliang Tang [Fri, 18 Nov 2016 14:21:13 +0000 (22:21 +0800)]
f2fs: drop duplicate header timer.h

commit b4ceec29219e340178baa9c5f17bf97a42951cc8 upstream.

Drop duplicate header timer.h from segment.c.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: fix wrong AUTO_RECOVER condition
Jaegeuk Kim [Thu, 17 Nov 2016 02:53:16 +0000 (18:53 -0800)]
f2fs: fix wrong AUTO_RECOVER condition

commit 97dd26ad834739d4e4ea35fd7ab5f92824de4cbb upstream.

If i_size is not aligned to the f2fs's block size, we should not skip inode
update during fsync.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: do not recover i_size if it's valid
Jaegeuk Kim [Wed, 16 Nov 2016 23:09:48 +0000 (15:09 -0800)]
f2fs: do not recover i_size if it's valid

commit 3a3a5ead7b6d2c9a29f493791ba23f264052db34 upstream.

If i_size is already valid during roll_forward recovery, we should not update
it according to the block alignment.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: fix fdatasync
Chao Yu [Thu, 17 Nov 2016 12:53:31 +0000 (20:53 +0800)]
f2fs: fix fdatasync

commit 281518c694a5228d6c46fac83529fb3e2c331281 upstream.

For below two cases, we can't guarantee data consistence:

a)
1. xfs_io "pwrite 0 4195328" "fsync"
2. xfs_io "pwrite 4195328 1024" "fdatasync"
3. godown
4. umount & mount
--> isize we updated before fdatasync won't be recovered

b)
1. xfs_io "pwrite -S 0xcc 0 4202496" "fsync"
2. xfs_io "fpunch 4194304 4096" "fdatasync"
3. godown
4. umount & mount
--> dnode we punched before fdatasync won't be recovered

The reason is that normally fdatasync won't be aware of modification
of metadata in file, e.g. isize changing, dnode updating, so in ->fsync
we will skip flushing node pages for above cases, result in making
fdatasynced file being lost during recovery.

Currently we have introduced DIRTY_META global list in sbi for tracking
dirty inode selectively, so in fdatasync we can choose to flush nodes
depend on dirty state of current inode in the list.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: fix to account total free nid correctly
Chao Yu [Thu, 17 Nov 2016 12:53:11 +0000 (20:53 +0800)]
f2fs: fix to account total free nid correctly

commit 04d47e673863c637a2b44ad34a558aeb5d0a727e upstream.

Thread A Thread B Thread C
- f2fs_create
 - f2fs_new_inode
  - f2fs_lock_op
   - alloc_nid
    alloc last nid
  - f2fs_unlock_op
- f2fs_create
 - f2fs_new_inode
  - f2fs_lock_op
   - alloc_nid
    as node count still not
    be increased, we will
    loop in alloc_nid
- f2fs_write_node_pages
 - f2fs_balance_fs_bg
  - f2fs_sync_fs
   - write_checkpoint
    - block_operations
     - f2fs_lock_all
 - f2fs_lock_op

While creating new inode, we do not allocate and account nid atomically,
so that when there is almost no free nids left, we may encounter deadloop
like above stack.

In order to avoid that, reuse nm_i::available_nids for accounting free nids
and make nid allocation and counting being atomical during node creation.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: fix an infinite loop when flush nodes in cp
Yunlei He [Wed, 16 Nov 2016 09:26:24 +0000 (17:26 +0800)]
f2fs: fix an infinite loop when flush nodes in cp

commit d40a43af0a57a017eba9ad2679183791587ceb6a upstream.

Thread A Thread B

- write_checkpoint
 - block_operations
   -blk_start_plug
    -sync_node_pages - f2fs_do_sync_file
 - fsync_node_pages
  - f2fs_wait_on_page_writeback

Thread A wait for global F2FS_DIRTY_NODES decreased to zero,
it start a plug list, some requests have been added to this list.
Thread B lock one dirty node page, and wait this page write back.
But this page has been in plug list of thread A with PG_writeback flag.
Thread A keep on running and its plug list has no chance to finish,
so it seems a deadlock between cp and fsync path.

This patch add a wait on page write back before set node page dirty
to avoid this problem.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Pengyang Hou <houpengyang@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: don't wait writeback for datas during checkpoint
Chao Yu [Wed, 16 Nov 2016 02:41:20 +0000 (10:41 +0800)]
f2fs: don't wait writeback for datas during checkpoint

commit 36951b38d13ac7cce9fcf89e0e01c22ed0d05688 upstream.

Normally, while committing checkpoint, we will wait on all pages to be
writebacked no matter the page is data or metadata, so in scenario where
there are lots of data IO being submitted with metadata, we may suffer
long latency for waiting writeback during checkpoint.

Indeed, we only care about persistence for pages with metadata, but not
pages with data, as file system consistent are only related to metadate,
so in order to avoid encountering long latency in above scenario, let's
recognize and reference metadata in submitted IOs, wait writeback only
for metadatas.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: fix wrong written_valid_blocks counting
Jaegeuk Kim [Tue, 15 Nov 2016 02:20:10 +0000 (18:20 -0800)]
f2fs: fix wrong written_valid_blocks counting

commit c79b7ff1d3c7710c23d8828a69d8cbc5597ad19f upstream.

Previously, written_valid_blocks was got by ckpt->valid_block_count. But if
the last checkpoint has some NEW_ADDR due to power-cut, we can get wrong value.
Fix it to get the number from actual written block count from sit entries.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: avoid BG_GC in f2fs_balance_fs
Jaegeuk Kim [Tue, 15 Nov 2016 01:38:35 +0000 (17:38 -0800)]
f2fs: avoid BG_GC in f2fs_balance_fs

commit 7702bdbe505a22380dd958e2ee35124c7c414806 upstream.

If many threads hit has_not_enough_free_secs() in f2fs_balance_fs() at the same
time, all the threads would do FG_GC or BG_GC.
In this critical path, we totally don't need to do BG_GC at all.
Let's avoid that.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: fix redundant block allocation
Jaegeuk Kim [Sat, 12 Nov 2016 00:46:40 +0000 (16:46 -0800)]
f2fs: fix redundant block allocation

commit c040ff9d69fd1d782fe577ba9e35c1f5798158ae upstream.

In direct_IO path of f2fs_file_write_iter(),
1. f2fs_preallocate_blocks(F2FS_GET_BLOCK_PRE_DIO)
   -> allocate LBA X
2. f2fs_direct_IO()
   -> return 0;

Then,
f2fs_write_data_page() will allocate another LBA X+1.

This makes EIO triggered by HM-SMR.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: use err for f2fs_preallocate_blocks
Jaegeuk Kim [Sat, 12 Nov 2016 00:31:56 +0000 (16:31 -0800)]
f2fs: use err for f2fs_preallocate_blocks

commit a7de608691f766cd148971a71d4f13aa1692d4c8 upstream.

This patch has no functional change.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: support multiple devices
Jaegeuk Kim [Fri, 7 Oct 2016 02:02:05 +0000 (19:02 -0700)]
f2fs: support multiple devices

commit 3c62be17d4f562f43fe1d03b48194399caa35aa5 upstream.

This patch implements multiple devices support for f2fs.
Given multiple devices by mkfs.f2fs, f2fs shows them entirely as one big
volume under one f2fs instance.

Internal block management is very simple, but we will modify block allocation
and background GC policy to boost IO speed by exploiting them accoording to
each device speed.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: allow dio read for LFS mode
Jaegeuk Kim [Fri, 11 Nov 2016 20:08:22 +0000 (12:08 -0800)]
f2fs: allow dio read for LFS mode

commit e57e9ae5b179a6b243c42bf6d9549d1595c27089 upstream.

We can allow dio reads for LFS mode, while doing buffered writes for dio writes.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: revert segment allocation for direct IO
Jaegeuk Kim [Fri, 11 Nov 2016 20:31:40 +0000 (12:31 -0800)]
f2fs: revert segment allocation for direct IO

commit 6ae1be13e85f4c42c8ca371fda50ae39eebbfd96 upstream.

Now we don't need to be too much careful about storage alignment for dio, since
its speed becomes quite fast and we'd better avoid any misalignment first.

Revert: 38aa0889b250 (f2fs: align direct_io'ed data to section)

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>